[
  {
    "path": ".gitattributes",
    "content": "*.config linguist-language=nextflow\n"
  },
  {
    "path": ".github/.dockstore.yml",
    "content": "# Dockstore config version, not pipeline version\nversion: 1.2\nworkflows:\n  - subclass: nfl\n    primaryDescriptorPath: /nextflow.config\n    publish: True\n"
  },
  {
    "path": ".github/CONTRIBUTING.md",
    "content": "# nf-core/eager: Contributing Guidelines\n\nHi there!\nMany thanks for taking an interest in improving nf-core/eager.\n\nWe try to manage the required tasks for nf-core/eager using GitHub issues, you probably came to this page when creating one.\nPlease use the pre-filled template to save time.\n\nHowever, don't be put off by this template - other more general issues and suggestions are welcome!\nContributions to the code are even more welcome ;)\n\n> If you need help using or modifying nf-core/eager then the best place to ask is on the nf-core Slack [#eager](https://nfcore.slack.com/channels/eager) channel ([join our Slack here](https://nf-co.re/join/slack)).\n\n## Contribution workflow\n\nIf you'd like to write some code for nf-core/eager, the standard workflow is as follows:\n\n1. Check that there isn't already an issue about your idea in the [nf-core/eager issues](https://github.com/nf-core/eager/issues) to avoid duplicating work\n    * If there isn't one already, please create one so that others know you're working on this\n2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/eager repository](https://github.com/nf-core/eager) to your GitHub account\n3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)\n4. Use `nf-core schema build .` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).\n5. 
Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged\n\nIf you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).\n\n## Tests\n\nWhen you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests.\nTypically, pull-requests are only fully reviewed when these tests are passing, though of course we can help out before then.\n\nThere are typically two types of tests that run:\n\n### Lint tests\n\n`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.\nTo enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.\n\nIf any failures or warnings are encountered, please follow the listed URL for more documentation.\n\n### Pipeline tests\n\nEach `nf-core` pipeline should be set up with a minimal set of test-data.\n`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.\nIf there are any failures then the automated tests fail.\nThese tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code.\n\n## Patch\n\n:warning: Only in the unlikely and regrettable event of a release happening with a bug.\n\n* On your own fork, make a new branch `patch` based on `upstream/master`.\n* Fix the bug, and bump version (X.Y.Z+1).\n* A PR should be made on `master` from `patch` to directly address this particular bug.\n\n## Getting help\n\nFor further information/help, please consult the [nf-core/eager 
documentation](https://nf-co.re/eager/usage) and don't hesitate to get in touch on the nf-core Slack [#eager](https://nfcore.slack.com/channels/eager) channel ([join our Slack here](https://nf-co.re/join/slack)).\n\n## Pipeline contribution conventions\n\nTo make the nf-core/eager code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.\n\n### Adding a new step\n\nIf you wish to contribute a new step, please use the following coding standards:\n\n1. Define the corresponding input channel into your new process from the expected previous process channel.\n2. Write the process block (see below).\n3. Define the output channel if needed (see below).\n4. Add any new flags/options to `nextflow.config` with a default (see below).\n5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`).\n6. Add sanity checks for all relevant parameters.\n7. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`.\n8. Do local tests that the new code works properly and as expected.\n9. Add a new test command in `.github/workflows/ci.yml`.\n10. If applicable add a [MultiQC](https://multiqc.info/) module.\n11. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order.\n12. 
Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.\n\n### Default values\n\nParameters should be initialised / defined with default values in `nextflow.config` under the `params` scope.\n\nOnce there, use `nf-core schema build .` to add to `nextflow_schema.json`.\n\n### Default processes resource requirements\n\nSensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.\n\n:warning: Note that in nf-core/eager we currently have our own custom process labels, so please check `base.config`!\n\nThe process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.\n\n### Naming schemes\n\nPlease use the following naming schemes, to make it easy to understand what is going where.\n\n* initial process channel: `ch_output_from_<process>`\n* intermediate and terminal channels: `ch_<previousprocess>_for_<nextprocess>`\n* skipped process output: `ch_<previousstage>_for_<skipprocess>` (this goes out of the bypass statement described in the Process Concept section below)\n\n### Nextflow version bumping\n\nIf you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . 
[min-nf-version]`\n\n### Software version reporting\n\nIf you add a new tool to the pipeline, please ensure you add the information of the tool to the `get_software_version` process.\n\nAdd to the script block of the process, something like the following:\n\n```bash\n<YOUR_TOOL> --version &> v_<YOUR_TOOL>.txt 2>&1 || true\n```\n\nor\n\n```bash\n<YOUR_TOOL> --help | head -n 1 &> v_<YOUR_TOOL>.txt 2>&1 || true\n```\n\nYou then need to edit the script `bin/scrape_software_versions.py` to:\n\n1. Add a Python regex for your tool's `--version` output (as stored in the `v_<YOUR_TOOL>.txt` file), to ensure the version is reported as a `v` and the version number e.g. `v2.1.1`\n2. Add an HTML entry to the `OrderedDict` for formatting in MultiQC.\n\n### Images and figures\n\nFor overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).\n\nFor all internal nf-core/eager documentation images we are using the 'Kalam' font by the Indian Type Foundry, licensed under the Open Font License. It can be downloaded [here](https://fonts.google.com/specimen/Kalam).\n\n## Process Concept\n\nWe are providing a highly configurable pipeline, with many options to turn on and off different processes in different combinations. This can make a very complex graph structure that can cause a large number of duplicated channels coming out of every process to account for each possible combination.\n\nThe EAGER pipeline can currently be broken down into the following 'stages', where a stage is a collection of non-terminal, mutually exclusive processes, the output of which is used as input for another (non-reporting) module
.\n\n* Input\n* Convert BAM\n* PolyG Clipping\n* AdapterRemoval\n* Mapping (either `bwa`, `bwamem`, or `circularmapper`)\n* BAM Filtering\n* Deduplication (either `dedup` or `markduplicates`)\n* BAM Trimming\n* PMDtools\n* Genotyping\n\nEvery stage can potentially be skipped; therefore, the output of the previous stage must be able to be passed on to the next stage if the given stage is not run.\n\nTo somewhat simplify this logic, we have implemented the following structure.\n\nThe concept is as follows:\n\n* Every 'stage' of the pipeline (i.e. collection of mutually exclusive processes) must always have an if else statement following it.\n* This if else 'bypass' statement collects and standardises all possible input files into single channel(s) for the next stage.\n* Importantly - within the bypass statement, a channel from the previous stage's bypass mixes into these output channels. This additional channel is named `ch_previousstage_for_skipcurrentstage`. This contains the output from the previous stage, i.e. not the modified version from the current stage.\n* The bypass statement works as follows:\n  * If the current stage is turned on: will mix the previous stage and current stage output and filter for file suffixes unique to the current stage output\n  * If the current stage is turned off or skipped: will mix the previous stage and current stage output. However, as there are no files in the output channel from the current stage, no filtering is required and the files in the 'ch_XXX_for_skipXXX' channel will be used.\n\nThis ensures that the channel input to the next stage is 'homogeneous' - i.e. 
all comes from the same source (the bypass statement).\n\nAn example schematic can be given as follows:\n\n```nextflow\n// PREVIOUS STAGE OUTPUT\nif (params.run_convertbam) {\n    ch_input_for_skipconvertbam.mix(ch_output_from_convertbam)\n        .filter { it =~/.*converted.fq/ }\n        .into { ch_convertbam_for_fastp; ch_convertbam_for_skipfastp }\n} else {\n    ch_input_for_skipconvertbam\n        .into { ch_convertbam_for_fastp; ch_convertbam_for_skipfastp }\n}\n\n// SKIPPABLE CURRENT STAGE PROCESS\nprocess fastp {\n    publishDir \"${params.outdir}/fastp\", mode: 'copy'\n\n    when:\n    params.run_fastp\n\n    input:\n    file fq from ch_convertbam_for_fastp\n\n    output:\n    file \"*pG.fq\" into ch_output_from_fastp\n\n    script:\n    \"\"\"\n    echo \"I have been fastp'd\" > ${fq}\n    mv ${fq} ${fq}.pG.fq\n    \"\"\"\n}\n\n// NEXT STAGE INPUT PREPARATION\nif (params.run_fastp) {\n    ch_convertbam_for_skipfastp.mix(ch_output_from_fastp)\n        .filter { it =~/.*pG.fq/ }\n        .into { ch_fastp_for_adapterremoval; ch_fastp_for_skipadapterremoval }\n} else {\n    ch_convertbam_for_skipfastp\n        .into { ch_fastp_for_adapterremoval; ch_fastp_for_skipadapterremoval }\n}\n\n```\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "content": "---\nname: Bug report\nabout: Report something that is broken or incorrect\nlabels: bug\n---\n\n<!--\n# nf-core/eager bug report\n\nHi there!\n\nThanks for telling us about a problem with the pipeline.\nPlease delete this text and anything that's not relevant from the template below:\n-->\n\n## Check Documentation\n\nI have checked the following places for your error:\n\n- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting)\n- [ ] [nf-core/eager pipeline documentation](https://nf-co.re/nf-core/eager/usage)\n      - nf-core/eager FAQ/troubleshooting can be found [here](https://nf-co.re/eager/usage#troubleshooting-and-faqs)\n\n## Description of the bug\n\n<!-- A clear and concise description of what the bug is. -->\n\n## Steps to reproduce\n\nSteps to reproduce the behaviour:\n\n1. Command line: `nextflow run ...`\n2. See error: _Please provide your error message_\n\n## Expected behaviour\n\n<!-- A clear and concise description of what you expected to happen. -->\n\n## Log files\n\nHave you provided the following extra information/files:\n\n- [ ] The command used to run the pipeline\n- [ ] The `.nextflow.log` file <!-- this is a hidden file in the directory where you launched the pipeline -->\n- [ ] The exact error: <!-- [Please provide your error message] -->\n\n## System\n\n- Hardware: <!-- [e.g. HPC, Desktop, Cloud...] -->\n- Executor: <!-- [e.g. slurm, local, awsbatch...] -->\n- OS: <!-- [e.g. CentOS Linux, macOS, Linux Mint...] -->\n- Version <!-- [e.g. 7, 10.13.6, 18.3...] -->\n\n## Nextflow Installation\n\n- Version: <!-- [e.g. 19.10.0] -->\n\n## Container engine\n\n- Engine: <!-- [e.g. Conda, Docker, Singularity, Podman, Shifter or Charliecloud] -->\n- version: <!-- [e.g. 1.0.0] -->\n- Image tag: <!-- [e.g. nfcore/eager:1.0.0] -->\n\n## Additional context\n\n<!-- Add any other context about the problem here. -->\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "content": "blank_issues_enabled: false\ncontact_links:\n  - name: Join nf-core\n    url: https://nf-co.re/join\n    about: Please join the nf-core community here\n  - name: \"Slack #eager channel\"\n    url: https://nfcore.slack.com/channels/eager\n    about: Discussion about the nf-core/eager pipeline\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "content": "---\nname: Feature request\nabout: Suggest an idea for the nf-core/eager pipeline\nlabels: enhancement\n---\n\n<!--\n# nf-core/eager feature request\n\nHi there!\n\nThanks for suggesting a new feature for the pipeline!\nPlease delete this text and anything that's not relevant from the template below:\n-->\n\n## Is your feature request related to a problem? Please describe\n\n<!-- A clear and concise description of what the problem is. -->\n\n<!-- e.g. [I'm always frustrated when ...] -->\n\n## Describe the solution you'd like\n\n<!-- A clear and concise description of what you want to happen. -->\n\n## Describe alternatives you've considered\n\n<!-- A clear and concise description of any alternative solutions or features you've considered. -->\n\n## Additional context\n\n<!-- Add any other context about the feature request here. -->\n"
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE/pull_request_template.md",
    "content": "Many thanks to contributing to nf-core/eager!\n\nPlease fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).\n\n## PR checklist\n\n - [ ] This comment contains a description of changes (with reason).\n - [ ] If you've fixed a bug or added code that should be tested, add tests!\n   - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`\n   - [ ] If necessary, also make a PR on the [nf-core/eager branch on the nf-core/test-datasets repo]( https://github.com/nf-core/test-datasets/pull/new/nf-core/eager).\n - [ ] Make sure your code lints (`nf-core lint .`).\n - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).\n - [ ] Usage Documentation in `docs/usage.md` is updated.\n - [ ] Output Documentation in `docs/output.md` is updated.\n - [ ] `CHANGELOG.md` is updated.\n - [ ] `README.md` is updated (including new tool citations and authors/contributors).\n\n**Learn more about contributing:** https://github.com/nf-core/eager/tree/master/.github/CONTRIBUTING.md\n"
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.md",
    "content": "<!--\n# nf-core/eager pull request\n\nMany thanks for contributing to nf-core/eager!\n\nPlease fill in the appropriate checklist below (delete whatever is not relevant).\nThese are the most common things requested on pull requests (PRs).\n\nRemember that PRs should be made against the dev branch, unless you're preparing a pipeline release.\n\nLearn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/eager/tree/master/.github/CONTRIBUTING.md)\n-->\n<!-- markdownlint-disable ul-indent -->\n\n## PR checklist\n\n- [ ] This comment contains a description of changes (with reason).\n- [ ] If you've fixed a bug or added code that should be tested, add tests!\n    - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`\n    - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](<https://github.com/>nf-core/eager/tree/master/.github/CONTRIBUTING.md)\n    - [ ] If necessary, also make a PR on the nf-core/eager _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.\n- [ ] Make sure your code lints (`nf-core lint .`).\n- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).\n- [ ] Usage Documentation in `docs/usage.md` is updated.\n- [ ] Output Documentation in `docs/output.md` is updated.\n- [ ] `CHANGELOG.md` is updated.\n- [ ] `README.md` is updated (including new tool citations and authors/contributors).\n"
  },
  {
    "path": ".github/markdownlint.yml",
    "content": "# Markdownlint configuration file\ndefault: true\nline-length: false\nno-duplicate-header:\n    siblings_only: true\nno-inline-html:\n    allowed_elements:\n        - img\n        - p\n        - kbd\n        - details\n        - summary\n"
  },
  {
    "path": ".github/workflows/awsfulltest.yml",
    "content": "name: nf-core AWS full size tests\n# This workflow is triggered on published releases.\n# It can be additionally triggered manually with GitHub actions workflow dispatch.\n# It runs the -profile 'test_full' on AWS batch\n\non:\n  workflow_run:\n    workflows: [\"nf-core Docker push (release)\"]\n    types: [completed]\n  workflow_dispatch:\n\n\nenv:\n  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}\n  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}\n  TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}\n  AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}\n  AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}\n  AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}\n\n\njobs:\n  run-awstest:\n    name: Run AWS full tests\n    if: github.repository == 'nf-core/eager'\n    runs-on: ubuntu-latest\n    steps:\n      - name: Setup Miniconda\n        uses: conda-incubator/setup-miniconda@v2\n        with:\n          auto-update-conda: true\n          python-version: 3.7\n      - name: Install awscli\n        run: conda install -c conda-forge awscli\n      - name: Start AWS batch job\n        # Add full size test data (but still relatively small datasets for few samples)\n        # on the `test_full.config` test runs with only one set of parameters\n        # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command\n        run: |\n          aws batch submit-job \\\n            --region eu-west-1 \\\n            --job-name nf-core-eager \\\n            --job-queue $AWS_JOB_QUEUE \\\n            --job-definition $AWS_JOB_DEFINITION \\\n            --container-overrides '{\"command\": [\"nf-core/eager\", \"-r '\"${GITHUB_SHA}\"' -profile test_full --outdir s3://'\"${AWS_S3_BUCKET}\"'/eager/results-'\"${GITHUB_SHA}\"' -w s3://'\"${AWS_S3_BUCKET}\"'/eager/work-'\"${GITHUB_SHA}\"' -with-tower\"], \"environment\": [{\"name\": \"TOWER_ACCESS_TOKEN\", \"value\": \"'\"$TOWER_ACCESS_TOKEN\"'\"}]}'\n"
  },
  {
    "path": ".github/workflows/awstest.yml",
    "content": "name: nf-core AWS test\n# This workflow is triggered on push to the master branch.\n# It can be additionally triggered manually with GitHub actions workflow dispatch.\n# It runs the -profile 'test' on AWS batch.\n\non:\n  workflow_dispatch:\n\n\nenv:\n  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}\n  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}\n  TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}\n  AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}\n  AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}\n  AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}\n\n\njobs:\n  run-awstest:\n    name: Run AWS tests\n    if: github.repository == 'nf-core/eager'\n    runs-on: ubuntu-latest\n    steps:\n      - name: Setup Miniconda\n        uses: conda-incubator/setup-miniconda@v2\n        with:\n          auto-update-conda: true\n          python-version: 3.7\n      - name: Install awscli\n        run: conda install -c conda-forge awscli\n      - name: Start AWS batch job\n        # For example: adding multiple test runs with different parameters\n        # Remember that you can parallelise this by using strategy.matrix\n        run: |\n          aws batch submit-job \\\n          --region eu-west-1 \\\n          --job-name nf-core-eager \\\n          --job-queue $AWS_JOB_QUEUE \\\n          --job-definition $AWS_JOB_DEFINITION \\\n          --container-overrides '{\"command\": [\"nf-core/eager\", \"-r '\"${GITHUB_SHA}\"' -profile test_tsv_complex --outdir s3://'\"${AWS_S3_BUCKET}\"'/eager/results-'\"${GITHUB_SHA}\"' -w s3://'\"${AWS_S3_BUCKET}\"'/eager/work-'\"${GITHUB_SHA}\"' -with-tower\"], \"environment\": [{\"name\": \"TOWER_ACCESS_TOKEN\", \"value\": \"'\"$TOWER_ACCESS_TOKEN\"'\"}]}'\n"
  },
  {
    "path": ".github/workflows/branch.yml",
    "content": "name: nf-core branch protection\n# This workflow is triggered on PRs to master branch on the repository\n# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`\non:\n  pull_request_target:\n    branches: [master]\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n    steps:\n      # PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches\n      - name: Check PRs\n        if: github.repository == 'nf-core/eager'\n        run: |\n          { [[ ${{github.event.pull_request.head.repo.full_name }} == nf-core/eager ]] && [[ $GITHUB_HEAD_REF = \"dev\" ]]; } || [[ $GITHUB_HEAD_REF == \"patch\" ]]\n\n\n      # If the above check failed, post a comment on the PR explaining the failure\n      # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets\n      - name: Post PR comment\n        if: failure()\n        uses: mshick/add-pr-comment@v1\n        with:\n          message: |\n            ## This PR is against the `master` branch :x:\n\n            * Do not close this PR\n            * Click _Edit_ and change the `base` to `dev`\n            * This CI test will remain failed until you push a new commit\n\n            ---\n\n            Hi @${{ github.event.pull_request.user.login }},\n\n            It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.\n            The `master` branch on nf-core repositories should always contain code from the latest release.\n            Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.\n\n            You do not need to close this PR, you can change the target branch to 
`dev` by clicking the _\"Edit\"_ button at the top of this page.\n            Note that even after this, the test will continue to show as failing until you push a new commit.\n\n            Thanks again for your contribution!\n          repo-token: ${{ secrets.GITHUB_TOKEN }}\n          allow-repeats: false\n\n"
  },
  {
    "path": ".github/workflows/ci.yml",
    "content": "name: nf-core CI\n# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors\non:\n  push:\n    branches:\n      - dev\n  pull_request:\n  release:\n    types: [published]\n\n# Uncomment if we need an edge release of Nextflow again\n# env: NXF_EDGE: 1\n\njobs:\n  test:\n    name: Run workflow tests\n    # Only run on push if this is the nf-core dev branch (merged PRs)\n    if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/eager') }}\n    runs-on: ubuntu-latest\n    env:\n      NXF_VER: ${{ matrix.nxf_ver }}\n      NXF_ANSI_LOG: false\n    strategy:\n      matrix:\n        # Nextflow versions: check pipeline minimum and current latest\n        nxf_ver: [\"20.07.1\", \"22.10.6\"]\n    steps:\n      - name: Check out pipeline code\n        uses: actions/checkout@v2\n      - name: Install older Java\n        uses: actions/setup-java@v4\n        with:\n          distribution: \"temurin\" # See 'Supported distributions' for available options\n          java-version: \"11\"\n      - name: Check if Dockerfile or Conda environment changed\n        uses: technote-space/get-diff-action@v4\n        with:\n          FILES: |\n            Dockerfile\n            environment.yml\n\n      - name: Build new docker image\n        if: env.MATCHED_FILES\n        run: docker build --no-cache . 
-t nfcore/eager:2.5.3\n\n      - name: Pull docker image\n        if: ${{ !env.MATCHED_FILES }}\n        run: |\n          docker pull nfcore/eager:dev\n          docker tag nfcore/eager:dev nfcore/eager:2.5.3\n      - name: Install Nextflow\n        env:\n          CAPSULE_LOG: none\n        run: |\n          wget -qO- https://github.com/nextflow-io/nextflow/releases/download/v22.10.6/nextflow | bash\n          sudo mv nextflow /usr/local/bin/\n      - name: HELPTEXT Run with the help flag\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} --help\n      - name: Get test data for cases where we don't use TSV input\n        run: |\n          git clone --single-branch --branch eager https://github.com/nf-core/test-datasets.git data\n      - name: DELAY to try address some odd behaviour with what appears to be a conflict between parallel htslib jobs leading to CI hangs\n        run: |\n          if [[ $NXF_VER = '' ]]; then sleep 1200; fi\n      - name: BASIC Run the basic pipeline with directly supplied single-end FASTQ\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_direct,docker --input 'data/testdata/Mammoth/fastq/*_R1_*.fq.gz' --single_end\n      - name: BASIC Run the basic pipeline with directly supplied paired-end FASTQ\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_direct,docker --input 'data/testdata/Mammoth/fastq/*_{R1,R2}_*tengrand.fq.gz'\n      - name: BASIC Run the basic pipeline with supplied --input BAM\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_direct,docker --input 'data/testdata/Mammoth/bam/*_R1_*.bam' --bam --single_end\n      - name: BASIC Run the basic pipeline with the test profile with, PE/SE, bwa aln\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --save_reference\n      - name: REFERENCE Basic workflow, with supplied indices\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --bwa_index 
'results/reference_genome/bwa_index/BWAIndex/' --fasta_index 'https://github.com/nf-core/test-datasets/blob/eager/reference/Mammoth/Mammoth_MT_Krause.fasta.fai'\n      - name: REFERENCE Run the basic pipeline with FastA reference with `fna` extension\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_fna,docker\n      - name: REFERENCE Test with zipped reference input\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --fasta 'https://github.com/nf-core/test-datasets/raw/eager/reference/Mammoth/Mammoth_MT_Krause.fasta.gz'\n      - name: FASTP Test fastp complexity filtering\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --complexity_filter_poly_g\n      - name: ADAPTERREMOVAL Test skip paired end collapsing\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --skip_collapse\n      - name: ADAPTERREMOVAL Test paired end collapsing but no trimming\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_pretrim,docker --skip_trim\n      - name: ADAPTERREMOVAL Run the basic pipeline with paired end data without adapterRemoval\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --skip_adapterremoval\n      - name: ADAPTERREMOVAL Run the basic pipeline with preserve5p end option\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --preserve5p\n      - name: ADAPTERREMOVAL Run the basic pipeline with merged only option\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --mergedonly\n      - name: ADAPTERREMOVAL Run the basic pipeline with preserve5p end and merged reads only options\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --preserve5p --mergedonly\n      - name: ADAPTER LIST Run the basic pipeline using an adapter list\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker 
--clip_adapters_list 'https://github.com/nf-core/test-datasets/raw/eager/databases/adapters/adapter-list.txt'\n      - name: ADAPTER LIST Run the basic pipeline using an adapter list, skipping adapter removal\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --clip_adapters_list 'https://github.com/nf-core/test-datasets/raw/eager/databases/adapters/adapter-list.txt' --skip_adapterremoval\n      - name: POST_AR_FASTQ_TRIMMING Run the basic pipeline post-adapterremoval FASTQ trimming\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_post_ar_trimming\n      - name: POST_AR_FASTQ_TRIMMING Run the basic pipeline post-adapterremoval FASTQ trimming, but skip adapterremoval\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_post_ar_trimming --skip_adapterremoval\n      - name: MAPPER_CIRCULARMAPPER Test running with CircularMapper\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --mapper 'circularmapper' --circulartarget 'NC_007596.2'\n      - name: MAPPER_BWAMEM Test running with BWA Mem\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --mapper 'bwamem' --skip_collapse\n      - name: MAPPER_BT2 Test running with BowTie2\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --mapper 'bowtie2' --bt2_alignmode 'local' --bt2_sensitivity 'sensitive' --bt2n 1 --bt2l 16 --bt2_trim5 1 --bt2_trim3 1\n      - name: HOST_REMOVAL_FASTQ Run the basic pipeline with output unmapped reads as fastq\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_complex,docker --hostremoval_input_fastq\n      - name: BAM_FILTERING Run basic mapping pipeline with mapping quality filtering, and unmapped export\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_bam_filtering --bam_mapping_quality_threshold 37  --bam_unmapped_type 'fastq'\n      - name: 
BAM_FILTERING Run basic mapping pipeline with post-mapping length filtering\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --clip_readlength 0 --run_bam_filtering --bam_filter_minreadlength 50\n      - name: PRESEQ Run basic mapping pipeline with different preseq mode\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --preseq_mode 'lc_extrap' --preseq_maxextrap 10000 --preseq_bootstrap 10\n      - name: DEDUPLICATION Test with dedup\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --dedupper 'dedup' --dedup_all_merged\n      - name: BEDTOOLS Test bedtools feature annotation\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_bedtools_coverage --anno_file 'https://github.com/nf-core/test-datasets/raw/eager/reference/Mammoth/Mammoth_MT_Krause.gff3'\n      - name: MAPDAMAGE2 damage calculation\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --damage_calculation_tool 'mapdamage'\n      - name: GENOTYPING_HC Test running GATK HaplotypeCaller\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_fna,docker --run_genotyping --genotyping_tool 'hc' --gatk_hc_out_mode 'EMIT_ALL_ACTIVE_SITES' --gatk_hc_emitrefconf 'BP_RESOLUTION'\n      - name: GENOTYPING_FB Test running FreeBayes\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_genotyping --genotyping_tool 'freebayes'\n      - name: GENOTYPING_PC Test running pileupCaller\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --run_genotyping --genotyping_tool 'pileupcaller'\n      - name: GENOTYPING_ANGSD Test running ANGSD genotype likelihood calculation\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --run_genotyping --genotyping_tool 'angsd'\n      - name: GENOTYPING_BCFTOOLS Test running FreeBayes with bcftools 
stats turned on\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_genotyping --genotyping_tool 'freebayes' --run_bcftools_stats\n      - name: SKIPPING Test checking all skip steps work i.e. input bam, skipping straight to genotyping\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_bam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --skip_qualimap --skip_preseq --skip_damage_calculation --run_genotyping --genotyping_tool 'freebayes'\n      - name: TRIMBAM Test bamutils works alone\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_trim_bam\n      - name: PMDTOOLS Test PMDtools works alone\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_pmdtools\n      - name: GENOTYPING_UG AND MULTIVCFANALYZER Test running GATK UnifiedGenotyper and MultiVCFAnalyzer, additional VCFS\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_genotyping --genotyping_tool 'ug' --gatk_ug_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer --additional_vcf_files 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/vcf/JK2772_CATCAGTGAGTAGA_L008_R1_001.fastq.gz.tengrand.fq.combined.fq.mapped_rmdup.bam.unifiedgenotyper.vcf.gz' --write_allele_frequencies\n      - name: COMPLEX LANE/LIBRARY MERGING Test running lane and library merging prior to GATK UnifiedGenotyper and running MultiVCFAnalyzer\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_complex,docker --run_genotyping --genotyping_tool 'ug' --gatk_ug_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer\n      - name: GENOTYPING_UG ON TRIMMED BAM Test\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_genotyping --run_trim_bam --genotyping_source 'trimmed' --genotyping_tool 'ug' --gatk_ug_out_mode 
'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP'\n      - name: BAM_INPUT Run the basic pipeline with the bam input profile, skip AdapterRemoval as no convertBam\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_bam,docker --skip_adapterremoval\n      - name: BAM_INPUT Run the basic pipeline with the bam input profile, convert to FASTQ for adapterremoval test and downstream\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_bam,docker --run_convertinputbam\n      - name: METAGENOMIC Download MALT database\n        run: |\n          mkdir -p databases/malt\n          readlink -f databases/malt/\n          for i in index0.idx ref.db ref.idx ref.inf table0.db table0.idx taxonomy.idx taxonomy.map taxonomy.tre; do wget https://github.com/nf-core/test-datasets/raw/eager/databases/malt/\"$i\" -P databases/malt/; done\n      - name: METAGENOMIC Run the basic pipeline but with unmapped reads going into MALT\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_bam_filtering  --bam_unmapped_type 'fastq' --run_metagenomic_screening --metagenomic_tool 'malt' --database \"/home/runner/work/eager/eager/databases/malt/\" --malt_sam_output\n      - name: METAGENOMIC Run the basic pipeline but low-complexity filtered reads going into MALT\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_bam_filtering  --bam_unmapped_type 'fastq' --run_metagenomic_screening --metagenomic_tool 'malt' --database \"/home/runner/work/eager/eager/databases/malt/\" --metagenomic_complexity_filter\n      - name: MALTEXTRACT Download resource files\n        run: |\n          mkdir -p databases/maltextract\n          for i in ncbi.tre ncbi.map; do wget https://github.com/rhuebler/HOPS/raw/0.33/Resources/\"$i\" -P databases/maltextract/; done\n      - name: MALTEXTRACT Basic with MALT plus MaltExtract\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker 
--run_bam_filtering  --bam_unmapped_type 'fastq' --run_metagenomic_screening --metagenomic_tool 'malt' --database \"/home/runner/work/eager/eager/databases/malt\" --run_maltextract --maltextract_ncbifiles \"/home/runner/work/eager/eager/databases/maltextract/\" --maltextract_taxon_list 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/maltextract/MaltExtract_list.txt'\n      - name: METAGENOMIC Run the basic pipeline but with unmapped reads going into Kraken\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_kraken,docker --run_bam_filtering  --bam_unmapped_type 'fastq'\n      - name: SNPCAPTURE Run the basic pipeline with the bam input profile, generating statistics with a SNP capture bed\n        run: |\n          wget https://github.com/nf-core/test-datasets/raw/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz && gunzip 1240K.pos.list_hs37d5.0based.bed.gz\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --snpcapture_bed 1240K.pos.list_hs37d5.0based.bed\n      - name: SEXDETERMINATION Run the basic pipeline with the bam input profile, but don't convert BAM, skip everything but sex determination\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --skip_qualimap --run_sexdeterrmine\n      - name: NUCLEAR CONTAMINATION Run basic pipeline with bam input profile, but don't convert BAM, skip everything but nuclear contamination estimation\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --skip_qualimap --run_nuclear_contamination\n      - name: MTNUCRATIO Run basic pipeline with bam input profile, but don't convert BAM, skip everything but mtnucratio\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile 
test_tsv_humanbam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --skip_qualimap --skip_preseq --skip_damage_calculation --run_mtnucratio\n      - name: RESCALING Run basic pipeline but with mapDamage rescaling of BAM files. Note this will be slow\n        run: |\n          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_mapdamage_rescaling --run_genotyping --genotyping_tool hc --genotyping_source 'rescaled'\n"
  },
  {
    "path": ".github/workflows/linting.yml",
    "content": "name: nf-core linting\n# This workflow is triggered on pushes and PRs to the repository.\n# It runs the `nf-core lint` and markdown lint tests to ensure that the code meets the nf-core guidelines\non:\n  push:\n  pull_request:\n  release:\n    types: [published]\n\njobs:\n  Markdown:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v2\n      - uses: actions/setup-node@v2\n\n      - name: Install markdownlint\n        run: npm install -g markdownlint-cli\n      - name: Run Markdownlint\n        run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml\n\n      # If the above check failed, post a comment on the PR explaining the failure\n      - name: Post PR comment\n        if: failure()\n        uses: mshick/add-pr-comment@v1\n        with:\n          message: |\n            ## Markdown linting is failing\n\n            To keep the code consistent with lots of contributors, we run automated code consistency checks.\n            To fix this CI test, please run:\n\n            * Install `markdownlint-cli`\n                * On Mac: `brew install markdownlint-cli`\n                * Everything else: [Install `npm`](https://www.npmjs.com/get-npm) then [install `markdownlint-cli`](https://www.npmjs.com/package/markdownlint-cli) (`npm install -g markdownlint-cli`)\n            * Fix the markdown errors\n                * Automatically: `markdownlint . --config .github/markdownlint.yml --fix`\n                * Manually resolve anything left from `markdownlint . --config .github/markdownlint.yml`\n\n            Once you push these changes the test should pass, and you can hide this comment :+1:\n\n            We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. 
Ask about it on Slack for help!\n\n            Thanks again for your contribution!\n          repo-token: ${{ secrets.GITHUB_TOKEN }}\n          allow-repeats: false\n\n  YAML:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v1\n      - uses: actions/setup-node@v2\n\n      - name: Install yaml-lint\n        run: npm install -g yaml-lint\n      - name: Run yaml-lint\n        run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name \"*.yml\" -o -name \"*.yaml\") -c .github/yamllint.yml\n\n      # If the above check failed, post a comment on the PR explaining the failure\n      - name: Post PR comment\n        if: failure()\n        uses: mshick/add-pr-comment@v1\n        with:\n          message: |\n            ## YAML linting is failing\n\n            To keep the code consistent with lots of contributors, we run automated code consistency checks.\n            To fix this CI test, please run:\n\n            * Install `yaml-lint`\n                * [Install `npm`](https://www.npmjs.com/get-npm) then [install `yaml-lint`](https://www.npmjs.com/package/yaml-lint) (`npm install -g yaml-lint`)\n            * Fix the YAML errors\n                * Run the test locally: `yamllint $(find . -type f -name \"*.yml\" -o -name \"*.yaml\")`\n                * Fix any reported errors in your YAML files\n\n            Once you push these changes the test should pass, and you can hide this comment :+1:\n\n            We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. 
Ask about it on Slack for help!\n\n            Thanks again for your contribution!\n          repo-token: ${{ secrets.GITHUB_TOKEN }}\n          allow-repeats: false\n\n  nf-core:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Check out pipeline code\n        uses: actions/checkout@v2\n\n      - name: Install Nextflow\n        env:\n          CAPSULE_LOG: none\n        run: |\n          wget -qO- get.nextflow.io | bash\n          sudo mv nextflow /usr/local/bin/\n\n      - uses: actions/setup-python@v1\n        with:\n          python-version: \"3.6\"\n          architecture: \"x64\"\n\n      - name: Install dependencies\n        run: |\n          python -m pip install --upgrade pip\n          pip install nf-core==1.14\n\n      - name: Run nf-core lint\n        env:\n          GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}\n          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n          GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}\n        run: nf-core -l lint_log.txt lint ${GITHUB_WORKSPACE} --markdown lint_results.md\n\n      - name: Save PR number\n        if: ${{ always() }}\n        run: echo ${{ github.event.pull_request.number }} > PR_number.txt\n\n      - name: Upload linting log file artifact\n        if: ${{ always() }}\n        uses: actions/upload-artifact@v2\n        with:\n          name: linting-logs\n          path: |\n            lint_log.txt\n            lint_results.md\n            PR_number.txt\n"
  },
  {
    "path": ".github/workflows/linting_comment.yml",
    "content": "\nname: nf-core linting comment\n# This workflow is triggered after the linting action is complete\n# It posts an automated comment to the PR, even if the PR is coming from a fork\n\non:\n  workflow_run:\n    workflows: [\"nf-core linting\"]\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Download lint results\n        uses: dawidd6/action-download-artifact@v2\n        with:\n          workflow: linting.yml\n\n      - name: Get PR number\n        id: pr_number\n        run: echo \"::set-output name=pr_number::$(cat linting-logs/PR_number.txt)\"\n\n      - name: Post PR comment\n        uses: marocchino/sticky-pull-request-comment@v2\n        with:\n          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n          number: ${{ steps.pr_number.outputs.pr_number }}\n          path: linting-logs/lint_results.md\n\n"
  },
  {
    "path": ".github/workflows/push_dockerhub_dev.yml",
    "content": "name: nf-core Docker push (dev)\n# This builds the docker image and pushes it to DockerHub\n# Runs on nf-core repo releases and push event to 'dev' branch (PR merges)\non:\n  push:\n    branches:\n      - dev\n\njobs:\n  push_dockerhub:\n    name: Push new Docker image to Docker Hub (dev)\n    runs-on: ubuntu-latest\n    # Only run for the nf-core repo, for releases and merged PRs\n    if: ${{ github.repository == 'nf-core/eager' }}\n    env:\n      DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}\n      DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }}\n    steps:\n      - name: Check out pipeline code\n        uses: actions/checkout@v2\n\n      - name: Build new docker image\n        run: docker build --no-cache . -t nfcore/eager:dev\n\n      - name: Push Docker image to DockerHub (dev)\n        run: |\n          echo \"$DOCKERHUB_PASS\" | docker login -u \"$DOCKERHUB_USERNAME\" --password-stdin\n          docker push nfcore/eager:dev\n"
  },
  {
    "path": ".github/workflows/push_dockerhub_release.yml",
    "content": "name: nf-core Docker push (release)\n# This builds the docker image and pushes it to DockerHub\n# Runs on nf-core repo releases and push event to 'dev' branch (PR merges)\non:\n  release:\n    types: [published]\n\njobs:\n  push_dockerhub:\n    name: Push new Docker image to Docker Hub (release)\n    runs-on: ubuntu-latest\n    # Only run for the nf-core repo, for releases and merged PRs\n    if: ${{ github.repository == 'nf-core/eager' }}\n    env:\n      DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}\n      DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }}\n    steps:\n      - name: Check out pipeline code\n        uses: actions/checkout@v2\n\n      - name: Build new docker image\n        run: docker build --no-cache . -t nfcore/eager:latest\n\n      - name: Push Docker image to DockerHub (release)\n        run: |\n          echo \"$DOCKERHUB_PASS\" | docker login -u \"$DOCKERHUB_USERNAME\" --password-stdin\n          docker push nfcore/eager:latest\n          docker tag nfcore/eager:latest nfcore/eager:${{ github.event.release.tag_name }}\n          docker push nfcore/eager:${{ github.event.release.tag_name }}\n"
  },
  {
    "path": ".github/yamllint.yml",
    "content": "rules:\n  document-start: disable\n  comments: disable\n  truthy: disable\n  line-length: disable\n  empty-lines: disable\n  \n"
  },
  {
    "path": ".gitignore",
    "content": ".nextflow*\nwork/\ndata/\nresults/\n.DS_Store\ntests/\ntesting/\ntesting*\n*.pyc\nmain_playground.nf\n.vscode\n*.code-workspace\nnf-params.json"
  },
  {
    "path": ".gitpod.yml",
    "content": "image: nfcore/gitpod:latest\n\nvscode:\n  extensions: # based on nf-core.nf-core-extensionpack\n    - codezombiech.gitignore # Language support for .gitignore files\n    # - cssho.vscode-svgviewer                 # SVG viewer\n    - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code\n    - eamodio.gitlens # Quickly glimpse into whom, why, and when a line or code block was changed\n    - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files\n    - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar\n    - mechatroner.rainbow-csv # Highlight columns in csv files in different colors\n    # - nextflow.nextflow                      # Nextflow syntax highlighting\n    - oderwat.indent-rainbow # Highlight indentation level\n    - streetsidesoftware.code-spell-checker # Spelling checker for source code\n"
  },
  {
    "path": ".nf-core-lint.yml",
    "content": "files_unchanged:\n  - assets/multiqc_config.yaml\n  - .github/CONTRIBUTING.md\n  - .github/ISSUE_TEMPLATE/bug_report.md\n  - docs/README.md\n  - .github/workflows/linting.yml\n"
  },
  {
    "path": "CHANGELOG.md",
    "content": "# nf-core/eager: Changelog\n\nThe format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)\nand this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).\n\n## [2.5.3] - 2025-03-18\n\n### `Added`\n\n### `Fixed`\n\n- [#1119](https://github.com/nf-core/eager/issues/1119) - Fix typo in variable of IndelRealigner step of UnifiedGenotyper when generating a targetIntervals file (♥ to @Dog13Golf for reporting, fix by @jfy133).\n\n### `Dependencies`\n\n### `Deprecated`\n\n## [2.5.2] - 2024-06-28\n\n### `Added`\n\n- [#1079](https://github.com/nf-core/eager/issues/1079) - Added the `lanemerging` output directory in the output documentation (♥ to @TessaZei for reporting, fix by @TCLamnidis).\n\n### `Fixed`\n\n- [#1037](https://github.com/nf-core/eager/issues/1073) - Fixed post-adapterremoval trimmed FastQC results not being displayed in MultiQC (♥ to @kieren-j-mitchell for reporting, fix by @jfy133 and @TCLamnidis)\n\n### `Dependencies`\n\n### `Deprecated`\n\n## [2.5.1] - 2024-02-21\n\n### `Added`\n\n- [#1037](https://github.com/nf-core/eager/issues/1037) Added an option to deactivate the `-sorted` option of bedtools coverage, in case the feature file is not sorted the same way as the fasta file, albeit with the caveat this will be very slow. (♥ Thanks to @IdoBar for reporting, and contributing.)\n\n### `Fixed`\n\n- [#1048](https://github.com/nf-core/eager/issues/1048) `--vcf2genome_outfile` parameter now gets prefixed by the sample_name and suffixed with `.fasta` (i.e. `<sample_name>_<vcf2genome_outfile>.fasta`). This ensures we avoid overwriting the output fasta of one sample with that of another when the option is provided. (♥ Thanks to @MeriamOs for reporting.)\n- [#1047](https://github.com/nf-core/eager/issues/1047) Changed the row some statistics were reported in the General Stats table. 
The File name collision fixed in 2.5.0 (see #1017) caused these statistics to be reported in the wrong row due to an added suffix.\n- [#1051](https://github.com/nf-core/eager/issues/1051) An error is now thrown if input BAM files end in `.unmapped.bam`, as this breaks the bam filtering process and empties the bam files in the process. (♥ Thanks to @PCQuilis for reporting.)\n\n### `Dependencies`\n\n### `Deprecated`\n\n## [2.5.0] - Bopfingen - 2023-11-03\n\n### `Added`\n\n- [#1020](https://github.com/nf-core/eager/issues/1020) Added mapDamage2 as an alternative for damage calculation.\n\n### `Fixed`\n\n- [#1017](https://github.com/nf-core/eager/issues/1017) Fixed file name collision in niche cases with multiple libraries of multiple UDG treatments.\n- [#1024](https://github.com/nf-core/eager/issues/1024) `multiqc_general_stats.txt` is now generated even if the table is a beeswarm plot in the report.\n- [#655](https://github.com/nf-core/eager/issues/655) Updated RG tags for all mappers. RG-id now includes Sample as well as Library ID. Added `LB:` tag with the library ID.\n- [#1031](https://github.com/nf-core/eager/issues/1031) Always index fasta regardless of mapper. 
This ensures that DamageProfiler and genotyping processes get submitted when using bowtie2 and not providing a fasta index.\n\n### `Dependencies`\n\n- `multiqc`: 1.14 -> 1.16\n\n### `Deprecated`\n\n## [2.4.7] - 2023-05-16\n\n### `Added`\n\n### `Fixed`\n\n- [#983](https://github.com/nf-core/eager/issues/983) Bump `pygments` version due to incompatibility with MultiQC dependencies (♥ to @MinLuke for reporting)\n\n### `Dependencies`\n\n- `pygments`: 2.9 -> 2.14\n- `multiqc`: 1.13 -> 1.14\n\n### `Deprecated`\n\n## [2.4.6] - 2022-11-14\n\n### `Added`\n\n- [#933](https://github.com/nf-core/eager/issues/933) Added support for customising --seq-length in mapDamage rescaling (♥ to @ashildv for requesting)\n\n### `Fixed`\n\n- Changed endors.py license from GPL to MIT (♥ to @aidaanva for fixing)\n- Removed erroneous R2 in single-end example in input TSV of usage docs (♥ to @aidaanva for fixing)\n- [#928](https://github.com/nf-core/eager/issues/928) Fixed read group incompatibility by re-adding picard AddOrReplaceReadGroups for MultiVCFAnalyzer (♥ to @aidaanva, @meganemichel for reporting)\n- Fixed edge case of DamageProfiler occasionally requiring FASTA index (♥ to @asmaa-a-abdelwahab for reporting)\n- [#834](https://github.com/nf-core/eager/issues/834) Increased significance values in general stats table for Qualimap mean/median coverages (♥ to @neija2611 for reporting)\n- Fixed parameter documentation for `--snpcapture_bed` regarding on-target SNP stats to state these stats currently not displayed in MultiQC only in the Qualimap results (♥ to @meganemichel and @TCLamnidis for reporting)\n- [#934](https://github.com/nf-core/eager/issues/934) Fixed broken parameter setting in mapDamage2 rescale length (♥ to @ashildv for reporting)\n\n### `Dependencies`\n\n- Updated MultiQC to official 1.13 version (rather than alpha)\n- Added pinned MALT dependency to ensure working version in future versions of eager\n\n### `Deprecated`\n\n## [2.4.5] - 2022-08-02\n\n### `Added`\n\n### 
`Fixed`\n\n- [#882](https://github.com/nf-core/eager/pull/882) Define DSL1 execution explicitly, as new versions of Nextflow made DSL2 default (♥ to & fix from @Lehmann-Fabian)\n- [#879](https://github.com/nf-core/eager/issues/879) Add missing threads parameter for pre-clipping FastQC for single end data that caused insufficient memory in some cases (♥ to @marcel-keller for reporting)\n- [#880](https://github.com/nf-core/eager/issues/880) Fix failure of endorSpy to be cached or reexecuted on resume (♥ to @KathrinNaegele, @TCLamnidis, & @mahesh-panchal for reporting and debugging)\n- [#885](https://github.com/nf-core/eager/issues/885) Specify task memory for all tools in get_software_versions to account for incompatibility of java with some SGE clusters causing hanging of the process (♥ to @maxibor for reporting)\n- [#887](https://github.com/nf-core/eager/issues/887) Clarify what is considered 'ultra-short' reads in the help text of clip_readlength, for when you may wish to turn off length filtering during AdapterRemoval (♥ to @TCLamnidis for reporting)\n- [#889](https://github.com/nf-core/eager/issues/889) Remove/update parameters from benchmarking test profiles (♥ to @TCLamnidis for reporting)\n- [#895](https://github.com/nf-core/eager/issues/895) Output documentation typo fix and added location of output docs in pipeline summary (♥ to @RodrigoBarquera for reporting)\n- [#897](https://github.com/nf-core/eager/issues/897) Fix pipeline crash if no Kraken2 results generated (♥ to @alexandregilardet for reporting)\n- [#899](https://github.com/nf-core/eager/issues/899) Fix pipeline crash for circulargenerator if reference file does not end in .fasta (♥ to @scarlhoff for reporting)\n- Fixed some missing default values in the nextflow parameter schema JSON\n- [#789](https://github.com/nf-core/eager/issues/789) Substantial speed and memory optimisation of the `extract_map_reads.py` script (♥ to @ivelsko for reporting, @maxibor for optimisation)\n- Fix staging of input bams for 
genotyping_pileupcaller process. Downstream changes from changes introduced when fixing endorspy caching.\n- Made slight correction on metro map diagram regarding input data to SexDeterrmine (only BAM trimming output files)\n\n### `Dependencies`\n\n- Updated MultiQC to latest stable alpha version on bioconda, correcting the previously nonsensical AdapterRemoval plots (♥ to @NiemannJ for fixing in MultiQC)\n\n### `Deprecated`\n\n## [2.4.4] - 2022-04-08\n\n### `Added`\n\n### `Fixed`\n\n- Fixed some auxiliary files (adapter list, snpcapture/pileupcaller/sexdeterrmine BED files, and pileupCaller SNP file, PMD reference mask) in some cases only being used against one sample (❤ to @meganemichel for reporting, fix by @jfy133)\n\n### `Dependencies`\n\n### `Deprecated`\n\n## [2.4.3] - 2022-03-24\n\n### `Added`\n\n### `Fixed`\n\n- [#828](https://github.com/nf-core/eager/issues/828) Improved error message if required metagenomic screening parameters not set correctly\n- [#836](https://github.com/nf-core/eager/issues/836) Remove deprecated parameters from test profiles\n- [#838](https://github.com/nf-core/eager/issues/838) Fix --snpcapture_bed files not being picked up by Nextflow (❤ to @meganemichel for reporting)\n- [#843](https://github.com/nf-core/eager/issues/843) Re-add direct piping of AdapterRemovalFixPrefix to pigz\n- [#844](https://github.com/nf-core/eager/issues/844) Fixed reference masking prior to pmdtools\n- [#845](https://github.com/nf-core/eager/issues/845) Updates parameter documentation to specify `-s` preseq parameter also applies to lc_extrap\n- [#851](https://github.com/nf-core/eager/issues/851) Fixes a file-name clash during additional_library_merge, post-BAM trimming of different UDG treated libraries of a sample (❤ to @alexandregilardet for reporting)\n- Renamed a range of MultiQC general stats table headers to improve clarity, documentation has been updated accordingly\n- [#857](https://github.com/nf-core/eager/issues/857) Corrected samtools fastq flag to 
_retain_ read-pair information when converting off-target BAM files to fastq in paired-end mapping (❤ to @alexhbnr for reporting)\n- [#866](https://github.com/nf-core/eager/issues/866) Fixed a typo in the indexing step of BWA mem when not-collapsing (❤ to @alexhbnr for reporting)\n- Corrected tutorials to reflect updated BAM trimming flags (❤ to @marcel-keller for reporting and correcting)\n\n### `Dependencies`\n\n- [#829](https://github.com/nf-core/eager/issues/829) Bumped sequencetools: 1.4.0.5 -> 1.5.2\n- Bumped MultiQC: 1.11 -> 1.12 (for run-time optimisation and tool citation information)\n\n### `Deprecated`\n\n## [2.4.2] - 2022-01-24\n\n### `Added`\n\n### `Fixed`\n\n- [#824](https://github.com/nf-core/eager/issues/824) Fixes large memory footprint of bedtools coverage calculation.\n- [#822](https://github.com/nf-core/eager/issues/822) Fixed post-adapterremoval trimmed files not being lane-merged and included in downstream analyses\n- Fixed a couple of software version reporting commands\n\n### `Dependencies`\n\n### `Deprecated`\n\n## [2.4.1] - 2021-11-30\n\n### `Added`\n\n- [#805](https://github.com/nf-core/eager/issues/805) Changes to bam_trim options to allow flexible trimming by library strandedness (in addition to UDG treatment). (@TCLamnidis)\n- [#808](https://github.com/nf-core/eager/issues/808) Retain read group information across bam merges. Sample set to sample name (rather than library name) in bwa output 'RG' readgroup tag. (@TCLamnidis)\n- Map and base quality filters prior to genotyping with pileupcaller can now be specified. 
(@TCLamnidis)\n- [#774](https://github.com/nf-core/eager/issues/774) Added support for multi-threaded Bowtie2 build reference genome indexing (@jfy133)\n- [#804](https://github.com/nf-core/eager/issues/804) Improved output documentation description to add how 'cluster factor' is calculated (thanks to @meganemichel)\n\n### `Fixed`\n\n- [#803](https://github.com/nf-core/eager/issues/803) Fixed mistake in metro-map diagram (`samtools index` is now correctly `samtools faidx`) (@jfy133)\n\n### `Dependencies`\n\n### `Deprecated`\n\n## [2.4.0] - Wangen - 2021-09-14\n\n### `Added`\n\n- [#317](https://github.com/nf-core/eager/issues/317) Added bcftools stats for general genotyping statistics of VCF files\n- [#651](https://github.com/nf-core/eager/issues/651) - Adds removal of adapters specified in an AdapterRemoval adapter list file\n- [#642](https://github.com/nf-core/eager/issues/642) and [#431](https://github.com/nf-core/eager/issues/431) adds post-adapter removal barcode/fastq trimming\n- [#769](https://github.com/nf-core/eager/issues/769) - Adds lc_extrap mode to preseq (suggested by @roberta-davidson)\n\n### `Fixed`\n\n- Fixed some missing or incorrectly reported software versions\n- [#771](https://github.com/nf-core/eager/issues/771) Remove legacy code\n- Improved output documentation for MultiQC general stats table (thanks to @KathrinNaegele and @esalmela)\n- Improved output documentation for BowTie2 (thanks to @isinaltinkaya)\n- [#612](https://github.com/nf-core/eager/issues/612) Updated BAM trimming defaults to 0 to ensure no unwanted trimming when mixing half-UDG with no-UDG (thanks to @scarlhoff)\n- [#722](https://github.com/nf-core/eager/issues/722) Updated BWA mapping parameters to latest recommendations - primarily alnn back to 0.01 and alno to 2 as per Oliva et al. 
2021 (10.1093/bib/bbab076)\n- Updated workflow diagrams to reflect latest functionality\n- [#787](https://github.com/nf-core/eager/issues/787) Adds memory specification flags for the GATK UnifiedGenotyper and HaplotypeCaller steps (thanks to @nylander)\n- Fixed issue where MultiVCFAnalyzer would not pick up newly generated VCF files, when specifying additional VCF files.\n- [#790](https://github.com/nf-core/eager/issues/790) Fixed kraken2 report file-name collision when sample names have `.` in them\n- [#792](https://github.com/nf-core/eager/issues/792) Fixed java error messages for AdapterRemovalFixPrefix being hidden in output\n- [#794](https://github.com/nf-core/eager/issues/794) Aligned default test profile with nf-core standards (`test_tsv` is now `test`)\n\n### `Dependencies`\n\n- Bumped python: 3.7.3 -> 3.9.4\n- Bumped markdown: 3.2.2 -> 3.3.4\n- Bumped pymdown-extensions: 7.1 -> 8.2\n- Bumped pygments: 2.6.1 -> 2.9.0\n- Bumped adapterremoval: 2.3.1 -> 2.3.2\n- Bumped picard: 2.22.9 -> 2.26.0\n- Bumped samtools 1.9 -> 1.12\n- Bumped angsd: 0.933 -> 0.935\n- Bumped gatk4: 4.1.7.0 -> 4.2.0.0\n- Bumped multiqc: 1.10.1 -> 1.11\n- Bumped bedtools 2.29.2 -> 2.30.0\n- Bumped libiconv: 1.15 -> 1.16\n- Bumped preseq: 2.0.3 -> 3.1.2\n- Bumped bamutil: 1.0.14 -> 1.0.15\n- Bumped pysam: 0.15.4 -> 0.16.0\n- Bumped kraken2: 2.1.1 -> 2.1.2\n- Bumped pandas: 1.0.4 -> 1.2.4\n- Bumped freebayes: 1.3.2 -> 1.3.5\n- Bumped biopython: 1.76 -> 1.79\n- Bumped xopen: 0.9.0 -> 1.1.0\n- Bumped bowtie2: 2.4.2 -> 2.4.4\n- Bumped mapdamage2: 2.2.0 -> 2.2.1\n- Bumped bbmap: 38.87 -> 38.92\n- Added bcftools: 1.12\n\n### `Deprecated`\n\n## [2.3.5] - 2021-06-03\n\n### `Added`\n\n- [#722](https://github.com/nf-core/eager/issues/722) - Adds bwa `-o` flag for more flexibility in bwa parameters\n- [#736](https://github.com/nf-core/eager/issues/736) - Add printing of multiqc run report location on successful completion\n- New logo that is more visible when a user is using darkmode on GitHub or 
nf-core website!\n\n### `Fixed`\n\n- [#723](https://github.com/nf-core/eager/issues/723) - Fixes empty fields in TSV resulting in uninformative error\n- Updated template to nf-core/tools 1.14\n- [#688](https://github.com/nf-core/eager/issues/688) - Clarified the pipeline is not just for humans and microbes, but also plants and animals, and also for modern DNA\n- [#751](https://github.com/nf-core/eager/pull/751) - Added missing label to mtnucratio\n- General code cleanup and standardisation of parameters with no default setting\n- [#750](https://github.com/nf-core/eager/issues/750) - Fixed piped commands requesting the same number of CPUs at each command step\n- [#757](https://github.com/nf-core/eager/issues/757) - Removed confusing 'Data Type' variable from MultiQC workflow summary (not consistent with TSV input)\n- [#759](https://github.com/nf-core/eager/pull/759) - Fixed malformed software scraping regex that resulted in N/A in MultiQC report\n- [#761](https://github.com/nf-core/eager/pull/761) - Fixed issues related to instability of samtools filtering related CI tests\n\n### `Dependencies`\n\n### `Deprecated`\n\n## [2.3.4] - 2021-05-05\n\n### `Added`\n\n- [#729](https://github.com/nf-core/eager/issues/729) - Added Bowtie2 flag `--maxins` for PE mapping modern DNA mapping contexts\n\n### `Fixed`\n\n- Corrected explanation of the \"--min_adap_overlap\" parameter for AdapterRemoval in the docs\n- [#725](https://github.com/nf-core/eager/pull/725) - `bwa_index` doc update\n- Re-adds gzip piping to AdapterRemovalFixPrefix to speed up process after reports of being very slow\n- Updated DamageProfiler citation from bioRxiv to publication\n\n### `Dependencies`\n\n- Removed pinning of `tbb` (upstream bug in bioconda fixed)\n- Bumped `pigz` to 2.6 to fix rare stall bug when compressing data after AdapterRemoval\n- Bumped Bowtie2 to 2.4.2 to fix issues with `tbb` version\n\n### `Deprecated`\n\n## [2.3.3] - 2021-04-08\n\n### `Added`\n\n- 
[#349](https://github.com/nf-core/eager/issues/349) - Added option enabling platypus formatted output of pmdtools misincorporation frequencies.\n\n### `Fixed`\n\n- [#719](https://github.com/nf-core/eager/pull/719) - Fix filename for bam output of `mapdamage_rescaling`\n- [#707](https://github.com/nf-core/eager/pull/707) - Fix typo in UnifiedGenotyper IndelRealigner command\n- Fixed some Java tools not following process memory specifications\n- Updated template to nf-core/tools 1.13.2\n- [#711](https://github.com/nf-core/eager/pull/711) - Fix conditional execution preventing MultiVCFAnalyzer from running\n- [#714](https://github.com/nf-core/eager/issues/714) - Fixes bug in nuc contamination by upgrading to latest MultiQC v1.10.1 bugfix release\n\n### `Dependencies`\n\n### `Deprecated`\n\n## [2.3.2] - 2021-03-16\n\n### `Added`\n\n- [#687](https://github.com/nf-core/eager/pull/687) - Adds Kraken2 unique kmer counting report\n- [#676](https://github.com/nf-core/eager/issues/676) - Refactor help message / summary message formatting to automatic versions using nf-core library\n- [#682](https://github.com/nf-core/eager/issues/682) - Add AdapterRemoval `--qualitymax` flag to allow FASTQ Phred score range max more than 41\n\n### `Fixed`\n\n- [#666](https://github.com/nf-core/eager/issues/666) - Fixed input file staging for `print_nuclear_contamination`\n- [#631](https://github.com/nf-core/eager/issues/631) - Update minimum Nextflow version to 20.07.1, due to unfortunate bug in Nextflow 20.04.1 causing eager to crash if patch pulled\n- Made MultiQC crash behaviour stricter when dealing with large datasets, as reported by @ashildv\n- [#652](https://github.com/nf-core/eager/issues/652) - Added note to documentation that when using `--skip_collapse` this will use _paired-end_ alignment mode with mappers when using PE data\n- [#626](https://github.com/nf-core/eager/issues/626) - Add additional checks to ensure pipeline will give useful error if cells of a TSV column are empty\n- 
[#673](https://github.com/nf-core/eager/pull/673) - Fix Kraken database loading when loading from directory instead of compressed file\n- [#688](https://github.com/nf-core/eager/issues/668) - Allow pipeline to complete, even if Qualimap crashes due to an empty or corrupt BAM file for one sample/library\n- [#683](https://github.com/nf-core/eager/pull/683) - Sets `--igenomes_ignore` to true by default, as rarely used by users currently and makes resolving configs less complex\n- Added exit code `140` to re-tryable exit code list to account for certain scheduler wall-time limit fails\n- [#672](https://github.com/nf-core/eager/issues/672) - Removed java parameter from picard tools which could cause memory issues\n- [#679](https://github.com/nf-core/eager/issues/679) - Refactor within-process bash conditions to groovy/nextflow, due to incompatibility with some server environments\n- [#690](https://github.com/nf-core/eager/pull/690) - Fixed ANGSD output mode for beagle by setting `-doMajorMinor 1` as default in that case\n- [#693](https://github.com/nf-core/eager/issues/693) - Fixed broken TSV input validation for the Colour Chemistry column\n- [#695](https://github.com/nf-core/eager/issues/695) - Fixed incorrect `-profile` order in tutorials (originally written reversed due to [nextflow bug](https://github.com/nextflow-io/nextflow/issues/1792))\n- [#653](https://github.com/nf-core/eager/issues/653) - Fixed file collision errors with sexdeterrmine for two same-named libraries with different strandedness\n\n### `Dependencies`\n\n- Bumped MultiQC to 1.10 for improved functionality\n- Bumped HOPS to 0.35 for MultiQC 1.10 compatibility\n\n### `Deprecated`\n\n## [2.3.1] - 2021-01-14\n\n### `Added`\n\n### `Fixed`\n\n- [#654](https://github.com/nf-core/eager/issues/654) - Fixed some values in JSON schema (used in launch GUI) not passing 
validation checks during run\n- [#655](https://github.com/nf-core/eager/issues/655) - Updated read groups for all mappers to allow proper GATK validation\n- Fixed issue with Docker container not being pullable by Nextflow due to version-number inconsistencies\n\n### `Dependencies`\n\n### `Deprecated`\n\n## [2.3.0] - Aalen - 2021-01-11\n\n### `Added`\n\n- [#640](https://github.com/nf-core/eager/issues/640) - Added a pre-metagenomic screening filtering of low-sequence complexity reads with `bbduk`\n- [#583](https://github.com/nf-core/eager/issues/583) - Added `mapDamage2` rescaling of BAM files to remove damage\n- Updated usage (merging files) and workflow images reflecting new functionality.\n\n### `Fixed`\n\n- Removed leftover old DockerHub push CI commands.\n- [#627](https://github.com/nf-core/eager/issues/627) - Added de Barros Damgaard citation to README\n- [#630](https://github.com/nf-core/eager/pull/630) - Better handling of Qualimap memory requirements and error strategy.\n- Fixed some incomplete schema options to ensure users supply valid input values\n- [#638](https://github.com/nf-core/eager/issues/638#issuecomment-748877567) Fixed inverted circularfilter filtering (previously filtering would happen by default, not when requested by user as originally recorded in documentation)\n- [DeDup:](https://github.com/apeltzer/DeDup/commit/07d47868f10a6830da8c9161caa3755d9da155bf) Fixed Null Pointer Bug in DeDup by updating to 0.12.8 version\n- [#650](https://github.com/nf-core/eager/pull/650) - Increased memory given to FastQC for larger files by making it multithreaded\n\n### `Dependencies`\n\n- Update: DeDup v0.12.7 to v0.12.8\n\n### `Deprecated`\n\n## [2.2.2] - 2020-12-09\n\n### `Added`\n\n- Added large scale 'stress-test' profile for AWS (using de Barros Damgaard et al. 2018's 137 ancient human genomes).\n  - This will now be run automatically for every release. 
All processed data will be available on the nf-core website: <https://nf-co.re/eager/results>\n    - You can run this yourself using `-profile test_full`\n\n### `Fixed`\n\n- Fixed AWS full test profile.\n- [#587](https://github.com/nf-core/eager/issues/587) - Re-implemented AdapterRemovalFixPrefix for DeDup compatibility of including singletons\n- [#602](https://github.com/nf-core/eager/issues/602) - Added the newly available GATK 3.5 conda package.\n- [#610](https://github.com/nf-core/eager/issues/610) - Create bwa_index channel when specifying circularmapper as mapper\n- Updated template to nf-core/tools 1.12.1\n- General documentation improvements\n\n### `Deprecated`\n\n- Flag `--gatk_ug_jar` has now been removed as GATK 3.5 is now available within the nf-core/eager software environment.\n\n## [2.2.1] - 2020-10-20\n\n### `Fixed`\n\n- [#591](https://github.com/nf-core/eager/issues/591) - Fixed offset underlines in lane merging diagram in docs\n- [#592](https://github.com/nf-core/eager/issues/592) - Fixed issue where supplying Bowtie2 index reported missing bwamem_index error\n- [#590](https://github.com/nf-core/eager/issues/592) - Removed redundant dockstore.yml from root\n- [#596](https://github.com/nf-core/eager/issues/596) - Add workaround for issue regarding gzipped FASTAs and pre-built indices\n- [#589](https://github.com/nf-core/eager/issues/582) - Updated template to nf-core/tools 1.11\n- [#582](https://github.com/nf-core/eager/issues/582) - Clarify memory limit issue on FAQ\n\n## [2.2.0] - Ulm - 2020-10-20\n\n### `Added`\n\n- **Major** Automated cloud tests with large-scale data on [AWS](https://aws.amazon.com/)\n- **Major** Re-wrote input logic to accept a TSV 'map' file in addition to direct paths to FASTQ files\n- **Major** Added JSON Schema, enabling web GUI for configuration of pipeline available [here](https://nf-co.re/launch?pipeline=eager&release=2.2.0)\n- **Major** Lane and library merging implemented\n  - When using TSV input, one library with 
the multiple _lanes_ will be merged together, before mapping\n  - Strip FASTQ will also produce a lane merged 'raw' but 'stripped' FASTQ file\n  - When using TSV input, one sample with multiple (same treatment) libraries will be merged together\n  - Important: direct FASTQ paths will not have this functionality. TSV is required.\n- [#40](https://github.com/nf-core/eager/issues/40) - Added the pileupCaller genotyper from [sequenceTools](https://github.com/stschiff/sequenceTools)\n- Added validation check and clearer error message when `--fasta_index` is provided and filepath does not end in `.fai`.\n- Improved error messages\n- Added ability for automated emails using `mailutils` to also send MultiQC reports\n- General documentation additions, cleaning, and updated figures with CC-BY license\n- Added large 'full size' dataset test-profiles for ancient fish and human contexts\n- [#257](https://github.com/nf-core/eager/issues/257) - Added the bowtie2 aligner as option for mapping, following Poullet and Orlando 2020 doi: [10.3389/fevo.2020.00105](https://doi.org/10.3389/fevo.2020.00105)\n- [#451](https://github.com/nf-core/eager/issues/451) - Adds ANGSD genotype likelihood calculations as an alternative to typical 'genotypers'\n- [#566](https://github.com/nf-core/eager/issues/466) - Add tutorials on how to set up nf-core/eager for different contexts\n- Nuclear contamination results are now shown in the MultiQC report\n- Tutorial on how to use profiles for reproducible science (i.e. parameter sharing between different groups)\n- [#522](https://github.com/nf-core/eager/issues/522) - Added post-mapping length filter to assist in more realistic endogenous DNA calculations\n- [#512](https://github.com/nf-core/eager/issues/512) - Added flexible trimming of BAMs by library type. 
'half' and 'none' UDG libraries can now be trimmed differentially within a single eager run.\n- Added a `.dockstore.yml` config file for automatic workflow registration with [dockstore.org](https://dockstore.org/)\n- Updated template to nf-core/tools 1.10.2\n- [#544](https://github.com/nf-core/eager/pull/544) - Add script to perform bam filtering on fragment length\n- [#456](https://github.com/nf-core/eager/pull/546) - Bumps the base (default) runtime of all processes to 4 hours, and set shorter time limits for test profiles (1 hour)\n- [#552](https://github.com/nf-core/eager/issues/552) - Adds optional creation of MALT SAM files alongside RMA6 files\n- Added eigenstrat snp coverage statistics to MultiQC report. Process results are published in `genotyping/*_eigenstrat_coverage.txt`.\n\n### `Fixed`\n\n- [#368](https://github.com/nf-core/eager/issues/368) - Fixed the profile `test` to contain a parameter for `--paired_end`\n- Mini bugfix for typo in line 1260+1261\n- [#374](https://github.com/nf-core/eager/issues/374) - Fixed output documentation rendering not containing images\n- [#379](https://github.com/nf-core/eager/issues/378) - Fixed insufficient memory requirements for FASTQC edge case\n- [#390](https://github.com/nf-core/eager/issues/390) - Renamed clipped/merged output directory to be more descriptive\n- [#398](https://github.com/nf-core/eager/issues/498) - Stopped incompatible FASTA indexes being accepted\n- [#400](https://github.com/nf-core/eager/issues/400) - Set correct recommended bwa mapping parameters from [Schubert et al. 2012](https://doi.org/10.1186/1471-2164-13-178)\n- [#410](https://github.com/nf-core/eager/issues/410) - Fixed nf-core/configs not being loaded properly\n- [#473](https://github.com/nf-core/eager/issues/473) - Fixed bug in sexdet_process on AWS\n- [#444](https://github.com/nf-core/eager/issues/444) - Provide option for preserving realigned bam + index\n- Fixed deduplication output logic. 
Will now pass along only the post-rmdup bams if duplicate removal is not skipped, instead of both the post-rmdup and pre-rmdup bams\n- [#497](https://github.com/nf-core/eager/issues/497) - Simplifies number of parameters required to run bam filtering\n- [#501](https://github.com/nf-core/eager/issues/501) - Adds additional validation checks for MALT/MaltExtract database input files\n- [#508](https://github.com/nf-core/eager/issues/508) - Made Markduplicates default dedupper due to narrower context specificity of dedup\n- [#516](https://github.com/nf-core/eager/issues/516) - Made bedtools not report out of memory exit code when warning of inconsistent FASTA/Bed entry names\n- [#504](https://github.com/nf-core/eager/issues/504) - Removed uninformative sexdeterrmine-snps plot from MultiQC report.\n- Nuclear contamination is now reported with the correct library names.\n- [#531](https://github.com/nf-core/eager/pull/531) - Renamed 'FASTQ stripping' to 'host removal'\n- Merged all tutorials and FAQs into `usage.md` for display on [nf-co.re](https://www.nf-co.re)\n- Corrected header of nuclear contamination table (`nuclear_contamination.txt`).\n- Fixed a bug with `nSNPs` definition in `print_x_contamination.py`. Number of SNPs now correctly reported\n- `print_x_contamination.py` now correctly converts all NA values to \"N/A\"\n- Increased amount of memory MultiQC by default uses, to account for very large nf-core/eager runs (e.g. 
>1000 samples)\n\n### `Dependencies`\n\n- Added sequenceTools (1.4.0.6) that adds the ability to do genotyping with the 'pileupCaller'\n- Latest version of DeDup (0.12.6) which now reports mapped reads after deduplication\n- [#560](https://github.com/nf-core/eager/issues/560) Latest version of Dedup (0.12.7), which now correctly reports deduplication statistics based on calculations of mapped reads only (prior denominator was total reads of BAM file)\n- Latest version of ANGSD (0.933) which doesn't seg fault when running contamination on BAMs with insufficient reads\n- Latest version of MultiQC (1.9) with support for lots of extra tools in the pipeline (MALT, SexDetERRmine, DamageProfiler, MultiVCFAnalyzer)\n- Latest versions of Pygments (7.1), Pymdown-Extensions (2.6.1) and Markdown (3.2.2) for documentation output\n- Latest version of Picard (2.22.9)\n- Latest version of GATK4 (4.1.7.0)\n- Latest version of sequenceTools (1.4.0.6)\n- Latest version of fastP (0.20.1)\n- Latest version of Kraken2 (2.0.9beta)\n- Latest version of FreeBayes (1.3.2)\n- Latest version of xopen (0.9.0)\n- Added Bowtie 2 (2.4.1)\n- Latest version of Sex.DetERRmine (1.1.2)\n- Latest version of endorS.py (0.4)\n\n## [2.1.0] - Ravensburg - 2020-03-05\n\n### `Added`\n\n- Added Support for automated tests using [GitHub Actions](https://github.com/features/actions), replacing travis\n- [#40](https://github.com/nf-core/eager/issues/40), [#231](https://github.com/nf-core/eager/issues/231) - Added genotyping capability through GATK UnifiedGenotyper (v3.5), GATK HaplotypeCaller (v4.1) and FreeBayes\n- Added MultiVCFAnalyzer module\n- [#240](https://github.com/nf-core/eager/issues/240) - Added human sex determination module\n- [#226](https://github.com/nf-core/eager/issues/226) - Added `--preserve5p` function for AdapterRemoval\n- [#212](https://github.com/nf-core/eager/issues/212) - Added ability to use only merged reads downstream from AdapterRemoval\n- 
[#265](https://github.com/nf-core/eager/issues/265) - Adjusted full markdown linting in Travis CI\n- [#247](https://github.com/nf-core/eager/issues/247) - Added nuclear contamination with angsd\n- [#258](https://github.com/nf-core/eager/issues/258) - Added ability to report bedtools stats to features (e.g. depth/breadth of annotated genes)\n- [#249](https://github.com/nf-core/eager/issues/249) - Added metagenomic classification of unmapped reads with MALT and aDNA authentication with MaltExtract\n- [#302](https://github.com/nf-core/eager/issues/302) - Added mitochondrial to nuclear ratio calculation\n- [#302](https://github.com/nf-core/eager/issues/302) - Added VCF2Genome for consensus sequence generation\n- Fancy new logo from [ZandraFagernas](https://github.com/ZandraFagernas)\n- [#286](https://github.com/nf-core/eager/issues/286) - Adds pipeline-specific profiles (loaded from nf-core configs)\n- [#310](https://github.com/nf-core/eager/issues/310) - Generalises base.config\n- [#326](https://github.com/nf-core/eager/pull/326) - Add Biopython and [xopen](https://github.com/marcelm/xopen/) dependencies\n- [#336](https://github.com/nf-core/eager/issues/336) - Change default Y-axis maximum value of DamageProfiler to 30% to match popular (but slower) mapDamage, and allow user to set their own value.\n- [#352](https://github.com/nf-core/eager/pull/352) - Add social preview image\n- [#355](https://github.com/nf-core/eager/pull/355) - Add Kraken2 metagenomics classifier\n- [#90](https://github.com/nf-core/eager/issues/90) - Added endogenous DNA calculator (original repository: [https://github.com/aidaanva/endorS.py/](https://github.com/aidaanva/endorS.py/))\n\n### `Fixed`\n\n- [#227](https://github.com/nf-core/eager/issues/227) - Large re-write of input/output process logic to allow maximum flexibility. 
Originally to address [#227](https://github.com/nf-core/eager/issues/227), but further expanded\n- Fixed Travis-Ci.org to Travis-Ci.com migration issues\n- [#266](https://github.com/nf-core/eager/issues/266) - Added sanity checks for input filetypes (i.e. only BAM files can be supplied if `--bam`)\n- [#237](https://github.com/nf-core/eager/issues/237) - Fixed and Updated script scrape_software_versions\n- [#322](https://github.com/nf-core/eager/pull/322) - Move extract map reads fastq compression to pigz\n- [#327](https://github.com/nf-core/eager/pull/327) - Speed up strip_input_fastq process and make it more robust\n- [#342](https://github.com/nf-core/eager/pull/342) - Updated to match nf-core tools 1.8 linting guidelines\n- [#339](https://github.com/nf-core/eager/issues/339) - Converted unnecessary zcat + gzip to just cat for a performance boost\n- [#344](https://github.com/nf-core/eager/issues/344) - Fixed pipeline still trying to run when using old nextflow version\n\n### `Dependencies`\n\n- adapterremoval=2.2.2 upgraded to 2.3.1\n- adapterremovalfixprefix=0.0.4 upgraded to 0.0.5\n- damageprofiler=0.4.3 upgraded to 0.4.9\n- angsd=0.923 upgraded to 0.931\n- gatk4=4.1.2.0 upgraded to 4.1.4.1\n- mtnucratio=0.5 upgraded to 0.6\n- conda-forge::markdown=3.1.1 upgraded to 3.2.1\n- bioconda::fastqc=0.11.8 upgraded to 0.11.9\n- bioconda::picard=2.21.4 upgraded to 2.22.0\n- bioconda::bedtools=2.29.0 upgraded to 2.29.2\n- pysam=0.15.3 upgraded to 0.15.4\n- conda-forge::pandas=1.0.0 upgraded to 1.0.1\n- bioconda::freebayes=1.3.1 upgraded to 1.3.2\n- conda-forge::biopython=1.75 upgraded to 1.76\n\n## [2.0.7] - 2019-06-10\n\n### `Added`\n\n- [#189](https://github.com/nf-core/eager/pull/189) - Outputting unmapped reads in a fastq files with the --strip_input_fastq flag\n- [#186](https://github.com/nf-core/eager/pull/186) - Make FastQC skipping [possible](https://github.com/nf-core/eager/issues/182)\n- Merged in [nf-core/tools](https://github.com/nf-core/tools) release V1.6 
template changes\n- A lot more automated tests using Travis CI\n- Don't ignore DamageProfiler errors any more\n- [#220](https://github.com/nf-core/eager/pull/220) - Added post-mapping filtering statistics module and corresponding MultiQC statistics [#217](https://github.com/nf-core/eager/issues/217)\n\n### `Fixed`\n\n- [#152](https://github.com/nf-core/eager/pull/152) - DamageProfiler errors [won't crash entire pipeline any more](https://github.com/nf-core/eager/issues/171)\n- [#176](https://github.com/nf-core/eager/pull/176) - Increase runtime for DamageProfiler on [large reference genomes](https://github.com/nf-core/eager/issues/173)\n- [#172](https://github.com/nf-core/eager/pull/152) - DamageProfiler errors [won't crash entire pipeline any more](https://github.com/nf-core/eager/issues/171)\n- [#174](https://github.com/nf-core/eager/pull/190) - Publish DeDup files [properly](https://github.com/nf-core/eager/issues/183)\n- [#196](https://github.com/nf-core/eager/pull/196) - Fix reference [issues](https://github.com/nf-core/eager/issues/150)\n- [#196](https://github.com/nf-core/eager/pull/196) - Fix issues with PE data being mapped incompletely\n- [#200](https://github.com/nf-core/eager/pull/200) - Fix minor issue with some [typos](https://github.com/nf-core/eager/pull/196)\n- [#210](https://github.com/nf-core/eager/pull/210) - Fix PMDTools [encoding issue](https://github.com/pontussk/PMDtools/issues/6) from `samtools calmd` generated files by running through `samtools view` first\n- [#221](https://github.com/nf-core/eager/pull/221) - Fix BWA Index [not being reused by multiple samples](https://github.com/nf-core/eager/issues/219)\n\n### `Dependencies`\n\n- Added DeDup v0.12.5 (json support)\n- Added mtnucratio v0.5 (json support)\n- Updated Picard 2.18.27 -> 2.20.2\n- Updated GATK 4.1.0.0 -> 4.1.2.0\n- Updated damageprofiler 0.4.4 -> 0.4.5\n- Updated r-rmarkdown 1.11 -> 1.12\n- Updated fastp 0.19.7 -> 0.20.0\n- Updated qualimap 2.2.2b -> 2.2.2c\n\n## [2.0.6] - 
2019-03-05\n\n### `Added`\n\n- [#152](https://github.com/nf-core/eager/pull/152) - Clarified `--complexity_filter` flag to be specifically for poly G trimming.\n- [#155](https://github.com/nf-core/eager/pull/155) - Added [Dedup log to output folders](https://github.com/nf-core/eager/issues/154)\n- [#159](https://github.com/nf-core/eager/pull/159) - Added Possibility to skip AdapterRemoval, skip merging, skip trimming fixing [#64](https://github.com/nf-core/eager/issues/64),[#137](https://github.com/nf-core/eager/issues/137) - thanks to @maxibor, @jfy133\n\n### `Fixed`\n\n- [#151](https://github.com/nf-core/eager/pull/151) - Fixed [post-deduplication step errors](https://github.com/nf-core/eager/issues/128)\n- [#147](https://github.com/nf-core/eager/pull/147) - Fix Samtools Index for [large references](https://github.com/nf-core/eager/issues/146)\n- [#145](https://github.com/nf-core/eager/pull/145) - Added Picard Memory Handling [fix](https://github.com/nf-core/eager/issues/144)\n\n### `Dependencies`\n\n- Picard Tools 2.18.23 -> 2.18.27\n- GATK 4.0.12.0 -> 4.1.0.0\n- FastP 0.19.6 -> 0.19.7\n\n## [2.0.5] - 2019-01-28\n\n### `Added`\n\n- [#127](https://github.com/nf-core/eager/pull/127) - Added a second test case for testing the pipeline properly\n- [#129](https://github.com/nf-core/eager/pull/129) - Support BAM files as [input format](https://github.com/nf-core/eager/issues/41)\n- [#131](https://github.com/nf-core/eager/pull/131) - Support different [reference genome file extensions](https://github.com/nf-core/eager/issues/130)\n\n### `Fixed`\n\n- [#128](https://github.com/nf-core/eager/issues/128) - Fixed reference genome handling errors\n\n### `Dependencies`\n\n- Picard Tools 2.18.21 -> 2.18.23\n- R-Markdown 1.10 -> 1.11\n- FastP 0.19.5 -> 0.19.6\n\n## [2.0.4] - 2019-01-09\n\n### `Added`\n\n- [#111](https://github.com/nf-core/eager/pull/110) - Allow [Zipped FastA reference input](https://github.com/nf-core/eager/issues/91)\n- 
[#113](https://github.com/nf-core/eager/pull/113) - All files are now staged via channels, which is considered best practice by Nextflow\n- [#114](https://github.com/nf-core/eager/pull/113) - Add proper runtime defaults for multiple processes\n- [#118](https://github.com/nf-core/eager/pull/118) - Add [centralized configs handling](https://github.com/nf-core/configs)\n- [#115](https://github.com/nf-core/eager/pull/115) - Add DamageProfiler MultiQC support\n- [#122](https://github.com/nf-core/eager/pull/122) - Add pulling from Dockerhub again\n\n### `Fixed`\n\n- [#110](https://github.com/nf-core/eager/pull/110) - Fix for [MultiQC Missing Second FastQC report](https://github.com/nf-core/eager/issues/107)\n- [#112](https://github.com/nf-core/eager/pull/112) - Remove [redundant UDG options](https://github.com/nf-core/eager/issues/89)\n\n## [2.0.3] - 2018-12-12\n\n### `Added`\n\n- [#80](https://github.com/nf-core/eager/pull/80) - BWA Index file handling\n- [#77](https://github.com/nf-core/eager/pull/77) - Lots of documentation updates by [@jfy133](https://github.com/jfy133)\n- [#81](https://github.com/nf-core/eager/pull/81) - Renaming of certain BAM options\n- [#92](https://github.com/nf-core/eager/issues/92) - Complete restructure of BAM options\n\n### `Fixed`\n\n- [#84](https://github.com/nf-core/eager/pull/85) - Fix for [Samtools index issues](https://github.com/nf-core/eager/issues/84)\n- [#96](https://github.com/nf-core/eager/issues/96) - Fix for [MarkDuplicates issues](https://github.com/nf-core/eager/issues/96) found by [@nilesh-tawari](https://github.com/nilesh-tawari)\n\n### Other\n\n- Added Slack button to repository readme\n\n## [2.0.2] - 2018-11-03\n\n### `Changed`\n\n- [#70](https://github.com/nf-core/eager/issues/70) - Uninitialized `readPaths` warning removed\n\n### `Added`\n\n- [#73](https://github.com/nf-core/eager/pull/73) - Travis CI Testing of Conda Environment added\n\n### `Fixed`\n\n- [#72](https://github.com/nf-core/eager/issues/72) - iconv Issue 
with R in conda environment\n\n## [2.0.1] - 2018-11-02\n\n### `Fixed`\n\n- [#69](https://github.com/nf-core/eager/issues/67) - FastQC issues with conda environments\n\n## [2.0.0] \"Kaufbeuren\" - 2018-10-17\n\nInitial release of nf-core/eager:\n\n### `Added`\n\n- FastQC read quality control\n- (Optional) Read complexity filtering with FastP\n- Read merging and clipping using AdapterRemoval v2\n- Mapping using BWA / BWA Mem or CircularMapper\n- Library Complexity Estimation with Preseq\n- Conversion and Filtering of BAM files using Samtools\n- Damage assessment via DamageProfiler, additional filtering using PMDTools\n- Duplication removal via DeDup\n- BAM Clipping with BamUtil for UDGhalf protocols\n- QualiMap BAM quality control analysis\n\nFurthermore, this already creates an interactive report using MultiQC, which will be upgraded in V2.1 \"Ulm\" to contain more aDNA specific metrics.\n"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "content": "# Code of Conduct at nf-core (v1.0)\n\n## Our Pledge\n\nIn the interest of fostering an open, collaborative, and welcoming environment, we as contributors and maintainers of nf-core, pledge to making participation in our projects and community a harassment-free experience for everyone, regardless of:\n\n- Age\n- Body size\n- Familial status\n- Gender identity and expression\n- Geographical location\n- Level of experience\n- Nationality and national origins\n- Native language\n- Physical and neurological ability\n- Race or ethnicity\n- Religion\n- Sexual identity and orientation\n- Socioeconomic status\n\nPlease note that the list above is alphabetised and is therefore not ranked in any order of preference or importance.\n\n## Preamble\n\n> Note: This Code of Conduct (CoC) has been drafted by the nf-core Safety Officer and been edited after input from members of the nf-core team and others. \"We\", in this document, refers to the Safety Officer and members of the nf-core core team, both of whom are deemed to be members of the nf-core community and are therefore required to abide by this Code of Conduct. This document will amended periodically to keep it up-to-date, and in case of any dispute, the most current version will apply.\n\nAn up-to-date list of members of the nf-core core team can be found [here](https://nf-co.re/about). Our current safety officer is Renuka Kudva.\n\nnf-core is a young and growing community that welcomes contributions from anyone with a shared vision for [Open Science Policies](https://www.fosteropenscience.eu/taxonomy/term/8). Open science policies encompass inclusive behaviours and we strive to build and maintain a safe and inclusive environment for all individuals.\n\nWe have therefore adopted this code of conduct (CoC), which we require all members of our community and attendees in nf-core events to adhere to in all our workspaces at all times. 
Workspaces include but are not limited to Slack, meetings on Zoom, Jitsi, YouTube live etc.\n\nOur CoC will be strictly enforced and the nf-core team reserve the right to exclude participants who do not comply with our guidelines from our workspaces and future nf-core activities.\n\nWe ask all members of our community to help maintain a supportive and productive workspace and to avoid behaviours that can make individuals feel unsafe or unwelcome. Please help us maintain and uphold this CoC.\n\nQuestions, concerns or ideas on what we can include? Contact safety [at] nf-co [dot] re\n\n## Our Responsibilities\n\nThe safety officer is responsible for clarifying the standards of acceptable behavior and is expected to take appropriate and fair corrective action in response to any instances of unacceptable behaviour.\n\nThe safety officer in consultation with the nf-core core team have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.\n\nMembers of the core team or the safety officer who violate the CoC will be required to recuse themselves pending investigation. They will not have access to any reports of the violations and will be subject to the same actions as others in violation of the CoC.\n\n## When and where does this Code of Conduct apply?\n\nParticipation in the nf-core community is contingent on following these guidelines in all our workspaces and events. 
This includes but is not limited to the following listed alphabetically and therefore in no order of preference:\n\n- Communicating with an official project email address.\n- Communicating with community members within the nf-core Slack channel.\n- Participating in hackathons organised by nf-core (both online and in-person events).\n- Participating in collaborative work on GitHub, Google Suite, community calls, mentorship meetings, email correspondence.\n- Participating in workshops, training, and seminar series organised by nf-core (both online and in-person events). This applies to events hosted on web-based platforms such as Zoom, Jitsi, YouTube live etc.\n- Representing nf-core on social media. This includes both official and personal accounts.\n\n## nf-core cares 😊\n\nnf-core's CoC and expectations of respectful behaviours for all participants (including organisers and the nf-core team) include but are not limited to the following (listed in alphabetical order):\n\n- Ask for consent before sharing another community member’s personal information (including photographs) on social media.\n- Be respectful of differing viewpoints and experiences. We are all here to learn from one another and a difference in opinion can present a good learning opportunity.\n- Celebrate your accomplishments at events! (Get creative with your use of emojis 🎉 🥳 💯 🙌 !)\n- Demonstrate empathy towards other community members. (We don’t all have the same amount of time to dedicate to nf-core. If tasks are pending, don’t hesitate to gently remind members of your team. If you are leading a task, ask for help if you feel overwhelmed.)\n- Engage with and enquire after others. (This is especially important given the geographically remote nature of the nf-core community, so let’s do this the best we can)\n- Focus on what is best for the team and the community. 
(When in doubt, ask)\n- Graciously accept constructive criticism, yet be unafraid to question, deliberate, and learn.\n- Introduce yourself to members of the community. (We’ve all been outsiders and we know that talking to strangers can be hard for some, but remember we’re interested in getting to know you and your visions for open science!)\n- Show appreciation and **provide clear feedback**. (This is especially important because we don’t see each other in person and it can be harder to interpret subtleties. Also remember that not everyone understands a certain language to the same extent as you do, so **be clear in your communications to be kind.**)\n- Take breaks when you feel like you need them.\n- Use welcoming and inclusive language. (Participants are encouraged to display their chosen pronouns on Zoom or in communication on Slack.)\n\n## nf-core frowns on 😕\n\nThe following behaviours from any participants within the nf-core community (including the organisers) will be considered unacceptable under this code of conduct. Engaging or advocating for any of the following could result in expulsion from nf-core workspaces.\n\n- Deliberate intimidation, stalking or following and sustained disruption of communication among participants of the community. This includes hijacking shared screens through actions such as using the annotate tool in conferencing software such as Zoom.\n- “Doxing” i.e. posting (or threatening to post) another person’s personal identifying information online.\n- Spamming or trolling of individuals on social media.\n- Use of sexual or discriminatory imagery, comments, or jokes and unwelcome sexual attention.\n- Verbal and text comments that reinforce social structures of domination related to gender, gender identity and expression, sexual orientation, ability, physical appearance, body size, race, age, religion or work experience.\n\n### Online Trolling\n\nThe majority of nf-core interactions and events are held online. 
Unfortunately, holding events online comes with the added issue of online trolling. This is unacceptable: reports of such behaviour will be taken very seriously, and perpetrators will be excluded from activities immediately.\n\nAll community members are required to ask members of the group they are working within for explicit consent prior to taking screenshots of individuals during video calls.\n\n## Procedures for Reporting CoC violations\n\nIf someone makes you feel uncomfortable through their behaviours or actions, report it as soon as possible.\n\nYou can reach out to members of the [nf-core core team](https://nf-co.re/about) and they will forward your concerns to the safety officer(s).\n\nIssues directly concerning members of the core team will be dealt with by other members of the core team and the safety manager, and possible conflicts of interest will be taken into account. nf-core is also in discussions about having an ombudsperson, and details will be shared in due course.\n\nAll reports will be handled with the utmost discretion and confidentiality.\n\n## Attribution and Acknowledgements\n\n- The [Contributor Covenant, version 1.4](http://contributor-covenant.org/version/1/4)\n- The [OpenCon 2017 Code of Conduct](http://www.opencon2017.org/code_of_conduct) (CC BY 4.0 OpenCon organisers, SPARC and Right to Research Coalition)\n- The [eLife innovation sprint 2020 Code of Conduct](https://sprint.elifesciences.org/code-of-conduct/)\n- The [Mozilla Community Participation Guidelines v3.1](https://www.mozilla.org/en-US/about/governance/policies/participation/) (version 3.1, CC BY-SA 3.0 Mozilla)\n\n## Changelog\n\n### v1.0 - March 12th, 2021\n\n- Complete rewrite from original [Contributor Covenant](http://contributor-covenant.org/) CoC.\n"
  },
  {
    "path": "Dockerfile",
    "content": "FROM nfcore/base:1.14\nLABEL authors=\"The nf-core/eager community\" \\\n      description=\"Docker image containing all software requirements for the nf-core/eager pipeline\"\n\n# Install the conda environment\nCOPY environment.yml /\nRUN conda env create --quiet -f /environment.yml && conda clean -a\n\n# Add conda installation dir to PATH (instead of doing 'conda activate')\nENV PATH /opt/conda/envs/nf-core-eager-2.5.3/bin:$PATH\n\n# Dump the details of the installed packages to a file for posterity\nRUN conda env export --name nf-core-eager-2.5.3 > nf-core-eager-2.5.3.yml"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) The nf-core/eager community\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# ![nf-core/eager](docs/images/nf-core_eager_logo_outline_drop.png)\n\n**A fully reproducible and state-of-the-art ancient DNA analysis pipeline**.\n\n[![GitHub Actions CI Status](https://github.com/nf-core/eager/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/eager/actions)\n[![GitHub Actions Linting Status](https://github.com/nf-core/eager/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/eager/actions)\n[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.07.1-brightgreen.svg)](https://www.nextflow.io/)\n[![nf-core](https://img.shields.io/badge/nf--core-pipeline-brightgreen.svg)](https://nf-co.re/)\n[![DOI](https://zenodo.org/badge/135918251.svg)](https://zenodo.org/badge/latestdoi/135918251)\n[![Published in PeerJ](https://img.shields.io/badge/peerj-published-%2300B2FF)](https://peerj.com/articles/10947/)\n\n[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](https://bioconda.github.io/)\n[![Docker](https://img.shields.io/docker/automated/nfcore/eager.svg)](https://hub.docker.com/r/nfcore/eager)\n![Singularity Container available](https://img.shields.io/badge/singularity-available-7E4C74.svg)\n\n[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23eager-4A154B?logo=slack)](https://nfcore.slack.com/channels/eager)\n\n>[!IMPORTANT]  \n> nf-core/eager versions 2.* are only compatible with Nextflow versions up to 22.10.6!\n\n## Introduction\n\n<!-- nf-core: Write a 1-2 sentence summary of what data the pipeline is for and what it does -->\n**nf-core/eager** is a scalable and reproducible bioinformatics best-practise processing pipeline for genomic NGS sequencing data, with a focus on ancient DNA (aDNA) data. 
It is ideal for the (palaeo)genomic analysis of humans, animals, plants, microbes and even microbiomes.\n\nThe pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. The pipeline pre-processes raw data from FASTQ inputs, or preprocessed BAM inputs. It can align reads and perform extensive general NGS and aDNA-specific quality control on the results. It comes with Docker and Singularity containers, as well as a Conda environment, making installation trivial and results highly reproducible.\n\n<p align=\"center\">\n    <img src=\"docs/images/usage/eager2_workflow.png\" alt=\"nf-core/eager schematic workflow\" width=\"70%\">\n</p>\n\n## Quick Start\n\n1. Install [`nextflow`](https://nf-co.re/usage/installation) (`>=20.07.1` && `<=22.10.6`)\n\n2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_\n\n3. Download the pipeline and test it on a minimal dataset with a single command:\n\n    ```bash\n    nextflow run nf-core/eager -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>\n    ```\n\n    > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. 
This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.\n\n4. Start running your own analysis!\n\n    ```bash\n    nextflow run nf-core/eager -profile <docker/singularity/podman/conda/institute> --input '*_R{1,2}.fastq.gz' --fasta '<your_reference>.fasta'\n    ```\n\n5. Once your run has completed successfully, clean up the intermediate files.\n\n    ```bash\n    nextflow clean -f -k\n    ```\n\nSee [usage docs](https://nf-co.re/eager/usage) for all of the available options when running the pipeline.\n\n**N.B.** You can see an overview of the run in the MultiQC report located at `./results/MultiQC/multiqc_report.html`\n\nModifications to the default pipeline are easily made using various options as described in the documentation.\n\n## Pipeline Summary\n\n### Default Steps\n\nBy default the pipeline currently performs the following:\n\n* Create reference genome indices for mapping (`bwa`, `samtools`, and `picard`)\n* Sequencing quality control (`FastQC`)\n* Sequencing adapter removal, paired-end data merging (`AdapterRemoval`)\n* Read mapping to reference using (`bwa aln`, `bwa mem`, `CircularMapper`, or `bowtie2`)\n* Post-mapping processing, statistics and conversion to bam (`samtools`)\n* Ancient DNA C-to-T damage pattern visualisation (`DamageProfiler` or `mapDamage`)\n* PCR duplicate removal (`DeDup` or `MarkDuplicates`)\n* Post-mapping statistics and BAM quality control (`Qualimap`)\n* Library Complexity Estimation (`preseq`)\n* Overall pipeline statistics summaries (`MultiQC`)\n\n### Additional Steps\n\nAdditional functionality contained by the pipeline currently includes:\n\n#### Input\n\n* Automatic merging of complex sequencing setups (e.g. 
multiple lanes, sequencing configurations, library types)\n\n#### Preprocessing\n\n* Illumina two-coloured sequencer poly-G tail removal (`fastp`)\n* Post-AdapterRemoval trimming of FASTQ files prior to mapping (`fastp`)\n* Automatic conversion of unmapped reads to FASTQ (`samtools`)\n* Host DNA (mapped reads) stripping from input FASTQ files (for sensitive samples)\n\n#### aDNA Damage manipulation\n\n* Damage removal/clipping for UDG+/UDG-half treatment protocols (`BamUtil`)\n* Damaged reads extraction and assessment (`PMDTools`)\n* Nuclear DNA contamination estimation of human samples (`angsd`)\n\n#### Genotyping\n\n* Creation of VCF genotyping files (`GATK UnifiedGenotyper`, `GATK HaplotypeCaller` and `FreeBayes`)\n* Creation of EIGENSTRAT genotyping files (`pileupCaller`)\n* Creation of Genotype Likelihood files (`angsd`)\n* Consensus sequence FASTA creation (`VCF2Genome`)\n* SNP Table generation (`MultiVCFAnalyzer`)\n\n#### Biological Information\n\n* Mitochondrial to Nuclear read ratio calculation (`MtNucRatioCalculator`)\n* Statistical sex determination of human individuals (`Sex.DetERRmine`)\n\n#### Metagenomic Screening\n\n* Low sequence complexity filtering (`BBduk`)\n* Taxonomic binner with alignment (`MALT`)\n* Taxonomic binner without alignment (`Kraken2`)\n* aDNA characteristic screening of taxonomically binned data from MALT (`MaltExtract`)\n\n#### Functionality Overview\n\nA graphical overview of suggested routes through the pipeline depending on context can be seen below.\n\n<p align=\"center\">\n    <img src=\"docs/images/usage/eager2_metromap_complex.png\" alt=\"nf-core/eager metro map\" width=\"70%\">\n</p>\n\n## Documentation\n\nThe nf-core/eager pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/eager/usage) and [output](https://nf-co.re/eager/output).\n\n1. [Nextflow installation](https://nf-co.re/usage/installation)\n2. 
Pipeline configuration\n    * [Pipeline installation](https://nf-co.re/usage/local_installation)\n    * [Adding your own system config](https://nf-co.re/usage/adding_own_config)\n    * [Reference genomes](https://nf-co.re/usage/reference_genomes)\n3. [Running the pipeline](https://nf-co.re/eager/usage)\n   * This includes tutorials, FAQs, and troubleshooting instructions\n4. [Output and how to interpret the results](https://nf-co.re/eager/output)\n\n## Credits\n\nThis pipeline was mostly written by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)) and [James A. Fellows Yates](https://github.com/jfy133), with contributions from [Stephen Clayton](https://github.com/sc13-bioinf), [Thiseas C. Lamnidis](https://github.com/TCLamnidis), [Maxime Borry](https://github.com/maxibor), [Zandra Fagernäs](https://github.com/ZandraFagernas), [Aida Andrades Valtueña](https://github.com/aidaanva) and [Maxime Garcia](https://github.com/MaxUlysse) and the nf-core community.\n\nWe thank the following people for their extensive assistance in the development\nof this pipeline:\n\n## Authors (alphabetical)\n\n* [Aida Andrades Valtueña](https://github.com/aidaanva)\n* [Alexander Peltzer](https://github.com/apeltzer)\n* [James A. Fellows Yates](https://github.com/jfy133)\n* [Judith Neukamm](https://github.com/JudithNeukamm)\n* [Maxime Borry](https://github.com/maxibor)\n* [Maxime Garcia](https://github.com/MaxUlysse)\n* [Stephen Clayton](https://github.com/sc13-bioinf)\n* [Thiseas C. 
Lamnidis](https://github.com/TCLamnidis)\n* [Zandra Fagernäs](https://github.com/ZandraFagernas)\n\n## Additional Contributors (alphabetical)\n\nThose who have provided conceptual guidance, suggestions, bug reports etc.\n\n* [Alex Hübner](https://github.com/alexhbnr)\n* [Alexandre Gilardet](https://github.com/alexandregilardet)\n* Arielle Munters\n* [Åshild Vågene](https://github.com/ashildv)\n* [Asmaa Ali](https://github.com/asmaa-a-abdelwahab)\n* [Charles Plessy](https://github.com/charles-plessy)\n* [Elina Salmela](https://github.com/esalmela)\n* [Fabian Lehmann](https://github.com/Lehmann-Fabian)\n* [He Yu](https://github.com/paulayu)\n* [Hester van Schalkwyk](https://github.com/hesterjvs)\n* [Ido Bar](https://github.com/IdoBar)\n* [Irina Velsko](https://github.com/ivelsko)\n* [Işın Altınkaya](https://github.com/isinaltinkaya)\n* [Johan Nylander](https://github.com/nylander)\n* [Jonas Niemann](https://github.com/NiemannJ)\n* [Katerine Eaton](https://github.com/ktmeaton)\n* [Kathrin Nägele](https://github.com/KathrinNaegele)\n* [Kevin Lord](https://github.com/lordkev)\n* [Laura Lacher](https://github.com/neija2611)\n* [Luc Venturini](https://github.com/lucventurini)\n* [Mahesh Binzer-Panchal](https://github.com/mahesh-panchal)\n* [Marcel Keller](https://github.com/marcel-keller)\n* [Megan Michel](https://github.com/meganemichel)\n* [Pierre Lindenbaum](https://github.com/lindenb)\n* [Pontus Skoglund](https://github.com/pontussk)\n* [Raphael Eisenhofer](https://github.com/EisenRa)\n* [Roberta Davidson](https://github.com/roberta-davidson)\n* [Rodrigo Barquera](https://github.com/RodrigoBarquera)\n* [Selina Carlhoff](https://github.com/scarlhoff)\n* [Torsten Günter](https://bitbucket.org/tguenther)\n\nIf you've contributed and you're missing in here, please let us know and we will add you in of course!\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).\n\nFor 
further information or help, don't hesitate to get in touch on the [Slack `#eager` channel](https://nfcore.slack.com/channels/eager) (you can join with [this invite](https://nf-co.re/join/slack)).\n\n## Citations\n\nIf you use `nf-core/eager` for your analysis, please cite the `eager` publication as follows:\n\n> Fellows Yates JA, Lamnidis TC, Borry M, Valtueña Andrades A, Fagernäs Z, Clayton S, Garcia MU, Neukamm J, Peltzer A. 2021. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:e10947. DOI: [10.7717/peerj.10947](https://doi.org/10.7717/peerj.10947).\n\nYou can cite the eager Zenodo record for a specific version using the following [doi: 10.5281/zenodo.3698082](https://zenodo.org/badge/latestdoi/135918251)\n\nYou can cite the `nf-core` publication as follows:\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n\nIn addition, references of tools and data used in this pipeline are as follows:\n\n* **EAGER v1, CircularMapper, DeDup** Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. [https://doi.org/10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z). Download: [https://github.com/apeltzer/EAGER-GUI](https://github.com/apeltzer/EAGER-GUI) and [https://github.com/apeltzer/EAGER-CLI](https://github.com/apeltzer/EAGER-CLI)\n* **FastQC** Download: [https://www.bioinformatics.babraham.ac.uk/projects/fastqc/](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)\n* **AdapterRemoval v2** Schubert, M., Lindgreen, S., & Orlando, L. (2016). 
AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9, 88. [https://doi.org/10.1186/s13104-016-1900-2](https://doi.org/10.1186/s13104-016-1900-2). Download: [https://github.com/MikkelSchubert/adapterremoval](https://github.com/MikkelSchubert/adapterremoval)\n* **bwa** Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics , 25(14), 1754–1760. [https://doi.org/10.1093/bioinformatics/btp324](https://doi.org/10.1093/bioinformatics/btp324). Download: [http://bio-bwa.sourceforge.net/bwa.shtml](http://bio-bwa.sourceforge.net/bwa.shtml)\n* **SAMtools** Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics , 25(16), 2078–2079. [https://doi.org/10.1093/bioinformatics/btp352](https://doi.org/10.1093/bioinformatics/btp352). Download: [http://www.htslib.org/](http://www.htslib.org/)\n* **DamageProfiler** Neukamm, J., Peltzer, A., & Nieselt, K. (2020). DamageProfiler: Fast damage pattern calculation for ancient DNA. In Bioinformatics (btab190). [https://doi.org/10.1093/bioinformatics/btab190](https://doi.org/10.1093/bioinformatics/btab190). Download: [https://github.com/Integrative-Transcriptomics/DamageProfiler](https://github.com/Integrative-Transcriptomics/DamageProfiler)\n* **QualiMap** Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics , 32(2), 292–294. [https://doi.org/10.1093/bioinformatics/btv566](https://doi.org/10.1093/bioinformatics/btv566). Download: [http://qualimap.bioinfo.cipf.es/](http://qualimap.bioinfo.cipf.es/)\n* **preseq** Daley, T., & Smith, A. D. (2013). Predicting the molecular complexity of sequencing libraries. Nature Methods, 10(4), 325–327. 
[https://doi.org/10.1038/nmeth.2375](https://doi.org/10.1038/nmeth.2375). Download: [http://smithlabresearch.org/software/preseq/](http://smithlabresearch.org/software/preseq/)\n* **PMDTools** Skoglund, P., Northoff, B. H., Shunkov, M. V., Derevianko, A. P., Pääbo, S., Krause, J., & Jakobsson, M. (2014). Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proceedings of the National Academy of Sciences of the United States of America, 111(6), 2229–2234. [https://doi.org/10.1073/pnas.1318934111](https://doi.org/10.1073/pnas.1318934111). Download: [https://github.com/pontussk/PMDtools](https://github.com/pontussk/PMDtools)\n* **MultiQC** Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. [https://doi.org/10.1093/bioinformatics/btw354](https://doi.org/10.1093/bioinformatics/btw354). Download: [https://multiqc.info/](https://multiqc.info/)\n* **BamUtils** Jun, G., Wing, M. K., Abecasis, G. R., & Kang, H. M. (2015). An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Research, 25(6), 918–925. [https://doi.org/10.1101/gr.176552.114](https://doi.org/10.1101/gr.176552.114). Download: [https://genome.sph.umich.edu/wiki/BamUtil](https://genome.sph.umich.edu/wiki/BamUtil)\n* **FastP** Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics , 34(17), i884–i890. [https://doi.org/10.1093/bioinformatics/bty560](https://doi.org/10.1093/bioinformatics/bty560). Download: [https://github.com/OpenGene/fastp](https://github.com/OpenGene/fastp)\n* **GATK 3.5** DePristo, M. A., Banks, E., Poplin, R., Garimella, K. V., Maguire, J. R., Hartl, C., … Daly, M. J. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. 
Nature Genetics, 43(5), 491–498. [https://doi.org/10.1038/ng.806](https://doi.org/10.1038/ng.806). Download: [https://console.cloud.google.com/storage/browser/gatk](https://console.cloud.google.com/storage/browser/gatk)\n* **GATK 4.X** - no citation available yet. Download: [https://github.com/broadinstitute/gatk/releases](https://github.com/broadinstitute/gatk/releases)\n* **VCF2Genome** - Alexander Herbig and Alex Peltzer (unpublished). Download: [https://github.com/apeltzer/VCF2Genome](https://github.com/apeltzer/VCF2Genome)\n* **MultiVCFAnalyzer** Bos, K.I. et al., 2014. Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis. Nature, 514(7523), pp.494–497. Available at: [http://dx.doi.org/10.1038/nature13591](http://dx.doi.org/10.1038/nature13591). Download: [https://github.com/alexherbig/MultiVCFAnalyzer](https://github.com/alexherbig/MultiVCFAnalyzer)\n* **MTNucRatioCalculator** Alex Peltzer (Unpublished). Download: [https://github.com/apeltzer/MTNucRatioCalculator](https://github.com/apeltzer/MTNucRatioCalculator)\n* **Sex.DetERRmine.py** Lamnidis, T.C. et al., 2018. Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe. Nature communications, 9(1), p.5018. Available at: [http://dx.doi.org/10.1038/s41467-018-07483-5](http://dx.doi.org/10.1038/s41467-018-07483-5). Download: [https://github.com/TCLamnidis/Sex.DetERRmine.git](https://github.com/TCLamnidis/Sex.DetERRmine.git)\n* **ANGSD** Korneliussen, T.S., Albrechtsen, A. & Nielsen, R., 2014. ANGSD: Analysis of Next Generation Sequencing Data. BMC bioinformatics, 15, p.356. Available at: [http://dx.doi.org/10.1186/s12859-014-0356-4](http://dx.doi.org/10.1186/s12859-014-0356-4). Download: [https://github.com/ANGSD/angsd](https://github.com/ANGSD/angsd)\n* **bedtools** Quinlan, A.R. & Hall, I.M., 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), pp.841–842. 
Available at: [http://dx.doi.org/10.1093/bioinformatics/btq033](http://dx.doi.org/10.1093/bioinformatics/btq033). Download: [https://github.com/arq5x/bedtools2/releases](https://github.com/arq5x/bedtools2/)\n* **MALT**. Download: [https://software-ab.informatik.uni-tuebingen.de/download/malt/welcome.html](https://software-ab.informatik.uni-tuebingen.de/download/malt/welcome.html)\n  * Vågene, Å.J. et al., 2018. Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature ecology & evolution, 2(3), pp.520–528. Available at: [http://dx.doi.org/10.1038/s41559-017-0446-6](http://dx.doi.org/10.1038/s41559-017-0446-6).\n  * Herbig, A. et al., 2016. MALT: Fast alignment and analysis of metagenomic DNA sequence data applied to the Tyrolean Iceman. bioRxiv, p.050559. Available at: [http://biorxiv.org/content/early/2016/04/27/050559](http://biorxiv.org/content/early/2016/04/27/050559).\n* **MaltExtract** Huebler, R. et al., 2019. HOPS: Automated detection and authentication of pathogen DNA in archaeological remains. bioRxiv, p.534198. Available at: [https://www.biorxiv.org/content/10.1101/534198v1?rss=1](https://www.biorxiv.org/content/10.1101/534198v1?rss=1). Download: [https://github.com/rhuebler/MaltExtract](https://github.com/rhuebler/MaltExtract)\n* **Kraken2** Wood, D et al., 2019. Improved metagenomic analysis with Kraken 2. Genome Biology volume 20, Article number: 257. Available at: [https://doi.org/10.1186/s13059-019-1891-0](https://doi.org/10.1186/s13059-019-1891-0). Download: [https://ccb.jhu.edu/software/kraken2/](https://ccb.jhu.edu/software/kraken2/)\n* **endorS.py** Aida Andrades Valtueña (Unpublished). Download: [https://github.com/aidaanva/endorS.py](https://github.com/aidaanva/endorS.py)\n* **Bowtie2**  Langmead, B. and Salzberg, S. L. 2012 Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), p. 357–359. 
doi: [10.1038/nmeth.1923](https://dx.doi.org/10.1038/nmeth.1923).\n* **sequenceTools** Stephan Schiffels (Unpublished). Download: [https://github.com/stschiff/sequenceTools](https://github.com/stschiff/sequenceTools)\n* **EigenstratDatabaseTools** Thiseas C. Lamnidis (Unpublished). Download: [https://github.com/TCLamnidis/EigenStratDatabaseTools.git](https://github.com/TCLamnidis/EigenStratDatabaseTools.git)\n* **mapDamage** Jónsson, H., et al., 2013. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics, 29(13), 1682–1684. [https://doi.org/10.1093/bioinformatics/btt193](https://doi.org/10.1093/bioinformatics/btt193)\n* **BBduk** Brian Bushnell (Unpublished). Download: [https://sourceforge.net/projects/bbmap/](https://sourceforge.net/projects/bbmap/)\n\n## Data References\n\nThis repository uses test data from the following studies:\n\n* Fellows Yates, J. A. et al. (2017) ‘Central European Woolly Mammoth Population Dynamics: Insights from Late Pleistocene Mitochondrial Genomes’, Scientific reports, 7(1), p. 17714. [doi: 10.1038/s41598-017-17723-1](https://doi.org/10.1038/s41598-017-17723-1).\n* Gamba, C. et al. (2014) ‘Genome flux and stasis in a five millennium transect of European prehistory’, Nature communications, 5, p. 5257. [doi: 10.1038/ncomms6257](https://doi.org/10.1038/ncomms6257).\n* Star, B. et al. (2017) ‘Ancient DNA reveals the Arctic origin of Viking Age cod from Haithabu, Germany’, Proceedings of the National Academy of Sciences of the United States of America, 114(34), pp. 9152–9157. [doi: 10.1073/pnas.1710186114](https://doi.org/10.1073/pnas.1710186114).\n* de Barros Damgaard, P. et al. (2018). '137 ancient human genomes from across the Eurasian steppes.', Nature, 557(7705), 369–374. [doi: 10.1038/s41586-018-0094-2](https://doi.org/10.1038/s41586-018-0094-2)\n"
  },
  {
    "path": "assets/angsd_resources/README",
    "content": "**These files are originally part of angsd (release 0.931). They have been added here for convenience.**\n\nThis file describes how the 'hapmap' and mappability files used by angsd are generated\n\n##download\nwget http://hapmap.ncbi.nlm.nih.gov/downloads/frequencies/2010-08_phaseII+III/allele_freqs_chrX_CEU_r28_nr.b36_fwd.txt.gz\nwget http://hapmap.ncbi.nlm.nih.gov/downloads/frequencies/2010-08_phaseII+III/allele_freqs_chr21_CEU_r28_nr.b36_fwd.txt.gz\n\n#with the md5sum\na105316eaa2ebbdb3f8d62a9cb10a2d5  allele_freqs_chr21_CEU_r28_nr.b36_fwd.txt.gz\n5a0f920951ce2ded4afe2f10227110ac  allele_freqs_chrX_CEU_r28_nr.b36_fwd.txt.gz\n\n\n##create dummy bed file to use the liftover tools\ngunzip -c allele_freqs_chrX_CEU_r28_nr.b36_fwd.txt.gz| awk '{print $2\" \"$3-1\" \"$3\" \"$11\" \"$12\" \"$4\" \"$14}'|sed 1d >allele.txt\n\n##do the liftover\nliftOver allele.txt /opt/liftover/hg18ToHg19.over.chain.gz hit nohit\n\n##now remove invariable sites, and redundant columns\ncut -f1,3 --complement hit |grep -v -P \"\\t1.0\"|grep -v -P \"\\t0\\t\"|gzip -c  >HapMapchrX.gz\n\n\n##create dummy bed file to use the liftover tools\ngunzip -c allele_freqs_chr21_CEU_r28_nr.b36_fwd.txt.gz| awk '{print $2\" \"$3-1\" \"$3\" \"$11\" \"$12\" \"$4\" \"$14}'|sed 1d >allele.txt\n\n##do the liftover\nliftOver allele.txt /opt/liftover/hg18ToHg19.over.chain.gz hit nohit\n\n##now remove invariable sites, and redundant columns\ncut -f1,3 --complement hit |grep -v -P \"\\t1.0\"|grep -v -P \"\\t0\\t\"|gzip -c  >HapMapchr21.gz\n\n\n#######\n##download 100kmer mappability\nwget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign100mer.bigWig\n\n#md5sum\na1b1a8c99431fedf6a3b4baef028cca4  wgEncodeCrgMapabilityAlign100mer.bigWig\n\n##download convert program\nwget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigWigToBedGraph\n\n##convert\n./bigWigToBedGraph wgEncodeCrgMapabilityAlign100mer.bigWig chrX -chrom=chrX\n./bigWigToBedGraph 
wgEncodeCrgMapabilityAlign100mer.bigWig chr21 -chrom=chr21\n\n##only keep unique regions and discard the chr* column\ngrep -P \"\\t1$\" chr21 |cut -f2-3 |gzip -c >chr21.unique.gz\ngrep -P \"\\t1$\" chrX |cut -f2-3 |gzip -c >chrX.unique.gz\n"
  },
  {
    "path": "assets/angsd_resources/getALL.txt",
    "content": "F=\"ASW CEU CHB CHD GIH JPT LWK MEX MKK TSI YRI\"\nfor f in $F\ndo \n    echo $f\n    wget http://hapmap.ncbi.nlm.nih.gov/downloads/frequencies/2010-08_phaseII+III/allele_freqs_chrX_${f}_r28_nr.b36_fwd.txt.gz\ndone\n\ncat allele*.gz >allele_freqs_chrX_ALL_r28_nr.b36_fwd.txt.gz\n\ngunzip -c allele_freqs_chrX_ALL_r28_nr.b36_fwd.txt.gz| awk '{print $2\" \"$3-1\" \"$3\" \"$11\" \"$12\" \"$4\" \"$14}'|grep -v pos >allele.txt\n\n\n/opt/liftover/liftOver allele.txt /opt/liftover/hg18ToHg19.over.chain.gz hit nohit\ncut -f1,3 --complement hit |grep -v -P \"\\t1.0\"|grep -v -P \"\\t0\\t\"|gzip -c  >HapMapALL.gz\n\n"
  },
  {
    "path": "assets/email_template.html",
    "content": "<html>\n<head>\n  <meta charset=\"utf-8\">\n  <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">\n  <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n\n  <meta name=\"description\" content=\"nf-core/eager: A fully reproducible and state-of-the-art ancient DNA analysis pipeline\">\n  <title>nf-core/eager Pipeline Report</title>\n</head>\n<body>\n<div style=\"font-family: Helvetica, Arial, sans-serif; padding: 30px; max-width: 800px; margin: 0 auto;\">\n\n<img src=\"cid:nfcorepipelinelogo\">\n\n<h1>nf-core/eager v${version}</h1>\n<h2>Run Name: $runName</h2>\n\n<% if (!success){\n    out << \"\"\"\n    <div style=\"color: #a94442; background-color: #f2dede; border-color: #ebccd1; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;\">\n        <h4 style=\"margin-top:0; color: inherit;\">nf-core/eager execution completed unsuccessfully!</h4>\n        <p>The exit status of the task that caused the workflow execution to fail was: <code>$exitStatus</code>.</p>\n        <p>The full error message was:</p>\n        <pre style=\"white-space: pre-wrap; overflow: visible; margin-bottom: 0;\">${errorReport}</pre>\n    </div>\n    \"\"\"\n} else {\n    out << \"\"\"\n    <div style=\"color: #3c763d; background-color: #dff0d8; border-color: #d6e9c6; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;\">\n        nf-core/eager execution completed successfully!\n    </div>\n    \"\"\"\n}\n%>\n\n<p>The workflow was completed at <strong>$dateComplete</strong> (duration: <strong>$duration</strong>)</p>\n<p>The command used to launch the workflow was as follows:</p>\n<pre style=\"white-space: pre-wrap; overflow: visible; background-color: #ededed; padding: 15px; border-radius: 4px; margin-bottom:30px;\">$commandLine</pre>\n\n<h3>Pipeline Configuration:</h3>\n<table style=\"width:100%; max-width:100%; border-spacing: 0; border-collapse: collapse; border:0; margin-bottom: 
30px;\">\n    <tbody style=\"border-bottom: 1px solid #ddd;\">\n        <% out << summary.collect{ k,v -> \"<tr><th style='text-align:left; padding: 8px 0; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'>$k</th><td style='text-align:left; padding: 8px; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'><pre style='white-space: pre-wrap; overflow: visible;'>$v</pre></td></tr>\" }.join(\"\\n\") %>\n    </tbody>\n</table>\n\n<p>nf-core/eager</p>\n<p><a href=\"https://github.com/nf-core/eager\">https://github.com/nf-core/eager</a></p>\n\n</div>\n\n</body>\n</html>\n"
  },
  {
    "path": "assets/email_template.txt",
    "content": "----------------------------------------------------\n                                        ,--./,-.\n        ___     __   __   __   ___     /,-._.--~\\\\\n  |\\\\ | |__  __ /  ` /  \\\\ |__) |__         }  {\n  | \\\\| |       \\\\__, \\\\__/ |  \\\\ |___     \\\\`-._,-`-,\n                                        `._,._,'\n  nf-core/eager v${version}\n----------------------------------------------------\n\nRun Name: $runName\n\n<% if (success){\n    out << \"## nf-core/eager execution completed successfully! ##\"\n} else {\n    out << \"\"\"####################################################\n## nf-core/eager execution completed unsuccessfully! ##\n####################################################\nThe exit status of the task that caused the workflow execution to fail was: $exitStatus.\nThe full error message was:\n\n${errorReport}\n\"\"\"\n} %>\n\n\nThe workflow was completed at $dateComplete (duration: $duration)\n\nThe command used to launch the workflow was as follows:\n\n  $commandLine\n\n\n\nPipeline Configuration:\n-----------------------\n<% out << summary.collect{ k,v -> \" - $k: $v\" }.join(\"\\n\") %>\n\n--\nnf-core/eager\nhttps://github.com/nf-core/eager\n"
  },
  {
    "path": "assets/multiqc_config.yaml",
    "content": "custom_logo: \"nf-core_eager_logo_outline_drop.png\"\ncustom_logo_url: https://github.com/nf-core/eager/\ncustom_logo_title: \"nf-core/eager\"\n\nreport_comment: >\n  This report has been generated by the <a href=\"https://github.com/nf-core/eager\" target=\"_blank\">nf-core/eager</a>\n  analysis pipeline. For information about how to interpret these results, please see the\n  <a href=\"https://github.com/nf-core/eager\" target=\"_blank\">documentation</a>.\nrun_modules:\n  - adapterRemoval\n  - bowtie2\n  - custom_content\n  - damageprofiler\n  - dedup\n  - fastp\n  - fastqc\n  - gatk\n  - kraken\n  - malt\n  - mapdamage\n  - mtnucratio\n  - multivcfanalyzer\n  - picard\n  - preseq\n  - qualimap\n  - samtools\n  - sexdeterrmine\n  - hops\n  - bcftools\n\nextra_fn_clean_exts:\n  - \"_fastp\"\n  - \".pe.settings\"\n  - \".se.settings\"\n  - \".settings\"\n  - \".pe.combined\"\n  - \".se.truncated\"\n  - \".mapped\"\n  - \".mapped_rmdup\"\n  - \".mapped_rmdup_stats\"\n  - \"_libmerged_rg_rmdup\"\n  - \"_libmerged_rg_rmdup_stats\"\n  - \"_postfilterflagstat.stats\"\n  - \"_flagstat.stat\"\n  - \".filtered\"\n  - \".filtered_rmdup\"\n  - \".filtered_rmdup_stats\"\n  - \"_libmerged_rg_add\"\n  - \"_libmerged_rg_add_stats\"\n  - \"_rmdup\"\n  - \".unmapped\"\n  - \".fastq.gz\"\n  - \".fastq\"\n  - \".fq.gz\"\n  - \".fq\"\n  - \".bam\"\n  - \".kreport\"\n  - \".unifiedgenotyper\"\n  - \".trimmed_stats\"\n  - \"_libmerged\"\n  - \"_bt2\"\n  - type: \"regex\"\n    pattern: \"_udg(half|none|full)\"\n\ntop_modules:\n  - \"fastqc\":\n      name: \"FastQC (pre-Trimming)\"\n      path_filters:\n        - \"*_raw_fastqc.zip\"\n  - \"fastp\"\n  - \"adapterRemoval\"\n  - \"fastqc\":\n      name: \"FastQC (post-Trimming)\"\n      path_filters:\n        - \"*.truncated_fastqc.zip\"\n        - \"*.combined*_fastqc.zip\"\n        - \"*_postartrimmed_fastqc.zip\"\n  - \"bowtie2\":\n      path_filters:\n        - \"*_bt2.log\"\n  - \"malt\"\n  - \"hops\"\n  - \"kraken\"\n 
 - \"samtools\":\n      name: \"Samtools Flagstat (pre-samtools filter)\"\n      path_filters:\n        - \"*_flagstat.stats\"\n  - \"samtools\":\n      name: \"Samtools Flagstat (post-samtools filter)\"\n      path_filters:\n        - \"*_postfilterflagstat.stats\"\n  - \"dedup\"\n  - \"picard\"\n  - \"preseq\":\n      path_filters:\n        - \"*.preseq\"\n  - \"damageprofiler\"\n  - \"mapdamage\"\n  - \"mtnucratio\"\n  - \"qualimap\"\n  - \"sexdeterrmine\"\n  - \"bcftools\"\n  - \"multivcfanalyzer\":\n      path_filters:\n        - \"*MultiVCFAnalyzer.json\"\nqualimap_config:\n  general_stats_coverage:\n    - 1\n    - 2\n    - 3\n    - 4\n    - 5\n\nremove_sections:\n  - sexdeterrmine-snps\n\ntable_columns_visible:\n  FastQC (pre-Trimming):\n    percent_duplicates: False\n    percent_gc: True\n    avg_sequence_length: True\n  fastp:\n    pct_duplication: False\n    after_filtering_gc_content: True\n    pct_surviving: False\n  Adapter Removal:\n    aligned_total: False\n    percent_aligned: True\n  FastQC (post-Trimming):\n    avg_sequence_length: True\n    percent_duplicates: False\n    total_sequences: True\n    percent_gc: True\n  bowtie2:\n    overall_alignment_rate: True\n  MALT:\n    Taxonomic assignment success: False\n    Assig. Taxonomy: False\n    Mappability: True\n    Total reads: False\n    Num. 
of queries: False\n  Kraken:\n    \"% Unclassified\": True\n    \"% Top 5\": False\n  Samtools Flagstat (pre-samtools filter):\n    flagstat_total: True\n    mapped_passed: True\n  Samtools Flagstat (post-samtools filter):\n    mapped_passed: True\n  DeDup:\n    dup_rate: False\n    clusterfactor: True\n    mapped_after_dedup: True\n  Picard:\n    PERCENT_DUPLICATION: True\n  DamageProfiler:\n    5 Prime1: True\n    5 Prime2: True\n    3 Prime1: False\n    3 Prime2: False\n    mean_readlength: True\n    median: True\n  mapDamage:\n    5 Prime1: True\n    5 Prime2: True\n    3 Prime1: False\n    3 Prime2: False\n  mtnucratio:\n    mt_nuc_ratio: True\n  QualiMap:\n    mapped_reads: True\n    mean_coverage: True\n    1_x_pc: True\n    5_x_pc: True\n    percentage_aligned: False\n    median_insert_size: False\n  MultiVCFAnalyzer:\n    Heterozygous SNP alleles (percent): True\n  endorSpy:\n    endogenous_dna: True\n    endogenous_dna_post: True\n  nuclear_contamination:\n    Num_SNPs: True\n    Method1_MOM_estimate: False\n    Method1_MOM_SE: False\n    Method1_ML_estimate: True\n    Method1_ML_SE: True\n    Method2_MOM_estimate: False\n    Method2_MOM_SE: False\n    Method2_ML_estimate: False\n    Method2_ML_SE: False\n  snp_coverage:\n    Covered_Snps: True\n    Total_Snps: False\n\ntable_columns_placement:\n  FastQC (pre-Trimming):\n    total_sequences: 100\n    avg_sequence_length: 110\n    percent_gc: 120\n  fastp:\n    after_filtering_gc_content: 200\n  Adapter Removal:\n    percent_aligned: 300\n  FastQC (post-Trimming):\n    total_sequences: 400\n    avg_sequence_length: 410\n    percent_gc: 420\n  Bowtie 2 / HiSAT2:\n    overall_alignment_rate: 450\n  MALT:\n    Num. of queries: 430\n    Total reads: 440\n    Mappability: 450\n    Assig. 
Taxonomy: 460\n    Taxonomic assignment success: 470\n  Kraken:\n    \"% Unclassified\": 480\n  Samtools Flagstat (pre-samtools filter):\n    flagstat_total: 551\n    mapped_passed: 552\n  Samtools Flagstat (post-samtools filter):\n    flagstat_total: 600\n    mapped_passed: 620\n  endorSpy:\n    endogenous_dna: 610\n    endogenous_dna_post: 640\n  nuclear_contamination:\n    Num_SNPs: 1100\n    Method1_MOM_estimate: 1110\n    Method1_MOM_SE: 1120\n    Method1_ML_estimate: 1130\n    Method1_ML_SE: 1140\n    Method2_MOM_estimate: 1150\n    Method2_MOM_SE: 1160\n    Method2_ML_estimate: 1170\n    Method2_ML_SE: 1180\n  snp_coverage:\n    Covered_Snps: 1050\n    Total_Snps: 1060\n  DeDup:\n    mapped_after_dedup: 620\n    clusterfactor: 630\n  Picard:\n    PERCENT_DUPLICATION: 650\n  DamageProfiler:\n    5 Prime1: 700\n    5 Prime2: 710\n    3 Prime1: 720\n    3 Prime2: 730\n    mean_readlength: 740\n    median: 750\n  mapDamage:\n    5 Prime1: 760\n    5 Prime2: 765\n    3 Prime1: 770\n    3 Prime2: 775\n  mtnucratio:\n    mtreads: 780\n    mt_cov_avg: 785\n    mt_nuc_ratio: 790\n  QualiMap:\n    mapped_reads: 800\n    mean_coverage: 805\n    median_coverage: 810\n    1_x_pc: 820\n    2_x_pc: 830\n    3_x_pc: 840\n    4_x_pc: 850\n    5_x_pc: 860\n    avg_gc: 870\n  sexdeterrmine:\n    RateX: 1000\n    RateY: 1010\n  MultiVCFAnalyzer:\n    Heterozygous SNP alleles (percent): 1200\nread_count_multiplier: 1\nread_count_prefix: \"\"\nread_count_desc: \"\"\nancient_read_count_prefix: \"\"\nancient_read_count_desc: \"\"\nancient_read_count_multiplier: 1\ndecimalPoint_format: \".\"\nthousandsSep_format: \",\"\nreport_section_order:\n  software_versions:\n    order: -1000\n  nf-core-eager-summary:\n    order: -1001\nexport_plots: true\ntable_columns_name:\n  FastQC (pre-Trimming):\n    total_sequences: \"Nr. 
Input Reads\"\n    avg_sequence_length: \"Length Input Reads\"\n    percent_gc: \"% GC Input Reads\"\n    percent_duplicates: \"% Dups Input Reads\"\n    percent_fails: \"% Failed Input Reads\"\n  FastQC (post-Trimming):\n    total_sequences: \"Nr. Processed Reads\"\n    avg_sequence_length: \"Length Processed Reads\"\n    percent_gc: \"% GC Processed Reads\"\n    percent_duplicates: \"% Dups Processed Reads\"\n    percent_fails: \"% Failed Processed Reads\"\n  Samtools Flagstat (pre-samtools filter):\n    flagstat_total: \"Nr. Reads Into Mapping\"\n    mapped_passed: \"Nr. Mapped Reads\"\n  Samtools Flagstat (post-samtools filter):\n    flagstat_total: \"Nr. Mapped Reads Post-Filter\"\n    mapped_passed: \"Nr. Mapped Reads Passed Post-Filter\"\n  Endogenous DNA Post (%):\n    endogenous_dna_post (%): \"Endogenous DNA Post-Filter (%)\"\n  Picard:\n    PERCENT_DUPLICATION: \"% Dup. Mapped Reads\"\n  DamageProfiler:\n    mean_readlength: \"Mean Length Mapped Reads\"\n    median_readlength: \"Median Length Mapped Reads\"\n  QualiMap:\n    mapped_reads: \"Nr. Dedup. Mapped Reads\"\n    total_reads: \"Nr. Dedup. Total Reads\"\n    avg_gc: \"% GC Dedup. Mapped Reads\"\n  Bcftools Stats:\n    number_of_records: \"Nr. Overall Variants\"\n    number_of_SNPs: \"Nr. SNPs\"\n    number_of_indels: \"Nr. InDels\"\n  MALT:\n    Mappability: \"% Metagenomic Mappability\"\n  SexDetErrmine:\n    RateErrX: \"SexDet Err X Chr\"\n    RateErrY: \"SexDet Err Y Chr\"\n    RateX: \"SexDet Rate X Chr\"\n    RateY: \"SexDet Rate Y Chr\"\n  custom_table_header_config:\n    general_stats_table:\n      median_coverage:\n        format: \"{:,.3f}\"\n      mean_coverage:\n        format: \"{:,.3f}\"\n"
  },
  {
    "path": "assets/nf-core_eager_dummy.txt",
    "content": "This is a dummy file for when we need a 'fake' file to satisfy all nextflow channel inputs being filled, even if we actually only use one."
  },
  {
    "path": "assets/nf-core_eager_dummy2.txt",
    "content": "This is a second dummy file for when we need a 'fake' file to satisfy all nextflow channel inputs being filled, even if we actually only use one."
  },
  {
    "path": "assets/sendmail_template.txt",
    "content": "To: $email\nSubject: $subject\nMime-Version: 1.0\nContent-Type: multipart/related;boundary=\"nfcoremimeboundary\"\n\n--nfcoremimeboundary\nContent-Type: text/html; charset=utf-8\n\n$email_html\n\n--nfcoremimeboundary\nContent-Type: image/png;name=\"nf-core-eager_logo.png\"\nContent-Transfer-Encoding: base64\nContent-ID: <nfcorepipelinelogo>\nContent-Disposition: inline; filename=\"nf-core-eager_logo.png\"\n\n<% out << new File(\"$projectDir/assets/nf-core-eager_logo.png\").\n  bytes.\n  encodeBase64().\n  toString().\n  tokenize( '\\n' )*.\n  toList()*.\n  collate( 76 )*.\n  collect { it.join() }.\n  flatten().\n  join( '\\n' ) %>\n\n<%\nif (mqcFile){\ndef mqcFileObj = new File(\"$mqcFile\")\nif (mqcFileObj.length() < mqcMaxSize){\nout << \"\"\"\n--nfcoremimeboundary\nContent-Type: text/html; name=\\\"multiqc_report\\\"\nContent-Transfer-Encoding: base64\nContent-ID: <mqcreport>\nContent-Disposition: attachment; filename=\\\"${mqcFileObj.getName()}\\\"\n\n${mqcFileObj.\n  bytes.\n  encodeBase64().\n  toString().\n  tokenize( '\\n' )*.\n  toList()*.\n  collate( 76 )*.\n  collect { it.join() }.\n  flatten().\n  join( '\\n' )}\n\"\"\"\n}}\n%>\n\n--nfcoremimeboundary--\n"
  },
  {
    "path": "assets/where_are_my_files.txt",
    "content": "=====================\n Where are my files?\n=====================\n\nBy default, the nfcore/eager pipeline does not save large intermediate files to the\nresults directory. This is to try to conserve disk space.\n\nThese files can be found in the pipeline `work` directory if needed.\nAlternatively, re-run the pipeline using `-resume` in addition to one of\nthe below command-line options and they will be copied into the results directory:\n\n`--saveReference`\nSave any downloaded or generated reference genome files to your results folder.\nThese can then be used for future pipeline runs, reducing processing times.\n\n-----------------------------------\n Setting defaults in a config file\n-----------------------------------\nIf you would always like these files to be saved without having to specify this on\nthe command line, you can save the following to your personal configuration file\n(eg. `~/.nextflow/config`):\n\nparams.saveReference = true\n\nFor more help, see the following documentation:\n\nhttps://github.com/nf-core/eager/blob/master/docs/usage.md\nhttps://www.nextflow.io/docs/latest/getstarted.html\nhttps://www.nextflow.io/docs/latest/config.html\n"
  },
  {
    "path": "bin/endorS.py",
    "content": "#!/usr/bin/env python3\n\n# Written by Aida Andrades Valtueña and released under MIT license. \n# See git repository (https://github.com/aidaanva/endorS.py) for full license text.\n\n\"\"\"Script to calculate the endogenous DNA in a sample from samtools flag stats.\nIt can accept up to two files: pre-quality and post-quality filtering. We recommend\nto use both files but you can also use the pre-quality filtering.\n\"\"\"\nimport re\nimport sys\nimport json\nimport argparse\nimport textwrap\n\nparser = argparse.ArgumentParser(prog='endorS.py',\n   usage='python %(prog)s [-h] [--version] <samplesfile>.stats [<samplesfile>.stats]',\n   formatter_class=argparse.RawDescriptionHelpFormatter,\n   description=textwrap.dedent('''\\\n   author:\n     Aida Andrades Valtueña (aida.andrades[at]gmail.com)\n\n   description:\n     %(prog)s calculates endogenous DNA from samtools flagstat files and print to screen\n     Use --output flag to write results to a file\n   '''))\nparser.add_argument('samtoolsfiles', metavar='<samplefile>.stats', type=str, nargs='+',\n                    help='output of samtools flagstat in a txt file (at least one required). If two files are supplied, the mapped reads of the second file is divided by the total reads in the first, since it assumes that the <samplefile.stats> are related to the same sample. Useful after BAM filtering')\nparser.add_argument('-v','--version', action='version', version='%(prog)s 0.4')\nparser.add_argument('--output', '-o', nargs='?', help='specify a file format for an output file. Options: <json> for a MultiQC json output. Default: none')\nparser.add_argument('--name', '-n', nargs='?', help='specify name for the output file. 
Default: extracted from the first samtools flagstat file provided')\nargs = parser.parse_args()\n\n#Open the samtools flag stats pre-quality filtering:\ntry:\n    with open(args.samtoolsfiles[0], 'r') as pre:\n        contentsPre = pre.read()\n    #Extract number of total reads\n    totalReads = float((re.findall(r'^([0-9]+) \\+ [0-9]+ in total',contentsPre))[0])\n    #Extract number of mapped reads pre-quality filtering:\n    mappedPre = float((re.findall(r'([0-9]+) \\+ [0-9]+ mapped ',contentsPre))[0])\n    #Calculation of endogenous DNA pre-quality filtering:\n    if totalReads == 0.0:\n        endogenousPre = 0.000000\n        print(\"WARNING: no reads in the fastq input, Endogenous DNA raw (%) set to 0.000000\")\n    elif mappedPre == 0.0:\n        endogenousPre = 0.000000\n        print(\"WARNING: no mapped reads, Endogenous DNA raw (%) set to 0.000000\")\n    else:\n        endogenousPre = float(\"{0:.6f}\".format(round((mappedPre / totalReads * 100), 6)))\nexcept:\n    print(\"Incorrect input, please provide at least a samtools flag stats as input\\nRun:\\npython endorS.py --help \\nfor more information on how to run this script\")\n    sys.exit()\n#Check if the samtools stats post-quality filtering have been provided:\ntry:\n    #Open the samtools flag stats post-quality filtering:\n    with open(args.samtoolsfiles[1], 'r') as post:\n        contentsPost = post.read()\n    #Extract number of mapped reads post-quality filtering:\n    mappedPost = float((re.findall(r'([0-9]+) \\+ [0-9]+ mapped',contentsPost))[0])\n    #Calculation of endogenous DNA post-quality filtering:\n    if totalReads == 0.0:\n        endogenousPost = 0.000000\n        print(\"WARNING: no reads in the fastq input, Endogenous DNA modified (%) set to 0.000000\")\n    elif mappedPost == 0.0:\n        endogenousPost = 0.000000\n        print(\"WARNING: no mapped reads, Endogenous DNA modified (%) set to 0.000000\")\n    else:\n        endogenousPost = 
float(\"{0:.6f}\".format(round((mappedPost / totalReads * 100),6)))\nexcept:\n    print(\"Only one samtools flagstat file provided\")\n    #Mark the post-quality filtering mapped reads as NA if the second\n    #samtools flagstat file was not provided:\n    mappedPost = \"NA\"\n\n#Setting the name depending on the --name flag:\nif args.name is not None:\n    name = args.name\nelse:\n    #Set up the name based on the first samtools flagstats:\n    name= str(((args.samtoolsfiles[0].rsplit(\".\",1)[0]).rsplit(\"/\"))[-1])\n#print(name)\n\n\nif mappedPost == \"NA\":\n    #Creating the json file\n    jsonOutput={\n    \"id\": \"endorSpy\",\n    \"plot_type\": \"generalstats\",\n    \"pconfig\": {\n        \"endogenous_dna\": { \"max\": 100, \"min\": 0, \"title\": \"Endogenous DNA (%)\", \"format\": '{:,.2f}'}\n    },\n    \"data\": {\n        name : { \"endogenous_dna\": endogenousPre}\n    }\n    }\nelse:\n    #Creating the json file\n    jsonOutput={\n    \"id\": \"endorSpy\",\n    \"plot_type\": \"generalstats\",\n    \"pconfig\": {\n        \"endogenous_dna\": { \"max\": 100, \"min\": 0, \"title\": \"Endogenous DNA (%)\", \"format\": '{:,.2f}'},\n        \"endogenous_dna_post\": { \"max\": 100, \"min\": 0, \"title\": \"Endogenous DNA Post (%)\", \"format\": '{:,.2f}'}\n    },\n    \"data\": {\n        name : { \"endogenous_dna\": endogenousPre, \"endogenous_dna_post\": endogenousPost}\n    },\n    }\n#Checking whether to write a file or print to screen:\nif args.output is not None:\n   #Creating a file named after the name variable:\n   #Writing the json output:\n   fileName = name + \"_endogenous_dna_mqc.json\"\n   #print(fileName)\n   with open(fileName, \"w+\") as outfile:\n      json.dump(jsonOutput, outfile)\n      print(fileName,\"has been generated\")\nelse:\n   if mappedPost == \"NA\":\n      print(\"Endogenous DNA (%):\",endogenousPre)\n   else:\n      print(\"Endogenous DNA raw (%):\",endogenousPre)\n      print(\"Endogenous DNA modified (%):\",endogenousPost)\n"
  },
  {
    "path": "bin/extract_map_reads.py",
    "content": "#!/usr/bin/env python3\n\n# Written by Maxime Borry and released under the MIT license.\n# See git repository (https://github.com/nf-core/eager) for full license text.\n\nimport argparse\nimport pysam\nfrom xopen import xopen\nimport logging\nimport os\nfrom pathlib import Path\n\n\ndef _get_args():\n    \"\"\"This function parses and return arguments passed in\"\"\"\n    parser = argparse.ArgumentParser(\n        prog=\"extract_mapped_reads\",\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        description=\"Remove mapped in bam file from fastq files\",\n    )\n    parser.add_argument(\"bam_file\", help=\"path to bam file\")\n    parser.add_argument(\"fwd\", help=\"path to forward fastq file\")\n    parser.add_argument(\n        \"-merged\",\n        dest=\"merged\",\n        default=False,\n        action=\"store_true\",\n        help=\"specify if bam file was created from merged fastq files\",\n    )\n    parser.add_argument(\n        \"-rev\", dest=\"rev\", default=None, help=\"path to reverse fastq file\"\n    )\n    parser.add_argument(\n        \"-of\", dest=\"out_fwd\", default=None, help=\"path to forward output fastq file\"\n    )\n    parser.add_argument(\n        \"-or\", dest=\"out_rev\", default=None, help=\"path to forward output fastq file\"\n    )\n    parser.add_argument(\n        \"-m\",\n        dest=\"mode\",\n        default=\"remove\",\n        help=\"Read removal mode: remove reads (remove) or replace sequence by N (replace). 
Default = remove\",\n    )\n    parser.add_argument(\n        \"-t\", dest=\"threads\", default=4, help=\"Number of parallel threads\"\n    )\n\n    args = parser.parse_args()\n\n    bam = args.bam_file\n    in_fwd = args.fwd\n    merged = args.merged\n    in_rev = args.rev\n    out_fwd = args.out_fwd\n    out_rev = args.out_rev\n    mode = args.mode\n    threads = int(args.threads)\n\n    return (bam, in_fwd, merged, in_rev, out_fwd, out_rev, mode, threads)\n\n\ndef extract_mapped(bamfile, merged):\n    \"\"\"Get the names of the mapped reads\n    Args:\n        bamfile(str): path to bam alignment file\n        merged(bool): True if bam file was created from merged fastq files\n    Returns:\n        mapped_reads(set): set of mapped reads name (str)\n    \"\"\"\n    if bamfile.endswith(\".bam\") or bamfile.endswith(\".gz\"):\n        read_mode = \"rb\"\n    else:\n        read_mode = \"r\"\n    mapped_reads = set()\n    bamfile = pysam.AlignmentFile(bamfile, mode=read_mode)\n    for read in bamfile.fetch():\n        # Test the unmapped flag bit rather than comparing the whole flag to 4\n        if not read.is_unmapped:\n            if merged:\n                if read.query_name.startswith(\"M_\"):\n                    mapped_reads.add(read.query_name[2:])\n                elif read.query_name.startswith(\"MT_\"):\n                    mapped_reads.add(read.query_name[3:])\n                else:\n                    mapped_reads.add(read.query_name)\n            else:\n                mapped_reads.add(read.query_name)\n    return mapped_reads\n\n\ndef read_write_fq(fq_in, fq_out, mapped_reads, mode, write_mode, proc):\n    \"\"\"\n    Read and write fastq file with mapped reads removed\n    Args:\n        fq_in(str): path to input fastq file\n        fq_out(str): path to output fastq file\n        mapped_reads(set): set of mapped reads name (str)\n        mode(str): read removal mode (remove or replace)\n        write_mode(str): write mode (w or wb)\n        proc(int): number of parallel processes\n    \"\"\"\n    if write_mode == \"w\":\n        cm = open(fq_out, write_mode)\n    elif write_mode == \"wb\":\n        cm = xopen(fq_out, mode=write_mode, threads=proc)\n    with pysam.FastxFile(fq_in) as fh:\n        with cm as fh_out:\n            for read in fh:\n                try:\n                    if read.name in mapped_reads:\n                        if mode == \"replace\":\n                            read.sequence = \"N\" * len(read.sequence)\n                            read = str(read) + \"\\n\"\n                            if write_mode == \"w\":\n                                fh_out.write(read)\n                            elif write_mode == \"wb\":\n                                fh_out.write(read.encode())\n                    else:\n                        read = str(read) + \"\\n\"\n                        if write_mode == \"w\":\n                            fh_out.write(read)\n                        elif write_mode == \"wb\":\n                            fh_out.write(read.encode())\n                except Exception as e:\n                    logging.error(f\"Problem with {str(read)}\")\n                    logging.error(e)\n\n\ndef check_remove_mode(mode):\n    valid_modes = [\"replace\", \"remove\"]\n    if mode.lower() not in valid_modes:\n        logging.info(f\"Mode must be {' or '.join(valid_modes)}\")\n    return mode.lower()\n\n\nif __name__ == \"__main__\":\n    BAM, IN_FWD, MERGED, IN_REV, OUT_FWD, OUT_REV, MODE, PROC = _get_args()\n\n    logging.basicConfig(level=logging.INFO, format=\"%(message)s\")\n\n    if OUT_FWD is None:\n        out_fwd = os.path.join(os.getcwd(), Path(IN_FWD).stem + \".r1.fq.gz\")\n    else:\n        out_fwd = OUT_FWD\n\n    if out_fwd.endswith(\".gz\"):\n        write_mode = \"wb\"\n    else:\n        write_mode = \"w\"\n\n    remove_mode = check_remove_mode(MODE)\n\n    # FORWARD OR SE FILE\n    logging.info(f\"- Extracting mapped reads from {BAM}\")\n    mapped_reads = extract_mapped(BAM, merged=MERGED)\n    logging.info(f\"- Checking forward fq file {IN_FWD}\")\n    read_write_fq(\n        fq_in=IN_FWD,\n        fq_out=out_fwd,\n        mapped_reads=mapped_reads,\n        mode=remove_mode,\n        write_mode=write_mode,\n        proc=PROC,\n    )\n    logging.info(f\"- Cleaned forward FastQ file written to {out_fwd}\")\n\n    # REVERSE FILE\n    if IN_REV:\n        if OUT_REV is None:\n            out_rev = os.path.join(os.getcwd(), Path(IN_REV).stem + \".r2.fq.gz\")\n        else:\n            out_rev = OUT_REV\n        logging.info(f\"- Checking reverse fq file {IN_REV}\")\n        read_write_fq(\n            fq_in=IN_REV,\n            fq_out=out_rev,\n            mapped_reads=mapped_reads,\n            mode=remove_mode,\n            write_mode=write_mode,\n            proc=PROC,\n        )\n        logging.info(f\"- Cleaned reverse FastQ file written to {out_rev}\")\n"
  },
  {
    "path": "bin/filter_bam_fragment_length.py",
    "content": "#!/usr/bin/env python3\n\n# Written by Maxime Borry and released under the MIT license. \n# See git repository (https://github.com/nf-core/eager) for full license text.\n\nimport argparse\nimport pysam\n\n\ndef get_args():\n    \"\"\"This function parses and return arguments passed in\"\"\"\n    parser = argparse.ArgumentParser(\n        prog=\"bam_filter\", description=\"Filter bam on fragment length\"\n    )\n    parser.add_argument(\"bam\", help=\"Bam aligment file\")\n    parser.add_argument(\n        \"-l\",\n        dest=\"fraglen\",\n        default=35,\n        type=int,\n        help=\"Minimum fragment length. Default = 35\",\n    )\n    parser.add_argument(\n        \"-a\",\n        dest=\"all\",\n        default=False,\n        action=\"store_true\",\n        help=\"Include all reads, even unmapped\",\n    )\n    parser.add_argument(\n        \"-o\",\n        dest=\"output\",\n        default=None,\n        help=\"Output bam basename. Default = {bam_basename}.filtered.bam\",\n    )\n\n    args = parser.parse_args()\n\n    bam = args.bam\n    fraglen = args.fraglen\n    allreads = args.all\n    outfile = args.output\n\n    return (bam, fraglen, allreads, outfile)\n\n\ndef getBasename(file_name):\n    if (\"/\") in file_name:\n        basename = file_name.split(\"/\")[-1].split(\".\")[0]\n    else:\n        basename = file_name.split(\".\")[0]\n    return basename\n\n\ndef filter_bam(infile, outfile, fraglen, allreads):\n    \"\"\"Write bam to file\n\n    Args:\n        infile (stream): pysam stream\n        outfile (str): Path to output bam\n        fraglen(int): Minimum fragment length to keep\n        allreads(bool): Apply on all reads, not only mapped\n    \"\"\"\n    bamfile = pysam.AlignmentFile(infile, \"rb\")\n    bamwrite = pysam.AlignmentFile(outfile + \".filtered.bam\", \"wb\", template=bamfile)\n\n    for read in bamfile.fetch(until_eof=True):\n        if allreads:\n            if read.query_length >= fraglen:\n                
bamwrite.write(read)\n        else:\n            if not read.is_unmapped and read.query_length >= fraglen:\n                bamwrite.write(read)\n\n    # Close both files so the output BAM is flushed completely\n    bamwrite.close()\n    bamfile.close()\n\n\nif __name__ == \"__main__\":\n    BAM, FRAGLEN, ALLREADS, OUTFILE = get_args()\n\n    if OUTFILE is None:\n        OUTFILE = getBasename(BAM)\n\n    filter_bam(BAM, OUTFILE, FRAGLEN, ALLREADS)\n"
  },
  {
    "path": "bin/kraken_parse.py",
    "content": "#!/usr/bin/env python\n\n# Written by Maxime Borry and released under the MIT license. \n# See git repository (https://github.com/nf-core/eager) for full license text.\n\nimport argparse\nimport csv\n\ndef _get_args():\n    '''This function parses and return arguments passed in'''\n    parser = argparse.ArgumentParser(\n        prog='kraken_parse',\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        description='Parsing kraken')\n    parser.add_argument('krakenReport', help=\"path to kraken report file\")\n    parser.add_argument(\n        '-c',\n        dest=\"count\",\n        default=50,\n        help=\"Minimum number of hits on clade to report it. Default = 50\")\n    parser.add_argument(\n        '-or',\n        dest=\"readout\",\n        default=None,\n        help=\"Read count output file. Default = <basename>.read_kraken_parsed.csv\")\n    parser.add_argument(\n        '-ok',\n        dest=\"kmerout\",\n        default=None,\n        help=\"Kmer Output file. 
Default = <basename>.kmer_kraken_parsed.csv\")\n\n    args = parser.parse_args()\n\n    infile = args.krakenReport\n    countlim = int(args.count)\n    readout = args.readout\n    kmerout = args.kmerout\n\n    return(infile, countlim, readout, kmerout)\n\n\ndef _get_basename(file_name):\n    if (\"/\") in file_name:\n        basename = file_name.split(\"/\")[-1].split(\".\")[0]\n    else:\n        basename = file_name.split(\".\")[0]\n    return(basename)\n\n\ndef parse_kraken(infile, countlim):\n    '''\n    INPUT:\n        infile (str): path to kraken report file\n        countlim (int): lowest count threshold to report hit\n    OUTPUT:\n        read_dict (dict): key=taxid, value=readCount\n        kmer_dict (dict): key=taxid, value=kmer duplicity (kmers / unique kmers)\n\n    '''\n    with open(infile, 'r') as f:\n        read_dict = {}\n        kmer_dict = {}\n        csvreader = csv.reader(f, delimiter='\\t')\n        for line in csvreader:\n            reads = int(line[1])\n            if reads >= countlim:\n                taxid = line[6]\n                kmer = line[3]\n                unique_kmer = line[4]\n                try:\n                    kmer_duplicity = float(kmer)/float(unique_kmer)\n                except ZeroDivisionError:\n                    kmer_duplicity = 0\n                read_dict[taxid] = reads\n                kmer_dict[taxid] = kmer_duplicity\n\n        return(read_dict, kmer_dict)\n\n\ndef write_output(resdict, infile, outfile):\n    with open(outfile, 'w') as f:\n        basename = _get_basename(infile)\n        f.write(f\"TAXID,{basename}\\n\")\n        for akey in resdict.keys():\n            f.write(f\"{akey},{resdict[akey]}\\n\")\n\n\nif __name__ == '__main__':\n    INFILE, COUNTLIM, readout, kmerout = _get_args()\n\n    if not readout:\n        read_outfile = _get_basename(INFILE)+\".read_kraken_parsed.csv\"\n    else:\n        read_outfile = readout\n    if not kmerout:\n        kmer_outfile = _get_basename(INFILE)+\".kmer_kraken_parsed.csv\"\n    else:\n        kmer_outfile = kmerout\n\n    read_dict, kmer_dict = parse_kraken(infile=INFILE, countlim=COUNTLIM)\n    write_output(resdict=read_dict, infile=INFILE, outfile=read_outfile)\n    write_output(resdict=kmer_dict, infile=INFILE, outfile=kmer_outfile)\n"
  },
  {
    "path": "bin/markdown_to_html.py",
    "content": "#!/usr/bin/env python\nfrom __future__ import print_function\nimport argparse\nimport markdown\nimport os\nimport sys\nimport io\n\n\ndef convert_markdown(in_fn):\n    input_md = io.open(in_fn, mode=\"r\", encoding=\"utf-8\").read()\n    html = markdown.markdown(\n        \"[TOC]\\n\" + input_md,\n        extensions=[\"pymdownx.extra\", \"pymdownx.b64\", \"pymdownx.highlight\", \"pymdownx.emoji\", \"pymdownx.tilde\", \"toc\"],\n        extension_configs={\n            \"pymdownx.b64\": {\"base_path\": os.path.dirname(in_fn)},\n            \"pymdownx.highlight\": {\"noclasses\": True},\n            \"toc\": {\"title\": \"Table of Contents\"},\n        },\n    )\n    return html\n\n\ndef wrap_html(contents):\n    header = \"\"\"<!DOCTYPE html><html>\n    <head>\n        <link rel=\"stylesheet\" href=\"https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css\" integrity=\"sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T\" crossorigin=\"anonymous\">\n        <style>\n            body {\n              font-family: -apple-system,BlinkMacSystemFont,\"Segoe UI\",Roboto,\"Helvetica Neue\",Arial,\"Noto Sans\",sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\",\"Segoe UI Symbol\",\"Noto Color Emoji\";\n              padding: 3em;\n              margin-right: 350px;\n              max-width: 100%;\n            }\n            .toc {\n              position: fixed;\n              right: 20px;\n              width: 300px;\n              padding-top: 20px;\n              overflow: scroll;\n              height: calc(100% - 3em - 20px);\n            }\n            .toctitle {\n              font-size: 1.8em;\n              font-weight: bold;\n            }\n            .toc > ul {\n              padding: 0;\n              margin: 1rem 0;\n              list-style-type: none;\n            }\n            .toc > ul ul { padding-left: 20px; }\n            .toc > ul > li > a { display: none; }\n            img { max-width: 800px; 
}\n            pre {\n              padding: 0.6em 1em;\n            }\n            h2 {\n\n            }\n        </style>\n    </head>\n    <body>\n    <div class=\"container\">\n    \"\"\"\n    footer = \"\"\"\n    </div>\n    </body>\n    </html>\n    \"\"\"\n    return header + contents + footer\n\n\ndef parse_args(args=None):\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"mdfile\", type=argparse.FileType(\"r\"), nargs=\"?\", help=\"File to convert. Defaults to stdin.\")\n    parser.add_argument(\n        \"-o\", \"--out\", type=argparse.FileType(\"w\"), default=sys.stdout, help=\"Output file name. Defaults to stdout.\"\n    )\n    return parser.parse_args(args)\n\n\ndef main(args=None):\n    args = parse_args(args)\n    converted_md = convert_markdown(args.mdfile.name)\n    html = wrap_html(converted_md)\n    args.out.write(html)\n\n\nif __name__ == \"__main__\":\n    sys.exit(main())\n"
  },
  {
    "path": "bin/merge_kraken_res.py",
    "content": "#!/usr/bin/env python\n\n# Written by Maxime Borry and released under the MIT license. \n# See git repository (https://github.com/nf-core/eager) for full license text.\n\nimport argparse\nimport os\nimport pandas as pd\nimport numpy as np\n\ndef _get_args():\n    '''This function parses and return arguments passed in'''\n    parser = argparse.ArgumentParser(\n        prog='merge_kraken_res',\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        description='Merging csv count files in one table')\n    parser.add_argument(\n        '-or',\n        dest=\"readout\",\n        default=\"kraken_read_count_table.csv\",\n        help=\"Read count output file. Default = kraken_read_count_table.csv\")\n    parser.add_argument(\n        '-ok',\n        dest=\"kmerout\",\n        default=\"kraken_kmer_unicity_table.csv\",\n        help=\"Kmer unicity output file. Default = kraken_kmer_unicity_table.csv\")\n\n    args = parser.parse_args()\n\n    readout = args.readout\n    kmerout = args.kmerout\n\n    return(readout, kmerout)\n\n\ndef get_csv():\n    tmp = [i for i in os.listdir() if \".csv\" in i]\n    kmer = [i for i in tmp if '.kmer_' in i]\n    read = [i for i in tmp if '.read_' in i]\n    return(read, kmer)\n\n\ndef _get_basename(file_name):\n    if (\"/\") in file_name:\n        basename = file_name.split(\"/\")[-1].split(\".\")[0]\n    else:\n        basename = file_name.split(\".\")[0]\n    return(basename)\n\n\ndef merge_csv(all_csv):\n    df = pd.read_csv(all_csv[0], index_col=0)\n    for i in range(1, len(all_csv)):\n        df_tmp = pd.read_csv(all_csv[i], index_col=0)\n        df = pd.merge(left=df, right=df_tmp, on='TAXID', how='outer')\n    df.fillna(0, inplace=True)\n    return(df)\n\n\ndef write_csv(pd_dataframe, outfile):\n    pd_dataframe.to_csv(outfile)\n\n\nif __name__ == \"__main__\":\n    READOUT, KMEROUT = _get_args()\n    reads, kmers = get_csv()\n    read_df = merge_csv(reads)\n    kmer_df = merge_csv(kmers)\n    
write_csv(read_df, READOUT)\n    write_csv(kmer_df, KMEROUT)"
  },
  {
    "path": "bin/parse_snp_cov.py",
    "content": "#!/usr/bin/env python3\n\n# Written by Thiseas C. Lamnidis and released under the MIT license. \n# See git repository (https://github.com/nf-core/eager) for full license text.\n\nimport sys, json\nfrom collections import OrderedDict\n\njsonOut = OrderedDict()\ndata = OrderedDict()\n\n\ninput = open(sys.argv[1], 'r')\nfor line in input:\n  fields = line.strip().split()\n  sample_id = fields[0]\n  covered_snps = fields[1]\n  total_snps = fields[2]\n  if sample_id[0] == \"#\":\n    continue\n  \n  data[sample_id] = {\"Covered_Snps\":covered_snps, \"Total_Snps\":total_snps}\n\njsonOut = {\"plot_type\": \"generalstats\", \"id\": \"snp_coverage\",\n    \"pconfig\": {\n        \"Covered_Snps\" : {\"title\" : \"#SNPs Covered\"},\n        \"Total_Snps\" : {\"title\": \"#SNPs Total\"}\n    }, \n    \"data\" : data\n}\n\nwith open(sys.argv[1].rstrip('.txt')+'_mqc.json', 'w') as outfile:\n    json.dump(jsonOut, outfile)\n"
  },
  {
    "path": "bin/print_x_contamination.py",
    "content": "#!/usr/bin/env python3\n\n# Written by Thiseas C. Lamnidis and released under the MIT license. \n# See git repository (https://github.com/nf-core/eager) for full license text.\n\nimport sys, re, json\nfrom collections import OrderedDict\n\njsonOut=OrderedDict()\ndata=OrderedDict()\n\n## Function to convert a set of elements into floating point numbers, when possible, else leave them be.\ndef make_float(x):\n    # print (x)\n    output=[None for i in range(len(x))]\n    ## If value for an estimate/error is -nan, replace with \"NA\". JSON does not accept NaN as a valid field.\n    for i in range(len(x)):\n        if x[i] == \"-nan\" or x[i] == \"nan\":\n            output[i]=\"N/A\"\n            continue\n        try:\n            output[i]=float(x[i])\n        except:\n            output[i]=x[i]\n    \n    return(tuple(output))\n\n\nInput_files=sys.argv[1:]\n\noutput = open(\"nuclear_contamination.txt\", 'w')\nprint (\"Individual\", \"Num_SNPs\", \"Method1_MOM_estimate\", \"Method1_MOM_SE\", \"Method1_ML_estimate\", \"Method1_ML_SE\", \"Method2_MOM_estimate\", \"Method2_MOM_SE\", \"Method2_ML_estimate\", \"Method2_ML_SE\", sep=\"\\t\", file=output)\nfor fn in Input_files:\n    ## For each file, reset the values to \"N/A\" so they don't carry over from last file.\n    mom1, err_mom1= \"N/A\",\"N/A\"\n    ml1, err_ml1=\"N/A\",\"N/A\"\n    mom2, err_mom2= \"N/A\",\"N/A\"\n    ml2, err_ml2=\"N/A\",\"N/A\"\n    nSNPs=\"0\"\n    with open(fn, 'r') as f:\n        Estimates={}\n        Ind=re.sub('\\.X.contamination.out$', '', fn).split(\"/\")[-1]\n        for line in f:\n            fields=line.strip().split()\n            if line.strip()[0:19] == \"We have nSNP sites:\":\n                nSNPs=fields[4].rstrip(\",\")\n            elif line.strip()[0:7] == \"Method1\" and line.strip()[9:16] == 'new_llh':\n                mom1=fields[3].split(\":\")[1]\n                err_mom1=fields[4].split(\":\")[1]\n                ml1=fields[5].split(\":\")[1]\n        
        err_ml1=fields[6].split(\":\")[1]\n                ## Sometimes angsd fails to run method 2, and the error is printed directly after the SE for ML. When that happens, exclude the first word in the error from the output. (Method 2 jsonOut will be shown as NA)\n                if err_ml1.endswith(\"contamination\"):\n                    err_ml1 = err_ml1[:-13]\n            elif line.strip()[0:7] == \"Method2\" and line.strip()[9:16] == 'new_llh':\n                mom2=fields[3].split(\":\")[1]\n                err_mom2=fields[4].split(\":\")[1]\n                ml2=fields[5].split(\":\")[1]\n                err_ml2=fields[6].split(\":\")[1]\n        ## Convert estimates and errors to floating point numbers\n        (ml1, err_ml1, mom1, err_mom1, ml2, err_ml2, mom2, err_mom2) = make_float((ml1, err_ml1, mom1, err_mom1, ml2, err_ml2, mom2, err_mom2))\n        data[Ind]={ \"Num_SNPs\" : int(nSNPs), \"Method1_MOM_estimate\" : mom1, \"Method1_MOM_SE\" : err_mom1, \"Method1_ML_estimate\" : ml1, \"Method1_ML_SE\" : err_ml1, \"Method2_MOM_estimate\" : mom2, \"Method2_MOM_SE\" : err_mom2, \"Method2_ML_estimate\" : ml2, \"Method2_ML_SE\" : err_ml2 }\n        print (Ind, nSNPs, mom1, err_mom1, ml1, err_ml1, mom2, err_mom2, ml2, err_ml2, sep=\"\\t\", file=output)\n\n\njsonOut = {\"plot_type\": \"generalstats\", \"id\": \"nuclear_contamination\",\n    \"pconfig\": {\n        \"Num_SNPs\" : {\"title\" : \"Number of SNPs\"},\n        \"Method1_MOM_estimate\" : {\"title\": \"Contamination Estimate (Method1_MOM)\"},\n        \"Method1_MOM_SE\" : {\"title\": \"Estimate Error (Method1_MOM)\"},\n        \"Method1_ML_estimate\" : {\"title\": \"Contamination Estimate (Method1_ML)\"},\n        \"Method1_ML_SE\" : {\"title\": \"Estimate Error (Method1_ML)\"},\n        \"Method2_MOM_estimate\" : {\"title\": \"Contamination Estimate (Method2_MOM)\"},\n        \"Method2_MOM_SE\" : {\"title\": \"Estimate Error (Method2_MOM)\"},\n        \"Method2_ML_estimate\" : {\"title\": 
\"Contamination Estimate (Method2_ML)\"},\n        \"Method2_ML_SE\" : {\"title\": \"Estimate Error (Method2_ML)\"}\n    }, \n    \"data\" : data\n}\nwith open('nuclear_contamination_mqc.json', 'w') as outfile:\n    json.dump(jsonOut, outfile)\n"
  },
  {
    "path": "bin/scrape_software_versions.py",
    "content": "#!/usr/bin/env python\nfrom __future__ import print_function\nfrom collections import OrderedDict\nimport re\n\nregexes = {\n    \"nf-core/eager\": [\"v_pipeline.txt\", r\"(\\S+)\"],\n    \"Nextflow\": [\"v_nextflow.txt\", r\"(\\S+)\"],\n    \"FastQC\": [\"v_fastqc.txt\", r\"FastQC v(\\S+)\"],\n    \"MultiQC\": [\"v_multiqc.txt\", r\"multiqc, version (\\S+)\"],\n    'AdapterRemoval':['v_adapterremoval.txt', r\"AdapterRemoval ver. (\\S+)\"],\n    'Picard MarkDuplicates': ['v_markduplicates.txt', r\"Version:(\\S+)\"],\n    'Samtools': ['v_samtools.txt', r\"samtools (\\S+)\"],\n    'Preseq': ['v_preseq.txt', r\"Version: (\\S+)\"],\n    'BWA': ['v_bwa.txt', r\"Version: (\\S+)\"], \n    'Bowtie2': ['v_bowtie2.txt', r\"bowtie2-([0-9]+\\.[0-9]+\\.[0-9]+) -fdebug\"],\n    'Qualimap': ['v_qualimap.txt', r\"QualiMap v.(\\S+)\"],\n    'GATK HaplotypeCaller': ['v_gatk.txt', r\"The Genome Analysis Toolkit \\(GATK\\) v(\\S+)\"],\n    'GATK UnifiedGenotyper': ['v_gatk3.txt', r\"(\\S+)\"],\n    'bamUtil' : ['v_bamutil.txt', r\"Version: (\\S+);\"],\n    'fastP': ['v_fastp.txt', r\"([\\d\\.]+)\"],\n    'DamageProfiler' : ['v_damageprofiler.txt', r\"DamageProfiler v(\\S+)\"],\n    'angsd':['v_angsd.txt',r\"version: (\\S+)\"],\n    'bedtools':['v_bedtools.txt',r\"bedtools v(\\S+)\"],\n    'circulargenerator':['v_circulargenerator.txt',r\"CircularGeneratorv(\\S+)\"],\n    'DeDup':['v_dedup.txt',r\"DeDup v(\\S+)\"],\n    'freebayes':['v_freebayes.txt',r\"v([0-9]\\S+)\"],\n    'sequenceTools':['v_sequencetools.txt',r\"(\\S+)\"],\n    'maltextract':['v_maltextract.txt', r\"version(\\S+)\"],\n    'malt':['v_malt.txt',r\"version (\\S+)\"],\n    'multivcfanalyzer':['v_multivcfanalyzer.txt', r\"MultiVCFAnalyzer - (\\S+)\"],\n    'pmdtools':['v_pmdtools.txt',r\"pmdtools v(\\S+)\"],\n    'sexdeterrmine':['v_sexdeterrmine.txt',r\"(\\S+)\"],\n    'MTNucRatioCalculator':['v_mtnucratiocalculator.txt',r\"Version: (\\S+)\"],\n    'VCF2genome':['v_vcf2genome.txt', r\"VCF2Genome \\(v. 
([0-9].[0-9]+) \"],\n    'endorS.py':['v_endorSpy.txt', r\"endorS.py (\\S+)\"],\n    'kraken':['v_kraken.txt', r\"Kraken version (\\S+)\"],\n    'eigenstrat_snp_coverage':['v_eigenstrat_snp_coverage.txt',r\"(\\S+)\"],\n    'mapDamage2':['v_mapdamage.txt',r\"(\\S+)\"],\n    'bbduk':['v_bbduk.txt',r\"(.*)\"],\n    'bcftools':['v_bcftools.txt',r\"(\\S+)\"]\n}\n\nresults = OrderedDict()\nresults[\"nf-core/eager\"] = '<span style=\"color:#999999;\">N/A</span>'\nresults[\"Nextflow\"] = '<span style=\"color:#999999;\">N/A</span>'\nresults[\"FastQC\"] = '<span style=\"color:#999999;\">N/A</span>'\nresults[\"MultiQC\"] = '<span style=\"color:#999999;\">N/A</span>'\nresults['AdapterRemoval'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['fastP'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['BWA'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['Bowtie2'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['circulargenerator'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['Samtools'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['endorS.py'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['DeDup'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['Picard MarkDuplicates'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['Qualimap'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['Preseq'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['GATK HaplotypeCaller'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['GATK UnifiedGenotyper'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['freebayes'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['sequenceTools'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['VCF2genome'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['MTNucRatioCalculator'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['bedtools'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['DamageProfiler'] = 
'<span style=\"color:#999999;\\\">N/A</span>'\nresults['bamUtil'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['pmdtools'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['angsd'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['sexdeterrmine'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['multivcfanalyzer'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['malt'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['kraken'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['maltextract'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['eigenstrat_snp_coverage'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['mapDamage2'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['bbduk'] = '<span style=\"color:#999999;\\\">N/A</span>'\nresults['bcftools'] = '<span style=\"color:#999999;\\\">N/A</span>'\n\n# Search each file using its regex\nfor k, v in regexes.items():\n    try:\n        with open(v[0]) as x:\n            versions = x.read()\n            match = re.search(v[1], versions)\n            if match:\n                results[k] = \"v{}\".format(match.group(1))\n    except IOError:\n        results[k] = False\n\n# Remove software set to false in results\nfor k in list(results):\n    if not results[k]:\n        del results[k]\n\n# Dump to YAML\nprint(\n    \"\"\"\nid: 'software_versions'\nsection_name: 'nf-core/eager Software Versions'\nsection_href: 'https://github.com/nf-core/eager'\nplot_type: 'html'\ndescription: 'are collected at run time from the software output.'\ndata: |\n    <dl class=\"dl-horizontal\">\n\"\"\"\n)\nfor k, v in results.items():\n    print(\"        <dt>{}</dt><dd><samp>{}</samp></dd>\".format(k, v))\nprint(\"    </dl>\")\n\n# Write out regexes as csv file:\nwith open(\"software_versions.csv\", \"w\") as f:\n    for k, v in results.items():\n        f.write(\"{}\\t{}\\n\".format(k, v))\n"
  },
  {
    "path": "conf/base.config",
    "content": "/*\n * -------------------------------------------------\n *  nf-core/eager Nextflow base config file\n * -------------------------------------------------\n * A 'blank slate' config file, appropriate for general\n * use on most high performace compute environments.\n * Assumes that all software is installed and available\n * on the PATH. Runs in `local` mode - all jobs will be\n * run on the logged in environment.\n */\n\nprocess {\n  cpus = { check_max( 1 * task.attempt, 'cpus' ) }\n  memory = { check_max( 7.GB * task.attempt, 'memory' ) }\n  time = { check_max( 24.h * task.attempt, 'time' ) }\n\n  errorStrategy = { task.exitStatus in [143,137,104,134,139, 140] ? 'retry' : 'finish' }\n  maxRetries = 3\n  maxErrors = '-1'\n\n  // Process-specific resource requirements\n  // NOTE - Only one of the labels below are used in the fastqc process in the main script.\n  //        If possible, it would be nice to keep the same label naming convention when\n  //        adding in your processes.\n  // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors\n\n  // Generic resource requirements - s(ingle)c(ore)/m(ulti)c(ore)\n\n  withLabel:'sc_tiny'{\n      cpus = { check_max( 1, 'cpus' ) }\n      memory = { check_max( 1.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n  }\n\n  withLabel:'sc_small'{\n      cpus = { check_max( 1, 'cpus' ) }\n      memory = { check_max( 4.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n  }\n\n  withLabel:'sc_medium'{\n      cpus = { check_max( 1, 'cpus' ) }\n      memory = { check_max( 8.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n  }\n\n  withLabel:'mc_small'{\n      cpus = { check_max( 2 * task.attempt, 'cpus' ) }\n      memory = { check_max( 4.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n  }\n\n  withLabel:'mc_medium' {\n      
cpus = { check_max( 4 * task.attempt, 'cpus' ) }\n      memory = { check_max( 8.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n  }\n\n  withLabel:'mc_large'{\n      cpus = { check_max( 8 * task.attempt, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n  }\n\n  withLabel:'mc_huge'{\n      cpus = { check_max( 32 * task.attempt, 'cpus' ) }\n      memory = { check_max( 256.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n  }\n\n  // Process-specific resource requirements (others leave at default, e.g. Fastqc)\n  withName:get_software_versions {\n    cache = false\n  }\n\n  withName:qualimap{\n    errorStrategy = { task.exitStatus in [1,143,137,104,134,139, 140] ? 'retry' : task.exitStatus in [255] ? 'ignore' : 'finish' }\n  }\n\n  withName:preseq {\n    errorStrategy = 'ignore'\n  }\n\n  withName:damageprofiler {\n    errorStrategy = { task.exitStatus in [1,143,137,104,134,139, 140] ? 'retry' : 'finish' }\n  }\n\n  // Add 1 retry for certain java tools as not enough heap space java errors gives exit code 1\n  withName: dedup {\n    errorStrategy = { task.exitStatus in [1,143,137,104,134,139, 140] ? 'retry' : 'finish' } \n  }\n  \n  withName: markduplicates {\n    errorStrategy = { task.exitStatus in [143,137, 140] ? 'retry' : 'finish' } \n  }\n\n  // Add 1 retry as not enough heapspace java error gives exit code 1\n  withName: malt {\n    errorStrategy = { task.exitStatus in [1,143,137,104,134,139, 140] ? 'retry' : 'finish' } \n  }\n\n  // other process specific exit statuses\n  withName: nuclear_contamination {\n    errorStrategy = { task.exitStatus in [143,137,104,134,139, 140] ? 'ignore' : 'retry' }\n  }\n\n}\n\nparams {\n  // Defaults only, expecting to be overwritten\n  max_memory = 128.GB\n  max_cpus = 16\n  max_time = 240.h\n  igenomes_base = 's3://ngi-igenomes/igenomes/'\n}\n"
  },
  {
    "path": "conf/benchmarking_human.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n * nextflow run nf-core/eager -profile test, docker (or singularity, or conda)\n */\n\nparams {\n   config_profile_name = 'nf-core/eager benchmarking - human profile'\n   config_profile_description = \"A 'fullsized' benchmarking profile for deepish Human sequencing aDNA data\" \n\n   //Input data\n   input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Benchmarking/benchmarking_human.tsv'\n   // Genome reference\n   fasta = 'https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz'\n\n   run_bam_filtering = true\n   bam_unmapped_type = 'discard'\n   bam_mapping_quality_threshold = 30\n\n   dedupper = 'markduplicates'\n  \n   run_trim_bam = true\n   bamutils_clip_double_stranded_none_udg_left = 1\n   bamutils_clip_double_stranded_none_udg_right = 1\n   \n   // JAR will need to be downloaded first!\n   run_genotyping = true\n   genotyping_tool = 'ug'\n   genotyping_source = 'trimmed'\n   gatk_call_conf = 20\n\n   run_sexdeterrmine = true\n   sexdeterrmine_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_HG19.0based.bed.gz'\n\n   run_nuclear_contamination = true\n   contamination_chrom_name = 'chrX'\n\n   run_mtnucratio = true\n}\n\nprocess {\n   withName:'makeBWAIndex'{\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n   }\n   withName:'adapter_removal'{\n      cpus = { check_max( 8, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 2.h * task.attempt, 'time' ) }\n   }\n   withName:'bwa'{\n      cpus = { check_max( 8, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 
'time' ) }\n   }\n   withName:'markDup'{\n      cpus = { check_max( 16, 'cpus' ) }\n      memory = { check_max( 64.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n   }\n   withName:'damageprofiler'{\n      cpus = 1\n      memory = { check_max( 8.GB * task.attempt, 'memory' ) }\n      time = { check_max( 2.h * task.attempt, 'time' ) }\n   }\n}\n"
  },
  {
    "path": "conf/benchmarking_vikingfish.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n * nextflow run nf-core/eager -profile test, docker (or singularity, or conda)\n */\n\nparams {\n   config_profile_name = 'nf-core/eager benchmarking - Viking Fish profile'\n   config_profile_description = \"A 'fullsized' benchmarking profile for deepish sequencing aDNA data\" \n   \n   //Input data\n   input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Benchmarking/benchmarking_vikingfish.tsv'   \n   // Genome reference\n   fasta = 's3://nf-core-awsmegatests/eager/ENA_Data_Fish/GCF_902167405.1_gadMor3.0_genomic.fna.gz'\n   \n   bwaalnn = 0.04\n   bwaalnl = 1024\n   \n   run_bam_filtering = true\n   bam_unmapped_type = 'discard'\n   bam_mapping_quality_threshold = 25\n     \n   run_genotyping = true\n   genotyping_tool = 'hc'\n   genotyping_source = 'raw'\n   gatk_ploidy = 2\n}\n\nprocess {\n   withName:'adapter_removal'{\n      cpus = { check_max( 8, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 2.h * task.attempt, 'time' ) }\n   }\n   withName:'bwa'{\n      cpus = { check_max( 8, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 8.h * task.attempt, 'time' ) }\n   }\n   withName:'dedup'{\n      cpus = { check_max( 8, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n   }\n   withName:'genotyping_hc'{\n     cpus = { check_max( 8, 'cpus' ) }\n     memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n     time = { check_max( 8.h * task.attempt, 'time' ) }\n   }\n\n}\n"
  },
  {
    "path": "conf/igenomes.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for iGenomes paths\n * -------------------------------------------------\n * Defines reference genomes, using iGenome paths\n * Can be used by any config that customises the base\n * path using $params.igenomes_base / --igenomes_base\n */\n\nparams {\n  // illumina iGenomes reference file paths\n  genomes {\n    'GRCh37' {\n      fasta       = \"${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt\"\n      mito_name   = \"MT\"\n      macs_gsize  = \"2.7e9\"\n      blacklist   = \"${projectDir}/assets/blacklists/GRCh37-blacklist.bed\"\n    }\n    'GRCh38' {\n      fasta       = \"${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/\"\n      gtf         = 
\"${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed\"\n      mito_name   = \"chrM\"\n      macs_gsize  = \"2.7e9\"\n      blacklist   = \"${projectDir}/assets/blacklists/hg38-blacklist.bed\"\n    }\n    'GRCm38' {\n      fasta       = \"${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt\"\n      mito_name   = \"MT\"\n      macs_gsize  = \"1.87e9\"\n      blacklist   = \"${projectDir}/assets/blacklists/GRCm38-blacklist.bed\"\n    }\n    'TAIR10' {\n      fasta       = \"${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf\"\n      
bed12       = \"${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt\"\n      mito_name   = \"Mt\"\n    }\n    'EB2' {\n      fasta       = \"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt\"\n    }\n    'UMD3.1' {\n      fasta       = \"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt\"\n      mito_name   = \"MT\"\n    }\n    
'WBcel235' {\n      fasta       = \"${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed\"\n      mito_name   = \"MtDNA\"\n      macs_gsize  = \"9e7\"\n    }\n    'CanFam3.1' {\n      fasta       = \"${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt\"\n      mito_name   = \"MT\"\n    }\n    'GRCz10' {\n      fasta       = \"${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = 
\"${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed\"\n      mito_name   = \"MT\"\n    }\n    'BDGP6' {\n      fasta       = \"${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed\"\n      mito_name   = \"M\"\n      macs_gsize  = \"1.2e8\"\n    }\n    'EquCab2' {\n      fasta       = \"${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/\"\n      bismark     = 
\"${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt\"\n      mito_name   = \"MT\"\n    }\n    'EB1' {\n      fasta       = \"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt\"\n    }\n    'Galgal4' {\n      fasta       = \"${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/\"\n      gtf         = 
\"${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed\"\n      mito_name   = \"MT\"\n    }\n    'Gm01' {\n      fasta       = \"${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt\"\n    }\n    'Mmul_1' {\n      fasta       = \"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt\"\n      mito_name   = \"MT\"\n    }\n    'IRGSP-1.0' {\n      fasta       = 
\"${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed\"\n      mito_name   = \"Mt\"\n    }\n    'CHIMP2.1.4' {\n      fasta       = \"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt\"\n      mito_name   = \"MT\"\n    }\n    'Rnor_6.0' {\n      fasta       = \"${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = 
\"${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed\"\n      mito_name   = \"MT\"\n    }\n    'R64-1-1' {\n      fasta       = \"${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed\"\n      mito_name   = \"MT\"\n      macs_gsize  = \"1.2e7\"\n    }\n    'EF2' {\n      fasta       = \"${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/\"\n      star        = 
\"${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt\"\n      mito_name   = \"MT\"\n      macs_gsize  = \"1.21e7\"\n    }\n    'Sbi1' {\n      fasta       = \"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt\"\n    }\n    'Sscrofa10.2' {\n      fasta       = \"${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/\"\n      bismark     = 
\"${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt\"\n      mito_name   = \"MT\"\n    }\n    'AGPv3' {\n      fasta       = \"${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed\"\n      mito_name   = \"Mt\"\n    }\n    'hg38' {\n      fasta       = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed\"\n      mito_name   = \"chrM\"\n      macs_gsize  = \"2.7e9\"\n      blacklist   = 
\"${projectDir}/assets/blacklists/hg38-blacklist.bed\"\n    }\n    'hg19' {\n      fasta       = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt\"\n      mito_name   = \"chrM\"\n      macs_gsize  = \"2.7e9\"\n      blacklist   = \"${projectDir}/assets/blacklists/hg19-blacklist.bed\"\n    }\n    'mm10' {\n      fasta       = \"${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt\"\n      mito_name   = \"chrM\"\n      macs_gsize  = \"1.87e9\"\n      blacklist   = \"${projectDir}/assets/blacklists/mm10-blacklist.bed\"\n    }\n    'bosTau8' {\n      fasta       = 
\"${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed\"\n      mito_name   = \"chrM\"\n    }\n    'ce10' {\n      fasta       = \"${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt\"\n      mito_name   = \"chrM\"\n      macs_gsize  = \"9e7\"\n    }\n    'canFam3' {\n      fasta       = \"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/\"\n     
 star        = \"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt\"\n      mito_name   = \"chrM\"\n    }\n    'danRer10' {\n      fasta       = \"${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed\"\n      mito_name   = \"chrM\"\n      macs_gsize  = \"1.37e9\"\n    }\n    'dm6' {\n      fasta       = \"${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/\"\n      gtf         = 
\"${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed\"\n      mito_name   = \"chrM\"\n      macs_gsize  = \"1.2e8\"\n    }\n    'equCab2' {\n      fasta       = \"${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt\"\n      mito_name   = \"chrM\"\n    }\n    'galGal4' {\n      fasta       = \"${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt\"\n      mito_name   = 
\"chrM\"\n    }\n    'panTro4' {\n      fasta       = \"${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt\"\n      mito_name   = \"chrM\"\n    }\n    'rn6' {\n      fasta       = \"${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed\"\n      mito_name   = \"chrM\"\n    }\n    'sacCer3' {\n      fasta       = \"${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = 
\"${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/\"\n      readme      = \"${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt\"\n      mito_name   = \"chrM\"\n      macs_gsize  = \"1.2e7\"\n    }\n    'susScr3' {\n      fasta       = \"${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa\"\n      bwa         = \"${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa\"\n      bowtie2     = \"${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/\"\n      star        = \"${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/\"\n      bismark     = \"${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/\"\n      gtf         = \"${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf\"\n      bed12       = \"${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed\"\n      readme      = \"${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt\"\n      mito_name   = \"chrM\"\n    }\n  }\n}\n"
  },
  {
    "path": "conf/test.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n * nextflow run nf-core/eager -profile test, docker (or singularity, or conda)\n */\n\nincludeConfig 'test_resources.config'\n\nparams {\n  config_profile_name = 'Test profile'\n  config_profile_description = 'Minimal test dataset to check pipeline function'\n  // Limit resources so that this can run on GitHub Actions\n  max_cpus = 2\n  max_memory = 6.GB\n  max_time = 48.h\n  genome = false\n  //Input data\n  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_fastq.tsv'\n  // Genome references\n  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'\n}\n"
  },
  {
    "path": "conf/test_direct.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n *   nextflow run nf-core/eager -profile test,<docker/singularity>\n */\n\nincludeConfig 'test_resources.config'\n\n\nparams {\n  config_profile_name = 'Test profile'\n  config_profile_description = 'Minimal test dataset to check pipeline function'\n  // Limit resources so that this can run on GitHub Actions\n  max_cpus = 2\n  max_memory = 6.GB\n  max_time = 48.h\n  genome = false\n  //Input data\n  single_end = false\n  // Genome references\n  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'\n  // Ignore `--input` as otherwise the parameter validation will throw an error\n  schema_ignore_params = 'genomes,input_paths,input'\n}\n"
  },
  {
    "path": "conf/test_full.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running full-size tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a full size pipeline test. Use as follows:\n *   nextflow run nf-core/eager -profile test_full,<docker/singularity>\n */\n\nparams {\n  config_profile_name = 'Full test profile for nf-core/eager'\n  config_profile_description = 'Full test dataset to check nf-core/eager function'\n\n  // Input data for full size test\n  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Benchmarking/benchmarking_vikingfish.tsv'\n   \n   // Genome reference\n   fasta = 'https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_other/Gadus_morhua/representative/GCF_902167405.1_gadMor3.0/GCF_902167405.1_gadMor3.0_genomic.fna.gz'\n   \n   bwaalnn = 0.04\n   bwaalnl = 1024\n   \n   run_bam_filtering = true\n   bam_unmapped_type = 'discard'\n   bam_mapping_quality_threshold = 25\n     \n   run_genotyping = true\n   genotyping_tool = 'hc'\n   genotyping_source = 'raw'\n   gatk_ploidy = 2\n}\n\nprocess {\n   withName:'adapter_removal'{\n      cpus = { check_max( 8, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 2.h * task.attempt, 'time' ) }\n   }\n   withName:'bwa'{\n      cpus = { check_max( 8, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 8.h * task.attempt, 'time' ) }\n   }\n   withName:'dedup'{\n      cpus = { check_max( 8, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 4.h * task.attempt, 'time' ) }\n   }\n   withName:'genotyping_hc'{\n     cpus = { check_max( 8, 'cpus' ) }\n     memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n     time = { check_max( 8.h * task.attempt, 'time' ) }\n   }\n   \n  // Ignore `--input` as otherwise the parameter 
validation will throw an error\n  schema_ignore_params = 'genomes,input_paths,input'\n}\n"
  },
  {
    "path": "conf/test_resources.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines the base computing resources used across all CI tests (primarily the\n * time limit)\n */\n\n\nprocess {\n\n  withLabel:'sc_tiny'{\n      cpus = { check_max( 1, 'cpus' ) }\n      memory = { check_max( 1.GB * task.attempt, 'memory' ) }\n      time = { check_max( 10.m * task.attempt, 'time' ) }\n  }\n\n  withLabel:'sc_small'{\n      cpus = { check_max( 1, 'cpus' ) }\n      memory = { check_max( 4.GB * task.attempt, 'memory' ) }\n      time = { check_max( 10.m * task.attempt, 'time' ) }\n  }\n\n  withLabel:'sc_medium'{\n      cpus = { check_max( 1, 'cpus' ) }\n      memory = { check_max( 8.GB * task.attempt, 'memory' ) }\n      time = { check_max( 10.m * task.attempt, 'time' ) }\n  }\n\n  withLabel:'mc_small'{\n      cpus = { check_max( 2 * task.attempt, 'cpus' ) }\n      memory = { check_max( 4.GB * task.attempt, 'memory' ) }\n      time = { check_max( 10.m * task.attempt, 'time' ) }\n  }\n\n  withLabel:'mc_medium' {\n      cpus = { check_max( 4 * task.attempt, 'cpus' ) }\n      memory = { check_max( 8.GB * task.attempt, 'memory' ) }\n      time = { check_max( 10.m * task.attempt, 'time' ) }\n  }\n\n  withLabel:'mc_large'{\n      cpus = { check_max( 8 * task.attempt, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 10.m * task.attempt, 'time' ) }\n  }\n\n  withLabel:'mc_huge'{\n      cpus = { check_max( 32 * task.attempt, 'cpus' ) }\n      memory = { check_max( 256.GB * task.attempt, 'memory' ) }\n      time = { check_max( 10.m * task.attempt, 'time' ) }\n  }\n\n  withName:'mapdamage_rescaling'{\n      time = { check_max( 20.m * task.attempt, 'time' ) }\n  }\n\n}"
  },
  {
    "path": "conf/test_stresstest_human.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n * nextflow run nf-core/eager -profile test, docker (or singularity, or conda)\n */\n\nparams {\n   config_profile_name = 'nf-core/eager stresstess - human profile'\n   config_profile_description = \"A large-scale benchmarking profile AWS stress-testing of large sample number study\" \n\n   //Input data\n   input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Benchmarking/human_stresstest.tsv'\n   // Genome reference\n   fasta = 'https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz'\n\n   save_reference = true\n\n   email = 'james@nf-co.re'\n\n   run_mtnucratio = true\n   mtnucratio_header = 'ChrM'\n\n   run_bam_filtering = true\n   bam_unmapped_type = 'discard'\n   bam_mapping_quality_threshold = 30\n\n   dedupper = 'markduplicates'\n  \n   run_sexdeterrmine = true\n   sexdeterrmine_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_HG19.0based.bed.gz'\n\n   run_nuclear_contamination = true\n   contamination_chrom_name = 'chrX'\n\n   run_mtnucratio = true\n\n\n}\n\nprocess {\n\n   errorStrategy = 'retry'\n   \n   maxRetries = 5\n\n   withName:'makeBWAIndex'{\n      time = { check_max( 48.h * task.attempt, 'time' ) }\n   }\n   withName:'adapter_removal'{\n      cpus = { check_max( 8, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 48.h * task.attempt, 'time' ) }\n   }\n   withName:'bwa'{\n      cpus = { check_max( 8, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 48.h * task.attempt, 'time' ) }\n   }\n   withName:'markduplicates'{\n      errorStrategy = { task.exitStatus in 
[143,137,104,134,139] ? 'retry' : 'finish' }\n      cpus = { check_max( 16, 'cpus' ) }\n      memory = { check_max( 16.GB * task.attempt, 'memory' ) }\n      time = { check_max( 48.h * task.attempt, 'time' ) }\n   }\n   withName:'damageprofiler'{\n      cpus = 1\n      memory = { check_max( 8.GB * task.attempt, 'memory' ) }\n      time = { check_max( 48.h * task.attempt, 'time' ) }\n   }\n}\n"
  },
  {
    "path": "conf/test_tsv_bam.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n * nextflow run nf-core/eager -profile test, docker (or singularity, or conda)\n */\n\nincludeConfig 'test_resources.config'\n\nparams {\n  config_profile_name = 'Test profile'\n  config_profile_description = 'Minimal test dataset to check pipeline function'\n  // Limit resources so that this can run on Travis\n  max_cpus = 2\n  max_memory = 6.GB\n  max_time = 48.h\n  genome = false\n  //Input data\n  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_bam.tsv'\n  // Genome references\n  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'\n}"
  },
  {
    "path": "conf/test_tsv_complex.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n * nextflow run nf-core/eager -profile test, docker (or singularity, or conda)\n */\n\nincludeConfig 'test_resources.config'\n\n\nparams {\n  config_profile_name = 'Test profile'\n  config_profile_description = 'Minimal test dataset to check pipeline function'\n  // Limit resources so that this can run on GitHub Actions\n  max_cpus = 2\n  max_memory = 6.GB\n  max_time = 48.h\n  genome = false\n  //Input data\n  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_fastq_multilane_multilib.tsv'\n  // Genome references\n  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'\n}\n"
  },
  {
    "path": "conf/test_tsv_fna.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n * nextflow run nf-core/eager -profile test, docker (or singularity, or conda)\n */\n\nincludeConfig 'test_resources.config'\n\nparams {\n  config_profile_name = 'Test profile'\n  config_profile_description = 'Minimal test dataset to check pipeline function'\n  // Limit resources so that this can run on Travis\n  max_cpus = 2\n  max_memory = 6.GB\n  max_time = 48.h\n  genome = false\n  //Input data\n  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_fastq.tsv'\n  // Genome references\n  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fna'\n}\n"
  },
  {
    "path": "conf/test_tsv_humanbam.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n * nextflow run nf-core/eager -profile test,docker (or singularity, or conda)\n */\n\nincludeConfig 'test_resources.config'\n\nparams {\n  config_profile_name = 'Test profile'\n  config_profile_description = 'Minimal test dataset to check pipeline function'\n  // Limit resources so that this can run on GitHub Actions\n  max_cpus = 2\n  max_memory = 6.GB\n  max_time = 48.h\n  genome = false\n  //Input data\n  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Human/human_design_bam.tsv'\n  // Genome references\n  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'\n  sexdeterrmine_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'\n  // Genotyping\n  pileupcaller_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'\n  pileupcaller_snpfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K_covered_in_JK2067_downsampled_s0.1.numeric_chromosomes.snp'\n}\n"
  },
  {
    "path": "conf/test_tsv_kraken.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n * nextflow run nf-core/eager -profile test,docker (or singularity, or conda)\n */\n\nincludeConfig 'test_resources.config'\n\nparams {\n  config_profile_name = 'Test profile kraken'\n  config_profile_description = 'Minimal test dataset to check pipeline function with kraken metagenomic profiler'\n  // Limit resources so that this can run on GitHub Actions\n  max_cpus = 2\n  max_memory = 6.GB\n  max_time = 48.h\n  genome = false\n  //Input data\n  metagenomic_tool = 'kraken'\n  run_metagenomic_screening = true\n  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_fastq.tsv'\n  // Genome references\n  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'\n  database = 'https://github.com/nf-core/test-datasets/raw/eager/databases/kraken/eager_test.tar.gz'\n}\n"
  },
  {
    "path": "conf/test_tsv_pretrim.config",
    "content": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * -------------------------------------------------\n * Defines bundled input files and everything required\n * to run a fast and simple test. Use as follows:\n * nextflow run nf-core/eager -profile test,docker (or singularity, or conda)\n */\n\nincludeConfig 'test_resources.config'\n\nparams {\n  config_profile_name = 'Test profile'\n  config_profile_description = 'Minimal test dataset to check pipeline function'\n  // Limit resources so that this can run on GitHub Actions\n  max_cpus = 2\n  max_memory = 6.GB\n  max_time = 48.h\n  genome = false\n  //Input data\n  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_fastq_pretrim.tsv'\n  // Genome references\n  fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'\n}\n"
  },
  {
    "path": "docs/README.md",
    "content": "# nf-core/eager: Documentation\n\nThe nf-core/eager documentation is split into the following pages:\n\n* [Usage](usage.md)\n  * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.\n  * Also includes: FAQ, Troubleshooting and Tutorials\n* [Output](output.md)\n  * An overview of the different results produced by the pipeline and how to interpret them.\n\nYou can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re).\n\nAdditional pages are:\n\n* [Installation](https://nf-co.re/usage/installation)\n* Pipeline configuration\n  * [Local installation](https://nf-co.re/usage/local_installation)\n  * [Adding your own system config](https://nf-co.re/usage/adding_own_config)\n  * [Reference genomes](https://nf-co.re/usage/reference_genomes)\n* [Contribution Guidelines](../.github/CONTRIBUTING.md)\n  * Basic contribution & behaviour guidelines\n  * Checklists and guidelines for people who would like to contribute code\n  "
  },
  {
    "path": "docs/images/README.md",
    "content": "# Documentation Images Information\n\nThe font used for all documentation images is Kalam by Indian Type Foundry and is released under the [Open Font License](https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL)\n\nOriginally downloaded from [Google Fonts](https://fonts.google.com/specimen/Kalam?sidebar.open&selection.family=Kalam:wght@300;400;700)\n"
  },
  {
    "path": "docs/images/usage/nfcore-eager_tsv_template.tsv",
    "content": "Sample_Name\tLibrary_ID\tLane\tColour_Chemistry\tSeqType\tOrganism\tStrandedness\tUDG_Treatment\tR1\tR2\tBAM\n"
  },
  {
    "path": "docs/output.md",
    "content": "# nf-core/eager: Output\n\n## Introduction\n\nThe output of nf-core/eager primarily consists of the following main components: output alignment files (e.g. VCF, BAM or FASTQ files), and summary statistics of the whole run presented in a [`MultiQC`](https://multiqc.info) report. Intermediate files and module-specific statistics files are also retained depending on your particular run configuration.\n\n## Directory Structure\n\nThe default directory structure of nf-core/eager is as follows\n\n```bash\nresults/\n├── MultiQC/\n├── <MODULE_1>/\n├── <MODULE_2>/\n├── <MODULE_3>/\n├── pipeline_info/\n└── reference_genome/\nwork/\n```\n\n* The parent directory `<RUN_OUTPUT_DIRECTORY>` is the parent directory of the run, either the directory the pipeline was run from or as specified by the `--outdir` flag. The default name of the output directory (unless otherwise specified) will be `./results/`.\n\n### Primary Output Directories\n\nThese directories are the ones you will use on a day-to-day basis and are those which you should familiarise yourself with.\n\n* The `MultiQC` directory is the most important directory and contains the main summary report of the run in HTML format, which can be viewed in a web-browser of your choice. The sub-directory contains the MultiQC collected data used to build the HTML report. The Report allows you to get an overview of the sequencing and mapping quality as well as aDNA metrics (see the [MultiQC Report](#multiqc-report) section for more detail).\n* A `<MODULE>` directory contains the (cleaned-up) output from a particular software module. This is the second most important set of directories. This contains output files such as FASTQ, BAM, statistics, and/or plot files of a specific module (see the [Output Files](#output-files) section for more detail). 
The latter two are only needed when you need finer detail about that particular part of the pipeline.\n\n### Secondary Output Directories\n\nThese are less important directories which are used less often, normally in the context of bug-reporting.\n\n* `pipeline_info/`: [Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.\n  * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.\n  * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.csv`.\n  * Documentation for interpretation of results in HTML format: `results_description.html`.\n* `reference_genome/` contains text files describing the location of the specified reference genomes and, if not already supplied when running the pipeline, auxiliary indexing files. This is often useful when re-running other samples using the same reference genome, but is otherwise often not important.\n* The `work/` directory contains all the `nextflow` processing directories. This is where `nextflow` actually does all the work, but in an efficient programmatic procedure that is not intuitive to human readers. Due to this, the directory is often not important to a user as all the useful output files are linked to the module directories (see above). Otherwise, this directory may be useful when bug-reporting.\n\n> :warning: Note that `work/` will be created wherever you are running the `nextflow run` command from, unless you specify the location with `-w`, i.e. 
it will not by default be in `outdir`!\n\n## MultiQC Report\n\nIn this section we will run through the output of each **default** module as reported in a MultiQC output. This can be viewed by opening the HTML file in your `<RUN_OUTPUT_DIRECTORY>/MultiQC/` directory in a web browser. The section will also provide some basic tips on how to interpret the plots and values, although we highly recommend reading the READMEs or original papers of the tools used in the pipeline. A list of references can be seen on the [nf-core/eager GitHub repository](https://github.com/nf-core/eager/).\n\nFor more information about how to use MultiQC reports, see [http://multiqc.info](http://multiqc.info)\n\n### General Stats Table\n\n#### Background\n\nThis is the main summary table produced by MultiQC that the report begins with. This section of the report is generated by MultiQC itself rather than stats produced by a specific module. It shows whatever each module considers to be the 'most important' values to be displayed - however the nf-core/eager version has been somewhat customised to make it as close to the EAGER (v1) ReportTable format as possible, with some opinionated tweaks.\n\n#### Table\n\nThis table will report per-file, per-library, or per-sample statistics depending on which stage along the pipeline you have gone through. MultiQC will try to collapse the rows as far as possible if the log files have the same name. However, separate libraries will be displayed separately, for example when using DamageProfiler with TSV input and merging of samples is performed (which would be reported at the qualimap level). If you're only interested in a single part of the results (e.g. qualimap) you can use the `Configure Columns` button to remove columns and the corresponding rows will not be displayed, resulting in a more compact table.\n\nEach column name is supplied by the module, so you may see similar column names. 
When unsure, hovering over the column name will allow you to see which module it is derived from.\n\nThe possible columns displayed by default are as follows (note you may see additional columns depending on what other modules you activate):\n\n* **Sample Name** This is the log file name without file suffix(s). This will depend on the module outputs.\n* **Nr. Input Reads** This is from Pre-AdapterRemoval FastQC. Represents the number of raw reads in your untrimmed and (paired end) unmerged FASTQ file. Each row should be approximately equal to the number of reads you requested to be sequenced, divided by the number of FASTQ files you received for that library.\n* **Length Input Reads** This is from Pre-AdapterRemoval FastQC. This is the average read length in your untrimmed and (paired end) unmerged FASTQ file and should represent the number of cycles of your sequencing chemistry.\n* **% GC Input Reads** This is from Pre-AdapterRemoval FastQC. This is the average GC content in percent of all the reads in your untrimmed and (paired end) unmerged FASTQ file.\n* **GC content** This is from FastP. This is the average GC of all reads in your untrimmed and unmerged FASTQ file after poly-G tail trimming. If you have lots of tails, this value should drop from the pre-AdapterRemoval FastQC %GC column.\n* **% Trimmed** This is from AdapterRemoval. It is the percentage of reads which had an adapter sequence removed from the end of the read.\n* **Nr. Processed Reads** This is from Post-AdapterRemoval FastQC. Represents the number of preprocessed reads in your adapter trimmed (paired end) merged FASTQ file. The loss between this number and the Pre-AdapterRemoval FastQC can give you an idea of the quality of trimming and merging.\n* **% GC Processed Reads** This is from Post-AdapterRemoval FastQC. Represents the average GC of all preprocessed reads in your adapter trimmed (paired end) merged FASTQ file.\n* **Length Processed Reads** This is from post-AdapterRemoval FastQC. 
This is the average read length in your trimmed and (paired end) merged FASTQ file and should represent the 'realistic' average lengths of your DNA molecules.\n* **% Aligned** This is from bowtie2. It reports the percentage of input reads that mapped to your reference genome. This number will likely be similar to Endogenous DNA % (see below).\n* **% Metagenomic Mappability** This is from MALT. It reports the percentage of the off-target reads (from mapping), that could map to your MALT metagenomic database. This can often be low for aDNA due to short reads and database bias.\n* **% Unclassified** This is from Kraken. It reports the percentage of reads that could not be aligned and taxonomically assigned against your Kraken metagenomic database. This can often be high for aDNA due to short reads and database bias.\n* **Nr. Reads Into Mapping** This is from Samtools. This is the raw number of preprocessed reads that went into mapping.\n* **Nr. Mapped Reads** This is from Samtools. This is the raw number of preprocessed reads mapped to your reference genome _prior to_ map quality filtering.\n* **Endogenous DNA (%)** This is from the endorS.py tool. It displays a percentage of mapped reads over total reads that went into mapping (i.e. the percentage DNA content of the library that matches the reference). Assuming a perfect ancient sample with no modern contamination, this would be the amount of true ancient DNA in the sample. However, this value will _most likely_ include contamination and will not entirely be the true 'endogenous' content.\n* **Nr. Mapped Reads Post-Filter** This is from Samtools. This is the raw number of preprocessed reads mapped to your reference genome _after_ map quality filtering (note the column name does not distinguish itself from prior-map quality filtering, but the post-filter column is always second)\n* **Endogenous DNA Post-Filter (%)** This is from the endorS.py tool. It displays a percentage of mapped reads _after_ BAM filtering (i.e. 
for mapping quality and/or bam-level length filtering) over total reads that went into mapping (i.e. the percentage DNA content of the library that matches the reference). This column will only be displayed if BAM filtering is turned on and is based on the original mapping for total reads, and mapped reads as calculated from the post-filtering BAM.\n* **ClusterFactor** This is from **DeDup only**. This is a value representing how many duplicates in the library exist for each unique read. This ratio is calculated as `reads_before_deduplication / reads_after_deduplication`. Can be converted to %Dups by calculating `1 - (1 / CF)`. A cluster factor close to one indicates a highly complex library that could be sequenced further. Generally with a value of more than 2 you will not be gaining much more information by sequencing deeper.\n* **% Dup. Mapped Reads** This is from **Picard's markDuplicates only**. It represents the percentage of reads in your library that were exact duplicates of other reads in your library. The lower the better, as high duplication rate means lots of sequencing of the same information (and therefore is not time or cost effective).\n* **X Prime Y>Z N base** These columns are from DamageProfiler or mapDamage. The prime numbers represent which end of the reads the damage is referring to. The Y>Z is the type of substitution (C>T is the true damage, G>A is the complementary). You should see for no- and half-UDG treatment a decrease in frequency from the 1st to 2nd base.\n* **Mean Length Mapped Reads** This is from DamageProfiler. This is the mean length of all de-duplicated mapped reads. Ancient DNA normally will have a mean between 30-75, however this can vary.\n* **Median Length Mapped Reads** This is from DamageProfiler. This is the median length of all de-duplicated mapped reads. Ancient DNA normally will have a median between 30-75, however this can vary.\n* **Nr. Dedup. Mapped Reads** This is from Qualimap. 
This is the total number of _deduplicated_ reads that mapped to your reference genome. This is the **best** number to report for final mapped reads in final publications.\n* **Mean/Median Coverage** This is from Qualimap. This is the mean/median number of times a base on your reference genome was covered by a read (i.e. depth coverage). This average includes bases with 0 reads covering that position.\n* **>= 1X** to **>= 5X** These are from Qualimap. This is the percentage of the genome covered at that particular depth coverage.\n* **% GC Dedup. Mapped Reads** This is the mean GC content in percent of all mapped reads post-deduplication. This should normally be close to the GC content of your reference genome.\n* **MT to Nuclear Ratio** This is from MTtoNucRatio. This reports the ratio of the number of reads aligned to a mitochondrial entry in your reference FASTA to the number aligned to all other entries. This will typically be high but will vary depending on tissue type.\n* **SexDet Rate X Chr** This is from Sex.DetERRmine. This is the relative depth of coverage on the X-chromosome.\n* **SexDet Rate Y Chr** This is from Sex.DetERRmine. This is the relative depth of coverage on the Y-chromosome.\n* **#SNPs Covered** This is from eigenstrat\\_snp\\_coverage. The number of called SNPs after genotyping with pileupcaller.\n* **#SNPs Total** This is from eigenstrat\\_snp\\_coverage. The maximum number of covered SNPs, i.e. the number of SNPs in the .snp file provided to pileupcaller with `--pileupcaller_snpfile`.\n* **Number of SNPs** This is from ANGSD. The number of SNPs left after removing sites with no data in a 5 base pair surrounding region.\n* **Contamination Estimate (Method1_ML)** This is from the nuclear contamination function of ANGSD. The Maximum Likelihood contamination estimate according to Method 1. The estimates using Method of Moments and/or those based on Method 2 can be unhidden through the \"Configure Columns\" button.\n* **Estimate Error (Method1_ML)** This is from ANGSD. 
The standard error of the Method1 Maximum likelihood estimate. The errors associated with Method of Moments and/or Method2 estimates can be unhidden through the \"Configure Columns\" button.\n* **% Hets** This is from MultiVCFAnalyzer. This reports the number of SNPs on an assumed haploid organism that have two possible alleles. A high percentage may indicate cross-mapping from a related species.\n\nFor other non-default columns (activated under 'Configure Columns'), hover over the column name for further descriptions.\n\n### FastQC\n\n#### Background\n\n[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your raw reads. It provides information about the quality score distribution across your reads, the per base sequence content (%T/A/G/C) as sequenced. You also get information about adapter contamination and other overrepresented sequences.\n\nYou will receive output for each supplied FASTQ file.\n\nWhen dealing with ancient DNA data the MultiQC plots for FastQC will often show lots of 'warning' or 'failed' samples. You generally can discard this sort of information as we are dealing with very degraded and metagenomic samples which have artefacts that violate the FastQC 'quality definitions', while still being valid data for aDNA researchers. Instead you will _normally_ be looking for 'global' patterns across all samples of a sequencing run to check for library construction or sequencing failures. The decision on whether an individual sample has 'failed' or not should be made by the user after checking all the plots themselves (e.g. if the sample is consistently an outlier to all others in the run).\n\n[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. 
It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences.\n\nFor further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).\n\n> **NB:** The FastQC (pre-Trimming) plots displayed in the MultiQC report show _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. To see how your reads look after trimming, look at the FastQC reports in the FastQC (post-Trimming) section. You should expect that, after AdapterRemoval, most of the artefacts are removed.\n> :warning: If you turned on `--post_ar_fastq_trimming`, your 'post-Trimming' report shows the statistics _after_ this trimming. There is no separate report for the post-AdapterRemoval trimming.\n\n#### Sequence Counts\n\nThis shows a barplot with the overall number of sequences (x axis) in your raw library after demultiplexing, **per file** (y-axis). If you have paired end data, you will have one bar for Read 1 (or forward), and a second bar for Read 2 (or reverse). Each entire bar should represent approximately what you requested from the sequencer itself - unless you have your library sequenced over multiple lanes, where it should be what you request divided by the number of lanes it was split over.\n\nA section of the bar will also show an approximate estimation of the fraction of the total number of reads that are duplicates of another. This can derive from over-amplification of the library, or lots of single adapters. This can be later checked with the Deduplication check. A good library and sequencing run should have very low amounts of duplicate reads.\n\n<p align=\"center\">\n  <img src=\"images/output/fastqc/fastqc_sequence_counts.png\" width=\"75%\" height = \"75%\">\n</p>\n\n#### Sequence Quality Histograms\n\nThis line plot represents the Phred scores across each base pair of all the reads. 
The x-axis is the base position across each read, and the y-axis is the average base-calling score (Phred-scaled) of the nucleotides across all reads. Again, this is per FASTQ file (i.e. forward/reverse and/or lanes separately). The background colours represent approximate ranges of quality, with the green section being acceptable quality, orange dubious and red bad.\n\nYou will often see that the first 5 or so bases have slightly lower quality than the rest of the read, as this is during the calibration steps of the machine. The bulk of the read should then stay ~35. Do not worry if the last 10-20 bases of reads have lower quality base calls than the middle of the read, as the sequencing reagents start to deplete during these cycles (e.g. making nucleotide fluorescence weaker). Furthermore, the reverse reads of sequencing data will often be even lower at ends than forward reads for the same reason.\n\n<p align=\"center\">\n  <img src=\"images/output/fastqc/fastqc_sequence_quality_histogram.png\" width=\"75%\" height = \"75%\">\n</p>\n\nThings to watch out for:\n\n* all positions having Phred scores less than 27\n* a sharp drop-off of quality early in the read\n* for paired-end data, if either R1 or R2 is significantly lower quality across the whole read compared to the complementary read.\n  \n#### Per Sequence Quality Scores\n\nThis is a further summary of the previous plot. This is a histogram of the _overall_ read quality (compared to per-base, above). The x axis is the mean read-quality score (summarising all the bases of the read in a single value), and the y-axis is the number of reads with this Phred score. 
You should see a peak with the majority of your reads between 27-35.\n\n<p align=\"center\">\n  <img src=\"images/output/fastqc/fastqc_per_sequence_quality_score.png\" width=\"75%\" height = \"75%\">\n</p>\n\nThings to watch out for:\n\n* bi-modal peaks which suggests artefacts in some of the sequencing cycles\n* all peaks being in orange or red sections which suggests an overall bad sequencing run (possibly due to a faulty flow-cell).\n  \n#### Per Base Sequencing Content\n\nThis is a heatmap which shows the average percentage of C, G, T, and A nucleotides across ~4bp bins across all reads.\n\nYou expect the whole heatmap to be a relatively equal block of colour (normally black), representing an equal mix of A, C, T, G colors (see legend).\n\n<p align=\"center\">\n  <img src=\"images/output/fastqc/fastqc_per_base_sequence_content.png\" width=\"75%\" height = \"75%\">\n</p>\n\nThings to watch out for:\n\n* If you see a particular colour becoming more prominent this suggests there is an over-representation of those bases at that base-pair range across all reads (e.g. 20-24bp). This could happen if you have lots of PCR duplicates, or poly-G tails from Illumina NextSeq/NovaSeq 2-colour chemistry data (where no fluorescence can mean either G or 'no-call').\n\n> If you see Poly-G tails, we recommend turning on FastP poly-G trimming with EAGER. See the 'running' documentation page for details.\n\n#### Per Sequence GC Content\n\nThis line graph shows the percentage of reads (y-axis) with a given average percent GC content (x-axis). In 'isolate' samples (i.e. majority of the reads should be from the host species of the sample), this should be represented by a sharp peak around the average percent GC content of the reference genome. 
In metagenomic contexts this should be a wide flat distribution with a mean around 50%, however this can be highly different for other types of data.\n\n<p align=\"center\">\n  <img src=\"images/output/fastqc/fastqc_per_sequence_GC_content.png\" width=\"75%\" height = \"75%\">\n</p>\n\nThings to watch out for:\n\n* If you see particularly high percent GC content peak with NextSeq/NovaSeq data, you may have lots of PCR duplicates, or poly-G tails from Illumina NextSeq/NovaSeq 2-colour chemistry data (where no fluorescence can mean either G or 'no-call'). Consider re-running nf-core/eager using the poly-G trimming option from `fastp`. See the 'running' documentation page for details.\n\n#### Per Base N Content\n\nThis line graph shows you the average numbers of Ns found across all reads of a sample. Ns can be caused by a variety of reasons, such as a low-confidence base call or a masked base. The lines should be very low (as close to 0 as possible) and generally be flat across the whole read. In HiSeq data, increases in Ns may reflect the last cycles running out of chemistry.\n\n<p align=\"center\">\n  <img src=\"images/output/fastqc/fastqc_per_base_n_content.png\" width=\"75%\" height = \"75%\">\n</p>\n\n> **NB:** Publicly downloaded data may have extremely high N contents across all reads. These normally come from 'masked' reads that may have originally been, for example, from a human sample for microbial analysis where the consent for publishing of the host DNA was not given. In these cases you do not need to worry about this plot.\n\n#### Sequence Duplication Levels\n\nThis plot is somewhat similar to looking at duplication rate or 'cluster factor' of mapped reads. 
In this case however FastQC takes the sequences of the first 100 thousand reads of a library, and looks to see how often a read sequence is repeated in the rest of the library.\n\n<p align=\"center\">\n  <img src=\"images/output/fastqc/fastqc_sequence_duplication_level.png\" width=\"75%\" height = \"75%\">\n</p>\n\nA good library should have very low rates of duplication (vast majority of reads having a duplication rate of 1) - suggesting 'high complexity' or lots of unique reads and useful data. This is represented as a steep drop in the line plot and possibly a very small curve at about a duplication rate of 2 or 3 and then remaining at ~0 for higher duplication rates.\n\nNote that good libraries may sometimes have small peaks at high duplication levels. This may be due to free-adapters (with no inserts), or mono-nucleotide reads (e.g. GGGGG in NextSeq/NovaSeq data).\n\nBad libraries which have extremely low input DNA (so during amplification the same molecules have been amplified repeatedly), or a good library that has been erroneously over-amplified will show very high duplication levels - so a very slowly decreasing curve. Alternatively, if your library construction failed and many adapters were not ligated to insert molecules, a high duplication rate may be caused by these free-adapters (see 'Overrepresented sequences' for more information).\n\n> **NB:** amplicon libraries such as for 16S rRNA analysis may appear here as having high duplication rates and these peaks can be ignored. This can be verified if no contaminants are found in the 'Overrepresented sequences' section.\n\n#### Overrepresented sequences\n\nAfter identifying duplicates (see the previous section), a table will be displayed in the 'Overrepresented sequences' section of the report. 
This is an attempt by FastQC to check to see if the duplicates identified match common contaminants such as free adapters or mono-nucleotide reads.\n\nYou can then use this table to help identify where the problem occurred in the construction and sequencing of this library. E.g. if you have high duplication rates but no identified contaminants, this suggests over-amplification of reads rather than left over adapters.\n\n#### Adapter Content\n\nThis plot shows the percentage of reads (y-axis) which have an adapter starting at a particular position along a read (x-axis). There can be multiple lines per sample as each line represents a particular adapter.\n\nIt is common in aDNA libraries to see very rapid increases in the proportion of reads with an adapter 'early on' in the read, as by nature aDNA molecules are fragmented and very short. Palaeolithic samples can have reads as short as 25bp, so sequences can already start having adapters 25bp into a read.\n\nThis can already give you an indication of the authenticity of your library - if you see very low proportions of reads with adapters, this suggests long insert molecules that are less likely to derive from a 'true' aDNA library. 
On the flip-side, if you are working with modern DNA - it can give an indication of over-sonication if you have artificially fragmented your reads to lower than your target molecule length.\n\n<p align=\"center\">\n  <img src=\"images/output/fastqc/fastqc_adapter_content.png\" width=\"75%\" height = \"75%\">\n</p>\n\nIf you have downloaded public data this often is uploaded with adapters already removed, so you can expect a flat distribution straight away.\n\nWhen comparing pre- and post-AdapterRemoval FastQC plots of fresh sequencing data (assuming your sequencing center doesn't already remove adapters), you expect to see something similar to the left panel of the example above _pre-_ adapter removal and the right hand panel _post-_ adapter removal.\n\n### FastP\n\n#### Background\n\nFastP is a general pre-processing toolkit for Illumina sequencing data. In nf-core/eager we currently only use the 'poly-G' trimming function. Poly-G tails occur at ends of reads when using two-colour chemistry kits (i.e. in NextSeq and NovaSeq). This occurs because 'no fluorescence' is interpreted by the machine as a G; if the chemistry runs out or the read is shorter than the number of cycles in the kit, you will get lots of cycles with no nucleotides at the ends of reads, and these are then recorded as Gs.\n\nWhile the machine should detect a reduction in base-calling quality, this is not always the case and you will retain these tails in your FASTQ files. This can cause skews in GC content and false positive SNP calls when the reference genome has long mono-nucleotide stretches (typically in larger eukaryotic genomes).\n\nIn the case of dual-indexed paired-end sequencing, it is likely poly-G tails are less of an issue as during your AdapterRemoval step, anything past the adapter will be clipped off anyway. 
However you can check this under the 'Per Sequence GC Content' plot in FastQC.\n\n> **NB:** As you are more likely to see this at the end of the run, in paired-end data you should see all 'Read 2' data having a higher GC content distribution than the 'Read 1'\n\nWhile the MultiQC report has multiple plots for FastP, we will only look at GC content as that's the functionality we use currently.\n\nThe pipeline will generate the respective output for each supplied FASTQ file.\n\n#### GC Content\n\nThis line plot shows the average GC content (Y axis) across each nucleotide of the reads (X axis). There are two buttons per read (i.e. 2 for single-end, and 4 for paired-end) representing before and after the poly-G tail trimming.\n\nBefore filtering, if you have poly-G tails, you should see the lines going up at the right-hand side of the plot.\n\nAfter filtering, you should see that the average GC content along the reads is now reduced to around the general trend of the entire read.\n\nThings to look out for:\n\n* If you see a distinct GC content increase at the end of the reads that is not removed after filtering, check to see where along the read the increase seems to start. If it is less than 10 base pairs from the end, consider reducing the overlap parameter `--complexity_filter_poly_g_min`, which tells FastP how far in the read the Gs need to go before removing them.\n\n### AdapterRemoval\n\n#### Background\n\nAdapterRemoval is a tool that does the post-sequencing clean up of your sequencing reads. It performs the following functions:\n\n* 'Merges' (or 'collapses') forward and reverse reads of Paired End data\n* Removes remaining library indexing adapters\n* Trims low quality base tails from ends of reads\n* Removes too-short reads\n\nIn more detail, merging is where the same read from the forward and reverse files of a single library (based on the flowcell coordinates) are compared to find a stretch of sequence that is the same. 
If this overlap reaches certain quality thresholds, the two reads are 'collapsed' into a single read, with the base quality scores updated accordingly to account for the increased base-call precision.\n\nAdapter removal involves finding overlaps at the 5' and 3' end of reads for the artificial NGS library adapters (which connect the DNA molecule insert, and the index), and stretches that match each other are then removed from the read itself. Note, by default AdapterRemoval does _not_ remove 'internal barcodes' (between insert and the adapter), so these statistics are not considered.\n\nQuality trimming (or 'truncating') involves looking at ends of reads for low-confidence bases (i.e. where the FASTQ Phred score is below a certain threshold). These are then removed from the read.\n\nLength filtering involves removing any read that does not reach the number of bases specified by a particular value.\n\nYou will receive output for each FASTQ file supplied for single end data, or for each pair of merged FASTQ files for paired end data.\n\n#### Retained and Discarded Reads Plot\n\nThese stacked bar plots are unfortunately a little confusing when displayed in MultiQC. However, they are relatively straightforward once you understand each category. They can be displayed as counts of reads per AdapterRemoval read-category, or as percentages of the same values. Each forward(/reverse) file combination is displayed once.\n\nThe most important value is the **Retained Read Pairs** which gives you the final number of reads output into the file that goes into mapping. 
Note, however, this section of the stacked bar _includes_ the other categories displayed (see below) in the calculation.\n\nOther Categories:\n\n* If paired-end, the **Singleton [mate] R1(/R2)** categories represent reads which were unable to be collapsed, possibly due to the reads being too long to overlap.\n* If paired-end, **Full-length collapsed pairs** are reads which were collapsed and did not require low-quality bases at end of reads to be removed.\n* If paired-end, **Truncated collapsed pairs** are paired-end reads that were collapsed but did require the removal of low quality bases at the end of reads.\n* **Discarded [mate] R1/R2** represent reads which were a part of a pair, but one member of the pair did not reach other quality criteria and was discarded. However the other member of the pair is still retained in the output file as it still reached other quality criteria.\n\n<p align=\"center\">\n  <img src=\"images/output/adapter_removal/adapter_removal_discarded_reads.png\" width=\"75%\" height = \"75%\">\n</p>\n  \nFor ancient DNA, assuming a good quality run, you expect to see the vast majority of your reads overlapping because we have such fragmented molecules. Large numbers of singletons suggest your molecules are too long and may not represent true ancient DNA.\n\nIf you see high numbers of discarded or truncated reads, you should check your FastQC results for low sequencing quality of that particular run.\n\n#### Length Distribution Plot\n\nThe length distribution plots show the number of reads at each read-length. You can change the plot to display different categories.\n\n* **All** represents the overall distribution of reads. 
In the case of paired-end sequencing you may see a peak at the turn around from forward to reverse cycles.\n* **Mate 1** and **Mate 2** represent the length of the forward and reverse read respectively prior to collapsing\n* **Singleton** represents those reads that had one member of a pair discarded\n* **Collapsed** and **Collapsed Truncated** represent reads that overlapped and could be merged into a single read, with the latter including base-quality trimming off ends of reads. These plots will start with a vertical rise representing where you are above the minimum-read threshold you set.\n* **Discarded** here represents the number of reads that did not pass the read length filter. You will likely see a vertical drop at what your threshold was set to.\n\n<p align=\"center\">\n  <img src=\"images/output/adapter_removal/adapter_removal_length_distribution.png\" width=\"75%\" height = \"75%\">\n</p>\n\nWith paired-end ancient DNA sequencing runs you expect to see a slight increase in shorter fragments in the reverse (R2) read, as our fragments are so short we often don't reach the maximum number of cycles of that particular sequencing run.\n\n### Bowtie2\n\n#### Background\n\nThis module provides information on mapping when running the Bowtie2 aligner. Bowtie2, like bwa, takes raw FASTQ reads and finds the most likely place on the reference genome each read derived from. While this module is somewhat redundant with the [Samtools](#samtools) module (which reports mapping statistics for bwa) and the endorS.py endogenous DNA value in the general statistics table, it does provide some details that could be useful in certain contexts.\n\nYou will receive output for each _library_. 
This means that if you use TSV input and have one library sequenced over multiple lanes and sequencing types, these are merged and you will get mapping statistics of all lanes in one value.\n\n#### Single/Paired-end alignments\n\nThis bar plot shows the number of reads in each of the categories that Bowtie2 reports when aligning to the reference genome. You will get slightly different plots for Paired-End (PE) and Single-End (SE) data, but they are basically the same.\n\nAncient DNA samples typically have low endogenous DNA values, as most of the DNA from the sample is from taphonomic sources (burial environment, modern handling etc.), so it is normal to get low numbers of mapping reads.\n\n<p align=\"center\">\n  <img src=\"images/output/bowtie2/bowtie2_alignment_scores.png\" width=\"75%\" height = \"75%\">\n</p>\n\nThe main additional useful information compared to [Samtools](#samtools) is that these plots can tell you how many reads could align to multiple places on the reference. This can occur with low complexity reads or reads derived from e.g. repetitive regions on the genome. If you have large amounts of multi-mapping reads, this can be a warning flag that there is an issue either with the reference genome or library itself (e.g. library construction artefacts). You should investigate cases like this more closely before using the data downstream.\n\n### MALT\n\n#### Background\n\nMALT is a metagenomic aligner (equivalent to BLAST, but much faster). It produces direct alignments of sequencing reads against reference genomes. It is often used for metagenomic profiling or pathogen screening, and in nf-core/eager specifically for the off-target reads from genome mapping.\n\nYou will receive output for each _library_. 
This means that if you use TSV input and have one library sequenced over multiple lanes and sequencing types, these are merged and you will get mapping statistics of all lanes and sequencing configurations in one value.\n\n#### Metagenomic Mappability\n\nThis bar plot gives an approximation of how many reads in your off-target FASTQ file were able to align to your metagenomic database.\n\nDue to low 'endogenous' content of aDNA, and the high biodiversity of modern or environmental microbes that normally exists in archaeological and museum samples, you will often get relatively low mappability percentages.\n\n<p align=\"center\">\n  <img src=\"images/output/malt/malt_metagenomic_mappability.png\" width=\"75%\" height = \"75%\">\n</p>\n\nThis can also be influenced by the type of database you supplied — many databases have an over-abundance of taxa of clinical or economic interest, so when you have a large amount of uncharacterised environmental taxa, this may also result in low mappability.\n\n#### Taxonomic assignment success\n\nIn addition to actually being able to align to a given reference sequence, MALT can also allow sequences without a 'taxonomic' ID to be included in a database. Furthermore, it utilises a 'lowest common ancestor' algorithm (LCA), that can result in a read getting no taxonomic identification (because it can align to multiple reference sequences with equal probability). Because of this, MultiQC also produces a bar plot indicating, of the successfully aligned reads (see Metagenomic Mappability above), how many could be assigned a taxon ID.\n\n<p align=\"center\">\n  <img src=\"images/output/malt/malt_taxonomic_assignment_success.png\" width=\"75%\" height = \"75%\">\n</p>\n\nFor the same reasons as above, you will often find that few reads are taxonomically assigned when working with aDNA. This can also occur when many of your reads are from conserved regions of genomes and can map onto multiple references. 
At this point LCA pushes the possible taxon identification so high up the tree that it cannot give a taxonomic assignment.\n\nIf you have multiple samples of a similar level of preservation, but one with unusually low numbers of taxonomically assigned reads, it may be worth investigating what the alignments look like in case there is some sequencing artefact (although it could just be badly preserved and contain little DNA).\n\n### Kraken\n\n#### Background\n\nKraken is another metagenomic classifier, but takes a different approach from the alignment used by [MALT](#malt). It uses 'K-mer similarity' between reads and references to very efficiently find similar patterns in sequences. It does not, however, do alignment, meaning you cannot screen for authentication criteria such as damage patterns and fragment lengths.\n\nIt is useful when you do not have large computing power or you want a very rapid but rough approximation of the metagenomic profile of your sample.\n\nYou will receive output for each _library_. This means that if you use TSV input and have one library sequenced over multiple lanes and sequencing types, these are merged and you will get mapping statistics of all lanes and sequencing configurations in one value.\n\n#### Top Taxa\n\nThis plot gives you an approximation of the abundance of the five top taxa identified. Typically for ancient DNA, this will be quite a small fraction of taxa, as archaeological and museum samples have a large biodiversity from environmental microbes — therefore a large fraction of 'unclassified' can be quite normal.\n\n<p align=\"center\">\n  <img src=\"images/output/kraken/kraken_top_taxa.png\" width=\"75%\" height = \"75%\">\n</p>\n\nHowever for screening for specific metagenomic profiles, such as ancient microbiomes, if the top taxa are from your specific microbiome of interest (e.g. looking at calculus for oral microbiomes, or paleofaeces for gut microbiome), this can be a good indicator that you have a well preserved sample. 
But of course, you must do further downstream (manual!) authentication of these taxa to ensure they are not from modern contamination.\n\n### Samtools\n\n#### Background\n\nThis module provides numbers in raw counts of the mapping of your DNA reads to your reference genome.\n\nYou will receive output for each _library_. This means that if you use TSV input and have one library sequenced over multiple lanes and sequencing types, these are merged and you will get mapping statistics of all lanes in one value.\n\n#### Flagstat Plot\n\nThis dot plot shows different statistics, and the number of reads (typically as a multiple e.g. millions, or thousands) is represented by dots on the X axis.\n\nIn most cases the first two rows, 'Total Reads' and 'Total Passed QC', will be the same, as EAGER (v1) does not do quality control of reads with samtools. This number should normally be the same as the number of (clipped, and if paired-end, merged) retained reads coming out of AdapterRemoval.\n\nThe third row 'Mapped' represents the number of reads that found a place that could be aligned on your reference genome. This is the raw number of mapped reads, prior to PCR duplicate removal.\n\nThe remaining rows will be 0 when running `bwa aln` as these characteristics of the data are not considered by the algorithm by default.\n\n<p align=\"center\">\n  <img src=\"images/output/samtools_flagstat/samtools_flagstat.png\" width=\"80%\" height = \"80%\">\n</p>\n\n> **NB:** The Samtools (pre-samtools filter) plots displayed in the MultiQC report show mapped reads without mapping quality filtering. This will contain reads that can map to multiple places on your reference genome with equal or slightly lower mapping quality scores. To see how your reads look after mapping quality filtering, look at the FastQC reports in the Samtools (pre-samtools filter). You should expect that, after mapping quality filtering, you will have fewer reads.\n\n### DeDup\n\nYou will receive output for each _library_. 
This means that if you use TSV input and have one library sequenced over multiple lanes and sequencing types, these are merged and you will get mapping statistics of all lanes of the library in one value.\n\n#### Background\n\nDeDup is a duplicate removal tool which searches for PCR duplicates and removes them from your BAM file. We remove these duplicates because otherwise you would be artificially increasing your coverage, and subsequently your confidence in genotyping, by counting lab artefacts which are not biologically meaningful. DeDup looks for reads with the same start and end coordinates, and whether they have exactly the same sequence. The main difference of DeDup versus e.g. `samtools markduplicates` is that DeDup considers _both_ ends of a read, not just the start position, so it is more precise in removing actual duplicates without penalising often already low aDNA data.\n\n#### DeDup Plot\n\nThis stacked bar plot shows as a whole the total number of reads in the BAM file going into DeDup. The different sections of a given bar represent the following:\n\n* **Not Removed** — the overall number of reads remaining after duplicate removal. 
These may have had a duplicate (see below).\n* **Reverse Removed** — the number of reads that were found to be a duplicate of another and removed, and that were un-collapsed reverse reads (from the earlier read merging step).\n* **Forward Removed** — the number of reads that were found to be a duplicate of another and removed, and that were un-collapsed forward reads (from the earlier read merging step).\n* **Merged Removed** — the number of reads that were found to be a duplicate and removed, and that were collapsed reads (from the earlier read merging step).\n  \nExceptions to the above:\n\n* If you do not have paired end data, you will not have sections for 'Merged removed' or 'Reverse removed'.\n* If you use the `--dedup_all_merged` flag, you will not have the 'Forward removed' or 'Reverse removed' sections.\n\n<p align=\"center\">\n  <img src=\"images/output/dedup/dedup_deduplicated_reads.png\" width=\"75%\" height = \"75%\">\n</p>\n\nThings to look out for:\n\n* The smaller the number of duplicates removed the better. If you have a small number of duplicates, and wish to sequence deeper, you can use the preseq module (see below) to make an estimate of how much deeper to sequence.\n* If you have a very large number of duplicates that were removed this may suggest you have an over-amplified library, or a lot of left-over adapters that were able to map to your genome.\n\n### Picard\n\n#### Background\n\nPicard is a toolkit for general BAM file manipulation with many different functions. nf-core/eager most visibly uses the 'markduplicates' tool, for the removal of exact PCR duplicates that can occur during library amplification and result in falsely inflated coverages (and overly-confident genotyping calls).\n\n#### Mark Duplicates\n\nThe deduplication stats plot shows you how many reads were detected and then removed during deduplication of a mapped BAM file. Well-preserved and well-constructed libraries will typically have many unique reads and few duplicates. 
These libraries are often good candidates for deeper sequencing (if required), but low-endogenous DNA libraries that have been over-amplified will have few unique reads and many copies of each read. For better calculations you can see the [Preseq](#preseq) module below.\n\n<p align=\"center\">\n  <img src=\"images/output/picard/picard_deduplication_stats.png\" width=\"75%\" height = \"75%\">\n</p>\n\nThe amount of unmapped reads will depend on whether you have filtered out unmapped reads or not (see the [usage/running the pipeline](usage.md) documentation for more information).\n\nThings to look out for:\n\n* The smaller the number of duplicates removed the better. If you have a small number of duplicates, and wish to sequence deeper, you can use the preseq module (see below) to make an estimate of how much deeper to sequence.\n* If you have a very large number of duplicates that were removed this may suggest you have an over-amplified library, a badly preserved sample with a very low yield, or a lot of left-over adapters that were able to map to your genome.\n\n### Preseq\n\n#### Background\n\nPreseq is a collection of tools that allow assessment of the complexity of the library, where complexity means the number of unique molecules in your library (i.e. not molecules with the exact same length and sequence).\n\nThere are two algorithms from the tools we use: `c_curve` and `lc_extrap`. The former gives you the expected number of unique reads if you were to repeat sequencing but with fewer reads than your first sequencing run. The latter tries to extrapolate the decay in the number of unique reads you would get with re-sequencing but with more reads than your initial sequencing run.\n\nDue to endogenous DNA being so low when doing initial screening, the maths behind `lc_extrap` often fails as there is not enough data. 
Therefore nf-core/eager sticks with `c_curve` which gives a similar approximation of the library complexity, but is more robust to smaller datasets.\n\nYou will receive output for each deduplicated _library_. This means that if you use TSV input and have one library sequenced over multiple lanes and sequencing types, these are merged and you will get mapping statistics of all lanes of the library in one value.\n\n#### Complexity Curve\n\nUsing the de-duplication information from DeDup, the calculated curve (a solid line) allows you to estimate how many unique molecules you would have sequenced (Y axis) at a given sequencing depth (X axis). When you start getting DNA sequences that are mostly the same as ones you've sequenced before, it is often not cost effective to continue sequencing and is a good point to stop.\n\nThe dashed line represents a 'perfect' library containing only unique molecules and no duplicates. You are looking for your library to stay as close to this line as possible. Plateauing of your curve shows that at that point you would not be getting any more unique molecules and you shouldn't sequence further than this.\n\n<p align=\"center\">\n  <img src=\"images/output/preseq/preseq_complexity_curve.png\" width=\"75%\" height = \"75%\">\n</p>\n\nPlateauing can be caused by a number of reasons:\n\n* You have simply sequenced your library to exhaustion\n* You have an over-amplified library with many PCR duplicates. You should consider rebuilding the library to maximise the data to cost ratio\n* You have a low quality library made up of mappable sequencing artefacts that were able to pass filtering (e.g. adapters)\n\n### Damage Calculation\n\n#### Background\n\nDamageProfiler and mapDamage are tools that calculate a variety of standard 'aDNA' metrics from a BAM file. The primary plots here are the misincorporation and length distribution plots. 
Ancient DNA undergoes depurination and hydrolysis, causing fragmentation of molecules into gradually shorter fragments, and cytosine to thymine deamination damage, which occurs on the resulting single-stranded overhangs at the ends of molecules.\n\nTherefore, three main characteristics of ancient DNA are:\n\n* Short DNA fragments\n* Elevated Gs and As (purines) just before strand breaks\n* Increased Cs and Ts at ends of fragments\n\nYou will receive output for each deduplicated _library_. This means that if you use TSV input and have one library sequenced over multiple lanes and sequencing types, these are merged and you will get mapping statistics of all lanes of the library in one value.\n  \n#### Misincorporation Plots\n\nThe MultiQC DamageProfiler and mapDamage module misincorporation plots show the percent frequency (Y axis) of C to T mismatches at 5' read ends and complementary G to A mismatches at the 3' ends. The X axis represents base pairs from the end of the molecule from the given prime end, going into the middle of the molecule i.e. 1st base of molecule, 2nd base of molecule etc. until the 14th base pair. 
Mismatches are determined by comparison to the base of the reference genome at that position.\n\nWhen looking at the misincorporation plots, keep the following in mind:\n\n* As few-base single-stranded overhangs are more likely to occur than long overhangs, we expect to see a gradual decrease in the frequency of the modifications from position 1 to the inside of the reads.\n* If your library has been **partially-UDG treated**, only the first one or two bases will display the misincorporation frequency.\n* If your library has been **UDG treated** you will expect to see extremely-low to no misincorporations at read ends.\n* If your library is **single-stranded**, you will expect to see only C to T misincorporations at both 5' and 3' ends of the fragments.\n* We generally expect that the older the sample, or the less-ideal preservational environment (hot/wet) the greater the frequency of C to T/G to A.\n* The curve will not be smooth when you have few reads informing the frequency calculation. Read counts of less than 500 are likely not reliable.\n* If the `mapdamage_downsample` parameter was specified and mapDamage was used for damage calculation, the damage frequency for each base is based only on the specified number of reads.\n\n<p align=\"center\">\n  <img src=\"images/output/damageprofiler/damageprofiler_deaminationpatterns.png\" width=\"75%\" height = \"75%\">\n</p>\n\n> **NB:** An important difference to note compared to the mapDamage tool, which DamageProfiler is otherwise an exact re-implementation of, is that the percent frequency on the Y axis is not fixed between 0 and 0.3, and will 'zoom' into small values the less damage there is\n\n#### Length Distribution\n\nThe MultiQC DamageProfiler and mapDamage module length distribution plots show the frequency of read lengths across forward and reverse reads respectively.\n\nWhen looking at the length distribution plots, keep in mind the following:\n\n* Your curves will likely not start at 0, and will start wherever 
your minimum read-length setting was when removing adapters.\n* You should typically see the bulk of the distribution falling between 40-120bp, which is normal for aDNA\n* You may see large peaks at paired-end turn-arounds, due to very-long reads that could not overlap for merging being present, however these reads are normally from modern contamination.\n* If the `mapdamage_downsample` parameter was specified and mapDamage was used for damage calculation, the length distribution is based only on the specified number of reads.\n\n### QualiMap\n\n#### Background\n\nQualimap is a tool which provides statistics on the quality of the mapping of your reads to your reference genome. It allows you to assess how well covered your reference genome is by your data, both in 'fold' depth (average number of times a given base on the reference is covered by a read) and 'percentage' (the percentage of all bases on the reference genome that is covered at a given fold depth). These outputs allow you to decide whether you have enough quality data for downstream applications like genotyping, and how to adjust the parameters for those tools accordingly.\n\n> NB: Neither fold coverage nor percent coverage on their own is sufficient to assess whether you have a high quality mapping. Abnormally high fold coverages of a smaller region such as highly conserved genes or un-removed-adapter-containing reference genomes can artificially inflate the mean coverage, yet a high percent coverage is not useful if all bases of the genome are covered at just 1x coverage.\n\nNote that many of the statistics from this module are displayed in the General Stats table (see above), as they represent single values that are not plottable.\n\nYou will receive output for each _sample_. This means you will receive statistics of deduplicated values of all types of libraries combined in a single value (i.e. 
non-UDG treated, full-UDG, paired-end, single-end all together).\n\n:warning: If your library has no reads mapping to the reference, this will result in an empty BAM file. Qualimap will therefore not produce any output even if a BAM exists!\n\n#### Coverage Histogram\n\nThis plot shows on the X axis the range of fold coverages that the bases of the reference genome are covered by. The Y axis shows the number of bases that were covered at the given fold coverage depth as indicated on the X axis.\n\nThe greater the number of bases covered at as high as possible fold coverage, the better.\n\n<p align=\"center\">\n  <img src=\"images/output/qualimap/qualimap_coverage_histogram.png\" width=\"75%\" height = \"75%\">\n</p>\n\nThings to watch out for:\n\n* You will typically see a direct decay from the lowest coverage to higher. A large range of coverages along the X axis is potentially suspicious.\n* If you have stacking of reads i.e. a small region with an abnormally large amount of reads despite the rest of the reference being quite shallowly covered, this will artificially increase your coverage. This would be represented by a small peak that is much further along the X axis, away from the main distribution of reads.\n  \n#### Cumulative Genome Coverage\n\nThis plot shows how much of the genome in percentage (X axis) is covered by a given fold depth coverage (Y axis).\n\nAn ideal plot for this is to see an increasing curve, representing greater fractions of the genome being increasingly covered at higher depth. 
However, for low-coverage ancient DNA data, you will be more likely to see decreasing curves starting at a large percentage of the genome being covered at 0 fold coverage — something particularly true for large genomes such as for humans.\n\n<p align=\"center\">\n  <img src=\"images/output/qualimap/qualimap_cumulative_genome_coverage.png\" width=\"75%\" height = \"75%\">\n</p>\n\n#### GC Content Distribution\n\nThis plot shows the distribution of the frequency of reads at different GC contents. The X axis represents the GC content (i.e. the percentage of G and C nucleotides in a given read), the Y axis represents the frequency.\n\n<p align=\"center\">\n  <img src=\"images/output/qualimap/qualimap_gc_content_distribution.png\" width=\"75%\" height = \"75%\">\n</p>\n\nThings to watch out for:\n\n* This plot should normally show a normal distribution around the average GC content of your reference genome.\n* Bimodal peaks may represent lab-based artefacts that should be further investigated.\n* Skews of the peak to a higher GC content than the reference in Illumina dual-colour chemistry data (e.g. NextSeq or NovaSeq), may suggest long poly-G tails that are mapping to poly-G stretches of your genome. The nf-core/eager trimming option `--complexity_filter_poly_g` can be used to remove these tails by utilising the tool FastP for detection and trimming.\n\n### Sex.DetERRmine\n\n#### Background\n\nSex.DetERRmine calculates the coverage of your mapped reads on the X and Y chromosomes relative to the coverage on the autosomes (X-/Y-rate). This metric can be thought of as the number of copies of chromosomes X and Y that is found within each cell, relative to the autosomal copies. The number of autosomal copies is assumed to be two, meaning that an X-rate of `1.0` means there are two X chromosomes in each cell, while `0.5` means there is a single copy of the X chromosome per cell. 
Human females have two copies of the X chromosome and no Y chromosome (XX), while human males have one copy of each of the X and Y chromosomes (XY).\n\nWhen a bedfile of specific sites is provided, Sex.DetERRmine additionally calculates error bars around each relative coverage estimate. For this estimate to be trustworthy, the sites included in the bedfile should be spaced apart enough that a single sequencing read cannot overlap multiple sites. Hence, when a bedfile has not been provided, this error should be ignored. When a suitable bedfile is provided, each observation of a covered site is independent, and the error around the coverage is equal to the binomial error estimate. This error is then propagated during the calculation of relative coverage for the X and Y chromosomes.\n\n> Note that in nf-core/eager this will be run on single- and double-stranded variants of the same library _separately_. This can also help assess for differential contamination between libraries.\n\n#### Relative Coverage\n\nTheoretically, males are expected to cluster around (0.5, 0.5) in the produced scatter plot, while females are expected to cluster around (1.0, 0.0). In practice, when analysing ancient DNA, the relative coverage on both axes is slightly lower than expected, and individuals can cluster around (0.45, 0.45) and (0.85, 0.05). As the number of covered sites for an individual gets smaller, the confidence in the estimate becomes lower, because it is increasingly likely to be affected by randomness in the preservation and sequencing of ancient DNA.\nPlacement of individuals between the male and female clusters can be indicative of low coverage and in some cases contamination, when the contaminant and sampled individuals are of opposite biological sex.\nAneuploidy of the sex chromosomes can also be identified with this approach when the placement of an individual in the scatter plot is unexpected. 
For example, placement of an individual around (1.0, 0.5) despite good genomic coverage is indicative of XXY karyotype (Klinefelter syndrome), while placement around (0.5, 0) could be indicative of karyotype X (Turner's syndrome).\n\n<p align=\"center\">\n  <img src=\"images/output/sexdeterrmine/sexdeterrmine_relative_coverage.png\" width=\"75%\" height = \"75%\">\n</p>\n\n#### Read Counts\n\nThis plot gives you the number of reads mapped onto the autosomes, X or Y chromosomes. When the total number of mapped reads is low, the estimates are more likely to be dominated by random effects, and hence untrustworthy.\nFor well-covered data without any skews, you should see long bars that are comprised mostly of autosomal reads. The edge of the bars in female individuals should be mostly X (some small amounts of Y reads are expected and are usually caused by random mapping on the Y chromosome). In males, the number of X-reads will still be higher (since the X chromosome is longer), but the Y reads should be clearly visible on the rightmost end of the bars. The ratio between the number of sites in each bin should roughly correlate with the difference in length in base pairs of each chromosome type.\nIf this correlation is not observed, your data is skewed towards higher coverage on some chromosomes. This can be expected if you have enriched for a specific set of markers (e.g. Y-chromosome capture), or if the number of reads is too low.\n\n<p align=\"center\">\n  <img src=\"images/output/sexdeterrmine/sexdeterrmine_read_counts.png\" width=\"75%\" height = \"75%\">\n</p>\n\n### Bcftools\n\n#### Background\n\nBcftools is a toolkit for processing and summarising of VCF files, i.e. variant call format files. nf-core/eager currently uses bcftools for the `stats` functionality. 
This summarises, in a text file, a range of statistics about the VCF files produced by the GATK and FreeBayes variant callers.\n\n#### Variant Substitution Types\n\nThis stacked bar plot shows you the distribution of all types of point-mutation variants away from the reference nucleotide at each position (e.g. A>C, A>G etc.).\n\nFor low-coverage aDNA data that is neither UDG-treated, trimmed nor re-scaled, you expect to see C>T substitutions as the largest category, due to the most common ancient DNA damage being C to T deamination.\n\n#### Variant Quality\n\nThis gives you the distribution of variant-call _qualities_ in your VCF files. Each variant is given a 'Phred-scale' like value that represents the confidence of the variant caller that it has made the right call. The scale is very similar to that of base-call values in FASTQ files (as assessed by FastQC). Distributions that have peaks at higher variant quality scores (>= 30) suggest more confident variant calls. However, in cases of low-coverage aDNA data, these distributions may not be so good.\n\nMore detailed explanation of variant quality scores can be seen in the Broad Institute's [GATK documentation](https://gatk.broadinstitute.org/hc/en-us/articles/360035531872-Phred-scaled-quality-scores).\n\n#### Indel Distribution\n\nThis plot shows you the distribution of the sizes of insertions and deletions (InDels) in the variant calling (assuming you configured your variant caller parameters to do so). Low-coverage aDNA data often will not have high enough coverage to accurately assess InDels. In cases of high-coverage data of small genomes such as microbes, however, large numbers of InDels may indicate your reads are actually from a _relative_ of the reference mapped to, and should be verified downstream.\n\n#### Variant depths\n\nThis plot shows the distribution of the depth of coverage of each variant called. 
Typically higher coverage will result in higher quality variant calls (see Variant Quality, above), however in many cases in aDNA these may be low and unequally distributed (due to uneven mapping coverage from contamination).\n\n### MultiVCFAnalyzer\n\n#### Background\n\nMultiVCFAnalyzer is a SNP alignment generation tool that allows further evaluation and filtering of SNP calls made by the GATK UnifiedGenotyper. More specifically it takes one or more VCF files as well as a reference genome, and will allow filtering of SNPs via a variety of metrics and produces a FASTA file with each sample as an entry containing 'consensus calls' at each position.\n\n#### Summary metrics\n\nThis table shows the contents of the `snpStatistics.tsv` file produced by MultiVCFAnalyzer. Descriptions of each column can be seen at the MultiVCFAnalyzer page [here](https://github.com/alexherbig/MultiVCFAnalyzer#snpstatisticstsv).\n\n#### Call statistics barplot\n\nYou can get different variants of the call statistics bar plot, depending on how you configured the MultiVCFAnalyzer options.\n\nIf you ran with `--min_allele_freq_hom` and `--min_allele_freq_het` set to two different values (left panel A in the figure below), this allows you to assess the number of multi-allelic positions that were called in your genome. Typically MultiVCFAnalyzer is used for analysing smallish haploid genomes (such as mitochondrial or bacterial genomes), therefore a position with multiple possible 'alleles' suggests some form of cross-mapping from other taxa or presence of multiple strains. If this is the case, you will need to be careful with downstream analysis of the consensus sequence (e.g. for phylogenetic tree analysis) as you may accidentally pick up SNPs from other taxa/strains, particularly when dealing with low coverage data. 
Therefore if you have a high level of 'het' values (see image), you should carefully check your alignments manually to see how clean your genomes are, or whether you can do some form of strain separation (e.g. by majority/minority calling).\n\n<p align=\"center\">\n  <img src=\"images/output/multivcfanalyzer/multivcfanalyzer_call_categories.png\" width=\"75%\" height = \"75%\">\n</p>\n\nIf you ran with `--min_allele_freq_hom` and `--min_allele_freq_het` set to the same value, you will have three sections to your bars: SNP Calls (hom), Reference Calls and Discard SNP Call. The overall size of the bars will generally depend on the percentage of the genome covered, meaning less well preserved samples will likely have smaller bars than well-preserved samples. Typically you wish to have a low number of discarded SNP calls, but this can be quite high when you have low coverage data (as many positions may not reach that threshold). The number of SNP calls to reference calls can vary depending on the mutation rate of your target organism.\n\n## Output Files\n\nThis section gives a brief summary of where to look for what files for downstream analysis. This covers _all_ modules.\n\nEach module has its own output directory which sits alongside the `MultiQC/` directory from which you opened the report.\n\n* `reference_genome/`: this directory contains the indexing files of your input reference genome (i.e. the various `bwa` indices, a `samtools`' `.fai` file, and a picard `.dict`), if you used the `--saveReference` flag.\n  * When masking of the reference is requested prior to running pmdtools, an additional directory `reference_genome/masked_genome` will be found here, containing the masked reference.\n* `fastqc/`: this contains the original per-FASTQ FastQC reports that are summarised with MultiQC. These occur in both `html` (the report) and `.zip` format (raw data). 
The `after_clipping` folder contains the same but for after AdapterRemoval.\n* `adapterremoval/`: this contains the log files (ending with `.settings`) with raw trimming (and merging) statistics after AdapterRemoval. The `output` sub-directory contains the trimmed (and merged) `fastq` files. These you can use for downstream applications such as taxonomic binning for metagenomic studies.\n* `lanemerging/`: this contains adapter-trimmed and merged (i.e. collapsed) FASTQ files that were merged across lanes, where applicable. These files are the reads that go into mapping (when multiple lanes were specified for a library), and can be used for downstream applications such as taxonomic binning for metagenomic studies.\n* `post_ar_fastq_trimmed`: this contains `fastq` files that have been additionally trimmed after AdapterRemoval (if turned on). These are usually reads that had internal barcodes, or damage that needed to be removed before mapping.\n* `mapping/`: this contains a sub-directory corresponding to the mapping tool you used, inside of which will be the initial BAM files containing the reads that mapped to your reference genome with no modification (see below). You will also find a corresponding BAM index file (ending in `.csi` or `.bai`), and if running the `bowtie2` mapper: a log ending in `_bt2.log`. You can use these for downstream applications e.g. if you wish to use a different de-duplication tool not included in nf-core/eager (although please feel free to add a new module request on the GitHub repository's [issue page](https://github.com/nf-core/eager/issues)!).\n* `samtools/`: this contains two sub-directories. `stats/` contains the raw mapping statistics files (ending in `.stats`) from directly after mapping. `filter/` contains BAM files that have had a mapping quality filter applied (set by the `--bam_mapping_quality_threshold` flag) and a corresponding index file. 
Furthermore, if you selected `--bam_discard_unmapped`, you will find your separate file with only unmapped reads in the format you selected. Note unmapped read BAM files will _not_ have an index file.\n* `deduplication/`: this contains a sub-directory called `dedup/`, inside here are sample specific directories. Each directory contains a BAM file containing mapped reads but with PCR duplicates removed, a corresponding index file and two stats files. `.hist.` contains raw data for a deduplication histogram used for tools like preseq (see below), and the `.log` contains overall summary deduplication statistics.\n* `endorSpy/`: this contains all JSON files exported from the endorSpy endogenous DNA calculation tool. The JSON files are generated specifically for display in the MultiQC general statistics table and are otherwise very likely not useful for you.\n* `preseq/`: this contains a `.preseq` file for every BAM file that had enough deduplication statistics to generate a complexity curve for estimating the number of unique reads that will be yielded if the library is re-sequenced. You can use this file for plotting e.g. in `R` to find your sequencing target depth.\n* `qualimap/`: this contains a sub-directory for every sample, which includes a qualimap report and associated raw statistic files. You can open the `.html` file in your internet browser to see the in-depth report (this will be more detailed than in MultiQC). This includes metrics such as percent coverage, depth coverage, GC content and so on of your mapped reads.\n* `damageprofiler/`: this contains sample specific directories containing raw statistics and damage plots from DamageProfiler. The `.pdf` files can be used to visualise C to T miscoding lesions or read length distributions of your mapped reads. All raw statistics used for the PDF plots are contained in the `.txt` files.\n* `mapdamage/`: this contains sample specific directories containing raw statistics and damage plots from mapDamage. 
The `.pdf` files can be used to visualise C to T miscoding lesions or read length distributions of your mapped reads. All raw statistics used for the PDF plots are contained in the `.txt` files. The `Runtime_log.txt` file contains runtime information.\n* `pmdtools/`: this contains raw output statistics of pmdtools (estimates of frequencies of substitutions), and BAM files which have been filtered to remove reads that do not have a Post-mortem damage (PMD) score of `--pmdtools_threshold`.\n* `trimmed_bam/`: this contains the BAM files with X number of bases trimmed off as defined with the `--bamutils_clip_half_udg_left`, `--bamutils_clip_half_udg_right`, `--bamutils_clip_none_udg_left`, and `--bamutils_clip_none_udg_right` flags and corresponding index files. You can use these BAM files for downstream analysis such as re-mapping data with more stringent parameters (if you set trimming to remove the most likely places containing damage in the read).\n* `damage_rescaling/`: this contains rescaled BAM files from mapDamage. These BAM files have damage probabilistically removed via a Bayesian model, and can be used for downstream genotyping.\n* `genotyping/`: this contains all the (gzipped) genotyping files produced by your genotyping module. The file suffix will have the genotyping tool name. You will have files corresponding to each of your deduplicated BAM files (except pileupcaller), or any turned-on downstream processes that create BAMs (e.g. trimmed bams or pmd tools). If `--gatk_ug_keep_realign_bam` is supplied, this may also contain BAM files from InDel realignment when using GATK 3 and UnifiedGenotyping for variant calling. 
When pileupcaller is used to create eigenstrat genotypes, this directory also contains eigenstrat SNP coverage statistics.\n* `multivcfanalyzer/`: this contains all output from MultiVCFAnalyzer, including SNP calling statistics, various SNP table(s) and FASTA alignment files.\n* `sex_determination/`: this contains the output for the sex determination run. This is a single `.tsv` file that includes a table with the sample name, the number of autosomal SNPs, number of SNPs on the X/Y chromosome, the number of reads mapping to the autosomes, the number of reads mapping to the X/Y chromosome, the relative coverage on the X/Y chromosomes, and the standard error associated with the relative coverages. These measures are provided for each bam file, one row per file. If the `sexdeterrmine_bedfile` option has not been provided, the error bars cannot be trusted, and runtime will be considerably longer.\n* `nuclear_contamination/`: this contains the output of the nuclear contamination processes. The directory contains one `*.X.contamination.out` file per individual, as well as `nuclear_contamination.txt` which is a summary table of the results for all individuals. `nuclear_contamination.txt` contains a header, followed by one line per individual, composed of the Method of Moments (MOM) and Maximum Likelihood (ML) contamination estimates (with their respective standard errors) for both Method1 and Method2.\n* `bedtools/`: this contains two files as the output from bedtools coverage. One file contains the 'breadth' coverage (`*.breadth.gz`). This file will have the contents of your annotation file (e.g. BED/GFF), and the following subsequent columns: no. reads on feature, # bases at depth, length of feature, and % of feature. The second file (`*.depth.gz`), contains the contents of your annotation file (e.g. BED/GFF), and an additional column which is mean depth coverage (i.e. 
average number of reads covering each position).\n* `metagenomic_complexity_filter`: this contains the output of `bbduk`, which removes low-sequence-complexity reads from the input to metagenomic classification. This will include the filtered FASTQ files (`*_lowcomplexityremoved.fq.gz`) and also the run-time log (`_bbduk.stats`) for each sample. **Note:** there are no sections in the MultiQC report for this module, therefore you must check the `_bbduk.stats` files to get summary statistics of the filtering.\n* `metagenomic_classification/`: this contains the output for a given metagenomic classifier.\n  * When running MALT, this will contain RMA6 files that can be loaded into MEGAN6 or MaltExtract for phylogenetic visualisation of read taxonomic assignments and aDNA characteristics respectively. Additionally, a `malt.log` file is provided which gives further information such as run-time, memory usage and per-sample statistics of numbers of alignments with taxonomic assignment etc. This will also include gzipped SAM files if requested.\n  * When running kraken, this will contain the Kraken output and report files, as well as a merged Taxon count table. You will also get a Kraken kmer duplication table, in a [KrakenUniq](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1568-0) fashion. This is very useful to check for breadth of coverage and detect read stacking. A small number of aligned reads (low coverage) and a kmer duplication >1 is usually a sign of read stacking, usually indicative of a false positive hit (e.g. from over-amplified libraries). _Kmer duplication is defined as: number of kmers / number of unique kmers_. 
You will find two kraken report formats available:  \n    * the `*.kreport` which is the old report format, without distinct minimizer count information, used by some tools such as [Pavian](https://github.com/fbreitwieser/pavian)\n    * the `*.kraken2_report` which is the new kraken report format, with the distinct minimizer count information.  \n    * finally, the `*.kraken.out` files are the direct output of Kraken2\n    * ⚠️ If your sample has no hits, no kraken output files will be created for that sample!\n* `maltextract/`: this contains a `results` directory which contains the output from MaltExtract - typically one folder for each filter type, an error and a log file. The characteristics of each node (e.g. damage, read lengths, edit distances - each in different txt formats) can be seen in each sub-folder of the filter folders. Output can be visualised either with the [HOPS postprocessing script](https://github.com/rhuebler/HOPS) or [MEx-IPA](https://github.com/jfy133/MEx-IPA)\n* `consensus_sequence/`: this contains three FASTA files from VCF2Genome of a consensus sequence based on the reference FASTA with each sample's unique modifications. The main FASTA is a standard file with bases not passing the specified thresholds as Ns. The two other FASTAs (`_refmod.fasta.gz`) and (`_uncertainity.fasta.gz`) use IUPAC uncertainty codes (rather than Ns) and a special number-based uncertainty system used for other downstream tools, respectively.\n* `merged_bams/initial`: these contain the BAM files that would go into UDG-treatment specific BAM trimming. All libraries of the same sample, **and** same UDG-treatment type will be in these BAM files.\n* `merged_bams/additional`: these contain the final BAM files that would go into genotyping (if genotyping is turned on). 
This means the files will contain all libraries of a given sample (including trimmed non-UDG or half-UDG treated libraries, if BAM trimming is turned on)\n* `bcftools`: this currently contains a single directory called `stats/` that includes general statistics on variant callers producing VCF files as output by `bcftools stats`. These include things such as the number of positions, number of transitions/transversions and depth coverage of SNPs etc. These are only produced if `--run_bcftools_stats` is supplied.\n"
  },
  {
    "path": "docs/usage.md",
    "content": "# nf-core/eager: Usage\n\n## :warning: Please read this documentation on the nf-core website: [https://nf-co.re/eager/usage](https://nf-co.re/eager/usage)\n\n> _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._\n\n## Introduction\n\n## Running the pipeline\n\n### Quick Start\n\n> Before you start you should change into the output directory you wish your\n> results to go in. This will guarantee that when you start the Nextflow job,\n> it will place all the log files and 'working' folders in the corresponding\n> output directory (and not wherever else you may have executed the run from)\n\nThe typical command for running the pipeline is as follows:\n\n```bash\nnextflow run nf-core/eager --input '*_R{1,2}.fastq.gz' --fasta 'some.fasta' -profile standard,docker\n```\n\nwhere the reads are from FASTQ files of the same pairing.\n\nThis will launch the pipeline with the `docker` configuration profile. See below\nfor more information about profiles.\n\nNote that the pipeline will create the following files in your working\ndirectory:\n\n```bash\nwork            # Directory containing the Nextflow working files\nresults         # Finished results (configurable, see below)\n.nextflow.log   # Log file from Nextflow\n                # Other Nextflow hidden files, eg. history of pipeline runs and old logs.\n```\n\nTo see the nf-core/eager pipeline help message run: `nextflow run\nnf-core/eager --help`\n\nIf you want to configure your pipeline interactively using a graphical user\ninterface, please visit [nf-co.re\nlaunch](https://nf-co.re/launch?pipeline=eager). 
Select the `eager` pipeline and\nthe version you intend to run, and follow the on-screen instructions to create a\nconfig for your pipeline run.\n\n### Updating the pipeline\n\nWhen you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. 
When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:\n\n```bash\nnextflow pull nf-core/eager\n```\n\n### Reproducibility\n\nIt's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since.\n\nFirst, go to the [nf-core/eager releases page](https://github.com/nf-core/eager/releases) and find the latest version number - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`.\n\nThis version number will be logged in reports when you run the pipeline, so that\nyou'll know what you used when you look back in the future.\n\nAdditionally, nf-core/eager pipeline releases are named after Swabian German\nCities. The first release V2.0 is named \"Kaufbeuren\". Future releases are named\nafter cities named in the [Swabian league of\nCities](https://en.wikipedia.org/wiki/Swabian_League_of_Cities).\n\n### Automatic Resubmission\n\nBy default, if a pipeline step fails, nf-core/eager will resubmit the job with\ntwice the amount of CPU and memory. This will occur two times before failing.\n\n## Core Nextflow arguments\n\n> **NB:** These options are part of Nextflow and use a _single_ hyphen (pipeline\n> parameters use a double-hyphen).\n\n### `-profile`\n\nUse this parameter to choose a configuration profile. 
Profiles can give configuration presets for different compute environments.\n\nSeveral generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Conda) - see below.\n\n> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.\n\nThe pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).\n\nNote that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important!\nThey are loaded in sequence, so later profiles can overwrite earlier profiles.\n\nIf `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. 
This is _not_ recommended.\n\n* `docker`\n  * A generic configuration profile to be used with [Docker](https://docker.com/)\n  * Pulls software from Docker Hub: [`nfcore/eager`](https://hub.docker.com/r/nfcore/eager/)\n* `singularity`\n  * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/)\n  * Pulls software from Docker Hub: [`nfcore/eager`](https://hub.docker.com/r/nfcore/eager/)\n* `podman`\n  * A generic configuration profile to be used with [Podman](https://podman.io/)\n  * Pulls software from Docker Hub: [`nfcore/eager`](https://hub.docker.com/r/nfcore/eager/)\n* `shifter`\n  * A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/)\n  * Pulls software from Docker Hub: [`nfcore/eager`](https://hub.docker.com/r/nfcore/eager/)\n* `charliecloud`\n  * A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/)\n  * Pulls software from Docker Hub: [`nfcore/eager`](https://hub.docker.com/r/nfcore/eager/)\n* `conda`\n  * Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud.\n  * A generic configuration profile to be used with [Conda](https://conda.io/docs/)\n  * Pulls most software from [Bioconda](https://bioconda.github.io/)\n* `test`\n  * A profile with a complete configuration for automated testing\n  * Includes links to test data so needs no other parameters\n\n> _Important_: If running nf-core/eager on a cluster - ask your system\n> administrator what profile to use.\n\n**Institution Specific Profiles** These are profiles specific to certain **HPC\nclusters**, and are centrally maintained at\n[nf-core/configs](https://github.com/nf-core/configs). 
Those listed below are\nregular users of nf-core/eager, if you don't see your own institution here check\nthe [nf-core/configs](https://github.com/nf-core/configs) repository.\n\n* `uzh`\n  * A profile for the University of Zurich Research Cloud\n  * Loads Singularity and defines appropriate resources for running the\n    pipeline.\n* `binac`\n  * A profile for the BinAC cluster at the University of Tuebingen\n  * Loads Singularity and defines appropriate resources for running the pipeline\n* `shh`\n  * A profile for the S/CDAG cluster at the Department of Archaeogenetics of\n    the Max Planck Institute for the Science of Human History\n  * Loads Singularity and defines appropriate resources for running the pipeline\n\n**Pipeline Specific Institution Profiles** There are also pipeline-specific\ninstitution profiles. I.e., we can also offer a profile which sets special\nresource settings to specific steps of the pipeline, which may not apply to all\npipelines. This can be seen at\n[nf-core/configs](https://github.com/nf-core/configs) under\n[conf/pipelines/eager/](https://github.com/nf-core/configs/tree/master/conf/pipeline/eager).\n\nWe currently offer a nf-core/eager specific profile for\n\n* `shh`\n  * A profile for the S/CDAG cluster at the Department of Archaeogenetics of\n    the Max Planck Institute for the Science of Human History\n  * In addition to the nf-core wide profile, this also sets the MALT resources\n    to match our commonly used databases\n\nFurther institutions can be added at\n[nf-core/configs](https://github.com/nf-core/configs). Please ask the eager\ndevelopers to add your institution to the list above, if you add one!\n\nIf you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter (see definition below). 
You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile.\n\nIf you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs).\n\n### `-resume`\n\nSpecify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously.\n\nYou can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names.\n\n### `-c`\n\nSpecify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information.\n\n#### Custom resource requests\n\nEach step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped.\n\nWhilst these default requirements will hopefully work for most people with most\ndata, you may find that you want to customise the compute resources that the\npipeline requests. You can do this by creating a custom config file. 
For\nexample, to give the workflow process `bwa` 32GB of memory, you could use the\nfollowing config:\n\n```nextflow\nprocess {\n  withName: bwa {\n    memory = 32.GB\n  }\n}\n```\n\nTo find the exact name of a process whose compute resources you wish to modify, check the live-status of a nextflow run displayed on your terminal or check the nextflow error for a line like so: `Error executing process > 'bwa'`. In this case the name to specify in the custom config file is `bwa`.\n\nSee the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information.\n\nIf you are likely to be running `nf-core` pipelines regularly it may be a good\nidea to request that your custom config file is uploaded to the\n`nf-core/configs` git repository. Before you do this please can you test that\nthe config file works with your pipeline of choice using the `-c` parameter (see\ndefinition above). You can then create a pull request to the `nf-core/configs`\nrepository with the addition of your config file, associated documentation file\n(see examples in\n[`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)),\nand amending\n[`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config)\nto include your custom profile.\n\nIf you have any questions or issues please send us a message on\n[Slack](https://nf-co.re/join/slack) on the [`#configs`\nchannel](https://nfcore.slack.com/channels/configs).\n\n#### `-name`\n\nName for the pipeline run. If not specified, Nextflow will automatically\ngenerate a random mnemonic.\n\nThis is used in the MultiQC report (if not default) and in the summary HTML /\ne-mail (always).\n\n**NB:** Single hyphen (core Nextflow option)\n\n### Running in the background\n\nNextflow handles job submissions and supervises the running jobs. 
The Nextflow process must run until the pipeline is finished.\n\nThe Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal so that the workflow does not stop if you log out of your session. The logs are saved to a file.\n\nAlternatively, you can use `screen` / `tmux` or a similar tool to create a detached session which you can log back into at a later time.\nSome HPC setups also allow you to run nextflow within a cluster job submitted to your job scheduler (from where it submits more jobs).\n\nTo create a screen session:\n\n```bash\nscreen -R nf-core/eager\n```\n\nTo disconnect, press `ctrl+a` then `d`.\n\nTo reconnect, type:\n\n```bash\nscreen -r nf-core/eager\n```\n\nTo end the screen session while in it, type `exit`.\n\n#### Nextflow memory requirements\n\nIn some cases, the Nextflow Java virtual machines can start to request a large amount of memory.\nWe recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~/.bash_profile`):\n\n```bash\nNXF_OPTS='-Xms1g -Xmx4g'\n```\n\n## Input Specifications\n\nThere are two possible ways of supplying input sequencing data to nf-core/eager. The most efficient but more simplistic method is supplying direct paths (with wildcards) to your FASTQ or BAM files, with each file or pair being considered a single library and each one run independently. TSV input requires creation of an extra file by the user and extra metadata, but allows more powerful lane and library merging.\n\n### Direct Input Method\n\nThis method is where you specify, with `--input`, the path locations of FASTQ (optionally gzipped) or BAM file(s). This option is mutually exclusive to the [TSV input method](#tsv-input-method), which is used for more complex input configurations such as lane and library merging.\n\nWhen using the direct method of `--input` you can specify one or multiple samples in one or more directories. File names **must be unique**, even if in different directories.  
\n\nBy default, the pipeline _assumes_ you have paired-end data. If you want to run single-end data you must specify [`--single_end`](#single_end)\n\nFor example, for a single set of FASTQs, or multiple paired-end FASTQ files in one directory, you can specify:\n\n```bash\n--input 'path/to/data/sample_*_{1,2}.fastq.gz'\n```\n\nIf you have multiple files in different directories, you can use additional wildcards (`*`) e.g.:\n\n```bash\n--input 'path/to/data/*/sample_*_{1,2}.fastq.gz'\n```\n\n> :warning: It is not possible to run a mixture of single-end and paired-end files in one run with the paths `--input` method! Please see the [TSV input method](#tsv-input-method) for possibilities.\n\n**Please note** the following requirements:\n\n1. Valid file extensions: `.fastq.gz`, `.fastq`, `.fq.gz`, `.fq`, `.bam`.\n2. The path **must** be enclosed in quotes\n3. The path must have at least one `*` wildcard character\n4. When using the pipeline with **paired end data**, the path must use `{1,2}`\n   notation to specify read pairs.\n5. File names must be unique; having files with the same name in different directories is _not_ sufficient\n   * This can happen when a library has been sequenced across two sequencers on the same lane. Either rename the file, try a symlink with a unique name, or merge the two FASTQ files prior to input.\n6. Due to limitations of downstream tools (e.g. FastQC), sample IDs may be truncated after the first `.` in the name. Ensure file names are unique prior to this!\n7. For input BAM files you should provide a small decoy reference genome with pre-made indices, e.g. 
the human mtDNA or phiX genome, for the mandatory parameter `--fasta` in order to avoid long computational time for generating the index files of the reference genome, even if you do not actually need a reference genome for any downstream analyses.\n\n### TSV Input Method\n\nAs an alternative to the [direct input method](#direct-input-method), you can supply to `--input` a path to a TSV file that contains paths to FASTQ/BAM files and additional metadata. This allows for more complex procedures such as merging of sequencing data across lanes, sequencing runs, sequencing configuration types, and samples.\n\n<p align=\"center\">\n  <img src=\"https://github.com/nf-core/eager/raw/master/docs/images/usage/merging_files.png\" alt=\"Schematic diagram indicating merging points of different types of libraries, given a TSV input. Dashed boxes are optional library-specific processes\" width=\"70%\">\n</p>\n\n> Only different libraries from a single sample that have been BAM trimmed will be merged together. Rescaled or PMD filtered libraries will not be merged prior to genotyping as each library _may_ have a different model applied to it and have their own biases (i.e. users may need to adjust settings to get optimal damage removal).\n\nThe use of the TSV `--input` method is recommended when performing more complex procedures such as lane or library merging. You do not need to specify `--single_end`, `--bam`, `--colour_chemistry`, `--udg_type` etc. when using TSV input - this is defined within the TSV file itself. You can only supply a single TSV per run (i.e. 
`--input '*.tsv'` will not work).\n\nThis TSV should look like the following:\n\n| Sample_Name | Library_ID | Lane | Colour_Chemistry | SeqType | Organism | Strandedness | UDG_Treatment | R1 | R2 | BAM |\n|-------------|------------|------|------------------|--------|----------|--------------|---------------|----|----|-----|\n| JK2782      | JK2782     | 1    | 4                | PE      | Mammoth  | double       | full          | [https://github.com/nf-core/test-datasets/raw/eager/testdata/Mammoth/fastq/JK2782_TGGCCGATCAACGA_L008_R1_001.fastq.gz.tengrand.fq.gz](https://github.com/nf-core/test-datasets/raw/eager/testdata/Mammoth/fastq/JK2782_TGGCCGATCAACGA_L008_R1_001.fastq.gz.tengrand.fq.gz) | [https://github.com/nf-core/test-datasets/raw/eager/testdata/Mammoth/fastq/JK2782_TGGCCGATCAACGA_L008_R2_001.fastq.gz.tengrand.fq.gz](https://github.com/nf-core/test-datasets/raw/eager/testdata/Mammoth/fastq/JK2782_TGGCCGATCAACGA_L008_R2_001.fastq.gz.tengrand.fq.gz) | NA  |\n| JK2802      | JK2802     | 2    | 2                | SE      | Mammoth  | double       | full          | [https://github.com/nf-core/test-datasets/raw/eager/testdata/Mammoth/fastq/JK2802_AGAATAACCTACCA_L008_R1_001.fastq.gz.tengrand.fq.gz](https://github.com/nf-core/test-datasets/raw/eager/testdata/Mammoth/fastq/JK2802_AGAATAACCTACCA_L008_R1_001.fastq.gz.tengrand.fq.gz) | NA | NA  |\n\nA template can be taken from\n[here](https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/TSV_template.tsv).\n\n> :warning: Cells **must not** contain spaces before or after strings, as this will make the TSV unreadable by nextflow. Strings containing spaces should be wrapped in quotes.\n\nWhen using TSV_input, nf-core/eager will merge FASTQ files of libraries with the same `Library_ID` but different `Lanes` values after adapter clipping (and merging), assuming all other metadata columns are the same. 
If you have the same `Library_ID` but with different `SeqType`, these will be merged directly after mapping prior to BAM filtering. Finally, it will also merge BAM files with the same `Sample_Name` but different `Library_ID` after duplicate removal, but prior to genotyping. Please see caveats to this below.\n\nColumn descriptions are as follows:\n\n* **Sample_Name:** A text string containing the name of a given sample of which there can be multiple libraries. All libraries with the same sample name and same SeqType will be merged after deduplication.\n* **Library_ID:** A text string containing the name of a given library, of which there can be multiple sequencing lanes (with the same SeqType).\n* **Lane:** A number indicating which lane the library was sequenced on. Files from the libraries sequenced on different lanes (and different SeqType) will be concatenated after read clipping and merging.\n* **Colour_Chemistry:** A number indicating whether the Illumina sequencer the library was sequenced on was a 2 (e.g. Next/NovaSeq) or 4 (Hi/MiSeq) colour chemistry machine. This informs whether poly-G trimming (if turned on) should be performed.\n* **SeqType:** A text string of either 'PE' or 'SE', specifying paired end (with both an R1 [or forward] and R2 [or reverse]) and single end data (only R1 [forward], or BAM). This will affect lane merging if different per library.\n* **Organism:** A text string of the organism name of the sample or 'NA'. This currently has no functionality and can be set to 'NA', but will affect lane/library merging if different per library.\n* **Strandedness:** A text string indicating whether the library type is 'single' or 'double'. This will affect lane/library merging if different per library.\n* **UDG_Treatment:** A text string indicating whether the library was generated with UDG treatment - either 'full', 'half' or 'none'. Will affect lane/library merging if different per library.\n* **R1:** A text string of a file path pointing to a forward or R1 FASTQ file. 
This can be used with the R2 column. File names **must be unique**, even if they are in different directories.\n* **R2:** A text string of a file path pointing to a reverse or R2 FASTQ file, or 'NA' when single end data. This can be used with the R1 column. File names **must be unique**, even if they are in different directories.\n* **BAM:** A text string of a file path pointing to a BAM file, or 'NA'. Cannot be specified at the same time as R1 or R2, both of which should be set to 'NA'\n\nFor example, the following TSV table:\n\n| Sample_Name | Library_ID | Lane | Colour_Chemistry | SeqType | Organism | Strandedness | UDG_Treatment | R1                                                             | R2                                                             | BAM |\n|-------------|------------|------|------------------|---------|----------|--------------|---------------|----------------------------------------------------------------|----------------------------------------------------------------|-----|\n| JK2782      | JK2782     | 7    | 4                | PE      | Mammoth  | double       | full          | data/JK2782_TGGCCGATCAACGA_L007_R1_001.fastq.gz.tengrand.fq.gz | data/JK2782_TGGCCGATCAACGA_L007_R2_001.fastq.gz.tengrand.fq.gz | NA  |\n| JK2782      | JK2782     | 8    | 4                | PE      | Mammoth  | double       | full          | data/JK2782_TGGCCGATCAACGA_L008_R1_001.fastq.gz.tengrand.fq.gz | data/JK2782_TGGCCGATCAACGA_L008_R2_001.fastq.gz.tengrand.fq.gz | NA  |\n| JK2802      | JK2802     | 7    | 4                | PE      | Mammoth  | double       | full          | data/JK2802_AGAATAACCTACCA_L007_R1_001.fastq.gz.tengrand.fq.gz | data/JK2802_AGAATAACCTACCA_L007_R2_001.fastq.gz.tengrand.fq.gz | NA  |\n| JK2802      | JK2802     | 8    | 4                | SE      | Mammoth  | double       | full          | data/JK2802_AGAATAACCTACCA_L008_R1_001.fastq.gz.tengrand.fq.gz | NA                                                             | NA  
|\n\nwill have the following effects:\n\n* After AdapterRemoval, and prior to mapping, FASTQ files from lane 7 and lane 8 _with the same `SeqType`_ (and all other _metadata_ columns) will be concatenated together for each **Library**.\n* After mapping, and prior to BAM filtering, BAM files with different `SeqType` (but with all other metadata columns the same) will be merged together for each **Library**.\n* After duplicate removal, BAM files with different `Library_ID`s but with the same `Sample_Name` and the same `UDG_Treatment` will be merged together.\n* If BAM trimming is turned on, all post-trimming BAMs (i.e. non-UDG and half-UDG) will be merged with UDG-treated (untreated) BAMs, if they have the same `Sample_Name`.\n\nNote the following important points and limitations for setting up:\n\n* The TSV must use actual tabs (not spaces) between cells.\n* The input FASTQ filenames are discarded after FastQC; all other downstream results files are based on `Sample_Name`, `Library_ID` and `Lane` columns for filenames.\n* _File_ names must be unique regardless of file path, due to risk of over-writing (see: [https://github.com/nextflow-io/nextflow/issues/470](https://github.com/nextflow-io/nextflow/issues/470)).\n  * At different stages of the merging process (as above), nf-core/eager will use the information from the `Sample_Name`, `Library_ID` and/or `Lane` columns for output filenames.\n  * Library_IDs must be unique (other than if they are spread across multiple lanes). For example, your TSV file must not have rows with `Library1` in the `Library_ID` column for **both** `SampleA` and `SampleB` in the `Sample_Name` column, otherwise the two `Library1.fq.gz` files may result in a filename collision.\n  * If it is 'too late' and you already have duplicated FASTQ file names before starting a run, a workaround is to concatenate the FASTQ files together and supply this to an nf-core/eager run. 
The only downside is that you will not get independent FastQC results for each file.\n* Lane IDs must be unique for each sequencing of each library.\n  * If you have a library sequenced e.g. on Lane 8 of two HiSeq runs, you can give a fake lane ID (e.g. 20) for one of the FASTQs, and the libraries will still be processed correctly.\n  * This also applies to the SeqType column, i.e. with the example above, if one run is PE and one run is SE, you need to give fake lane IDs to one of the runs as well.\n* All _BAM_ files must be specified as `SE` under `SeqType`.\n  * You should provide a small decoy reference genome with pre-made indices, e.g. the human mtDNA or phiX genome, for the mandatory parameter `--fasta` in order to avoid long computational time for generating the index files of the reference genome, even if you do not actually need a reference genome for any downstream analyses.\n* nf-core/eager will only merge multiple _lanes_ of sequencing runs with the same single-end or paired-end configuration.\n* Accordingly nf-core/eager will not merge _lanes_ of FASTQs with BAM files (unless you use `--run_convertbam`), as only FASTQ files are lane-merged together.\n* nf-core/eager is able to correctly handle libraries that are sequenced multiple times on different sequencing configurations (i.e. mixtures of single- and paired-end data). These will be merged after mapping and considered 'paired-end' during downstream processes.\n  * **Important**: we do not recommend choosing to use DeDup (i.e. `--dedupper 'dedup'`) when mixing PE and SE data, as SE data will not necessarily have the correct end position of the read, and DeDup requires both ends of the molecule to remove a duplicate read. Therefore you may end up with inflated (false-positive) coverages due to suboptimal deduplication.\n  * When you wish to run PE/SE data together, the default `--dedupper markduplicates` is therefore preferred, as it only looks at the first position. While more conservative (i.e. 
it'll remove more reads even if not technically duplicates, because it assumes it can't see the true ends of molecules), it is more consistent.\n  * An error will be thrown if you try to merge both PE and SE and also supply `--skip_merging`.\n  * If you truly want to mix SE data and PE data but using mate-pair info for PE mapping, please run FASTQ preprocessing and mapping manually and supply BAM files for downstream processing by nf-core/eager.\n  * If you _regularly_ want to run the situation above, please leave a feature request on GitHub.\n* DamageProfiler, NuclearContamination, MTtoNucRatio and PreSeq are performed on each unique library separately after deduplication (but prior to same-treated library merging).\n* nf-core/eager functionality such as `--run_trim_bam` will be applied to only non-UDG (UDG_Treatment: none) or half-UDG (UDG_Treatment: half) libraries.\n* Qualimap is run on each sample, after merging of libraries (i.e. your values will reflect the values of all libraries combined - after being damage trimmed etc.).\n* Genotyping will be typically performed on each `sample` independently, as normally all libraries will have been merged together. However, if you have a mixture of single-stranded and double-stranded libraries, you will normally need to genotype separately. In this case you **must** give the SS and DS libraries _distinct_ `Sample_Name`s; otherwise you will receive a `file collision` error in steps such as `sexdeterrmine`, and then you will need to merge these yourself. We will consider changing this behaviour in the future if there is enough interest.\n\n## Clean up\n\nOnce a run has completed, you will have _lots_ of (some very large) intermediate\nfiles in your output directory. 
These are stored within the directory named\n`work`.\n\nAfter you have verified your run completed correctly and everything in the\nmodule output directories is present as you expect and need, you can perform a\nclean-up.\n\n> **Important**: Once clean-up is completed, you will _not_ be able to re-run\n> the pipeline from an earlier step and you'll have to re-run from scratch.\n\nWhile in your output directory, firstly verify you're only deleting files stored\nin `work/` with the dry run command:\n\n```bash\nnextflow clean -n\n```\n\n> :warning: some institutional profiles already have clean-up on successful run\n> completion turned on by default.\n\nIf you're ready, you can then remove the files with\n\n```bash\nnextflow clean -f -k\n```\n\nThis will make your system administrator very happy as you will _halve_ the\nhard drive footprint of the run, so be sure to do this!\n\n## Troubleshooting and FAQs\n\n### I get a file name collision error during merging\n\nWhen using TSV input, nf-core/eager will attempt to merge all `Lanes` of a\n`Library_ID`, or all files with the same `Library_ID` or `Sample_Name`. However,\nif you have specified the same `Lane` or `Library_ID` for two sets of FASTQ\nfiles you will likely receive an error such as\n\n```bash\nError executing process > 'library_merge (JK2782)'\nCaused by:\n  Process `library_merge` input file name collision -- There are multiple input files for each of the following file names: JK2782.mapped_rmdup.bam.csi, JK2782.mapped_rmdup.bam\nTip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`\nExecution cancelled -- Finishing pending tasks before exit\n```\n\nIn this case: for lane merging errors, you can give 'fake' lane IDs to ensure\nthey are unique (e.g. if one library was sequenced on Lane 8 of two HiSeq runs,\nspecify lanes as 8 and 16 for each FASTQ file respectively). 
For library merging\nerrors, you must modify your `Library_ID`s accordingly, to make them unique.\n\n### A library or sample is missing in my MultiQC report\n\nIn some cases, a particular tool may not produce an output log for MultiQC. The affected sample will then not be displayed.\n\nKnown cases include:\n\n* Qualimap: there will be no MultiQC output if the BAM file is empty. An empty BAM file is produced when no reads map to the reference and causes Qualimap to crash - this crash is ignored by nf-core/eager (to allow the rest of the pipeline to continue), so there will be no log file for that particular sample/library.\n\n## Tutorials\n\n### Tutorial - How to investigate a failed run\n\nAs with most pipelines, nf-core/eager can sometimes fail, either through a\nproblem with the pipeline itself or through an issue with the\nprogram being run at the given step.\n\nTo help try and identify what has caused the error, you can perform the\nfollowing steps before reporting the issue:\n\n#### 1a Nextflow reports an 'error executing process' with command error\n\nFirstly, take a moment to read the terminal output that is printed by an\nnf-core/eager command.\n\nWhen reading the following, you can see that the actual _command_ failed. When\nyou get this error, this would suggest that an actual program used by the\npipeline has failed. 
This is identifiable when you get an `exit status` and a\n`Command error:`, the latter of which is what is reported by the failed program\nitself.\n\n```bash\nERROR ~ Error executing process > 'circulargenerator (hg19_complete_500.fasta)'\n\nCaused by:\n  Process `circulargenerator (hg19_complete_500.fasta)` terminated with an error exit status (1)\n\nCommand executed:\n\n  circulargenerator -e 500 -i hg19_complete.fasta -s MT\n  bwa index hg19_complete_500.fasta\n\nCommand exit status:\n  1\n\nCommand output:\n  (empty)\n\nCommand error:\n  Exception in thread \"main\" java.lang.OutOfMemoryError: Java heap space\n        at java.util.Arrays.copyOf(Arrays.java:3332)\n        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)\n        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)\n        at java.lang.StringBuffer.append(StringBuffer.java:270)\n        at CircularGenerator.extendFastA(CircularGenerator.java:155)\n        at CircularGenerator.main(CircularGenerator.java:119)\n\nWork dir:\n  /projects1/microbiome_calculus/RIII/03-preprocessing/mtCap_preprocessing/work/7f/52f33fdd50ed2593d3d62e7c74e408\n\nTip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`\n\n -- Check '.nextflow.log' file for details\n```\n\nIf you find it is a common error try and fix it yourself by changing your\noptions in your nf-core/eager run - it could be a configuration error on your\npart. However in some cases it could be an error in the way we've set up the\nprocess in nf-core/eager.\n\nTo further investigate, go to step 2.\n\n#### 1b Nextflow reports an 'error executing process' with no command error\n\nAlternatively, you may get an error with Nextflow itself. 
The most common one\nis a 'process fails' error, which looks like the following.\n\n```bash\nError executing process > 'library_merge (JK2782)'\nCaused by:\n  Process `library_merge` input file name collision -- There are multiple input files for each of the following file names: JK2782.mapped_rmdup.bam.csi, JK2782.mapped_rmdup.bam\nTip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`\nExecution cancelled -- Finishing pending tasks before exit\n```\n\nHowever, in this case there is no `exit status` or `Command error:` message,\nwhich indicates a Nextflow issue.\n\nThe example above occurred because a user specified multiple sequencing runs of\ndifferent libraries but with the same library name. In this case Nextflow could\nnot identify which is the correct file to merge because they have the same name.\n\nThis again can also be a user or Nextflow error, but the errors are often more\nabstract and less clear how to solve (unless you are familiar with Nextflow).\n\nTry to investigate a bit further and see if you can understand what the error\nrefers to, but if you cannot - please ask on the #eager channel on the [nf-core\nslack](https://nf-co.re/join/slack) or leave a [GitHub\nissue](https://github.com/nf-core/eager/issues).\n\n#### 2 Investigating a failed process's `work/` directory\n\nIf you haven't found a clear solution to the failed process from the reported\nerrors, you can next go into the directory where the process was working in,\nand investigate the log and error messages that are produced by each command of\nthe process.\n\nFor example, in the error in\n[1a](#1a-nextflow-reports-an-error-executing-process-with-command-error) you can\nsee the following line\n\n```bash\nWork dir:\n  /projects1/microbiome_calculus/RIII/03-preprocessing/mtCap_preprocessing/work/7f/52f33fdd50ed2593d3d62e7c74e408\n```\n\n> A shortened version of the 'hash' directory ID can also be seen in your\n> 
terminal while the pipeline is running in the square brackets at the beginning\n> of each line.\n\nIf you change into this with `cd` and run `ls -la` you should see a collection\nof normal files, symbolic links (symlinks) and hidden files (indicated with `.`\nat the beginning of the file name).\n\n* Symbolic links are typically input files from previous processes.\n* Normal files are typically successfully completed output files from some of\n  the commands in the process.\n* Hidden files are Nextflow generated files and include the submission commands\n  as well as log files.\n\nWhen a run has failed, you can firstly check the contents of the output\nfiles to see if they are empty or not (e.g. with `cat` or `zcat`),\ninterpretation of which will depend on the program and thus on your knowledge\nof it.\n\nNext, you can investigate `.command.err` and `.command.out`, or `.command.log`.\nThese contain the standard error and standard output (in the case of\n`.command.log`, both combined)\nof all the commands/programs in the process - i.e. what would be printed to\nscreen if you were running the command/program yourself. Again, view these with\ne.g. `cat` and see if you can identify the error of the program itself.\n\nFinally, you can also try running the commands _yourself_. You can firstly try\nto do this by loading your given nf-core/eager environment (e.g. `singularity\nshell /\\<path\\>/\\<to\\>/nf-core-eager-X-X-X.img` or `conda activate\nnf-core-eager-X.X.X`), then running `bash .command.sh`.\n\nIf this doesn't work, this suggests either there is something wrong with the\nnf-core/eager environment configuration, _or_ there is still a problem with the\nprogram itself. To confirm the former, try running the command within the\n`.command.sh` file (viewable with `cat`) but with locally installed versions of\nprograms you may already have on your system. If the command still doesn't work,\nit is a problem with the program or your specified configuration. 
If it does\nwork locally, please report as a [GitHub\nissue](https://github.com/nf-core/eager/issues).\n\nIf it does not, please ask the developer of the tool (although we will endeavour to\nhelp as much as we can via the [nf-core slack](https://nf-co.re/join/slack) in\nthe #eager channel).\n\n### Tutorial - What are profiles and how to use them\n\n#### Tutorial Profiles - Background\n\nA useful feature of Nextflow is the ability to use configuration _profiles_ that\ncan specify many default parameters and other settings on how to run your\npipeline.\n\nFor example, you can use them to set your preferred mapping parameters, or specify\nwhere to keep Docker, Singularity or Conda environments, and which cluster\nscheduling system (and queues) your pipeline runs should normally use.\n\nThese are defined in `.config` files, and these in turn can contain different\nprofiles that can define parameters for different contexts.\n\nFor example, a `.config` file could contain two profiles, one for\nshallow-sequenced samples that uses only a small number of CPUs and memory e.g.\n`small`, and another for deep sequencing data, `deep`, that allows larger\nnumbers of CPUs and memory. 
As another example you could define one profile\ncalled `loose` that contains mapping parameters to allow reads with aDNA damage\nto map, and then another called `strict` that reduces the likelihood of damaged\nDNA mapping and causing false positive SNP calls.\n\nWithin nf-core, there are two main levels of configs:\n\n* Institutional-level profiles: these normally define things like paths to\n  common storage, resource maximums, scheduling system\n* Pipeline-level profiles: these normally define parameters specifically for a\n  pipeline (such as mapping parameters, turning specific modules on or off)\n\nAs well as allowing more efficiency and control at cluster or Institutional\nlevels in terms of memory usage, pipeline-level profiles can also assist in\nfacilitating reproducible science by giving a way for researchers to 'publish'\ntheir exact pipeline parameters in a way that lets other users automatically re-run the\npipeline with the pipeline parameters used in the original publication but on\ntheir _own_ cluster.\n\nTo illustrate this, let's say we analysed our data on an HPC called 'blue' for\nwhich an institutional profile already exists, and for our analysis we defined a\nprofile called 'old_dna'. We would have run our pipeline with the following\ncommand:\n\n```bash\nnextflow run nf-core/eager -c old_dna_profile.config -profile hpc_blue,old_dna <...>\n```\n\nNow suppose a colleague wishes to recreate your results. 
As long as the\n`old_dna_profile.config` was published alongside your results, they can run the\nsame pipeline settings but on their own cluster HPC 'purple'.\n\n```bash\nnextflow run nf-core/eager -c old_dna_profile.config -profile hpc_purple,old_dna <...>\n```\n\n(where the `old_dna` profile is defined in `old_dna_profile.config`, and\n`hpc_purple` is defined on nf-core/configs)\n\nThis tutorial will describe how to create and use profiles that can be used by,\nor obtained from, other researchers.\n\n#### Tutorial Profiles - Inheritance Rules\n\n##### Tutorial Profiles - Profiles\n\nAn important thing to understand before you start writing your own profile is\nthe 'inheritance' of profiles when specifying multiple profiles with\n`nextflow run`.\n\nWhen specifying multiple profiles, parameters defined in the profile in the\nfirst position will be overwritten by those in the second, and everything defined in the\nfirst and second will be overwritten by everything in a third.\n\nThis can be illustrated as follows.\n\n```bash\n              overwrites  overwrites\n               ┌──────┐   ┌──────┐\n               ▼      │   ▼      │\n-profile institution,cluster,my_paper\n```\n\nThis would be translated as follows.\n\nIf your parameters looked like the following\n\n| Parameter       | Resolved Parameters    | institution | cluster  | my_paper |\n| ----------------|------------------------|-------------|----------|----------|\n| --executor      | singularity            | singularity | \\<none\\> | \\<none\\> |\n| --max_memory    | 256GB                  | 756GB       | 256GB    | \\<none\\> |\n| --bwa_aln       | 0.1                    | \\<none\\>    | 0.01     | 0.1      |\n\n(where '\\<none\\>' is a parameter not defined in a given profile.)\n\nYou can see that the resolved value is the `0.1` from `my_paper`, which\noverwrote the `0.01` defined in the `cluster` profile.\n\n> :warning: You must always check if parameters are defined in any 'upstream'\n> profiles that have been set 
by profile administrators that you may be unaware\n> of. This is to make sure there are no unwanted or unreported 'defaults' away from\n> original nf-core/eager defaults.\n\n##### Tutorial Profiles - Configuration Files\n\n> :warning: This section is only needed for users that want to set up\n> institutional-level profiles. Otherwise please skip to [Writing your own profile](#tutorial-profiles---writing-your-own-profile)\n\nIn actuality, an nf-core/eager run already contains many configs and profiles,\nand will normally use _multiple_ configs and profiles in a single run. Multiple\nconfiguration and profile files can be used, and each new one selected will\ninherit all the previous one's parameters, and the parameters in the new one\nwill then overwrite any that have been changed from the original.\n\nThis can be visualised here\n\n<p align=\"center\">\n  <img src=\"images/tutorials/profiles/config_profile_inheritence.png\" width=\"75%\" height = \"75%\">\n</p>\n\nUsing the example given in the [background](#tutorial-profiles---background),\nsuppose the `hpc_blue` profile has the following pipeline parameters set:\n\n```txt\n<...>\nmapper = 'bwamem'\ndedupper = 'markduplicates'\n<...>\n```\n\nHowever, the profile `old_dna` has only the following parameter:\n\n```txt\n<...>\nmapper = 'bwaaln'\n<...>\n```\n\nThen we run the pipeline with the profiles in the order of the following run\ncommand:\n\n```bash\nnextflow run nf-core/eager -c old_dna_profile.config -profile hpc_blue,old_dna <...>\n```\n\nIn the background, any parameters in the pipeline's `nextflow.config`\n(containing default parameters) will be overwritten by the\n`old_dna_profile.config`. 
In addition, the `old_dna` _profile_ will overwrite\nany parameters set in the config but outside the profile definition of\n`old_dna_profile.config`.\n\nTherefore, the final profile used by your given run would look like:\n\n```txt\n<...>\nmapper = 'bwaaln'\ndedupper = 'markduplicates'\n<...>\n```\n\nYou can see here that `markduplicates` has not changed as originally defined in\nthe `hpc_blue` profile, but the `mapper` parameter has been changed from\n`bwamem` to `bwaaln`, as specified in the `old_dna` profile.\n\nThe order of loading of different configuration files can be seen here:\n\n| Loading Order | Configuration File                                                                                              |\n| -------------:|:----------------------------------------------------------------------------------------------------------------|\n| 1             | `nextflow.config` in your current directory                                                                     |\n| 2             | (if using a script for `nextflow run`) a `nextflow.config` in the directory the script is located               |\n| 3             | `config` stored in your home directory under `~/.nextflow/`                                                     |\n| 4             | `<your_file>.config` if you specify in the `nextflow run` command with `-c`                                     |\n| 5             | general nf-core institutional configurations stored at [nf-core/configs](https://github.com/nf-core/configs)    |\n| 6             | pipeline-specific nf-core institutional configurations at [nf-core/configs](https://github.com/nf-core/configs) |\n\nThis loading order of these `.config` files will not normally affect the\nsettings you use for the pipeline run itself; the profiles given with `-profile` are normally more\nimportant. 
However this is good to keep in mind when you need to debug profiles\nif your run does not use the parameters you expect.\n\n> :warning: It is also possible to ignore every other configuration file when\n> specifying a custom `.config` file by using `-C` (capital C) instead of `-c`\n> (which inherits previously specified parameters)\n\nAnother thing that is important to note is that if a specific _profile_ is\nspecified in `nextflow run`, this replaces any 'global' parameter that is\nspecified within the config file (but outside a profile) itself - **regardless**\nof profile order (see above).\n\nFor example, see the example adapted from the SHH nf-core/eager\npipeline-specific\n[configuration](https://github.com/nf-core/configs/blob/master/conf/pipeline/eager/shh.config).\n\nThis pipeline-specific profile is automatically loaded if nf-core/eager detects\nwe are running eager, and that we specified the profile as `shh`.\n\n```txt\n// global 'fallback' parameters\nparams {\n  // Specific nf-core/configs params\n  config_profile_contact = 'James Fellows Yates (@jfy133)'\n  config_profile_description = 'nf-core/eager SHH profile provided by nf-core/configs'\n\n  // default BWA\n  bwaalnn = 0.04\n  bwaalnl = 32\n}\n\n// profile specific parameters\nprofiles {\n  pathogen_loose {\n    params {\n      config_profile_description = 'Pathogen (loose) MPI-SHH profile, provided by nf-core/configs.'\n      bwaalnn = 0.01\n      bwaalnl = 16\n    }\n  }\n}\n\n```\n\nIf you run with `nextflow run -profile shh` to specify to use an\ninstitutional-level nf-core config, the parameters will be read as `--bwaalnn\n0.04` and `--bwaalnl 32` as these are the default 'fall back' params as\nindicated in the example above.\n\nIf you specify as `nextflow run -profile shh,pathogen_loose`, as expected\nNextflow will resolve the two parameters as `0.01` and `16`.\n\nImportantly however, if you specify `-profile pathogen_loose,shh` the\n`pathogen_loose` **profile** will **still** take 
precedence over just the\n'global' params.\n\nEqually, a **process**-level defined parameter (within the nf-core/eager code\nitself) will take precedence over the fallback parameters in the `config` file.\nThis is also described in the Nextflow documentation\n[here](https://www.nextflow.io/docs/latest/config.html#config-profiles).\n\nThis is because selecting a `profile` will always take precedence over the\nvalues specified in a config file, but outside of a profile.\n\n#### Tutorial Profiles - Writing your own profile\n\nWe will now provide an example of how to write, use and share a project-specific\nprofile. We will use the example of [Andrades Valtueña et al.\n2016](https://doi.org/10.1016/j.cub.2017.10.025).\n\nIn it they used the original EAGER (v1) to map ancient DNA to the\ngenome of the bacterium **Yersinia pestis**.\n\nNow we will generate a profile that, if they were using nf-core/eager, they\ncould share with other researchers.\n\nIn the methods they described the following:\n\n> ... reads mapped to **Y. pestis** CO92 reference with BWA aln (-l 16, -n 0.01,\n> hereby referred to as non-UDG parameters). Reads with mapping quality scores\n> lower than 37 were filtered out. PCR duplicates were removed with\n> MarkDuplicates.\n\nFurthermore, in their 'Table 1' they say they used the NCBI **Y. pestis** genome\n'NC_003143.1', which can be found on the NCBI FTP server at:\n[https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/009/065/GCF_000009065.1_ASM906v1/GCF_000009065.1_ASM906v1_genomic.fna.gz](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/009/065/GCF_000009065.1_ASM906v1/GCF_000009065.1_ASM906v1_genomic.fna.gz)\n\nTo make a profile with these parameters for use with nf-core/eager we first need\nto open a text editor, and define a Nextflow 'profile' block.\n\n```txt\nprofiles {\n\n}\n\n```\n\nNext we need to define the name of the profile. This is what we would write in\n`-profile`. 
Let's call this `AndradesValtuena2018`.\n\n```txt\nprofiles {\n  AndradesValtuena2018 {\n\n  }\n}\n```\n\nNow we need to make a `params` 'scope'. This means these are the parameters you\nspecifically pass to nf-core/eager itself (rather than Nextflow configuration\nparameters).\n\nYou should generally not add [non-`params`\nscopes](https://www.nextflow.io/docs/latest/config.html?highlight=profile#config-scopes)\nin profiles for a specific project. This is because these will normally modify\nthe way the pipeline will run on the computer (rather than just nf-core/eager\nitself, e.g. the scheduler/executor or maximum memory available), and thus not\nallow other researchers to reproduce your analysis on their own\ncomputer/clusters.\n\n```txt\nprofiles {\n  AndradesValtuena2018 {\n    params {\n\n    }\n  }\n}\n```\n\nNow, as a cool little trick, we can use a couple of nf-core specific parameters\nthat can help you keep track of which profile you are using when running the\npipeline. The `config_profile_description` and `config_profile_contact` parameters\nare displayed in the console log when running the pipeline, so you can use these\nto check if your profile loaded as expected. These are free text fields, so you\ncan put what you like.\n\n```txt\nprofiles {\n  AndradesValtuena2018 {\n    params {\n        config_profile_description = 'non-UDG parameters used in Andrades Valtuena et al. 2018 Curr. Bio.'\n        config_profile_contact = 'Aida Andrades Valtueña (@aidaanva)'\n    }\n  }\n}\n```\n\nNow we can add the specific nf-core/eager parameters that will modify the\nmapping and deduplication parameters in nf-core/eager.\n\n```txt\nprofiles {\n  AndradesValtuena2018 {\n    params {\n        config_profile_description = 'non-UDG parameters used in Andrades Valtuena et al. 2018 Curr. 
Bio.'\n        config_profile_contact = 'Aida Andrades Valtueña (@aidaanva)'\n        fasta = 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/009/065/GCF_000009065.1_ASM906v1/GCF_000009065.1_ASM906v1_genomic.fna.gz'\n        bwaalnn = 0.01\n        bwaalnl = 16\n        run_bam_filtering = true\n        bam_mapping_quality_threshold = 37\n        dedupper = 'markduplicates'\n    }\n  }\n}\n```\n\nOnce filled in, we can save the file as `AndradesValtuena2018.config`. You\ncan use this yourself, or upload it alongside your publication for others to use.\n\nTo use the profile you just need to specify the file containing the profile you\nwish to use, and then the profile itself.\n\nFor example, Aida (Andrades Valtueña) at the MPI-SHH (`shh`) in Jena could run\nthe following:\n\n```bash\nnextflow run nf-core/eager -c /<path>/<to>/AndradesValtuena2018.config -profile shh,AndradesValtuena2018 --input '/<path>/<to>/<some_input>/' <...>\n```\n\nThen a colleague at a different institution, such as the SciLifeLab, could run\nthe same profile on the UPPMAX cluster in Uppsala with:\n\n```bash\nnextflow run nf-core/eager -c /<path>/<to>/AndradesValtuena2018.config -profile uppmax,AndradesValtuena2018 --input '/<path>/<to>/<some_input>/' <...>\n```\n\nAnd that's all there is to it. Of course you should always check that no other\n'default' parameters for your given pipeline are defined in any\npipeline-specific or institutional profiles. 
This ensures that someone\nre-running the pipeline with your settings is as close to the nf-core/eager\ndefaults as possible, and only settings specific to your given project are used.\nIf there are 'upstream' defaults, you should explicitly specify these in your\nproject profile.\n\n### Tutorial - How to set up nf-core/eager for human population genetics\n\n#### Tutorial Human Pop-Gen - Introduction\n\nThis tutorial will give a basic example on how to set up nf-core/eager to\nperform initial screening of samples in the context of ancient human population\ngenetics research.\n\n> :warning: this tutorial does not describe how to install and set up\n> nf-core/eager. For this please see other documentation on the\n> [nf-co.re](https://nf-co.re/usage/installation) website.\n\nWe will describe how to set up mapping of ancient sequences against the human\nreference genome to allow sequencing and library quality-control, estimation of\nnuclear contamination, genetic sex determination, and production of random draw\ngenotypes in eigenstrat format for a specific set of sites, to be used in\nfurther analysis. For this example, I will be using the 1240k SNP set. This SNP\nset was first described in [Mathieson et al.\n2015](https://www.nature.com/articles/nature16152) and contains various\npositions along the genome that have been extensively genotyped in present-day\nand ancient populations, and are therefore useful for ancient population genetic\nanalysis. Capture techniques are often used to enrich DNA libraries for\nfragments that overlap these SNPs, and we assume this has been done in\nthis example.\n\n> :warning: Please be aware that the settings used in this tutorial may not use\n> settings nor produce files you would actually use in 'real' analysis. The\n> settings are only specified for demonstration purposes. 
Please consult\n> your colleagues, communities and the literature for optimal parameters.\n\n#### Tutorial Human Pop-Gen - Preparation\n\nPrior to setting up the nf-core/eager run, we will need:\n\n1. Raw sequencing data in FASTQ format\n2. Reference genome in FASTA format, with associated pre-made `bwa`, `samtools`\n   and `picard SequenceDictionary` indices (however note these can be made for\n   you with nf-core/eager, but this can make a pipeline run take much longer!)\n3. A BED file with the positions of the sites of interest.\n4. An eigenstrat formatted `.snp` file for the positions of interest.\n\nWe should also ensure we have the very latest version of the nf-core/eager\npipeline so we have all the latest bug fixes etc. In this case we will be using\nnf-core/eager version 2.2.0. You should always check on the\n[nf-core](https://nf-co.re/eager) website whether a newer release has been made\n(particularly point releases e.g. 2.2.1).\n\n```bash\nnextflow pull nf-core/eager -r 2.2.0\n```\n\nIt is important to note that if you are planning on running multiple runs of\nnf-core/eager for a given project, the version should be **kept the same**\nfor all runs to ensure consistency in settings for all of your libraries.\n\n#### Tutorial Human Pop-Gen - Inputs and Outputs\n\nTo start, let's make a directory where all your nf-core/eager related files for\nthis run will go, and change into it.\n\n```bash\nmkdir projectX_preprocessing20200727\ncd projectX_preprocessing20200727\n```\n\nThe first part of constructing any nf-core/eager run is specifying a few generic\nparameters that will often be common across all runs. This will be which\npipeline, version and _profile_ we will use. 
We will also specify a unique name\nfor the run to help us keep track of all the nf-core/eager runs you may be\nrunning.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n<...>\n```\n\nFor the `-profile` parameter, I have indicated that I wish to use Singularity as\nmy software container environment, and I will use the MPI-SHH institutional\nconfig as listed on\n[nf-core/configs](https://github.com/nf-core/configs/blob/master/conf/shh.config).\nThese profiles specify settings\noptimised for the specific cluster/institution, such as maximum memory available\nor which scheduler queues to submit to. More explanations about configs and\nprofiles can be seen in the [nf-core\nwebsite](https://nf-co.re/usage/configuration) and the [profile\ntutorial](#tutorial---what-are-profiles-and-how-to-use-them).\n\nNext we need to specify our input data. nf-core/eager can accept input FASTQ\nfiles in two main ways, either with direct paths to files (with wildcards), or\nwith a Tab-Separated Value (TSV) file which contains the paths and extra\nmetadata. In this example, we will use the TSV method to simulate a\nrealistic use-case, such as receiving paired-end data from an Illumina NextSeq\nof double-stranded libraries. 
Illumina NextSeqs sequence a given library across\nfour different 'lanes', so for each library you will receive four FASTQ files.\nThe TSV input method is more useful for this context, as it allows 'merging' of\nthese lanes after preprocessing and prior to mapping (whereas direct paths will\nconsider each pair of FASTQ files as independent libraries/samples).\n\nOur TSV file will look something like the following:\n\n```bash\nSample_Name     Library_ID      Lane    Colour_Chemistry        SeqType Organism        Strandedness    UDG_Treatment   R1      R2      BAM\nEGR001  EGR001.B0101.SG1        1       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L001_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L001_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        2       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L002_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L002_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        3       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L003_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L003_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        4       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L004_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L004_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        5       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L001_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L001_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        6       2       PE      homo_sapiens    double  half    
../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L002_R1_001.fastq.gz 
../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L002_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        7       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L003_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L003_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        8       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L004_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L004_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        1       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L001_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L001_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        2       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L002_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L002_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        3       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L003_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L003_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        4       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L004_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L004_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        5       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L001_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L001_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        6       2       PE      homo_sapiens    double  half    
../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L002_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L002_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        7       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L003_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L003_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        8       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L004_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L004_R2_001.fastq.gz NA\n```\n\nYou can see that we have a single line for each pair of FASTQ files representing\neach `Lane`, but the `Sample_Name` and `Library_ID` columns identify and group\nthem together accordingly. Secondly, as we have NextSeq data, we have specified\n`2` for `Colour_Chemistry`, which is important for downstream processing\n(see below). See the nf-core/eager\nparameter documentation above for more specifications on how to set up a\nTSV file (e.g. why despite NextSeqs\nonly having 4 lanes, we go up to 8 in the example above).\n\nAlongside our input TSV file, we will also specify the paths to our reference\nFASTA file and the corresponding indices.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n<...>\n```\n\nWe specify the paths to the reference genome and its corresponding tool-specific\nindices. Paths should always be encapsulated in quotes to ensure Nextflow\nevaluates them, rather than your shell! 
Also note that as `bwa` generates\nmultiple index files, nf-core/eager takes a _directory_ that must contain these\nindices instead.\n\n> Note the difference between single and double `-` parameters. The former\n> represent Nextflow flags, while the latter are nf-core/eager specific flags.\n\nFinally, we can also specify the output directory and the Nextflow `work/`\ndirectory (which contains 'intermediate' working files and directories).\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n<...>\n```\n\n#### Tutorial Human Pop-Gen - Pipeline Configuration\n\nNow that we have specified the input data, we can move on to specifying\nsettings for each different module we will be running. As mentioned above, we\nare pretending to run with NextSeq data, which is generated with a two-colour\nimaging technique. What this means is when you have shorter molecules than the\nnumber of cycles of the sequencing chemistry, the sequencer will repeatedly see\n'G' calls (no colour) at the last few cycles, and you get long poly-G 'tails' on\nyour reads. We therefore will turn on the poly-G clipping functionality offered\nby [`fastp`](https://github.com/OpenGene/fastp), and any pairs of files\nindicated in the TSV file as having `2` in the `Colour_Chemistry` column will be\npassed to `fastp`. 
We will not change the default minimum length of a poly-G\nstring to be clipped.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n<...>\n```\n\nSince our input data is paired-end, we will be using `DeDup` for duplicate\nremoval, which takes into account both the start and end of a merged read before\nflagging it as a duplicate. To ensure this works properly we first need\nto disable base quality trimming of collapsed reads within AdapterRemoval. To\ndo this, we will provide the option `--preserve5p`. Additionally, `DeDup` should\nonly be provided with merged reads, so we will need to provide the option\n`--mergedonly` here as well. We can then specify which dedupper we want to use\nwith `--dedupper`.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--preserve5p \\\n--mergedonly \\\n--dedupper 'dedup' \\\n<...>\n```\n\nWe then need to specify the mapping parameters for this run. The default mapping\nparameters of nf-core/eager are fine for the purposes of our run. 
Personally, I\nlike to set `--bwaalnn` to `0.01` (down from the default `0.04`), which reduces\nthe stringency in the number of allowed mismatches between the aligned sequences\nand the reference.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--preserve5p \\\n--mergedonly \\\n--dedupper 'dedup' \\\n--bwaalnn 0.01 \\\n<...>\n```\n\nWe may also want to remove ambiguous sequences from our alignments, and also\nremove off-target reads to speed up downstream processing (and reduce your\nhard-disk footprint). We can do this with the samtools filter module to set a\nmapping-quality filter (e.g. with a value of `25` to retain only slightly\nambiguous alignments that might occur from damage), and to indicate to discard\nunmapped reads.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--preserve5p \\\n--mergedonly \\\n--dedupper 'dedup' \\\n--bwaalnn 0.01 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 25 \\\n--bam_unmapped_type 'discard' \\\n<...>\n```\n\nNext, we will set up trimming of the mapped reads to alleviate the effects of DNA\ndamage during genotyping. To do this we will activate trimming with\n`--run_trim_bam`. The libraries in this example underwent 'half' UDG treatment. 
This\nwill generally restrict all remaining DNA damage to the first 2 base pairs of a\nfragment. We will therefore use `--bamutils_clip_double_stranded_half_udg_left` and\n`--bamutils_clip_double_stranded_half_udg_right` to trim 2bp on either side of each fragment.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--preserve5p \\\n--mergedonly \\\n--dedupper 'dedup' \\\n--bwaalnn 0.01 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 25 \\\n--bam_unmapped_type 'discard' \\\n--run_trim_bam \\\n--bamutils_clip_double_stranded_half_udg_left 2 \\\n--bamutils_clip_double_stranded_half_udg_right 2 \\\n<...>\n```\n\nTo activate human sex determination (using\n[Sex.DetERRmine.py](https://github.com/TCLamnidis/Sex.DetERRmine)) we will\nprovide the option `--run_sexdeterrmine`. Additionally, we will provide\nsexdeterrmine with the BED file of our SNPs of interest using the\n`--sexdeterrmine_bedfile` flag. 
Here I will use the 1240k SNP set as an example.\nThis will cut down on computational time and also provide us with an\nerror bar around the relative coverage on the X and Y chromosomes.\nIf you wish to use the same bedfile to follow along with this tutorial,\nyou can download the file from [here](https://github.com/nf-core/test-datasets/blob/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz).\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--preserve5p \\\n--mergedonly \\\n--dedupper 'dedup' \\\n--bwaalnn 0.01 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 25 \\\n--bam_unmapped_type 'discard' \\\n--run_trim_bam \\\n--bamutils_clip_double_stranded_half_udg_left 2 \\\n--bamutils_clip_double_stranded_half_udg_right 2 \\\n--run_sexdeterrmine \\\n--sexdeterrmine_bedfile '../Reference/genome/1240k.sites.bed' \\\n<...>\n```\n\nSimilarly, we will activate nuclear contamination estimation with\n`--run_nuclear_contamination`. This process requires us to also specify the\ncontig name of the X chromosome in the reference genome we are using with\n`--contamination_chrom_name`. 
Here, we are using hs37d5, where the X chromosome\nis simply named 'X'.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--preserve5p \\\n--mergedonly \\\n--dedupper 'dedup' \\\n--bwaalnn 0.01 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 25 \\\n--bam_unmapped_type 'discard' \\\n--run_trim_bam \\\n--bamutils_clip_double_stranded_half_udg_left 2 \\\n--bamutils_clip_double_stranded_half_udg_right 2 \\\n--run_sexdeterrmine \\\n--sexdeterrmine_bedfile '../Reference/genome/1240k.sites.bed' \\\n--run_nuclear_contamination \\\n--contamination_chrom_name 'X' \\\n<...>\n```\n\nBecause nuclear contamination estimates can only be provided for males, it is\npossible that we will need to get mitochondrial DNA contamination estimates for\nany females in our dataset. This cannot be done within nf-core/eager (v2.2.0)\nand we will need to do this manually at a later time. However, mtDNA\ncontamination estimates have been shown to only be reliable for nuclear\ncontamination when the ratio of mitochondrial to nuclear reads is low\n([Furtwängler et al. 2018](https://doi.org/10.1038/s41598-018-32083-0)). We can\nhave nf-core/eager calculate that ratio for us with `--run_mtnucratio`, and\nproviding the name of the mitochondrial DNA contig in our reference genome with\n`--mtnucratio_header`. 
Within hs37d5, the mitochondrial contig is named 'MT'.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--preserve5p \\\n--mergedonly \\\n--dedupper 'dedup' \\\n--bwaalnn 0.01 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 25 \\\n--bam_unmapped_type 'discard' \\\n--run_trim_bam \\\n--bamutils_clip_double_stranded_half_udg_left 2 \\\n--bamutils_clip_double_stranded_half_udg_right 2 \\\n--run_sexdeterrmine \\\n--sexdeterrmine_bedfile '../Reference/genome/1240k.sites.bed' \\\n--run_nuclear_contamination \\\n--contamination_chrom_name 'X' \\\n--run_mtnucratio \\\n--mtnucratio_header 'MT' \\\n<...>\n```\n\nFinally, we need to specify genotyping parameters. First, we need to activate\ngenotyping with `--run_genotyping`. It is also important to specify we wish to\nuse the **trimmed** data for genotyping, to avoid the effects of DNA damage. To\ndo this, we will specify the `--genotyping_source` as `'trimmed'`. Then we can\nspecify the genotyping tool to use with `--genotyping_tool`. We will be using\n`'pileupcaller'` to produce random draw genotypes in eigenstrat format. 
For this\nprocess we will need to specify a BED file of the sites of interest (the same as\nbefore) with `--pileupcaller_bedfile`, as well as an eigenstrat formatted `.snp`\nfile of these sites that is specified with `--pileupcaller_snpfile`.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/hs37d5.fa' \\\n--bwa_index '../Reference/genome/hs37d5/' \\\n--fasta_index '../Reference/genome/hs37d5.fa.fai' \\\n--seq_dict '../Reference/genome/hs37d5.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--preserve5p \\\n--mergedonly \\\n--dedupper 'dedup' \\\n--bwaalnn 0.01 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 25 \\\n--bam_unmapped_type 'discard' \\\n--run_trim_bam \\\n--bamutils_clip_double_stranded_half_udg_left 2 \\\n--bamutils_clip_double_stranded_half_udg_right 2 \\\n--run_sexdeterrmine \\\n--sexdeterrmine_bedfile '../Reference/genome/1240k.sites.bed' \\\n--run_nuclear_contamination \\\n--contamination_chrom_name 'X' \\\n--run_mtnucratio \\\n--mtnucratio_header 'MT' \\\n--run_genotyping \\\n--genotyping_source 'trimmed' \\\n--genotyping_tool 'pileupcaller' \\\n--pileupcaller_bedfile '../Reference/genome/1240k.sites.bed' \\\n--pileupcaller_snpfile '../Datasets/1240k/1240k.snp'\n```\n\nWith this, we are ready to submit! 
If running on a remote cluster/server, make\nsure to run this in a `screen` session or similar, so that if you get an `ssh`\nsignal drop or want to log off, Nextflow will not crash.\n\n#### Tutorial Human Pop-Gen - Results\n\nAssuming the run completed without any crashes (if problems do occur, check\nagainst [parameters](https://nf-co.re/eager/parameters) that all parameters are as expected, or\ncheck the [FAQ](#troubleshooting-and-faqs)), we can now check our results in\n`results/`.\n\n##### Tutorial Human Pop-Gen - MultiQC Report\n\nIn here there are many different directories containing different output files.\nThe first directory to check is the `MultiQC/` directory. You should\nfind a `multiqc_report.html` file. You will need to view this in a web browser,\nso I recommend either mounting your server to your file browser, or downloading\nit to your own local machine (PC/Laptop etc.).\n\nOnce you've opened this you can go through each section and evaluate all the\nresults. You will likely want to check these for artefacts (e.g. weird damage\npatterns on the human DNA, or weirdly skewed coverage distributions).\n\nFor example, I normally look for things like:\n\nGeneral Stats Table:\n\n* Do I see the expected number of raw sequencing reads (summed across each set\n  of FASTQ files per library) that was requested for sequencing?\n* Does the percentage of trimmed reads look normal for aDNA, and do lengths\n  after trimming look short as expected of aDNA?\n* Does ClusterFactor or 'Dups' look high (e.g. >2 or >10% respectively)\n  suggesting over-amplified or badly preserved samples?\n* Do the mapped reads show increased frequency of C>Ts on the 5' end of\n  molecules?\n* Is the number of SNPs used for nuclear contamination really low for any\n  individuals (e.g. < 100)? 
If so, then the estimates might not be very\n  accurate.\n\nFastQC (pre-AdapterRemoval):\n\n* Do I see any very early drop off of sequence quality scores suggesting a\n  problematic sequencing run?\n* Do I see outlier GC content distributions?\n* Do I see high sequence duplication levels?\n\nAdapterRemoval:\n\n* Do I see high numbers of singletons or discarded read pairs?\n\nFastQC (post-AdapterRemoval):\n\n* Do I see improved sequence quality scores along the length of reads?\n* Do I see reduced adapter content levels?\n\nSamtools Flagstat (pre/post Filter):\n\n* Do I see outliers, e.g. with unusually high levels of human DNA, (indicative\n  of contamination) that require downstream closer assessment? Are your samples\n  exceptionally preserved? If not, a value higher than e.g. 50% might require\n  your attention.\n\nDeDup/Picard MarkDuplicates:\n\n* Do I see large numbers of duplicates being removed, possibly indicating\n  over-amplified or badly preserved samples?\n\nDamageProfiler:\n\n* Do I see evidence of damage on human DNA?\n  * High numbers of mapped reads but no damage may indicate significant\n    modern contamination.\n  * Was the read trimming I specified enough to overcome damage effects?\n\nSexDetERRmine:\n\n* Do the relative coverages on the X and Y chromosome fall within the expected\n  areas of the plot?\n* Do all individuals have enough data for accurate sex determination?\n* Do the proportions of autosomal/X/Y reads make sense? 
If there is an\n  overrepresentation of reads within one bin, is the data enriched for that bin?\n\n> Detailed documentation and descriptions for all MultiQC modules can be seen in\n> the 'Documentation' folder of the results directory or here in the [output\n> documentation](output.md).\n\nIf you're happy everything looks good in terms of sequencing, we then look at\nspecific directories to find any files you might want to use for downstream\nprocessing.\n\nNote that when you get back to writing up your publication, all the versions of\nthe tools can be found under the 'nf-core/eager Software Versions' section of\nthe MultiQC report. But be careful! All tools in the container are listed, so\nyou may have to remove some of them that you didn't actually use in the set up.\n\nFor instance, in this example, we have used: Nextflow, nf-core/eager, FastQC,\nAdapterRemoval, fastp, BWA, Samtools, endorS.py, DeDup, Qualimap, PreSeq,\nDamageProfiler, bamUtil, sexdeterrmine, angsd, MTNucRatioCalculator,\nsequenceTools, and MultiQC.\n\nCitations to all used tools can be seen\n[here](https://nf-co.re/eager#tool-references).\n\n##### Tutorial Human Pop-Gen - Files for Downstream Analysis\n\nYou will find the eigenstrat dataset containing the random draw genotypes of\nyour run in the `genotyping/` directory. Genotypes from double stranded\nlibraries, like the ones in this example, are found in the dataset\n`pileupcaller.double.{geno,snp,ind}.txt`, while genotypes for any single\nstranded libraries will instead be in `pileupcaller.single.{geno,snp,ind}.txt`.\n\n#### Tutorial Human Pop-Gen - Clean up\n\nFinally, I would recommend cleaning up your `work/` directory of any\nintermediate files (if your `-profile` does not already do so). 
You can do this\nby going to the directory above your `results/` and `work/` directories, e.g.\n\n```bash\ncd /<path>/<to>/projectX_preprocessing20200727\n```\n\nand running\n\n```bash\nnextflow clean -f -k\n```\n\n#### Tutorial Human Pop-Gen - Summary\n\nIn this tutorial we have described an example on how to set up an\nnf-core/eager run to preprocess human aDNA for population genetic studies,\nperform some simple quality control checks, and generate random draw genotypes\nfor downstream analysis of the data. Additionally, we described what to look for\nin the run summary report generated by MultiQC and where to find output files\nthat can be used for downstream analysis.\n\n### Tutorial - How to set up nf-core/eager for metagenomic screening\n\n#### Tutorial Metagenomics - Introduction\n\nThe field of archaeogenetics is now expanding from analysing the genomes of\nsingle organisms to whole communities of taxa. One particular example is that of\nhuman-associated microbiomes, as preserved in ancient palaeofaeces (gut) or\ndental calculus (oral). This tutorial will give a basic example on how to set up\nnf-core/eager to perform initial screening of samples in the context of ancient\nmicrobiome research.\n\n> :warning: this tutorial does not describe how to install and set up\n> nf-core/eager. For this please see other documentation on the\n> [nf-co.re](https://nf-co.re/usage/installation) website.\n\nWe will describe how to set up mapping of ancient dental calculus samples\nagainst the human reference genome to allow sequencing and library\nquality-control, but additionally perform taxonomic profiling of the off-target\nreads from this mapping using MALT, and perform aDNA authentication with HOPS.\n\n> :warning: Please be aware that this tutorial may not use\n> settings nor produce files you would actually use in 'real' analysis. The\n> settings are only specified for demonstration purposes. 
Please consult\n> your colleagues, communities and the literature for optimal parameters.\n\n#### Tutorial Metagenomics - Preparation\n\nPrior to setting up an nf-core/eager run for metagenomic screening, we will need:\n\n1. Raw sequencing data in FASTQ format\n2. Reference genome in FASTA format, with associated pre-made `bwa`, `samtools`\n   and `picard SequenceDictionary` indices\n3. A MALT database of your choice (see [MALT\n   manual](https://software-ab.informatik.uni-tuebingen.de/download/malt/manual.pdf)\n   for set-up)\n4. A list of (NCBI) taxa containing well-known taxa of your microbiome (see\n   below)\n5. HOPS resources `.map` and `.tre` files (available\n   [here](https://github.com/rhuebler/HOPS/tree/external/Resources))\n\nWe should also ensure we have the very latest version of the nf-core/eager\npipeline so we have all the latest bugfixes etc. In this case we will be using\nnf-core/eager version 2.2.0. You should always check on the\n[nf-core](https://nf-co.re/eager) website whether a newer release has been made\n(particularly point releases e.g. 2.2.1).\n\n```bash\nnextflow pull nf-core/eager -r 2.2.0\n```\n\nIt is important to note that if you are planning on running multiple runs of\nnf-core/eager for a given project, the version should be **kept the same**\nfor all runs to ensure consistency in settings for all of your libraries.\n\n#### Tutorial Metagenomics - Inputs and Outputs\n\nTo start, let's make a directory where all your nf-core/eager related files for\nthis run will go, and change into it.\n\n```bash\nmkdir projectX_screening20200720\ncd projectX_screening20200720\n```\n\nThe first part of constructing any nf-core/eager run is specifying a few generic\nparameters that will often be common across all runs. This will be which\npipeline, version and _profile_ we will use. 
We will also specify a unique name\nfor the run to help us keep track of all the nf-core/eager runs you may be\nrunning.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_screening20200720' \\\n<...>\n```\n\nFor the `-profile` parameter, I have indicated that I wish to use Singularity as\nmy software container environment, and I will use the MPI-SHH institutional\nconfig as listed on\n[nf-core/configs](https://github.com/nf-core/configs/blob/master/conf/shh.config).\nThese profiles specify settings\noptimised for the specific cluster/institution, such as maximum memory available\nor which scheduler queues to submit to. More explanations about configs and\nprofiles can be seen in the [nf-core\nwebsite](https://nf-co.re/usage/configuration) and the [profile\ntutorial](#tutorial---what-are-profiles-and-how-to-use-them).\n\nNext we need to specify our input data. nf-core/eager can accept input FASTQ\nfiles in two main ways, either with direct paths to files (with wildcards), or\nwith a Tab-Separated-Value (TSV) file which contains the paths and extra\nmetadata. In this example, we will use the TSV method, to simulate a\nrealistic use-case, such as receiving paired-end data from an Illumina NextSeq\nof double-stranded libraries. 
Illumina NextSeqs sequence a given library across\nfour different 'lanes', so for each library you will receive four FASTQ files.\nThe TSV input method is more useful for this context, as it allows 'merging' of\nthese lanes after preprocessing prior mapping (whereas direct paths will\nconsider each pair of FASTQ files as independent libraries/samples).\n\nOur TSV file will look something like the following:\n\n```bash\nSample_Name     Library_ID      Lane    Colour_Chemistry        SeqType Organism        Strandedness    UDG_Treatment   R1      R2      BAM\nEGR001  EGR001.B0101.SG1        1       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L001_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L001_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        2       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L002_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L002_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        3       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L003_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L003_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        4       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L004_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.1/EGR001.B0101.SG1.1_S0_L004_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        5       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L001_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L001_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        6       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L002_R1_001.fastq.gz 
../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L002_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        7       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L003_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L003_R2_001.fastq.gz NA\nEGR001  EGR001.B0101.SG1        8       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L004_R1_001.fastq.gz ../../02-raw_data/EGR001.B0101.SG1.2/EGR001.B0101.SG1.2_S0_L004_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        1       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L001_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L001_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        2       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L002_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L002_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        3       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L003_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L003_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        4       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L004_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.1/EGR002.B0201.SG1.1_S0_L004_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        5       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L001_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L001_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        6       2       PE      homo_sapiens    double  half    
../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L002_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L002_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        7       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L003_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L003_R2_001.fastq.gz NA\nEGR002  EGR002.B0201.SG1        8       2       PE      homo_sapiens    double  half    ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L004_R1_001.fastq.gz ../../02-raw_data/EGR002.B0201.SG1.2/EGR002.B0201.SG1.2_S0_L004_R2_001.fastq.gz NA\n```\n\nYou can see that we have a single line for each pair of FASTQ files representing\neach `Lane`, but the `Sample_Name` and `Library_ID` columns identify and group\nthem together accordingly. Secondly, as we have NextSeq data, we have specified\n`2` for `Colour_Chemistry`, which is important for downstream processing\n(see below). The other columns are less important for this particular context of\nmetagenomic screening. See the nf-core/eager [parameters](https://nf-co.re/eager/parameters)\ndocumentation for more specifications on how to set up a TSV file (e.g. why\ndespite NextSeqs only having 4 lanes, we go up to 8 in the example above).\n\nAlongside our input TSV file, we will also specify the paths to our reference\nFASTA file and the corresponding indices.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_screening20200720' \\\n--input 'screening20200720.tsv' \\\n--fasta '../Reference/genome/GRCh38.fa' \\\n--bwa_index '../Reference/genome/GRCh38/' \\\n--fasta_index '../Reference/genome/GRCh38.fa.fai' \\\n--seq_dict '../Reference/genome/GRCh38.dict' \\\n<...>\n```\n\nWe specify the paths to the reference genome and its corresponding\ntool-specific indices. 
Paths should always be encapsulated in quotes to ensure Nextflow\nevaluates them, rather than your shell! Also note that as `bwa` generates\nmultiple index files, nf-core/eager takes a _directory_ that must contain these\nindices instead.\n\n> Note the difference between single and double `-` parameters. The former\n> represent Nextflow flags, while the latter are nf-core/eager specific flags.\n\nFinally, we can also specify the output directory and the Nextflow `work/`\ndirectory (which contains 'intermediate' working files and directories).\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_screening20200720' \\\n--input 'screening20200720.tsv' \\\n--fasta '../Reference/genome/GRCh38.fa' \\\n--bwa_index '../Reference/genome/GRCh38/' \\\n--fasta_index '../Reference/genome/GRCh38.fa.fai' \\\n--seq_dict '../Reference/genome/GRCh38.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n<...>\n```\n\n#### Tutorial Metagenomics - Pipeline Configuration\n\nNow that we have specified the input data, we can move on to specifying\nsettings for each different module we will be running. As mentioned above, we\nare pretending to run with NextSeq data, which is generated with a two-colour\nimaging technique. What this means is when you have shorter molecules than the\nnumber of cycles of the sequencing chemistry, the sequencer will repeatedly see\n'G' calls (no colour) at the last few cycles, and you get long poly-G 'tails' on\nyour reads. We therefore will turn on the poly-G clipping functionality offered\nby [`fastp`](https://github.com/OpenGene/fastp), and any pairs of files\nindicated in the TSV file as having `2` in the `Colour_Chemistry` column will be\npassed to `fastp`. 
We will not change the default minimum length of a poly-G\nstring to be clipped.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_screening20200720' \\\n--input 'screening20200720.tsv' \\\n--fasta '../Reference/genome/GRCh38.fa' \\\n--bwa_index '../Reference/genome/GRCh38/' \\\n--fasta_index '../Reference/genome/GRCh38.fa.fai' \\\n--seq_dict '../Reference/genome/GRCh38.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n<...>\n```\n\nWe will keep the default settings for mapping etc. against the reference genome\nas we will only use this for sequencing quality control. However, we now need to\nspecify that we want to run metagenomic screening. To do this we firstly need to\ntell nf-core/eager what to do with the off-target reads from the mapping.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_screening20200720' \\\n--input 'screening20200720.tsv' \\\n--fasta '../Reference/genome/GRCh38.fa' \\\n--bwa_index '../Reference/genome/GRCh38/' \\\n--fasta_index '../Reference/genome/GRCh38.fa.fai' \\\n--seq_dict '../Reference/genome/GRCh38.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--run_bam_filtering \\\n--bam_unmapped_type 'fastq' \\\n<...>\n```\n\nnf-core/eager will now take all unmapped reads after mapping and convert the BAM\nfile back to FASTQ, which can be accepted by MALT. But of course, we also then\nneed to tell nf-core/eager we actually want to run MALT. We will also specify\nthe location of the [pre-built database](#tutorial-metagenomics---preparation) and which 'min support'\nmethod we want to use (this specifies the minimum number of alignments needed\nfor a particular taxonomic node to be 'kept' in the MALT output files). Otherwise\nwe will keep all other parameters as default. 
For example, this means using BlastN mode,\nrequiring a minimum of 85% identity, and requiring at least 0.01% of alignments for a\ntaxon to be saved (as specified with `--malt_min_support_mode`). More\ndocumentation describing each parameter can be seen in the usage\n[documentation](usage.md).\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_screening20200720' \\\n--input 'screening20200720.tsv' \\\n--fasta '../Reference/genome/GRCh38.fa' \\\n--bwa_index '../Reference/genome/GRCh38/' \\\n--fasta_index '../Reference/genome/GRCh38.fa.fai' \\\n--seq_dict '../Reference/genome/GRCh38.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--run_bam_filtering \\\n--bam_unmapped_type 'fastq' \\\n--run_metagenomic_screening \\\n--metagenomic_tool 'malt' \\\n--database '../Reference/database/refseq-bac-arch-homo-2018_11' \\\n--malt_min_support_mode 'percent' \\\n<...>\n```\n\nFinally, to help quickly assess whether our sample has taxa that are known to\nexist in (modern samples of) our expected microbiome, and that these alignments\nhave indicators of true aDNA, we will run 'maltExtract' of the\n[HOPS](https://github.com/rhuebler/HOPS) pipeline.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_screening20200720' \\\n--input 'screening20200720.tsv' \\\n--fasta '../Reference/genome/GRCh38.fa' \\\n--bwa_index '../Reference/genome/GRCh38/' \\\n--fasta_index '../Reference/genome/GRCh38.fa.fai' \\\n--seq_dict '../Reference/genome/GRCh38.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--run_bam_filtering \\\n--bam_unmapped_type 'fastq' \\\n--run_metagenomic_screening \\\n--metagenomic_tool 'malt' \\\n--database '../Reference/database/refseq-bac-arch-homo-2018_11' \\\n--malt_min_support_mode 'percent' \\\n--run_maltextract \\\n--maltextract_taxon_list '../Reference/taxa_list/core_genera-anthropoids_hominids_panhomo-20180131.txt' 
\\\n--maltextract_ncbifiles '../Reference/hops' \\\n--maltextract_destackingoff\n```\n\nIn the last parameters above we've specified the path to our list of taxa. This\ncontains something like (for oral microbiomes):\n\n```text\nActinomyces\nStreptococcus\nTannerella\nPorphyromonas\n```\n\nWe have also specified the path to the HOPS resources [downloaded\nearlier](#tutorial-metagenomics---preparation), and that I want to turn off 'destacking' (removal of any\nread that overlaps the positions of another - something only recommended to keep\non when you have high coverage data).\n\nWith this, we are ready to submit! If running on a remote cluster/server, make\nsure to run this in a `screen` session or similar, so that if you get an `ssh`\nsignal drop or want to log off, Nextflow will not crash.\n\n#### Tutorial Metagenomics - Results\n\nAssuming the run completed without any crashes (if problems do occur, check\nagainst [parameters](https://nf-co.re/eager/parameters) that all parameters are as expected, or check\nthe [FAQ](#troubleshooting-and-faqs)), we can now check our results in\n`results/`.\n\n##### Tutorial Metagenomics - MultiQC Report\n\nIn here there are many different directories containing different output files.\nThe first directory to check is the `MultiQC/` directory. You should\nfind a `multiqc_report.html` file. You will need to view this in a web browser,\nso I recommend either mounting your server to your file browser, or downloading\nit to your own local machine (PC/Laptop etc.).\n\nOnce you've opened this you can go through each section and evaluate all the\nresults. You will likely not want to concern yourself too much with anything\nafter MALT - however you should check these for other artefacts (e.g. 
weird\ndamage patterns on the human DNA, or weirdly skewed coverage distributions).\n\nFor example, I normally look for things like:\n\nGeneral Stats Table:\n\n* Do I see the expected number of raw sequencing reads (summed across each set\n  of FASTQ files per library) that was requested for sequencing?\n* Does the percentage of trimmed reads look normal for aDNA, and do lengths\n  after trimming look short as expected of aDNA?\n* Does ClusterFactor or 'Dups' look high, suggesting over-amplified or\n  badly preserved samples (e.g. >2 or >10% respectively - however\n  given this is on the human reads this is just a rule of thumb and may not\n  reflect the quality of the metagenomic profile)?\n* Does the human DNA show increased frequency of C>Ts on the 5' end of\n  molecules?\n\nFastQC (pre-AdapterRemoval):\n\n* Do I see any very early drop off of sequence quality scores suggesting a\n  problematic sequencing run?\n* Do I see outlier GC content distributions?\n* Do I see high sequence duplication levels?\n\nAdapterRemoval:\n\n* Do I see high numbers of singletons or discarded read pairs?\n\nFastQC (post-AdapterRemoval):\n\n* Do I see improved sequence quality scores along the length of reads?\n* Do I see reduced adapter content levels?\n\nMALT:\n\n* Do I have a reasonable level of mappability?\n  * Somewhere between 10-30% can be pretty normal for aDNA, whereas e.g. <1%\n    requires careful manual assessment.\n* Do I have a reasonable taxonomic assignment success?\n  * You hope to have a large number of the mapped reads (from the mappability\n    plot) that also have taxonomic assignment.\n\nSamtools Flagstat (pre/post Filter):\n\n* Do I see outliers, e.g. 
with unusually high levels of human DNA (indicative\n  of contamination), that require closer downstream assessment?\n\nDeDup/Picard MarkDuplicates:\n\n* Do I see large numbers of duplicates being removed, possibly indicating\n  over-amplified or badly preserved samples?\n\nDamageProfiler:\n\n* Do I see evidence of damage on human DNA? Note this is just a\n  rule-of-thumb/corroboration of any signals you might find in the metagenomic\n  screening and not essential.\n  * High numbers of human DNA reads but no damage may indicate\n    significant modern contamination.\n\n> Detailed documentation and descriptions for all MultiQC modules can be seen in\n> the 'Documentation' folder of the results directory or here in the [output\n> documentation](output.md).\n\nIf you're happy everything looks good in terms of sequencing, we then look at\nspecific directories to find any files you might want to use for downstream\nprocessing.\n\nNote that when you get back to writing up your publication, all the versions of\nthe tools can be found under the 'nf-core/eager Software Versions' section of\nthe MultiQC report. Note that all tools in the container are listed, so you may\nhave to remove some of them that you didn't actually use in the set up.\n\nIn the example above, we have used: Nextflow, nf-core/eager,\nFastQC, AdapterRemoval, fastP, BWA, Samtools, endorS.py, Picard MarkDuplicates,\nQualimap, PreSeq, DamageProfiler, MALT, MaltExtract and MultiQC.\n\nCitations to all used tools can be seen\n[here](https://nf-co.re/eager#tool-references).\n\n##### Tutorial Metagenomics - Files for Downstream Analysis\n\nIf you wanted to look at the output of MALT more closely, such as in the\nGUI-based tool\n[MEGAN6](https://software-ab.informatik.uni-tuebingen.de/download/megan6/welcome.html),\nyou can find the `.rma6` files that are accepted by MEGAN under\n`metagenomic_classification/malt/`. 
The log file containing the information\nprinted to screen while MALT is running can also be found in this directory.\n\nAs we ran the HOPS pipeline (primarily the MaltExtract tool), we can look in\n`MaltExtract/results/` to find all the corresponding output files for the\nauthentication validation of the metagenomic screening (against the taxa you\nspecified in your `--maltextract_taxon_list`). First you can check the\n`heatmap_overview_Wevid.pdf` summary PDF from HOPS (again you will need to\neither mount the server or download it), but to get the actual per-sample/taxon\ndamage patterns etc., you can look in `pdf_candidate_profiles`. In some cases\nthere may be valid results that the HOPS 'postprocessing' script doesn't pick up.\nIn these cases you can go into the `default` directory to find all the raw text\nfiles which you can use to visualise and assess the authentication results\nyourself.\n\nFinally, if you want to re-run the taxonomic classification with a new database\nor tool, to find the raw `fastq/` files containing only unmapped reads that went\ninto MALT, you should go into `samtools/filter`. In here you will find files\nending in `unmapped.fastq.gz` for each library.\n\n#### Tutorial Metagenomics - Clean up\n\nFinally, I would recommend cleaning up your `work/` directory of any\nintermediate files (if your `-profile` does not already do so). You can do this\nby going to the directory above your `results/` and `work/` directories, e.g.\n\n```bash\ncd /<path>/<to>/projectX_screening20200720\n```\n\nand running\n\n```bash\nnextflow clean -f -k\n```\n\n#### Tutorial Metagenomics - Summary\n\nIn this tutorial we have described an example on how to set up a\nmetagenomic screening run of ancient microbiome samples. We have covered how to\nset up nf-core/eager to extract off-target reads in a form that can be used for\nMALT, and how to additionally run HOPS to authenticate expected taxa to be found\nin the human oral microbiome. 
Finally we have also described what to look for in\nthe MultiQC run summary report and where to find output files that can be used\nfor downstream analysis.\n\n### Tutorial - How to set up nf-core/eager for pathogen genomics\n\n#### Tutorial Pathogen Genomics - Introduction\n\nThis tutorial will give a basic example on how to set up nf-core/eager to\nperform bacterial genome reconstruction from samples in the context of ancient\npathogenomic research.\n\n> :warning: this tutorial does not describe how to install and set up\n> nf-core/eager. For this please see other documentation on the\n> [nf-co.re](https://nf-co.re/usage/installation) website.\n\nWe will describe how to set up mapping of ancient pathogen samples against the\nreference genome of a targeted organism, to check sequencing and library\nquality-control, calculate depth and breadth of coverage, check for damage\nprofiles, generate feature-annotation statistics (e.g. for gene presence and\nabsence), call SNPs, and produce a SNP alignment for use in downstream\nphylogenetic analysis.\n\nI will use as an example data from [Andrades Valtueña et al.\n2017](https://doi.org/10.1016/j.cub.2017.10.025), who retrieved Late Neolithic/Bronze\nAge _Yersinia pestis_ genomes. This data is **very large shotgun data** and is\nnot ideal for testing, so running on your own data is recommended as otherwise\nrunning this data will require a lot of computing resources and time. However,\nnote the same procedure can equally be applied on shallow-shotgun and also\nwhole-genome enrichment data, so other than the TSV file you can apply the\ncommand explained below.\n\n> :warning: Please be aware that this tutorial may not use\n> settings nor produce files you would actually use in 'real' analysis. The\n> settings are only specified for demonstration purposes. 
Please consult\n> your colleagues, communities and the literature for optimal parameters.\n\n#### Tutorial Pathogen Genomics - Preparation\n\nPrior to setting up the nf-core/eager run, we will need:\n\n1. Raw sequencing data in FASTQ format\n2. Reference genome in FASTA format, with associated pre-made `bwa`, `samtools`\n   and `picard SequenceDictionary` indices (however note these can be made for\n   you with nf-core/eager, but this can make a pipeline run take much longer!)\n3. A GFF file of gene sequence annotations (normally supplied with reference\n   genomes downloaded from NCBI Genomes, in this context from\n   [here](https://www.ncbi.nlm.nih.gov/genome/?term=Yersinia+pestis))\n4. [Optional] Previously made VCF GATK 3.5 files (see below for settings), of\n   previously published _Y. pestis_ genomes.\n\nWe should also ensure we have the very latest version of the nf-core/eager\npipeline so we have all the latest bugfixes etc. In this case we will be using\nnf-core/eager version 2.2.0. You should always check on the\n[nf-core](https://nf-co.re/eager) website whether a newer release has been made\n(particularly point releases e.g. 2.2.1).\n\n```bash\nnextflow pull nf-core/eager -r 2.2.0\n```\n\nIt is important to note that if you are planning on running multiple runs of\nnf-core/eager for a given project, the version should be **kept the same**\nfor all runs to ensure consistency in settings for all of your libraries.\n\n#### Tutorial Pathogen Genomics - Inputs and Outputs\n\nTo start, let's make a directory where all your nf-core/eager related files for\nthis run will go, and change into it.\n\n```bash\nmkdir projectX_preprocessing20200727\ncd projectX_preprocessing20200727\n```\n\nThe first part of constructing any nf-core/eager run is specifying a few generic\nparameters that will often be common across all runs. This will be which\npipeline, version and _profile_ we will use. 
We will also specify a unique name\nfor the run to help us keep track of all the nf-core/eager runs you may be\nrunning.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n<...>\n```\n\nFor the `-profile` parameter, I have indicated that I wish to use Singularity as\nmy software container environment, and I will use the MPI-SHH institutional\nconfig as listed on\n[nf-core/configs](https://github.com/nf-core/configs/blob/master/conf/shh.config).\nThese profiles specify settings\noptimised for the specific cluster/institution, such as maximum memory available\nor which scheduler queues to submit to. More explanations about configs and\nprofiles can be seen in the [nf-core\nwebsite](https://nf-co.re/usage/configuration) and the [profile\ntutorial](#tutorial---what-are-profiles-and-how-to-use-them).\n\nNext we need to specify our input data. nf-core/eager can accept input FASTQ\nfiles in two main ways, either with direct paths to files (with wildcards), or\nwith a Tab-Separated-Value (TSV) file which contains the paths and extra\nmetadata. In this example, we will use the TSV method, to simulate a\nrealistic use-case, such as receiving both single-end and paired-end data from\nIllumina NextSeq _and_ Illumina HiSeqs of double-stranded libraries. Illumina\nNextSeqs sequence a given library across four different 'lanes', so for each\nlibrary you will receive four FASTQ files. Sometimes samples will be sequenced\nacross multiple HiSeq lanes to maintain complexity and improve imaging of base\ncalls. 
The TSV input method is more useful for this context, as it allows\n'merging' of these lanes after preprocessing prior mapping (whereas direct paths\nwill consider each pair of FASTQ files as independent libraries/samples).\n\n```bash\nSample_Name   Library_ID    Lane    Colour_Chemistry    SeqType   Organism    Strandedness    UDG_Treatment   R1    R2    BAM\nKunilaII    KunilaII_nonUDG   4   4   PE    Yersinia pestis   double    none    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/007/ERR2112547/ERR2112547_1.fastq.gz    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/007/ERR2112547/ERR2112547_2.fastq.gz    NA\nKunilaII    KunilaII_UDG    4   4   PE    Yersinia pestis   double    full    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/008/ERR2112548/ERR2112548_1.fastq.gz    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/008/ERR2112548/ERR2112548_2.fastq.gz    NA\n6Post   6Post_PE    1   2   PE    Yersinia pestis   double    half    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/009/ERR2112549/ERR2112549_1.fastq.gz    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/009/ERR2112549/ERR2112549_2.fastq.gz    NA\n6Post   6Post_PE    2   2   PE    Yersinia pestis   double    half    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/000/ERR2112550/ERR2112550_1.fastq.gz    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/000/ERR2112550/ERR2112550_2.fastq.gz    NA\n6Post   6Post_PE    3   2   PE    Yersinia pestis   double    half    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/001/ERR2112551/ERR2112551_1.fastq.gz    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/001/ERR2112551/ERR2112551_2.fastq.gz    NA\n6Post   6Post_PE    4   2   PE    Yersinia pestis   double    half    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/002/ERR2112552/ERR2112552_1.fastq.gz    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/002/ERR2112552/ERR2112552_2.fastq.gz    NA\n6Post   6Post_SE    1   4   SE    Yersinia pestis   double    half    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/009/ERR2112569/ERR2112569.fastq.gz    NA    NA\n6Post   6Post_SE    2   4   SE 
   Yersinia pestis   double    half    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/000/ERR2112570/ERR2112570.fastq.gz    NA    NA\n6Post   6Post_SE    3   4   SE    Yersinia pestis   double    half    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/001/ERR2112571/ERR2112571.fastq.gz    NA    NA\n6Post   6Post_SE    4   4   SE    Yersinia pestis   double    half    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/002/ERR2112572/ERR2112572.fastq.gz    NA    NA\n6Post   6Post_SE    8   4   SE    Yersinia pestis   double    half    ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR211/003/ERR2112573/ERR2112573.fastq.gz    NA    NA\n```\n\n> Note we also have a mixture of non-UDG and half-UDG treated libraries.\n\nYou can see that we have a single line for each set of FASTQ files representing\neach `Lane`, but the `Sample_Name` and `Library_ID` columns identify and group\nthem together accordingly. Secondly, as we have NextSeq data, we have specified\n`2` for `Colour_Chemistry` vs `4` for HiSeq; something that is important\nfor downstream processing (see below). See the nf-core/eager\nparameter documentation above for more specifications on how to set up a TSV\nfile (e.g. why despite NextSeqs only having 4 lanes, we can also go up to 8 or\nmore when having a sample sequenced on two NextSeq runs).\n\nAlongside our input TSV file, we will also specify the paths to our reference\nFASTA file and the corresponding indices.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa' \\\n--bwa_index '../Reference/genome/' \\\n--fasta_index '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.fai' \\\n--seq_dict '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.dict' \\\n<...>\n```\n\nWe specify the paths to the reference genome and its corresponding\ntool-specific indices. 
Paths should always be encapsulated in quotes to ensure Nextflow\nevaluates them, rather than your shell! Also note that as `bwa` generates\nmultiple index files, nf-core/eager takes a _directory_ that must contain these\nindices instead.\n\n> Note the difference between single and double `-` parameters. The former\n> represent Nextflow flags, while the latter are nf-core/eager specific flags.\n\nFinally, we can also specify the output directory and the Nextflow `work/`\ndirectory (which contains 'intermediate' working files and directories).\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa' \\\n--bwa_index '../Reference/genome/' \\\n--fasta_index '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.fai' \\\n--seq_dict '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n<...>\n```\n\n#### Tutorial Pathogen Genomics - Pipeline Configuration\n\nNow that we have specified the input data, we can start moving onto specifying\nsettings for each different module we will be running. As mentioned above, some\nof our samples were generated as NextSeq data, which is generated with a\ntwo-colour imaging technique. What this means is when you have shorter molecules\nthan the number of cycles of the sequencing chemistry, the sequencer will\nrepeatedly see 'G' calls (no colour) at the last few cycles, and you get long\npoly-G 'tails' on your reads. We therefore will turn on the poly-G clipping\nfunctionality offered by [`fastp`](https://github.com/OpenGene/fastp), and any\npairs of files indicated in the TSV file as having `2` in the `Colour_Chemistry`\ncolumn will be passed to `fastp` (the HiSeq data will not). 
We will not change\nthe default minimum length of a poly-G string to be clipped.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa' \\\n--bwa_index '../Reference/genome/' \\\n--fasta_index '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.fai' \\\n--seq_dict '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n<...>\n```\n\nWe then need to specify the mapping parameters for this run. Typically, to\naccount for damage of very old aDNA libraries and also sometimes for\nevolutionary divergence of the ancient genome to the modern reference, we should\nrelax the mapping thresholds that specify how many mismatches a read can have\nfrom the reference to be considered 'mapped'. We will also speed up the seeding\nstep of the seed-and-extend approach by specifying the length of the seed. 
We\nwill do this with `--bwaalnn` and `--bwaalnl` respectively.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa' \\\n--bwa_index '../Reference/genome/' \\\n--fasta_index '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.fai' \\\n--seq_dict '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--bwaalnn 0.01 \\\n--bwaalnl 16 \\\n<...>\n```\n\nAs we are also interested in checking for gene presence/absence (see below), we\nwill ensure no mapping quality filter is applied (to account for gene\nduplication that may cause a read to map equally to two places) by setting the\nthreshold to 0. In addition, we will discard unmapped reads to reduce our\nhard-drive footprint.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa' \\\n--bwa_index '../Reference/genome/' \\\n--fasta_index '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.fai' \\\n--seq_dict '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--bwaalnn 0.01 \\\n--bwaalnl 16 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 0 \\\n--bam_unmapped_type 'discard' \\\n<...>\n```\n\nWhile some of our input data is paired-end, we will keep with the default of\nPicard's MarkDuplicates for duplicate removal, as DeDup takes into account\nboth the start and end of a _merged_ read before flagging it as a duplicate -\nsomething that isn't valid for a single-end read (where the true end of 
the\nmolecule might not have been sequenced). We can then specify which dedupper we\nwant to use with `--dedupper`. While we are using the default (which does not\nneed to be directly specified), we will put it explicitly in our command for\nclarity.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa' \\\n--bwa_index '../Reference/genome/' \\\n--fasta_index '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.fai' \\\n--seq_dict '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--bwaalnn 0.01 \\\n--bwaalnl 16 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 0 \\\n--bam_unmapped_type 'discard' \\\n--dedupper 'markduplicates' \\\n<...>\n```\n\nAlongside making a SNP table for downstream phylogenetic analysis (we will get\nto this in a bit), you may be interested in generating some summary statistics\nof annotated parts of your reference genome, e.g. to see whether certain\nvirulence factors are present or absent. nf-core/eager offers some basic\nstatistics (percent and depth coverage) of these via Bedtools. We will\ntherefore turn on this module and specify the GFF file we downloaded alongside\nour reference FASTA. Note that this GFF file has a _lot_ of redundant data, so\noften a custom BED file with just genes of interest is recommended. 
\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa' \\\n--bwa_index '../Reference/genome/' \\\n--fasta_index '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.fai' \\\n--seq_dict '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--bwaalnn 0.01 \\\n--bwaalnl 16 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 0 \\\n--bam_unmapped_type 'discard' \\\n--dedupper 'markduplicates' \\\n--run_bedtools_coverage \\\n--anno_file '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.gff' \\\n<...>\n```\n\nNext, we will set up trimming of the mapped reads to alleviate the effects of\nDNA damage during genotyping. To do this we will activate trimming with\n`--run_trim_bam`. The libraries in this example underwent either no or\n'half'-UDG treatment. The latter will generally restrict all remaining DNA\ndamage to the first 2 base pairs of a fragment. We will therefore use\n`--bamutils_clip_double_stranded_half_udg_left` and\n`--bamutils_clip_double_stranded_half_udg_right` to trim 2 bp on either side of\neach fragment. For the non-UDG treated libraries we can trim a little more to\nremove most damage with the `--bamutils_clip_double_stranded_none_udg_<*>`\nvariants of the flag. 
Note that there is a tendency in ancient pathogenomics to\ntrim damage _prior to_ mapping, as it allows mapping with stricter parameters to\nimprove removal of reads deriving from potential evolutionarily diverged\ncontaminants (this can be done in nf-core/eager with the Bowtie2 aligner); however,\nwe do BAM trimming instead here as another demonstration of functionality.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa' \\\n--bwa_index '../Reference/genome/' \\\n--fasta_index '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.fai' \\\n--seq_dict '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--bwaalnn 0.01 \\\n--bwaalnl 16 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 0 \\\n--bam_unmapped_type 'discard' \\\n--dedupper 'markduplicates' \\\n--run_bedtools_coverage \\\n--anno_file '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.gff' \\\n--run_trim_bam \\\n--bamutils_clip_double_stranded_half_udg_left 2 \\\n--bamutils_clip_double_stranded_half_udg_right 2 \\\n--bamutils_clip_double_stranded_none_udg_left 3 \\\n--bamutils_clip_double_stranded_none_udg_right 3 \\\n<...>\n```\n\nHere we will use MultiVCFAnalyzer for the generation of our SNP table. A\nMultiVCFAnalyzer SNP table allows downstream assessment of the level of\nmulti-allelic positions, something not expected when dealing with a single\nploidy organism and thus may reflect cross-mapping from multiple strains,\nenvironmental relatives or other contaminants.\n\nFor this we need to run genotyping, but specifically with GATK UnifiedGenotyper\n3.5 (as MultiVCFAnalyzer requires this particular format of VCF files). 
We will\ntherefore turn on genotyping, and\ncheck that ploidy is set to 2 so 'heterozygous' positions can be reported. We will also\nneed to specify that we want to use the trimmed BAMs from the previous step.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa' \\\n--bwa_index '../Reference/genome/' \\\n--fasta_index '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.fai' \\\n--seq_dict '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--bwaalnn 0.01 \\\n--bwaalnl 16 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 0 \\\n--bam_unmapped_type 'discard' \\\n--dedupper 'markduplicates' \\\n--run_trim_bam \\\n--bamutils_clip_double_stranded_half_udg_left 2 \\\n--bamutils_clip_double_stranded_half_udg_right 2 \\\n--bamutils_clip_double_stranded_none_udg_left 3 \\\n--bamutils_clip_double_stranded_none_udg_right 3 \\\n--run_bedtools_coverage \\\n--anno_file '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.gff' \\\n--run_genotyping \\\n--genotyping_tool 'ug' \\\n--genotyping_source 'trimmed' \\\n--gatk_ploidy 2 \\\n--gatk_ug_mode 'EMIT_ALL_SITES' \\\n--gatk_ug_genotype_model 'SNP' \\\n<...>\n```\n\nFinally we can set up MultiVCFAnalyzer itself. First we want to make sure we\nspecify that we want to report the frequency of the given called allele at\neach position so we can assess cross mapping. Then, often with ancient\npathogens, such as _Y. pestis_, we also want to include in the SNP table\ncomparative data from previously published and ancient genomes. For this we\nspecify additional VCF files that have been generated in previous runs with the\nsame settings and reference genome. 
We can do this as follows.\n\n```bash\nnextflow run nf-core/eager \\\n-r 2.2.0 \\\n-profile singularity,shh \\\n-name 'projectX_preprocessing20200727' \\\n--input 'preprocessing20200727.tsv' \\\n--fasta '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa' \\\n--bwa_index '../Reference/genome/' \\\n--fasta_index '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.fai' \\\n--seq_dict '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.fa.dict' \\\n--outdir './results/' \\\n-w './work/' \\\n--complexity_filter_poly_g \\\n--bwaalnn 0.01 \\\n--bwaalnl 16 \\\n--run_bam_filtering \\\n--bam_mapping_quality_threshold 0 \\\n--bam_unmapped_type 'discard' \\\n--dedupper 'markduplicates' \\\n--run_trim_bam \\\n--bamutils_clip_double_stranded_half_udg_left 2 \\\n--bamutils_clip_double_stranded_half_udg_right 2 \\\n--bamutils_clip_double_stranded_none_udg_left 3 \\\n--bamutils_clip_double_stranded_none_udg_right 3 \\\n--run_bedtools_coverage \\\n--anno_file '../Reference/genome/Yersinia_pestis_C092_GCF_000009065.1_ASM906v1.gff' \\\n--run_genotyping \\\n--genotyping_tool 'ug' \\\n--genotyping_source 'trimmed' \\\n--gatk_ploidy 2 \\\n--gatk_ug_mode 'EMIT_ALL_SITES' \\\n--gatk_ug_genotype_model 'SNP' \\\n--run_multivcfanalyzer \\\n--write_allele_frequencies \\\n--min_base_coverage 5 \\\n--min_allele_freq_hom 0.9 \\\n--min_allele_freq_het 0.1 \\\n--additional_vcf_files '../vcfs/*.vcf.gz'\n```\n\nFor the two `min_allele_freq` parameters we specify that anything above 90%\nfrequency is considered 'homozygous', and anything above 10% (but below 90%) is\nconsidered an ambiguous call and the frequency will be reported. Note that you\nwould not normally use this SNP table with these parameters for downstream\nphylogenetic analysis, as the table will include ambiguous IUPAC codes, making\nit only useful for fine-comb checking of multi-allelic positions. Instead, set\nboth parameters to the same value (e.g. 
0.8) and use that table for downstream\nphylogenetic analysis.\n\nWith this, we are ready to submit! If running on a remote cluster/server, make\nsure to run this in a `screen` session or similar, so that if you get an `ssh`\nsignal drop or want to log off, Nextflow will not crash.\n\n#### Tutorial Pathogen Genomics - Results\n\nAssuming the run completed without any crashes (if problems do occur, check\nagainst [parameters](https://nf-co.re/eager/parameters) that all parameters are as expected, or\ncheck the [FAQ](#troubleshooting-and-faqs)), we can now check our results in\n`results/`.\n\n##### Tutorial Pathogen Genomics - MultiQC Report\n\nIn here there are many different directories containing different output files.\nThe first directory to check is the `MultiQC/` directory. You should\nfind a `multiqc_report.html` file. You will need to view this in a web browser,\nso I recommend either mounting your server to your file browser, or downloading\nit to your own local machine (PC/Laptop etc.).\n\nOnce you've opened this you can go through each section and evaluate all the\nresults. For example, I normally look for things like:\n\nGeneral Stats Table:\n\n* Do I see the expected number of raw sequencing reads (summed across each set\n  of FASTQ files per library) that was requested for sequencing?\n* Does the percentage of trimmed reads look normal for aDNA, and do lengths\n  after trimming look short as expected of aDNA?\n* Do the Endogenous DNA (%) columns look reasonable (high enough to indicate\n  you have received enough coverage for downstream analysis, and/or do you lose an\n  unusually high number of reads after filtering)?\n* Does ClusterFactor or '% Dups' look high (e.g. >2 or >10% respectively - high\n  values suggesting over-amplified or badly preserved samples i.e. 
low\n  complexity; note that genome-enrichment libraries may by their nature look\n  higher).\n* Do you see an increased frequency of C>Ts on the 5' end of molecules in the\n  mapped reads?\n* Do median read lengths look relatively low (normally <= 100 bp) indicating\n  typically fragmented aDNA?\n* Does the % coverage decrease relatively gradually at each depth coverage, and\n  does not drop extremely drastically\n* Does the Median coverage and percent >3x (or whatever you set) show sufficient\n  coverage for reliable SNP calls and that a good proportion of the genome is\n  covered indicating you have the right reference genome?\n* Do you see a high proportion of % Hets, indicating many multi-allelic sites\n  (and possibly presence of cross-mapping from other species, that may lead to\n  false positive or less confident SNP calls)?\n\nFastQC (pre-AdapterRemoval):\n\n* Do I see any very early drop off of sequence quality scores suggesting\n  problematic sequencing run?\n* Do I see outlier GC content distributions?\n* Do I see high sequence duplication levels?\n\nAdapterRemoval:\n\n* Do I see high numbers of singletons or discarded read pairs?\n\nFastQC (post-AdapterRemoval):\n\n* Do I see improved sequence quality scores along the length of reads?\n* Do I see reduced adapter content levels?\n\nSamtools Flagstat (pre/post Filter):\n\n* Do I see outliers, e.g. with unusually low levels of mapped reads, (indicative\n  of badly preserved samples) that require downstream closer assessment?\n\nDeDup/Picard MarkDuplicates:\n\n* Do I see large numbers of duplicates being removed, possibly indicating\n  over-amplified or badly preserved samples?\n\nPreSeq:\n\n* Do I see a large drop off of a sample's curve away from the theoretical\n  complexity? If so, this may indicate it's not worth performing deeper\n  sequencing as you will get few unique reads (vs. 
duplicates that are not any\n  more informative than the reads you've already sequenced)\n\nDamageProfiler:\n\n* Do I see evidence of damage on the microbial DNA (i.e. a % C>T of more than ~5% in\n  the first few nucleotide positions)? If not, possibly your mapped\n  reads are deriving from modern contamination.\n\nQualiMap:\n\n* Do you see a peak of coverage (X) at a good level, e.g. >= 3x, indicating\n  sufficient coverage for reliable SNP calls?\n\nMultiVCFAnalyzer:\n\n* Do I have a good number of called SNPs that suggest the samples have genomes\n  with sufficient nucleotide diversity to inform phylogenetic analysis?\n* Do you have a large number of discarded SNP calls?\n* Are the % Hets very high, indicating possible cross-mapping from off-target\n  organisms that may be confounding variant calling?\n\n> Detailed documentation and descriptions for all MultiQC modules can be seen in\n> the 'Documentation' folder of the results directory or here in the [output\n> documentation](output.md)\n\nIf you're happy everything looks good in terms of sequencing, we then look at\nspecific directories to find any files you might want to use for downstream\nprocessing.\n\nNote that when you get back to writing up your publication, all the versions of\nthe tools can be found under the 'nf-core/eager Software Versions' section of\nthe MultiQC report. Note that all tools in the container are listed, so you may\nhave to remove some of them that you didn't actually use in the set up.\n\nFor example, in the example above, we have used: Nextflow, nf-core/eager,\nFastQC, AdapterRemoval, fastp, BWA, Samtools, endorS.py, Picard MarkDuplicates,\nBedtools, Qualimap, PreSeq, DamageProfiler, MultiVCFAnalyzer and MultiQC.\n\nCitations to all used tools can be seen\n[here](https://nf-co.re/eager#tool-references)\n\n##### Tutorial Pathogen Genomics - Files for Downstream Analysis\n\nYou will find the most relevant output files in your `results/` directory. 
Each\ndirectory generally corresponds to a specific step or tool of the pipeline. Most\nimportantly you should look in `deduplication` for your de-duplicated BAM files\n(e.g. for viewing in IGV), `bedtools` for depth (X) and breadth (%) coverages of\nannotations of your reference (e.g. genes), and `multivcfanalyzer` for final SNP\ntables etc. that can be used for downstream phylogenetic applications.\n\n#### Tutorial Pathogen Genomics - Clean up\n\nFinally, I would recommend cleaning up your `work/` directory of any\nintermediate files (if your `-profile` does not already do so). You can do this\nby going to the directory above your `results/` and `work/` directories, e.g.\n\n```bash\ncd /<path>/<to>/projectX_preprocessing20200727\n```\n\nand running\n\n```bash\nnextflow clean -f -k\n```\n\n#### Tutorial Pathogen Genomics - Summary\n\nIn this tutorial we have described an example of how to set up an\nnf-core/eager run to process microbial aDNA for a relatively standard pathogen\ngenomics study for phylogenetics and basic functional screening. This includes\nperforming some simple quality control checks, mapping, genotyping, and SNP table\ngeneration for downstream analysis of the data. Additionally, we described what\nto look for in the run summary report generated by MultiQC and where to find\noutput files that can be used for downstream analysis.\n"
  },
  {
    "path": "environment.yml",
    "content": "# You can use this file to create a conda environment for this pipeline:\n#   conda env create -f environment.yml\nname: nf-core-eager-2.5.3\nchannels:\n  - conda-forge\n  - bioconda\n  - defaults\ndependencies:\n  - conda-forge::python=3.9.4\n  - conda-forge::markdown=3.3.4\n  - conda-forge::pymdown-extensions=8.2\n  - conda-forge::pygments=2.14.0\n  - bioconda::rename=1.601\n  - conda-forge::openjdk=8.0.144 # Don't upgrade - required for GATK\n  - bioconda::fastqc=0.11.9\n  - bioconda::adapterremoval=2.3.2\n  - bioconda::adapterremovalfixprefix=0.0.5\n  - bioconda::bwa=0.7.17\n  - bioconda::picard=2.26.0\n  - bioconda::samtools=1.12\n  - bioconda::dedup=0.12.8\n  - bioconda::angsd=0.935\n  - bioconda::circularmapper=1.93.5\n  - bioconda::gatk4=4.2.0.0\n  - bioconda::gatk=3.5 ## Don't upgrade - required for MultiVCFAnalyzer\n  - bioconda::qualimap=2.2.2d\n  - bioconda::vcf2genome=0.91\n  - bioconda::damageprofiler=0.4.9 # Don't upgrade - later versions don't allow java 8\n  - bioconda::multiqc=1.16\n  - bioconda::pmdtools=0.60\n  - bioconda::bedtools=2.30.0\n  - conda-forge::libiconv=1.16\n  - conda-forge::pigz=2.6\n  - bioconda::sequencetools=1.5.2\n  - bioconda::preseq=3.1.2\n  - bioconda::fastp=0.20.1\n  - bioconda::bamutil=1.0.15\n  - bioconda::mtnucratio=0.7\n  - bioconda::pysam=0.16.0\n  - bioconda::kraken2=2.1.2\n  - conda-forge::pandas=1.2.4\n  - bioconda::freebayes=1.3.5\n  - bioconda::sexdeterrmine=1.1.2\n  - bioconda::multivcfanalyzer=0.85.2\n  - bioconda::hops=0.35\n  - bioconda::malt=0.61\n  - conda-forge::biopython=1.79\n  - conda-forge::xopen=1.1.0\n  - bioconda::bowtie2=2.4.4\n  - bioconda::eigenstratdatabasetools=1.0.2\n  - bioconda::mapdamage2=2.2.1\n  - bioconda::bbmap=38.92\n  - bioconda::bcftools=1.12"
  },
  {
    "path": "lib/Checks.groovy",
    "content": "import org.yaml.snakeyaml.Yaml\n\n/*\n * This file holds several functions used to perform standard checks for the nf-core pipeline template.\n */\n\nclass Checks {\n\n    static void check_conda_channels(log) {\n        Yaml parser = new Yaml()\n        def channels = []\n        try {\n            def config = parser.load(\"conda config --show channels\".execute().text)\n            channels = config.channels\n        } catch(NullPointerException | IOException e) {\n            log.warn \"Could not verify conda channel configuration.\"\n            return\n        }\n\n        // Check that all channels are present\n        def required_channels = ['conda-forge', 'bioconda', 'defaults']\n        def conda_check_failed = !required_channels.every { ch -> ch in channels }\n\n        // Check that they are in the right order\n        conda_check_failed |= !(channels.indexOf('conda-forge') < channels.indexOf('bioconda'))\n        conda_check_failed |= !(channels.indexOf('bioconda') < channels.indexOf('defaults'))\n\n        if (conda_check_failed) {\n            log.warn \"=============================================================================\\n\" +\n                     \"  There is a problem with your Conda configuration!\\n\\n\" + \n                     \"  You will need to set-up the conda-forge and bioconda channels correctly.\\n\" +\n                     \"  Please refer to https://bioconda.github.io/user/install.html#set-up-channels\\n\" +\n                     \"  NB: The order of the channels matters!\\n\" +\n                     \"===================================================================================\"\n        }\n    }\n\n    static void aws_batch(workflow, params) {\n        if (workflow.profile.contains('awsbatch')) {\n            assert (params.awsqueue && params.awsregion) : \"Specify correct --awsqueue and --awsregion parameters on AWSBatch!\"\n            // Check outdir paths to be S3 buckets if running on 
AWSBatch\n            // related: https://github.com/nextflow-io/nextflow/issues/813\n            assert params.outdir.startsWith('s3:')       : \"Outdir not on S3 - specify S3 Bucket to run on AWSBatch!\"\n            // Prevent trace files to be stored on S3 since S3 does not support rolling files.\n            assert !params.tracedir.startsWith('s3:')    :  \"Specify a local tracedir or run without trace! S3 cannot be used for tracefiles.\"\n        }\n    }\n\n    static void hostname(workflow, params, log) {\n        Map colors = Headers.log_colours(params.monochrome_logs)\n        if (params.hostnames) {\n            def hostname = \"hostname\".execute().text.trim()\n            params.hostnames.each { prof, hnames ->\n                hnames.each { hname ->\n                    if (hostname.contains(hname) && !workflow.profile.contains(prof)) {\n                        log.info \"=${colors.yellow}====================================================${colors.reset}=\\n\" +\n                                  \"${colors.yellow}WARN: You are running with `-profile $workflow.profile`\\n\" +\n                                  \"      but your machine hostname is ${colors.white}'$hostname'${colors.reset}.\\n\" +\n                                  \"      ${colors.yellow_bold}Please use `-profile $prof${colors.reset}`\\n\" +\n                                  \"=${colors.yellow}====================================================${colors.reset}=\"\n                    }\n                }\n            }\n        }\n    }\n\n    // Citation string\n    private static String citation(workflow) {\n        return \"If you use ${workflow.manifest.name} for your analysis please cite:\\n\\n\" +\n               \"* The pipeline\\n\" + \n               \"  https://doi.org/10.1101/2020.06.11.145615\\n\\n\" +\n               \"* The nf-core framework\\n\" +\n               \"  https://dx.doi.org/10.1038/s41587-020-0439-x\\n\" +\n               \"  https://rdcu.be/b1GjZ\\n\\n\" 
+\n               \"* Software dependencies\\n\" +\n               \"  https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md\"\n    }\n\n\n\n\n\n\n\n}\n"
  },
  {
    "path": "lib/Completion.groovy",
    "content": "/*\n * Functions to be run on completion of pipeline\n */\n\nclass Completion {\n    static void email(workflow, params, summary_params, projectDir, log, multiqc_report=[]) {\n\n        // Set up the e-mail variables\n        def subject = \"[$workflow.manifest.name] Successful: $workflow.runName\"\n\n        if (!workflow.success) {\n            subject = \"[$workflow.manifest.name] FAILED: $workflow.runName\"\n        }\n\n        def summary = [:]\n        for (group in summary_params.keySet()) {\n            summary << summary_params[group]\n        }\n        \n        def misc_fields = [:]\n        misc_fields['Date Started']              = workflow.start\n        misc_fields['Date Completed']            = workflow.complete\n        misc_fields['Pipeline script file path'] = workflow.scriptFile\n        misc_fields['Pipeline script hash ID']   = workflow.scriptId\n        if (workflow.repository) misc_fields['Pipeline repository Git URL']    = workflow.repository\n        if (workflow.commitId)   misc_fields['Pipeline repository Git Commit'] = workflow.commitId\n        if (workflow.revision)   misc_fields['Pipeline Git branch/tag']        = workflow.revision\n        misc_fields['Nextflow Version']           = workflow.nextflow.version\n        misc_fields['Nextflow Build']             = workflow.nextflow.build\n        misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp\n\n        def email_fields = [:]\n        email_fields['version']             = workflow.manifest.version\n        email_fields['runName']             = workflow.runName\n        email_fields['success']             = workflow.success\n        email_fields['dateComplete']        = workflow.complete\n        email_fields['duration']            = workflow.duration\n        email_fields['exitStatus']          = workflow.exitStatus\n        email_fields['errorMessage']        = (workflow.errorMessage ?: 'None')\n        email_fields['errorReport']         = 
(workflow.errorReport ?: 'None')\n        email_fields['commandLine']         = workflow.commandLine\n        email_fields['projectDir']          = workflow.projectDir\n        email_fields['summary']             = summary << misc_fields\n        \n        // On success try attach the multiqc report\n        def mqc_report = null\n        try {\n            if (workflow.success) {\n                mqc_report = multiqc_report.getVal()\n                if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) {\n                    if (mqc_report.size() > 1) {\n                        log.warn \"[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one\"\n                    }\n                    mqc_report = mqc_report[0]\n                }\n            }\n        } catch (all) {\n            log.warn \"[$workflow.manifest.name] Could not attach MultiQC report to summary email\"\n        }\n\n        // Check if we are only sending emails on failure\n        def email_address = params.email\n        if (!params.email && params.email_on_fail && !workflow.success) {\n            email_address = params.email_on_fail\n        }\n\n        // Render the TXT template\n        def engine       = new groovy.text.GStringTemplateEngine()\n        def tf           = new File(\"$projectDir/assets/email_template.txt\")\n        def txt_template = engine.createTemplate(tf).make(email_fields)\n        def email_txt    = txt_template.toString()\n\n        // Render the HTML template\n        def hf            = new File(\"$projectDir/assets/email_template.html\")\n        def html_template = engine.createTemplate(hf).make(email_fields)\n        def email_html    = html_template.toString()\n\n        // Render the sendmail template\n        def max_multiqc_email_size = params.max_multiqc_email_size as nextflow.util.MemoryUnit \n        def smail_fields           = [ email: email_address, subject: subject, email_txt: email_txt, email_html: 
email_html, projectDir: \"$projectDir\", mqcFile: mqc_report, mqcMaxSize:  max_multiqc_email_size.toBytes()]\n        def sf                     = new File(\"$projectDir/assets/sendmail_template.txt\")\n        def sendmail_template      = engine.createTemplate(sf).make(smail_fields)\n        def sendmail_html          = sendmail_template.toString()\n\n        // Send the HTML e-mail\n        Map colors = Headers.log_colours(params.monochrome_logs)\n        if (email_address) {\n            try {\n                if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') }\n                // Try to send HTML e-mail using sendmail\n                [ 'sendmail', '-t' ].execute() << sendmail_html\n                log.info \"-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-\"\n            } catch (all) {\n                // Catch failures and try with plaintext\n                def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ]\n                if ( mqc_report.size() <= max_multiqc_email_size.toBytes() ) {\n                    mail_cmd += [ '-A', mqc_report ]\n                }\n                mail_cmd.execute() << email_html\n                log.info \"-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-\"\n            }\n        }\n\n        // Write summary e-mail HTML to a file\n        def output_d = new File(\"${params.outdir}/pipeline_info/\")\n        if (!output_d.exists()) {\n            output_d.mkdirs()\n        }\n        def output_hf = new File(output_d, \"pipeline_report.html\")\n        output_hf.withWriter { w -> w << email_html }\n        def output_tf = new File(output_d, \"pipeline_report.txt\")\n        output_tf.withWriter { w -> w << email_txt }\n    }\n\n    static void summary(workflow, params, log, fail_percent_mapped=[:], pass_percent_mapped=[:]) {\n        Map colors = 
Headers.log_colours(params.monochrome_logs)\n\n        if (workflow.success) {\n            if (workflow.stats.ignoredCount == 0) {\n                log.info \"-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-\"\n            } else {\n                log.info \"-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed successfully, but with errored process(es) ${colors.reset}-\"\n            }\n        } else {\n            Checks.hostname(workflow, params, log)\n            log.info \"-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-\"\n        }\n    }\n}\n"
  },
  {
    "path": "lib/Headers.groovy",
    "content": "/*\n * This file holds several functions used to render the nf-core ANSI header.\n */\n\nclass Headers {\n\n    private static Map log_colours(Boolean monochrome_logs) {\n        Map colorcodes = [:]\n        colorcodes['reset']       = monochrome_logs ? '' : \"\\033[0m\"\n        colorcodes['dim']         = monochrome_logs ? '' : \"\\033[2m\"\n        colorcodes['black']       = monochrome_logs ? '' : \"\\033[0;30m\"\n        colorcodes['green']       = monochrome_logs ? '' : \"\\033[0;32m\"\n        colorcodes['yellow']      = monochrome_logs ? '' :  \"\\033[0;33m\"\n        colorcodes['yellow_bold'] = monochrome_logs ? '' : \"\\033[1;93m\"\n        colorcodes['blue']        = monochrome_logs ? '' : \"\\033[0;34m\"\n        colorcodes['purple']      = monochrome_logs ? '' : \"\\033[0;35m\"\n        colorcodes['cyan']        = monochrome_logs ? '' : \"\\033[0;36m\"\n        colorcodes['white']       = monochrome_logs ? '' : \"\\033[0;37m\"\n        colorcodes['red']         = monochrome_logs ? 
'' : \"\\033[1;91m\"\n        return colorcodes\n    }\n\n    static String dashed_line(monochrome_logs) {\n        Map colors = log_colours(monochrome_logs)\n        return \"-${colors.dim}----------------------------------------------------${colors.reset}-\"\n    }\n\n    static String nf_core(workflow, monochrome_logs) {\n        Map colors = log_colours(monochrome_logs)\n        String.format(\n            \"\"\"\\n\n            ${dashed_line(monochrome_logs)}\n                                                    ${colors.green},--.${colors.black}/${colors.green},-.${colors.reset}\n            ${colors.blue}        ___     __   __   __   ___     ${colors.green}/,-._.--~\\'${colors.reset}\n            ${colors.blue}  |\\\\ | |__  __ /  ` /  \\\\ |__) |__         ${colors.yellow}}  {${colors.reset}\n            ${colors.blue}  | \\\\| |       \\\\__, \\\\__/ |  \\\\ |___     ${colors.green}\\\\`-._,-`-,${colors.reset}\n                                                    ${colors.green}`._,._,\\'${colors.reset}\n            ${colors.purple}  ${workflow.manifest.name} v${workflow.manifest.version}${colors.reset}\n            ${dashed_line(monochrome_logs)}\n            \"\"\".stripIndent()\n        )\n    }\n}\n"
  },
  {
    "path": "lib/NfcoreSchema.groovy",
    "content": "/*\n * This file holds several functions used to perform JSON parameter validation, help and summary rendering for the nf-core pipeline template.\n */\n\nimport org.everit.json.schema.Schema\nimport org.everit.json.schema.loader.SchemaLoader\nimport org.everit.json.schema.ValidationException\nimport org.json.JSONObject\nimport org.json.JSONTokener\nimport org.json.JSONArray\nimport groovy.json.JsonSlurper\nimport groovy.json.JsonBuilder\n\nclass NfcoreSchema {\n\n    /*\n    * Function to loop over all parameters defined in schema and check\n    * whether the given paremeters adhere to the specificiations\n    */\n    /* groovylint-disable-next-line UnusedPrivateMethodParameter */\n    private static void validateParameters(params, jsonSchema, log) {\n        def has_error = false\n        //=====================================================================//\n        // Check for nextflow core params and unexpected params\n        def json = new File(jsonSchema).text\n        def Map schemaParams = (Map) new JsonSlurper().parseText(json).get('definitions')\n        def nf_params = [\n            // Options for base `nextflow` command\n            'bg',\n            'c',\n            'C',\n            'config',\n            'd',\n            'D',\n            'dockerize',\n            'h',\n            'log',\n            'q',\n            'quiet',\n            'syslog',\n            'v',\n            'version',\n\n            // Options for `nextflow run` command\n            'ansi',\n            'ansi-log',\n            'bg',\n            'bucket-dir',\n            'c',\n            'cache',\n            'config',\n            'dsl2',\n            'dump-channels',\n            'dump-hashes',\n            'E',\n            'entry',\n            'latest',\n            'lib',\n            'main-script',\n            'N',\n            'name',\n            'offline',\n            'params-file',\n            'pi',\n            'plugins',\n            
'poll-interval',\n            'pool-size',\n            'profile',\n            'ps',\n            'qs',\n            'queue-size',\n            'r',\n            'resume',\n            'revision',\n            'stdin',\n            'stub',\n            'stub-run',\n            'test',\n            'w',\n            'with-charliecloud',\n            'with-conda',\n            'with-dag',\n            'with-docker',\n            'with-mpi',\n            'with-notification',\n            'with-podman',\n            'with-report',\n            'with-singularity',\n            'with-timeline',\n            'with-tower',\n            'with-trace',\n            'with-weblog',\n            'without-docker',\n            'without-podman',\n            'work-dir'\n        ]\n        def unexpectedParams = []\n\n        // Collect expected parameters from the schema\n        def expectedParams = []\n        for (group in schemaParams) {\n            for (p in group.value['properties']) {\n                expectedParams.push(p.key)\n            }\n        }\n\n        for (specifiedParam in params.keySet()) {\n            // nextflow params\n            if (nf_params.contains(specifiedParam)) {\n                log.error \"ERROR: You used a core Nextflow option with two hyphens: '--${specifiedParam}'. 
Please resubmit with '-${specifiedParam}'\"\n                has_error = true\n            }\n            // unexpected params\n            def params_ignore = params.schema_ignore_params.split(',') + 'schema_ignore_params'\n            def expectedParamsLowerCase = expectedParams.collect{ it.replace(\"-\", \"\").toLowerCase() }\n            def specifiedParamLowerCase = specifiedParam.replace(\"-\", \"\").toLowerCase()\n            if (!expectedParams.contains(specifiedParam) && !params_ignore.contains(specifiedParam) && !expectedParamsLowerCase.contains(specifiedParamLowerCase)) {\n                // Temporarily remove camelCase/camel-case params #1035\n                def unexpectedParamsLowerCase = unexpectedParams.collect{ it.replace(\"-\", \"\").toLowerCase()}\n                if (!unexpectedParamsLowerCase.contains(specifiedParamLowerCase)){\n                    unexpectedParams.push(specifiedParam)\n                }\n            }\n        }\n\n        //=====================================================================//\n        // Validate parameters against the schema\n        InputStream inputStream = new File(jsonSchema).newInputStream()\n        JSONObject rawSchema = new JSONObject(new JSONTokener(inputStream))\n\n        // Remove anything that's in params.schema_ignore_params\n        rawSchema = removeIgnoredParams(rawSchema, params)\n\n        Schema schema = SchemaLoader.load(rawSchema)\n\n        // Clean the parameters\n        def cleanedParams = cleanParameters(params)\n\n        // Convert to JSONObject\n        def jsonParams = new JsonBuilder(cleanedParams)\n        JSONObject paramsJSON = new JSONObject(jsonParams.toString())\n\n        // Validate\n        try {\n            schema.validate(paramsJSON)\n        } catch (ValidationException e) {\n            println ''\n            log.error 'ERROR: Validation of pipeline parameters failed!'\n            JSONObject exceptionJSON = e.toJSON()\n            
printExceptions(exceptionJSON, paramsJSON, log)\n            println ''\n            has_error = true\n        }\n\n        // Check for unexpected parameters\n        if (unexpectedParams.size() > 0) {\n            Map colors = log_colours(params.monochrome_logs)\n            println ''\n            def warn_msg = 'Found unexpected parameters:'\n            for (unexpectedParam in unexpectedParams) {\n                warn_msg = warn_msg + \"\\n* --${unexpectedParam}: ${params[unexpectedParam].toString()}\"\n            }\n            log.warn warn_msg\n            log.info \"- ${colors.dim}Ignore this warning: params.schema_ignore_params = \\\"${unexpectedParams.join(',')}\\\" ${colors.reset}\"\n            println ''\n        }\n\n        if (has_error) {\n            System.exit(1)\n        }\n    }\n\n    // Loop over nested exceptions and print the causingException\n    private static void printExceptions(exJSON, paramsJSON, log) {\n        def causingExceptions = exJSON['causingExceptions']\n        if (causingExceptions.length() == 0) {\n            def m = exJSON['message'] =~ /required key \\[([^\\]]+)\\] not found/\n            // Missing required param\n            if (m.matches()) {\n                log.error \"* Missing required parameter: --${m[0][1]}\"\n            }\n            // Other base-level error\n            else if (exJSON['pointerToViolation'] == '#') {\n                log.error \"* ${exJSON['message']}\"\n            }\n            // Error with specific param\n            else {\n                def param = exJSON['pointerToViolation'] - ~/^#\\//\n                def param_val = paramsJSON[param].toString()\n                log.error \"* --${param}: ${exJSON['message']} (${param_val})\"\n            }\n        }\n        for (ex in causingExceptions) {\n            printExceptions(ex, paramsJSON, log)\n        }\n    }\n\n    // Remove an element from a JSONArray\n    private static JSONArray removeElement(jsonArray, element){\n        
def list = []\n        int len = jsonArray.length()\n        for (int i=0;i<len;i++){\n            list.add(jsonArray.get(i).toString())\n        }\n        list.remove(element)\n        JSONArray jsArray = new JSONArray(list)\n        return jsArray\n    }\n\n    private static JSONObject removeIgnoredParams(rawSchema, params){\n        // Remove anything that's in params.schema_ignore_params\n        params.schema_ignore_params.split(',').each{ ignore_param ->\n            if(rawSchema.keySet().contains('definitions')){\n                rawSchema.definitions.each { definition ->\n                    for (key in definition.keySet()){\n                        if (definition[key].get(\"properties\").keySet().contains(ignore_param)){\n                            // Remove the param to ignore\n                            definition[key].get(\"properties\").remove(ignore_param)\n                            // If the param was required, change this\n                            if (definition[key].has(\"required\")) {\n                                def cleaned_required = removeElement(definition[key].required, ignore_param)\n                                definition[key].put(\"required\", cleaned_required)\n                            }\n                        }\n                    }\n                }\n            }\n            if(rawSchema.keySet().contains('properties') && rawSchema.get('properties').keySet().contains(ignore_param)) {\n                rawSchema.get(\"properties\").remove(ignore_param)\n            }\n            if(rawSchema.keySet().contains('required') && rawSchema.required.contains(ignore_param)) {\n                def cleaned_required = removeElement(rawSchema.required, ignore_param)\n                rawSchema.put(\"required\", cleaned_required)\n            }\n        }\n        return rawSchema\n    }\n\n    private static Map cleanParameters(params) {\n        def new_params = params.getClass().newInstance(params)\n        for (p in 
params) {\n            // remove anything evaluating to false\n            if (!p['value']) {\n                new_params.remove(p.key)\n            }\n            // Cast MemoryUnit to String\n            if (p['value'].getClass() == nextflow.util.MemoryUnit) {\n                new_params.replace(p.key, p['value'].toString())\n            }\n            // Cast Duration to String\n            if (p['value'].getClass() == nextflow.util.Duration) {\n                new_params.replace(p.key, p['value'].toString().replaceFirst(/d(?!\\S)/, \"day\"))\n            }\n            // Cast LinkedHashMap to String\n            if (p['value'].getClass() == LinkedHashMap) {\n                new_params.replace(p.key, p['value'].toString())\n            }\n        }\n        return new_params\n    }\n\n     /*\n     * This method tries to read a JSON params file\n     */\n    private static LinkedHashMap params_load(String json_schema) {\n        def params_map = new LinkedHashMap()\n        try {\n            params_map = params_read(json_schema)\n        } catch (Exception e) {\n            println \"Could not read parameters settings from JSON. $e\"\n            params_map = new LinkedHashMap()\n        }\n        return params_map\n    }\n\n    private static Map log_colours(Boolean monochrome_logs) {\n        Map colorcodes = [:]\n\n        // Reset / Meta\n        colorcodes['reset']       = monochrome_logs ? '' : \"\\033[0m\"\n        colorcodes['bold']        = monochrome_logs ? '' : \"\\033[1m\"\n        colorcodes['dim']         = monochrome_logs ? '' : \"\\033[2m\"\n        colorcodes['underlined']  = monochrome_logs ? '' : \"\\033[4m\"\n        colorcodes['blink']       = monochrome_logs ? '' : \"\\033[5m\"\n        colorcodes['reverse']     = monochrome_logs ? '' : \"\\033[7m\"\n        colorcodes['hidden']      = monochrome_logs ? '' : \"\\033[8m\"\n\n        // Regular Colors\n        colorcodes['black']       = monochrome_logs ? 
'' : \"\\033[0;30m\"\n        colorcodes['red']         = monochrome_logs ? '' : \"\\033[0;31m\"\n        colorcodes['green']       = monochrome_logs ? '' : \"\\033[0;32m\"\n        colorcodes['yellow']      = monochrome_logs ? '' : \"\\033[0;33m\"\n        colorcodes['blue']        = monochrome_logs ? '' : \"\\033[0;34m\"\n        colorcodes['purple']      = monochrome_logs ? '' : \"\\033[0;35m\"\n        colorcodes['cyan']        = monochrome_logs ? '' : \"\\033[0;36m\"\n        colorcodes['white']       = monochrome_logs ? '' : \"\\033[0;37m\"\n\n        // Bold\n        colorcodes['bblack']      = monochrome_logs ? '' : \"\\033[1;30m\"\n        colorcodes['bred']        = monochrome_logs ? '' : \"\\033[1;31m\"\n        colorcodes['bgreen']      = monochrome_logs ? '' : \"\\033[1;32m\"\n        colorcodes['byellow']     = monochrome_logs ? '' : \"\\033[1;33m\"\n        colorcodes['bblue']       = monochrome_logs ? '' : \"\\033[1;34m\"\n        colorcodes['bpurple']     = monochrome_logs ? '' : \"\\033[1;35m\"\n        colorcodes['bcyan']       = monochrome_logs ? '' : \"\\033[1;36m\"\n        colorcodes['bwhite']      = monochrome_logs ? '' : \"\\033[1;37m\"\n\n        // Underline\n        colorcodes['ublack']      = monochrome_logs ? '' : \"\\033[4;30m\"\n        colorcodes['ured']        = monochrome_logs ? '' : \"\\033[4;31m\"\n        colorcodes['ugreen']      = monochrome_logs ? '' : \"\\033[4;32m\"\n        colorcodes['uyellow']     = monochrome_logs ? '' : \"\\033[4;33m\"\n        colorcodes['ublue']       = monochrome_logs ? '' : \"\\033[4;34m\"\n        colorcodes['upurple']     = monochrome_logs ? '' : \"\\033[4;35m\"\n        colorcodes['ucyan']       = monochrome_logs ? '' : \"\\033[4;36m\"\n        colorcodes['uwhite']      = monochrome_logs ? '' : \"\\033[4;37m\"\n\n        // High Intensity\n        colorcodes['iblack']      = monochrome_logs ? '' : \"\\033[0;90m\"\n        colorcodes['ired']        = monochrome_logs ? 
'' : \"\\033[0;91m\"\n        colorcodes['igreen']      = monochrome_logs ? '' : \"\\033[0;92m\"\n        colorcodes['iyellow']     = monochrome_logs ? '' : \"\\033[0;93m\"\n        colorcodes['iblue']       = monochrome_logs ? '' : \"\\033[0;94m\"\n        colorcodes['ipurple']     = monochrome_logs ? '' : \"\\033[0;95m\"\n        colorcodes['icyan']       = monochrome_logs ? '' : \"\\033[0;96m\"\n        colorcodes['iwhite']      = monochrome_logs ? '' : \"\\033[0;97m\"\n\n        // Bold High Intensity\n        colorcodes['biblack']     = monochrome_logs ? '' : \"\\033[1;90m\"\n        colorcodes['bired']       = monochrome_logs ? '' : \"\\033[1;91m\"\n        colorcodes['bigreen']     = monochrome_logs ? '' : \"\\033[1;92m\"\n        colorcodes['biyellow']    = monochrome_logs ? '' : \"\\033[1;93m\"\n        colorcodes['biblue']      = monochrome_logs ? '' : \"\\033[1;94m\"\n        colorcodes['bipurple']    = monochrome_logs ? '' : \"\\033[1;95m\"\n        colorcodes['bicyan']      = monochrome_logs ? '' : \"\\033[1;96m\"\n        colorcodes['biwhite']     = monochrome_logs ? 
'' : \"\\033[1;97m\"\n\n        return colorcodes\n    }\n\n    static String dashed_line(monochrome_logs) {\n        Map colors = log_colours(monochrome_logs)\n        return \"-${colors.dim}----------------------------------------------------${colors.reset}-\"\n    }\n\n    /*\n    Method to actually read in JSON file using Groovy.\n    Group (as Key), values are all parameters\n        - Parameter1 as Key, Description as Value\n        - Parameter2 as Key, Description as Value\n        ....\n    Group\n        -\n    */\n    private static LinkedHashMap params_read(String json_schema) throws Exception {\n        def json = new File(json_schema).text\n        def Map schema_definitions = (Map) new JsonSlurper().parseText(json).get('definitions')\n        def Map schema_properties = (Map) new JsonSlurper().parseText(json).get('properties')\n        /* Tree looks like this in nf-core schema\n         * definitions <- this is what the first get('definitions') gets us\n             group 1\n               title\n               description\n                 properties\n                   parameter 1\n                     type\n                     description\n                   parameter 2\n                     type\n                     description\n             group 2\n               title\n               description\n                 properties\n                   parameter 1\n                     type\n                     description\n         * properties <- parameters can also be ungrouped, outside of definitions\n            parameter 1\n             type\n             description\n        */\n\n        // Grouped params\n        def params_map = new LinkedHashMap()\n        schema_definitions.each { key, val ->\n            def Map group = schema_definitions.\"$key\".properties // Gets the property object of the group\n            def title = schema_definitions.\"$key\".title\n            def sub_params = new LinkedHashMap()\n            group.each { 
innerkey, value ->\n                sub_params.put(innerkey, value)\n            }\n            params_map.put(title, sub_params)\n        }\n\n        // Ungrouped params\n        def ungrouped_params = new LinkedHashMap()\n        schema_properties.each { innerkey, value ->\n            ungrouped_params.put(innerkey, value)\n        }\n        params_map.put(\"Other parameters\", ungrouped_params)\n\n        return params_map\n    }\n\n    /*\n     * Get maximum number of characters across all parameter names\n     */\n    private static Integer params_max_chars(params_map) {\n        Integer max_chars = 0\n        for (group in params_map.keySet()) {\n            def group_params = params_map.get(group)  // This gets the parameters of that particular group\n            for (param in group_params.keySet()) {\n                if (param.size() > max_chars) {\n                    max_chars = param.size()\n                }\n            }\n        }\n        return max_chars\n    }\n\n    /*\n     * Beautify parameters for --help\n     */\n    private static String params_help(workflow, params, json_schema, command) {\n        Map colors = log_colours(params.monochrome_logs)\n        Integer num_hidden = 0\n        String output  = ''\n        output        += 'Typical pipeline command:\\n\\n'\n        output        += \"  ${colors.cyan}${command}${colors.reset}\\n\\n\"\n        Map params_map = params_load(json_schema)\n        Integer max_chars  = params_max_chars(params_map) + 1\n        Integer desc_indent = max_chars + 14\n        Integer dec_linewidth = 160 - desc_indent\n        for (group in params_map.keySet()) {\n            Integer num_params = 0\n            String group_output = colors.underlined + colors.bold + group + colors.reset + '\\n'\n            def group_params = params_map.get(group)  // This gets the parameters of that particular group\n            for (param in group_params.keySet()) {\n                if (group_params.get(param).hidden && 
!params.show_hidden_params) {\n                    num_hidden += 1\n                    continue;\n                }\n                def type = '[' + group_params.get(param).type + ']'\n                def description = group_params.get(param).description\n                def defaultValue = group_params.get(param).default ? \" [default: \" + group_params.get(param).default.toString() + \"]\" : ''\n                def description_default = description + colors.dim + defaultValue + colors.reset\n                // Wrap long description texts\n                // Loosely based on https://dzone.com/articles/groovy-plain-text-word-wrap\n                if (description_default.length() > dec_linewidth){\n                    List olines = []\n                    String oline = \"\" // \" \" * indent\n                    description_default.split(\" \").each() { wrd ->\n                        if ((oline.size() + wrd.size()) <= dec_linewidth) {\n                            oline += wrd + \" \"\n                        } else {\n                            olines += oline\n                            oline = wrd + \" \"\n                        }\n                    }\n                    olines += oline\n                    description_default = olines.join(\"\\n\" + \" \" * desc_indent)\n                }\n                group_output += \"  --\" +  param.padRight(max_chars) + colors.dim + type.padRight(10) + colors.reset + description_default + '\\n'\n                num_params += 1\n            }\n            group_output += '\\n'\n            if (num_params > 0){\n                output += group_output\n            }\n        }\n        output += dashed_line(params.monochrome_logs)\n        if (num_hidden > 0){\n            output += colors.dim + \"\\n Hiding $num_hidden params, use --show_hidden_params to show.\\n\" + colors.reset\n            output += dashed_line(params.monochrome_logs)\n        }\n        return output\n    }\n\n    /*\n     * Groovy Map 
summarising parameters/workflow options used by the pipeline\n     */\n    private static LinkedHashMap params_summary_map(workflow, params, json_schema) {\n        // Get a selection of core Nextflow workflow options\n        def Map workflow_summary = [:]\n        if (workflow.revision) {\n            workflow_summary['revision'] = workflow.revision\n        }\n        workflow_summary['runName']      = workflow.runName\n        if (workflow.containerEngine) {\n            workflow_summary['containerEngine'] = workflow.containerEngine\n        }\n        if (workflow.container) {\n            workflow_summary['container'] = workflow.container\n        }\n        workflow_summary['launchDir']    = workflow.launchDir\n        workflow_summary['workDir']      = workflow.workDir\n        workflow_summary['projectDir']   = workflow.projectDir\n        workflow_summary['userName']     = workflow.userName\n        workflow_summary['profile']      = workflow.profile\n        workflow_summary['configFiles']  = workflow.configFiles.join(', ')\n\n        // Get pipeline parameters defined in JSON Schema\n        def Map params_summary = [:]\n        def blacklist  = ['hostnames']\n        def params_map = params_load(json_schema)\n        for (group in params_map.keySet()) {\n            def sub_params = new LinkedHashMap()\n            def group_params = params_map.get(group)  // This gets the parameters of that particular group\n            for (param in group_params.keySet()) {\n                if (params.containsKey(param) && !blacklist.contains(param)) {\n                    def params_value = params.get(param)\n                    def schema_value = group_params.get(param).default\n                    def param_type   = group_params.get(param).type\n                    if (schema_value != null) {\n                        if (param_type == 'string') {\n                            if (schema_value.contains('$projectDir') || schema_value.contains('${projectDir}')) {\n    
                            def sub_string = schema_value.replace('\\$projectDir', '')\n                                sub_string     = sub_string.replace('\\${projectDir}', '')\n                                if (params_value.contains(sub_string)) {\n                                    schema_value = params_value\n                                }\n                            }\n                            if (schema_value.contains('$params.outdir') || schema_value.contains('${params.outdir}')) {\n                                def sub_string = schema_value.replace('\\$params.outdir', '')\n                                sub_string     = sub_string.replace('\\${params.outdir}', '')\n                                if (\"${params.outdir}${sub_string}\" == params_value) {\n                                    schema_value = params_value\n                                }\n                            }\n                        }\n                    }\n\n                    // We have a default in the schema, and this isn't it\n                    if (schema_value != null && params_value != schema_value) {\n                        sub_params.put(param, params_value)\n                    }\n                    // No default in the schema, and this isn't empty\n                    else if (schema_value == null && params_value != \"\" && params_value != null && params_value != false) {\n                        sub_params.put(param, params_value)\n                    }\n                }\n            }\n            params_summary.put(group, sub_params)\n        }\n        return [ 'Core Nextflow options' : workflow_summary ] << params_summary\n    }\n\n    /*\n     * Beautify parameters for summary and return as string\n     */\n    private static String params_summary_log(workflow, params, json_schema) {\n        Map colors = log_colours(params.monochrome_logs)\n        String output  = ''\n        def params_map = params_summary_map(workflow, params, json_schema)\n   
     def max_chars  = params_max_chars(params_map)\n        for (group in params_map.keySet()) {\n            def group_params = params_map.get(group)  // This gets the parameters of that particular group\n            if (group_params) {\n                output += colors.bold + group + colors.reset + '\\n'\n                for (param in group_params.keySet()) {\n                    output += \"  \" + colors.blue + param.padRight(max_chars) + \": \" + colors.green +  group_params.get(param) + colors.reset + '\\n'\n                }\n                output += '\\n'\n            }\n        }\n        output += dashed_line(params.monochrome_logs)\n        output += colors.dim + \"\\n Only displaying parameters that differ from defaults.\\n\" + colors.reset\n        output += dashed_line(params.monochrome_logs)\n        return output\n    }\n\n}\n"
  },
  {
    "path": "main.nf",
    "content": "#!/usr/bin/env nextflow\n/*\n------------------------------------------------------------------------------------------------------------\n                         nf-core/eager\n------------------------------------------------------------------------------------------------------------\n EAGER Analysis Pipeline. Started 2018-06-05\n #### Homepage / Documentation\n https://github.com/nf-core/eager\n #### Authors\n For a list of authors and contributors, see: https://github.com/nf-core/eager/tree/dev#authors-alphabetical\n------------------------------------------------------------------------------------------------------------\n*/\nnextflow.enable.dsl=1\n\nlog.info Headers.nf_core(workflow, params.monochrome_logs)\n\n////////////////////////////////////////////////////\n/* --               PRINT HELP                 -- */\n////////////////////////////////////////////////////+\ndef json_schema = \"$projectDir/nextflow_schema.json\"\nif (params.help) {\n    def command = \"nextflow run nf-core/eager --input '*_R{1,2}.fastq.gz' -profile docker\"\n    log.info NfcoreSchema.params_help(workflow, params, json_schema, command)\n    exit 0\n}\n\n////////////////////////////////////////////////////\n/* --         VALIDATE PARAMETERS              -- */\n////////////////////////////////////////////////////+\nif (params.validate_params) {\n    NfcoreSchema.validateParameters(params, json_schema, log)\n}\n\n// Validate BAM input isn't set to paired_end\nif ( params.bam && !params.single_end ) {\n  exit 1, \"[nf-core/eager] error: bams can only be specified with --single_end. Please check input command.\"\n}\n\n// Do not allow input bams to be suffixed with '.unmapped.bam'\nif (params.bam && params.input.endsWith('.unmapped.bam')) {\n  exit 1, \"[nf-core/eager] error: Input BAM file names ending in '.unmapped.bam' are not allowed. 
Please rename your input BAM(s).\"\n}\n\n// Validate that skip_collapse is only set to True for paired_end reads!\nif (!has_extension(params.input, \"tsv\") && params.skip_collapse && params.single_end){\n    exit 1, \"[nf-core/eager] error: --skip_collapse can only be set for paired_end samples.\"\n}\n\n// Validate not trying to both skip collapse and skip trim\nif ( params.skip_collapse && params.skip_trim ) {\n  exit 1, \"[nf-core/eager] error: you have specified to skip both merging and trimming of paired end samples. Use --skip_adapterremoval instead.\"\n}\n\n// Bedtools validation\nif( params.run_bedtools_coverage && !params.anno_file ){\n  exit 1, \"[nf-core/eager] error: you have turned on bedtools coverage, but not specified a BED or GFF file with --anno_file. Please validate your parameters.\"\n}\n\n// Preseq validation\nif( !params.skip_preseq && !( params.preseq_mode == 'c_curve' || params.preseq_mode == 'lc_extrap' ) ) {\n  exit 1, \"[nf-core/eager] error: you are running preseq with an unsupported mode. See documentation for more information. You gave: ${params.preseq_mode}.\"\n}\n\n// BAM filtering validation\nif (!params.run_bam_filtering && params.bam_mapping_quality_threshold != 0) {\n  exit 1, \"[nf-core/eager] error: please turn on BAM filtering if you want to perform mapping quality filtering! Provide: --run_bam_filtering.\"\n}\n\n// Deduplication validation\nif (params.dedupper == 'dedup' && !params.mergedonly) {\n    log.warn \"[nf-core/eager] Warning: you are using DeDup but without specifying --mergedonly for AdapterRemoval, dedup will likely fail! See documentation for more information.\"\n}\n\n// Genotyping validation\nif (params.run_genotyping){\n\n  if (params.genotyping_tool == 'pileupcaller' && ( !params.pileupcaller_bedfile || !params.pileupcaller_snpfile ) ) {\n    exit 1, \"[nf-core/eager] error: please check your pileupCaller bed file and snp file parameters. You must supply a bed file and a snp file.\"\n  }\n\n  if (params.genotyping_tool == 'angsd' && ! 
( params.angsd_glformat == 'text' || params.angsd_glformat == 'binary' || params.angsd_glformat == 'binary_three' || params.angsd_glformat == 'beagle' ) ) {\n    exit 1, \"[nf-core/eager] error: please check your ANGSD output format! Options: 'text', 'binary', 'binary_three', 'beagle'. Found parameter: --angsd_glformat '${params.angsd_glformat}'.\"\n  }\n}\n\n// Consensus sequence generation validation\nif (params.run_vcf2genome) {\n    if (!params.run_genotyping) {\n      exit 1, \"[nf-core/eager] error: consensus sequence generation requires genotyping via UnifiedGenotyper to be turned on with the parameters --run_genotyping and --genotyping_tool 'ug'. Please check your genotyping parameters.\"\n    }\n\n    if (params.genotyping_tool != 'ug') {\n      exit 1, \"[nf-core/eager] error: consensus sequence generation requires genotyping via UnifiedGenotyper to be turned on with the parameters --run_genotyping and --genotyping_tool 'ug'. Found parameter: --genotyping_tool '${params.genotyping_tool}'.\"\n    }\n}\n\n// MultiVCFAnalyzer validation\nif (params.run_multivcfanalyzer) {\n  if (!params.run_genotyping) {\n    exit 1, \"[nf-core/eager] error: MultiVCFAnalyzer requires genotyping to be turned on with the parameter --run_genotyping. Please check your genotyping parameters.\"\n  }\n\n  if (params.genotyping_tool != \"ug\") {\n    exit 1, \"[nf-core/eager] error: MultiVCFAnalyzer only accepts VCF files from GATK UnifiedGenotyper. Found parameter: --genotyping_tool '${params.genotyping_tool}'.\"\n  }\n\n  if (params.gatk_ploidy != 2) {\n    exit 1, \"[nf-core/eager] error: MultiVCFAnalyzer only accepts VCF files generated with a GATK ploidy set to 2. 
Found parameter: --gatk_ploidy ${params.gatk_ploidy}.\"\n  }\n\n  if (params.additional_vcf_files) {\n      ch_extravcfs_for_multivcfanalyzer = Channel.fromPath(params.additional_vcf_files, checkIfExists: true)\n  }\n}\n\n// Metagenomic screening validation\nif (params.run_metagenomic_screening) {\n\n  if ( !params.run_bam_filtering ) {\n    exit 1, \"[nf-core/eager] error: metagenomic classification can only run on unmapped reads. Please supply --run_bam_filtering --bam_unmapped_type 'fastq'.\"\n  }\n\n  if ( params.bam_unmapped_type != \"fastq\" ) {\n    exit 1, \"[nf-core/eager] error: metagenomic classification can only run on unmapped reads. Please supply --bam_unmapped_type 'fastq'. Supplied: --bam_unmapped_type '${params.bam_unmapped_type}'.\"\n  }\n\n  if (!params.database) {\n    exit 1, \"[nf-core/eager] error: metagenomic classification requires a path to a database directory. Please specify one with --database '/path/to/database/'.\"\n  }\n\n  if (params.metagenomic_tool == 'malt' && params.malt_min_support_mode == 'percent' && params.metagenomic_min_support_reads != 1) {\n    exit 1, \"[nf-core/eager] error: incompatible MALT min support configuration. Percent can only be used with --malt_min_support_percent. You modified: --metagenomic_min_support_reads.\"\n  }\n\n  if (params.metagenomic_tool == 'malt' && params.malt_min_support_mode == 'reads' && params.malt_min_support_percent != 0.01) {\n    exit 1, \"[nf-core/eager] error: incompatible MALT min support configuration. Reads can only be used with --metagenomic_min_support_reads. You modified: --malt_min_support_percent.\"\n  }\n\n  if (!params.metagenomic_min_support_reads.toString().isInteger()){\n    exit 1, \"[nf-core/eager] error: incompatible min_support_reads configuration. min_support_reads can only be used with integers. 
Found parameter: --metagenomic_min_support_reads ${params.metagenomic_min_support_reads}.\"\n  }\n}\n\n// MaltExtract validation\nif (params.run_maltextract) {\n\n  if (params.run_metagenomic_screening && params.metagenomic_tool != 'malt') {\n    exit 1, \"[nf-core/eager] error: MaltExtract can only accept MALT output. Please supply --metagenomic_tool 'malt'. Found parameter: --metagenomic_tool '${params.metagenomic_tool}'\"\n  }\n\n  if (!params.maltextract_taxon_list) {\n    exit 1, \"[nf-core/eager] error: MaltExtract requires a taxon list specifying the target taxa of interest. Specify the file with --maltextract_taxon_list.\"\n  }\n}\n\n/////////////////////////////////////////////////////////\n/* --          VALIDATE INPUT FILES                 -- */\n/////////////////////////////////////////////////////////\n\n// Set up channels for annotation file\nif (!params.run_bedtools_coverage){\n  ch_anno_for_bedtools = Channel.empty()\n} else {\n  ch_anno_for_bedtools = Channel.fromPath(params.anno_file, checkIfExists: true)\n    .ifEmpty { exit 1, \"[nf-core/eager] error: bedtools annotation file not found. 
Supplied parameter: --anno_file ${params.anno_file}.\"}\n}\n\nif (params.fasta) {\n    file(params.fasta, checkIfExists: true)\n    lastPath = params.fasta.lastIndexOf(File.separator)\n    lastExt = params.fasta.lastIndexOf(\".\")\n    fasta_base = params.fasta.substring(lastPath+1)\n    index_base = params.fasta.substring(lastPath+1,lastExt)\n    if (params.fasta.endsWith('.gz')) {\n        fasta_base = params.fasta.substring(lastPath+1,lastExt)\n        index_base = fasta_base.substring(0,fasta_base.lastIndexOf(\".\"))\n\n    }\n} else {\n    exit 1, \"[nf-core/eager] error: please specify --fasta with the path to your reference\"\n}\n\n// Validate reference inputs\nif(\"${params.fasta}\".endsWith(\".gz\")){\n    process unzip_reference{\n        tag \"${zipped_fasta}\"\n\n        input:\n        path zipped_fasta from file(params.fasta) // path doesn't like it if a string of an object is not prefaced with a root dir (/), so use file() to resolve string before parsing to `path` \n\n        output:\n        path \"$unzip\" into ch_fasta into ch_fasta_for_bwaindex,ch_fasta_for_bt2index,ch_fasta_for_faidx,ch_fasta_for_seqdict,ch_fasta_for_circulargenerator,ch_fasta_for_circularmapper,ch_fasta_for_damageprofiler, ch_fasta_for_mapdamage ,ch_fasta_for_qualimap,ch_unmasked_fasta_for_masking,ch_unmasked_fasta_for_pmdtools,ch_fasta_for_genotyping_ug,ch_fasta_for_genotyping_hc,ch_fasta_for_genotyping_freebayes,ch_fasta_for_genotyping_pileupcaller,ch_fasta_for_vcf2genome,ch_fasta_for_multivcfanalyzer,ch_fasta_for_genotyping_angsd,ch_fasta_for_damagerescaling,ch_fasta_for_bcftools_stats\n\n        script:\n        unzip = zipped_fasta.toString() - '.gz'\n        \"\"\"\n        pigz -f -d -p ${task.cpus} $zipped_fasta\n        \"\"\"\n        }\n    } else {\n    fasta_for_indexing = Channel\n    .fromPath(\"${params.fasta}\", checkIfExists: true)\n    .into{ ch_fasta_for_bwaindex; ch_fasta_for_bt2index; ch_fasta_for_faidx; ch_fasta_for_seqdict; 
ch_fasta_for_circulargenerator; ch_fasta_for_circularmapper; ch_fasta_for_damageprofiler; ch_fasta_for_mapdamage; ch_fasta_for_qualimap; ch_unmasked_fasta_for_masking; ch_unmasked_fasta_for_pmdtools; ch_fasta_for_genotyping_ug; ch_fasta_for_genotyping_hc; ch_fasta_for_genotyping_freebayes; ch_fasta_for_genotyping_pileupcaller; ch_fasta_for_vcf2genome; ch_fasta_for_multivcfanalyzer; ch_fasta_for_genotyping_angsd; ch_fasta_for_damagerescaling; ch_fasta_for_bcftools_stats }\n}\n\n// Check that fasta index file path ends in '.fai'\nif (params.fasta_index && !params.fasta_index.endsWith(\".fai\")) {\n    exit 1, \"The specified fasta index file (${params.fasta_index}) is not valid. Fasta index files should end in '.fai'.\"\n}\n\n// Check if genome exists in the config file. params.genomes is from igenomes.conf, params.genome specified by user\nif (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) {\n    exit 1, \"The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(', ')}\"\n}\n\n// Index files provided? 
Then check whether they are correct and complete\nif( params.bwa_index && (params.mapper == 'bwaaln' || params.mapper == 'bwamem' || params.mapper == 'circularmapper')){\n    Channel\n        .fromPath(params.bwa_index, checkIfExists: true)\n        .ifEmpty { exit 1, \"[nf-core/eager] error: bwa indices not found in: ${index_base}.\" }\n        .into {bwa_index; bwa_index_bwamem}\n\n    bt2_index = Channel.empty()\n}\n\nif( params.bt2_index && params.mapper == 'bowtie2' ){\n    lastPath = params.bt2_index.lastIndexOf(File.separator)\n    bt2_dir =  params.bt2_index.substring(0,lastPath+1)\n    bt2_base = params.bt2_index.substring(lastPath+1)\n\n    Channel\n        .fromPath(params.bt2_index, checkIfExists: true)\n        .ifEmpty { exit 1, \"[nf-core/eager] error: bowtie2 indices not found in: ${bt2_dir}.\" }\n        .into {bt2_index; bt2_index_bwamem}\n\n    bwa_index = Channel.empty()\n    bwa_index_bwamem = Channel.empty()\n}\n\n// Adapter removal adapter-list setup\nif ( !params.clip_adapters_list ) {\n    Channel\n      .fromPath(\"$projectDir/assets/nf-core_eager_dummy2.txt\", checkIfExists: true)\n      .ifEmpty { exit 1, \"[nf-core/eager] error: adapters list file not found. Please check input. Supplied: --clip_adapters_list '${params.clip_adapters_list}'.\" }\n      .collect()\n      .set {ch_adapterlist}\n} else {\n    Channel\n      .fromPath(\"${params.clip_adapters_list}\", checkIfExists: true)\n      .ifEmpty { exit 1, \"[nf-core/eager] error: adapters list file not found. Please check input. 
Supplied: --clip_adapters_list '${params.clip_adapters_list}'.\" }\n      .collect()\n      .set {ch_adapterlist}\n}\n\nif ( params.snpcapture_bed ) {\n    ch_snpcapture_bed = Channel.fromPath(params.snpcapture_bed, checkIfExists: true).collect()\n} else {\n    ch_snpcapture_bed = Channel.fromPath(\"$projectDir/assets/nf-core_eager_dummy.txt\").collect()\n}\n\n// Set up channel with pmdtools reference mask bedfile\nif (!params.pmdtools_reference_mask) {\n  ch_bedfile_for_reference_masking = Channel.fromPath(\"$projectDir/assets/nf-core_eager_dummy.txt\").collect()\n} else {\n  ch_bedfile_for_reference_masking = Channel.fromPath(params.pmdtools_reference_mask, checkIfExists: true).collect()\n}\n\n// SexDetermination channel set up and bedfile validation\nif (!params.sexdeterrmine_bedfile) {\n  ch_bed_for_sexdeterrmine = Channel.fromPath(\"$projectDir/assets/nf-core_eager_dummy.txt\").collect()\n} else {\n  ch_bed_for_sexdeterrmine = Channel.fromPath(params.sexdeterrmine_bedfile, checkIfExists: true).collect()\n}\n\n // pileupCaller channel generation and input checks for 'random sampling' genotyping\nif (!params.pileupcaller_bedfile) {\n  ch_bed_for_pileupcaller = Channel.fromPath(\"$projectDir/assets/nf-core_eager_dummy.txt\").collect()\n} else {\n  ch_bed_for_pileupcaller = Channel.fromPath(params.pileupcaller_bedfile, checkIfExists: true).collect()\n}\n\nif (!params.pileupcaller_snpfile) {\n  ch_snp_for_pileupcaller = Channel.fromPath(\"$projectDir/assets/nf-core_eager_dummy2.txt\").collect()\n} else {\n  ch_snp_for_pileupcaller = Channel.fromPath(params.pileupcaller_snpfile, checkIfExists: true).collect()\n}\n\n// Create input channel for MALT database directory, checking directory exists\nif ( !params.database ) {\n    ch_db_for_malt = Channel.empty()\n} else {\n    ch_db_for_malt = Channel.fromPath(params.database, checkIfExists: true)\n}\n\n// Create input channel for MaltExtract taxon list, to allow downloading of taxon list, checking file exists.\nif ( 
!params.maltextract_taxon_list ) {\n    ch_taxonlist_for_maltextract = Channel.empty()\n} else {\n    ch_taxonlist_for_maltextract = Channel.fromPath(params.maltextract_taxon_list, checkIfExists: true)\n}\n\n// Create input channel for MaltExtract NCBI files, checking files exists.\nif ( !params.maltextract_ncbifiles ) {\n    ch_ncbifiles_for_maltextract = Channel.empty()\n} else {\n    ch_ncbifiles_for_maltextract = Channel.fromPath(params.maltextract_ncbifiles, checkIfExists: true)\n}\n\n////////////////////////////////////////////////////\n/* --     Collect configuration parameters     -- */\n////////////////////////////////////////////////////\n\n// Check if genome exists in the config file\nif (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) {\n    exit 1, \"The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(', ')}\"\n}\n\n// Check AWS batch settings\nif (workflow.profile.contains('awsbatch')) {\n    // AWSBatch sanity checking\n    if (!params.awsqueue || !params.awsregion) exit 1, 'Specify correct --awsqueue and --awsregion parameters on AWSBatch!'\n    // Check outdir paths to be S3 buckets if running on AWSBatch\n    // related: https://github.com/nextflow-io/nextflow/issues/813\n    if (!params.outdir.startsWith('s3:')) exit 1, 'Outdir not on S3 - specify S3 Bucket to run on AWSBatch!'\n    // Prevent trace files to be stored on S3 since S3 does not support rolling files.\n    if (params.tracedir.startsWith('s3:')) exit 1, 'Specify a local tracedir or run without trace! S3 cannot be used for tracefiles.'\n}\n\nch_multiqc_config = file(\"$projectDir/assets/multiqc_config.yaml\", checkIfExists: true)\nch_multiqc_custom_config = params.multiqc_config ? 
Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty()\nch_eager_logo = file(\"$projectDir/docs/images/nf-core_eager_logo_outline_drop.png\")\nch_output_docs = file(\"$projectDir/docs/output.md\", checkIfExists: true)\nch_output_docs_images = file(\"$projectDir/docs/images/\", checkIfExists: true)\nwhere_are_my_files = file(\"$projectDir/assets/where_are_my_files.txt\")\n\n///////////////////////////////////////////////////\n/* --    INPUT FILE LOADING AND VALIDATING    -- */\n///////////////////////////////////////////////////\n\n// check if we have valid --reads or --input\nif (!params.input) {\n  exit 1, \"[nf-core/eager] error: --input was not supplied! Please check '--help' or documentation under 'running the pipeline' for details\"\n}\n\n// Read in files properly from TSV file\ntsv_path = null\nif (params.input && (has_extension(params.input, \"tsv\"))) tsv_path = params.input\n\nch_input_sample = Channel.empty()\n\nif (tsv_path) {\n\n    tsv_file = file(tsv_path)\n    \n    if (tsv_file instanceof List) exit 1, \"[nf-core/eager] error: can only accept one TSV file per run.\"\n    if (!tsv_file.exists()) exit 1, \"[nf-core/eager] error: input TSV file could not be found. Does the file exist and is it in the right place? 
You gave the path: ${params.input}\"\n\n    ch_input_sample = extract_data(tsv_path)\n\n} else if (params.input && !has_extension(params.input, \"tsv\")) {\n\n    log.info \"\"\n    log.info \"No TSV file provided - creating TSV from supplied directory.\"\n    log.info \"Reading path(s): ${params.input}\\n\"\n    inputSample = retrieve_input_paths(params.input, params.colour_chemistry, params.single_end, params.single_stranded, params.udg_type, params.bam)\n    ch_input_sample = inputSample\n\n} else exit 1, \"[nf-core/eager] error: --input file(s) not correctly supplied or improperly defined, see '--help' flag and documentation under 'running the pipeline' for details.\"\n\nch_input_sample\n  .into { ch_input_sample_downstream; ch_input_sample_check }\n\n///////////////////////////////////////////////////\n/* --         INPUT CHANNEL CREATION          -- */\n///////////////////////////////////////////////////\n\n// Check we don't have any duplicate file names\nch_input_sample_check\n    .map {\n      it ->\n        def r1 = file(it[8]).getName()\n        def r2 = file(it[9]).getName()\n        def bam = file(it[10]).getName()\n\n        // Throw error and exit if the input bam has a name ending in '.unmapped.bam'\n        if (bam.endsWith('.unmapped.bam')) { exit 1, \"[nf-core/eager] error: Input BAM file names ending in '.unmapped.bam' are not allowed. Please rename your input BAM(s).\" }\n\n      [r1, r2, bam]\n\n    }\n    .collect()\n    .map{\n      file -> \n      filenames = file\n      filenames -= 'NA'\n      \n      if( filenames.size() != filenames.unique().size() )\n          exit 1, \"[nf-core/eager] error: You have duplicate input FASTQ and/or BAM file names! All files must have unique names, different directories are not sufficient. 
Please check your input.\"\n    }\n\n// Drop samples with R1/R2 to fastQ channel, BAM samples to other channel\nch_branched_input = ch_input_sample_downstream.branch{\n    fastq: it[8] != 'NA' //These are all fastqs\n    bam: it[10] != 'NA' //These are all BAMs\n}\n\n//Removing BAM/BAI in case of a FASTQ input\nch_fastq_channel = ch_branched_input.fastq.map {\n  samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, r1, r2, bam ->\n    [samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, r1, r2]\n}\n\n//Removing R1/R2 in case of BAM input\nch_bam_channel = ch_branched_input.bam.map {\n  samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, r1, r2, bam ->\n    [samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, bam]\n}\n\n// Prepare starting channels, here we go\nch_input_for_convertbam = Channel.empty()\n\nch_bam_channel\n  .into { ch_input_for_convertbam; ch_input_for_indexbam; }\n\n// Also need to send raw files for lane merging, if we want to host removed fastq\nch_fastq_channel\n  .into { ch_input_for_skipconvertbam; ch_input_for_lanemerge_hostremovalfastq }\n  \n////////////////////////////////////////////////////\n/* --         PRINT PARAMETER SUMMARY          -- */\n////////////////////////////////////////////////////\n\nlog.info NfcoreSchema.params_summary_log(workflow, params, json_schema)\n\n// Header log info\ndef summary = [:]\nif (workflow.revision) summary['Pipeline Release'] = workflow.revision\nsummary['Run Name']         = workflow.runName\nsummary['Input']            = params.input\nsummary['Fasta Ref']        = params.fasta\nsummary['Max Resources']    = \"$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job\"\nif (workflow.containerEngine) summary['Container'] = \"$workflow.containerEngine - $workflow.container\"\nsummary['Output dir']       = params.outdir\nsummary['Launch dir']       = workflow.launchDir\nsummary['Working dir']    
  = workflow.workDir\nsummary['Script dir']       = workflow.projectDir\nsummary['User']             = workflow.userName\nif (workflow.profile.contains('awsbatch')) {\n    summary['AWS Region']   = params.awsregion\n    summary['AWS Queue']    = params.awsqueue\n    summary['AWS CLI']      = params.awscli\n}\nsummary['Config Profile'] = workflow.profile\nif (params.config_profile_description) summary['Config Profile Description'] = params.config_profile_description\nif (params.config_profile_contact)     summary['Config Profile Contact']     = params.config_profile_contact\nif (params.config_profile_url)         summary['Config Profile URL']         = params.config_profile_url\nsummary['Config Files'] = workflow.configFiles.join(', ')\nif (params.email || params.email_on_fail) {\n    summary['E-mail Address']    = params.email\n    summary['E-mail on failure'] = params.email_on_fail\n    summary['MultiQC maxsize']   = params.max_multiqc_email_size\n}\n\nChannel.from(summary.collect{ [it.key, it.value] })\n    .map { k,v -> \"<dt>$k</dt><dd><samp>${v ?: '<span style=\\\"color:#999999;\\\">N/A</a>'}</samp></dd>\" }\n    .reduce { a, b -> return [a, b].join(\"\\n            \") }\n    .map { x -> \"\"\"\n    id: 'nf-core-eager-summary'\n    description: \" - this information is collected when the pipeline is started.\"\n    section_name: 'nf-core/eager Workflow Summary'\n    section_href: 'https://github.com/nf-core/eager'\n    plot_type: 'html'\n    data: |\n        <dl class=\\\"dl-horizontal\\\">\n            $x\n        </dl>\n    \"\"\".stripIndent() }\n    .set { ch_workflow_summary }\n\n\n// Check the hostnames against configured profiles\ncheckHostname()\n\nlog.info \"Schaffa, Schaffa, Genome Baua!\"\n\n///////////////////////////////////////////////////\n/* --          REFERENCE FASTA INDEXING       -- */\n///////////////////////////////////////////////////\n\n// BWA Index\nif( !params.bwa_index && params.fasta && (params.mapper == 'bwaaln' || params.mapper 
== 'bwamem' || params.mapper == 'circularmapper')){\n  process makeBWAIndex {\n    label 'sc_medium'\n    tag \"${fasta}\"\n    publishDir path: \"${params.outdir}/reference_genome/bwa_index\", mode: params.publish_dir_mode, saveAs: { filename -> \n            if (params.save_reference) filename \n            else if(!params.save_reference && filename == \"where_are_my_files.txt\") filename\n            else null\n    }\n\n    input:\n    path fasta from ch_fasta_for_bwaindex\n    path where_are_my_files\n\n    output:\n    path \"BWAIndex\" into (bwa_index, bwa_index_bwamem)\n    path \"where_are_my_files.txt\"\n\n    script:\n    \"\"\"\n    bwa index $fasta\n    mkdir BWAIndex && mv ${fasta}* BWAIndex\n    \"\"\"\n    }\n    \n    bt2_index = Channel.empty()\n}\n\n// bowtie2 Index\nif( !params.bt2_index && params.fasta && params.mapper == \"bowtie2\"){\n  process makeBT2Index {\n    label 'mc_medium'\n    tag \"${fasta}\"\n    publishDir path: \"${params.outdir}/reference_genome/bt2_index\", mode: params.publish_dir_mode, saveAs: { filename -> \n            if (params.save_reference) filename \n            else if(!params.save_reference && filename == \"where_are_my_files.txt\") filename\n            else null\n    }\n\n    input:\n    path fasta from ch_fasta_for_bt2index\n    path where_are_my_files\n\n    output:\n    path \"BT2Index\" into (bt2_index)\n    path \"where_are_my_files.txt\"\n\n    script:\n    \"\"\"\n    bowtie2-build --threads ${task.cpus} $fasta $fasta\n    mkdir BT2Index && mv ${fasta}* BT2Index\n    \"\"\"\n    }\n\n  bwa_index = Channel.empty()\n  bwa_index_bwamem = Channel.empty()\n\n}\n\n// FASTA Index (FAI)\nif (params.fasta_index) {\n  Channel\n    .fromPath( params.fasta_index )\n    .set { ch_fai_for_skipfastaindexing }\n} else {\n  Channel\n    .empty()\n    .set { ch_fai_for_skipfastaindexing }\n}\n\nprocess makeFastaIndex {\n    label 'sc_small'\n    tag \"${fasta}\"\n    publishDir path: 
\"${params.outdir}/reference_genome/fasta_index\", mode: params.publish_dir_mode, saveAs: { filename -> \n            if (params.save_reference) filename \n            else if(!params.save_reference && filename == \"where_are_my_files.txt\") filename\n            else null\n    }\n    \n    when: !params.fasta_index && params.fasta\n\n    input:\n    path fasta from ch_fasta_for_faidx\n    path where_are_my_files\n\n    output:\n    path \"*.fai\" into ch_fasta_faidx_index\n    path \"where_are_my_files.txt\"\n\n    script:\n    \"\"\"\n    samtools faidx $fasta\n    \"\"\"\n}\n\nch_fai_for_skipfastaindexing.mix(ch_fasta_faidx_index) \n  .into { ch_fai_for_damageprofiler; ch_fai_for_ug; ch_fai_for_hc; ch_fai_for_freebayes; ch_fai_for_pileupcaller; ch_fai_for_angsd }\n\n// Stage dict index file if supplied, else load it into the channel\n\nif (params.seq_dict) {\n  Channel\n    .fromPath( params.seq_dict )\n    .set { ch_dict_for_skipdict }\n} else {\n  Channel\n    .empty()\n    .set { ch_dict_for_skipdict }\n}\n\nprocess makeSeqDict {\n    label 'sc_medium'\n    tag \"${fasta}\"\n    publishDir path: \"${params.outdir}/reference_genome/seq_dict\", mode: params.publish_dir_mode, saveAs: { filename -> \n            if (params.save_reference) filename \n            else if(!params.save_reference && filename == \"where_are_my_files.txt\") filename\n            else null\n    }\n    \n    when: !params.seq_dict && params.fasta\n\n    input:\n    path fasta from ch_fasta_for_seqdict\n    path where_are_my_files\n\n    output:\n    path \"*.dict\" into ch_seq_dict\n    path \"where_are_my_files.txt\"\n\n    script:\n    \"\"\"\n    picard -Xmx${task.memory.toMega()}M CreateSequenceDictionary R=$fasta O=\"${fasta.baseName}.dict\"\n    \"\"\"\n}\n\nch_dict_for_skipdict.mix(ch_seq_dict)\n  .into { ch_dict_for_ug; ch_dict_for_hc; ch_dict_for_freebayes; ch_dict_for_pileupcaller; ch_dict_for_angsd }\n\n//////////////////////////////////////////////////\n/* --         BAM INPUT 
PREPROCESSING        -- */\n//////////////////////////////////////////////////\n\n// Convert to FASTQ if re-mapping is requested\nprocess convertBam {\n    label 'mc_small'\n    tag \"$libraryid\"\n    \n    when: \n    params.run_convertinputbam\n\n    input: \n    tuple samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, path(bam) from ch_input_for_convertbam \n\n    output:\n    tuple samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, path(\"*fastq.gz\"), val('NA') into ch_output_from_convertbam\n\n    script:\n    base = \"${bam.baseName}\"\n    \"\"\"\n    samtools fastq -t ${bam} | pigz -p ${task.cpus} > ${base}.converted.fastq.gz\n    \"\"\" \n}\n\n// If not converted to FASTQ generate pipeline compatible BAM index file (i.e. with correct samtools version) \nprocess indexinputbam {\n  label 'sc_small'\n  tag \"$libraryid\"\n\n  when: \n  bam != 'NA' && !params.run_convertinputbam\n\n  input:\n  tuple samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, path(bam) from ch_input_for_indexbam \n\n  output:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), file(\"*.{bai,csi}\")  into ch_indexbam_for_filtering\n\n  script:\n  def size = params.large_ref ? '-c' : ''\n  \"\"\"\n  samtools index ${bam} ${size}\n  \"\"\"\n}\n\n// convertbam bypass\n    ch_input_for_skipconvertbam.mix(ch_output_from_convertbam)\n        .into { ch_convertbam_for_fastp; ch_convertbam_for_fastqc } \n\n//////////////////////////////////////////////////\n/* -- SEQUENCING QC AND FASTQ PREPROCESSING  -- */\n//////////////////////////////////////////////////\n\n// Raw sequencing QC - allow user evaluate if sequencing any good?\n\nprocess fastqc {\n    label 'mc_small'\n    tag \"${libraryid}_L${lane}\"\n    publishDir \"${params.outdir}/fastqc/input_fastq\", mode: params.publish_dir_mode,\n        saveAs: { filename ->\n                      filename.indexOf(\".zip\") > 0 ? 
\"zips/$filename\" : \"$filename\"\n                }\n\n\n    input:\n    tuple samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, file(r1), file(r2) from ch_convertbam_for_fastqc\n\n    output:\n    path \"*_fastqc.{zip,html}\" into ch_prefastqc_for_multiqc\n\n    when: \n    !params.skip_fastqc\n\n    script:\n    if ( seqtype == 'PE' ) {\n    \"\"\"\n    fastqc -t ${task.cpus} -q $r1 $r2\n    rename 's/_fastqc\\\\.zip\\$/_raw_fastqc.zip/' *_fastqc.zip\n    rename 's/_fastqc\\\\.html\\$/_raw_fastqc.html/' *_fastqc.html\n    \"\"\"\n    } else {\n    \"\"\"\n    fastqc -t ${task.cpus} -q $r1\n    rename 's/_fastqc\\\\.zip\\$/_raw_fastqc.zip/' *_fastqc.zip\n    rename 's/_fastqc\\\\.html\\$/_raw_fastqc.html/' *_fastqc.html\n    \"\"\"\n    }\n}\n\n// Poly-G clipping for 2-colour chemistry sequencers, to reduce erroenous mapping of sequencing artefacts\n\nif (params.complexity_filter_poly_g) {\n  ch_input_for_fastp = ch_convertbam_for_fastp.branch{\n    twocol: it[3] == '2' // Nextseq/Novaseq data with possible sequencing artefact\n    fourcol: it[3] == '4'  // HiSeq/MiSeq data where polyGs would be true\n  }\n\n} else {\n  ch_input_for_fastp = ch_convertbam_for_fastp.branch{\n    twocol: it[3] == \"dummy\" // seq/Novaseq data with possible sequencing artefact\n    fourcol: it[3] == '4' || it[3] == '2'  // HiSeq/MiSeq data where polyGs would be true\n  }\n\n}\n\nprocess fastp {\n    label 'mc_small'\n    tag \"${libraryid}_L${lane}\"\n    publishDir \"${params.outdir}/FastP\", mode: params.publish_dir_mode\n\n    when: \n    params.complexity_filter_poly_g\n\n    input:\n    tuple samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, file(r1), file(r2) from ch_input_for_fastp.twocol\n\n    output:\n    tuple samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, path(\"*.pG.fq.gz\") into ch_output_from_fastp\n    path(\"*.json\") into ch_fastp_for_multiqc\n\n    script:\n    if( seqtype == 'SE' ){\n   
 \"\"\"\n    fastp --in1 ${r1} --out1 \"${r1.baseName}.pG.fq.gz\" -A -g --poly_g_min_len \"${params.complexity_filter_poly_g_min}\" -Q -L -w ${task.cpus} --json \"${r1.baseName}\"_L${lane}_fastp.json \n    \"\"\"\n    } else {\n    \"\"\"\n    fastp --in1 ${r1} --in2 ${r2} --out1 \"${r1.baseName}.pG.fq.gz\" --out2 \"${r2.baseName}.pG.fq.gz\" -A -g --poly_g_min_len \"${params.complexity_filter_poly_g_min}\" -Q -L -w ${task.cpus} --json \"${libraryid}\"_L${lane}_polyg_fastp.json \n    \"\"\"\n    }\n}\n\n// Colour column only useful for fastp, so dropping now to reduce complexity downstream\nch_input_for_fastp.fourcol\n  .map {\n      def samplename = it[0]\n      def libraryid  = it[1]\n      def lane = it[2]\n      def seqtype = it[4]\n      def organism = it[5]\n      def strandedness = it[6]\n      def udg = it[7]\n      def r1 = it[8]\n      def r2 = seqtype == \"PE\" ? it[9] : file(\"$projectDir/assets/nf-core_eager_dummy.txt\")\n      \n      [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n\n    }\n  .set { ch_skipfastp_for_merge }\n\nch_output_from_fastp\n  .map{\n    def samplename = it[0]\n    def libraryid  = it[1]\n    def lane = it[2]\n    def seqtype = it[4]\n    def organism = it[5]\n    def strandedness = it[6]\n    def udg = it[7]\n    def r1 = it[8] instanceof ArrayList ? it[8].sort()[0] : it[8]\n    def r2 = seqtype == \"PE\" ? 
it[8].sort()[1] : file(\"$projectDir/assets/nf-core_eager_dummy.txt\")\n\n    [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n\n  }\n  .set{ ch_fastp_for_merge }\n\nch_skipfastp_for_merge.mix(ch_fastp_for_merge)\n  .into { ch_fastp_for_adapterremoval; ch_fastp_for_skipadapterremoval } \n\n// Sequencing adapter clipping and optional paired-end merging in preparation for mapping\n\nprocess adapter_removal {\n    label 'mc_small'\n    tag \"${libraryid}_L${lane}\"\n    publishDir \"${params.outdir}/adapterremoval\", mode: params.publish_dir_mode\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(r1), file(r2) from ch_fastp_for_adapterremoval\n    path adapterlist from ch_adapterlist.collect()\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"output/*{combined.fq,.se.truncated,pair1.truncated}.gz\") into ch_output_from_adapterremoval_r1\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"output/*pair2.truncated.gz\") optional true into ch_output_from_adapterremoval_r2\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"output/*.settings\") into ch_adapterremoval_logs\n    \n    when: \n    !params.skip_adapterremoval\n\n    script:\n    def base = \"${r1.baseName}_L${lane}\"\n    def adapters_to_remove = !params.clip_adapters_list ? \"--adapter1 ${params.clip_forward_adaptor} --adapter2 ${params.clip_reverse_adaptor}\" : \"--adapter-list ${adapterlist}\"\n    //This checks whether we skip trimming and defines a variable respectively\n    def preserve5p = params.preserve5p ? 
'--preserve5p' : '' // applies to any AR command - doesn't affect output file combination\n    \n    if ( seqtype == 'PE'  && !params.skip_collapse && !params.skip_trim  && !params.mergedonly && !params.preserve5p ) {\n    \"\"\"\n    mkdir -p output\n\n    AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}\n\n    cat *.collapsed.gz *.collapsed.truncated.gz *.singleton.truncated.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.tmp.fq.gz\n    \n    mv *.settings output/\n\n    ## Add R_ and L_ for unmerged reads for DeDup compatibility\n    AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz\n    \n    \"\"\"\n    //PE mode, collapse and trim, outputting all reads, preserving 5p\n    } else if (seqtype == 'PE'  && !params.skip_collapse && !params.skip_trim  && !params.mergedonly && params.preserve5p) {\n    \"\"\"\n    mkdir -p output\n\n    AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}\n\n    cat *.collapsed.gz *.singleton.truncated.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.tmp.fq.gz\n\n    mv *.settings output/\n\n    ## Add R_ and L_ for unmerged reads for DeDup compatibility\n    AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz\n\n    \"\"\"\n    // PE 
mode, collapse and trim but only output collapsed reads\n    } else if ( seqtype == 'PE'  && !params.skip_collapse && !params.skip_trim && params.mergedonly && !params.preserve5p ) {\n    \"\"\"\n    mkdir -p output\n    AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe  --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}\n    \n    cat *.collapsed.gz *.collapsed.truncated.gz > output/${base}.pe.combined.tmp.fq.gz\n        \n    ## Add R_ and L_ for unmerged reads for DeDup compatibility\n    AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz\n\n    mv *.settings output/\n    \"\"\"\n    // PE mode, collapse and trim but only output collapsed reads, preserving 5p\n    } else if ( seqtype == 'PE'  && !params.skip_collapse && !params.skip_trim && params.mergedonly && params.preserve5p ) {\n    \"\"\"\n    mkdir -p output\n    AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe  --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}\n    \n    cat *.collapsed.gz > output/${base}.pe.combined.tmp.fq.gz\n    \n    ## Add R_ and L_ for unmerged reads for DeDup compatibility\n    AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz\n\n    mv *.settings output/\n    \"\"\"\n    // PE mode, collapsing but skip trim, (output all reads). 
Note: seems to still generate `truncated` files for some reason, so merging for safety.\n    // Will still do default AR length filtering I guess\n    } else if ( seqtype == 'PE'  && !params.skip_collapse && params.skip_trim && !params.mergedonly ) {\n    \"\"\"\n    mkdir -p output\n    AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --adapter1 \"\" --adapter2 \"\"\n    \n    cat *.collapsed.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.tmp.fq.gz\n        \n    ## Add R_ and L_ for unmerged reads for DeDup compatibility\n    AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz\n\n    mv *.settings output/\n    \"\"\"\n    // PE mode, collapsing but skip trim, and only output collapsed reads. Note: seems to still generate `truncated` files for some reason, so merging for safety.\n    // Will still do default AR length filtering I guess\n    } else if ( seqtype == 'PE'  && !params.skip_collapse && params.skip_trim && params.mergedonly ) {\n    \"\"\"\n    mkdir -p output\n    AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p}  --adapter1 \"\" --adapter2 \"\"\n    \n    cat *.collapsed.gz > output/${base}.pe.combined.tmp.fq.gz\n    \n    ## Add R_ and L_ for unmerged reads for DeDup compatibility\n    AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz\n\n    mv *.settings output/\n    \"\"\"\n    // PE mode, skip collapsing but trim (output all reads, as merging not possible) - activates paired-end mapping!\n    } else if ( seqtype == 'PE'  && params.skip_collapse && !params.skip_trim ) {\n    \"\"\"\n    mkdir -p output\n    
AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}\n    \n    mv ${base}.pe.pair*.truncated.gz *.settings output/\n    \"\"\"\n    } else if ( seqtype != 'PE' && !params.skip_trim ) {\n    //SE, collapse not possible, trim reads only\n    \"\"\"\n    mkdir -p output\n    AdapterRemoval --file1 ${r1} --basename ${base}.se --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}\n    mv *.settings *.se.truncated.gz output/\n    \"\"\"\n    } else if ( seqtype != 'PE' && params.skip_trim ) {\n    //SE, collapse not possible, trimming skipped (empty adapters supplied)\n    \"\"\"\n    mkdir -p output\n    AdapterRemoval --file1 ${r1} --basename ${base}.se --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} ${preserve5p} --adapter1 \"\" --adapter2 \"\"\n    mv *.settings *.se.truncated.gz output/\n    \"\"\"\n    }\n}\n\n// When not collapsing paired-end data, re-merge the R1 and R2 files into single map. Otherwise if SE or collapsed PE, R2 now becomes NA\n// Sort to make sure we get consistent R1 and R2 ordered when using `-resume`, even if not needed for FastQC\nif ( params.skip_collapse ){\n  ch_output_from_adapterremoval_r1\n    .mix(ch_output_from_adapterremoval_r2)\n    .groupTuple(by: [0,1,2,3,4,5,6])\n    .map{\n      it -> \n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def r1 = file(it[7].sort()[0])\n        def r2 = seqtype == \"PE\" ? 
file(it[7].sort()[1]) : file(\"$projectDir/assets/nf-core_eager_dummy.txt\")\n\n        [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n\n    }\n    .into { ch_output_from_adapterremoval; ch_adapterremoval_for_postfastqc }\n} else {\n  ch_output_from_adapterremoval_r1\n    .map{\n      it -> \n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def r1 = file(it[7])\n        def r2 = file(\"$projectDir/assets/nf-core_eager_dummy.txt\")\n\n        [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n    }\n    .into { ch_output_from_adapterremoval; ch_adapterremoval_for_postfastqc }\n}\n\n// AdapterRemoval bypass when not running it\nif (!params.skip_adapterremoval) {\n    ch_output_from_adapterremoval.mix(ch_fastp_for_skipadapterremoval)\n        .filter { it =~/.*combined.fq.gz|.*truncated.gz/ }\n        .into { ch_adapterremoval_for_post_ar_trimming; ch_adapterremoval_for_skip_post_ar_trimming; } \n} else {\n    ch_fastp_for_skipadapterremoval\n        .into { ch_adapterremoval_for_post_ar_trimming; ch_adapterremoval_for_skip_post_ar_trimming; } \n}\n\n// Post AR fastq trimming\n\nprocess post_ar_fastq_trimming {\n  label 'mc_small'\n  tag \"${libraryid}\"\n  publishDir \"${params.outdir}/post_ar_fastq_trimmed\", mode: params.publish_dir_mode\n\n  when: params.run_post_ar_trimming\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(r1), path(r2) from ch_adapterremoval_for_post_ar_trimming\n\n  output:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*_R1_postartrimmed.fq.gz\") into ch_post_ar_trimming_for_lanemerge_r1\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*_R2_postartrimmed.fq.gz\") optional true into 
ch_post_ar_trimming_for_lanemerge_r2\n\n  script:\n  if ( seqtype == 'SE' || (seqtype == 'PE' && !params.skip_collapse) ) {\n  \"\"\"\n  fastp --in1 ${r1} --trim_front1 ${params.post_ar_trim_front} --trim_tail1 ${params.post_ar_trim_tail} -A -G -Q -L -w ${task.cpus} --out1 \"${libraryid}\"_L\"${lane}\"_R1_postartrimmed.fq.gz\n  \"\"\"\n  } else if ( seqtype == 'PE' && params.skip_collapse ) {\n  \"\"\"\n  fastp --in1 ${r1} --in2 ${r2}  --trim_front1 ${params.post_ar_trim_front} --trim_tail1 ${params.post_ar_trim_tail} --trim_front2 ${params.post_ar_trim_front2} --trim_tail2 ${params.post_ar_trim_tail2} -A -G -Q -L -w ${task.cpus} --out1 \"${libraryid}\"_L\"${lane}\"_R1_postartrimmed.fq.gz --out2 \"${libraryid}\"_L\"${lane}\"_R2_postartrimmed.fq.gz\n  \"\"\"\n  }\n\n}\n\n// When not collapsing paired-end data, re-merge the R1 and R2 files into single map. Otherwise if SE or collapsed PE, R2 now becomes NA\n// Sort to make sure we get consistent R1 and R2 ordered when using `-resume`, even if not needed for FastQC\nif ( params.skip_collapse ){\n  ch_post_ar_trimming_for_lanemerge_r1\n    .mix(ch_post_ar_trimming_for_lanemerge_r2)\n    .groupTuple(by: [0,1,2,3,4,5,6])\n    .map{\n      it -> \n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def r1 = file(it[7].sort()[0])\n        def r2 = seqtype == \"PE\" ? 
file(it[7].sort()[1]) : file(\"$projectDir/assets/nf-core_eager_dummy.txt\")\n\n        [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n\n    }\n    .set { ch_post_ar_trimming_for_lanemerge; }\n} else {\n  ch_post_ar_trimming_for_lanemerge_r1\n    .map{\n      it -> \n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def r1 = file(it[7])\n        def r2 = file(\"$projectDir/assets/nf-core_eager_dummy.txt\")\n\n        [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n    }\n    .set { ch_post_ar_trimming_for_lanemerge; }\n}\n\n\n// Inline barcode removal bypass when not running it \nif (params.run_post_ar_trimming) {\n    ch_post_ar_trimming_for_lanemerge\n        .into { ch_inlinebarcoderemoval_for_fastqc_after_clipping; ch_inlinebarcoderemoval_for_lanemerge; } \n} else {\n    ch_adapterremoval_for_skip_post_ar_trimming\n        .into { ch_inlinebarcoderemoval_for_fastqc_after_clipping; ch_inlinebarcoderemoval_for_lanemerge; } \n}\n\n// Lane merging for libraries sequenced over multiple lanes (e.g. 
NextSeq)\nch_branched_for_lanemerge = ch_inlinebarcoderemoval_for_lanemerge\n  .groupTuple(by: [0,1,3,4,5,6])\n  .map {\n    it ->\n      def samplename = it[0]\n      def libraryid  = it[1]\n      def lane = it[2]\n      def seqtype = it[3]\n      def organism = it[4]\n      def strandedness = it[5]\n      def udg = it[6]\n      def r1 = it[7]\n      def r2 = it[8]\n\n      [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n\n  }\n  .branch {\n    skip_merge: it[7].size() == 1 // Can skip merging if only single lanes\n    merge_me: it[7].size() > 1\n  }\n\nch_branched_for_lanemerge_skipme = ch_branched_for_lanemerge.skip_merge\n  .map{\n    it -> \n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def r1 = it[7][0]\n        def r2 = it[8][0]\n\n        [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n  }\n\n\nch_branched_for_lanemerge_ready = ch_branched_for_lanemerge.merge_me\n  .map{\n      it -> \n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def r1 = it[7]\n\n        // find and remove duplicate dummies to prevent file collision error\n        def r2 = it[8]*.toString()\n        r2.removeAll{ it == \"$projectDir/assets/nf-core_eager_dummy.txt\" }\n\n        [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n  }\n\nprocess lanemerge {\n  label 'sc_tiny'\n  tag \"${libraryid}\"\n  publishDir \"${params.outdir}/lanemerging\", mode: params.publish_dir_mode\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(r1), path(r2) from ch_branched_for_lanemerge_ready\n\n  output:\n  tuple samplename, libraryid, lane, 
seqtype, organism, strandedness, udg, path(\"*_R1_lanemerged.fq.gz\") into ch_lanemerge_for_mapping_r1\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*_R2_lanemerged.fq.gz\") optional true into ch_lanemerge_for_mapping_r2\n\n  script:\n  if ( seqtype == 'PE' && ( params.skip_collapse || params.skip_adapterremoval ) ){\n  def lane = 0\n  \"\"\"\n  cat ${r1} > \"${libraryid}\"_R1_lanemerged.fq.gz\n  cat ${r2} > \"${libraryid}\"_R2_lanemerged.fq.gz\n  \"\"\"\n  } else {\n  \"\"\"\n  cat ${r1} > \"${libraryid}\"_R1_lanemerged.fq.gz\n  \"\"\"\n  }\n\n}\n\n// Ensuring always valid R2 file even if doesn't exist for AWS\nif ( ( params.skip_collapse || params.skip_adapterremoval ) ) {\n  ch_lanemerge_for_mapping_r1\n    .mix(ch_lanemerge_for_mapping_r2)\n    .groupTuple(by: [0,1,2,3,4,5,6])\n    .map{\n      it -> \n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def r1 = file(it[7].sort()[0])\n        def r2 = seqtype == \"PE\" ? 
file(it[7].sort()[1]) : file(\"$projectDir/assets/nf-core_eager_dummy.txt\")\n\n        [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n\n    }\n    .mix(ch_branched_for_lanemerge_skipme)\n    .into { ch_lanemerge_for_skipmap; ch_lanemerge_for_bwa; ch_lanemerge_for_cm; ch_lanemerge_for_bwamem; ch_lanemerge_for_bt2 }\n} else {\n  ch_lanemerge_for_mapping_r1\n    .map{\n      it -> \n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def r1 = file(it[7])\n        def r2 = file(\"$projectDir/assets/nf-core_eager_dummy.txt\")\n\n        [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n    }\n    .mix(ch_branched_for_lanemerge_skipme)\n    .into { ch_lanemerge_for_skipmap; ch_lanemerge_for_bwa; ch_lanemerge_for_cm; ch_lanemerge_for_bwamem; ch_lanemerge_for_bt2 }\n}\n\n// ENA upload doesn't do separate lanes, so merge raw FASTQs for mapped-reads removal \n\n// Per-library lane grouping done within process\nprocess lanemerge_hostremoval_fastq {\n  label 'sc_tiny'\n  tag \"${libraryid}\"\n\n  when: \n  params.hostremoval_input_fastq\n\n  input:\n  tuple samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, file(r1), file(r2) from ch_input_for_lanemerge_hostremovalfastq.groupTuple(by: [0,1,3,4,5,6,7])\n\n  output:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(\"*.fq.gz\") into ch_fastqlanemerge_for_hostremovalfastq\n\n  script:\n  if ( seqtype == 'PE' ){\n  lane = 0\n  \"\"\"\n  cat ${r1} > \"${libraryid}\"_R1_lanemerged.fq.gz\n  cat ${r2} > \"${libraryid}\"_R2_lanemerged.fq.gz\n  \"\"\"\n  } else {\n  \"\"\"\n  cat ${r1} > \"${libraryid}\"_R1_lanemerged.fq.gz\n  \"\"\"\n  }\n\n}\n\n// Post-preprocessing QC to help user check pre-processing removed all sequencing artefacts. 
If post-AR trimming was run, this QC is performed on the post-AR-trimmed reads.\n\nprocess fastqc_after_clipping {\n    label 'mc_small'\n    tag \"${libraryid}_L${lane}\"\n    publishDir \"${params.outdir}/fastqc/after_clipping\", mode: params.publish_dir_mode,\n        saveAs: { filename ->\n                      filename.indexOf(\".zip\") > 0 ? \"zips/$filename\" : \"$filename\"\n                }\n\n\n    when: !params.skip_adapterremoval && !params.skip_fastqc\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(r1), file(r2) from ch_inlinebarcoderemoval_for_fastqc_after_clipping\n\n    output:\n    path(\"*_fastqc.{zip,html}\") into ch_fastqc_after_clipping\n\n    script:\n    if ( params.skip_collapse && seqtype == 'PE' ) {\n    \"\"\"\n    fastqc -t ${task.cpus} -q ${r1} ${r2}\n    \"\"\"\n    } else {\n    \"\"\"\n    fastqc -t ${task.cpus} -q ${r1}\n    \"\"\"\n    }\n\n}\n\n//////////////////////////////////////////////////\n/* --    READ MAPPING AND POSTPROCESSING     -- */\n//////////////////////////////////////////////////\n\n// bwa aln as standard aDNA mapper\n\nprocess bwa {\n    label 'mc_medium'\n    tag \"${libraryid}\"\n    publishDir \"${params.outdir}/mapping/bwa\", mode: params.publish_dir_mode\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(r1), path(r2) from ch_lanemerge_for_bwa\n    path index from bwa_index.collect()\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.mapped.bam\"), path(\"*.{bai,csi}\") into ch_output_from_bwa   \n\n    when: \n    params.mapper == 'bwaaln'\n\n    script:\n    def size = params.large_ref ? 
'-c' : ''\n    def fasta = \"${index}/${fasta_base}\"\n\n    //PE data without merging, PE data without any AR applied\n    if ( seqtype == 'PE' && ( params.skip_collapse || params.skip_adapterremoval ) ){\n    \"\"\"\n    bwa aln -t ${task.cpus} $fasta ${r1} -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -o ${params.bwaalno} -f ${libraryid}.r1.sai\n    bwa aln -t ${task.cpus} $fasta ${r2} -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -o ${params.bwaalno} -f ${libraryid}.r2.sai\n    bwa sampe -r \"@RG\\\\tID:ILLUMINA-${samplename}_${libraryid}\\\\tSM:${samplename}\\\\tLB:${libraryid}\\\\tPL:illumina\\\\tPU:ILLUMINA-${libraryid}-${seqtype}\" $fasta ${libraryid}.r1.sai ${libraryid}.r2.sai ${r1} ${r2} | samtools sort -@ ${task.cpus - 1} -O bam - > ${libraryid}_\"${seqtype}\".mapped.bam\n    samtools index \"${libraryid}\"_\"${seqtype}\".mapped.bam ${size}\n    \"\"\"\n    } else {\n    //PE collapsed, or SE data\n    \"\"\"\n    bwa aln -t ${task.cpus} ${fasta} ${r1} -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -o ${params.bwaalno} -f ${libraryid}.sai\n    bwa samse -r \"@RG\\\\tID:ILLUMINA-${samplename}_${libraryid}\\\\tSM:${samplename}\\\\tLB:${libraryid}\\\\tPL:illumina\\\\tPU:ILLUMINA-${libraryid}-${seqtype}\" $fasta ${libraryid}.sai $r1 | samtools sort -@ ${task.cpus - 1} -O bam - > \"${libraryid}\"_\"${seqtype}\".mapped.bam\n    samtools index \"${libraryid}\"_\"${seqtype}\".mapped.bam ${size}\n    \"\"\"\n    }\n    \n}\n\n// bwa mem for more complex or for modern data mapping\n\nprocess bwamem {\n    label 'mc_medium'\n    tag \"$libraryid\"\n    publishDir \"${params.outdir}/mapping/bwamem\", mode: params.publish_dir_mode\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(r1), file(r2) from ch_lanemerge_for_bwamem\n    path index from bwa_index_bwamem.collect()\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, 
path(\"*.mapped.bam\"), path(\"*.{bai,csi}\") into ch_output_from_bwamem\n\n    when: \n    params.mapper == 'bwamem'\n\n    script:\n    def split_cpus = Math.floor(task.cpus/2)\n    def fasta = \"${index}/${fasta_base}\"\n    def size = params.large_ref ? '-c' : ''\n\n    if (!params.single_end && params.skip_collapse){\n    \"\"\"\n    bwa mem -t ${split_cpus} $fasta $r1 $r2 -R \"@RG\\\\tID:ILLUMINA-${samplename}_${libraryid}\\\\tSM:${samplename}\\\\tLB:${libraryid}\\\\tPL:illumina\\\\tPU:ILLUMINA-${libraryid}-${seqtype}\" | samtools sort -@ ${split_cpus} -O bam - > \"${libraryid}\"_\"${seqtype}\".mapped.bam\n    samtools index ${size} -@ ${task.cpus} \"${libraryid}\"_\"${seqtype}\".mapped.bam\n    \"\"\"\n    } else {\n    \"\"\"\n    bwa mem -t ${split_cpus} $fasta $r1 -R \"@RG\\\\tID:ILLUMINA-${samplename}_${libraryid}\\\\tSM:${samplename}\\\\tLB:${libraryid}\\\\tPL:illumina\\\\tPU:ILLUMINA-${libraryid}-${seqtype}\" | samtools sort -@ ${split_cpus} -O bam - > \"${libraryid}\"_\"${seqtype}\".mapped.bam\n    samtools index -@ ${task.cpus} \"${libraryid}\"_\"${seqtype}\".mapped.bam ${size} \n    \"\"\"\n    }\n    \n}\n\n// CircularMapper reference preparation and mapping for circular genomes e.g. 
mtDNA\n\nprocess circulargenerator{\n    label 'sc_medium'\n    tag \"$prefix\"\n    publishDir \"${params.outdir}/reference_genome/circularmapper_index\", mode: params.publish_dir_mode, saveAs: { filename -> \n            if (params.save_reference) filename \n            else if(!params.save_reference && filename == \"where_are_my_files.txt\") filename\n            else null\n    }\n\n    input:\n    file fasta from ch_fasta_for_circulargenerator\n\n    output:\n    file \"${prefix}.{amb,ann,bwt,sa,pac}\" into ch_circularmapper_indices\n    file \"*_elongated\" into ch_circularmapper_elongatedfasta\n\n    when: \n    params.mapper == 'circularmapper'\n\n    script:\n    prefix = \"${fasta.baseName}_${params.circularextension}.${fasta.extension}\"\n    \"\"\"\n    circulargenerator -Xmx${task.memory.toGiga()}g -e ${params.circularextension} -i $fasta -s ${params.circulartarget}\n    bwa index $prefix\n    \"\"\"\n\n}\n\nprocess circularmapper{\n    label 'mc_medium'\n    tag \"$libraryid\"\n    publishDir \"${params.outdir}/mapping/circularmapper\", mode: params.publish_dir_mode\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(r1), file(r2) from ch_lanemerge_for_cm\n    file index from ch_circularmapper_indices.collect()\n    file fasta from ch_fasta_for_circularmapper.collect()\n    file elongated from ch_circularmapper_elongatedfasta.collect()\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(\"*.mapped.bam\"), file(\"*.{bai,csi}\") into ch_output_from_cm\n\n    when: \n    params.mapper == 'circularmapper'\n\n    script:\n    def filter = params.circularfilter ? '-f true -x true' : ''\n    def elongated_root = \"${fasta.baseName}_${params.circularextension}.${fasta.extension}\"\n    def size = params.large_ref ? 
'-c' : ''\n\n    if (!params.single_end && params.skip_collapse ){\n    \"\"\"\n    bwa aln -t ${task.cpus} $elongated_root $r1 -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -f ${libraryid}.r1.sai\n    bwa aln -t ${task.cpus} $elongated_root $r2 -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -f ${libraryid}.r2.sai\n    bwa sampe -r \"@RG\\\\tID:ILLUMINA-${samplename}_${libraryid}\\\\tSM:${samplename}\\\\tLB:${libraryid}\\\\tPL:illumina\\\\tPU:ILLUMINA-${libraryid}-${seqtype}\" $elongated_root ${libraryid}.r1.sai ${libraryid}.r2.sai $r1 $r2 > tmp.out\n    realignsamfile -Xmx${task.memory.toGiga()}g -e ${params.circularextension} -i tmp.out -r $fasta $filter \n    samtools sort -@ ${task.cpus} -O bam tmp_realigned.bam > ${libraryid}_\"${seqtype}\".mapped.bam\n    samtools index \"${libraryid}\"_\"${seqtype}\".mapped.bam ${size} \n    \"\"\"\n    } else {\n    \"\"\" \n    bwa aln -t ${task.cpus} $elongated_root $r1 -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -f ${libraryid}.sai\n    bwa samse -r \"@RG\\\\tID:ILLUMINA-${samplename}_${libraryid}\\\\tSM:${samplename}\\\\tLB:${libraryid}\\\\tPL:illumina\\\\tPU:ILLUMINA-${libraryid}-${seqtype}\" $elongated_root ${libraryid}.sai $r1 > tmp.out\n    realignsamfile -Xmx${task.memory.toGiga()}g -e ${params.circularextension} -i tmp.out -r $fasta $filter \n    samtools sort -@ ${task.cpus} -O bam tmp_realigned.bam > \"${libraryid}\"_\"${seqtype}\".mapped.bam\n    samtools index \"${libraryid}\"_\"${seqtype}\".mapped.bam ${size}\n    \"\"\"\n    }\n    \n}\n\nprocess bowtie2 {\n    label 'mc_medium'\n    tag \"${libraryid}\"\n    publishDir \"${params.outdir}/mapping/bt2\", mode: params.publish_dir_mode\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(r1), file(r2) from ch_lanemerge_for_bt2\n    path index from bt2_index.collect()\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, 
path(\"*.mapped.bam\"), path(\"*.{bai,csi}\") into ch_output_from_bt2\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*_bt2.log\") into ch_bt2_for_multiqc\n\n    when: \n    params.mapper == 'bowtie2'\n\n    script:\n    def split_cpus = Math.floor(task.cpus/2)\n    def size = params.large_ref ? '-c' : ''\n    def fasta = \"${index}/${fasta_base}\"\n    def trim5 = params.bt2_trim5 != 0 ? \"--trim5 ${params.bt2_trim5}\" : \"\"\n    def trim3 = params.bt2_trim3 != 0 ? \"--trim3 ${params.bt2_trim3}\" : \"\"\n    def bt2n = params.bt2n != 0 ? \"-N ${params.bt2n}\" : \"\"\n    def bt2l = params.bt2l != 0 ? \"-L ${params.bt2l}\" : \"\"\n\n    if ( \"${params.bt2_alignmode}\" == \"end-to-end\"  ) {\n      switch ( \"${params.bt2_sensitivity}\" ) {\n        case \"no-preset\":\n        sensitivity = \"\"; break\n        case \"very-fast\":\n        sensitivity = \"--very-fast\"; break\n        case \"fast\":\n        sensitivity = \"--fast\"; break\n        case \"sensitive\":\n        sensitivity = \"--sensitive\"; break\n        case \"very-sensitive\":\n        sensitivity = \"--very-sensitive\"; break\n        default:\n        sensitivity = \"\"; break\n        }\n      } else if (\"${params.bt2_alignmode}\" == \"local\") {\n      switch ( \"${params.bt2_sensitivity}\" ) {\n        case \"no-preset\":\n        sensitivity = \"\"; break\n        case \"very-fast\":\n        sensitivity = \"--very-fast-local\"; break\n        case \"fast\":\n        sensitivity = \"--fast-local\"; break\n        case \"sensitive\":\n        sensitivity = \"--sensitive-local\"; break\n        case \"very-sensitive\":\n        sensitivity = \"--very-sensitive-local\"; break\n        default:\n        sensitivity = \"\"; break\n\n        }\n      }\n\n    //PE data without merging, PE data without any AR applied\n    if ( seqtype == 'PE' && ( params.skip_collapse || params.skip_adapterremoval ) ){\n    \"\"\"\n    bowtie2 -x ${fasta} -1 ${r1} -2 ${r2} 
-p ${split_cpus} ${sensitivity} ${bt2n} ${bt2l} ${trim5} ${trim3} --maxins ${params.bt2_maxins} --rg-id ILLUMINA-${samplename}_${libraryid} --rg SM:${samplename} --rg LB:${libraryid} --rg PL:illumina --rg PU:ILLUMINA-${libraryid}-${seqtype} 2> \"${libraryid}\"_bt2.log | samtools sort -@ ${split_cpus} -O bam > \"${libraryid}\"_\"${seqtype}\".mapped.bam\n    samtools index \"${libraryid}\"_\"${seqtype}\".mapped.bam ${size}\n    \"\"\"\n    } else {\n    //PE collapsed, or SE data \n    \"\"\"\n    bowtie2 -x ${fasta} -U ${r1} -p ${split_cpus} ${sensitivity} ${bt2n} ${bt2l} ${trim5} ${trim3} --rg-id ILLUMINA-${samplename}_${libraryid} --rg SM:${samplename} --rg LB:${libraryid} --rg PL:illumina --rg PU:ILLUMINA-${libraryid}-${seqtype} 2> \"${libraryid}\"_bt2.log | samtools sort -@ ${split_cpus} -O bam > \"${libraryid}\"_\"${seqtype}\".mapped.bam\n    samtools index \"${libraryid}\"_\"${seqtype}\".mapped.bam ${size}\n    \"\"\"\n    }\n    \n}\n\n// Gather all mapped BAMs from all possible mappers into common channels to send downstream\nch_output_from_bwa.mix(ch_output_from_bwamem, ch_output_from_cm, ch_indexbam_for_filtering, ch_output_from_bt2)\n  .into { ch_mapping_for_hostremovalfastq; ch_mapping_for_seqtype_merging }\n\n// Synchronise the mapped input FASTQ and input non-remapped BAM channels\nch_fastqlanemerge_for_hostremovalfastq\n    .map {\n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def r1 = seqtype == \"PE\" ? file(it[7].sort()[0]) : file(it[7])\n        def r2 = seqtype == \"PE\" ? 
file(it[7].sort()[1]) : file(\"$projectDir/assets/nf-core_eager_dummy.txt\")\n\n        [ samplename, libraryid, lane, seqtype, organism, strandedness, udg, r1, r2 ]\n\n    }\n    .mix(ch_mapping_for_hostremovalfastq)\n    .groupTuple(by: [0,1,3,4,5,6])\n    .map {\n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def r1 = it[7][0]\n        def r2 = it[8][0]\n        def bam = it[7][1]\n        def bai = it[8][1]\n\n      [ samplename, libraryid, seqtype, organism, strandedness, udg, r1, r2, bam, bai ]\n\n    }\n    .filter{ it[8] != null }\n    .set { ch_synced_for_hostremovalfastq }\n\n// Remove mapped reads from original (lane merged) input FASTQ e.g. for sensitive host data when running metagenomic data\n\nprocess hostremoval_input_fastq {\n    label 'mc_medium'\n    tag \"${libraryid}\"\n    publishDir \"${params.outdir}/hostremoved_fastq\", mode: params.publish_dir_mode\n\n    when: \n    params.hostremoval_input_fastq\n\n    input: \n    tuple samplename, libraryid, seqtype, organism, strandedness, udg, file(r1), file(r2), file(bam), file(bai) from ch_synced_for_hostremovalfastq\n\n    output:\n    tuple samplename, libraryid, seqtype, organism, strandedness, udg, file(\"*.fq.gz\") into ch_output_from_hostremovalfastq\n\n    script:\n    def merged = params.skip_collapse ? 
\"\": \"-merged\"\n    if ( seqtype == 'SE' ) {\n        out_fwd = bam.baseName+'.hostremoved.fq.gz'\n        \"\"\"\n        samtools index $bam\n        extract_map_reads.py $bam ${r1} -m ${params.hostremoval_mode} $merged -of $out_fwd -t ${task.cpus} \n        \"\"\"\n    } else {\n        out_fwd = bam.baseName+'.hostremoved.fwd.fq.gz'\n        out_rev = bam.baseName+'.hostremoved.rev.fq.gz'\n        \"\"\"\n        samtools index $bam\n        extract_map_reads.py $bam ${r1} -rev ${r2} -m ${params.hostremoval_mode} $merged -of $out_fwd -or $out_rev -t ${task.cpus}\n        \"\"\" \n    }\n    \n}\n\n// Seqtype merging to combine paired end with single end  sequenceing data of the same libraries\n// goes here, goes into flagstat, filter etc. Important: This type of merge of this isn't technically valid for DeDup!\n// and should only be used with markduplicates!\nch_branched_for_seqtypemerge = ch_mapping_for_seqtype_merging\n  .groupTuple(by: [0,1,4,5,6])\n  .map {\n    it ->\n      def samplename = it[0]\n      def libraryid  = it[1]\n      def lane = 0\n      def seqtype = it[3].unique() // How to deal with this?\n      def organism = it[4]\n      def strandedness = it[5]\n      def udg = it[6]\n      def r1 = it[7]\n      def r2 = it[8]\n\n      // 1. We will assume if mixing it is better to set as PE as this is informative\n      // for DeDup (and markduplicates doesn't care), but will throw a warning!\n      // 2. We will also flatten to a single value to address problems with 'unstable' \n      // Nextflow ArrayBag object types not allowing the .join to work between resumes\n      // See: https://github.com/nf-core/eager/issues/880\n\n      def seqtype_new = seqtype.flatten().size() > 1 ? 'PE' : seqtype.flatten()[0] \n                      \n      if ( seqtype.flatten().size() > 1 &&  params.dedupper == 'dedup' ) {\n        log.warn \"[nf-core/eager] Warning: you are running DeDup on BAMs with a mixture of PE/SE data for library: ${libraryid}. 
DeDup is designed for PE data only, deduplication may be suboptimal!\"\n      }\n      \n      [ samplename, libraryid, lane, seqtype_new, organism, strandedness, udg, r1, r2 ]\n\n  }\n  .branch {\n    skip_merge: it[7].size() == 1 // Can skip merging if only a single BAM per library\n    merge_me: it[7].size() > 1\n  }\n\n  process seqtype_merge {\n\n    label 'sc_tiny'\n    tag \"$libraryid\"\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(bam), file(bai) from ch_branched_for_seqtypemerge.merge_me\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(\"*_seqtypemerged.bam\"), file(\"*_seqtypemerged*.{bai,csi}\")  into ch_seqtypemerge_for_filtering\n\n    script:\n    def size = params.large_ref ? '-c' : ''\n    \"\"\"\n    samtools merge ${libraryid}_seqtypemerged.bam ${bam}\n    samtools index ${libraryid}_seqtypemerged.bam ${size}\n    \"\"\"\n    \n  }\n\nch_seqtypemerge_for_filtering\n  .mix(ch_branched_for_seqtypemerge.skip_merge)\n  .into { ch_seqtypemerged_for_skipfiltering; ch_seqtypemerged_for_samtools_filter; ch_seqtypemerged_for_samtools_flagstat } \n\n// Post-mapping QC\n\nprocess samtools_flagstat {\n    label 'sc_tiny'\n    tag \"$libraryid\"\n    publishDir \"${params.outdir}/samtools/stats\", mode: params.publish_dir_mode\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(bam), file(bai) from ch_seqtypemerged_for_samtools_flagstat\n\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*stats\") into ch_flagstat_for_multiqc,ch_flagstat_for_endorspy\n\n    script:\n    \"\"\"\n    samtools flagstat $bam > ${libraryid}_flagstat.stats\n    \"\"\"\n}\n\n\n// BAM filtering e.g. 
to extract unmapped reads for downstream use or to apply a stricter mapping quality threshold\n\nprocess samtools_filter {\n    label 'mc_medium'\n    tag \"$libraryid\"\n    publishDir \"${params.outdir}/samtools/filter\", mode: params.publish_dir_mode,\n    saveAs: {filename ->\n            if (filename.indexOf(\".fastq.gz\") > 0) \"$filename\"\n            else if (filename.indexOf(\".unmapped.bam\") > 0) \"$filename\"\n            else if (filename.indexOf(\".filtered.bam\") > 0) \"$filename\"\n            else null\n    }\n\n    when: \n    params.run_bam_filtering\n\n    input: \n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(bam), file(bai) from ch_seqtypemerged_for_samtools_filter\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(\"*filtered.bam\"), file(\"*.{bai,csi}\") into ch_output_from_filtering\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(\"*.unmapped.fastq.gz\") optional true into ch_bam_filtering_for_metagenomic,ch_metagenomic_for_skipentropyfilter\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(\"*.unmapped.bam\") optional true\n\n    script:\n    \n    def size = params.large_ref ? 
'-c' : ''\n    \n    // Unmapped/MAPQ Filtering WITHOUT min-length filtering\n    if ( \"${params.bam_unmapped_type}\" == \"keep\"  && params.bam_filter_minreadlength == 0 ) {\n        \"\"\"\n        samtools view -h ${bam} -@ ${task.cpus} -q ${params.bam_mapping_quality_threshold} -b > ${libraryid}.filtered.bam\n        samtools index ${libraryid}.filtered.bam ${size}\n        \"\"\"\n    } else if ( \"${params.bam_unmapped_type}\" == \"discard\" && params.bam_filter_minreadlength == 0 ){\n        \"\"\"\n        samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > ${libraryid}.filtered.bam\n        samtools index ${libraryid}.filtered.bam ${size}\n        \"\"\"\n    } else if ( \"${params.bam_unmapped_type}\" == \"bam\" && params.bam_filter_minreadlength == 0 ){\n        \"\"\"\n        samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam\n        samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > ${libraryid}.filtered.bam\n        samtools index ${libraryid}.filtered.bam ${size}\n        \"\"\"\n    } else if ( \"${params.bam_unmapped_type}\" == \"fastq\" && params.bam_filter_minreadlength == 0 ){\n        \"\"\"\n        samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam\n        samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > ${libraryid}.filtered.bam\n        samtools index ${libraryid}.filtered.bam ${size}\n\n        ## FASTQ\n        samtools fastq -tN ${libraryid}.unmapped.bam | pigz -p ${task.cpus - 1} > ${libraryid}.unmapped.fastq.gz\n        rm ${libraryid}.unmapped.bam\n        \"\"\"\n    } else if ( \"${params.bam_unmapped_type}\" == \"both\" && params.bam_filter_minreadlength == 0 ){\n        \"\"\"\n        samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam\n        samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > 
${libraryid}.filtered.bam\n        samtools index ${libraryid}.filtered.bam ${size}\n        \n        ## FASTQ\n        samtools fastq -tN ${libraryid}.unmapped.bam | pigz -p ${task.cpus -1} > ${libraryid}.unmapped.fastq.gz\n        \"\"\"\n    // Unmapped/MAPQ Filtering WITH min-length filtering\n    } else if ( \"${params.bam_unmapped_type}\" == \"keep\" && params.bam_filter_minreadlength != 0 ) {\n        \"\"\"\n        samtools view -h ${bam} -@ ${task.cpus} -q ${params.bam_mapping_quality_threshold} -b > tmp_mapped.bam\n        filter_bam_fragment_length.py -a -l ${params.bam_filter_minreadlength} -o ${libraryid} tmp_mapped.bam\n        samtools index ${libraryid}.filtered.bam ${size}\n        \"\"\"\n    } else if ( \"${params.bam_unmapped_type}\" == \"discard\" && params.bam_filter_minreadlength != 0 ){\n        \"\"\"\n        samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > tmp_mapped.bam\n        filter_bam_fragment_length.py -a -l ${params.bam_filter_minreadlength} -o ${libraryid} tmp_mapped.bam\n        samtools index ${libraryid}.filtered.bam ${size}\n        \"\"\"\n    } else if ( \"${params.bam_unmapped_type}\" == \"bam\" && params.bam_filter_minreadlength != 0 ){\n        \"\"\"\n        samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam\n        samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > tmp_mapped.bam\n        filter_bam_fragment_length.py -a -l ${params.bam_filter_minreadlength} -o ${libraryid} tmp_mapped.bam\n        samtools index ${libraryid}.filtered.bam ${size}\n        \"\"\"\n    } else if ( \"${params.bam_unmapped_type}\" == \"fastq\" && params.bam_filter_minreadlength != 0 ){\n        \"\"\"\n        samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam\n        samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > tmp_mapped.bam\n        
filter_bam_fragment_length.py -a -l ${params.bam_filter_minreadlength} -o ${libraryid} tmp_mapped.bam\n        samtools index ${libraryid}.filtered.bam ${size}\n\n        ## FASTQ\n        samtools fastq -tN ${libraryid}.unmapped.bam | pigz -p ${task.cpus - 1} > ${libraryid}.unmapped.fastq.gz\n        rm ${libraryid}.unmapped.bam\n        \"\"\"\n    } else if ( \"${params.bam_unmapped_type}\" == \"both\" && params.bam_filter_minreadlength != 0 ){\n        \"\"\"\n        samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam\n        samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > tmp_mapped.bam\n        filter_bam_fragment_length.py -a -l ${params.bam_filter_minreadlength} -o ${libraryid} tmp_mapped.bam\n        samtools index ${libraryid}.filtered.bam ${size}\n        \n        ## FASTQ\n        samtools fastq -tN ${libraryid}.unmapped.bam | pigz -p ${task.cpus} > ${libraryid}.unmapped.fastq.gz\n        \"\"\"\n    }\n}\n\n// samtools_filter bypass in case not run\nif (params.run_bam_filtering) {\n    ch_seqtypemerged_for_skipfiltering.mix(ch_output_from_filtering)\n        .filter { it =~/.*filtered.bam/ }\n        .into { ch_filtering_for_skiprmdup; ch_filtering_for_dedup; ch_filtering_for_markdup; ch_filtering_for_flagstat; ch_skiprmdup_for_libeval; ch_mapped_for_preseq } \n\n} else {\n    ch_seqtypemerged_for_skipfiltering\n        .into { ch_filtering_for_skiprmdup; ch_filtering_for_dedup; ch_filtering_for_markdup; ch_filtering_for_flagstat; ch_skiprmdup_for_libeval; ch_mapped_for_preseq } \n\n}\n\n// Post filtering mapping QC - particularly to help see how much was removed from mapping quality filtering\n\nprocess samtools_flagstat_after_filter {\n    label 'sc_tiny'\n    tag \"$libraryid\"\n    publishDir \"${params.outdir}/samtools/filtered_stats\", mode: params.publish_dir_mode\n\n    when:\n    params.run_bam_filtering\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, 
organism, strandedness, udg, path(bam), path(bai) from ch_filtering_for_flagstat\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.stats\") into ch_bam_filtered_flagstat_for_multiqc, ch_bam_filtered_flagstat_for_endorspy\n\n    script:\n    \"\"\"\n    samtools flagstat $bam > ${libraryid}_postfilterflagstat.stats\n    \"\"\"\n}\n\nif (params.run_bam_filtering) {\n  ch_flagstat_for_endorspy\n    .join(ch_bam_filtered_flagstat_for_endorspy, by: [0,1,2,3,4,5,6])\n    .set{ ch_allflagstats_for_endorspy }\n\n} else {\n  // Add a file entry to match expected no. tuple elements for endorS.py even if not giving second file\n  ch_flagstat_for_endorspy\n    .map { it -> \n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def stats = file(it[7])\n        def poststats = file(\"$projectDir/assets/nf-core_eager_dummy.txt\")\n\n      [samplename, libraryid, lane, seqtype, organism, strandedness, udg, stats, poststats ] \n    }\n    .set{ ch_allflagstats_for_endorspy }\n}\n\n// Endogenous DNA calculator to say how much of a library contained 'on-target' DNA\n\nprocess endorSpy {\n    label 'sc_tiny'\n    tag \"$libraryid\"\n    publishDir \"${params.outdir}/endorspy\", mode: params.publish_dir_mode\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(stats), path(poststats) from ch_allflagstats_for_endorspy\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.json\") into ch_endorspy_for_multiqc\n\n    script:\n    if (params.run_bam_filtering) {\n      \"\"\"\n      endorS.py -o json -n ${libraryid} ${stats} ${poststats}\n      \"\"\"\n    } else {\n      \"\"\"\n      endorS.py -o json -n ${libraryid} ${stats}\n      \"\"\"\n    }\n}\n\n// Post-mapping PCR amplicon 
removal because these lab artefacts inflate coverage statistics\n\nprocess dedup{\n    label 'mc_small'\n    tag \"${libraryid}\"\n    publishDir \"${params.outdir}/deduplication/\", mode: params.publish_dir_mode,\n        saveAs: {filename -> \"${libraryid}/$filename\"}\n\n    when:\n    !params.skip_deduplication && params.dedupper == 'dedup'\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_filtering_for_dedup\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.hist\") into ch_hist_for_preseq\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.json\") into ch_dedup_results_for_multiqc\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"${libraryid}_rmdup.bam\"), path(\"*.{bai,csi}\") into ch_output_from_dedup, ch_dedup_for_libeval\n\n    script:\n    def treat_merged = params.dedup_all_merged ? '-m' : ''\n    def size = params.large_ref ? '-c' : ''\n    \n    if ( bam.baseName != libraryid ) {\n    // To make sure direct BAMs have a clean name\n    \"\"\"\n    mv ${bam} ${libraryid}.bam\n    dedup -Xmx${task.memory.toGiga()}g -i ${libraryid}.bam $treat_merged -o . -u \n    mv *.log dedup.log\n    samtools sort -@ ${task.cpus} \"${libraryid}\"_rmdup.bam -o \"${libraryid}\"_rmdup.bam\n    samtools index \"${libraryid}\"_rmdup.bam ${size}\n    \"\"\"\n    } else {\n    \"\"\"\n    dedup -Xmx${task.memory.toGiga()}g -i ${libraryid}.bam $treat_merged -o . 
-u \n    mv *.log dedup.log\n    samtools sort -@ ${task.cpus} \"${libraryid}\"_rmdup.bam -o \"${libraryid}\"_rmdup.bam\n    samtools index \"${libraryid}\"_rmdup.bam ${size}\n    \"\"\"\n    }\n}\n\nprocess markduplicates{\n    label 'mc_small'\n    tag \"${libraryid}\"\n    publishDir \"${params.outdir}/deduplication/\", mode: params.publish_dir_mode,\n        saveAs: {filename -> \"${libraryid}/$filename\"}\n\n    when:\n    !params.skip_deduplication && params.dedupper == 'markduplicates'\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_filtering_for_markdup\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.metrics\") into ch_markdup_results_for_multiqc\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"${libraryid}_rmdup.bam\"), path(\"*.{bai,csi}\") into ch_output_from_markdup, ch_markdup_for_libeval\n\n    script:\n    def size = params.large_ref ? '-c' : ''\n\n    if ( bam.baseName != libraryid ) {\n    // To make sure direct BAMs have a clean name\n    \"\"\"\n    mv ${bam} ${libraryid}.bam\n    picard -Xmx${task.memory.toMega()}M MarkDuplicates INPUT=${libraryid}.bam OUTPUT=${libraryid}_rmdup.bam REMOVE_DUPLICATES=TRUE AS=TRUE METRICS_FILE=\"${libraryid}_rmdup.metrics\" VALIDATION_STRINGENCY=SILENT\n    samtools index ${libraryid}_rmdup.bam ${size}\n    \"\"\"\n    } else {\n    \"\"\"\n    picard -Xmx${task.memory.toMega()}M MarkDuplicates INPUT=${libraryid}.bam OUTPUT=${libraryid}_rmdup.bam REMOVE_DUPLICATES=TRUE AS=TRUE METRICS_FILE=\"${libraryid}_rmdup.metrics\" VALIDATION_STRINGENCY=SILENT\n    samtools index ${libraryid}_rmdup.bam ${size}\n    \"\"\"\n    }\n\n}\n\n// This is for post-deduplication per-library evaluation steps _without_ any \n// form of library merging. 
\nif ( params.skip_deduplication ) {\n  ch_skiprmdup_for_libeval.mix(ch_dedup_for_libeval, ch_markdup_for_libeval)\n    .into{ ch_rmdup_for_preseq; ch_rmdup_for_damageprofiler; ch_rmdup_for_mapdamage; ch_for_nuclear_contamination; ch_rmdup_formtnucratio }\n} else {\n  ch_dedup_for_libeval.mix(ch_markdup_for_libeval)\n    .into{ ch_rmdup_for_preseq; ch_rmdup_for_damageprofiler; ch_rmdup_for_mapdamage; ch_for_nuclear_contamination; ch_rmdup_formtnucratio }\n}\n\n// Merge independently sequenced libraries that share the same treatment and the \n// same _sample_ name (often done to improve complexity). Libraries with different \n// strandedness/UDG treatment are not merged because bamtrim/pmdtools/genotyping need that info.\n\n// Step one: work out which are single libraries (from skipping rmdup and both dedups) that do not need merging, and route them past the merging step\nif ( params.skip_deduplication ) {\n  ch_input_for_librarymerging = ch_filtering_for_skiprmdup\n    .groupTuple(by:[0,4,5,6])\n    .branch{\n      clean_libraryid: it[7].size() == 1\n      merge_me: it[7].size() > 1\n    }\n} else {\n    ch_input_for_librarymerging = ch_output_from_dedup.mix(ch_output_from_markdup)\n    .groupTuple(by:[0,4,5,6])\n    .branch{\n      clean_libraryid: it[7].size() == 1\n      merge_me: it[7].size() > 1\n    }\n}\n\n// For non-merging libraries, fix group libraryIDs into single values. 
\n// This is a bit hacky as theoretically could have different, but this should\n// rarely be the case.\nch_input_for_librarymerging.clean_libraryid\n  .map{\n    it ->\n      def libraryid = it[1][0]\n      def bam = it[7].flatten()\n      def bai = it[8].flatten()\n\n      [it[0], libraryid, it[2], it[3], it[4], it[5], it[6], bam, bai ]\n    }\n  .set { ch_input_for_skiplibrarymerging }\n\nch_input_for_librarymerging.merge_me\n  .map{\n    it ->\n      def libraryid = it[1][0]\n      def seqtype = \"merged\"\n      def bam = it[7].flatten()\n      def bai = it[8].flatten()\n\n      [it[0], libraryid, it[2], seqtype, it[4], it[5], it[6], bam, bai ]\n    }\n  .set { ch_fixedinput_for_librarymerging }\n\nprocess library_merge {\n  label 'sc_tiny'\n  tag \"${samplename}\"\n  publishDir \"${params.outdir}/merged_bams/initial\", mode: params.publish_dir_mode\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(bam), file(bai) from ch_fixedinput_for_librarymerging\n\n  output:\n  tuple samplename, val(\"${samplename}_libmerged\"), lane, seqtype, organism, strandedness, udg, path(\"*_libmerged_rmdup.bam\"), path(\"*_libmerged_rmdup.bam.{bai,csi}\") into ch_output_from_librarymerging\n\n  script:\n  def size = params.large_ref ? 
'-c' : ''\n  \"\"\"\n  samtools merge ${samplename}_udg${udg}_libmerged_rmdup.bam ${bam}\n  samtools index ${samplename}_udg${udg}_libmerged_rmdup.bam ${size}\n  \"\"\"\n}\n\n// Mix back in libraries from skipping dedup, skipping library merging\nif (!params.skip_deduplication) {\n    ch_input_for_skiplibrarymerging.mix(ch_output_from_librarymerging)\n        .filter { it =~/.*_rmdup.bam/ }\n        .into { ch_rmdup_for_skipdamagemanipulation;  ch_rmdup_for_pmdtools; ch_rmdup_for_bamutils; ch_rmdup_for_bedtools; ch_rmdup_for_damagerescaling } \n\n} else {\n    ch_input_for_skiplibrarymerging.mix(ch_output_from_librarymerging)\n        .into { ch_rmdup_for_skipdamagemanipulation; ch_rmdup_for_pmdtools; ch_rmdup_for_bamutils; ch_rmdup_for_bedtools; ch_rmdup_for_damagerescaling } \n}\n\n//////////////////////////////////////////////////\n/* --     POST DEDUPLICATION EVALUATION      -- */\n//////////////////////////////////////////////////\n\n// Library complexity calculation from mapped reads - could a user cost-effectively sequence deeper for more unique information?\nif ( params.skip_deduplication ) {\n  ch_input_for_preseq = ch_rmdup_for_preseq.map{ it[0,1,2,3,4,5,6,7] }\n\n} else if ( !params.skip_deduplication && params.dedupper == \"markduplicates\" ) {\n  ch_input_for_preseq = ch_mapped_for_preseq.map{ it[0,1,2,3,4,5,6,7] }\n\n} else if ( !params.skip_deduplication && params.dedupper == \"dedup\" ) {\n  ch_input_for_preseq = ch_hist_for_preseq\n\n}\n\nprocess preseq {\n    label 'sc_tiny'\n    tag \"${libraryid}\"\n    publishDir \"${params.outdir}/preseq\", mode: params.publish_dir_mode\n\n    when:\n    !params.skip_preseq\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(input) from ch_input_for_preseq\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"${input.baseName}.preseq\") into ch_preseq_for_multiqc\n\n    script:\n    pe_mode = params.skip_collapse && 
seqtype == \"PE\" ? '-P' : ''\n    if(!params.skip_deduplication && params.preseq_mode == 'c_curve' && params.dedupper == \"dedup\"){\n    \"\"\"\n    preseq c_curve -s ${params.preseq_step_size} -o ${input.baseName}.preseq -H ${input}\n    \"\"\"\n    } else if( !params.skip_deduplication && params.preseq_mode == 'c_curve' && params.dedupper == \"markduplicates\"){\n    \"\"\"\n    preseq c_curve -s ${params.preseq_step_size} -o ${input.baseName}.preseq -B ${input} ${pe_mode}\n    \"\"\"\n    } else if ( params.skip_deduplication && params.preseq_mode == 'c_curve' ) {\n    \"\"\"\n    preseq c_curve -s ${params.preseq_step_size} -o ${input.baseName}.preseq -B ${input} ${pe_mode}\n    \"\"\"\n    } else if(!params.skip_deduplication && params.preseq_mode == 'lc_extrap' && params.dedupper == \"dedup\"){\n    \"\"\"\n    preseq lc_extrap -s ${params.preseq_step_size} -o ${input.baseName}.preseq -H ${input} -n ${params.preseq_bootstrap} -e ${params.preseq_maxextrap} -cval ${params.preseq_cval} -x ${params.preseq_terms}\n    \"\"\"\n    } else if( !params.skip_deduplication && params.preseq_mode == 'lc_extrap' && params.dedupper == \"markduplicates\"){\n    \"\"\"\n    preseq lc_extrap -s ${params.preseq_step_size} -o ${input.baseName}.preseq -B ${input} ${pe_mode} -n ${params.preseq_bootstrap} -e ${params.preseq_maxextrap} -cval ${params.preseq_cval} -x ${params.preseq_terms}\n    \"\"\"\n    } else if ( params.skip_deduplication && params.preseq_mode == 'lc_extrap' ) {\n    \"\"\"\n    preseq lc_extrap -s ${params.preseq_step_size} -o ${input.baseName}.preseq -B ${input} ${pe_mode} -n ${params.preseq_bootstrap} -e ${params.preseq_maxextrap} -cval ${params.preseq_cval} -x ${params.preseq_terms}\n    \"\"\"\n    }\n}\n\n// Optional mapping statistics for specific annotations - e.g. 
genes in bacterial genome\n\nprocess bedtools {\n  label 'mc_small'\n  tag \"${libraryid}\"\n  publishDir \"${params.outdir}/bedtools\", mode: params.publish_dir_mode\n\n  when:\n  params.run_bedtools_coverage\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_rmdup_for_bedtools\n  file anno_file from ch_anno_for_bedtools.collect()\n\n  output:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*\")\n\n  script:\n  sorting_of_anno = params.anno_file_is_unsorted ? \"\" : \"-sorted\"\n  \"\"\"\n  ## Create genome file from bam header\n  samtools view -H ${bam} | grep '@SQ' | sed 's#@SQ\\tSN:\\\\|LN:##g' > genome.txt\n  \n  ##  Run bedtools\n  bedtools coverage -nonamecheck -g genome.txt ${sorting_of_anno} -a ${anno_file} -b ${bam} | pigz -p ${task.cpus - 1} > \"${bam.baseName}\".breadth.gz\n  bedtools coverage -nonamecheck -g genome.txt ${sorting_of_anno} -a ${anno_file} -b ${bam} -mean | pigz -p ${task.cpus - 1} > \"${bam.baseName}\".depth.gz\n  \"\"\"\n}\n\n//////////////////////////////////////////////////////////////\n/* --    ANCIENT DNA EVALUATION AND BAM MODIFICATION     -- */\n//////////////////////////////////////////////////////////////\n\n// Calculate typical aDNA damage frequency distribution with DamageProfiler\n\nprocess damageprofiler {\n    label 'sc_small'\n    tag \"${libraryid}\"\n\n    publishDir \"${params.outdir}/damageprofiler\", mode: params.publish_dir_mode\n\n    when:\n    !params.skip_damage_calculation && params.damage_calculation_tool == 'damageprofiler'\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_rmdup_for_damageprofiler\n    file fasta from ch_fasta_for_damageprofiler.collect()\n    file fai from ch_fai_for_damageprofiler.collect()\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"${base}/*.txt\") optional true\n   
 tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"${base}/*.log\")\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"${base}/*.pdf\") optional true\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"${base}/*.json\") optional true into ch_damageprofiler_results\n\n    script:\n    base = \"${bam.baseName}\"\n    \"\"\"\n    damageprofiler -Xmx${task.memory.toGiga()}g -i $bam -r $fasta -l ${params.damageprofiler_length} -t ${params.damageprofiler_threshold} -o . -yaxis_damageplot ${params.damageprofiler_yaxis}\n    \"\"\"\n}\n\n// Calculate typical aDNA damage frequency distribution with mapDamage\n\nprocess mapdamage_calculation {\n    label 'sc_small'\n    tag \"${libraryid}\"\n\n    publishDir \"${params.outdir}/mapdamage\", mode: params.publish_dir_mode\n\n    when:\n    !params.skip_damage_calculation && params.damage_calculation_tool == 'mapdamage'\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_rmdup_for_mapdamage\n    file fasta from ch_fasta_for_mapdamage.collect()\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"results_${base}\") into ch_output_from_mapdamage\n    path (\"results_${base}\") into ch_mapdamage_for_multiqc\n\n    script:\n    base = \"${bam.baseName}\"\n    def singlestranded = strandedness == \"single\" ? '--single-stranded' : ''\n    def downsample = params.mapdamage_downsample != 0 ? 
\"-n ${params.mapdamage_downsample} --downsample-seed=1\" : '' // Include seed to make results consistent between runs\n    \"\"\"\n    mapDamage -i ${bam} -r ${fasta} ${singlestranded} ${downsample} --ymax=${params.mapdamage_yaxis} --no-stats\n    \"\"\"\n}\n\n// Damage rescaling with mapDamage\n\nprocess mapdamage_rescaling {\n\n    label 'sc_small'\n    tag \"${libraryid}\"\n\n    publishDir \"${params.outdir}/damage_rescaling\", mode: params.publish_dir_mode\n\n    when:\n    params.run_mapdamage_rescaling\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_rmdup_for_damagerescaling\n    file fasta from ch_fasta_for_damagerescaling.collect()\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*_rescaled.bam\"), path(\"*rescaled.bam.{bai,csi}\") into ch_output_from_damagerescaling\n\n    script:\n    def base = \"${bam.baseName}\"\n    def singlestranded = strandedness == \"single\" ? '--single-stranded' : ''\n    def size = params.large_ref ? '-c' : ''\n    def rescale_length_3p = params.rescale_length_3p != 0 ? \"--rescale-length-3p=${params.rescale_length_3p}\" : \"\"\n    def rescale_length_5p = params.rescale_length_5p != 0 ? 
\"--rescale-length-5p=${params.rescale_length_5p}\" : \"\"\n    \"\"\"\n    mapDamage -i ${bam} -r ${fasta} --rescale --rescale-out=\"${base}_rescaled.bam\" --seq-length=${params.rescale_seqlength} ${rescale_length_5p} ${rescale_length_3p} ${singlestranded}\n    samtools index ${base}_rescaled.bam ${size}\n    \"\"\"\n\n}\n\n// Optionally perform further aDNA evaluation or filtering for just reads with damage etc.\n\nprocess mask_reference_for_pmdtools {\n    label 'sc_tiny'\n    tag \"${fasta}\"\n    publishDir \"${params.outdir}/reference_genome/masked_reference\", mode: params.publish_dir_mode\n\n    when: (params.pmdtools_reference_mask && params.run_pmdtools)\n\n    input:\n    file fasta from ch_unmasked_fasta_for_masking\n    file bedfile from ch_bedfile_for_reference_masking\n\n    output:\n    file \"${fasta.baseName}_masked.fa\" into ch_masked_fasta_for_pmdtools\n\n    script:\n    log.info \"[nf-core/eager]: Masking reference \\'${fasta}\\' at positions found in \\'${bedfile}\\'. 
Masked reference will be used for pmdtools.\"\n    \"\"\"\n    bedtools maskfasta -fi ${fasta} -bed ${bedfile} -fo ${fasta.baseName}_masked.fa\n    \"\"\"\n}\n\n// If masking was requested, use masked reference for pmdtools, else use original reference\nif (params.pmdtools_reference_mask) {\n  ch_masked_fasta_for_pmdtools.set{ch_fasta_for_pmdtools}\n} else {\n  ch_unmasked_fasta_for_pmdtools.set{ch_fasta_for_pmdtools}\n}\n\nprocess pmdtools {\n    label 'mc_medium'\n    tag \"${libraryid}\"\n    publishDir \"${params.outdir}/pmdtools\", mode: params.publish_dir_mode\n\n    when: params.run_pmdtools\n\n    input: \n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_rmdup_for_pmdtools\n    file fasta from ch_fasta_for_pmdtools.collect()\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.pmd.bam\"), path(\"*.pmd.bam.{bai,csi}\") into ch_output_from_pmdtools\n    file \"*.cpg.range*.txt\"\n\n    script:\n    //Check which treatment for the libraries was used\n    def treatment = udg ? (udg == 'half' ? '--UDGhalf' : '--CpG') : '--UDGminus'\n    def size = params.large_ref ? '-c' : ''\n    def platypus = params.pmdtools_platypus ? '--platypus' : ''\n    \"\"\"\n    #Run Filtering step \n    samtools calmd ${bam} ${fasta} | pmdtools --threshold ${params.pmdtools_threshold} ${treatment} --header | samtools view -Sb - > \"${libraryid}\".pmd.bam\n    \n    #Run Calc Range step\n    ## To allow early shut off of pipe: https://github.com/nextflow-io/nextflow/issues/1564\n    trap 'if [[ \\$? 
== 141 ]]; then echo \"Shutting samtools early due to -n parameter\" && samtools index ${libraryid}.pmd.bam ${size}; exit 0; fi' EXIT\n    samtools calmd ${bam} ${fasta} | pmdtools --deamination ${platypus} --range ${params.pmdtools_range} ${treatment} -n ${params.pmdtools_max_reads} > \"${libraryid}\".cpg.range.\"${params.pmdtools_range}\".txt\n    \n    samtools index ${libraryid}.pmd.bam ${size}\n    \"\"\"\n}\n\n// BAM Trimming for just non-UDG or half-UDG libraries to remove damage prior genotyping\n\nif ( params.run_trim_bam ) {\n\n    // You wouldn't want to make UDG treated reads even shorter, so skip trimming if UDG.\n    // We assume same trim amount for both non-UDG/UDG half as could trim a bit more off half-UDG to match non-UDG if needed, with minimal effect \n    // Note: Trimming of e.g. adapters are sequencing artefacts and should be removed before mapping, so we don't account for this here.\n    ch_bamutils_decision = ch_rmdup_for_bamutils.branch{\n        totrim: it[6] == 'none' || it[6] == 'half' \n        notrim: it[6] == 'full'\n    }\n\n} else {\n\n    ch_bamutils_decision = ch_rmdup_for_bamutils.branch{\n        totrim: it[6] == \"dummy\"\n        notrim: it[6] == 'full' || it[6] == 'none' || it[6] == 'half'\n    }\n\n}\n\nprocess bam_trim {\n    label 'mc_small'\n    tag \"${libraryid}\" \n    publishDir \"${params.outdir}/trimmed_bam\", mode: params.publish_dir_mode\n\n    when: params.run_trim_bam\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_bamutils_decision.totrim\n\n    output: \n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.trimmed.bam\"), path(\"*.trimmed.bam.{bai,csi}\") into ch_trimmed_from_bamutils\n\n    script:\n    def softclip = params.bamutils_softclip ? '-c' : '' \n    def size = params.large_ref ? '-c' : ''\n    def left_clipping = strandedness == \"double\" ? (udg == \"half\" ? 
\"${params.bamutils_clip_double_stranded_half_udg_left}\" : \"${params.bamutils_clip_double_stranded_none_udg_left}\") : (udg == \"half\" ? \"${params.bamutils_clip_single_stranded_half_udg_left}\" : \"${params.bamutils_clip_single_stranded_none_udg_left}\")\n    def right_clipping = strandedness == \"double\" ? (udg == \"half\" ? \"${params.bamutils_clip_double_stranded_half_udg_right}\" : \"${params.bamutils_clip_double_stranded_none_udg_right}\") : (udg == \"half\" ? \"${params.bamutils_clip_single_stranded_half_udg_right}\" : \"${params.bamutils_clip_single_stranded_none_udg_right}\")\n\n    // def left_clipping = udg == \"half\" ? \"${params.bamutils_clip_half_udg_left}\" : \"${params.bamutils_clip_none_udg_left}\"\n    // def right_clipping = udg == \"half\" ? \"${params.bamutils_clip_half_udg_right}\" : \"${params.bamutils_clip_none_udg_right}\"\n    \"\"\"\n    bam trimBam $bam tmp.bam -L ${left_clipping} -R ${right_clipping} ${softclip}\n    samtools sort -@ ${task.cpus} tmp.bam -o ${libraryid}_udg${udg}.trimmed.bam \n    samtools index ${libraryid}_udg${udg}.trimmed.bam ${size}\n    \"\"\"\n}\n\n// Post-trimming merging of libraries to single samples, except for SS/DS \n// libraries as they should be genotyped separately, because we will assume \n// that if trimming is turned on, 'lab-removed' libraries can be combined with \n// merged with 'in-silico damage removed' libraries to improve genotyping\n\nch_trimmed_formerge = ch_bamutils_decision.notrim\n  .mix(ch_trimmed_from_bamutils)\n  .groupTuple(by:[0,4,5])\n  .map{\n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def bam = it[7].flatten()\n        def bai = it[8].flatten()\n\n      [samplename, libraryid, lane, seqtype, organism, strandedness, udg, bam, bai ]\n  }\n  .branch{\n    skip_merging: it[7].size() == 1\n    merge_me: 
it[7].size() > 1\n  }\n\n//////////////////////////////////////////////////////////////////////////\n/* --    POST aDNA BAM MODIFICATION AND FINAL MAPPING STATISTICS     -- */\n//////////////////////////////////////////////////////////////////////////\n\nprocess additional_library_merge {\n  label 'sc_tiny'\n  tag \"${samplename}\"\n  publishDir \"${params.outdir}/merged_bams/additional\", mode: params.publish_dir_mode\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_trimmed_formerge.merge_me\n\n  output:\n  tuple samplename, val(\"${samplename}_libmerged\"), lane, seqtype, organism, strandedness, udg, path(\"*_libmerged_add.bam\"), path(\"*_libmerged_add.bam.{bai,csi}\") into ch_output_from_trimmerge\n\n  script:\n  def size = params.large_ref ? '-c' : ''\n  \"\"\"\n  samtools merge ${samplename}_libmerged_add.bam ${bam}\n  samtools index ${samplename}_libmerged_add.bam ${size}\n  \"\"\"\n}\n\nch_trimmed_formerge.skip_merging\n  .mix(ch_output_from_trimmerge)\n  .into{ ch_output_from_bamutils; ch_addlibmerge_for_qualimap; ch_for_sexdeterrmine_prep }\n\n  // General mapping quality statistics for whole reference sequence - e.g. X and % coverage\n\nprocess qualimap {\n    label 'mc_small'\n    tag \"${samplename}\"\n    publishDir \"${params.outdir}/qualimap\", mode: params.publish_dir_mode\n\n    when:\n    !params.skip_qualimap\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_addlibmerge_for_qualimap\n    file fasta from ch_fasta_for_qualimap.collect()\n    path snpcapture_bed from ch_snpcapture_bed \n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*\") into ch_qualimap_results\n\n    script:\n    def snpcap = snpcapture_bed.getName() != 'nf-core_eager_dummy.txt' ? \"-gff ${snpcapture_bed}\" : ''\n    \"\"\"\n    qualimap bamqc -bam $bam -nt ${task.cpus} -outdir . 
-outformat \"HTML\" ${snpcap} --java-mem-size=${task.memory.toGiga()}G\n    \"\"\"\n}\n\n/////////////////////////////\n/* --    GENOTYPING     -- */\n/////////////////////////////\n\n// Reroute files for genotyping; we have to ensure to select lib-merged BAMs, as input channel will also contain the un-merged ones resulting in unwanted multi-sample VCFs\nif ( params.run_genotyping && params.genotyping_source == 'raw' ) {\n    ch_output_from_bamutils\n      .into { ch_damagemanipulation_for_skipgenotyping; ch_damagemanipulation_for_readgroupreplacement; ch_damagemanipulation_for_genotyping_ug; ch_damagemanipulation_for_genotyping_hc; ch_damagemanipulation_for_genotyping_freebayes; ch_damagemanipulation_for_genotyping_pileupcaller; ch_damagemanipulation_for_genotyping_angsd }\n\n} else if ( params.run_genotyping && params.genotyping_source == \"trimmed\" && !params.run_trim_bam )  {\n    exit 1, \"[nf-core/eager] error: Cannot run genotyping with 'trimmed' source without running BAM trimming (--run_trim_bam)! Please check input parameters.\"\n\n} else if ( params.run_genotyping && params.genotyping_source == \"trimmed\" && params.run_trim_bam )  {\n    ch_output_from_bamutils\n        .into { ch_damagemanipulation_for_skipgenotyping; ch_damagemanipulation_for_readgroupreplacement; ch_damagemanipulation_for_genotyping_ug; ch_damagemanipulation_for_genotyping_hc; ch_damagemanipulation_for_genotyping_freebayes; ch_damagemanipulation_for_genotyping_pileupcaller; ch_damagemanipulation_for_genotyping_angsd }\n\n} else if ( params.run_genotyping && params.genotyping_source == \"pmd\" && !params.run_pmdtools )  {\n    exit 1, \"[nf-core/eager] error: Cannot run genotyping with 'pmd' source without running pmdtools (--run_pmdtools)! 
Please check input parameters.\"\n\n} else if ( params.run_genotyping && params.genotyping_source == \"pmd\" && params.run_pmdtools )  {\n  ch_output_from_pmdtools\n    .into { ch_damagemanipulation_for_skipgenotyping; ch_damagemanipulation_for_readgroupreplacement; ch_damagemanipulation_for_genotyping_ug; ch_damagemanipulation_for_genotyping_hc; ch_damagemanipulation_for_genotyping_freebayes; ch_damagemanipulation_for_genotyping_pileupcaller; ch_damagemanipulation_for_genotyping_angsd }\n\n} else if ( params.run_genotyping && params.genotyping_source == \"rescaled\" && params.run_mapdamage_rescaling) {\n  ch_output_from_damagerescaling\n    .into { ch_damagemanipulation_for_skipgenotyping; ch_damagemanipulation_for_readgroupreplacement; ch_damagemanipulation_for_genotyping_ug; ch_damagemanipulation_for_genotyping_hc; ch_damagemanipulation_for_genotyping_freebayes; ch_damagemanipulation_for_genotyping_pileupcaller; ch_damagemanipulation_for_genotyping_angsd }\n\n} else if ( params.run_genotyping && params.genotyping_source == \"rescaled\" && !params.run_mapdamage_rescaling) {\n    exit 1, \"[nf-core/eager] error: Cannot run genotyping with 'rescaled' source without running damage rescaling (--run_mapdamage_rescaling)! 
Please check input parameters.\"\n\n} else if ( !params.run_genotyping && !params.run_trim_bam && !params.run_pmdtools )  {\n    ch_rmdup_for_skipdamagemanipulation\n    .into { ch_damagemanipulation_for_skipgenotyping; ch_damagemanipulation_for_readgroupreplacement; ch_damagemanipulation_for_genotyping_ug; ch_damagemanipulation_for_genotyping_hc; ch_damagemanipulation_for_genotyping_freebayes; ch_damagemanipulation_for_genotyping_pileupcaller; ch_damagemanipulation_for_genotyping_angsd }\n\n} else if ( !params.run_genotyping && !params.run_trim_bam && params.run_pmdtools )  {\n    ch_rmdup_for_skipdamagemanipulation\n    .into { ch_damagemanipulation_for_skipgenotyping; ch_damagemanipulation_for_readgroupreplacement; ch_damagemanipulation_for_genotyping_ug; ch_damagemanipulation_for_genotyping_hc; ch_damagemanipulation_for_genotyping_freebayes; ch_damagemanipulation_for_genotyping_pileupcaller; ch_damagemanipulation_for_genotyping_angsd }\n\n} else if ( !params.run_genotyping && params.run_trim_bam && !params.run_pmdtools )  {\n    ch_rmdup_for_skipdamagemanipulation\n    .into { ch_damagemanipulation_for_skipgenotyping; ch_damagemanipulation_for_readgroupreplacement; ch_damagemanipulation_for_genotyping_ug; ch_damagemanipulation_for_genotyping_hc; ch_damagemanipulation_for_genotyping_freebayes; ch_damagemanipulation_for_genotyping_pileupcaller; ch_damagemanipulation_for_genotyping_angsd }\n\n}\n\n// replace readgroups to ensure single 'sample' per VCF for MultiVCFAnalyzer only\n\nprocess picard_addorreplacereadgroups {\n  label 'sc_tiny'\n  tag \"${samplename}\"\n\n  when:\n  params.run_genotyping && params.genotyping_tool == 'ug' && params.run_multivcfanalyzer\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_damagemanipulation_for_readgroupreplacement\n\n  output:\n  tuple samplename, val(\"${samplename}\"), lane, seqtype, organism, strandedness, udg, path(\"*rg.bam\"), 
path(\"*rg.bam.{bai,csi}\") into ch_readgroup_replacement_for_ug\n\n  script:\n  def size = params.large_ref ? '-c' : ''\n  \"\"\"\n  picard -Xmx${task.memory.toGiga()}g AddOrReplaceReadGroups I=${bam} O=${samplename}_rg.bam RGID=1 RGLB=\"${samplename}_rg\" RGPL=illumina RGPU=4410 RGSM=\"${samplename}_rg\" VALIDATION_STRINGENCY=LENIENT\n  samtools index ${samplename}_rg.bam ${size}\n  \"\"\"\n\n}\n\nif ( params.run_genotyping && params.genotyping_tool == 'ug' && params.run_multivcfanalyzer ) {\n  ch_input_for_ug = ch_readgroup_replacement_for_ug\n} else {\n  ch_input_for_ug = ch_damagemanipulation_for_genotyping_ug\n}\n\n// Unified Genotyper - although not-supported, better for aDNA (because HC does de novo assembly which requires higher coverages), and needed for MultiVCFAnalyzer\n\nprocess genotyping_ug {\n  label 'mc_small'\n  tag \"${samplename}\"\n  publishDir \"${params.outdir}/genotyping\", mode: params.publish_dir_mode, pattern: '*{.vcf.gz,.realign.bam,realign.bai}'\n\n  when:\n  params.run_genotyping && params.genotyping_tool == 'ug'\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(bam), file(bai) from ch_input_for_ug\n  file fasta from ch_fasta_for_genotyping_ug.collect()\n  file fai from ch_fai_for_ug.collect()\n  file dict from ch_dict_for_ug.collect()\n\n  output: \n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(\"*vcf.gz\") into ch_ug_for_multivcfanalyzer,ch_ug_for_vcf2genome,ch_ug_for_bcftools_stats\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(\"*.realign.{bam,bai}\") optional true\n\n  script:\n  def defaultbasequalities = !params.gatk_ug_defaultbasequalities ? '' : \" --defaultBaseQualities ${params.gatk_ug_defaultbasequalities}\" \n  def keep_realign = params.gatk_ug_keep_realign_bam ? 
\"samtools index ${samplename}.realign.bam\" : \"rm ${samplename}.realign.{bam,bai}\"\n  if (!params.gatk_dbsnp)\n    \"\"\"\n    samtools index -b ${bam}\n    gatk3 -Xmx${task.memory.toGiga()}g -T RealignerTargetCreator -R ${fasta} -I ${bam} -nt ${task.cpus} -o ${samplename}.intervals ${defaultbasequalities}\n    gatk3 -Xmx${task.memory.toGiga()}g -T IndelRealigner -R ${fasta} -I ${bam} -targetIntervals ${samplename}.intervals -o ${samplename}.realign.bam ${defaultbasequalities}\n    gatk3 -Xmx${task.memory.toGiga()}g -T UnifiedGenotyper -R ${fasta} -I ${samplename}.realign.bam -o ${samplename}.unifiedgenotyper.vcf -nt ${task.cpus} --genotype_likelihoods_model ${params.gatk_ug_genotype_model} -stand_call_conf ${params.gatk_call_conf} --sample_ploidy ${params.gatk_ploidy} -dcov ${params.gatk_downsample} --output_mode ${params.gatk_ug_out_mode} ${defaultbasequalities}\n    \n    $keep_realign\n    \n    bgzip -@ ${task.cpus} ${samplename}.unifiedgenotyper.vcf\n    \"\"\"\n  else if (params.gatk_dbsnp)\n    \"\"\"\n    samtools index ${bam}\n    gatk3 -Xmx${task.memory.toGiga()}g -T RealignerTargetCreator -R ${fasta} -I ${bam} -nt ${task.cpus} -o ${samplename}.intervals ${defaultbasequalities}\n    gatk3 -Xmx${task.memory.toGiga()}g -T IndelRealigner -R ${fasta} -I ${bam} -targetIntervals ${samplename}.intervals -o ${samplename}.realign.bam ${defaultbasequalities}\n    gatk3 -Xmx${task.memory.toGiga()}g -T UnifiedGenotyper -R ${fasta} -I ${samplename}.realign.bam -o ${samplename}.unifiedgenotyper.vcf -nt ${task.cpus} --dbsnp ${params.gatk_dbsnp} --genotype_likelihoods_model ${params.gatk_ug_genotype_model} -stand_call_conf ${params.gatk_call_conf} --sample_ploidy ${params.gatk_ploidy} -dcov ${params.gatk_downsample} --output_mode ${params.gatk_ug_out_mode} ${defaultbasequalities}\n    \n    $keep_realign\n    \n    bgzip -@  ${task.cpus} ${samplename}.unifiedgenotyper.vcf\n    \"\"\"\n}\n\n // HaplotypeCaller as 'best practise' tool for human DNA in particular 
\n\nprocess genotyping_hc {\n  label 'mc_small'\n  tag \"${samplename}\"\n  publishDir \"${params.outdir}/genotyping\", mode: params.publish_dir_mode\n\n  when:\n  params.run_genotyping && params.genotyping_tool == 'hc'\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(bam), file(bai) from ch_damagemanipulation_for_genotyping_hc\n  file fasta from ch_fasta_for_genotyping_hc.collect()\n  file fai from ch_fai_for_hc.collect()\n  file dict from ch_dict_for_hc.collect()\n\n  output: \n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*vcf.gz\") into ch_hc_for_bcftools_stats\n\n  script:\n  if (!params.gatk_dbsnp)\n    \"\"\"\n    gatk HaplotypeCaller --java-options \"-Xmx${task.memory.toGiga()}G\" -R ${fasta} -I ${bam} -O ${samplename}.haplotypecaller.vcf -stand-call-conf ${params.gatk_call_conf} --sample-ploidy ${params.gatk_ploidy} --output-mode ${params.gatk_hc_out_mode} --emit-ref-confidence ${params.gatk_hc_emitrefconf}\n    bgzip -@ ${task.cpus} ${samplename}.haplotypecaller.vcf\n    \"\"\"\n\n  else if (params.gatk_dbsnp)\n    \"\"\"\n    gatk HaplotypeCaller --java-options \"-Xmx${task.memory.toGiga()}G\" -R ${fasta} -I ${bam} -O ${samplename}.haplotypecaller.vcf --dbsnp ${params.gatk_dbsnp} -stand-call-conf ${params.gatk_call_conf} --sample-ploidy ${params.gatk_ploidy} --output-mode ${params.gatk_hc_out_mode} --emit-ref-confidence ${params.gatk_hc_emitrefconf}\n    bgzip -@  ${task.cpus} ${samplename}.haplotypecaller.vcf\n    \"\"\"\n}\n\n // Freebayes for 'more efficient/simple' and more generic genotyping (vs HC) \n\nprocess genotyping_freebayes {\n  label 'mc_small'\n  tag \"${samplename}\"\n  publishDir \"${params.outdir}/genotyping\", mode: params.publish_dir_mode\n\n  when:\n  params.run_genotyping && params.genotyping_tool == 'freebayes'\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(bam), file(bai) from 
ch_damagemanipulation_for_genotyping_freebayes\n  file fasta from ch_fasta_for_genotyping_freebayes.collect()\n  file fai from ch_fai_for_freebayes.collect()\n  file dict from ch_dict_for_freebayes.collect()\n\n  output: \n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*vcf.gz\") into ch_fb_for_bcftools_stats\n  \n  script:\n  def skip_coverage = params.freebayes_g.toString() == '0' ? \"\" : \"-g ${params.freebayes_g}\"\n  \"\"\"\n  freebayes -f ${fasta} -p ${params.freebayes_p} -C ${params.freebayes_C} ${skip_coverage} ${bam} > ${samplename}.freebayes.vcf\n  bgzip -@  ${task.cpus} ${samplename}.freebayes.vcf\n  \"\"\"\n}\n\n\n // Branch channel by strandedness\nch_damagemanipulation_for_genotyping_pileupcaller\n  .branch{\n      singleStranded: it[5] == \"single\"\n      doubleStranded: it[5] == \"double\"\n  }\n  .set{ch_input_for_genotyping_pileupcaller}\n\n // Create pileupcaller input tuples\nch_input_for_genotyping_pileupcaller.singleStranded\n  .groupTuple(by:[5])\n  .map{\n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def bam = it[7].flatten()\n        def bai = it[8].flatten()\n\n      [samplename, libraryid, lane, seqtype, organism, strandedness, udg, bam, bai ]\n  }\n  .set {ch_prepped_for_pileupcaller_single}\n\nch_input_for_genotyping_pileupcaller.doubleStranded\n  .groupTuple(by:[5])\n  .map{\n        def samplename = it[0]\n        def libraryid  = it[1]\n        def lane = it[2]\n        def seqtype = it[3]\n        def organism = it[4]\n        def strandedness = it[5]\n        def udg = it[6]\n        def bam = it[7].flatten()\n        def bai = it[8].flatten()\n\n      [samplename, libraryid, lane, seqtype, organism, strandedness, udg, bam, bai ]\n  }\n  .set {ch_prepped_for_pileupcaller_double}\n\nprocess genotyping_pileupcaller {\n  label 
'mc_small'\n  tag \"${strandedness}\"\n  publishDir \"${params.outdir}/genotyping\", mode: params.publish_dir_mode\n\n  when:\n  params.run_genotyping && params.genotyping_tool == 'pileupcaller'\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_prepped_for_pileupcaller_double.mix(ch_prepped_for_pileupcaller_single)\n  file fasta from ch_fasta_for_genotyping_pileupcaller.collect()\n  file fai from ch_fai_for_pileupcaller.collect()\n  file dict from ch_dict_for_pileupcaller.collect()\n  path(bed) from ch_bed_for_pileupcaller.collect()\n  path(snp) from ch_snp_for_pileupcaller.collect()\n\n  output:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"pileupcaller.${strandedness}.*\") into ch_for_eigenstrat_snp_coverage\n\n  script:\n  def use_bed = bed.getName() != 'nf-core_eager_dummy.txt' ? \"-l ${bed}\" : ''\n  def use_snp = snp.getName() != 'nf-core_eager_dummy2.txt' ? \"-f ${snp}\" : ''\n\n  def transitions_mode = strandedness == \"single\" ? \"\" : \"${params.pileupcaller_transitions_mode}\" == 'SkipTransitions' ? \"--skipTransitions\" : \"${params.pileupcaller_transitions_mode}\" == 'TransitionsMissing' ? \"--transitionsMissing\" : \"\"\n  def caller = \"--${params.pileupcaller_method}\"\n  def ssmode = strandedness == \"single\" ? 
\"--singleStrandMode\" : \"\"\n  def bam_list = bam.flatten().join(\" \")\n  def sample_names = samplename.flatten().join(\",\")\n  def map_q = params.pileupcaller_min_map_quality\n  def base_q = params.pileupcaller_min_base_quality\n\n  \"\"\"\n  samtools mpileup -B --ignore-RG -q ${map_q} -Q ${base_q} ${use_bed} -f ${fasta} ${bam_list} | pileupCaller ${caller} ${ssmode} ${transitions_mode} --sampleNames ${sample_names} ${use_snp} -e pileupcaller.${strandedness}\n  \"\"\"\n}\n\nprocess eigenstrat_snp_coverage {\n  label 'mc_tiny'\n  tag \"${strandedness}\"\n  publishDir \"${params.outdir}/genotyping\", mode: params.publish_dir_mode\n  \n  when:\n  params.run_genotyping && params.genotyping_tool == 'pileupcaller'\n  \n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*\") from ch_for_eigenstrat_snp_coverage\n  \n  output:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.json\") into ch_eigenstrat_snp_cov_for_multiqc\n  path(\"*_eigenstrat_coverage.txt\")\n  \n  script:\n  /* \n  The following code block can be swapped in once the eigenstratdatabasetools MultiQC module becomes available.\n  \"\"\"\n  eigenstrat_snp_coverage -i pileupcaller.${strandedness} >${strandedness}_eigenstrat_coverage.txt -j ${strandedness}_eigenstrat_coverage_mqc.json\n  \"\"\"\n  */\n  \"\"\"\n  eigenstrat_snp_coverage -i pileupcaller.${strandedness} >${strandedness}_eigenstrat_coverage.txt\n  parse_snp_cov.py ${strandedness}_eigenstrat_coverage.txt\n  \"\"\"\n}\n\nprocess genotyping_angsd {\n  label 'mc_small'\n  tag \"${samplename}\"\n  publishDir \"${params.outdir}/genotyping\", mode: params.publish_dir_mode\n\n  when:\n  params.run_genotyping && params.genotyping_tool == 'angsd'\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(bam), file(bai) from ch_damagemanipulation_for_genotyping_angsd\n  file fasta from ch_fasta_for_genotyping_angsd.collect()\n  file fai from 
ch_fai_for_angsd.collect()\n  file dict from ch_dict_for_angsd.collect()\n\n  output: \n  path(\"${samplename}*\")\n  \n  script:\n  switch ( \"${params.angsd_glmodel}\" ) {\n    case \"samtools\":\n    angsd_glmodel = \"1\"; break\n    case \"gatk\":\n    angsd_glmodel = \"2\"; break\n    case \"soapsnp\":\n    angsd_glmodel = \"3\"; break\n    case \"syk\":\n    angsd_glmodel = \"4\"; break\n  }\n\n  switch ( \"${params.angsd_glformat}\" ) {\n    case \"text\":\n    angsd_glformat = \"4\"; break\n    case \"binary\":\n    angsd_glformat = \"1\"; break\n    case \"beagle\":\n    angsd_glformat = \"2\"; break\n    case \"binary_three\":\n    angsd_glformat = \"3\"; break\n  }\n  \n  def angsd_fasta = !params.angsd_createfasta ? '' : params.angsd_fastamethod == 'random' ? '-doFasta 1 -doCounts 1' : '-doFasta 2 -doCounts 1' \n  def angsd_majorminor = params.angsd_glformat != \"beagle\" ? '' : '-doMajorMinor 1'\n  \"\"\"\n  echo ${bam} > bam.filelist\n  mkdir angsd\n  angsd -bam bam.filelist -nThreads ${task.cpus} -GL ${angsd_glmodel} -doGlF ${angsd_glformat} ${angsd_majorminor} ${angsd_fasta} -out ${samplename}.angsd\n  \"\"\"\n}\n\n////////////////////////////////////\n/* --    GENOTYPING STATS     -- */\n////////////////////////////////////\n\nprocess bcftools_stats {\n  label  'mc_small'\n  tag \"${samplename}\"\n  publishDir \"${params.outdir}/bcftools/stats\", mode: params.publish_dir_mode\n\n  when: \n  params.run_bcftools_stats\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(vcf) from ch_ug_for_bcftools_stats.mix(ch_hc_for_bcftools_stats,ch_fb_for_bcftools_stats)\n  file fasta from ch_fasta_for_bcftools_stats.collect()\n\n  output:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.vcf.stats\") into ch_bcftools_stats_for_multiqc\n\n  script:\n  \"\"\"\n  bcftools stats *.vcf.gz -F ${fasta} > ${samplename}.vcf.stats\n  \"\"\"\n}\n\n////////////////////////////////////\n/* --    
CONSENSUS CALLING     -- */\n////////////////////////////////////\n\n// Generate a simple consensus-called FASTA file based on genotype VCF\n\nprocess vcf2genome {\n  label  'mc_small'\n  tag \"${samplename}\"\n  publishDir \"${params.outdir}/consensus_sequence\", mode: params.publish_dir_mode\n\n  when: \n  params.run_vcf2genome\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(vcf) from ch_ug_for_vcf2genome\n  file fasta from ch_fasta_for_vcf2genome.collect()\n\n  output:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.fasta.gz\")\n\n  script:\n  def out = !params.vcf2genome_outfile ? \"${samplename}.fasta\" : \"${samplename}_${params.vcf2genome_outfile}.fasta\"\n  def fasta_head = !params.vcf2genome_header ? \"${samplename}\" : \"${params.vcf2genome_header}\"\n  \"\"\"\n  pigz -d -f -p ${task.cpus} ${vcf}\n  vcf2genome -Xmx${task.memory.toGiga()}g -draft ${out} -draftname \"${fasta_head}\" -in ${vcf.baseName} -minc ${params.vcf2genome_minc} -minfreq ${params.vcf2genome_minfreq} -minq ${params.vcf2genome_minq} -ref ${fasta} -refMod ${out}_refmod.fasta -uncertain ${out}_uncertainty.fasta\n  pigz -f -p ${task.cpus} ${out}*\n  bgzip -@ ${task.cpus} *.vcf\n  \"\"\"\n}\n\n// More complex consensus caller with additional filtering functionality (e.g. 
for heterozygous calls) to generate SNP tables and other things sometimes used in aDNA bacteria studies\n\n// Create input channel for MultiVCFAnalyzer, possibly mixing with pre-made VCFs.\nif (!params.additional_vcf_files) {\n    ch_vcfs_for_multivcfanalyzer = ch_ug_for_multivcfanalyzer.map{ it[-1] }.collect()\n} else {\n    ch_vcfs_for_multivcfanalyzer = ch_ug_for_multivcfanalyzer.map{ it[-1] }.mix(ch_extravcfs_for_multivcfanalyzer).collect()\n}\n\nprocess multivcfanalyzer {\n  label  'mc_small'\n  publishDir \"${params.outdir}/multivcfanalyzer\", mode: params.publish_dir_mode\n\n  when:\n  params.genotyping_tool == 'ug' && params.run_multivcfanalyzer && params.gatk_ploidy.toString() == '2'\n\n  input:\n  file vcf from ch_vcfs_for_multivcfanalyzer\n  file fasta from ch_fasta_for_multivcfanalyzer\n\n  output:\n  file('fullAlignment.fasta.gz')\n  file('info.txt.gz')\n  file('snpAlignment.fasta.gz')\n  file('snpAlignmentIncludingRefGenome.fasta.gz')\n  file('snpStatistics.tsv.gz')\n  file('snpTable.tsv.gz')\n  file('snpTableForSnpEff.tsv.gz')\n  file('snpTableWithUncertaintyCalls.tsv.gz')\n  file('structureGenotypes.tsv.gz')\n  file('structureGenotypes_noMissingData-Columns.tsv.gz')\n  file('MultiVCFAnalyzer.json') optional true into ch_multivcfanalyzer_for_multiqc\n\n  script:\n  def write_freqs = params.write_allele_frequencies ? \"T\" : \"F\"\n  \"\"\"\n  pigz -d -f -p ${task.cpus} ${vcf}\n  multivcfanalyzer -Xmx${task.memory.toGiga()}g ${params.snp_eff_results} ${fasta} ${params.reference_gff_annotations} . 
${write_freqs} ${params.min_genotype_quality} ${params.min_base_coverage} ${params.min_allele_freq_hom} ${params.min_allele_freq_het} ${params.reference_gff_exclude} *.vcf\n  pigz -p ${task.cpus} *.tsv *.txt snpAlignment.fasta snpAlignmentIncludingRefGenome.fasta fullAlignment.fasta\n  bgzip -@ ${task.cpus} *.vcf\n  \"\"\"\n }\n\n////////////////////////////////////////////////////////////\n/* --    HUMAN DNA SPECIFIC ADDITIONAL INFORMATION     -- */\n////////////////////////////////////////////////////////////\n\n// Mitochondrial to nuclear ratio helps to evaluate quality of tissue sampled\n\n process mtnucratio {\n  label 'sc_small'\n  tag \"${samplename}\"\n  publishDir \"${params.outdir}/mtnucratio\", mode: params.publish_dir_mode\n\n  when: \n  params.run_mtnucratio\n\n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_rmdup_formtnucratio\n\n  output:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.mtnucratio\")\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*.json\") into ch_mtnucratio_for_multiqc\n\n  script:\n  \"\"\"\n  mtnucratio -Xmx${task.memory.toGiga()}g ${bam} \"${params.mtnucratio_header}\"\n  \"\"\"\n }\n\n// Human biological sex estimation\n\n// rename to prevent single/double stranded library sample name-based file conflict\nprocess sexdeterrmine_prep {\n  label 'sc_small'\n  \n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_for_sexdeterrmine_prep\n  \n  output:\n  file \"*_{single,double}strand.bam\" into ch_prepped_for_sexdeterrmine\n\n  when:\n  params.run_sexdeterrmine\n\n  script:\n  \"\"\"\n  mv ${bam} ${bam.baseName}_${strandedness}strand.bam\n  \"\"\"\n\n}\n\n// As we collect all files for a single sex_deterrmine run, we DO NOT use the normal input/output tuple\nprocess sexdeterrmine {\n    label 'mc_small'\n    publishDir 
\"${params.outdir}/sex_determination\", mode: params.publish_dir_mode\n\n    input:\n    path bam from ch_prepped_for_sexdeterrmine.collect()\n    path(bed) from ch_bed_for_sexdeterrmine\n\n    output:\n    file \"SexDet.txt\"\n    file \"*.json\" into ch_sexdet_for_multiqc\n\n    when:\n    params.run_sexdeterrmine\n    \n    script:\n    def filter = bed.getName() != 'nf-core_eager_dummy.txt' ? \"-b $bed\" : ''\n    \"\"\"\n    ls *.bam >> bamlist.txt\n    samtools depth -aa -q30 -Q30 $filter -f bamlist.txt | sexdeterrmine -f bamlist.txt > SexDet.txt\n    \"\"\"\n}\n\n// Human DNA nuclear contamination estimation\n\n process nuclear_contamination{\n    label 'sc_small'\n    tag \"${samplename}\"\n    publishDir \"${params.outdir}/nuclear_contamination\", mode: params.publish_dir_mode\n\n    when:\n    params.run_nuclear_contamination\n\n    input:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(input), path(bai) from ch_for_nuclear_contamination\n\n    output:\n    tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path('*.X.contamination.out') into ch_from_nuclear_contamination\n\n    script:\n    \"\"\"\n    samtools index ${input}\n    angsd -i ${input} -r ${params.contamination_chrom_name}:5000000-154900000 -doCounts 1 -iCounts 1 -minMapQ 30 -minQ 30 -out ${libraryid}.doCounts\n    contamination -a ${libraryid}.doCounts.icnts.gz -h ${projectDir}/assets/angsd_resources/HapMapChrX.gz 2> ${libraryid}.X.contamination.out\n    \"\"\"\n }\n \n// As we collect all files for a single print_nuclear_contamination run, we DO NOT use the normal input/output tuple\nprocess print_nuclear_contamination{\n    label 'sc_tiny'\n    publishDir \"${params.outdir}/nuclear_contamination\", mode: params.publish_dir_mode\n\n    when:\n    params.run_nuclear_contamination\n\n    input:\n    path Contam from ch_from_nuclear_contamination.map { it[7] }.collect()\n\n    output:\n    file 'nuclear_contamination.txt'\n    file 
'nuclear_contamination_mqc.json' into ch_nuclear_contamination_for_multiqc\n\n    script:\n    \"\"\"\n    print_x_contamination.py ${Contam.join(' ')}\n    \"\"\"\n }\n\n/////////////////////////////////////////////////////////\n/* --    METAGENOMICS-SPECIFIC ADDITIONAL STEPS     -- */\n/////////////////////////////////////////////////////////\n\n// Low entropy read filter to reduce input sequences of reads that are highly uninformative, and thus reduce runtime/false positives\n\nprocess metagenomic_complexity_filter {\n  label 'mc_small'\n  tag \"${samplename}\"\n  publishDir \"${params.outdir}/metagenomic_complexity_filter/\", mode: params.publish_dir_mode\n\n  when:\n  params.metagenomic_complexity_filter\n  \n  input:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(fastq) from ch_bam_filtering_for_metagenomic\n\n\n  output:\n  tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(\"*_lowcomplexityremoved.fq.gz\") into ch_lowcomplexityfiltered_for_metagenomic\n  path(\"*_bbduk.stats\") into ch_metagenomic_complexity_filter_for_multiqc\n\n  script:\n  \"\"\"\n  bbduk.sh -Xmx${task.memory.toGiga()}g in=${fastq} threads=${task.cpus} entropymask=f entropy=${params.metagenomic_complexity_entropy} out=${fastq}_lowcomplexityremoved.fq.gz 2> ${fastq}_bbduk.stats\n  \"\"\"\n\n}\n\n// metagenomic complexity filter bypass\n\nif ( params.metagenomic_complexity_filter ) {\n  ch_lowcomplexityfiltered_for_metagenomic\n    .set{ ch_filtered_for_metagenomic }\n} else {\n  ch_metagenomic_for_skipentropyfilter\n    .set{ ch_filtered_for_metagenomic }\n}\n\n// MALT is a super-fast BLAST replacement typically used for pathogen detection or microbiome profiling against large databases, here using off-target reads from mapping\n\n// As we collect all files for all metagenomic runs, we DO NOT use the normal input/output tuple!\nif (params.metagenomic_tool == 'malt') {\n  ch_filtered_for_metagenomic\n    .set 
{ch_input_for_metagenomic_malt}\n\n  ch_input_for_metagenomic_kraken = Channel.empty()\n} else if (params.metagenomic_tool == 'kraken') {\n  ch_filtered_for_metagenomic\n    .set {ch_input_for_metagenomic_kraken}\n\n  ch_input_for_metagenomic_malt = Channel.empty()\n} else if ( !params.metagenomic_tool ) {\n  ch_input_for_metagenomic_malt = Channel.empty()\n  ch_input_for_metagenomic_kraken = Channel.empty()\n\n}\n\n// As we collect all files for a single MALT run, we DO NOT use the normal input/output tuple\nprocess malt {\n  label 'mc_small'\n  publishDir \"${params.outdir}/metagenomic_classification/malt\", mode: params.publish_dir_mode\n\n  when:\n  params.run_metagenomic_screening && params.run_bam_filtering && params.bam_unmapped_type == 'fastq' && params.metagenomic_tool == 'malt'\n\n  input:\n  file fastqs from ch_input_for_metagenomic_malt.map { it[7] }.collect()\n  file db from ch_db_for_malt\n\n  output:\n  path(\"*.rma6\") into ch_rma_for_maltExtract\n  path(\"*.sam.gz\") optional true\n  path(\"malt.log\") into ch_malt_for_multiqc\n\n  script:\n  if ( \"${params.malt_min_support_mode}\" == \"percent\" ) {\n    min_supp = \"-supp ${params.malt_min_support_percent}\" \n  } else if ( \"${params.malt_min_support_mode}\" == \"reads\" ) {\n    min_supp = \"-sup ${params.metagenomic_min_support_reads}\"\n  }\n  def sam_out = params.malt_sam_output ? \"-a . -f SAM\" : \"\"\n  \"\"\"\n  malt-run \\\n  -J-Xmx${task.memory.toGiga()}g \\\n  -t ${task.cpus} \\\n  -v \\\n  -o . 
\\\n  -d ${db} \\\n  ${sam_out} \\\n  -id ${params.percent_identity} \\\n  -m ${params.malt_mode} \\\n  -at ${params.malt_alignment_mode} \\\n  -top ${params.malt_top_percent} \\\n  ${min_supp} \\\n  -mq ${params.malt_max_queries} \\\n  --memoryMode ${params.malt_memory_mode} \\\n  -i ${fastqs.join(' ')} |&tee malt.log\n  \"\"\"\n}\n\n// MaltExtract performs aDNA evaluation from the output of MALT (damage patterns, read lengths etc.)\n\n// As we collect all files for a single MALT extract run, we DO NOT use the normal input/output tuple\nprocess maltextract {\n  label 'mc_medium'\n  publishDir \"${params.outdir}/maltextract/\", mode: params.publish_dir_mode\n\n  when: \n  params.run_maltextract && params.metagenomic_tool == 'malt'\n\n  input:\n  file rma6 from ch_rma_for_maltExtract.collect()\n  file taxon_list from ch_taxonlist_for_maltextract\n  file ncbifiles from ch_ncbifiles_for_maltextract\n  \n  output:\n  path \"results/\" type('dir')\n  file \"results/*_Wevid.json\" optional true into ch_hops_for_multiqc \n\n  script:\n  def destack = params.maltextract_destackingoff ? \"--destackingOff\" : \"\"\n  def downsam = params.maltextract_downsamplingoff ? \"--downSampOff\" : \"\"\n  def dupremo = params.maltextract_duplicateremovaloff ? \"--dupRemOff\" : \"\"\n  def matches = params.maltextract_matches ? \"--matches\" : \"\"\n  def megsum = params.maltextract_megansummary ? \"--meganSummary\" : \"\"\n  def topaln = params.maltextract_topalignment ?  \"--useTopAlignment\" : \"\"\n  def ss = params.single_stranded ? 
\"--singleStranded\" : \"\"\n  \"\"\"\n  MaltExtract \\\n  -Xmx${task.memory.toGiga()}g \\\n  -t ${taxon_list} \\\n  -i ${rma6.join(' ')} \\\n  -o results/ \\\n  -r ${ncbifiles} \\\n  -p ${task.cpus} \\\n  -f ${params.maltextract_filter} \\\n  -a ${params.maltextract_toppercent} \\\n  --minPI ${params.maltextract_percentidentity} \\\n  ${destack} \\\n  ${downsam} \\\n  ${dupremo} \\\n  ${matches} \\\n  ${megsum} \\\n  ${topaln} \\\n  ${ss}\n\n  postprocessing.AMPS.r -r results/ -m ${params.maltextract_filter} -t ${task.cpus} -n ${taxon_list} -j\n  \"\"\"\n}\n\n// Kraken is offered as a replacement for MALT as MALT is _very_ resource hungry\n\nif (params.run_metagenomic_screening && params.database.endsWith(\".tar.gz\") && params.metagenomic_tool == 'kraken'){\n  comp_kraken = file(params.database)\n\n  process decomp_kraken {\n    input:\n    path(ckdb) from comp_kraken\n    \n    output:\n    path(dbname) into ch_krakendb\n    \n    script:\n    dbname = ckdb.toString() - '.tar.gz'\n    \"\"\"\n    tar xvzf $ckdb\n    mkdir -p $dbname\n    mv *.k2d $dbname || echo \"nothing to do\"\n    \"\"\"\n  }\n\n} else if (params.database && ! 
params.database.endsWith(\".tar.gz\") && params.run_metagenomic_screening && params.metagenomic_tool == 'kraken') {\n    ch_krakendb = Channel.fromPath(params.database).first()\n} else {\n    ch_krakendb = Channel.empty()\n}\n\nprocess kraken {\n  tag \"$prefix\"\n  label 'mc_huge'\n  publishDir \"${params.outdir}/metagenomic_classification/kraken\", mode: params.publish_dir_mode\n\n  when:\n  params.run_metagenomic_screening && params.run_bam_filtering && params.bam_unmapped_type == 'fastq' && params.metagenomic_tool == 'kraken'\n\n  input:\n  path(fastq) from ch_input_for_metagenomic_kraken.map { it[7] }\n  path(krakendb) from ch_krakendb\n\n  output:\n  file \"*.kraken.out\" optional true into ch_kraken_out\n  tuple prefix, path(\"*.kraken2_report\") optional true into ch_kraken_report, ch_kraken_for_multiqc\n\n  script:\n  prefix = fastq.baseName\n  out = prefix+\".kraken.out\"\n  kreport = prefix+\".kraken2_report\"\n  kreport_old = prefix+\".kreport\"\n\n  \"\"\"\n  kraken2 --db ${krakendb} --threads ${task.cpus} --output $out --report-minimizer-data --report $kreport $fastq\n  cut -f1-3,6-8 $kreport > $kreport_old\n  \"\"\"\n}\n\nprocess kraken_parse {\n  tag \"$name\"\n  errorStrategy 'ignore'\n\n  input:\n  tuple val(name), path(kraken_r) from ch_kraken_report\n\n  output:\n  path('*_kraken_parsed.csv') into ch_kraken_parsed\n\n  script:\n  read_out = name+\".read_kraken_parsed.csv\"\n  kmer_out =  name+\".kmer_kraken_parsed.csv\"\n  \"\"\"\n  kraken_parse.py -c ${params.metagenomic_min_support_reads} -or $read_out -ok $kmer_out $kraken_r\n  \"\"\"    \n}\n\nprocess kraken_merge {\n  publishDir \"${params.outdir}/metagenomic_classification/kraken\", mode: params.publish_dir_mode\n\n  input:\n  file csv_count from ch_kraken_parsed.collect()\n\n  output:\n  path('*.csv')\n\n  script:\n  read_out = \"kraken_read_count.csv\"\n  kmer_out = \"kraken_kmer_duplication.csv\"\n  \"\"\"\n  merge_kraken_res.py -or $read_out -ok $kmer_out\n  \"\"\"    
\n}\n\n//////////////////////////////////////\n/* --    PIPELINE COMPLETION     -- */\n//////////////////////////////////////\n\n// Pipeline documentation for on-server guidance\n\nprocess output_documentation {\n    label 'sc_tiny'\n    publishDir \"${params.outdir}/documentation\", mode: params.publish_dir_mode\n\n    input:\n    file output_docs from ch_output_docs\n    file images from ch_output_docs_images\n\n    output:\n    file \"results_description.html\"\n\n    script:\n    \"\"\"\n    markdown_to_html.py $output_docs -o results_description.html\n    \"\"\"\n}\n\n/*\n * Parse software version numbers\n */\n\nprocess get_software_versions {\n  label 'mc_small'\n    publishDir \"${params.outdir}/pipeline_info\", mode: params.publish_dir_mode,\n        saveAs: { filename ->\n                      if (filename.indexOf(\".csv\") > 0) filename\n                      else null\n                }\n\n    output:\n    file 'software_versions_mqc.yaml' into software_versions_yaml\n    file \"software_versions.csv\"\n\n    script:\n    \"\"\"\n    echo $workflow.manifest.version &> v_pipeline.txt\n    echo $workflow.nextflow.version &> v_nextflow.txt\n    \n    fastqc -t ${task.cpus} --version &> v_fastqc.txt 2>&1 || true\n    AdapterRemoval --version  &> v_adapterremoval.txt 2>&1 || true\n    fastp --version &> v_fastp.txt 2>&1 || true\n    bwa &> v_bwa.txt 2>&1 || true\n    circulargenerator -Xmx${task.memory.toGiga()}g --help | head -n 1 &> v_circulargenerator.txt 2>&1 || true\n    samtools --version &> v_samtools.txt 2>&1 || true\n    dedup -Xmx${task.memory.toGiga()}g -v &> v_dedup.txt 2>&1 || true\n    ## bioconda recipe of picard is incorrectly set up and extra warning made with stderr, this ugly command ensures only version exported\n    ( exec 7>&1; picard -Xmx${task.memory.toMega()}M MarkDuplicates --version 2>&1 >&7 | grep -v '/' >&2 ) 2> v_markduplicates.txt || true\n    qualimap --version --java-mem-size=${task.memory.toGiga()}G &> v_qualimap.txt 2>&1 || 
true\n    preseq &> v_preseq.txt 2>&1 || true\n    gatk --java-options \"-Xmx${task.memory.toGiga()}G\" --version 2>&1 | grep '(GATK)' > v_gatk.txt 2>&1 || true\n    gatk3 -Xmx${task.memory.toGiga()}g  --version 2>&1 | head -n 1 > v_gatk3.txt 2>&1 || true\n    freebayes --version &> v_freebayes.txt 2>&1 || true\n    bedtools --version &> v_bedtools.txt 2>&1 || true\n    damageprofiler -Xmx${task.memory.toGiga()}g --version &> v_damageprofiler.txt 2>&1 || true\n    bam --version &> v_bamutil.txt 2>&1 || true\n    pmdtools --version &> v_pmdtools.txt 2>&1 || true\n    angsd -h |& head -n 1 | cut -d ' ' -f3-4 &> v_angsd.txt 2>&1 || true \n    multivcfanalyzer -Xmx${task.memory.toGiga()}g --help | head -n 1 &> v_multivcfanalyzer.txt 2>&1 || true\n    malt-run -J-Xmx${task.memory.toGiga()}g --help |& tail -n 3 | head -n 1 | cut -f 2 -d'(' | cut -f 1 -d ',' &> v_malt.txt 2>&1 || true\n    MaltExtract -Xmx${task.memory.toGiga()}g --help | head -n 2 | tail -n 1 &> v_maltextract.txt 2>&1 || true\n    multiqc --version &> v_multiqc.txt 2>&1 || true\n    vcf2genome -Xmx${task.memory.toGiga()}g -h |& head -n 1 &> v_vcf2genome.txt || true\n    mtnucratio -Xmx${task.memory.toGiga()}g --help &> v_mtnucratiocalculator.txt || true\n    sexdeterrmine --version &> v_sexdeterrmine.txt || true\n    kraken2 --version | head -n 1 &> v_kraken.txt || true\n    endorS.py --version &> v_endorSpy.txt || true\n    pileupCaller --version &> v_sequencetools.txt 2>&1 || true\n    bowtie2 --version | grep -a 'bowtie2-.* -fdebug' > v_bowtie2.txt || true\n    eigenstrat_snp_coverage --version | cut -d ' ' -f2 >v_eigenstrat_snp_coverage.txt || true\n    mapDamage --version > v_mapdamage.txt || true\n    bbversion.sh > v_bbduk.txt || true\n    bcftools --version | grep 'bcftools' | cut -d ' ' -f 2 > v_bcftools.txt || true\n    scrape_software_versions.py &> software_versions_mqc.yaml\n    \"\"\"\n}\n\n// MultiQC file generation for pipeline report\n//def workflow_summary = 
NfcoreSchema.params_summary_multiqc(workflow, summary_params)\n\n//ch_workflow_summary = Channel.value(workflow_summary)\n\nprocess multiqc {\n    label 'sc_medium'\n\n    publishDir \"${params.outdir}/multiqc\", mode: params.publish_dir_mode\n\n    input:\n    file multiqc_config from ch_multiqc_config\n    file (mqc_custom_config) from ch_multiqc_custom_config.collect().ifEmpty([])\n    file software_versions_mqc from software_versions_yaml.collect().ifEmpty([])\n    file logo from ch_eager_logo\n    file ('fastqc_raw/*') from ch_prefastqc_for_multiqc.collect().ifEmpty([])\n    path('fastqc/*') from ch_fastqc_after_clipping.collect().ifEmpty([])\n    file ('adapter_removal/*') from ch_adapterremoval_logs.collect().ifEmpty([])\n    file ('mapping/bt2/*') from ch_bt2_for_multiqc.collect().ifEmpty([])\n    file ('flagstat/*') from ch_flagstat_for_multiqc.collect().ifEmpty([])\n    file ('flagstat_filtered/*') from ch_bam_filtered_flagstat_for_multiqc.collect().ifEmpty([])\n    file ('preseq/*') from ch_preseq_for_multiqc.collect().ifEmpty([])\n    file ('damageprofiler/dmgprof*/*') from ch_damageprofiler_results.collect().ifEmpty([])\n    file ('mapdamage/*') from ch_mapdamage_for_multiqc.collect().ifEmpty([])\n    file ('qualimap/qualimap*/*') from ch_qualimap_results.collect().ifEmpty([])\n    file ('markdup/*') from ch_markdup_results_for_multiqc.collect().ifEmpty([])\n    file ('dedup*/*') from ch_dedup_results_for_multiqc.collect().ifEmpty([])\n    file ('fastp/*') from ch_fastp_for_multiqc.collect().ifEmpty([])\n    file ('sexdeterrmine/*') from ch_sexdet_for_multiqc.collect().ifEmpty([])\n    file ('mutnucratio/*') from ch_mtnucratio_for_multiqc.collect().ifEmpty([])\n    file ('endorspy/*') from ch_endorspy_for_multiqc.collect().ifEmpty([])\n    file ('multivcfanalyzer/*') from ch_multivcfanalyzer_for_multiqc.collect().ifEmpty([])\n    file ('fastp_lowcomplexityfilter/*') from ch_metagenomic_complexity_filter_for_multiqc.collect().ifEmpty([])\n    file 
('malt/*') from ch_malt_for_multiqc.collect().ifEmpty([])\n    file ('kraken/*') from ch_kraken_for_multiqc.collect().ifEmpty([])\n    file ('hops/*') from ch_hops_for_multiqc.collect().ifEmpty([])\n    file ('nuclear_contamination/*') from ch_nuclear_contamination_for_multiqc.collect().ifEmpty([])\n    file ('genotyping/*') from ch_eigenstrat_snp_cov_for_multiqc.collect().ifEmpty([])\n    file ('bcftools_stats') from ch_bcftools_stats_for_multiqc.collect().ifEmpty([])\n    file workflow_summary from ch_workflow_summary.collectFile(name: \"workflow_summary_mqc.yaml\")\n\n    output:\n    file \"*multiqc_report.html\" into ch_multiqc_report\n    file \"*_data\"\n\n    script:\n    rtitle = ''\n    rfilename = ''\n    if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) {\n        rtitle = \"--title \\\"${workflow.runName}\\\"\"\n        rfilename = \"--filename \" + workflow.runName.replaceAll('\\\\W','_').replaceAll('_+','_') + \"_multiqc_report\"\n    }\n    \n    def custom_config_file = params.multiqc_config ? 
\"--config $mqc_custom_config\" : ''\n    \"\"\"\n    multiqc -f $rtitle $rfilename $multiqc_config $custom_config_file .\n    \"\"\"\n}\n\n// Send completion emails if requested, so user knows data is ready\n\nworkflow.onComplete {\n\n    // Set up the e-mail variables\n    def subject = \"[nf-core/eager] Successful: $workflow.runName\"\n    if (!workflow.success) {\n        subject = \"[nf-core/eager] FAILED: $workflow.runName\"\n    }\n    def email_fields = [:]\n    email_fields['version'] = workflow.manifest.version\n    email_fields['runName'] = workflow.runName\n    email_fields['success'] = workflow.success\n    email_fields['dateComplete'] = workflow.complete\n    email_fields['duration'] = workflow.duration\n    email_fields['exitStatus'] = workflow.exitStatus\n    email_fields['errorMessage'] = (workflow.errorMessage ?: 'None')\n    email_fields['errorReport'] = (workflow.errorReport ?: 'None')\n    email_fields['commandLine'] = workflow.commandLine\n    email_fields['projectDir'] = workflow.projectDir\n    email_fields['summary'] = summary\n    email_fields['summary']['Date Started'] = workflow.start\n    email_fields['summary']['Date Completed'] = workflow.complete\n    email_fields['summary']['Pipeline script file path'] = workflow.scriptFile\n    email_fields['summary']['Pipeline script hash ID'] = workflow.scriptId\n    if (workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository\n    if (workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId\n    if (workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision\n    email_fields['summary']['Nextflow Version'] = workflow.nextflow.version\n    email_fields['summary']['Nextflow Build'] = workflow.nextflow.build\n    email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp\n\n    // On success try attach the multiqc report\n    def mqc_report = null\n    try {\n      
  if (workflow.success) {\n            mqc_report = ch_multiqc_report.getVal()\n            if (mqc_report.getClass() == ArrayList) {\n                log.warn \"[nf-core/eager] Found multiple reports from process 'multiqc', will use only one\"\n                mqc_report = mqc_report[0]\n            }\n        }\n    } catch (all) {\n        log.warn \"[nf-core/eager] Could not attach MultiQC report to summary email\"\n    }\n\n    // Check if we are only sending emails on failure\n    email_address = params.email\n    if (!params.email && params.email_on_fail && !workflow.success) {\n        email_address = params.email_on_fail\n    }\n\n    // Render the TXT template\n    def engine = new groovy.text.GStringTemplateEngine()\n    def tf = new File(\"$projectDir/assets/email_template.txt\")\n    def txt_template = engine.createTemplate(tf).make(email_fields)\n    def email_txt = txt_template.toString()\n\n    // Render the HTML template\n    def hf = new File(\"$projectDir/assets/email_template.html\")\n    def html_template = engine.createTemplate(hf).make(email_fields)\n    def email_html = html_template.toString()\n\n    // Render the sendmail template\n    def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: \"$projectDir\", mqcFile: mqc_report, mqcMaxSize: params.max_multiqc_email_size.toBytes() ]\n    def sf = new File(\"$projectDir/assets/sendmail_template.txt\")\n    def sendmail_template = engine.createTemplate(sf).make(smail_fields)\n    def sendmail_html = sendmail_template.toString()\n\n    // Send the HTML e-mail\n    if (email_address) {\n        try {\n            if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') }\n            // Try to send HTML e-mail using sendmail\n            [ 'sendmail', '-t' ].execute() << sendmail_html\n            log.info \"[nf-core/eager] Sent summary e-mail to $email_address (sendmail)\"\n        } catch (all) {\n     
       // Catch failures and try with plaintext\n            def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ]\n            // Only attach the MultiQC report if one was found and it is small enough\n            if ( mqc_report != null && mqc_report.size() <= params.max_multiqc_email_size.toBytes() ) {\n              mail_cmd += [ '-A', mqc_report ]\n            }\n            mail_cmd.execute() << email_html\n            log.info \"[nf-core/eager] Sent summary e-mail to $email_address (mail)\"\n        }\n    }\n\n    // Write summary e-mail HTML to a file\n    def output_d = new File(\"${params.outdir}/pipeline_info/\")\n    if (!output_d.exists()) {\n        output_d.mkdirs()\n    }\n    def output_hf = new File(output_d, \"pipeline_report.html\")\n    output_hf.withWriter { w -> w << email_html }\n    def output_tf = new File(output_d, \"pipeline_report.txt\")\n    output_tf.withWriter { w -> w << email_txt }\n\n    c_green = params.monochrome_logs ? '' : \"\\033[0;32m\";\n    c_purple = params.monochrome_logs ? '' : \"\\033[0;35m\";\n    c_red = params.monochrome_logs ? '' : \"\\033[0;31m\";\n    c_reset = params.monochrome_logs ? 
'' : \"\\033[0m\";\n\n    if (workflow.stats.ignoredCount > 0 && workflow.success) {\n        log.info \"-${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}-\"\n        log.info \"-${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}-\"\n        log.info \"-${c_green}Number of successfully run process(es) : ${workflow.stats.succeedCount} ${c_reset}-\"\n    }\n\n    if (workflow.success) {\n        log.info \"-${c_purple}[nf-core/eager]${c_green} Pipeline completed successfully${c_reset}-\"\n        log.info \"-${c_purple}[nf-core/eager]${c_green} MultiQC run report can be found in ${params.outdir}/multiqc ${c_reset}-\"\n        log.info \"-${c_purple}[nf-core/eager]${c_green} Further output documentation can be seen at https://nf-co.re/eager/output ${c_reset}-\"\n    } else {\n        checkHostname()\n        log.info \"-${c_purple}[nf-core/eager]${c_red} Pipeline completed with errors${c_reset}-\"\n    }\n\n}\n\nworkflow.onError {\n    // Print unexpected parameters - easiest is to just rerun validation\n    NfcoreSchema.validateParameters(params, json_schema, log)\n}\n\n\n/////////////////////////////////////\n/* --    AUXILIARY FUNCTIONS    -- */\n/////////////////////////////////////\n\n// Channelling the TSV file containing FASTQ or BAM \ndef extract_data(tsvFile) {\n    Channel.fromPath(tsvFile)\n        .splitCsv(header: true, sep: '\\t')\n        .map { row ->\n\n            def expected_keys = ['Sample_Name', 'Library_ID', 'Lane', 'Colour_Chemistry', 'SeqType', 'Organism', 'Strandedness', 'UDG_Treatment', 'R1', 'R2', 'BAM']\n            if ( !row.keySet().containsAll(expected_keys) ) exit 1, \"[nf-core/eager] error: Invalid TSV input - malformed column names. Please check input TSV. 
Column names should be: ${expected_keys.join(\", \")}\"\n\n            checkNumberOfItem(row, 11)\n\n            if ( row.Sample_Name == null || row.Sample_Name.isEmpty() ) exit 1, \"[nf-core/eager] error: the Sample_Name column is empty. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n            if ( row.Library_ID == null || row.Library_ID.isEmpty() ) exit 1, \"[nf-core/eager] error: the Library_ID column is empty. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n            if ( row.Lane == null || row.Lane.isEmpty() ) exit 1, \"[nf-core/eager] error: the Lane column is empty. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n            if ( row.Colour_Chemistry == null || row.Colour_Chemistry.isEmpty() ) exit 1, \"[nf-core/eager] error: the Colour_Chemistry column is empty. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n            if ( row.SeqType == null || row.SeqType.isEmpty() ) exit 1, \"[nf-core/eager] error: the SeqType column is empty. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n            if ( row.Organism == null || row.Organism.isEmpty() ) exit 1, \"[nf-core/eager] error: the Organism column is empty. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n            if ( row.Strandedness == null || row.Strandedness.isEmpty() ) exit 1, \"[nf-core/eager] error: the Strandedness column is empty. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n            if ( row.UDG_Treatment == null || row.UDG_Treatment.isEmpty() ) exit 1, \"[nf-core/eager] error: the UDG_Treatment column is empty. Ensure all cells are filled or contain 'NA' for optional fields. 
Check row:\\n ${row}\"\n            if ( row.R1 == null || row.R1.isEmpty() ) exit 1, \"[nf-core/eager] error: the R1 column is empty. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n            if ( row.R2 == null || row.R2.isEmpty() ) exit 1, \"[nf-core/eager] error: the R2 column is empty. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n            if ( row.BAM == null || row.BAM.isEmpty() ) exit 1, \"[nf-core/eager] error: the BAM column is empty. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n\n            def samplename = row.Sample_Name\n            def libraryid  = row.Library_ID\n            def lane = row.Lane\n            def colour = row.Colour_Chemistry\n            def seqtype = row.SeqType\n            def organism = row.Organism\n            def strandedness = row.Strandedness\n            def udg = row.UDG_Treatment\n            def r1 = row.R1.matches('NA') ? 'NA' : return_file(row.R1)\n            def r2 = row.R2.matches('NA') ? 'NA' : return_file(row.R2)\n            def bam = row.BAM.matches('NA') ? 'NA' : return_file(row.BAM)\n\n            // check no empty metadata fields\n            if (samplename == '' || libraryid == '' || lane == '' || colour == '' || seqtype == '' || organism == '' || strandedness == '' || udg == '' || r1 == '' || r2 == '' || bam == '') exit 1, \"[nf-core/eager] error: a field/column does not contain any information. Ensure all cells are filled or contain 'NA' for optional fields. Check row:\\n ${row}\"\n\n            // Check no 'empty' rows\n            if (r1.matches('NA') && r2.matches('NA') && bam.matches('NA')) exit 1, \"[nf-core/eager] error: A row in your TSV appears to have all files defined as NA. See '--help' flag and documentation under 'running the pipeline' for more information. 
Check row for: ${samplename}\"\n\n            // Ensure BAMs aren't submitted with PE\n            if (!bam.matches('NA') && seqtype.matches('PE')) exit 1, \"[nf-core/eager] error: BAM input rows in TSV cannot be set as PE, only SE. See '--help' flag and documentation under 'running the pipeline' for more information. Check row for: ${samplename}\"\n\n            // Check valid UDG treatment\n            if (!udg.matches('none') && !udg.matches('half') && !udg.matches('full')) exit 1, \"[nf-core/eager] error: UDG treatment can only be 'none', 'half' or 'full'. See '--help' flag and documentation under 'running the pipeline' for more information. You have '${udg}'\"\n\n            // Check valid colour chemistry\n            if (!colour.matches('2') && !colour.matches('4')) exit 1, \"[nf-core/eager] error: Colour chemistry in TSV can either be 2 (e.g. NextSeq/NovaSeq) or 4 (e.g. HiSeq/MiSeq)\"\n\n            //  Ensure that we do not accept incompatible chemistry setup\n            if (!seqtype.matches('PE') && !seqtype.matches('SE')) exit 1, \"[nf-core/eager] error:  SeqType for one or more rows in TSV is neither SE nor PE! see '--help' flag and documentation under 'running the pipeline' for more information. You have: '${seqtype}'\"\n            \n           // So we don't accept existing files that are wrong format: e.g. fasta or sam\n            if ( !r1.matches('NA') && !has_extension(r1, \"fastq.gz\") && !has_extension(r1, \"fq.gz\") && !has_extension(r1, \"fastq\") && !has_extension(r1, \"fq\")) exit 1, \"[nf-core/eager] error: A specified R1 file either has a non-recognizable FASTQ extension or is not NA. See '--help' flag and documentation under 'running the pipeline' for more information. 
Check: ${r1}\"\n            if ( !r2.matches('NA') && !has_extension(r2, \"fastq.gz\") && !has_extension(r2, \"fq.gz\") && !has_extension(r2, \"fastq\") && !has_extension(r2, \"fq\")) exit 1, \"[nf-core/eager] error: A specified R2 file either has a non-recognizable FASTQ extension or is not NA. See '--help' flag and documentation under 'running the pipeline' for more information. Check: ${r2}\"\n            if ( !bam.matches('NA') && !has_extension(bam, \"bam\")) exit 1, \"[nf-core/eager] error: A specified BAM file either has a non-recognizable BAM extension or is not NA. See '--help' flag and documentation under 'running the pipeline' for more information. Check: ${bam}\"\n            \n            [ samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, r1, r2, bam ]\n\n        }\n\n    }\n\n// Check if a row has the expected number of items\ndef checkNumberOfItem(row, number) {\n    if (row.size() != number) exit 1, \"[nf-core/eager] error: Invalid TSV input - malformed row (e.g. missing column) in ${row}, see '--help' flag and documentation under 'running the pipeline' for more information\"\n    return true\n}\n\n// Return file if it exists\ndef return_file(it) {\n    if (!file(it).exists()) exit 1, \"[nf-core/eager] error: Cannot find supplied FASTQ or BAM input file. If using TSV input, set to NA if no file is required. See '--help' flag and documentation under 'running the pipeline' for more information. 
Check file: ${it}\" \n    return file(it)\n}\n\n// Check file extension\ndef has_extension(it, extension) {\n    it.toString().toLowerCase().endsWith(extension.toLowerCase())\n}\n\n// Extract FastQs from Path\n// Create a channel of FASTQs from a directory pattern: \"my_samples/*/\"\n// All FASTQ files in subdirectories are collected and emitted;\n// they must have _R1_ and/or _R2_ in their names.\ndef retrieve_input_paths(input, colour_chem, pe_se, ds_ss, udg_treat, bam_in) {\n\n  if ( !bam_in ) {\n        if( pe_se ) {\n            log.info \"Generating single-end FASTQ data TSV\"\n            Channel\n                .fromFilePairs( input, size: 1 )\n                .filter { it =~/.*.fastq.gz|.*.fq.gz|.*.fastq|.*.fq/ }\n                .ifEmpty { exit 1, \"[nf-core/eager] error:  Your specified FASTQ read files did not end in: '.fastq.gz', '.fq.gz', '.fastq', or '.fq'. Did you forget --bam?\" }\n                .map { row -> [ row[0], [ row[1][0] ] ] }\n                .ifEmpty { exit 1, \"[nf-core/eager] error:  --input was empty - no input files supplied!\" }\n                .into { ch_reads_for_faketsv; ch_reads_for_validate }\n\n                // Check we don't have any duplicated sample names due to fromFilePairs behaviour of calculating sample name from anything before R1/R2 glob\n                ch_reads_for_validate\n                  .groupTuple()\n                  .map{\n                    if ( validate_size(it[1], 1) ) { null } else { exit 1, \"[nf-core/eager] error: You have supplied non-unique sample names (text before R1/R2 indication). Did you accidentally supply paired-end data?  see '--help' flag and documentation under 'running the pipeline' for more information. 
Check duplicates of: ${it[0]}\" } \n                  }\n\n        } else if (!pe_se ){\n            log.info \"Generating paired-end FASTQ data TSV\"\n\n            Channel\n                .fromFilePairs( input )\n                .filter { it =~/.*.fastq.gz|.*.fq.gz|.*.fastq|.*.fq/ }\n                .ifEmpty { exit 1, \"[nf-core/eager] error: Files could not be found. Do the specified FASTQ read files end in: '.fastq.gz', '.fq.gz', '.fastq', or '.fq'? Did you forget --single_end?\" }\n                .map { row -> [ row[0], [ row[1][0], row[1][1] ] ] }\n                .ifEmpty { exit 1, \"[nf-core/eager] error: --input was empty - no input files supplied!\" }\n                .into { ch_reads_for_faketsv; ch_reads_for_validate }\n\n                // Check we don't have any duplicated sample names due to fromFilePairs behaviour of calculating sample name from anything before R1/R2 glob\n                ch_reads_for_validate\n                  .groupTuple()\n                  .map{\n                    if ( validate_size(it[1], 1) ) { null } else { exit 1, \"[nf-core/eager] error: You have supplied non-unique sample names (text before R1/R2 indication). See '--help' flag and documentation under 'running the pipeline' for more information. Check duplicates of: ${it[0]}\" } \n                  }\n\n        } \n\n    } else if ( bam_in ) {\n              log.info \"Generating BAM data TSV\"\n\n         Channel\n            .fromFilePairs( input, size: 1 )\n            .filter { it =~/.*.bam/ }\n            .map { row -> [ row[0], [ row[1][0] ] ] }\n            .ifEmpty { exit 1, \"[nf-core/eager] error: Cannot find any bam file matching: ${input}\" }\n            .set { ch_reads_for_faketsv }\n\n    }\n\nch_reads_for_faketsv\n  .map{\n\n      def samplename = it[0]\n      def libraryid  = it[0]\n      def lane = 0\n      def colour = \"${colour_chem}\"\n      def seqtype = pe_se ? 'SE' : 'PE'\n      def organism = 'NA'\n      def strandedness = ds_ss ? 
'single' : 'double'\n      def udg = udg_treat\n      def r1 = !bam_in ? return_file(it[1][0]) : 'NA'\n      def r2 = !bam_in && !pe_se ? return_file(it[1][1]) : 'NA'\n      def bam = bam_in && pe_se ? return_file(it[1][0]) : 'NA'\n\n      [ samplename, libraryid, lane, colour, seqtype, organism, strandedness, udg, r1, r2, bam ]\n  }\n  .ifEmpty {exit 1, \"[nf-core/eager] error: Invalid file paths with --input\"}\n\n}\n\n// Function to check length of collection in a channel closure is as expected (e.g. with .map())\ndef validate_size(collection, size){\n    if ( collection.size() != size ) { return false } else { return true }\n}\n\ndef checkHostname() {\n    def c_reset = params.monochrome_logs ? '' : \"\\033[0m\"\n    def c_white = params.monochrome_logs ? '' : \"\\033[0;37m\"\n    def c_red = params.monochrome_logs ? '' : \"\\033[1;91m\"\n    def c_yellow_bold = params.monochrome_logs ? '' : \"\\033[1;93m\"\n    if (params.hostnames) {\n        def hostname = 'hostname'.execute().text.trim()\n        params.hostnames.each { prof, hnames ->\n            hnames.each { hname ->\n                if (hostname.contains(hname) && !workflow.profile.contains(prof)) {\n                    log.error \"${c_red}====================================================${c_reset}\\n\" +\n                            \"  ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\\n\" +\n                            \"  but your machine hostname is ${c_white}'$hostname'${c_reset}\\n\" +\n                            \"  ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\\n\" +\n                            \"${c_red}====================================================${c_reset}\\n\"\n                }\n            }\n        }\n    }\n}\n"
  },
  {
    "path": "nextflow.config",
    "content": "/*\n * -------------------------------------------------\n *  nf-core/eager Nextflow config file\n * -------------------------------------------------\n * Default config options for all environments.\n */\n// Global default params, used in configs\nparams {\n\n  // Workflow flags\n  genome = false\n  input = null\n  input_paths = null\n  single_end = false\n  outdir = './results'\n  publish_dir_mode = 'copy'\n  config_profile_name = null\n\n  // aws\n  awsqueue = null\n  awsregion = 'eu-west-1'\n  awscli = null\n\n  //Pipeline options\n  enable_conda               = false\n  validate_params            = true\n  schema_ignore_params       = 'genome'\n  show_hidden_params         = false\n\n  //Input reads\n  udg_type = 'none'\n  single_stranded = false\n  single_end = false\n  colour_chemistry = 4\n  bam = false\n  \n  // Optional input information\n  snpcapture_bed = null\n  run_convertinputbam = false\n\n  //Input reference\n  fasta = null\n  bwa_index = null\n  bt2_index = null\n  fasta_index = null\n  seq_dict = null\n  large_ref = false\n  save_reference = false\n  \n  // this is just to stop the iGenomes WARN as we set as FALSE by default. 
Otherwise should be overwritten by optional config load below.\n  genomes = false \n\n\n  //Skipping parts of the pipeline for impatient users\n  skip_fastqc = false\n  skip_adapterremoval = false \n  skip_preseq = false\n  skip_deduplication = false\n  skip_damage_calculation = false\n  skip_qualimap = false\n\n  //More defaults\n  complexity_filter_poly_g = false\n  complexity_filter_poly_g_min = 10\n\n  //Read clipping and merging parameters\n  clip_forward_adaptor = 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'\n  clip_reverse_adaptor = 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'\n  clip_adapters_list = null \n  clip_readlength = 30\n  clip_min_read_quality = 20\n  min_adap_overlap = 1\n  skip_collapse = false\n  skip_trim = false\n  preserve5p = false\n  mergedonly = false\n  qualitymax = 41\n  run_post_ar_trimming = false\n  post_ar_trim_front = 7\n  post_ar_trim_tail = 7\n  post_ar_trim_front2 = 7\n  post_ar_trim_tail2 = 7\n\n  //Mapping algorithm\n  mapper = 'bwaaln'\n  bwaalnn = 0.01 // From Oliva et al. 2021 (10.1093/bib/bbab076)\n  bwaalnk = 2\n  bwaalnl = 1024 // From Oliva et al. 2021 (10.1093/bib/bbab076)\n  bwaalno = 2 // From Oliva et al. 
2021 (10.1093/bib/bbab076)\n  circularextension = 500\n  circulartarget = 'MT'\n  circularfilter = false\n  bt2_alignmode = 'local' // from Cahill 2018 (10.1093/molbev/msy018) and, Poullet and Orlando (10.3389/fevo.2020.00105)\n  bt2_sensitivity = 'sensitive' // from Poullet and Orlando (10.3389/fevo.2020.00105)\n  bt2n = 0 // Do not set Cahill 2018 recommendation of 1 here, so not to 'hide' overriding bowtie2 presets\n  bt2l = 0\n  bt2_trim5 = 0\n  bt2_trim3 = 0\n  bt2_maxins = 500\n\n  //Mapped read removal from input FASTQ\n  hostremoval_input_fastq = false\n  hostremoval_mode = 'remove'\n\n  //BAM Filtering steps (default = discard unmapped reads)\n  run_bam_filtering = false\n  bam_mapping_quality_threshold = 0\n  bam_filter_minreadlength = 0\n  bam_unmapped_type = 'discard'\n\n  //DeDuplication settings\n  dedupper = 'markduplicates'\n  dedup_all_merged = false\n\n  //Preseq settings\n  preseq_step_size = 1000\n  preseq_mode = 'c_curve'\n  preseq_bootstrap = 100\n  preseq_maxextrap = 10000000000\n  preseq_cval = 0.95\n  preseq_terms = 100\n\n  //Damage estimation settings\n  damage_calculation_tool = 'damageprofiler'\n  damageprofiler_length = 100\n  damageprofiler_threshold = 15\n  damageprofiler_yaxis = 0.30\n  mapdamage_downsample = 0\n  mapdamage_yaxis = 0.30\n\n  //PMDTools settings\n  run_pmdtools = false\n  pmdtools_range = 10\n  pmdtools_threshold = 3\n  pmdtools_reference_mask = null\n  pmdtools_max_reads = 10000\n  pmdtools_platypus = false\n\n  // mapDamage\n  run_mapdamage_rescaling = false\n  rescale_length_5p = 0\n  rescale_length_3p = 0\n  rescale_seqlength = 12\n\n  //Bedtools settings\n  run_bedtools_coverage = false\n  anno_file = null\n  anno_file_is_unsorted = false\n\n  //bamUtils trimbam settings\n  run_trim_bam = false \n  bamutils_clip_double_stranded_half_udg_left = 0\n  bamutils_clip_double_stranded_half_udg_right = 0\n  bamutils_clip_double_stranded_none_udg_left = 0\n  bamutils_clip_double_stranded_none_udg_right = 0\n  
bamutils_clip_single_stranded_half_udg_left = 0\n  bamutils_clip_single_stranded_half_udg_right = 0\n  bamutils_clip_single_stranded_none_udg_left = 0\n  bamutils_clip_single_stranded_none_udg_right = 0\n  bamutils_softclip = false\n\n  //Genotyping options\n  run_genotyping = false\n  genotyping_tool = null\n  genotyping_source = 'raw'\n  // gatk options\n  gatk_call_conf = 30\n  gatk_ploidy = 2\n  gatk_downsample = 250\n  gatk_dbsnp = null\n  gatk_hc_out_mode = 'EMIT_VARIANTS_ONLY'\n  gatk_hc_emitrefconf = 'GVCF'\n  gatk_ug_genotype_model = 'SNP'\n  gatk_ug_out_mode = 'EMIT_VARIANTS_ONLY'\n  gatk_ug_keep_realign_bam = false\n  gatk_ug_defaultbasequalities = null\n  // freebayes options\n  freebayes_C = 1\n  freebayes_g = 0\n  freebayes_p = 2\n  // Sequencetools pileupCaller\n  pileupcaller_snpfile = null\n  pileupcaller_bedfile = null\n  pileupcaller_method = 'randomHaploid'\n  pileupcaller_transitions_mode = 'AllSites'\n  pileupcaller_min_map_quality = 30\n  pileupcaller_min_base_quality = 30\n  // ANGSD Genotype Likelihoods\n  angsd_glmodel = 'samtools'\n  angsd_glformat = 'binary'\n  angsd_createfasta = false\n  angsd_fastamethod = 'random'\n  run_bcftools_stats = true\n\n  //Consensus sequence generation\n  run_vcf2genome = false\n  vcf2genome_outfile = ''\n  vcf2genome_header = ''\n  vcf2genome_minc = 5\n  vcf2genome_minq = 30\n  vcf2genome_minfreq = 0.8\n\n  //MultiVCFAnalyzer Options\n  run_multivcfanalyzer = false\n  write_allele_frequencies = false\n  min_genotype_quality = 30\n  min_base_coverage = 5\n  min_allele_freq_hom = 0.9\n  min_allele_freq_het = 0.9\n  additional_vcf_files = null\n  reference_gff_annotations = 'NA'\n  reference_gff_exclude = 'NA'\n  snp_eff_results = 'NA'\n\n  //mtnucratio\n  run_mtnucratio = false\n  mtnucratio_header = 'MT'\n\n  //Sex.DetERRmine settings\n  run_sexdeterrmine = false\n  sexdeterrmine_bedfile = null\n\n  //Nuclear contamination based on chromosome X heterozygosity.\n  run_nuclear_contamination = false\n  
contamination_chrom_name = 'X' // Default to using hs37d5 name\n\n  // taxonomic classifier\n  run_metagenomic_screening  = false\n  \n  metagenomic_complexity_filter = false\n  metagenomic_complexity_entropy = 0.3\n\n  metagenomic_tool = null\n  database  = null\n  metagenomic_min_support_reads = 1\n  percent_identity = 85\n  malt_mode = 'BlastN'\n  malt_alignment_mode = 'SemiGlobal'\n  malt_top_percent = 1\n  malt_min_support_mode = 'percent'\n  malt_min_support_percent = 0.01\n  malt_max_queries = 100\n  malt_memory_mode = 'load'\n  malt_sam_output = false\n\n  // maltextract - only including number \n  // parameters if default documented or duplicate of MALT\n  run_maltextract = false\n  maltextract_taxon_list = null\n  maltextract_ncbifiles = null\n  maltextract_filter = 'def_anc'\n  maltextract_toppercent = 0.01\n  maltextract_destackingoff = false\n  maltextract_downsamplingoff = false\n  maltextract_duplicateremovaloff = false\n  maltextract_matches = false\n  maltextract_megansummary = false\n  maltextract_percentidentity = 85.0\n  maltextract_topalignment =  false\n\n  // Boilerplate options\n  multiqc_config = false\n  email = false\n  email_on_fail = false\n  max_multiqc_email_size = 25.MB\n  plaintext_email = false\n  monochrome_logs = false\n  help = false\n  igenomes_base = 's3://ngi-igenomes/igenomes'\n  tracedir = \"${params.outdir}/pipeline_info\"\n  igenomes_ignore = true\n  custom_config_version = 'master'\n  custom_config_base = \"https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}\"\n  hostnames = false\n  config_profile_name = null\n  config_profile_description = false\n  config_profile_contact = false\n  config_profile_url = false\n  validate_params = true\n  show_hidden_params = false\n  schema_ignore_params = 'genomes,input_paths'\n\n  // Defaults only, expecting to be overwritten\n  max_memory = 128.GB\n  max_cpus = 16\n  max_time = 240.h\n\n}\n\n// Container slug. 
Stable releases should specify release tag!\n// Developmental code should specify :dev\nprocess.container = 'nfcore/eager:2.5.3'\n\n// Load base.config by default for all pipelines\nincludeConfig 'conf/base.config'\n\n// Load nf-core custom profiles from different Institutions\ntry {\n  includeConfig \"${params.custom_config_base}/nfcore_custom.config\"\n} catch (Exception e) {\n  System.err.println(\"WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config\")\n}\n\n// Load nf-core/eager custom profiles from different institutions\ntry {\n  includeConfig \"${params.custom_config_base}/pipeline/eager.config\"\n} catch (Exception e) {\n  System.err.println(\"WARNING: Could not load nf-core/config/eager profiles: ${params.custom_config_base}/pipeline/eager.config\")\n}\n\nprofiles {\n  conda {\n    docker.enabled = false\n    singularity.enabled = false\n    podman.enabled = false\n    shifter.enabled = false\n    charliecloud.enabled = false\n    process.conda = \"$projectDir/environment.yml\"\n  }\n  debug { process.beforeScript = 'echo $HOSTNAME' }\n  docker {\n    docker.enabled = true\n    singularity.enabled = false\n    podman.enabled = false\n    shifter.enabled = false\n    charliecloud.enabled = false\n    // Avoid this error:\n    //   WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. 
Memory limited without swap.\n    // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351\n    // once this is established and works well, nextflow might implement this behavior as new default.\n    docker.runOptions = '-u \\$(id -u):\\$(id -g)'\n  }\n  singularity {\n    docker.enabled = false\n    singularity.enabled = true\n    podman.enabled = false\n    shifter.enabled = false\n    charliecloud.enabled = false\n    singularity.autoMounts = true\n  }\n  podman {\n    singularity.enabled = false\n    docker.enabled = false\n    podman.enabled = true\n    shifter.enabled = false\n    charliecloud.enabled = false\n  }\n  shifter {\n    singularity.enabled = false\n    docker.enabled = false\n    podman.enabled = false\n    shifter.enabled = true\n    charliecloud.enabled = false\n  }\n  charliecloud {\n    singularity.enabled = false\n    docker.enabled = false\n    podman.enabled = false\n    shifter.enabled = false\n    charliecloud.enabled = true\n  }\n  test { includeConfig 'conf/test.config'}\n  test_direct { includeConfig 'conf/test_direct.config' }\n  test_full { includeConfig 'conf/test_full.config' }\n  test_bam { includeConfig 'conf/test_bam.config'}\n  test_fna { includeConfig 'conf/test_fna.config'}\n  test_humanbam { includeConfig 'conf/test_humanbam.config' }\n  test_pretrim { includeConfig 'conf/test_pretrim.config' }\n  test_kraken { includeConfig 'conf/test_kraken.config' }\n  test_tsv_bam { includeConfig 'conf/test_tsv_bam.config'}\n  test_tsv_fna { includeConfig 'conf/test_tsv_fna.config'}\n  test_tsv_humanbam { includeConfig 'conf/test_tsv_humanbam.config' }\n  test_tsv_pretrim { includeConfig 'conf/test_tsv_pretrim.config' }\n  test_tsv_kraken { includeConfig 'conf/test_tsv_kraken.config' }\n  test_tsv_complex { includeConfig 'conf/test_tsv_complex.config' }\n  test_stresstest_human { includeConfig 'conf/test_stresstest_human.config' }\n  benchmarking_human { includeConfig 'conf/benchmarking_human.config' }\n 
 benchmarking_vikingfish { includeConfig 'conf/benchmarking_vikingfish.config' }\n}\n\n\n// Load igenomes.config if required\nif (!params.igenomes_ignore) {\n  includeConfig 'conf/igenomes.config'\n}\n\n// Export these variables to prevent local Python/R libraries from conflicting with those in the container\nenv {\n  PYTHONNOUSERSITE = 1\n  R_PROFILE_USER = \"/.Rprofile\"\n  R_ENVIRON_USER = \"/.Renviron\"\n}\n\n// Capture exit codes from upstream processes when piping\nprocess.shell = ['/bin/bash', '-euo', 'pipefail']\n\ndef trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')\ntimeline {\n  enabled = true\n  file = \"${params.tracedir}/execution_timeline_${trace_timestamp}.html\"\n}\nreport {\n  enabled = true\n  file = \"${params.tracedir}/execution_report_${trace_timestamp}.html\"\n}\ntrace {\n  enabled = true\n  file = \"${params.tracedir}/execution_trace_${trace_timestamp}.txt\"\n}\ndag {\n  enabled = true\n  file = \"${params.tracedir}/pipeline_dag_${trace_timestamp}.svg\"\n}\n\nmanifest {\n  name = 'nf-core/eager'\n  author = 'The nf-core/eager community'\n  homePage = 'https://github.com/nf-core/eager'\n  description = 'A fully reproducible and state-of-the-art ancient DNA analysis pipeline'\n  mainScript = 'main.nf'\n  nextflowVersion = '>=20.07.1'\n  version = '2.5.3'\n}\n\n// Function to ensure that resource requirements don't go beyond\n// a maximum limit\ndef check_max(obj, type) {\n  if (type == 'memory') {\n    try {\n      if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)\n        return params.max_memory as nextflow.util.MemoryUnit\n      else\n        return obj\n    } catch (all) {\n      println \"   ### ERROR ###   Max memory '${params.max_memory}' is not valid! 
Using default value: $obj\"\n      return obj\n    }\n  } else if (type == 'time') {\n    try {\n      if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)\n        return params.max_time as nextflow.util.Duration\n      else\n        return obj\n    } catch (all) {\n      println \"   ### ERROR ###   Max time '${params.max_time}' is not valid! Using default value: $obj\"\n      return obj\n    }\n  } else if (type == 'cpus') {\n    try {\n      return Math.min( obj, params.max_cpus as int )\n    } catch (all) {\n      println \"   ### ERROR ###   Max cpus '${params.max_cpus}' is not valid! Using default value: $obj\"\n      return obj\n    }\n  }\n}"
  },
  {
    "path": "nextflow_schema.json",
    "content": "{\n    \"$schema\": \"http://json-schema.org/draft-07/schema\",\n    \"$id\": \"https://raw.githubusercontent.com/nf-core/eager/master/nextflow_schema.json\",\n    \"title\": \"nf-core/eager pipeline parameters\",\n    \"description\": \"A fully reproducible and state-of-the-art ancient DNA analysis pipeline\",\n    \"type\": \"object\",\n    \"definitions\": {\n        \"input_output_options\": {\n            \"title\": \"Input/output options\",\n            \"type\": \"object\",\n            \"fa_icon\": \"fas fa-terminal\",\n            \"description\": \"Define where the pipeline should find input data, and additional metadata.\",\n            \"required\": [\n                \"input\"\n            ],\n            \"properties\": {\n                \"input\": {\n                    \"type\": \"string\",\n                    \"description\": \"Either paths or URLs to FASTQ/BAM data (must be surrounded with quotes). For paired end data, the path must use '{1,2}' notation to specify read pairs. Alternatively, a path to a TSV file (ending .tsv) containing file paths and sequencing/sample metadata. Allows for merging of multiple lanes/libraries/samples. Please see documentation for template.\",\n                    \"fa_icon\": \"fas fa-dna\",\n                    \"help_text\": \"There are two possible ways of supplying input sequencing data to nf-core/eager. The most efficient but more simplistic is supplying direct paths (with wildcards) to your FASTQ or BAM files, with each file or pair being considered a single library and each one run independently  (e.g. for paired-end data: `--input '/<path>/<to>/*_{R1,R2}_*.fq.gz'`). TSV input requires creation of an extra file by the user (`--input '/<path>/<to>/eager_data.tsv'`) and extra metadata, but allows more powerful lane and library merging.  
Please see [usage docs](https://nf-co.re/eager/docs/usage#input-specifications) for detailed instructions and specifications.\"\n                },\n                \"udg_type\": {\n                    \"type\": \"string\",\n                    \"default\": \"none\",\n                    \"description\": \"Specifies whether you have UDG treated libraries. Set to 'half' for partial treatment, or 'full' for UDG. If not set, libraries are assumed to have no UDG treatment ('none'). Not required for TSV input.\",\n                    \"fa_icon\": \"fas fa-vial\",\n                    \"help_text\": \"Defines whether Uracil-DNA glycosylase (UDG) treatment was used to remove DNA\\ndamage on the sequencing libraries.\\n\\nSpecify `'none'` if no treatment was performed. If you have partial UDG treated\\ndata ([Rohland et al 2016](http://dx.doi.org/10.1098/rstb.2013.0624)), specify\\n`'half'`. If you have complete UDG treated data ([Briggs et al.\\n2010](https://doi.org/10.1093/nar/gkp1163)), specify `'full'`. \\n\\nWhen also using PMDtools specifying `'half'` will use a different model for DNA\\ndamage assessment in PMDTools (PMDtools: `--UDGhalf`). Specify `'full'` and the\\nPMDtools DNA damage assessment will use CpG context only (PMDtools: `--CpG`).\\nDefault: `'none'`.\\n\\n> **Tip**: You should provide a small decoy reference genome with pre-made indices, e.g.\\n> the human mtDNA genome, for the mandatory parameter `--fasta` in order to\\n> avoid long computational time for generating the index files of the reference\\n> genome, even if you do not actually need a reference genome for any downstream\\n> analyses.\",\n                    \"enum\": [\n                        \"none\",\n                        \"half\",\n                        \"full\"\n                    ]\n                },\n                \"single_stranded\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Specifies that libraries are single stranded. 
Always affects MALTExtract but will be ignored by pileupCaller with TSV input. Not required for TSV input.\",\n                    \"fa_icon\": \"fas fa-minus\",\n                    \"help_text\": \"Indicates libraries are single stranded.\\n\\nCurrently only affects MALTExtract where it will switch on damage patterns\\ncalculation mode to single-stranded, (MaltExtract: `--singleStranded`) and\\ngenotyping with pileupCaller where a different method is used (pileupCaller:\\n`--singleStrandMode`). Default: false\\n\\nOnly required when using the 'Path' method of `--input`\"\n                },\n                \"single_end\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Specifies that the input is single end reads. Not required for TSV input.\",\n                    \"fa_icon\": \"fas fa-align-left\",\n                    \"help_text\": \"By default, the pipeline expects paired-end data. If you have single-end data, specify this parameter on the command line when you launch the pipeline. It is not possible to run a mixture of single-end and paired-end files in one run.\\n\\nOnly required when using the 'Path' method of `--input`\"\n                },\n                \"colour_chemistry\": {\n                    \"type\": \"integer\",\n                    \"default\": 4,\n                    \"description\": \"Specifies which Illumina sequencing chemistry was used. Used to inform whether to poly-G trim if turned on (see below). Not required for TSV input. Options: 2, 4.\",\n                    \"fa_icon\": \"fas fa-palette\",\n                    \"help_text\": \"Specifies which Illumina colour chemistry a library was sequenced with. This informs whether to perform poly-G trimming (if `--complexity_filter_poly_g` is also supplied). Only 2 colour chemistry sequencers (e.g. NextSeq or NovaSeq) can generate uncertain poly-G tails (due to 'G' being indicated via a no-colour detection). Default is '4' to indicate e.g. 
HiSeq or MiSeq platforms, which do not require poly-G trimming. Options: 2, 4. Default: 4\\n\\nOnly required when using the 'Path' method of input.\"\n                },\n                \"bam\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Specifies that the input is in BAM format. Not required for TSV input.\",\n                    \"fa_icon\": \"fas fa-align-justify\",\n                    \"help_text\": \"Specifies that the input file type supplied to `--input` is BAM format. This will automatically also apply `--single_end`.\\n\\nOnly required when using the 'Path' method of `--input`.\\n\"\n                }\n            },\n            \"help_text\": \"There are two possible ways of supplying input sequencing data to nf-core/eager.\\nThe most efficient but more simplistic is supplying direct paths (with\\nwildcards) to your FASTQ or BAM files, with each file or pair being considered a\\nsingle library and each one run independently. TSV input requires creation of an\\nextra file by the user and extra metadata, but allows more powerful lane and\\nlibrary merging.\"\n        },\n        \"input_data_additional_options\": {\n            \"title\": \"Input Data Additional Options\",\n            \"type\": \"object\",\n            \"description\": \"Additional options regarding input data.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"snpcapture_bed\": {\n                    \"type\": \"string\",\n                    \"fa_icon\": \"fas fa-magnet\",\n                    \"description\": \"If the library is the result of SNP capture, path to a BED file containing SNP positions on the reference genome. SNP statistics are reported in the qualimap results directory only, not in MultiQC.\",\n                    \"help_text\": \"Can be used to set a path to a BED file (3/6 column format) of SNP positions of a reference genome, to calculate SNP captured libraries on-target efficiency. 
This should be used for array or in-solution SNP capture protocols such as 390K, 1240K, etc. If supplied, some on-target metrics are automatically generated for you by qualimap in the 'Globals inside' section of the 'genome_results.txt' file in the qualimap results directory. These statistics are currently NOT displayed in MultiQC!\"\n                },\n                \"run_convertinputbam\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turns on conversion of an input BAM file into FASTQ format to allow re-preprocessing (e.g. AdapterRemoval etc.).\",\n                    \"fa_icon\": \"fas fa-undo-alt\",\n                    \"help_text\": \"Allows you to convert an input BAM file back to FASTQ for downstream processing. Note this is required if you need to perform AdapterRemoval and/or polyG clipping.\\n\\nIf not turned on, BAMs will automatically be sent to post-mapping steps.\"\n                }\n            },\n            \"fa_icon\": \"far fa-plus-square\"\n        },\n        \"reference_genome_options\": {\n            \"title\": \"Reference genome options\",\n            \"type\": \"object\",\n            \"fa_icon\": \"fas fa-dna\",\n            \"properties\": {\n                \"fasta\": {\n                    \"type\": \"string\",\n                    \"fa_icon\": \"fas fa-font\",\n                    \"description\": \"Path or URL to a FASTA reference file (required if not iGenome reference). File suffixes can be: '.fa', '.fn', '.fna', '.fasta'.\",\n                    \"help_text\": \"You specify the full path to your reference genome here. The FASTA file can have any file suffix, such as `.fasta`, `.fna`, `.fa`, `.FastA` etc. 
You may also supply gzipped reference files, which will be unzipped automatically for you.\\n\\nFor example:\\n\\n```bash\\n--fasta '/<path>/<to>/my_reference.fasta'\\n```\\n\\n> If you don't specify appropriate `--bwa_index`, `--fasta_index` parameters, the pipeline will create these indices for you automatically. Note that you can save the indices created for you for later by giving the `--save_reference` flag.\\n> You must select either a `--fasta` or `--genome`\\n\"\n                },\n                \"genome\": {\n                    \"type\": \"string\",\n                    \"description\": \"Name of iGenomes reference (required if not FASTA reference). Requires argument `--igenomes_ignore false`, as iGenomes is ignored by default in nf-core/eager\",\n                    \"fa_icon\": \"fas fa-book\",\n                    \"help_text\": \"As an alternative to `--fasta`, the pipeline config files come bundled with paths to the Illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource.\\n\\nThere are 31 different species supported in the iGenomes references. To run the pipeline, you must specify which to use with the `--genome` flag.\\n\\nYou can find the keys to specify the genomes in the [iGenomes config file](../conf/igenomes.config). Common genomes that are supported are:\\n\\n- Human\\n  - `--genome GRCh37`\\n  - `--genome GRCh38`\\n- Mouse *\\n  - `--genome GRCm38`\\n- _Drosophila_ *\\n  - `--genome BDGP6`\\n- _S. cerevisiae_ *\\n  - `--genome 'R64-1-1'`\\n\\n> \\\\* Not bundled with nf-core eager by default.\\n\\nNote that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. 
See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file.\\n\\nThe syntax for this reference configuration is as follows:\\n\\n```nextflow\\nparams {\\n  genomes {\\n    'GRCh37' {\\n      fasta   = '<path to the iGenomes genome fasta file>'\\n    }\\n    // Any number of additional genomes, key is used with --genome\\n  }\\n}\\n```\\n\\n**NB** Requires argument `--igenomes_ignore false`, as iGenomes is ignored by default in nf-core/eager.\"\n                },\n                \"igenomes_base\": {\n                    \"type\": \"string\",\n                    \"description\": \"Directory / URL base for iGenomes references.\",\n                    \"default\": \"s3://ngi-igenomes/igenomes\",\n                    \"fa_icon\": \"fas fa-cloud-download-alt\",\n                    \"hidden\": true\n                },\n                \"igenomes_ignore\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Do not load the iGenomes reference config.\",\n                    \"fa_icon\": \"fas fa-ban\",\n                    \"hidden\": true,\n                    \"help_text\": \"Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`.\"\n                },\n                \"bwa_index\": {\n                    \"type\": \"string\",\n                    \"description\": \"Path to directory containing pre-made BWA indices (i.e. the directory before the files ending in '.amb' '.ann' '.bwt'. Do not include the files themselves. Most likely the same directory of the file provided with --fasta). 
If not supplied will be made for you.\",\n                    \"fa_icon\": \"fas fa-address-book\",\n                    \"help_text\": \"If you want to use pre-existing `bwa index` indices, please supply the **directory** to the FASTA you also specified in `--fasta`. nf-core/eager will automagically detect the index files by searching for the FASTA filename with the corresponding `bwa` index file suffixes.\\n\\nFor example:\\n\\n```bash\\nnextflow run nf-core/eager \\\\\\n-profile test,docker \\\\\\n--input '*{R1,R2}*.fq.gz' \\\\\\n--fasta 'results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta' \\\\\\n--bwa_index 'results/reference_genome/bwa_index/BWAIndex/'\\n```\\n\\n> `bwa index` does not give you an option to supply alternative suffixes/names for these indices. Thus, the file names generated by this command _must not_ be changed, otherwise nf-core/eager will not be able to find them.\"\n                },\n                \"bt2_index\": {\n                    \"type\": \"string\",\n                    \"description\": \"Path to directory containing pre-made Bowtie2 indices (i.e. everything before the endings e.g. '.1.bt2', '.2.bt2', '.rev.1.bt2'. Most likely the same value as --fasta). If not supplied will be made for you.\",\n                    \"fa_icon\": \"far fa-address-book\",\n                    \"help_text\": \"If you want to use pre-existing `bt2 index` indices, please supply the **directory** to the FASTA you also specified in `--fasta`. 
nf-core/eager will automagically detect the index files by searching for the FASTA filename with the corresponding `bt2` index file suffixes.\\n\\nFor example:\\n\\n```bash\\nnextflow run nf-core/eager \\\\\\n-profile test,docker \\\\\\n--input '*{R1,R2}*.fq.gz' \\\\\\n--fasta 'results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta' \\\\\\n--bt2_index 'results/reference_genome/bt2_index/BT2Index/'\\n```\\n\\n> `bowtie2-build` does not give you an option to supply alternative suffixes/names for these indices. Thus, the file names generated by this command _must not_ be changed, otherwise nf-core/eager will not be able to find them.\"\n                },\n                \"fasta_index\": {\n                    \"type\": \"string\",\n                    \"description\": \"Path to samtools FASTA index (typically ending in '.fai'). If not supplied will be made for you.\",\n                    \"fa_icon\": \"far fa-bookmark\",\n                    \"help_text\": \"If you want to use a pre-existing `samtools faidx` index, use this to specify the required FASTA index file for the selected reference genome. This should be generated by `samtools faidx` and has a file suffix of `.fai`\\n\\nFor example:\\n\\n```bash\\n--fasta_index 'Mammoth_MT_Krause.fasta.fai'\\n```\"\n                },\n                \"seq_dict\": {\n                    \"type\": \"string\",\n                    \"description\": \"Path to picard sequence dictionary file (typically ending in '.dict'). 
If not supplied will be made for you.\",\n                    \"fa_icon\": \"fas fa-spell-check\",\n                    \"help_text\": \"If you want to use a pre-existing `picard CreateSequenceDictionary` dictionary file, use this to specify the required `.dict` file for the selected reference genome.\\n\\nFor example:\\n\\n```bash\\n--seq_dict 'Mammoth_MT_Krause.dict'\\n```\"\n                },\n                \"large_ref\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Specify to generate more recent '.csi' BAM indices. If your reference genome is larger than 3.5GB, this is recommended due to more efficient data handling with the '.csi' format over the older '.bai'.\",\n                    \"fa_icon\": \"fas fa-mountain\",\n                    \"help_text\": \"This parameter is required to be set for large reference genomes. If your\\nreference genome is larger than 3.5GB, the `samtools index` calls in the\\npipeline need to generate `CSI` indices instead of `BAI` indices to compensate\\nfor the size of the reference genome (with samtools: `-c`). This parameter is\\nnot required for smaller references (including the human `hg19` or\\n`grch37`/`grch38` references), but `>4GB` genomes have been shown to need `CSI`\\nindices. Default: off\"\n                },\n                \"save_reference\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"If not already supplied by user, turns on saving of generated reference genome indices for later re-usage.\",\n                    \"fa_icon\": \"far fa-save\",\n                    \"help_text\": \"Use this if you do not have pre-made reference FASTA indices for `bwa`, `samtools` and `picard`. If you turn this on, the indices nf-core/eager generates for you will be saved in `<your_output_dir>/results/reference_genomes`. 
If not supplied, nf-core/eager generated index references will be deleted.\\n\\n> modifies SAMtools index command: `-c`\"\n                }\n            },\n            \"description\": \"Specify locations of references and optionally, additional pre-made indices\",\n            \"help_text\": \"All nf-core/eager runs require a reference genome in FASTA format to map reads\\nagainst to.\\n\\nIn addition we provide various options for indexing of different types of\\nreference genomes (based on the tools used in the pipeline). nf-core/eager can\\nindex reference genomes for you (with options to save these for other analysis),\\nbut you can also supply your pre-made indices.\\n\\nSupplying pre-made indices saves time in pipeline execution and is especially\\nadvised when running multiple times on the same cluster system for example. You\\ncan even add a resource [specific profile](#profile) that sets paths to\\npre-computed reference genomes, saving time when specifying these.\\n\\n> :warning: you must always supply a reference file. If you want to use\\n  functionality that does not require one, supply a small decoy genome such as\\n  phiX or the human mtDNA genome.\"\n        },\n        \"output_options\": {\n            \"title\": \"Output options\",\n            \"type\": \"object\",\n            \"description\": \"Specify where to put output files and optional saving of intermediate files\",\n            \"default\": \"\",\n            \"properties\": {\n                \"outdir\": {\n                    \"type\": \"string\",\n                    \"description\": \"The output directory where the results will be saved.\",\n                    \"default\": \"./results\",\n                    \"fa_icon\": \"fas fa-folder-open\",\n                    \"help_text\": \"The output directory where the results will be saved. 
By default will be made in the directory you run the command in under `./results`.\"\n                },\n                \"publish_dir_mode\": {\n                    \"type\": \"string\",\n                    \"default\": \"copy\",\n                    \"hidden\": true,\n                    \"description\": \"Method used to save pipeline results to output directory.\",\n                    \"help_text\": \"The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.\",\n                    \"fa_icon\": \"fas fa-copy\",\n                    \"enum\": [\n                        \"symlink\",\n                        \"rellink\",\n                        \"link\",\n                        \"copy\",\n                        \"copyNoFollow\",\n                        \"move\"\n                    ]\n                }\n            },\n            \"fa_icon\": \"fas fa-cloud-download-alt\"\n        },\n        \"generic_options\": {\n            \"title\": \"Generic options\",\n            \"type\": \"object\",\n            \"properties\": {\n                \"help\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Display help text.\",\n                    \"hidden\": true,\n                    \"fa_icon\": \"fas fa-question-circle\"\n                },\n                \"validate_params\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Boolean whether to validate parameters against the schema at runtime\",\n                    \"default\": true,\n                    \"fa_icon\": \"fas fa-check-square\",\n                    \"hidden\": true\n                },\n                \"email\": {\n                    \"type\": \"string\",\n                    \"description\": \"Email 
address for completion summary.\",\n                    \"fa_icon\": \"fas fa-envelope\",\n                    \"help_text\": \"An email address to send a summary email to when the pipeline is completed.\",\n                    \"pattern\": \"^([a-zA-Z0-9_\\\\-\\\\.]+)@([a-zA-Z0-9_\\\\-\\\\.]+)\\\\.([a-zA-Z]{2,5})$\"\n                },\n                \"email_on_fail\": {\n                    \"type\": \"string\",\n                    \"description\": \"Email address for completion summary, only when pipeline fails.\",\n                    \"fa_icon\": \"fas fa-exclamation-triangle\",\n                    \"pattern\": \"^([a-zA-Z0-9_\\\\-\\\\.]+)@([a-zA-Z0-9_\\\\-\\\\.]+)\\\\.([a-zA-Z]{2,5})$\",\n                    \"hidden\": true,\n                    \"help_text\": \"Set this parameter to your e-mail address to get a summary e-mail with details of the run if it **fails**. Normally would be the same as in `--email` but can be different. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.\\n\\n> Note that this functionality requires either `mail` or `sendmail` to be installed on your system.\"\n                },\n                \"plaintext_email\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Send plain-text email instead of HTML.\",\n                    \"fa_icon\": \"fas fa-remove-format\",\n                    \"hidden\": true,\n                    \"help_text\": \"Set to receive plain-text e-mails instead of HTML formatted.\"\n                },\n                \"max_multiqc_email_size\": {\n                    \"type\": \"string\",\n                    \"description\": \"File size limit when attaching MultiQC reports to summary emails.\",\n                    \"default\": \"25.MB\",\n                    \"fa_icon\": \"fas fa-file-upload\",\n                    \"hidden\": true,\n                    \"help_text\": \"If file generated by 
pipeline exceeds the threshold, it will not be attached.\"\n                },\n                \"monochrome_logs\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Do not use coloured log outputs.\",\n                    \"fa_icon\": \"fas fa-palette\",\n                    \"hidden\": true,\n                    \"help_text\": \"Set to disable colourful command line output and live life in monochrome.\"\n                },\n                \"multiqc_config\": {\n                    \"type\": \"string\",\n                    \"description\": \"Custom config file to supply to MultiQC.\",\n                    \"fa_icon\": \"fas fa-cog\",\n                    \"hidden\": true\n                },\n                \"tracedir\": {\n                    \"type\": \"string\",\n                    \"description\": \"Directory to keep pipeline Nextflow logs and reports.\",\n                    \"default\": \"${params.outdir}/pipeline_info\",\n                    \"fa_icon\": \"fas fa-cogs\",\n                    \"hidden\": true\n                },\n                \"show_hidden_params\": {\n                    \"type\": \"boolean\",\n                    \"fa_icon\": \"far fa-eye-slash\",\n                    \"description\": \"Show all params when using `--help`\",\n                    \"hidden\": true,\n                    \"help_text\": \"By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. 
Specifying this option will tell the pipeline to show all parameters.\"\n                },\n                \"enable_conda\": {\n                    \"type\": \"boolean\",\n                    \"hidden\": true,\n                    \"description\": \"Parameter used to check that conda channels are set correctly.\"\n                },\n                \"schema_ignore_params\": {\n                    \"type\": \"string\",\n                    \"fa_icon\": \"fas fa-not-equal\",\n                    \"description\": \"String to specify ignored parameters for parameter validation\",\n                    \"hidden\": true,\n                    \"default\": \"genomes\"\n                }\n            },\n            \"fa_icon\": \"fas fa-file-import\",\n            \"description\": \"Less common options for the pipeline, typically set in a config file.\",\n            \"help_text\": \"These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\\n\\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.\"\n        },\n        \"max_job_request_options\": {\n            \"title\": \"Max job request options\",\n            \"type\": \"object\",\n            \"fa_icon\": \"fab fa-acquisitions-incorporated\",\n            \"description\": \"Set the top limit for requested resources for any single job.\",\n            \"help_text\": \"If you are running on a smaller system, a pipeline step requesting more resources than are available may cause Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\\n\\nNote that you cannot _increase_ the resources requested by any job using these options. For that you will need your own configuration file. 
See [the nf-core website](https://nf-co.re/usage/configuration) for details.\",\n            \"properties\": {\n                \"max_cpus\": {\n                    \"type\": \"integer\",\n                    \"description\": \"Maximum number of CPUs that can be requested for any single job.\",\n                    \"default\": 16,\n                    \"fa_icon\": \"fas fa-microchip\",\n                    \"hidden\": true,\n                    \"help_text\": \"Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`\"\n                },\n                \"max_memory\": {\n                    \"type\": \"string\",\n                    \"description\": \"Maximum amount of memory that can be requested for any single job.\",\n                    \"default\": \"128.GB\",\n                    \"fa_icon\": \"fas fa-memory\",\n                    \"pattern\": \"^\\\\d+(\\\\.\\\\d+)?\\\\.?\\\\s*(K|M|G|T)?B$\",\n                    \"hidden\": true,\n                    \"help_text\": \"Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`\"\n                },\n                \"max_time\": {\n                    \"type\": \"string\",\n                    \"description\": \"Maximum amount of time that can be requested for any single job.\",\n                    \"default\": \"240.h\",\n                    \"fa_icon\": \"far fa-clock\",\n                    \"pattern\": \"^(\\\\d+\\\\.?\\\\s*(s|m|h|day)\\\\s*)+$\",\n                    \"hidden\": true,\n                    \"help_text\": \"Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. 
`--max_time '2.h'`\"\n                }\n            }\n        },\n        \"institutional_config_options\": {\n            \"title\": \"Institutional config options\",\n            \"type\": \"object\",\n            \"fa_icon\": \"fas fa-university\",\n            \"description\": \"Parameters used to describe centralised config profiles. These generally should not be edited.\",\n            \"help_text\": \"The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline.\",\n            \"properties\": {\n                \"custom_config_version\": {\n                    \"type\": \"string\",\n                    \"description\": \"Git commit id for Institutional configs.\",\n                    \"default\": \"master\",\n                    \"hidden\": true,\n                    \"fa_icon\": \"fas fa-users-cog\",\n                    \"help_text\": \"Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`.\\n\\n```bash\\n## Download and use config file with following git commit id\\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\\n```\"\n                },\n                \"custom_config_base\": {\n                    \"type\": \"string\",\n                    \"description\": \"Base directory for Institutional configs.\",\n                    \"default\": \"https://raw.githubusercontent.com/nf-core/configs/master\",\n                    \"hidden\": true,\n                    \"help_text\": \"If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. 
If you do need them, you should download the files from the repo and tell nextflow where to find them with the `custom_config_base` option. For example:\\n\\n```bash\\n## Download and unzip the config files\\ncd /path/to/my/configs\\nwget https://github.com/nf-core/configs/archive/master.zip\\nunzip master.zip\\n\\n## Run the pipeline\\ncd /path/to/my/data\\nnextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/\\n```\\n\\n> Note that the nf-core/tools helper package has a `download` command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.\",\n                    \"fa_icon\": \"fas fa-users-cog\"\n                },\n                \"hostnames\": {\n                    \"type\": \"string\",\n                    \"description\": \"Institutional configs hostname.\",\n                    \"hidden\": true,\n                    \"fa_icon\": \"fas fa-users-cog\"\n                },\n                \"config_profile_name\": {\n                    \"type\": \"string\",\n                    \"description\": \"Institutional config name.\",\n                    \"hidden\": true,\n                    \"fa_icon\": \"fas fa-users-cog\"\n                },\n                \"config_profile_description\": {\n                    \"type\": \"string\",\n                    \"description\": \"Institutional config description.\",\n                    \"hidden\": true,\n                    \"fa_icon\": \"fas fa-users-cog\"\n                },\n                \"config_profile_contact\": {\n                    \"type\": \"string\",\n                    \"description\": \"Institutional config contact information.\",\n                    \"hidden\": true,\n                    \"fa_icon\": \"fas fa-users-cog\"\n                },\n                \"config_profile_url\": {\n                    \"type\": \"string\",\n                    \"description\": \"Institutional 
config URL link.\",\n                    \"hidden\": true,\n                    \"fa_icon\": \"fas fa-users-cog\"\n                },\n                \"awsqueue\": {\n                    \"type\": \"string\",\n                    \"description\": \"The AWSBatch JobQueue that needs to be set when running on AWSBatch\",\n                    \"fa_icon\": \"fab fa-aws\"\n                },\n                \"awsregion\": {\n                    \"type\": \"string\",\n                    \"default\": \"eu-west-1\",\n                    \"description\": \"The AWS Region for your AWS Batch job to run on\",\n                    \"fa_icon\": \"fab fa-aws\"\n                },\n                \"awscli\": {\n                    \"type\": \"string\",\n                    \"description\": \"Path to the AWS CLI tool\",\n                    \"fa_icon\": \"fab fa-aws\"\n                }\n            }\n        },\n        \"skip_steps\": {\n            \"title\": \"Skip steps\",\n            \"type\": \"object\",\n            \"description\": \"Skip any of the mentioned steps.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"skip_fastqc\": {\n                    \"type\": \"boolean\",\n                    \"fa_icon\": \"fas fa-fast-forward\",\n                    \"help_text\": \"Turns off FastQC pre- and post-Adapter Removal, to speed up the pipeline. Use of this flag is most common when data has been previously pre-processed and the post-Adapter Removal mapped reads are being re-mapped to a new reference genome.\"\n                },\n                \"skip_adapterremoval\": {\n                    \"type\": \"boolean\",\n                    \"fa_icon\": \"fas fa-fast-forward\",\n                    \"help_text\": \"Turns off adapter trimming and paired-end read merging. 
Equivalent to setting both `--skip_collapse` and `--skip_trim`.\"\n                },\n                \"skip_preseq\": {\n                    \"type\": \"boolean\",\n                    \"fa_icon\": \"fas fa-fast-forward\",\n                    \"help_text\": \"Turns off the computation of library complexity estimation.\"\n                },\n                \"skip_deduplication\": {\n                    \"type\": \"boolean\",\n                    \"fa_icon\": \"fas fa-fast-forward\",\n                    \"help_text\": \"Turns off duplicate removal methods DeDup and MarkDuplicates respectively. No duplicates will be removed on any data in the pipeline.\\n\"\n                },\n                \"skip_damage_calculation\": {\n                    \"type\": \"boolean\",\n                    \"fa_icon\": \"fas fa-fast-forward\",\n                    \"help_text\": \"Turns off the DamageProfiler module to compute DNA damage profiles.\\n\"\n                },\n                \"skip_qualimap\": {\n                    \"type\": \"boolean\",\n                    \"fa_icon\": \"fas fa-fast-forward\",\n                    \"help_text\": \"Turns off QualiMap and thus does not compute coverage and other mapping metrics.\\n\"\n                }\n            },\n            \"fa_icon\": \"fas fa-fast-forward\",\n            \"help_text\": \"Some of the steps in the pipeline can be executed optionally. If you specify\\nspecific steps to be skipped, there won't be any output related to these\\nmodules.\"\n        },\n        \"complexity_filtering\": {\n            \"title\": \"Complexity filtering\",\n            \"type\": \"object\",\n            \"description\": \"Processing of Illumina two-colour chemistry data.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"complexity_filter_poly_g\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on running poly-G removal on FASTQ files. 
Will only be performed on 2 colour chemistry machine sequenced libraries.\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Performs a poly-G tail removal step at the beginning of the pipeline using `fastp`, if turned on. This can be useful for trimming poly-G tails from short fragments sequenced on two-colour Illumina chemistry such as NextSeqs (where no-fluorescence is read as a G on two-colour chemistry), which can inflate reported GC content values.\\n\"\n                },\n                \"complexity_filter_poly_g_min\": {\n                    \"type\": \"integer\",\n                    \"default\": 10,\n                    \"description\": \"Specify the minimum poly-G length for clipping to be performed.\",\n                    \"fa_icon\": \"fas fa-ruler-horizontal\",\n                    \"help_text\": \"This option can be used to define the minimum length of a poly-G tail to begin low complexity trimming. By default, this is set to a value of `10` unless the user has chosen something specifically using this option.\\n\\n> Modifies fastp parameter: `--poly_g_min_len`\"\n                }\n            },\n            \"fa_icon\": \"fas fa-filter\",\n            \"help_text\": \"More details can be seen in the [fastp\\ndocumentation](https://github.com/OpenGene/fastp)\\n\\nIf using TSV input, this is performed per lane separately\"\n        },\n        \"read_merging_and_adapter_removal\": {\n            \"title\": \"Read merging and adapter removal\",\n            \"type\": \"object\",\n            \"description\": \"Options for adapter clipping and paired-end merging.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"clip_forward_adaptor\": {\n                    \"type\": \"string\",\n                    \"default\": \"AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC\",\n                    \"description\": \"Specify adapter sequence to be clipped off (forward strand).\",\n                    
\"fa_icon\": \"fas fa-cut\",\n                    \"help_text\": \"Defines the adapter sequence to be used for the forward read. By default, this is set to `'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'`.\\n\\n> Modifies AdapterRemoval parameter: `--adapter1`\"\n                },\n                \"clip_reverse_adaptor\": {\n                    \"type\": \"string\",\n                    \"default\": \"AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA\",\n                    \"description\": \"Specify adapter sequence to be clipped off (reverse strand).\",\n                    \"fa_icon\": \"fas fa-cut\",\n                    \"help_text\": \"Defines the adapter sequence to be used for the reverse read in paired end sequencing projects. This is set to `'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'` by default.\\n\\n> Modifies AdapterRemoval parameter: `--adapter2`\"\n                },\n                \"clip_adapters_list\": {\n                    \"type\": \"string\",\n                    \"description\": \"Path to AdapterRemoval adapter list file. Overrides `--clip_*_adaptor` parameters\",\n                    \"fa_icon\": \"fas fa-cut\",\n                    \"help_text\": \"Allows to supply a file with a list of adapter (combinations) to remove from all files. **Overrides** the `--clip_*_adaptor` parameters . First column represents forward strand, second column for reverse strand. You must supply all possibly combinations, one per line, and this list is applied to all files. 
See [AdapterRemoval documentation](https://adapterremoval.readthedocs.io/en/latest/manpage.html) for more information.\\n\\n> Modifies AdapterRemoval parameter: `--adapter-list`\"\n                },\n                \"clip_readlength\": {\n                    \"type\": \"integer\",\n                    \"default\": 30,\n                    \"description\": \"Specify minimum read length to be kept for downstream analysis.\",\n                    \"fa_icon\": \"fas fa-ruler\",\n                    \"help_text\": \"Defines the minimum read length that is required for reads after merging to be considered for downstream analysis. Default is `30`.\\n\\nNote that when you have a large percentage of very short reads in your library (< 20 bp), such as those retrieved with single-stranded library protocols, read length filtering at this step is not _always_ reliable for correct endogenous DNA calculation. When you have very few reads passing this length filter, it will artificially inflate your 'endogenous DNA' value by creating a very small denominator.\\n\\nIf you notice you have ultra short reads (< 20 bp), it is recommended to set this parameter to 0, and use `--bam_filter_minreadlength` instead, to filter out 'un-usable' short reads after mapping. A caveat, however, is that this will cause a very large increase in computational run time, because all reads in the library will be mapped.\\n\\n> Modifies AdapterRemoval parameter: `--minlength`\\n\"\n                },\n                \"clip_min_read_quality\": {\n                    \"type\": \"integer\",\n                    \"default\": 20,\n                    \"description\": \"Specify minimum base quality for trimming off bases.\",\n                    \"fa_icon\": \"fas fa-medal\",\n                    \"help_text\": \"Defines the minimum read quality per base that is required for a base to be kept. 
Individual bases at the ends of reads falling below this threshold will be clipped off. Default is set to `20`.\\n\\n> Modifies AdapterRemoval parameter: `--minquality`\"\n                },\n                \"min_adap_overlap\": {\n                    \"type\": \"integer\",\n                    \"default\": 1,\n                    \"description\": \"Specify minimum adapter overlap required for clipping.\",\n                    \"fa_icon\": \"fas fa-hands-helping\",\n                    \"help_text\": \"Specifies a minimum number of bases that overlap with the adapter sequence before adapters are trimmed from reads. Default is set to `1` base overlap.\\n\\n> Modifies AdapterRemoval parameter: `--minadapteroverlap`\"\n                },\n                \"skip_collapse\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Skip merging of forward and reverse reads, and turn on paired-end alignment for downstream mapping. Only applicable for paired-end libraries.\",\n                    \"fa_icon\": \"fas fa-fast-forward\",\n                    \"help_text\": \"Turns off the paired-end read merging.\\n\\nFor example:\\n\\n```bash\\n--skip_collapse  --input '*_{R1,R2}_*.fastq'\\n```\\n\\nIt is important to use the paired-end wildcard globbing as `--skip_collapse` can only be used on paired-end data!\\n\\n:warning: If you run this with `--clip_readlength` also set (as it is by default), you may end up removing single reads from either the pair1 or pair2 file. These will NOT be mapped when aligning with either `bwa` or `bowtie`, as both can only accept one (forward) or two (forward and reverse) FASTQs as input.\\n\\nAlso note that supplying this flag will then also cause downstream mapping steps to run in paired-end mode. 
This may be more suitable for modern data, or when you want to utilise mate-pair spatial information.\\n\\n> Modifies AdapterRemoval parameter: `--collapse`\"\n                },\n                \"skip_trim\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Skip adapter and quality trimming.\",\n                    \"fa_icon\": \"fas fa-fast-forward\",\n                    \"help_text\": \"Turns off adapter AND quality trimming.\\n\\nFor example:\\n\\n```bash\\n--skip_trim  --input '*.fastq'\\n```\\n\\n:warning: it is not possible to keep quality trimming (n or base quality) on,\\n_and_ skip adapter trimming.\\n\\n:warning: it is not possible to turn off one or the other of quality\\ntrimming or n trimming. i.e. --trimns --trimqualities are both given\\nor neither. However setting quality in `--clip_min_read_quality` to 0 would\\ntheoretically turn off base quality trimming.\\n\\n> Modifies AdapterRemoval parameters: `--trimns --trimqualities --adapter1 --adapter2`\"\n                },\n                \"preserve5p\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Skip quality base trimming (n, score, window) of 5 prime end.\",\n                    \"fa_icon\": \"fas fa-life-ring\",\n                    \"help_text\": \"Turns off quality based trimming at the 5p end of reads when any of the --trimns, --trimqualities, or --trimwindows options are used. Only 3p end of reads will be removed.\\n\\nThis also entirely disables quality based trimming of collapsed reads, since both ends of these are informative for PCR duplicate filtering. 
Described [here](https://github.com/MikkelSchubert/adapterremoval/issues/32#issuecomment-504758137).\\n\\n> Modifies AdapterRemoval parameter: `--preserve5p`\"\n                },\n                \"mergedonly\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Only use merged reads downstream (un-merged reads and singletons are discarded).\",\n                    \"fa_icon\": \"fas fa-handshake\",\n                    \"help_text\": \"Specify that only merged reads are sent downstream for analysis.\\n\\nSingletons (i.e. reads missing a pair), or un-merged reads (where there wasn't sufficient overlap) are discarded.\\n\\nYou may want to use this if you want to ensure that only the best quality reads are used for your analysis, but with the penalty of potentially losing still valid data (even if some reads have slightly lower quality). It is highly recommended when using `--dedupper 'dedup'` (see below).\"\n                },\n                \"qualitymax\": {\n                    \"type\": \"integer\",\n                    \"description\": \"Specify the maximum Phred score used in input FASTQ files\",\n                    \"help_text\": \"Specify the maximum Phred score of the quality field of FASTQ files. The quality-score range can vary depending on the machine and version (e.g. see diagram [here](https://en.wikipedia.org/wiki/FASTQ_format#Encoding)), and this allows you to increase from the default AdapterRemoval value of `41`.\\n\\n> Modifies AdapterRemoval parameter: `--qualitymax`\",\n                    \"default\": 41,\n                    \"fa_icon\": \"fas fa-arrow-up\"\n                },\n                \"run_post_ar_trimming\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on trimming of inline barcodes (i.e. 
internal barcodes after adapter removal)\",\n                    \"help_text\": \"In some cases, you may want to additionally trim reads in a FASTQ file after adapter removal.\\n\\nThis could be to remove short 'inline' or 'internal' barcodes that are ligated directly onto DNA molecules prior to ligation of adapters and indices (the former of which allow ultra-multiplexing and/or checks for barcode hopping).\\n\\nIn other cases, you may wish to already remove known high-frequency damage bases to allow stricter mapping.\\n\\nTurning on this module uses `fastp` to trim one or both ends of a merged read or, in cases where you have not collapsed your reads, of R1 and R2.\\n\"\n                },\n                \"post_ar_trim_front\": {\n                    \"type\": \"integer\",\n                    \"default\": 7,\n                    \"description\": \"Specify the number of bases to trim off the front of a merged read or R1\",\n                    \"help_text\": \"Specify the number of bases to trim off the start of a read in a merged- or forward read FASTQ file.\\n\\n> Modifies fastp parameters: `--trim_front1`\"\n                },\n                \"post_ar_trim_tail\": {\n                    \"type\": \"integer\",\n                    \"default\": 7,\n                    \"description\": \"Specify the number of bases to trim off the tail of a merged read or R1\",\n                    \"help_text\": \"Specify the number of bases to trim off the end of a read in a merged- or forward read FASTQ file.\\n\\n> Modifies fastp parameters: `--trim_tail1`\"\n                },\n                \"post_ar_trim_front2\": {\n                    \"type\": \"integer\",\n                    \"default\": 7,\n                    \"description\": \"Specify the number of bases to trim off the front of R2\",\n                    \"help_text\": \"Specify the number of bases to trim off the start of a read in an unmerged reverse read (R2) FASTQ file.\\n\\n> Modifies fastp parameters: 
`--trim_front2`\"\n                },\n                \"post_ar_trim_tail2\": {\n                    \"type\": \"integer\",\n                    \"default\": 7,\n                    \"description\": \"Specify the number of bases to trim off the tail of R2\",\n                    \"help_text\": \"Specify the number of bases to trim off the end of a read in an unmerged reverse read (R2) FASTQ file.\\n\\n> Modifies fastp parameters: `--trim_tail2`\"\n                }\n            },\n            \"fa_icon\": \"fas fa-cut\",\n            \"help_text\": \"These options handle various parts of adapter clipping and read merging steps.\\n\\nMore details can be seen in the [AdapterRemoval\\ndocumentation](https://adapterremoval.readthedocs.io/en/latest/)\\n\\nIf using TSV input, this is performed per lane separately.\\n\\n> :warning: `--skip_trim` will skip adapter clipping AND quality trimming\\n> (n, base quality). It is currently not possible to skip one or the other.\"\n        },\n        \"mapping\": {\n            \"title\": \"Read mapping to reference genome\",\n            \"type\": \"object\",\n            \"description\": \"Options for reference-genome mapping\",\n            \"default\": \"\",\n            \"properties\": {\n                \"mapper\": {\n                    \"title\": \"Mapper\",\n                    \"type\": \"string\",\n                    \"description\": \"Specify which mapper to use. Options: 'bwaaln', 'bwamem', 'circularmapper', 'bowtie2'.\",\n                    \"default\": \"bwaaln\",\n                    \"fa_icon\": \"fas fa-layer-group\",\n                    \"help_text\": \"Specify which mapping tool to use. Options are BWA aln (`'bwaaln'`), BWA mem (`'bwamem'`), circularmapper (`'circularmapper'`), or bowtie2 (`'bowtie2'`). BWA aln is the default and is highly suited for short-read ancient DNA. BWA mem can be quite useful for modern DNA, but is rarely used in projects for ancient DNA. 
CircularMapper enhances the mapping procedure for circular references, using the BWA algorithm but utilizing an extend-remap procedure (see Peltzer et al 2016, Genome Biology for details). Bowtie2 is similar to BWA aln, and has recently been suggested to provide slightly better results under certain conditions ([Poullet and Orlando 2020](https://doi.org/10.3389/fevo.2020.00105)), as well as providing extra functionality (such as FASTQ trimming). Default is 'bwaaln'.\\n\\nMore documentation can be seen for each tool under:\\n\\n- [BWA aln](http://bio-bwa.sourceforge.net/bwa.shtml#3)\\n- [BWA mem](http://bio-bwa.sourceforge.net/bwa.shtml#3)\\n- [CircularMapper](https://circularmapper.readthedocs.io/en/latest/contents/userguide.html)\\n- [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#command-line)\\n\",\n                    \"enum\": [\n                        \"bwaaln\",\n                        \"bwamem\",\n                        \"circularmapper\",\n                        \"bowtie2\"\n                    ]\n                },\n                \"bwaalnn\": {\n                    \"type\": \"number\",\n                    \"default\": 0.01,\n                    \"description\": \"Specify the -n parameter for BWA aln, i.e. amount of allowed mismatches in the alignment.\",\n                    \"fa_icon\": \"fas fa-sort-numeric-down\",\n                    \"help_text\": \"Configures the `bwa aln -n` parameter, defining how many mismatches are allowed in a read. By default set to `0.04` (following recommendations of [Schubert et al. 
(2012 _BMC Genomics_)](https://doi.org/10.1186/1471-2164-13-178)), if you're uncertain what to set check out [this](https://apeltzer.shinyapps.io/bwa-mismatches/) Shiny App for more information on how to set this parameter efficiently.\\n\\n> Modifies bwa aln parameter: `-n`\"\n                },\n                \"bwaalnk\": {\n                    \"type\": \"integer\",\n                    \"default\": 2,\n                    \"description\": \"Specify the -k parameter for BWA aln, i.e. maximum edit distance allowed in a seed.\",\n                    \"fa_icon\": \"fas fa-drafting-compass\",\n                    \"help_text\": \"Configures the `bwa aln -k` parameter for the seeding phase in the mapping algorithm. Default is set to `2`.\\n\\n> Modifies BWA aln parameter: `-k`\"\n                },\n                \"bwaalnl\": {\n                    \"type\": \"integer\",\n                    \"default\": 1024,\n                    \"description\": \"Specify the -l parameter for BWA aln i.e. the length of seeds to be used.\",\n                    \"fa_icon\": \"fas fa-ruler-horizontal\",\n                    \"help_text\": \"Configures the length of the seed used in `bwa aln -l`. Default is set to be 'turned off' at the recommendation of Schubert et al. ([2012 _BMC Genomics_](https://doi.org/10.1186/1471-2164-13-178)) for ancient DNA with `1024`.\\n\\nNote: Despite being recommended, turning off seeding can result in long runtimes!\\n\\n> Modifies BWA aln parameter: `-l`\\n\"\n                },\n                \"bwaalno\": {\n                    \"type\": \"integer\",\n                    \"default\": 2,\n                    \"fa_icon\": \"fas fa-people-arrows\",\n                    \"description\": \"Specify the -o parameter for BWA aln i.e. the number of gaps allowed.\",\n                    \"help_text\": \"Configures the number of gaps used in `bwa aln`. 
Default is set to `bwa` default.\\n\\n> Modifies BWA aln parameter: `-o`\\n\"\n                },\n                \"circularextension\": {\n                    \"type\": \"integer\",\n                    \"default\": 500,\n                    \"description\": \"Specify the number of bases to extend reference by (circularmapper only).\",\n                    \"fa_icon\": \"fas fa-external-link-alt\",\n                    \"help_text\": \"The number of bases to extend the reference genome with. By default this is set to `500` if not specified otherwise.\\n\\n> Modifies circulargenerator and realignsamfile parameter: `-e`\"\n                },\n                \"circulartarget\": {\n                    \"type\": \"string\",\n                    \"default\": \"MT\",\n                    \"description\": \"Specify the FASTA header of the target chromosome to extend (circularmapper only).\",\n                    \"fa_icon\": \"fas fa-bullseye\",\n                    \"help_text\": \"The chromosome in your FASTA reference that you'd like to be treated as circular. By default this is set to `MT` but can be configured to match any other chromosome.\\n\\n> Modifies circulargenerator parameter: `-s`\"\n                },\n                \"circularfilter\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on to remove reads that did not map to the circularised genome (circularmapper only).\",\n                    \"fa_icon\": \"fas fa-filter\",\n                    \"help_text\": \"If you want to filter out reads that don't map to a circular chromosome (and also non-circular chromosome headers) from the resulting BAM file, turn this on. 
By default this option is turned off.\\n> Modifies -f and -x parameters of CircularMapper's realignsamfile\\n\"\n                },\n                \"bt2_alignmode\": {\n                    \"type\": \"string\",\n                    \"default\": \"local\",\n                    \"description\": \"Specify the bowtie2 alignment mode. Options:  'local', 'end-to-end'.\",\n                    \"fa_icon\": \"fas fa-arrows-alt-h\",\n                    \"help_text\": \"The type of read alignment to use. Options are 'local' or 'end-to-end'. Local allows only partial alignment of read, with ends of reads possibly 'soft-clipped' (i.e. remain unaligned/ignored), if the soft-clipped alignment provides best alignment score. End-to-end requires all nucleotides to be aligned. Default is 'local', following [Cahill et al (2018)](https://doi.org/10.1093/molbev/msy018) and [Poullet and Orlando 2020](https://doi.org/10.3389/fevo.2020.00105).\\n\\n> Modifies Bowtie2 parameters: `--very-fast --fast --sensitive --very-sensitive --very-fast-local --fast-local --sensitive-local --very-sensitive-local`\",\n                    \"enum\": [\n                        \"local\",\n                        \"end-to-end\"\n                    ]\n                },\n                \"bt2_sensitivity\": {\n                    \"type\": \"string\",\n                    \"default\": \"sensitive\",\n                    \"description\": \"Specify the level of sensitivity for the bowtie2 alignment mode. Options: 'no-preset', 'very-fast', 'fast', 'sensitive', 'very-sensitive'.\",\n                    \"fa_icon\": \"fas fa-microscope\",\n                    \"help_text\": \"The Bowtie2 'preset' to use. Options: 'no-preset' 'very-fast', 'fast', 'sensitive', or 'very-sensitive'. These strings apply to both `--bt2_alignmode` options. See the Bowtie2 [manual](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#command-line) for actual settings. 
Default is 'sensitive' (following [Poullet and Orlando (2020)](https://doi.org/10.3389/fevo.2020.00105), when running damaged-data _without_ UDG treatment)\\n\\n> Modifies Bowtie2 parameters: `--very-fast --fast --sensitive --very-sensitive --very-fast-local --fast-local --sensitive-local --very-sensitive-local`\",\n                    \"enum\": [\n                        \"no-preset\",\n                        \"very-fast\",\n                        \"fast\",\n                        \"sensitive\",\n                        \"very-sensitive\"\n                    ]\n                },\n                \"bt2n\": {\n                    \"type\": \"integer\",\n                    \"description\": \"Specify the -N parameter for bowtie2 (mismatches in seed). This will override defaults from alignmode/sensitivity.\",\n                    \"fa_icon\": \"fas fa-sort-numeric-down\",\n                    \"help_text\": \"The number of mismatches allowed in the seed during seed-and-extend procedure of Bowtie2. This will override any values set with `--bt2_sensitivity`. Can either be 0 or 1. Default: 0 (i.e. use`--bt2_sensitivity` defaults).\\n\\n> Modifies Bowtie2 parameters: `-N`\",\n                    \"default\": 0\n                },\n                \"bt2l\": {\n                    \"type\": \"integer\",\n                    \"description\": \"Specify the -L parameter for bowtie2 (length of seed substrings). This will override defaults from alignmode/sensitivity.\",\n                    \"fa_icon\": \"fas fa-ruler-horizontal\",\n                    \"help_text\": \"The length of the seed sub-string to use during seeding. This will override any values set with `--bt2_sensitivity`. Default: 0 (i.e. 
use `--bt2_sensitivity` defaults: [20 for local and 22 for end-to-end](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#command-line)).\\n\\n> Modifies Bowtie2 parameters: `-L`\",\n                    \"default\": 0\n                },\n                \"bt2_trim5\": {\n                    \"type\": \"integer\",\n                    \"description\": \"Specify number of bases to trim off from 5' (left) end of read before alignment.\",\n                    \"fa_icon\": \"fas fa-cut\",\n                    \"help_text\": \"Number of bases to trim at the 5' (left) end of read prior to alignment. May be useful when left-over sequencing artefacts of in-line barcodes are present. Default: 0.\\n\\n> Modifies Bowtie2 parameters: `--trim5`\",\n                    \"default\": 0\n                },\n                \"bt2_trim3\": {\n                    \"type\": \"integer\",\n                    \"description\": \"Specify number of bases to trim off from 3' (right) end of read before alignment.\",\n                    \"fa_icon\": \"fas fa-cut\",\n                    \"help_text\": \"Number of bases to trim at the 3' (right) end of read prior to alignment. May be useful when left-over sequencing artefacts of in-line barcodes are present. Default: 0.\\n\\n> Modifies Bowtie2 parameters: `--trim3`\",\n                    \"default\": 0\n                },\n                \"bt2_maxins\": {\n                    \"type\": \"integer\",\n                    \"default\": 500,\n                    \"fa_icon\": \"fas fa-exchange-alt\",\n                    \"description\": \"Specify the maximum fragment length for Bowtie2 paired-end mapping mode only.\",\n                    \"help_text\": \"The maximum fragment length for valid paired-end alignments. Only for paired-end mapping (i.e.
unmerged), and therefore typically only useful for modern data.\\n\\n See [Bowtie2 documentation](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml) for more information.\\n\\n>  Modifies Bowtie2 parameters: `--maxins`\"\n                }\n            },\n            \"fa_icon\": \"fas fa-layer-group\",\n            \"help_text\": \"If using TSV input, mapping is performed at the library level, i.e. after lane merging.\\n\"\n        },\n        \"host_removal\": {\n            \"title\": \"Removal of Host-Mapped Reads\",\n            \"type\": \"object\",\n            \"description\": \"Options for production of host-read removed FASTQ files for privacy reasons.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"hostremoval_input_fastq\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on per-library creation pre-Adapter Removal FASTQ files without reads that mapped to reference (e.g. for public upload of privacy sensitive non-host data)\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Create pre-Adapter Removal FASTQ files without reads that mapped to reference (e.g. for public upload of privacy sensitive non-host data)\\n\"\n                },\n                \"hostremoval_mode\": {\n                    \"type\": \"string\",\n                    \"default\": \"remove\",\n                    \"description\": \"Host removal mode. Remove mapped reads completely from FASTQ (remove) or just mask mapped reads sequence by N (replace).\",\n                    \"fa_icon\": \"fas fa-mask\",\n                    \"help_text\": \"Read removal mode. 
Remove mapped reads completely (`'remove'`) or just replace the sequence of mapped reads with Ns (`'replace'`).\\n\\n> Modifies extract_map_reads.py parameter: `-m`\",\n                    \"enum\": [\n                        \"strip\",\n                        \"replace\",\n                        \"remove\"\n                    ]\n                }\n            },\n            \"fa_icon\": \"fas fa-user-shield\",\n            \"help_text\": \"These parameters are used for removing mapped reads from the original input\\nFASTQ files, usually in the context of uploading the original FASTQ files to a\\npublic read archive (NCBI SRA/EBI ENA/DDBJ SRA).\\n\\nThese flags will produce FASTQ files almost identical to your input files,\\nexcept that reads with the same read ID as one found in the mapped BAM file are\\neither removed or 'masked' (every base replaced with Ns).\\n\\nThis functionality allows other researchers who wish to re-use\\nyour data to apply their own adapter removal/read merging procedures, while\\nmaintaining anonymity for sample donors - for example with microbiome\\nresearch.\\n\\nIf using TSV input, stripping is performed per library, i.e.
after lane merging.\"\n        },\n        \"bam_filtering\": {\n            \"title\": \"BAM Filtering\",\n            \"type\": \"object\",\n            \"description\": \"Options for quality filtering and how to deal with off-target unmapped reads.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"run_bam_filtering\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on filtering of mapping quality, read lengths, or unmapped reads of BAM files.\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Turns on the bam filtering module for either mapping quality filtering or unmapped read treatment.\\n\"\n                },\n                \"bam_mapping_quality_threshold\": {\n                    \"type\": \"integer\",\n                    \"description\": \"Minimum mapping quality for reads filter.\",\n                    \"fa_icon\": \"fas fa-greater-than-equal\",\n                    \"help_text\": \"Specify a mapping quality threshold for mapped reads to be kept for downstream analysis. By default keeps all reads and is therefore set to `0` (basically doesn't filter anything).\\n\\n> Modifies samtools view parameter: `-q`\",\n                    \"default\": 0\n                },\n                \"bam_filter_minreadlength\": {\n                    \"type\": \"integer\",\n                    \"fa_icon\": \"fas fa-ruler-horizontal\",\n                    \"description\": \"Specify minimum read length to be kept after mapping.\",\n                    \"help_text\": \"Specify minimum length of mapped reads. This filtering will apply at the same time as mapping quality filtering.\\n\\nIf used _instead_ of minimum length read filtering at AdapterRemoval, this can be useful to get more realistic endogenous DNA percentages, when most of your reads are very short (e.g. 
in single-stranded libraries) and would otherwise be discarded by AdapterRemoval (thus making an artificially small denominator for a typical endogenous DNA calculation). Note that in this context you should perform neither mapping quality filtering nor discarding of unmapped reads, to ensure a correct denominator of all reads for the endogenous DNA calculation.\\n\\n> Modifies filter_bam_fragment_length.py parameter: `-l`\",\n                    \"default\": 0\n                },\n                \"bam_unmapped_type\": {\n                    \"type\": \"string\",\n                    \"default\": \"discard\",\n                    \"description\": \"Defines whether to discard all unmapped reads, keep only bam and/or keep only fastq format. Options: 'discard', 'keep', 'bam', 'fastq', 'both'.\",\n                    \"fa_icon\": \"fas fa-trash-alt\",\n                    \"help_text\": \"Defines how to proceed with unmapped reads: `'discard'` removes all unmapped reads, `'keep'` keeps both unmapped and mapped reads in the same BAM file, `'bam'` keeps unmapped reads as BAM file, `'fastq'` keeps unmapped reads as FastQ file, `'both'` keeps both BAM and FASTQ files. Default is `discard`. `keep` is what would happen if `--run_bam_filtering` was _not_ supplied.\\n\\nNote that in all cases, if `--bam_mapping_quality_threshold` is also supplied, mapping quality filtering will still occur on the mapped reads.\\n\\n> Modifies samtools view parameter: `-f4 -F4`\",\n                    \"enum\": [\n                        \"discard\",\n                        \"keep\",\n                        \"bam\",\n                        \"fastq\",\n                        \"both\"\n                    ]\n                }\n            },\n            \"fa_icon\": \"fas fa-sort-amount-down\",\n            \"help_text\": \"Users can configure the nf-core/eager pipeline to keep, discard or extract\\ncertain groups of reads efficiently.\\n\\nIf using TSV input, filtering is performed per library, i.e.
after lane merging.\\n\\nThis module utilises `samtools view` and `filter_bam_fragment_length.py`\"\n        },\n        \"deduplication\": {\n            \"title\": \"DeDuplication\",\n            \"type\": \"object\",\n            \"description\": \"Options for removal of PCR amplicon duplicates that can artificially inflate coverage.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"dedupper\": {\n                    \"type\": \"string\",\n                    \"default\": \"markduplicates\",\n                    \"description\": \"Deduplication method to use. Options: 'markduplicates',  'dedup'.\",\n                    \"fa_icon\": \"fas fa-object-group\",\n                    \"help_text\": \"Sets the duplicate read removal tool. By default uses `markduplicates` from Picard. Alternatively an ancient DNA specific read deduplication tool `dedup` ([Peltzer et al. 2016](http://dx.doi.org/10.1186/s13059-016-0918-z)) is offered.\\n\\nThis utilises both ends of paired-end data to remove duplicates (i.e. true exact duplicates, as markduplicates will over-zealously deduplicate anything with the same starting position even if the ends are different). DeDup should generally only be used solely on paired-end data otherwise suboptimal deduplication can occur if applied to either single-end or a mix of single-end/paired-end data.\\n\",\n                    \"enum\": [\n                        \"markduplicates\",\n                        \"dedup\"\n                    ]\n                },\n                \"dedup_all_merged\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on treating all reads as merged reads.\",\n                    \"fa_icon\": \"fas fa-handshake\",\n                    \"help_text\": \"Sets DeDup to treat all reads as merged reads. This is useful if reads are for example not prefixed with `M_` in all cases. 
Therefore, this can be used as a workaround when also using a mixture of paired-end and single-end data, however this is not recommended (see above).\\n\\n> Modifies dedup parameter: `-m`\"\n                }\n            },\n            \"fa_icon\": \"fas fa-clone\",\n            \"help_text\": \"If using TSV input, deduplication is performed per library, i.e. after lane merging.\"\n        },\n        \"library_complexity_analysis\": {\n            \"title\": \"Library Complexity Analysis\",\n            \"type\": \"object\",\n            \"description\": \"Options for calculating library complexity (i.e. how many unique reads are present).\",\n            \"default\": \"\",\n            \"properties\": {\n                \"preseq_mode\": {\n                    \"type\": \"string\",\n                    \"default\": \"c_curve\",\n                    \"description\": \"Specify which mode of preseq to run.\",\n                    \"fa_icon\": \"fas fa-toggle-on\",\n                    \"help_text\": \"Specify which mode of preseq to run.\\n\\nFrom the [PreSeq documentation](http://smithlabresearch.org/wp-content/uploads/manual.pdf): \\n\\n`c curve` is used to compute the expected complexity curve of a mapped read file with a hypergeometric\\nformula\\n\\n`lc extrap` is used to generate the expected yield for theoretical larger experiments and bounds on the\\nnumber of distinct reads in the library and the associated confidence intervals, which is computed by\\nbootstrapping the observed duplicate counts histogram\",\n                    \"enum\": [\n                        \"c_curve\",\n                        \"lc_extrap\"\n                    ]\n                },\n                \"preseq_step_size\": {\n                    \"type\": \"integer\",\n                    \"default\": 1000,\n                    \"description\": \"Specify the step size of Preseq.\",\n                    \"fa_icon\": \"fas fa-shoe-prints\",\n                    \"help_text\": \"Can be 
used to configure the step size of Preseq's `c_curve` and `lc_extrap` method. Can be useful when only few and thus shallow sequencing results are used for extrapolation.\\n\\n> Modifies preseq c_curve and lc_extrap parameter: `-s`\"\n                },\n                \"preseq_maxextrap\": {\n                    \"type\": \"integer\",\n                    \"default\": 10000000000,\n                    \"description\": \"Specify the maximum extrapolation (lc_extrap mode only)\",\n                    \"fa_icon\": \"fas fa-ban\",\n                    \"help_text\": \"Specify the maximum extrapolation that `lc_extrap` mode will perform.\\n\\n> Modifies preseq lc_extrap parameter: `-e`\"\n                },\n                \"preseq_terms\": {\n                    \"type\": \"integer\",\n                    \"default\": 100,\n                    \"description\": \"Specify the maximum number of terms for extrapolation (lc_extrap mode only)\",\n                    \"fa_icon\": \"fas fa-sort-numeric-up-alt\",\n                    \"help_text\": \"Specify the maximum number of terms that `lc_extrap` mode will use.\\n\\n> Modifies preseq lc_extrap parameter: `-x`\"\n                },\n                \"preseq_bootstrap\": {\n                    \"type\": \"integer\",\n                    \"default\": 100,\n                    \"description\": \"Specify number of bootstraps to perform (lc_extrap mode only)\",\n                    \"fa_icon\": \"fab fa-bootstrap\",\n                    \"help_text\": \"Specify the number of bootstraps `lc_extrap` mode will perform to calculate confidence intervals.\\n\\n> Modifies preseq lc_extrap parameter: `-n`\"\n                },\n                \"preseq_cval\": {\n                    \"type\": \"number\",\n                    \"default\": 0.95,\n                    \"description\": \"Specify confidence interval level (lc_extrap mode only)\",\n                    \"fa_icon\": \"fas fa-check-circle\",\n                    \"help_text\": 
\"Specify the allowed level of confidence intervals used for `lc_extrap` mode.\\n\\n> Modifies preseq lc_extrap parameter: `-c`\"\n                }\n            },\n            \"fa_icon\": \"fas fa-bezier-curve\",\n            \"help_text\": \"nf-core/eager uses Preseq on mapped reads as one method to calculate library\\ncomplexity. If DeDup is used, Preseq uses the histogram output of DeDup,\\notherwise the sorted non-duplicated BAM file is supplied. Furthermore, if\\npaired-end read collapsing is not performed, the `-P` flag is used.\"\n        },\n        \"adna_damage_analysis\": {\n            \"title\": \"(aDNA) Damage Analysis\",\n            \"type\": \"object\",\n            \"description\": \"Options for calculating and filtering for characteristic ancient DNA damage patterns.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"damage_calculation_tool\": {\n                    \"type\": \"string\",\n                    \"default\": \"damageprofiler\",\n                    \"description\": \"Specify the tool to use for damage calculation.\",\n                    \"fa_icon\": \"fas fa-tools\",\n                    \"help_text\": \"Specify the tool to be used for damage calculation. DamageProfiler is generally faster than mapDamage2, but the latter has an option to limit the number of reads used. This can significantly speed up the processing of very large files, where the damage estimates are already accurate after processing only a fraction of the input. Options: `damageprofiler`, `mapdamage`. 
By default, DamageProfiler is used.\",\n                    \"enum\": [\n                        \"damageprofiler\",\n                        \"mapdamage\"\n                    ]\n                },\n                \"damageprofiler_length\": {\n                    \"type\": \"integer\",\n                    \"default\": 100,\n                    \"description\": \"Specify length filter for DamageProfiler.\",\n                    \"fa_icon\": \"fas fa-sort-amount-up\",\n                    \"help_text\": \"Specifies the length filter for DamageProfiler. By default set to `100`.\\n\\n> Modifies DamageProfiler parameter: `-l`\"\n                },\n                \"damageprofiler_threshold\": {\n                    \"type\": \"integer\",\n                    \"default\": 15,\n                    \"description\": \"Specify number of bases of each read to consider for DamageProfiler calculations.\",\n                    \"fa_icon\": \"fas fa-ruler-horizontal\",\n                    \"help_text\": \"Specifies the length of the read start and end to be considered for profile generation in DamageProfiler. By default set to `15` bases.\\n\\n> Modifies DamageProfiler parameter: `-t`\"\n                },\n                \"damageprofiler_yaxis\": {\n                    \"type\": \"number\",\n                    \"default\": 0.3,\n                    \"description\": \"Specify the maximum misincorporation frequency that should be displayed on the damage plot. Set to 0 to 'autoscale'.\",\n                    \"fa_icon\": \"fas fa-ruler-vertical\",\n                    \"help_text\": \"Specifies what the maximum misincorporation frequency should be displayed as, in the DamageProfiler damage plot. This is set to `0.30` (i.e. 30%) by default as this matches the popular [mapDamage2.0](https://ginolhac.github.io/mapDamage) program. However, the default behaviour of DamageProfiler is to 'autoscale' the y-axis maximum to zoom in on any _possible_ damage that may occur (e.g.
if the damage is about 10%, the highest value on the y-axis would be set to 0.12). This 'autoscale' behaviour can be turned on by setting the value to `0`. Default: `0.30`.\\n\\n> Modifies DamageProfiler parameter: `-yaxis_damageplot`\"\n                },\n                \"mapdamage_downsample\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"description\": \"Specify the maximum number of reads to consider for damage calculation. Default value is `0` (i.e. no downsampling is performed).\",\n                    \"fa_icon\": \"fas fa-greater-than-equal\",\n                    \"help_text\": \"The maximum number of reads used for damage calculation in mapDamage2. Can be used to significantly reduce the amount of time required for damage assessment. Note that too low a value can produce incorrect results.\\n\\n> Modifies mapDamage2 parameter: `-n`\"\n                },\n                \"mapdamage_yaxis\": {\n                    \"type\": \"number\",\n                    \"default\": 0.3,\n                    \"description\": \"Specify the maximum misincorporation frequency that should be displayed on the damage plot.\",\n                    \"fa_icon\": \"fas fa-ruler-vertical\",\n                    \"help_text\": \"Specifies what the maximum misincorporation frequency should be displayed as, in the mapDamage2 damage plot. This defaults to `0.30` (i.e. 30%).\\n\\n> Modifies mapDamage2 parameter: `-y`\"\n                },\n                \"run_pmdtools\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on PMDtools\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Specifies to run PMDTools for damage based read filtering and assessment of DNA damage in sequencing libraries.
By default turned off.\\n\"\n                },\n                \"pmdtools_range\": {\n                    \"type\": \"integer\",\n                    \"default\": 10,\n                    \"description\": \"Specify range of bases for PMDTools to scan for damage.\",\n                    \"fa_icon\": \"fas fa-arrows-alt-h\",\n                    \"help_text\": \"Specifies the range in which to consider DNA damage from the ends of reads. By default set to `10`.\\n\\n> Modifies PMDTools parameter: `--range`\"\n                },\n                \"pmdtools_threshold\": {\n                    \"type\": \"integer\",\n                    \"default\": 3,\n                    \"description\": \"Specify PMDScore threshold for PMDTools.\",\n                    \"fa_icon\": \"fas fa-chart-bar\",\n                    \"help_text\": \"Specifies the PMDScore threshold to use in the pipeline when filtering BAM files for DNA damage. Only reads which surpass this damage score are considered for downstream DNA analysis. By default set to `3` if not set specifically by the user.\\n\\n> Modifies PMDTools parameter: `--threshold`\"\n                },\n                \"pmdtools_reference_mask\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify a bedfile to be used to mask the reference fasta prior to running pmdtools.\",\n                    \"fa_icon\": \"fas fa-mask\",\n                    \"help_text\": \"Activates masking of the reference fasta prior to running pmdtools. Positions that are in the provided bedfile will be replaced by Ns in the reference genome. This is useful for capture data, where you might not want the allele of a SNP to be counted as damage when it is a transition. 
Masking of the reference is done using `bedtools maskfasta`.\"\n                },\n                \"pmdtools_max_reads\": {\n                    \"type\": \"integer\",\n                    \"default\": 10000,\n                    \"description\": \"Specify the maximum number of reads to consider for metrics generation.\",\n                    \"fa_icon\": \"fas fa-greater-than-equal\",\n                    \"help_text\": \"The maximum number of reads used for damage assessment in PMDtools. Can be used to significantly reduce the amount of time required for damage assessment in PMDTools. Note that too low a value can produce incorrect results.\\n\\n> Modifies PMDTools parameter: `-n`\"\n                },\n                \"pmdtools_platypus\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Append big list of base frequencies for platypus to output.\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Enables the printing of a wider list of base frequencies used by platypus as an addition to the output base misincorporation frequency table. By default turned off.\\n\"\n                },\n                \"run_mapdamage_rescaling\": {\n                    \"type\": \"boolean\",\n                    \"fa_icon\": \"fas fa-map\",\n                    \"description\": \"Turn on damage rescaling of BAM files using mapDamage2 to probabilistically remove damage.\",\n                    \"help_text\": \"Turns on mapDamage2's BAM rescaling functionality. This probabilistically replaces Ts back to Cs depending on the likelihood this reference-mismatch was originally caused by damage.
If the library is specified to be single stranded, this will automatically use the `--single-stranded` mode.\\n\\nThis functionality does not have any MultiQC output.\\n\\n:warning: rescaled libraries will not be merged with non-scaled libraries of the same sample for downstream genotyping, as the model may be different for each library. If you wish to merge these, please do this manually and re-run nf-core/eager using the merged BAMs as input. \\n\\n> Modifies the `--rescale` parameter of mapDamage2\"\n                },\n                \"rescale_seqlength\": {\n                    \"type\": \"integer\",\n                    \"default\": 12,\n                    \"fa_icon\": \"fas fa-ruler-horizontal\",\n                    \"description\": \"Length of read sequence to use from each side for rescaling. Can be overridden by --rescale_length_*p.\",\n                    \"help_text\": \"Specify the length from the end of the read that mapDamage should rescale at both ends.\\n\\n> Modifies the `--seq-length` parameter of mapDamage2.\"\n                },\n                \"rescale_length_5p\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"fa_icon\": \"fas fa-balance-scale-right\",\n                    \"description\": \"Length of read for mapDamage2 to rescale from 5p end.  Only used if not 0 otherwise --rescale_seqlength used.\",\n                    \"help_text\": \"Specify the length from the end of the read that mapDamage should rescale. Overrides `--rescale_seqlength`.\\n\\n> Modifies the `--rescale-length-5p` parameter of mapDamage2.\"\n                },\n                \"rescale_length_3p\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"fa_icon\": \"fas fa-balance-scale-left\",\n                    \"description\": \"Length of read for mapDamage2 to rescale from 3p end. 
Only used if not 0 otherwise --rescale_seqlength used.\",\n                    \"help_text\": \"Specify the length from the end of the read that mapDamage should rescale.\\n\\n> Modifies the `--rescale-length-3p` parameter of mapDamage2.\"\n                }\n            },\n            \"fa_icon\": \"fas fa-chart-line\",\n            \"help_text\": \"More documentation can be seen in the following links for:\\n\\n- [DamageProfiler](https://github.com/Integrative-Transcriptomics/DamageProfiler)\\n- [PMDTools documentation](https://github.com/pontussk/PMDtools)\\n\\nIf using TSV input, DamageProfiler is performed per library, i.e. after lane\\nmerging. PMDtools and BAM Trimming are run after library merging of same-named\\nlibrary BAMs that have the same type of UDG treatment. BAM Trimming is only\\nperformed on non-UDG and half-UDG treated data.\\n\"\n        },\n        \"feature_annotation_statistics\": {\n            \"title\": \"Feature Annotation Statistics\",\n            \"type\": \"object\",\n            \"description\": \"Options for getting reference annotation statistics (e.g. gene coverages)\",\n            \"default\": \"\",\n            \"properties\": {\n                \"run_bedtools_coverage\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on ability to calculate no. reads, depth and breadth coverage of features in reference.\",\n                    \"fa_icon\": \"fas fa-chart-area\",\n                    \"help_text\": \"Specifies to turn on the bedtools module, producing statistics for breadth (or percent coverage), and depth (or X fold) coverages.\\n\"\n                },\n                \"anno_file\": {\n                    \"type\": \"string\",\n                    \"description\": \"Path to GFF or BED file containing positions of features in reference file (--fasta).
Path should be enclosed in quotes.\",\n                    \"fa_icon\": \"fas fa-file-signature\",\n                    \"help_text\": \"Specify the path to a GFF/BED containing the feature coordinates (or any acceptable input for [`bedtools coverage`](https://bedtools.readthedocs.io/en/latest/content/tools/coverage.html)). Must be in quotes.\\n\"\n                },\n                \"anno_file_is_unsorted\": {\n                    \"type\": \"boolean\",\n                    \"fa_icon\": \"fas fa-random\",\n                    \"description\": \"Specify if the annotation file provided to --anno_file is not sorted in the same way as the reference fasta file.\",\n                    \"help_text\": \"In cases where the annotation file is NOT sorted the same way as the reference fasta, this option should be specified. This will significantly increase the memory usage of bedtools!\\n\\n> Modifies bedtools parameter: `-sorted`\"\n                }\n            },\n            \"fa_icon\": \"fas fa-scroll\",\n            \"help_text\": \"If you're interested in looking at coverage stats for certain features on your\\nreference such as genes, SNPs etc., you can use the following bedtools module\\nfor this purpose.\\n\\nMore documentation on bedtools can be seen in the [bedtools\\ndocumentation](https://bedtools.readthedocs.io/en/latest/)\\n\\nIf using TSV input, bedtools is run after library merging of same-named library\\nBAMs that have the same type of UDG treatment.\\n\"\n        },\n        \"bam_trimming\": {\n            \"title\": \"BAM Trimming\",\n            \"type\": \"object\",\n            \"description\": \"Options for trimming of aligned reads (e.g. to remove damage prior genotyping).\",\n            \"default\": \"\",\n            \"properties\": {\n                \"run_trim_bam\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on BAM trimming. 
Will only run on non-UDG or half-UDG libraries\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Turns on the BAM trimming method. Trims off `[n]` bases from reads in the deduplicated BAM file. Damage assessment in PMDTools or DamageProfiler remains untouched, as data is routed through this independently. BAM trimming is typically performed to reduce errors during genotyping that can be caused by aDNA damage.\\n\\nBAM trimming will only be performed on libraries indicated as `--udg_type 'none'` or `--udg_type 'half'`. Complete UDG treatment ('full') should have removed all damage. The amount of bases that will be trimmed off can be set separately for libraries with `--udg_type` `'none'` and `'half'`  (see `--bamutils_clip_half_udg_left` / `--bamutils_clip_half_udg_right` / `--bamutils_clip_none_udg_left` / `--bamutils_clip_none_udg_right`).\\n\\n> Note: additional artefacts such as bar-codes or adapters that could potentially also be trimmed should be removed prior mapping.\"\n                },\n                \"bamutils_clip_double_stranded_half_udg_left\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"fa_icon\": \"fas fa-ruler-combined\",\n                    \"description\": \"Specify the number of bases to clip off reads from 'left' end of read for double-stranded half-UDG libraries.\",\n                    \"help_text\": \"Default set to `0` and clips off no bases on the left or right side of reads from double_stranded libraries whose UDG treatment is set to `half`. 
Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).\\n\\n> Modifies bamUtil's trimBam parameter: `-L -R`\"\n                },\n                \"bamutils_clip_double_stranded_half_udg_right\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"fa_icon\": \"fas fa-ruler\",\n                    \"description\": \"Specify the number of bases to clip off reads from 'right' end of read for double-stranded half-UDG libraries.\",\n                    \"help_text\": \"Default set to `0` and clips off no bases on the left or right side of reads from double_stranded libraries whose UDG treatment is set to `half`. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).\\n\\n> Modifies bamUtil's trimBam parameter: `-L -R`\"\n                },\n                \"bamutils_clip_double_stranded_none_udg_left\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"fa_icon\": \"fas fa-ruler-combined\",\n                    \"description\": \"Specify the number of bases to clip off reads from 'left' end of read for double-stranded non-UDG libraries.\",\n                    \"help_text\": \"Default set to `0` and clips off no bases on the left or right side of reads from double_stranded libraries whose UDG treatment is set to `none`. 
Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).\\n\\n> Modifies bamUtil's trimBam parameter: `-L -R`\"\n                },\n                \"bamutils_clip_double_stranded_none_udg_right\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"fa_icon\": \"fas fa-ruler\",\n                    \"description\": \"Specify the number of bases to clip off reads from 'right' end of read for double-stranded non-UDG libraries.\",\n                    \"help_text\": \"Default set to `0` and clips off no bases on the left or right side of reads from double_stranded libraries whose UDG treatment is set to `none`. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).\\n\\n> Modifies bamUtil's trimBam parameter: `-L -R`\"\n                },\n                \"bamutils_clip_single_stranded_half_udg_left\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"fa_icon\": \"fas fa-ruler-combined\",\n                    \"description\": \"Specify the number of bases to clip off reads from 'left' end of read for single-stranded half-UDG libraries.\",\n                    \"help_text\": \"Default set to `0` and clips off no bases on the left or right side of reads from single-stranded libraries whose UDG treatment is set to `half`. 
Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).\\n\\n> Modifies bamUtil's trimBam parameter: `-L -R`\"\n                },\n                \"bamutils_clip_single_stranded_half_udg_right\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"fa_icon\": \"fas fa-ruler\",\n                    \"description\": \"Specify the number of bases to clip off reads from 'right' end of read for single-stranded half-UDG libraries.\",\n                    \"help_text\": \"Default set to `0` and clips off no bases on the left or right side of reads from single-stranded libraries whose UDG treatment is set to `half`. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).\\n\\n> Modifies bamUtil's trimBam parameter: `-L -R`\"\n                },\n                \"bamutils_clip_single_stranded_none_udg_left\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"fa_icon\": \"fas fa-ruler-combined\",\n                    \"description\": \"Specify the number of bases to clip off reads from 'left' end of read for single-stranded non-UDG libraries.\",\n                    \"help_text\": \"Default set to `0` and clips off no bases on the left or right side of reads from single-stranded libraries whose UDG treatment is set to `none`. 
Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).\\n\\n> Modifies bamUtil's trimBam parameter: `-L -R`\"\n                },\n                \"bamutils_clip_single_stranded_none_udg_right\": {\n                    \"type\": \"integer\",\n                    \"default\": 0,\n                    \"fa_icon\": \"fas fa-ruler\",\n                    \"description\": \"Specify the number of bases to clip off reads from 'right' end of read for single-stranded non-UDG libraries.\",\n                    \"help_text\": \"Default set to `0` and clips off no bases on the left or right side of reads from single-stranded libraries whose UDG treatment is set to `none`. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).\\n\\n> Modifies bamUtil's trimBam parameter: `-L -R`\"\n                },\n                \"bamutils_softclip\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on using softclip instead of hard masking.\",\n                    \"fa_icon\": \"fas fa-paint-roller\",\n                    \"help_text\": \"By default, nf-core/eager uses hard clipping and sets clipped bases to `N` with quality `!` in the BAM output. Turn this on to use soft-clipping instead, masking reads at the read ends respectively using the CIGAR string.\\n\\n> Modifies bamUtil's trimBam parameter: `-c`\"\n                }\n            },\n            \"fa_icon\": \"fas fa-eraser\",\n            \"help_text\": \"For some library preparation protocols, users might want to clip off damaged\\nbases before applying genotyping methods.
This can be done in nf-core/eager\\nautomatically by turning on the `--run_trim_bam` parameter.\\n\\nMore documentation can be seen in the [bamUtil\\ndocumentation](https://genome.sph.umich.edu/wiki/BamUtil:_trimBam)\\n\"\n        },\n        \"genotyping\": {\n            \"title\": \"Genotyping\",\n            \"type\": \"object\",\n            \"description\": \"Options for variant calling.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"run_genotyping\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on genotyping of BAM files.\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Turns on genotyping to run on all post-dedup and downstream BAMs. For example if `--run_pmdtools` and `--run_trim_bam` are both supplied, the genotyper will be run on all three BAM files i.e. post-deduplication, post-pmd and post-trimmed BAM files.\"\n                },\n                \"genotyping_tool\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify which genotyper to use: GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes, pileupCaller, or ANGSD. Options: 'ug', 'hc', 'freebayes', 'pileupcaller', 'angsd'.\",\n                    \"fa_icon\": \"fas fa-tools\",\n                    \"help_text\": \"Specifies which genotyper to use. Current options are: GATK (v3.5) UnifiedGenotyper, GATK HaplotypeCaller (v4), FreeBayes, sequenceTools pileupCaller, and ANGSD.
Specify 'ug', 'hc', 'freebayes', 'pileupcaller' and 'angsd' respectively.\\n\\n> Note that while UnifiedGenotyper is more suitable for low-coverage ancient DNA (HaplotypeCaller does _de novo_ assembly around each variant site), be aware that GATK 3.5 is officially deprecated by the Broad Institute.\",\n                    \"enum\": [\n                        \"ug\",\n                        \"hc\",\n                        \"freebayes\",\n                        \"pileupcaller\",\n                        \"angsd\"\n                    ]\n                },\n                \"genotyping_source\": {\n                    \"type\": \"string\",\n                    \"default\": \"raw\",\n                    \"description\": \"Specify which input BAM to use for genotyping. Options: 'raw', 'trimmed', 'pmd' or 'rescaled'.\",\n                    \"fa_icon\": \"fas fa-faucet\",\n                    \"help_text\": \"Indicates which BAM file to use for genotyping, depending on what BAM processing modules you have turned on. Options are: `'raw'` for mapped only, filtered, or DeDup BAMs (with priority right to left); `'trimmed'` (for base clipped BAMs); `'pmd'` (for pmdtools output); `'rescaled'` (for mapDamage2 rescaling output). Default is: `'raw'`.\\n\",\n                    \"enum\": [\n                        \"raw\",\n                        \"pmd\",\n                        \"trimmed\",\n                        \"rescaled\"\n                    ]\n                },\n                \"gatk_call_conf\": {\n                    \"type\": \"integer\",\n                    \"default\": 30,\n                    \"description\": \"Specify GATK phred-scaled confidence threshold.\",\n                    \"fa_icon\": \"fas fa-balance-scale-right\",\n                    \"help_text\": \"If selected, specify a GATK genotyper phred-scaled confidence threshold of a given SNP/INDEL call.
Default: `30`\\n\\n> Modifies GATK UnifiedGenotyper or HaplotypeCaller parameter: `-stand_call_conf`\"\n                },\n                \"gatk_ploidy\": {\n                    \"type\": \"integer\",\n                    \"default\": 2,\n                    \"description\": \"Specify GATK organism ploidy.\",\n                    \"fa_icon\": \"fas fa-pastafarianism\",\n                    \"help_text\": \"If selected, specify a GATK genotyper ploidy value of your reference organism. E.g. if you want to allow heterozygous calls from >= diploid organisms. Default: `2`\\n\\n> Modifies GATK UnifiedGenotyper or HaplotypeCaller parameter: `--sample-ploidy`\"\n                },\n                \"gatk_downsample\": {\n                    \"type\": \"integer\",\n                    \"default\": 250,\n                    \"description\": \"Maximum depth coverage allowed for genotyping before down-sampling is turned on.\",\n                    \"fa_icon\": \"fas fa-icicles\",\n                    \"help_text\": \"Maximum depth coverage allowed for genotyping before down-sampling is turned on. Any position with a coverage higher than this value will be randomly down-sampled to 250 reads. Default: `250`\\n\\n> Modifies GATK UnifiedGenotyper parameter: `-dcov`\"\n                },\n                \"gatk_dbsnp\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify VCF file for SNP annotation of output VCF files. Optional. Gzip not accepted.\",\n                    \"fa_icon\": \"fas fa-marker\",\n                    \"help_text\": \"(Optional) Specify VCF file for output VCF SNP annotation e.g. if you want to annotate your VCF file with 'rs' SNP IDs. Check GATK documentation for more information. 
Gzip not accepted.\\n\"\n                },\n                \"gatk_hc_out_mode\": {\n                    \"type\": \"string\",\n                    \"default\": \"EMIT_VARIANTS_ONLY\",\n                    \"description\": \"Specify GATK output mode. Options: 'EMIT_VARIANTS_ONLY', 'EMIT_ALL_CONFIDENT_SITES', 'EMIT_ALL_ACTIVE_SITES'.\",\n                    \"fa_icon\": \"fas fa-bullhorn\",\n                    \"help_text\": \"If the GATK genotyper HaplotypeCaller is selected, what type of VCF to create, i.e. produce calls for every site or just confident sites. Options: `'EMIT_VARIANTS_ONLY'`, `'EMIT_ALL_CONFIDENT_SITES'`, `'EMIT_ALL_ACTIVE_SITES'`. Default: `'EMIT_VARIANTS_ONLY'`\\n\\n> Modifies GATK HaplotypeCaller parameter: `-output_mode`\",\n                    \"enum\": [\n                        \"EMIT_ALL_ACTIVE_SITES\",\n                        \"EMIT_ALL_CONFIDENT_SITES\",\n                        \"EMIT_VARIANTS_ONLY\"\n                    ]\n                },\n                \"gatk_hc_emitrefconf\": {\n                    \"type\": \"string\",\n                    \"default\": \"GVCF\",\n                    \"description\": \"Specify HaplotypeCaller mode for emitting reference confidence calls. Options: 'NONE', 'BP_RESOLUTION', 'GVCF'.\",\n                    \"fa_icon\": \"fas fa-bullhorn\",\n                    \"help_text\": \"If the GATK HaplotypeCaller is selected, mode for emitting reference confidence calls. Options: `'NONE'`, `'BP_RESOLUTION'`, `'GVCF'`. Default: `'GVCF'`\\n\\n> Modifies GATK HaplotypeCaller parameter: `--emit-ref-confidence`\\n\",\n                    \"enum\": [\n                        \"NONE\",\n                        \"GVCF\",\n                        \"BP_RESOLUTION\"\n                    ]\n                },\n                \"gatk_ug_out_mode\": {\n                    \"type\": \"string\",\n                    \"default\": \"EMIT_VARIANTS_ONLY\",\n                    \"description\": \"Specify GATK output mode.
Options: 'EMIT_VARIANTS_ONLY', 'EMIT_ALL_CONFIDENT_SITES', 'EMIT_ALL_SITES'.\",\n                    \"fa_icon\": \"fas fa-bullhorn\",\n                    \"help_text\": \"If the GATK UnifiedGenotyper is selected, what type of VCF to create, i.e. produce calls for every site or just confident sites. Options: `'EMIT_VARIANTS_ONLY'`, `'EMIT_ALL_CONFIDENT_SITES'`, `'EMIT_ALL_SITES'`. Default: `'EMIT_VARIANTS_ONLY'`\\n\\n> Modifies GATK UnifiedGenotyper parameter: `--output_mode`\",\n                    \"enum\": [\n                        \"EMIT_ALL_SITES\",\n                        \"EMIT_ALL_CONFIDENT_SITES\",\n                        \"EMIT_VARIANTS_ONLY\"\n                    ]\n                },\n                \"gatk_ug_genotype_model\": {\n                    \"type\": \"string\",\n                    \"default\": \"SNP\",\n                    \"description\": \"Specify UnifiedGenotyper likelihood model. Options: 'SNP', 'INDEL', 'BOTH', 'GENERALPLOIDYSNP', 'GENERALPLOIDYINDEL'.\",\n                    \"fa_icon\": \"fas fa-project-diagram\",\n                    \"help_text\": \"If the GATK UnifiedGenotyper is selected, which likelihood model to follow, i.e. whether to call SNPs or INDELs etc. Options: `'SNP'`, `'INDEL'`, `'BOTH'`, `'GENERALPLOIDYSNP'`, `'GENERALPLOIDYINDEL'`.
Default: `'SNP'`\\n\\n> Modifies GATK UnifiedGenotyper parameter: `--genotype_likelihoods_model`\",\n                    \"enum\": [\n                        \"SNP\",\n                        \"INDEL\",\n                        \"BOTH\",\n                        \"GENERALPLOIDYSNP\",\n                        \"GENERALPLOIDYINDEL\"\n                    ]\n                },\n                \"gatk_ug_keep_realign_bam\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Specify to keep the BAM output of re-alignment around variants from GATK UnifiedGenotyper.\",\n                    \"fa_icon\": \"fas fa-align-left\",\n                    \"help_text\": \"If provided when running GATK's UnifiedGenotyper, this will put into the output folder the BAMs that have realigned reads (with GATK's (v3) IndelRealigner) around possible variants for improved genotyping.\\n\\nThese BAMs will be stored in the same folder as the corresponding VCF files.\"\n                },\n                \"gatk_ug_defaultbasequalities\": {\n                    \"type\": \"string\",\n                    \"description\": \"Supply a default base quality if a read is missing a base quality score. Setting to -1 turns this off.\",\n                    \"fa_icon\": \"fas fa-undo-alt\",\n                    \"help_text\": \"When running GATK's UnifiedGenotyper, specify a value to set base quality scores, if reads are missing this information. Might be useful if you have 'synthetically' generated reads (e.g. chopping up a reference genome). Default is set to -1, which does not set any default quality (turned off).
Default: `-1`\\n\\n> Modifies GATK UnifiedGenotyper parameter: `--defaultBaseQualities`\"\n                },\n                \"freebayes_C\": {\n                    \"type\": \"integer\",\n                    \"default\": 1,\n                    \"description\": \"Specify minimum required supporting observations to consider a variant.\",\n                    \"fa_icon\": \"fas fa-align-center\",\n                    \"help_text\": \"Specify minimum required supporting observations to consider a variant. Default: `1`\\n\\n> Modifies freebayes parameter: `-C`\"\n                },\n                \"freebayes_g\": {\n                    \"type\": \"integer\",\n                    \"description\": \"Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than specified in --freebayes_C.\",\n                    \"fa_icon\": \"fab fa-think-peaks\",\n                    \"help_text\": \"Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than specified C. Not set by default.\\n\\n> Modifies freebayes parameter: `-g`\",\n                    \"default\": 0\n                },\n                \"freebayes_p\": {\n                    \"type\": \"integer\",\n                    \"default\": 2,\n                    \"description\": \"Specify ploidy of sample in FreeBayes.\",\n                    \"fa_icon\": \"fas fa-pastafarianism\",\n                    \"help_text\": \"Specify ploidy of sample in FreeBayes. Default is diploid. 
Default: `2`\\n\\n> Modifies freebayes parameter: `-p`\"\n                },\n                \"pileupcaller_bedfile\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify path to SNP panel in bed format for pileupCaller.\",\n                    \"fa_icon\": \"fas fa-bed\",\n                    \"help_text\": \"Specify a SNP panel in the form of a bed file of sites at which to generate pileup for pileupCaller.\\n\"\n                },\n                \"pileupcaller_snpfile\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify path to SNP panel in EIGENSTRAT format for pileupCaller.\",\n                    \"fa_icon\": \"fas fa-sliders-h\",\n                    \"help_text\": \"Specify a SNP panel in [EIGENSTRAT](https://github.com/DReichLab/EIG/tree/master/CONVERTF) format, pileupCaller will call these sites.\\n\"\n                },\n                \"pileupcaller_method\": {\n                    \"type\": \"string\",\n                    \"default\": \"randomHaploid\",\n                    \"description\": \"Specify calling method to use. Options: 'randomHaploid', 'randomDiploid', 'majorityCall'.\",\n                    \"fa_icon\": \"fas fa-toolbox\",\n                    \"help_text\": \"Specify calling method to use. Options: randomHaploid, randomDiploid, majorityCall. Default: `'randomHaploid'`\\n\\n> Modifies pileupCaller parameter: `--randomHaploid --randomDiploid --majorityCall`\",\n                    \"enum\": [\n                        \"randomHaploid\",\n                        \"randomDiploid\",\n                        \"majorityCall\"\n                    ]\n                },\n                \"pileupcaller_transitions_mode\": {\n                    \"type\": \"string\",\n                    \"default\": \"AllSites\",\n                    \"description\": \"Specify the calling mode for transitions. 
Options: 'AllSites', 'TransitionsMissing', 'SkipTransitions'.\",\n                    \"fa_icon\": \"fas fa-toggle-on\",\n                    \"help_text\": \"Specify if genotypes of transition SNPs should be called, set to missing, or excluded from the genotypes respectively. Options: `'AllSites'`, `'TransitionsMissing'`, `'SkipTransitions'`. Default: `'AllSites'`\\n\\n> Modifies pileupCaller parameter: `--skipTransitions --transitionsMissing`\",\n                    \"enum\": [\n                        \"AllSites\",\n                        \"TransitionsMissing\",\n                        \"SkipTransitions\"\n                    ]\n                },\n                \"pileupcaller_min_map_quality\": {\n                    \"type\": \"integer\",\n                    \"default\": 30,\n                    \"description\": \"The minimum mapping quality to be used for genotyping.\",\n                    \"fa_icon\": \"fas fa-filter\",\n                    \"help_text\": \"The minimum mapping quality to be used for genotyping. Affects the `samtools pileup` output that is used by pileupcaller. Affects `-q` parameter of samtools mpileup.\"\n                },\n                \"pileupcaller_min_base_quality\": {\n                    \"type\": \"integer\",\n                    \"default\": 30,\n                    \"description\": \"The minimum base quality to be used for genotyping.\",\n                    \"fa_icon\": \"fas fa-filter\",\n                    \"help_text\": \"The minimum base quality to be used for genotyping. Affects the `samtools pileup` output that is used by pileupcaller. Affects `-Q` parameter of samtools mpileup.\"\n                },\n                \"angsd_glmodel\": {\n                    \"type\": \"string\",\n                    \"default\": \"samtools\",\n                    \"description\": \"Specify which ANGSD genotyping likelihood model to use. 
Options: 'samtools', 'gatk', 'soapsnp', 'syk'.\",\n                    \"fa_icon\": \"fas fa-project-diagram\",\n                    \"help_text\": \"Specify which genotype likelihood model to use. Options: `'samtools'`, `'gatk'`, `'soapsnp'`, `'syk'`. Default: `'samtools'`\\n\\n> Modifies ANGSD parameter: `-GL`\",\n                    \"enum\": [\n                        \"samtools\",\n                        \"gatk\",\n                        \"soapsnp\",\n                        \"syk\"\n                    ]\n                },\n                \"angsd_glformat\": {\n                    \"type\": \"string\",\n                    \"default\": \"binary\",\n                    \"description\": \"Specify which output format to use for ANGSD genotype likelihood results. Options: 'text', 'binary', 'binary_three', 'beagle'.\",\n                    \"fa_icon\": \"fas fa-text-height\",\n                    \"help_text\": \"Specifies what type of genotyping likelihood file format will be output. Options: `'text'`, `'binary'`, `'binary_three'`, `'beagle'`.
Default: `'binary'`.\\n\\nThe options refer to the following descriptions respectively:\\n\\n- `text`: text output of all 10 log genotype likelihoods.\\n- `binary`: binary output of all 10 log genotype likelihoods\\n- `binary_three`: binary 3 times likelihood\\n- `beagle`: beagle likelihood file\\n\\nSee the [ANGSD documentation](http://www.popgen.dk/angsd/) for more information on which to select for your downstream applications.\\n\\n> Modifies ANGSD parameter: `-doGlF`\",\n                    \"enum\": [\n                        \"text\",\n                        \"binary\",\n                        \"binary_three\",\n                        \"beagle\"\n                    ]\n                },\n                \"angsd_createfasta\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on creation of FASTA from ANGSD genotyping likelihood.\",\n                    \"fa_icon\": \"fas fa-align-justify\",\n                    \"help_text\": \"Turns on the ANGSD creation of a FASTA file from the BAM file.\\n\"\n                },\n                \"angsd_fastamethod\": {\n                    \"type\": \"string\",\n                    \"default\": \"random\",\n                    \"description\": \"Specify which type of 'base calling' to use for ANGSD FASTA generation. Options: 'random', 'common'.\",\n                    \"fa_icon\": \"fas fa-toolbox\",\n                    \"help_text\": \"The type of base calling to be performed when creating the ANGSD FASTA file. Options: `'random'` or `'common'`. `'common'` will output the most common non-N base at each given position, whereas `'random'` will pick one at random.
Default: `'random'`.\\n\\n> Modifies ANGSD parameter: `-doFasta -doCounts`\",\n                    \"enum\": [\n                        \"random\",\n                        \"common\"\n                    ]\n                },\n                \"run_bcftools_stats\": {\n                    \"type\": \"boolean\",\n                    \"default\": true,\n                    \"description\": \"Turn on bcftools stats generation for VCF based variant calling statistics\",\n                    \"help_text\": \"Runs `bcftools stats` against VCF files from GATK and FreeBayes genotypers.\\n\\nIt will automatically include the FASTA reference for INDEL-related statistics.\",\n                    \"fa_icon\": \"far fa-chart-bar\"\n                }\n            },\n            \"fa_icon\": \"fas fa-sliders-h\",\n            \"help_text\": \"There are options for different genotypers (or genotype likelihood calculators)\\nto be used. We suggest you read the documentation of each tool to find the ones that\\nsuit your needs.\\n\\nDocumentation for each tool:\\n\\n- [GATK\\n  UnifiedGenotyper](https://software.broadinstitute.org/gatk/documentation/tooldocs/3.5-0/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php)\\n- [GATK\\n  HaplotypeCaller](https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php)\\n- [FreeBayes](https://github.com/ekg/freebayes)\\n- [ANGSD](http://www.popgen.dk/angsd/index.php/Genotype_Likelihoods)\\n- [sequenceTools pileupCaller](https://github.com/stschiff/sequenceTools)\\n\\nIf using TSV input, genotyping is performed per sample (i.e. 
after all types of\\nlibraries are merged), except for pileupCaller which gathers all double-stranded and\\nsingle-stranded (same-type merged) libraries respectively.\"\n        },\n        \"consensus_sequence_generation\": {\n            \"title\": \"Consensus Sequence Generation\",\n            \"type\": \"object\",\n            \"description\": \"Options for creation of a per-sample FASTA sequence useful for downstream analysis (e.g. multi sequence alignment)\",\n            \"default\": \"\",\n            \"properties\": {\n                \"run_vcf2genome\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turns on ability to create a consensus sequence FASTA file based on a UnifiedGenotyper VCF file and the original reference (only considers SNPs).\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Turn on consensus sequence genome creation via VCF2Genome. Only accepts GATK UnifiedGenotyper VCF files with the `--gatk_ug_out_mode 'EMIT_ALL_SITES'` and `--gatk_ug_genotype_model 'SNP'` flags.
Typically useful for small genomes such as mitochondria.\\n\"\n                },\n                \"vcf2genome_outfile\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify the name of the output FASTA file containing the consensus sequence.\",\n                    \"fa_icon\": \"fas fa-file-alt\",\n                    \"help_text\": \"The output FASTA file will be named `<sample_name>_<vcf2genome_outfile>.fasta`.\\n\"\n                },\n                \"vcf2genome_header\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify the header name of the consensus sequence entry within the FASTA file.\",\n                    \"fa_icon\": \"fas fa-heading\",\n                    \"help_text\": \"The name of the FASTA entry you would like in your FASTA file.\\n\"\n                },\n                \"vcf2genome_minc\": {\n                    \"type\": \"integer\",\n                    \"default\": 5,\n                    \"description\": \"Minimum depth coverage required for a call to be included (else N will be called).\",\n                    \"fa_icon\": \"fas fa-sort-amount-up\",\n                    \"help_text\": \"Minimum depth coverage for a SNP to be made. Else, a SNP will be called as N. Default: `5`\\n\\n> Modifies VCF2Genome parameter: `-minc`\"\n                },\n                \"vcf2genome_minq\": {\n                    \"type\": \"integer\",\n                    \"default\": 30,\n                    \"description\": \"Minimum genotyping quality of a call to be called. Else N will be called.\",\n                    \"fa_icon\": \"fas fa-medal\",\n                    \"help_text\": \"Minimum genotyping quality of a call to be made. Else N will be called. 
Default: `30`\\n\\n> Modifies VCF2Genome parameter: `-minq`\"\n                },\n                \"vcf2genome_minfreq\": {\n                    \"type\": \"number\",\n                    \"default\": 0.8,\n                    \"description\": \"Minimum fraction of reads supporting a call to be included. Else N will be called.\",\n                    \"fa_icon\": \"fas fa-percent\",\n                    \"help_text\": \"In the case of two possible alleles, the frequency of the majority allele required for a call to be made. Else, a SNP will be called as N. Default: `0.8`\\n\\n> Modifies VCF2Genome parameter: `-minfreq`\"\n                }\n            },\n            \"fa_icon\": \"fas fa-handshake\",\n            \"help_text\": \"If using TSV input, consensus generation is performed per sample (i.e. after all\\ntypes of libraries are merged).\"\n        },\n        \"snp_table_generation\": {\n            \"title\": \"SNP Table Generation\",\n            \"type\": \"object\",\n            \"description\": \"Options for creation of a SNP table useful for downstream analysis (e.g. estimation of cross-mapping of different species and multi-sequence alignment)\",\n            \"default\": \"\",\n            \"properties\": {\n                \"run_multivcfanalyzer\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on MultiVCFAnalyzer. Note: This currently only supports diploid GATK UnifiedGenotyper input.\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Turns on MultiVCFAnalyzer. 
Will only work when in combination with UnifiedGenotyper genotyping module.\\n\"\n                },\n                \"write_allele_frequencies\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on writing of allele frequencies in the SNP table.\",\n                    \"fa_icon\": \"fas fa-pen\",\n                    \"help_text\": \"Specify whether to tell MultiVCFAnalyzer to write within the SNP table the frequencies of the allele at that position e.g. A (70%).\\n\"\n                },\n                \"min_genotype_quality\": {\n                    \"type\": \"integer\",\n                    \"default\": 30,\n                    \"description\": \"Specify the minimum genotyping quality threshold for a SNP to be called.\",\n                    \"fa_icon\": \"fas fa-medal\",\n                    \"help_text\": \"The minimal genotyping quality for a SNP to be considered for processing by MultiVCFAnalyzer. The default threshold is `30`.\\n\"\n                },\n                \"min_base_coverage\": {\n                    \"type\": \"integer\",\n                    \"default\": 5,\n                    \"description\": \"Specify the minimum number of reads a position needs to be covered to be considered for base calling.\",\n                    \"fa_icon\": \"fas fa-sort-amount-up\",\n                    \"help_text\": \"The minimal number of reads covering a base for a SNP at that position to be considered for processing by MultiVCFAnalyzer. The default depth is `5`.\\n\"\n                },\n                \"min_allele_freq_hom\": {\n                    \"type\": \"number\",\n                    \"default\": 0.9,\n                    \"description\": \"Specify the minimum allele frequency that a base requires to be considered a 'homozygous' call.\",\n                    \"fa_icon\": \"fas fa-percent\",\n                    \"help_text\": \"The minimal frequency of a nucleotide for a 'homozygous' SNP to be called.
In other words, e.g. 90% of the reads covering that position must have that SNP to be called. If the threshold is not reached, and the previous two parameters are matched, a reference call is made (displayed as . in the SNP table). If the above two parameters are not met, an 'N' is called. The default allele frequency is `0.9`.\\n\"\n                },\n                \"min_allele_freq_het\": {\n                    \"type\": \"number\",\n                    \"default\": 0.9,\n                    \"description\": \"Specify the minimum allele frequency that a base requires to be considered a 'heterozygous' call.\",\n                    \"fa_icon\": \"fas fa-percent\",\n                    \"help_text\": \"The minimum frequency of a nucleotide for a 'heterozygous' SNP to be called. If\\nthis parameter is set to the same as `--min_allele_freq_hom`, then only\\nhomozygous calls are made. If this value is less than the previous parameter,\\nthen a SNP call will be made. If it is between this and the previous parameter,\\nit will be displayed as a IUPAC uncertainty call. Default is `0.9`.\"\n                },\n                \"additional_vcf_files\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify paths to additional pre-made VCF files to be included in the SNP table generation. Use wildcard(s) for multiple files.\",\n                    \"fa_icon\": \"fas fa-copy\",\n                    \"help_text\": \"If you wish to add to the table previously created VCF files, specify here a path with wildcards (in quotes). These VCF files must be created the same way as your settings for [GATK UnifiedGenotyping](#genotyping-parameters) module above.\"\n                },\n                \"reference_gff_annotations\": {\n                    \"type\": \"string\",\n                    \"default\": \"NA\",\n                    \"description\": \"Specify path to the reference genome annotations in '.gff' format. 
Optional.\",\n                    \"fa_icon\": \"fas fa-file-signature\",\n                    \"help_text\": \"If you wish to report in the SNP table annotation information for the regions\\nSNPs fall in, provide a file in GFF format (the path must be in quotes).\\n\"\n                },\n                \"reference_gff_exclude\": {\n                    \"type\": \"string\",\n                    \"default\": \"NA\",\n                    \"description\": \"Specify path to the positions to be excluded in '.gff' format. Optional.\",\n                    \"fa_icon\": \"fas fa-times\",\n                    \"help_text\": \"If you wish to exclude SNP regions from consideration by MultiVCFAnalyzer (such as for problematic regions), provide a file in GFF format (the path must be in quotes).\\n\"\n                },\n                \"snp_eff_results\": {\n                    \"type\": \"string\",\n                    \"default\": \"NA\",\n                    \"description\": \"Specify path to the output file from SNP effect analysis in '.txt' format. Optional.\",\n                    \"fa_icon\": \"fas fa-magic\",\n                    \"help_text\": \"If you wish to include results from SNPEff effect analysis, supply the output\\nfrom SNPEff in txt format (the path must be in quotes).\"\n                }\n            },\n            \"fa_icon\": \"fas fa-table\",\n            \"help_text\": \"SNP Table Generation here is performed by MultiVCFAnalyzer. The current version\\nof MultiVCFAnalyzer only accepts GATK UnifiedGenotyper 3.5 VCF files,\\nand only when the ploidy was set to 2 (this allows MultiVCFAnalyzer to report\\nfrequencies of polymorphic positions).
A description of how the tool works can\\nbe seen in the Supplementary Information of [Bos et al.\\n(2014)](https://doi.org/10.1038/nature13591) under \\\"SNP Calling and Phylogenetic\\nAnalysis\\\".\\n\\nMore can be seen in the [MultiVCFAnalyzer\\ndocumentation](https://github.com/alexherbig/MultiVCFAnalyzer).\\n\\nIf using TSV input, MultiVCFAnalyzer is run on all samples gathered\\ntogether.\"\n        },\n        \"mitochondrial_to_nuclear_ratio\": {\n            \"title\": \"Mitochondrial to Nuclear Ratio\",\n            \"type\": \"object\",\n            \"description\": \"Options for the calculation of ratio of reads to one chromosome/FASTA entry against all others.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"run_mtnucratio\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on mitochondrial to nuclear ratio calculation.\",\n                    \"fa_icon\": \"fas fa-balance-scale-left\",\n                    \"help_text\": \"Turn on the module to estimate the ratio of mitochondrial to nuclear reads.\\n\"\n                },\n                \"mtnucratio_header\": {\n                    \"type\": \"string\",\n                    \"default\": \"MT\",\n                    \"description\": \"Specify the name of the reference FASTA entry corresponding to the mitochondrial genome (up to the first space).\",\n                    \"fa_icon\": \"fas fa-heading\",\n                    \"help_text\": \"Specify the FASTA entry in the reference file specified as `--fasta`, which acts\\nas the mitochondrial 'chromosome' to base the ratio calculation on. The tool\\nonly accepts the first section of the header before the first space. The default\\nchromosome name is based on the hs37d5/GRCh37 human reference genome.
Default: 'MT'\"\n                }\n            },\n            \"fa_icon\": \"fas fa-balance-scale-left\",\n            \"help_text\": \"If using TSV input, Mitochondrial to Nuclear Ratio calculation is calculated per\\ndeduplicated library (after lane merging)\"\n        },\n        \"human_sex_determination\": {\n            \"title\": \"Human Sex Determination\",\n            \"type\": \"object\",\n            \"description\": \"Options for the calculation of biological sex of human individuals.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"run_sexdeterrmine\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on sex determination for human reference genomes. This will run on single- and double-stranded variants of a library separately.\",\n                    \"fa_icon\": \"fas fa-transgender-alt\",\n                    \"help_text\": \"Specify to run the optional process of sex determination.\\n\"\n                },\n                \"sexdeterrmine_bedfile\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify path to SNP panel in bed format for error bar calculation. Optional (see documentation).\",\n                    \"fa_icon\": \"fas fa-bed\",\n                    \"help_text\": \"Specify an optional bedfile of the list of SNPs to be used for X-/Y-rate calculation. Running without this parameter will considerably increase runtime, and render the resulting error bars untrustworthy. Theoretically, any set of SNPs that are distant enough that two SNPs are unlikely to be covered by the same read can be used here. The programme was coded with the 1240K panel in mind. The path must be in quotes.\"\n                }\n            },\n            \"fa_icon\": \"fas fa-transgender\",\n            \"help_text\": \"An optional process for human DNA. 
It can be used to calculate the relative\\ncoverage of X and Y chromosomes compared to the autosomes (X-/Y-rate). Standard\\nerrors for these measurements are also calculated, assuming a binomial\\ndistribution of reads across the SNPs.\\n\\nIf using TSV input, SexDetERRmine is performed on all samples gathered together.\"\n        },\n        \"nuclear_contamination_for_human_dna\": {\n            \"title\": \"Nuclear Contamination for Human DNA\",\n            \"type\": \"object\",\n            \"description\": \"Options for the estimation of contamination of human DNA.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"run_nuclear_contamination\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on nuclear contamination estimation for human reference genomes.\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Specify to run the optional processes for (human) nuclear DNA contamination estimation.\\n\"\n                },\n                \"contamination_chrom_name\": {\n                    \"type\": \"string\",\n                    \"default\": \"X\",\n                    \"description\": \"The name of the X chromosome in your bam/FASTA header. 'X' for hs37d5, 'chrX' for HG19.\",\n                    \"fa_icon\": \"fas fa-address-card\",\n                    \"help_text\": \"The name of the human chromosome X in your bam. `'X'` for hs37d5, `'chrX'` for HG19. 
Defaults to `'X'`.\"\n                }\n            },\n            \"fa_icon\": \"fas fa-radiation-alt\"\n        },\n        \"metagenomic_screening\": {\n            \"title\": \"Metagenomic Screening\",\n            \"type\": \"object\",\n            \"description\": \"Options for metagenomic screening of off-target reads.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"metagenomic_complexity_filter\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on removal of low-sequence complexity reads for metagenomic screening with bbduk\",\n                    \"help_text\": \"Turns on low-sequence complexity filtering of off-target reads using `bbduk`.\\n\\nThis is typically performed to reduce the number of uninformative reads or potential false-positive reads, typically for input for metagenomic screening. This thus reduces false positive species IDs and also run-time and resource requirements.\\n\\nSee `--metagenomic_complexity_entropy` for how complexity is calculated. **Important** There are no MultiQC output results for this module, you must check the number of reads removed with the `_bbduk.stats` output file.\\n\\nDefault: off\\n\",\n                    \"fa_icon\": \"fas fa-filter\"\n                },\n                \"metagenomic_complexity_entropy\": {\n                    \"type\": \"number\",\n                    \"default\": 0.3,\n                    \"description\": \"Specify the entropy threshold that under which a sequencing read will be complexity filtered out. This should be between 0-1.\",\n                    \"minimum\": 0,\n                    \"maximum\": 1,\n                    \"help_text\": \"Specify a minimum entropy threshold that under which it will be _removed_ from the FASTQ file that goes into metagenomic screening. 
 \\n\\nA mono-nucleotide read such as GGGGGG will have an entropy of 0; a completely random sequence has an entropy of almost 1.\\n\\nSee the `bbduk` [documentation](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/-filter) on entropy for more information.\\n\\n> Modifies `bbduk` parameter `entropy=`\",\n                    \"fa_icon\": \"fas fa-percent\"\n                },\n                \"run_metagenomic_screening\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on metagenomic screening module for reference-unmapped reads.\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Turn on the metagenomic screening module.\\n\"\n                },\n                \"metagenomic_tool\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify which classifier to use. Options: 'malt', 'kraken'.\",\n                    \"fa_icon\": \"fas fa-tools\",\n                    \"help_text\": \"Specify which taxonomic classifier to use. There are two options available:\\n\\n- `kraken` for [Kraken2](https://ccb.jhu.edu/software/kraken2)\\n- `malt` for [MALT](https://software-ab.informatik.uni-tuebingen.de/download/malt/welcome.html)\\n\\n:warning: **Important** It is very important to run `nextflow clean -f` on your\\nNextflow run directory once completed. RMA6 files are VERY large and are\\n_copied_ from a `work/` directory into the results folder. You should clean the\\nwork directory with the command to avoid redundancy and a large HDD\\nfootprint!\"\n                },\n                \"database\": {\n                    \"type\": \"string\",\n                    \"description\": \"Specify path to classifier database directory.
For Kraken2 this can also be a `.tar.gz` of the directory.\",\n                    \"fa_icon\": \"fas fa-database\",\n                    \"help_text\": \"Specify the path to the _directory_ containing your taxonomic classifier's database (malt or kraken).\\n\\nFor Kraken2, it can be either the path to the _directory_ or the path to the `.tar.gz` compressed directory of the Kraken2 database.\"\n                },\n                \"metagenomic_min_support_reads\": {\n                    \"type\": \"integer\",\n                    \"default\": 1,\n                    \"description\": \"Specify a minimum number of reads a taxon of sample total is required to have to be retained. Not compatible with --malt_min_support_mode 'percent'.\",\n                    \"fa_icon\": \"fas fa-sort-numeric-up-alt\",\n                    \"help_text\": \"Specify the minimum number of reads a given taxon is required to have to be retained as a positive 'hit'.  \\nFor malt, this only applies when `--malt_min_support_mode` is set to 'reads'. Default: 1.\\n\\n> Modifies MALT or kraken_parse.py parameter: `-sup` and `-c` respectively\\n\"\n                },\n                \"percent_identity\": {\n                    \"type\": \"integer\",\n                    \"default\": 85,\n                    \"description\": \"Percent identity value threshold for MALT.\",\n                    \"fa_icon\": \"fas fa-id-card\",\n                    \"help_text\": \"Specify the minimum percent identity (or similarity) a sequence must have to the reference for it to be retained. Default is `85`\\n\\nOnly used when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MALT parameter: `-id`\"\n                },\n                \"malt_mode\": {\n                    \"type\": \"string\",\n                    \"default\": \"BlastN\",\n                    \"description\": \"Specify which alignment mode to use for MALT. 
Options: 'Unknown', 'BlastN', 'BlastP', 'BlastX', 'Classifier'.\",\n                    \"fa_icon\": \"fas fa-align-left\",\n                    \"help_text\": \"Use this to run the program in 'BlastN', 'BlastP', 'BlastX' modes to align DNA\\nand DNA, protein and protein, or DNA reads against protein references\\nrespectively. Ensure your database matches the mode. Check the\\n[MALT\\nmanual](http://ab.inf.uni-tuebingen.de/data/software/malt/download/manual.pdf)\\nfor more details. Default: `'BlastN'`\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MALT parameter: `-m`\\n\",\n                    \"enum\": [\n                        \"BlastN\",\n                        \"BlastP\",\n                        \"BlastX\"\n                    ]\n                },\n                \"malt_alignment_mode\": {\n                    \"type\": \"string\",\n                    \"default\": \"SemiGlobal\",\n                    \"description\": \"Specify alignment method for MALT. Options: 'Local', 'SemiGlobal'.\",\n                    \"fa_icon\": \"fas fa-align-center\",\n                    \"help_text\": \"Specify what alignment algorithm to use. Options are 'Local' or 'SemiGlobal'. Local is a BLAST like alignment, but is much slower. Semi-global alignment aligns reads end-to-end. Default: `'SemiGlobal'`\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MALT parameter: `-at`\",\n                    \"enum\": [\n                        \"Local\",\n                        \"SemiGlobal\"\n                    ]\n                },\n                \"malt_top_percent\": {\n                    \"type\": \"integer\",\n                    \"default\": 1,\n                    \"description\": \"Specify the percent for LCA algorithm for MALT (see MEGAN6 CE manual).\",\n                    \"fa_icon\": \"fas fa-percent\",\n                    \"help_text\": \"Specify the top percent value of the LCA algorithm. 
From the [MALT manual](http://ab.inf.uni-tuebingen.de/data/software/malt/download/manual.pdf): \\\"For each\\nread, only those matches are used for taxonomic placement whose bit disjointScore is within\\n10% of the best disjointScore for that read.\\\". Default: `1`.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MALT parameter: `-top`\"\n                },\n                \"malt_min_support_mode\": {\n                    \"type\": \"string\",\n                    \"default\": \"percent\",\n                    \"description\": \"Specify whether to use percent or raw number of reads for minimum support required for taxon to be retained for MALT. Options: 'percent', 'reads'.\",\n                    \"fa_icon\": \"fas fa-drumstick-bite\",\n                    \"help_text\": \"Specify whether to use a percentage, or raw number of reads as the value used to decide the minimum support a taxon requires to be retained.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MALT parameter: `-sup -supp`\",\n                    \"enum\": [\n                        \"percent\",\n                        \"reads\"\n                    ]\n                },\n                \"malt_min_support_percent\": {\n                    \"type\": \"number\",\n                    \"default\": 0.01,\n                    \"description\": \"Specify the minimum percentage of reads a taxon of sample total is required to have to be retained for MALT.\",\n                    \"fa_icon\": \"fas fa-percentage\",\n                    \"help_text\": \"Specify the minimum number of reads (as a percentage of all assigned reads) a given taxon is required to have to be retained as a positive 'hit' in the RMA6 file. This only applies when `--malt_min_support_mode` is set to 'percent'. 
Default: `0.01`.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MALT parameter: `-supp`\"\n                },\n                \"malt_max_queries\": {\n                    \"type\": \"integer\",\n                    \"default\": 100,\n                    \"description\": \"Specify the maximum number of queries a read can have for MALT.\",\n                    \"fa_icon\": \"fas fa-phone\",\n                    \"help_text\": \"Specify the maximum number of alignments a read can have. All further alignments are discarded. Default: `100`\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MALT parameter: `-mq`\"\n                },\n                \"malt_memory_mode\": {\n                    \"type\": \"string\",\n                    \"default\": \"load\",\n                    \"description\": \"Specify the memory load method. Do not use 'map' with GPFS file systems for MALT as it can be very slow. Options: 'load', 'page', 'map'.\",\n                    \"fa_icon\": \"fas fa-memory\",\n                    \"help_text\": \"\\nHow to load the database into memory. Options are `'load'`, `'page'` or `'map'`.\\n`'load'` directly loads the entire database into memory prior to seed look up; this\\nis slow but compatible with all servers/file systems. `'page'` and `'map'`\\nperform a sort of 'chunked' database loading, allowing seed look up prior to entire\\ndatabase loading. Note that Page and Map modes do not work properly with\\nmany remote file-systems such as GPFS.
Default is `'load'`.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MALT parameter: `--memoryMode`\",\n                    \"enum\": [\n                        \"load\",\n                        \"page\",\n                        \"map\"\n                    ]\n                },\n                \"malt_sam_output\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Specify to also produce SAM alignment files. Note this includes both aligned and unaligned reads, and are gzipped. Note this will result in very large file sizes.\",\n                    \"fa_icon\": \"fas fa-file-alt\",\n                    \"help_text\": \"Specify to _also_ produce gzipped SAM files of all alignments and un-aligned reads in addition to RMA6 files. These are **not** soft-clipped or in 'sparse' format. Can be useful for downstream analyses due to more common file format. \\n\\n:warning: can result in very large run output directories as this is essentially duplication of the RMA6 files.\\n\\n> Modifies MALT parameter `-a -f`\"\n                }\n            },\n            \"fa_icon\": \"fas fa-search\",\n            \"help_text\": \"\\nAn increasingly common line of analysis in high-throughput aDNA analysis today\\nis simultaneously screening off target reads of the host for endogenous\\nmicrobial signals - particularly of pathogens. Metagenomic screening is\\ncurrently offered via MALT with aDNA specific verification via MaltExtract, or\\nKraken2.\\n\\nPlease note the following:\\n\\n- :warning: Metagenomic screening is only performed on _unmapped_ reads from a\\n  mapping step.\\n  - You _must_ supply the `--run_bam_filtering` flag with unmapped reads in\\n    FASTQ format.\\n  - If you wish to run solely MALT (i.e. 
the HOPS pipeline), you must still\\n    supply a small decoy genome like phiX or human mtDNA via `--fasta`.\\n- MALT database construction functionality is _not_ included within the pipeline\\n  - this should be done independently, **prior** to the nf-core/eager run.\\n  - To use `malt-build` from the same version as `malt-run`, load either the\\n    Docker, Singularity or Conda environment.\\n- MALT can often require very large computing resources depending on your\\n  database. We set an absolute minimum of 16 cores and 128GB of memory (which is\\n  1/4 of the recommendation from the developer). Please leave an issue on the\\n  [nf-core GitHub](https://github.com/nf-core/eager/issues) if you would like to\\n  see this changed.\\n\\n> :warning: Running MALT on a server with less than 128GB of memory should be\\n> performed at your own risk.\\n\\nIf using TSV input, metagenomic screening is performed on all samples gathered\\ntogether.\"\n        },\n        \"metagenomic_authentication\": {\n            \"title\": \"Metagenomic Authentication\",\n            \"type\": \"object\",\n            \"description\": \"Options for authentication of metagenomic screening performed by MALT.\",\n            \"default\": \"\",\n            \"properties\": {\n                \"run_maltextract\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on MaltExtract for MALT aDNA characteristics authentication.\",\n                    \"fa_icon\": \"fas fa-power-off\",\n                    \"help_text\": \"Turn on MaltExtract for MALT aDNA characteristics authentication of metagenomic output from MALT.\\n\\nMore can be seen in the [MaltExtract documentation](https://github.com/rhuebler/)\\n\\nOnly when `--metagenomic_tool malt` is also supplied\"\n                },\n                \"maltextract_taxon_list\": {\n                    \"type\": \"string\",\n                    \"description\": \"Path to a text file with taxa of interest (one taxon per
row, NCBI taxonomy name format)\",\n                    \"fa_icon\": \"fas fa-list-ul\",\n                    \"help_text\": \"\\nPath to a `.txt` file with taxa of interest you wish to assess for aDNA characteristics. The `.txt` file should contain one taxon per row, and each taxon should be in a valid [NCBI taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy) name format.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\"\n                },\n                \"maltextract_ncbifiles\": {\n                    \"type\": \"string\",\n                    \"description\": \"Path to directory containing NCBI resource files (ncbi.tre and ncbi.map; available: https://github.com/rhuebler/HOPS/)\",\n                    \"fa_icon\": \"fas fa-database\",\n                    \"help_text\": \"Path to directory containing the NCBI resource tree and taxonomy table files (ncbi.tre and ncbi.map; available at the [HOPS repository](https://github.com/rhuebler/HOPS/Resources)).\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\"\n                },\n                \"maltextract_filter\": {\n                    \"type\": \"string\",\n                    \"default\": \"def_anc\",\n                    \"description\": \"Specify which MaltExtract filter to use. Options: 'def_anc', 'ancient', 'default', 'crawl', 'scan', 'srna', 'assignment'.\",\n                    \"fa_icon\": \"fas fa-filter\",\n                    \"help_text\": \"Specify which MaltExtract filter to use. This is used to specify what types of characteristics to scan for. The default will output statistics on all alignments, and then a second set with just reads with one C to T mismatch in the first 5 bases. Further details on other parameters can be seen in the [HOPS documentation](https://github.com/rhuebler/HOPS/#maltextract-parameters). Options: `'def_anc'`, `'ancient'`, `'default'`, `'crawl'`, `'scan'`, `'srna'`, `'assignment'`.
Default: `'def_anc'`.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MaltExtract parameter: `-f`\",\n                    \"enum\": [\n                        \"def_anc\",\n                        \"default\",\n                        \"ancient\",\n                        \"scan\",\n                        \"crawl\",\n                        \"srna\"\n                    ]\n                },\n                \"maltextract_toppercent\": {\n                    \"type\": \"number\",\n                    \"default\": 0.01,\n                    \"description\": \"Specify percent of top alignments to use.\",\n                    \"fa_icon\": \"fas fa-percent\",\n                    \"help_text\": \"Specify frequency of top alignments for each read to be considered for each node.\\nDefault is 0.01, i.e. 1% of all reads (where 1 would correspond to 100%).\\n\\n> :warning: this parameter follows the same concept as `--malt_top_percent` but\\n> uses a different notation i.e. integer (MALT) versus float (MALTExtract)\\n\\nDefault: `0.01`.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MaltExtract parameter: `-a`\"\n                },\n                \"maltextract_destackingoff\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn off destacking.\",\n                    \"fa_icon\": \"fas fa-align-center\",\n                    \"help_text\": \"Turn off destacking. 
If left on, a read that overlaps with another read will be\\nremoved (leaving a depth coverage of 1).\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MaltExtract parameter: `--destackingOff`\"\n                },\n                \"maltextract_downsamplingoff\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn off downsampling.\",\n                    \"fa_icon\": \"fab fa-creative-commons-sampling\",\n                    \"help_text\": \"Turn off downsampling. By default, downsampling is on and will randomly select 10,000 reads if the number of reads on a node exceeds this number. This is to speed up processing, under the assumption at 10,000 reads the species is a 'true positive'.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MaltExtract parameter: `--downSampOff`\"\n                },\n                \"maltextract_duplicateremovaloff\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn off duplicate removal.\",\n                    \"fa_icon\": \"fas fa-align-left\",\n                    \"help_text\": \"\\nTurn off duplicate removal. By default, reads that are an exact copy (i.e. same start, stop coordinate and exact sequence match) will be removed as it is considered a PCR duplicate.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MaltExtract parameter: `--dupRemOff`\"\n                },\n                \"maltextract_matches\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on exporting alignments of hits in BLAST format.\",\n                    \"fa_icon\": \"fas fa-equals\",\n                    \"help_text\": \"\\nExport alignments of hits for each node in BLAST format. 
By default turned off.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MaltExtract parameter: `--matches`\"\n                },\n                \"maltextract_megansummary\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on export of MEGAN summary files.\",\n                    \"fa_icon\": \"fas fa-download\",\n                    \"help_text\": \"Export 'minimal' summary files (i.e. without alignments) that can be loaded into [MEGAN6](https://doi.org/10.1371/journal.pcbi.1004957). By default turned off.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MaltExtract parameter: `--meganSummary`\"\n                },\n                \"maltextract_percentidentity\": {\n                    \"type\": \"number\",\n                    \"description\": \"Minimum percent identity alignments are required to have to be reported. Recommended to set same as MALT parameter.\",\n                    \"default\": 85,\n                    \"fa_icon\": \"fas fa-id-card\",\n                    \"help_text\": \"Minimum percent identity alignments are required to have to be reported. Higher values allows fewer mismatches between read and reference sequence, but therefore will provide greater confidence in the hit. Lower values allow more mismatches, which can account for damage and divergence of a related strain/species to the reference. Recommended to set same as MALT parameter or higher. 
Default: `85`.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MaltExtract parameter: `--minPI`\"\n                },\n                \"maltextract_topalignment\": {\n                    \"type\": \"boolean\",\n                    \"description\": \"Turn on using top alignments per read after filtering.\",\n                    \"fa_icon\": \"fas fa-star-half-alt\",\n                    \"help_text\": \"Use the best alignment of each read for every statistic, except for those concerning read distribution and coverage. Default: off.\\n\\nOnly when `--metagenomic_tool malt` is also supplied.\\n\\n> Modifies MaltExtract parameter: `--useTopAlignment`\"\n                }\n            },\n            \"fa_icon\": \"fas fa-tasks\",\n            \"help_text\": \"Turn on MaltExtract for MALT aDNA characteristics authentication of metagenomic\\noutput from MALT.\\n\\nMore can be seen in the [MaltExtract\\ndocumentation](https://github.com/rhuebler/)\\n\\nOnly when `--metagenomic_tool malt` is also supplied\"\n        }\n    },\n    \"allOf\": [\n        {\n            \"$ref\": \"#/definitions/input_output_options\"\n        },\n        {\n            \"$ref\": \"#/definitions/input_data_additional_options\"\n        },\n        {\n            \"$ref\": \"#/definitions/reference_genome_options\"\n        },\n        {\n            \"$ref\": \"#/definitions/output_options\"\n        },\n        {\n            \"$ref\": \"#/definitions/generic_options\"\n        },\n        {\n            \"$ref\": \"#/definitions/max_job_request_options\"\n        },\n        {\n            \"$ref\": \"#/definitions/institutional_config_options\"\n        },\n        {\n            \"$ref\": \"#/definitions/skip_steps\"\n        },\n        {\n            \"$ref\": \"#/definitions/complexity_filtering\"\n        },\n        {\n            \"$ref\": \"#/definitions/read_merging_and_adapter_removal\"\n        },\n        {\n            \"$ref\": 
\"#/definitions/mapping\"\n        },\n        {\n            \"$ref\": \"#/definitions/host_removal\"\n        },\n        {\n            \"$ref\": \"#/definitions/bam_filtering\"\n        },\n        {\n            \"$ref\": \"#/definitions/deduplication\"\n        },\n        {\n            \"$ref\": \"#/definitions/library_complexity_analysis\"\n        },\n        {\n            \"$ref\": \"#/definitions/adna_damage_analysis\"\n        },\n        {\n            \"$ref\": \"#/definitions/feature_annotation_statistics\"\n        },\n        {\n            \"$ref\": \"#/definitions/bam_trimming\"\n        },\n        {\n            \"$ref\": \"#/definitions/genotyping\"\n        },\n        {\n            \"$ref\": \"#/definitions/consensus_sequence_generation\"\n        },\n        {\n            \"$ref\": \"#/definitions/snp_table_generation\"\n        },\n        {\n            \"$ref\": \"#/definitions/mitochondrial_to_nuclear_ratio\"\n        },\n        {\n            \"$ref\": \"#/definitions/human_sex_determination\"\n        },\n        {\n            \"$ref\": \"#/definitions/nuclear_contamination_for_human_dna\"\n        },\n        {\n            \"$ref\": \"#/definitions/metagenomic_screening\"\n        },\n        {\n            \"$ref\": \"#/definitions/metagenomic_authentication\"\n        }\n    ]\n}"
  }
]