Full Code of nf-core/eager for AI

master 3f9d64ced5e2 cached

73 files

774.9 KB

205.8k tokens

21 symbols

Repository: nf-core/eager
Branch: master
Commit: 3f9d64ced5e2
Files: 73
Total size: 774.9 KB

Directory structure:
gitextract_rzjf5t_w/

├── .gitattributes
├── .github/
│   ├── .dockstore.yml
│   ├── CONTRIBUTING.md
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   ├── config.yml
│   │   └── feature_request.md
│   ├── PULL_REQUEST_TEMPLATE/
│   │   └── pull_request_template.md
│   ├── PULL_REQUEST_TEMPLATE.md
│   ├── markdownlint.yml
│   ├── workflows/
│   │   ├── awsfulltest.yml
│   │   ├── awstest.yml
│   │   ├── branch.yml
│   │   ├── ci.yml
│   │   ├── linting.yml
│   │   ├── linting_comment.yml
│   │   ├── push_dockerhub_dev.yml
│   │   └── push_dockerhub_release.yml
│   └── yamllint.yml
├── .gitignore
├── .gitpod.yml
├── .nf-core-lint.yml
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── Dockerfile
├── LICENSE
├── README.md
├── assets/
│   ├── angsd_resources/
│   │   ├── README
│   │   └── getALL.txt
│   ├── email_template.html
│   ├── email_template.txt
│   ├── multiqc_config.yaml
│   ├── nf-core_eager_dummy.txt
│   ├── nf-core_eager_dummy2.txt
│   ├── sendmail_template.txt
│   └── where_are_my_files.txt
├── bin/
│   ├── endorS.py
│   ├── extract_map_reads.py
│   ├── filter_bam_fragment_length.py
│   ├── kraken_parse.py
│   ├── markdown_to_html.py
│   ├── merge_kraken_res.py
│   ├── parse_snp_cov.py
│   ├── print_x_contamination.py
│   └── scrape_software_versions.py
├── conf/
│   ├── base.config
│   ├── benchmarking_human.config
│   ├── benchmarking_vikingfish.config
│   ├── igenomes.config
│   ├── test.config
│   ├── test_direct.config
│   ├── test_full.config
│   ├── test_resources.config
│   ├── test_stresstest_human.config
│   ├── test_tsv_bam.config
│   ├── test_tsv_complex.config
│   ├── test_tsv_fna.config
│   ├── test_tsv_humanbam.config
│   ├── test_tsv_kraken.config
│   └── test_tsv_pretrim.config
├── docs/
│   ├── README.md
│   ├── images/
│   │   ├── README.md
│   │   └── usage/
│   │       └── nfcore-eager_tsv_template.tsv
│   ├── output.md
│   └── usage.md
├── environment.yml
├── lib/
│   ├── Checks.groovy
│   ├── Completion.groovy
│   ├── Headers.groovy
│   ├── NfcoreSchema.groovy
│   └── nfcore_external_java_deps.jar
├── main.nf
├── nextflow.config
└── nextflow_schema.json

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitattributes
================================================
*.config linguist-language=nextflow


================================================
FILE: .github/.dockstore.yml
================================================
# Dockstore config version, not pipeline version
version: 1.2
workflows:
  - subclass: nfl
    primaryDescriptorPath: /nextflow.config
    publish: True


================================================
FILE: .github/CONTRIBUTING.md
================================================
# nf-core/eager: Contributing Guidelines

Hi there!
Many thanks for taking an interest in improving nf-core/eager.

We try to manage the required tasks for nf-core/eager using GitHub issues; you probably came to this page when creating one.
Please use the pre-filled template to save time.

However, don't be put off by this template - other more general issues and suggestions are welcome!
Contributions to the code are even more welcome ;)

> If you need help using or modifying nf-core/eager then the best place to ask is on the nf-core Slack [#eager](https://nfcore.slack.com/channels/eager) channel ([join our Slack here](https://nf-co.re/join/slack)).

## Contribution workflow

If you'd like to write some code for nf-core/eager, the standard workflow is as follows:

1. Check that there isn't already an issue about your idea in the [nf-core/eager issues](https://github.com/nf-core/eager/issues) to avoid duplicating work
    * If there isn't one already, please create one so that others know you're working on this
2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/eager repository](https://github.com/nf-core/eager) to your GitHub account
3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
4. Use `nf-core schema build .` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged

If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).

## Tests

When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests.
Typically, pull-requests are only fully reviewed when these tests are passing, though of course we can help out before then.

There are typically two types of tests that run:

### Lint tests

`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.

If any failures or warnings are encountered, please follow the listed URL for more documentation.

### Pipeline tests

Each `nf-core` pipeline should be set up with a minimal set of test-data.
`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
If there are any failures then the automated tests fail.
These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code.

## Patch

:warning: Only in the unlikely and regrettable event of a release happening with a bug.

* On your own fork, make a new branch `patch` based on `upstream/master`.
* Fix the bug, and bump the version (X.Y.Z+1).
* Make a PR from `patch` to `master` to directly fix this particular bug.

## Getting help

For further information/help, please consult the [nf-core/eager documentation](https://nf-co.re/eager/usage) and don't hesitate to get in touch on the nf-core Slack [#eager](https://nfcore.slack.com/channels/eager) channel ([join our Slack here](https://nf-co.re/join/slack)).

## Pipeline contribution conventions

To make the nf-core/eager code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.

### Adding a new step

If you wish to contribute a new step, please use the following coding standards:

1. Define the corresponding input channel into your new process from the expected previous process channel
2. Write the process block (see below).
3. Define the output channel if needed (see below).
4. Add any new flags/options to `nextflow.config` with a default (see below).
5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`).
6. Add sanity checks for all relevant parameters.
7. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`.
8. Do local tests that the new code works properly and as expected.
9. Add a new test command in `.github/workflows/ci.yml`.
10. If applicable add a [MultiQC](https://multiqc.info/) module.
11. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order.
12. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.

### Default values

Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope.

Once there, use `nf-core schema build .` to add to `nextflow_schema.json`.

### Default processes resource requirements

Sensible defaults for process resource requirements (CPUs / memory / time) should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single-core process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.

:warning: Note that in nf-core/eager we currently have our own custom process labels, so please check `base.config`!

The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.

### Naming schemes

Please use the following naming schemes, to make it easy to understand what is going where.

* initial process channel: `ch_output_from_<process>`
* intermediate and terminal channels: `ch_<previousprocess>_for_<nextprocess>`
* skipped process output: `ch_<previousstage>_for_<skipprocess>` (this comes out of the bypass statement described in the Process Concept section)
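
To make the three patterns concrete, here is a minimal Python sketch that builds channel names according to the scheme above. The helper names (`output_channel`, `stage_channel`, `skip_channel`) are hypothetical and exist only for illustration; they are not part of the pipeline.

```python
# Hypothetical helpers that only illustrate the naming scheme; not pipeline code.

def output_channel(process):
    """Name of the channel holding the direct output of a process."""
    return f"ch_output_from_{process}"


def stage_channel(previous, following):
    """Name of the channel passing data from one process to the next."""
    return f"ch_{previous}_for_{following}"


def skip_channel(previous, skipped):
    """Name of the channel that carries data past a skipped process."""
    return f"ch_{previous}_for_skip{skipped}"


print(output_channel("fastp"))
print(stage_channel("fastp", "adapterremoval"))
print(skip_channel("convertbam", "fastp"))
```

The resulting names follow the same pattern as the channel names used in the Process Concept schematic.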

### Nextflow version bumping

If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`

### Software version reporting

If you add a new tool to the pipeline, please ensure you add the information of the tool to the `get_software_version` process.

Add to the script block of the process, something like the following:

```bash
<YOUR_TOOL> --version &> v_<YOUR_TOOL>.txt 2>&1 || true
```

or

```bash
<YOUR_TOOL> --help | head -n 1 &> v_<YOUR_TOOL>.txt 2>&1 || true
```

You then need to edit the script `bin/scrape_software_versions.py` to:

1. Add a Python regex for your tool's `--version` output (as stored in the `v_<YOUR_TOOL>.txt` file), to ensure the version is reported as a `v` followed by the version number, e.g. `v2.1.1`
2. Add an HTML entry to the `OrderedDict` for formatting in MultiQC.
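
As a rough illustration of step 1, the sketch below shows how a regex might turn raw `--version` output into the `v<number>` form. The tool name `mytool`, its sample output, and the helper `extract_version` are assumptions made for this example and are not taken from `bin/scrape_software_versions.py`.

```python
import re
from collections import OrderedDict

# Hypothetical tool name and pattern; adapt the regex to your tool's real output.
REGEXES = {
    "mytool": r"mytool,? version (\S+)",
}


def extract_version(tool, text):
    """Return the version as 'v<number>', or 'N/A' if the pattern does not match."""
    match = re.search(REGEXES[tool], text)
    if match is None:
        return "N/A"
    # Strip any leading 'v' printed by the tool so the prefix is not doubled
    return "v" + match.group(1).lstrip("v")


# Example contents of a hypothetical v_mytool.txt file
raw_output = "mytool version 2.1.1\n"

results = OrderedDict()
results["mytool"] = extract_version("mytool", raw_output)
print(results["mytool"])
```

Returning a placeholder such as `N/A` when the pattern does not match is one possible design choice; it keeps a missing tool from breaking the whole report.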

### Images and figures

For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).

For all internal nf-core/eager documentation images we are using the 'Kalam' font by the Indian Type Foundry, licensed under the Open Font License. It can be downloaded [here](https://fonts.google.com/specimen/Kalam).

## Process Concept

We are providing a highly configurable pipeline, with many options to turn on and off different processes in different combinations. This can make a very complex graph structure that can cause a large number of duplicated channels coming out of every process to account for each possible combination.

The EAGER pipeline can currently be broken down into the following 'stages', where a stage is a collection of non-terminal, mutually exclusive processes whose output is used by another downstream module (but not a reporting one!).

* Input
* Convert BAM
* PolyG Clipping
* AdapterRemoval
* Mapping (either `bwa`, `bwamem`, or `circularmapper`)
* BAM Filtering
* Deduplication (either `dedup` or `markduplicates`)
* BAM Trimming
* PMDtools
* Genotyping

Every step can potentially be skipped, therefore the output of a previous stage must be able to be passed to the next stage, if the given stage is not run.

To somewhat simplify this logic, we have implemented the following structure.

The concept is as follows:

* Every 'stage' of the pipeline (i.e. collection of mutually exclusive processes) must always have an if-else statement following it.
* This if else 'bypass' statement collects and standardises all possible input files into single channel(s) for the next stage.
* Importantly - within the bypass statement, a channel from the previous stage's bypass mixes into these output channels. This additional channel is named `ch_previousstage_for_skipcurrentstage`. This contains the output from the previous stage, i.e. not the modified version from the current stage.
* The bypass statement works as follows:
  * If the current stage is turned on: will mix the previous stage and current stage output and filter for file suffixes unique to the current stage output
  * If the current stage is turned off or skipped: will mix the previous stage and current stage output. However, as there are no files in the output channel from the current stage, no filtering is required and the files in the `ch_XXX_for_skipXXX` channel will be used.

This ensures that the channel inputs to the next stage are 'homogeneous', i.e. they all come from the same source (the bypass statement).

An example schematic is given below:

```nextflow
// PREVIOUS STAGE OUTPUT
if (params.run_bam_filtering) {
    ch_input_for_skipconvertbam.mix(ch_output_from_convertbam)
        .filter{ it =~/.*converted.fq/}
        .into { ch_convertbam_for_fastp; ch_convertbam_for_skipfastp }
} else {
    ch_input_for_skipconvertbam
        .into { ch_convertbam_for_fastp; ch_convertbam_for_skipfastp }
}

// SKIPPABLE CURRENT STAGE PROCESS
process fastp {
    publishDir "${params.outdir}/fastp", mode: 'copy'

    when:
    params.run_fastp

    input:
    file fq from ch_convertbam_for_fastp

    output:
    file "*pG.fq" into ch_output_from_fastp

    script:
    """
    echo "I have been fastp'd" > ${fq}  
    mv ${fq} ${fq}.pG.fq
    """
}

// NEXT STAGE INPUT PREPARATION
if (params.run_fastp) {
    ch_convertbam_for_skipfastp.mix(ch_output_from_fastp)
        .filter { it =~/.*pG.fq/ }
        .into { ch_fastp_for_adapterremoval; ch_fastp_for_skipadapterremoval }
} else {
    ch_convertbam_for_skipfastp
        .into { ch_fastp_for_adapterremoval; ch_fastp_for_skipadapterremoval }
}
```
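
The bypass logic can also be sketched outside of Nextflow. The following Python snippet is only an analogy under stated assumptions: plain lists stand in for channels, and the function `bypass` and the file names are hypothetical.

```python
# Plain-Python analogy of the bypass statement; lists stand in for Nextflow channels.

def bypass(stage_enabled, previous_files, current_files, suffix):
    """Return the files that are passed on to the next stage."""
    if stage_enabled:
        # Mix both sources, then keep only files with the suffix
        # that is unique to the current stage's output
        mixed = list(previous_files) + list(current_files)
        return [name for name in mixed if name.endswith(suffix)]
    # Stage skipped: the current stage emitted nothing,
    # so the previous stage's output is passed through unchanged
    return list(previous_files)


previous = ["sample1.converted.fq"]

# fastp switched on: only the fastp output reaches the next stage
print(bypass(True, previous, ["sample1.converted.fq.pG.fq"], ".pG.fq"))

# fastp switched off: the previous stage's output is forwarded as-is
print(bypass(False, previous, [], ".pG.fq"))
```

Either way, the next stage reads from a single source, which is the point of the bypass statement.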


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Report something that is broken or incorrect
labels: bug
---

<!--
# nf-core/eager bug report

Hi there!

Thanks for telling us about a problem with the pipeline.
Please delete this text and anything that's not relevant from the template below:
-->

## Check Documentation

I have checked the following places for your error:

- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting)
- [ ] [nf-core/eager pipeline documentation](https://nf-co.re/eager/usage)
      - nf-core/eager FAQ/troubleshooting can be found [here](https://nf-co.re/eager/usage#troubleshooting-and-faqs)

## Description of the bug

<!-- A clear and concise description of what the bug is. -->

## Steps to reproduce

Steps to reproduce the behaviour:

1. Command line: `nextflow run ...`
2. See error: _Please provide your error message_

## Expected behaviour

<!-- A clear and concise description of what you expected to happen. -->

## Log files

Have you provided the following extra information/files:

- [ ] The command used to run the pipeline
- [ ] The `.nextflow.log` file <!-- this is a hidden file in the directory where you launched the pipeline -->
- [ ] The exact error: <!-- [Please provide your error message] -->

## System

- Hardware: <!-- [e.g. HPC, Desktop, Cloud...] -->
- Executor: <!-- [e.g. slurm, local, awsbatch...] -->
- OS: <!-- [e.g. CentOS Linux, macOS, Linux Mint...] -->
- Version: <!-- [e.g. 7, 10.13.6, 18.3...] -->

## Nextflow Installation

- Version: <!-- [e.g. 19.10.0] -->

## Container engine

- Engine: <!-- [e.g. Conda, Docker, Singularity, Podman, Shifter or Charliecloud] -->
- Version: <!-- [e.g. 1.0.0] -->
- Image tag: <!-- [e.g. nfcore/eager:1.0.0] -->

## Additional context

<!-- Add any other context about the problem here. -->


================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: false
contact_links:
  - name: Join nf-core
    url: https://nf-co.re/join
    about: Please join the nf-core community here
  - name: "Slack #eager channel"
    url: https://nfcore.slack.com/channels/eager
    about: Discussion about the nf-core/eager pipeline


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for the nf-core/eager pipeline
labels: enhancement
---

<!--
# nf-core/eager feature request

Hi there!

Thanks for suggesting a new feature for the pipeline!
Please delete this text and anything that's not relevant from the template below:
-->

## Is your feature request related to a problem? Please describe

<!-- A clear and concise description of what the problem is. -->

<!-- e.g. [I'm always frustrated when ...] -->

## Describe the solution you'd like

<!-- A clear and concise description of what you want to happen. -->

## Describe alternatives you've considered

<!-- A clear and concise description of any alternative solutions or features you've considered. -->

## Additional context

<!-- Add any other context about the feature request here. -->


================================================
FILE: .github/PULL_REQUEST_TEMPLATE/pull_request_template.md
================================================
Many thanks for contributing to nf-core/eager!

Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).

## PR checklist

 - [ ] This comment contains a description of changes (with reason).
 - [ ] If you've fixed a bug or added code that should be tested, add tests!
   - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`
   - [ ] If necessary, also make a PR on the [nf-core/eager branch on the nf-core/test-datasets repo]( https://github.com/nf-core/test-datasets/pull/new/nf-core/eager).
 - [ ] Make sure your code lints (`nf-core lint .`).
 - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
 - [ ] Usage Documentation in `docs/usage.md` is updated.
 - [ ] Output Documentation in `docs/output.md` is updated.
 - [ ] `CHANGELOG.md` is updated.
 - [ ] `README.md` is updated (including new tool citations and authors/contributors).

**Learn more about contributing:** https://github.com/nf-core/eager/tree/master/.github/CONTRIBUTING.md


================================================
FILE: .github/PULL_REQUEST_TEMPLATE.md
================================================
<!--
# nf-core/eager pull request

Many thanks for contributing to nf-core/eager!

Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).

Remember that PRs should be made against the dev branch, unless you're preparing a pipeline release.

Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/eager/tree/master/.github/CONTRIBUTING.md)
-->
<!-- markdownlint-disable ul-indent -->

## PR checklist

- [ ] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
    - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`
    - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/eager/tree/master/.github/CONTRIBUTING.md)?
    - [ ] If necessary, also make a PR on the nf-core/eager _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
- [ ] Make sure your code lints (`nf-core lint .`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).


================================================
FILE: .github/markdownlint.yml
================================================
# Markdownlint configuration file
default: true
line-length: false
no-duplicate-header:
    siblings_only: true
no-inline-html:
    allowed_elements:
        - img
        - p
        - kbd
        - details
        - summary


================================================
FILE: .github/workflows/awsfulltest.yml
================================================
name: nf-core AWS full size tests
# This workflow is triggered on published releases.
# It can be additionally triggered manually with GitHub actions workflow dispatch.
# It runs the -profile 'test_full' on AWS batch

on:
  workflow_run:
    workflows: ["nf-core Docker push (release)"]
    types: [completed]
  workflow_dispatch:


env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
  AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
  AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
  AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}


jobs:
  run-awstest:
    name: Run AWS full tests
    if: github.repository == 'nf-core/eager'
    runs-on: ubuntu-latest
    steps:
      - name: Setup Miniconda
        uses: conda-incubator/setup-miniconda@v2
        with:
          auto-update-conda: true
          python-version: 3.7
      - name: Install awscli
        run: conda install -c conda-forge awscli
      - name: Start AWS batch job
        # Add full size test data (but still relatively small datasets for few samples)
        # on the `test_full.config` test runs with only one set of parameters
        # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command
        run: |
          aws batch submit-job \
            --region eu-west-1 \
            --job-name nf-core-eager \
            --job-queue $AWS_JOB_QUEUE \
            --job-definition $AWS_JOB_DEFINITION \
            --container-overrides '{"command": ["nf-core/eager", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://'"${AWS_S3_BUCKET}"'/eager/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/eager/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}'


================================================
FILE: .github/workflows/awstest.yml
================================================
name: nf-core AWS test
# This workflow is triggered manually with GitHub Actions workflow dispatch.
# It runs the -profile 'test' on AWS batch.

on:
  workflow_dispatch:


env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
  AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
  AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
  AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}


jobs:
  run-awstest:
    name: Run AWS tests
    if: github.repository == 'nf-core/eager'
    runs-on: ubuntu-latest
    steps:
      - name: Setup Miniconda
        uses: conda-incubator/setup-miniconda@v2
        with:
          auto-update-conda: true
          python-version: 3.7
      - name: Install awscli
        run: conda install -c conda-forge awscli
      - name: Start AWS batch job
        # For example: adding multiple test runs with different parameters
        # Remember that you can parallelise this by using strategy.matrix
        run: |
          aws batch submit-job \
          --region eu-west-1 \
          --job-name nf-core-eager \
          --job-queue $AWS_JOB_QUEUE \
          --job-definition $AWS_JOB_DEFINITION \
          --container-overrides '{"command": ["nf-core/eager", "-r '"${GITHUB_SHA}"' -profile test_tsv_complex --outdir s3://'"${AWS_S3_BUCKET}"'/eager/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/eager/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}'


================================================
FILE: .github/workflows/branch.yml
================================================
name: nf-core branch protection
# This workflow is triggered on PRs to master branch on the repository
# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
on:
  pull_request_target:
    branches: [master]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
      - name: Check PRs
        if: github.repository == 'nf-core/eager'
        run: |
          { [[ ${{github.event.pull_request.head.repo.full_name }} == nf-core/eager ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]


      # If the above check failed, post a comment on the PR explaining the failure
      # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets
      - name: Post PR comment
        if: failure()
        uses: mshick/add-pr-comment@v1
        with:
          message: |
            ## This PR is against the `master` branch :x:

            * Do not close this PR
            * Click _Edit_ and change the `base` to `dev`
            * This CI test will remain failed until you push a new commit

            ---

            Hi @${{ github.event.pull_request.user.login }},

            It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
            The `master` branch on nf-core repositories should always contain code from the latest release.
            Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.

            You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
            Note that even after this, the test will continue to show as failing until you push a new commit.

            Thanks again for your contribution!
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          allow-repeats: false



================================================
FILE: .github/workflows/ci.yml
================================================
name: nf-core CI
# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors
on:
  push:
    branches:
      - dev
  pull_request:
  release:
    types: [published]

# Uncomment if we need an edge release of Nextflow again
# env: NXF_EDGE: 1

jobs:
  test:
    name: Run workflow tests
    # Only run on push if this is the nf-core dev branch (merged PRs)
    if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/eager') }}
    runs-on: ubuntu-latest
    env:
      NXF_VER: ${{ matrix.nxf_ver }}
      NXF_ANSI_LOG: false
    strategy:
      matrix:
        # Nextflow versions: check pipeline minimum and current latest
        nxf_ver: ["20.07.1", "22.10.6"]
    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v2
      - name: Install older Java
        uses: actions/setup-java@v4
        with:
          distribution: "temurin" # See 'Supported distributions' for available options
          java-version: "11"
      - name: Check if Dockerfile or Conda environment changed
        uses: technote-space/get-diff-action@v4
        with:
          FILES: |
            Dockerfile
            environment.yml

      - name: Build new docker image
        if: env.MATCHED_FILES
        run: docker build --no-cache . -t nfcore/eager:2.5.3

      - name: Pull docker image
        if: ${{ !env.MATCHED_FILES }}
        run: |
          docker pull nfcore/eager:dev
          docker tag nfcore/eager:dev nfcore/eager:2.5.3
      - name: Install Nextflow
        env:
          CAPSULE_LOG: none
        run: |
          wget -qO- https://github.com/nextflow-io/nextflow/releases/download/v22.10.6/nextflow | bash
          sudo mv nextflow /usr/local/bin/
      - name: HELPTEXT Run with the help flag
        run: |
          nextflow run ${GITHUB_WORKSPACE} --help
      - name: Get test data for cases where we don't use TSV input
        run: |
          git clone --single-branch --branch eager https://github.com/nf-core/test-datasets.git data
      - name: DELAY to try address some odd behaviour with what appears to be a conflict between parallel htslib jobs leading to CI hangs
        run: |
          if [[ $NXF_VER = '' ]]; then sleep 1200; fi
      - name: BASIC Run the basic pipeline with directly supplied single-end FASTQ
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_direct,docker --input 'data/testdata/Mammoth/fastq/*_R1_*.fq.gz' --single_end
      - name: BASIC Run the basic pipeline with directly supplied paired-end FASTQ
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_direct,docker --input 'data/testdata/Mammoth/fastq/*_{R1,R2}_*tengrand.fq.gz'
      - name: BASIC Run the basic pipeline with supplied --input BAM
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_direct,docker --input 'data/testdata/Mammoth/bam/*_R1_*.bam' --bam --single_end
      - name: BASIC Run the basic pipeline with the test profile with, PE/SE, bwa aln
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --save_reference
      - name: REFERENCE Basic workflow, with supplied indices
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --bwa_index 'results/reference_genome/bwa_index/BWAIndex/' --fasta_index 'https://github.com/nf-core/test-datasets/blob/eager/reference/Mammoth/Mammoth_MT_Krause.fasta.fai'
      - name: REFERENCE Run the basic pipeline with FastA reference with `fna` extension
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_fna,docker
      - name: REFERENCE Test with zipped reference input
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --fasta 'https://github.com/nf-core/test-datasets/raw/eager/reference/Mammoth/Mammoth_MT_Krause.fasta.gz'
      - name: FASTP Test fastp complexity filtering
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --complexity_filter_poly_g
      - name: ADAPTERREMOVAL Test skip paired end collapsing
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --skip_collapse
      - name: ADAPTERREMOVAL Test paired end collapsing but no trimming
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_pretrim,docker --skip_trim
      - name: ADAPTERREMOVAL Run the basic pipeline with paired end data without adapterRemoval
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --skip_adapterremoval
      - name: ADAPTERREMOVAL Run the basic pipeline with preserve5p end option
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --preserve5p
      - name: ADAPTERREMOVAL Run the basic pipeline with merged only option
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --mergedonly
      - name: ADAPTERREMOVAL Run the basic pipeline with preserve5p end and merged reads only options
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --preserve5p --mergedonly
      - name: ADAPTER LIST Run the basic pipeline using an adapter list
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --clip_adapters_list 'https://github.com/nf-core/test-datasets/raw/eager/databases/adapters/adapter-list.txt'
      - name: ADAPTER LIST Run the basic pipeline using an adapter list, skipping adapter removal
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --clip_adapters_list 'https://github.com/nf-core/test-datasets/raw/eager/databases/adapters/adapter-list.txt' --skip_adapterremoval
      - name: POST_AR_FASTQ_TRIMMING Run the basic pipeline post-adapterremoval FASTQ trimming
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_post_ar_trimming
      - name: POST_AR_FASTQ_TRIMMING Run the basic pipeline post-adapterremoval FASTQ trimming, but skip adapterremoval
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_post_ar_trimming --skip_adapterremoval
      - name: MAPPER_CIRCULARMAPPER Test running with CircularMapper
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --mapper 'circularmapper' --circulartarget 'NC_007596.2'
      - name: MAPPER_BWAMEM Test running with BWA Mem
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --mapper 'bwamem' --skip_collapse
      - name: MAPPER_BT2 Test running with BowTie2
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --mapper 'bowtie2' --bt2_alignmode 'local' --bt2_sensitivity 'sensitive' --bt2n 1 --bt2l 16 --bt2_trim5 1 --bt2_trim3 1
      - name: HOST_REMOVAL_FASTQ Run the basic pipeline with output unmapped reads as fastq
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_complex,docker --hostremoval_input_fastq
      - name: BAM_FILTERING Run basic mapping pipeline with mapping quality filtering, and unmapped export
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_bam_filtering --bam_mapping_quality_threshold 37  --bam_unmapped_type 'fastq'
      - name: BAM_FILTERING Run basic mapping pipeline with post-mapping length filtering
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --clip_readlength 0 --run_bam_filtering --bam_filter_minreadlength 50
      - name: PRESEQ Run basic mapping pipeline with different preseq mode
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --preseq_mode 'lc_extrap' --preseq_maxextrap 10000 --preseq_bootstrap 10
      - name: DEDUPLICATION Test with dedup
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --dedupper 'dedup' --dedup_all_merged
      - name: BEDTOOLS Test bedtools feature annotation
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_bedtools_coverage --anno_file 'https://github.com/nf-core/test-datasets/raw/eager/reference/Mammoth/Mammoth_MT_Krause.gff3'
      - name: MAPDAMAGE2 damage calculation
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --damage_calculation_tool 'mapdamage'
      - name: GENOTYPING_HC Test running GATK HaplotypeCaller
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_fna,docker --run_genotyping --genotyping_tool 'hc' --gatk_hc_out_mode 'EMIT_ALL_ACTIVE_SITES' --gatk_hc_emitrefconf 'BP_RESOLUTION'
      - name: GENOTYPING_FB Test running FreeBayes
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_genotyping --genotyping_tool 'freebayes'
      - name: GENOTYPING_PC Test running pileupCaller
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --run_genotyping --genotyping_tool 'pileupcaller'
      - name: GENOTYPING_ANGSD Test running ANGSD genotype likelihood calculation
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --run_genotyping --genotyping_tool 'angsd'
      - name: GENOTYPING_BCFTOOLS Test running FreeBayes with bcftools stats turned on
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_genotyping --genotyping_tool 'freebayes' --run_bcftools_stats
      - name: SKIPPING Test checking all skip steps work i.e. input bam, skipping straight to genotyping
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_bam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --skip_qualimap --skip_preseq --skip_damage_calculation --run_genotyping --genotyping_tool 'freebayes'
      - name: TRIMBAM Test bamutils works alone
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_trim_bam
      - name: PMDTOOLS Test PMDtools works alone
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_pmdtools
      - name: GENOTYPING_UG AND MULTIVCFANALYZER Test running GATK UnifiedGenotyper and MultiVCFAnalyzer, additional VCFS
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_genotyping --genotyping_tool 'ug' --gatk_ug_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer --additional_vcf_files 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/vcf/JK2772_CATCAGTGAGTAGA_L008_R1_001.fastq.gz.tengrand.fq.combined.fq.mapped_rmdup.bam.unifiedgenotyper.vcf.gz' --write_allele_frequencies
      - name: COMPLEX LANE/LIBRARY MERGING Test running lane and library merging prior to GATK UnifiedGenotyper and running MultiVCFAnalyzer
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_complex,docker --run_genotyping --genotyping_tool 'ug' --gatk_ug_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer
      - name: GENOTYPING_UG ON TRIMMED BAM Test
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_genotyping --run_trim_bam --genotyping_source 'trimmed' --genotyping_tool 'ug' --gatk_ug_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP'
      - name: BAM_INPUT Run the basic pipeline with the bam input profile, skip AdapterRemoval as no convertBam
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_bam,docker --skip_adapterremoval
      - name: BAM_INPUT Run the basic pipeline with the bam input profile, convert to FASTQ for adapterremoval test and downstream
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_bam,docker --run_convertinputbam
      - name: METAGENOMIC Download MALT database
        run: |
          mkdir -p databases/malt
          readlink -f databases/malt/
          for i in index0.idx ref.db ref.idx ref.inf table0.db table0.idx taxonomy.idx taxonomy.map taxonomy.tre; do wget https://github.com/nf-core/test-datasets/raw/eager/databases/malt/"$i" -P databases/malt/; done
      - name: METAGENOMIC Run the basic pipeline but with unmapped reads going into MALT
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_bam_filtering  --bam_unmapped_type 'fastq' --run_metagenomic_screening --metagenomic_tool 'malt' --database "/home/runner/work/eager/eager/databases/malt/" --malt_sam_output
      - name: METAGENOMIC Run the basic pipeline but low-complexity filtered reads going into MALT
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_bam_filtering  --bam_unmapped_type 'fastq' --run_metagenomic_screening --metagenomic_tool 'malt' --database "/home/runner/work/eager/eager/databases/malt/" --metagenomic_complexity_filter
      - name: MALTEXTRACT Download resource files
        run: |
          mkdir -p databases/maltextract
          for i in ncbi.tre ncbi.map; do wget https://github.com/rhuebler/HOPS/raw/0.33/Resources/"$i" -P databases/maltextract/; done
      - name: MALTEXTRACT Basic with MALT plus MaltExtract
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_bam_filtering  --bam_unmapped_type 'fastq' --run_metagenomic_screening --metagenomic_tool 'malt' --database "/home/runner/work/eager/eager/databases/malt" --run_maltextract --maltextract_ncbifiles "/home/runner/work/eager/eager/databases/maltextract/" --maltextract_taxon_list 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/maltextract/MaltExtract_list.txt'
      - name: METAGENOMIC Run the basic pipeline but with unmapped reads going into Kraken
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_kraken,docker --run_bam_filtering  --bam_unmapped_type 'fastq'
      - name: SNPCAPTURE Run the basic pipeline with the bam input profile, generating statistics with a SNP capture bed
        run: |
          wget https://github.com/nf-core/test-datasets/raw/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz && gunzip 1240K.pos.list_hs37d5.0based.bed.gz
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --snpcapture_bed 1240K.pos.list_hs37d5.0based.bed
      - name: SEXDETERMINATION Run the basic pipeline with the bam input profile, but don't convert BAM, skip everything but sex determination
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --skip_qualimap --run_sexdeterrmine
      - name: NUCLEAR CONTAMINATION Run basic pipeline with bam input profile, but don't convert BAM, skip everything but nuclear contamination estimation
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --skip_qualimap --run_nuclear_contamination
      - name: MTNUCRATIO Run basic pipeline with bam input profile, but don't convert BAM, skip everything but mtnucratio
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --skip_qualimap --skip_preseq --skip_damage_calculation --run_mtnucratio
      - name: RESCALING Run basic pipeline but with mapDamage rescaling of BAM files. Note this will be slow
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --run_mapdamage_rescaling --run_genotyping --genotyping_tool hc --genotyping_source 'rescaled'


================================================
FILE: .github/workflows/linting.yml
================================================
name: nf-core linting
# This workflow is triggered on pushes and PRs to the repository.
# It runs the `nf-core lint` and markdown lint tests to ensure that the code meets the nf-core guidelines
on:
  push:
  pull_request:
  release:
    types: [published]

jobs:
  Markdown:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2

      - name: Install markdownlint
        run: npm install -g markdownlint-cli
      - name: Run Markdownlint
        run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml

      # If the above check failed, post a comment on the PR explaining the failure
      - name: Post PR comment
        if: failure()
        uses: mshick/add-pr-comment@v1
        with:
          message: |
            ## Markdown linting is failing

            To keep the code consistent with lots of contributors, we run automated code consistency checks.
            To fix this CI test, please run:

            * Install `markdownlint-cli`
                * On Mac: `brew install markdownlint-cli`
                * Everything else: [Install `npm`](https://www.npmjs.com/get-npm) then [install `markdownlint-cli`](https://www.npmjs.com/package/markdownlint-cli) (`npm install -g markdownlint-cli`)
            * Fix the markdown errors
                * Automatically: `markdownlint . --config .github/markdownlint.yml --fix`
                * Manually resolve anything left from `markdownlint . --config .github/markdownlint.yml`

            Once you push these changes the test should pass, and you can hide this comment :+1:

            We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

            Thanks again for your contribution!
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          allow-repeats: false

  YAML:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - uses: actions/setup-node@v2

      - name: Install yaml-lint
        run: npm install -g yaml-lint
      - name: Run yaml-lint
        run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml" -o -name "*.yaml") -c .github/yamllint.yml

      # If the above check failed, post a comment on the PR explaining the failure
      - name: Post PR comment
        if: failure()
        uses: mshick/add-pr-comment@v1
        with:
          message: |
            ## YAML linting is failing

            To keep the code consistent with lots of contributors, we run automated code consistency checks.
            To fix this CI test, please run:

            * Install `yaml-lint`
                * [Install `npm`](https://www.npmjs.com/get-npm) then [install `yaml-lint`](https://www.npmjs.com/package/yaml-lint) (`npm install -g yaml-lint`)
            * Fix the YAML errors
                * Run the test locally: `yamllint $(find . -type f -name "*.yml" -o -name "*.yaml")`
                * Fix any reported errors in your YAML files

            Once you push these changes the test should pass, and you can hide this comment :+1:

            We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

            Thanks again for your contribution!
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          allow-repeats: false

  nf-core:
    runs-on: ubuntu-latest
    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v2

      - name: Install Nextflow
        env:
          CAPSULE_LOG: none
        run: |
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/

      - uses: actions/setup-python@v1
        with:
          python-version: "3.6"
          architecture: "x64"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install nf-core==1.14

      - name: Run nf-core lint
        env:
          GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}
        run: nf-core -l lint_log.txt lint ${GITHUB_WORKSPACE} --markdown lint_results.md

      - name: Save PR number
        if: ${{ always() }}
        run: echo ${{ github.event.pull_request.number }} > PR_number.txt

      - name: Upload linting log file artifact
        if: ${{ always() }}
        uses: actions/upload-artifact@v2
        with:
          name: linting-logs
          path: |
            lint_log.txt
            lint_results.md
            PR_number.txt


================================================
FILE: .github/workflows/linting_comment.yml
================================================

name: nf-core linting comment
# This workflow is triggered after the linting action is complete
# It posts an automated comment to the PR, even if the PR is coming from a fork

on:
  workflow_run:
    workflows: ["nf-core linting"]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Download lint results
        uses: dawidd6/action-download-artifact@v2
        with:
          workflow: linting.yml

      - name: Get PR number
        id: pr_number
        run: echo "::set-output name=pr_number::$(cat linting-logs/PR_number.txt)"

      - name: Post PR comment
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          number: ${{ steps.pr_number.outputs.pr_number }}
          path: linting-logs/lint_results.md



================================================
FILE: .github/workflows/push_dockerhub_dev.yml
================================================
name: nf-core Docker push (dev)
# This builds the docker image and pushes it to DockerHub
# Runs on push events to the 'dev' branch (PR merges)
on:
  push:
    branches:
      - dev

jobs:
  push_dockerhub:
    name: Push new Docker image to Docker Hub (dev)
    runs-on: ubuntu-latest
    # Only run for the nf-core repo, for releases and merged PRs
    if: ${{ github.repository == 'nf-core/eager' }}
    env:
      DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
      DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }}
    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v2

      - name: Build new docker image
        run: docker build --no-cache . -t nfcore/eager:dev

      - name: Push Docker image to DockerHub (dev)
        run: |
          echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
          docker push nfcore/eager:dev


================================================
FILE: .github/workflows/push_dockerhub_release.yml
================================================
name: nf-core Docker push (release)
# This builds the docker image and pushes it to DockerHub
# Runs on nf-core repo releases
on:
  release:
    types: [published]

jobs:
  push_dockerhub:
    name: Push new Docker image to Docker Hub (release)
    runs-on: ubuntu-latest
    # Only run for the nf-core repo, for releases and merged PRs
    if: ${{ github.repository == 'nf-core/eager' }}
    env:
      DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
      DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }}
    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v2

      - name: Build new docker image
        run: docker build --no-cache . -t nfcore/eager:latest

      - name: Push Docker image to DockerHub (release)
        run: |
          echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
          docker push nfcore/eager:latest
          docker tag nfcore/eager:latest nfcore/eager:${{ github.event.release.tag_name }}
          docker push nfcore/eager:${{ github.event.release.tag_name }}


================================================
FILE: .github/yamllint.yml
================================================
rules:
  document-start: disable
  comments: disable
  truthy: disable
  line-length: disable
  empty-lines: disable
  


================================================
FILE: .gitignore
================================================
.nextflow*
work/
data/
results/
.DS_Store
tests/
testing/
testing*
*.pyc
main_playground.nf
.vscode
*.code-workspace
nf-params.json

================================================
FILE: .gitpod.yml
================================================
image: nfcore/gitpod:latest

vscode:
  extensions: # based on nf-core.nf-core-extensionpack
    - codezombiech.gitignore # Language support for .gitignore files
    # - cssho.vscode-svgviewer                 # SVG viewer
    - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code
    - eamodio.gitlens # Quickly glimpse into whom, why, and when a line or code block was changed
    - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files
    - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar
    - mechatroner.rainbow-csv # Highlight columns in csv files in different colors
    # - nextflow.nextflow                      # Nextflow syntax highlighting
    - oderwat.indent-rainbow # Highlight indentation level
    - streetsidesoftware.code-spell-checker # Spelling checker for source code


================================================
FILE: .nf-core-lint.yml
================================================
files_unchanged:
  - assets/multiqc_config.yaml
  - .github/CONTRIBUTING.md
  - .github/ISSUE_TEMPLATE/bug_report.md
  - docs/README.md
  - .github/workflows/linting.yml


================================================
FILE: CHANGELOG.md
================================================
# nf-core/eager: Changelog

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [2.5.3] - 2025-03-18

### `Added`

### `Fixed`

- [#1119](https://github.com/nf-core/eager/issues/1119) - Fix typo in variable of IndelRealigner step of UnifiedGenotyper when generating a targetIntervals file (♥ to @Dog13Golf for reporting, fix by @jfy133).

### `Dependencies`

### `Deprecated`

## [2.5.2] - 2024-06-28

### `Added`

- [#1079](https://github.com/nf-core/eager/issues/1079) - Added the `lanemerging` output directory in the output documentation (♥ to @TessaZei for reporting, fix by @TCLamnidis).

### `Fixed`

- [#1073](https://github.com/nf-core/eager/issues/1073) - Fixed post-adapterremoval trimmed FastQC results not being displayed in MultiQC (♥ to @kieren-j-mitchell for reporting, fix by @jfy133 and @TCLamnidis)

### `Dependencies`

### `Deprecated`

## [2.5.1] - 2024-02-21

### `Added`

- [#1037](https://github.com/nf-core/eager/issues/1037) Added an option to deactivate the `-sorted` option of bedtools coverage, in case the feature file is not sorted the same way as the fasta file, albeit with the caveat this will be very slow. (♥ Thanks to @IdoBar for reporting, and contributing.)

### `Fixed`

- [#1048](https://github.com/nf-core/eager/issues/1048) `--vcf2genome_outfile` parameter now gets prefixed by the sample_name and suffixed with `.fasta` (i.e. `<sample_name>_<vcf2genome_outfile>.fasta`). This ensures we avoid overwriting the output fasta of one sample with that of another when the option is provided. (♥ Thanks to @MeriamOs for reporting.)
- [#1047](https://github.com/nf-core/eager/issues/1047) Changed the row in which some statistics were reported in the General Stats table. The file name collision fix in 2.5.0 (see #1017) caused these statistics to be reported in the wrong row due to an added suffix.
- [#1051](https://github.com/nf-core/eager/issues/1051) An error is now thrown if input BAM files end in `.unmapped.bam`, as this breaks the bam filtering process and empties the bam files in the process. (♥ Thanks to @PCQuilis for reporting.)

### `Dependencies`

### `Deprecated`

## [2.5.0] - Bopfingen - 2023-11-03

### `Added`

- [#1020](https://github.com/nf-core/eager/issues/1020) Added mapDamage2 as an alternative for damage calculation.

### `Fixed`

- [#1017](https://github.com/nf-core/eager/issues/1017) Fixed file name collision in niche cases with multiple libraries of multiple UDG treatments.
- [#1024](https://github.com/nf-core/eager/issues/1024) `multiqc_general_stats.txt` is now generated even if the table is a beeswarm plot in the report.
- [#655](https://github.com/nf-core/eager/issues/655) Updated RG tags for all mappers. RG-id now includes Sample as well as Library ID. Added `LB:` tag with the library ID.
- [#1031](https://github.com/nf-core/eager/issues/1031) Always index fasta regardless of mapper. This ensures that DamageProfiler and genotyping processes get submitted when using bowtie2 and not providing a fasta index.

### `Dependencies`

- `multiqc`: 1.14 -> 1.16

### `Deprecated`

## [2.4.7] - 2023-05-16

### `Added`

### `Fixed`

- [#983](https://github.com/nf-core/eager/issues/983) Bump `pygments` version due to incompatibility with MultiQC dependencies (♥ to @MinLuke for reporting)

### `Dependencies`

- `pygments`: 2.9 -> 2.14
- `multiqc`: 1.13 -> 1.14

### `Deprecated`

## [2.4.6] - 2022-11-14

### `Added`

- [#933](https://github.com/nf-core/eager/issues/933) Added support for customising --seq-length in mapDamage rescaling (♥ to @ashildv for requesting)

### `Fixed`

- Changed endorS.py license from GPL to MIT (♥ to @aidaanva for fixing)
- Removed erroneous R2 in single-end example in input TSV of usage docs (♥ to @aidaanva for fixing)
- [#928](https://github.com/nf-core/eager/issues/928) Fixed read group incompatibility by re-adding picard AddOrReplaceReadGroups for MultiVCFAnalyzer (♥ to @aidaanva, @meganemichel for reporting)
- Fixed edge case of DamageProfiler occasionally requiring FASTA index (♥ to @asmaa-a-abdelwahab for reporting)
- [#834](https://github.com/nf-core/eager/issues/834) Increased significance values in general stats table for Qualimap mean/median coverages (♥ to @neija2611 for reporting)
- Fixed parameter documentation for `--snpcapture_bed` regarding on-target SNP stats to state that these stats are currently not displayed in MultiQC, only in the Qualimap results (♥ to @meganemichel and @TCLamnidis for reporting)
- [#934](https://github.com/nf-core/eager/issues/934) Fixed broken parameter setting in mapDamage2 rescale length (♥ to @ashildv for reporting)

### `Dependencies`

- Updated MultiQC to official 1.13 version (rather than alpha)
- Added pinned MALT dependency to ensure working version in future versions of eager

### `Deprecated`

## [2.4.5] - 2022-08-02

### `Added`

### `Fixed`

- [#882](https://github.com/nf-core/eager/pull/882) Define DSL1 execution explicitly, as new versions of Nextflow made DSL2 the default (♥ to & fix from @Lehmann-Fabian)
- [#879](https://github.com/nf-core/eager/issues/879) Add missing threads parameter for pre-clipping FastQC for single end data that caused insufficient memory in some cases (♥ to @marcel-keller for reporting)
- [#880](https://github.com/nf-core/eager/issues/880) Fix failure of endorSpy to be cached or reexecuted on resume (♥ to @KathrinNaegele, @TCLamnidis, & @mahesh-panchal for reporting and debugging)
- [#885](https://github.com/nf-core/eager/issues/885) Specify task memory for all tools in get_software_versions to account for incompatibility of java with some SGE clusters causing hanging of the process (♥ to @maxibor for reporting)
- [#887](https://github.com/nf-core/eager/issues/887) Clarify what is considered 'ultra-short' reads in the help text of clip_readlength, for when you may wish to turn off length filtering during AdapterRemoval (♥ to @TCLamnidis for reporting)
- [#889](https://github.com/nf-core/eager/issues/889) Remove/update parameters from benchmarking test profiles (♥ to @TCLamnidis for reporting)
- [#895](https://github.com/nf-core/eager/issues/895) Output documentation typo fix and added location of output docs in pipeline summary (♥ to @RodrigoBarquera for reporting)
- [#897](https://github.com/nf-core/eager/issues/897) Fix pipeline crash if no Kraken2 results generated (♥ to @alexandregilardet for reporting)
- [#899](https://github.com/nf-core/eager/issues/899) Fix pipeline crash for circulargenerator if reference file does not end in .fasta (♥ to @scarlhoff for reporting)
- Fixed some missing default values in the nextflow parameter schema JSON
- [#789](https://github.com/nf-core/eager/issues/789) Substantial speed and memory optimisation of the `extract_map_reads.py` script (♥ to @ivelsko for reporting, @maxibor for optimisation)
- Fix staging of input bams for genotyping_pileupcaller process. Downstream changes from changes introduced when fixing endorspy caching.
- Made slight correction on metro map diagram regarding input data to SexDeterrmine (only BAM trimming output files)

### `Dependencies`

- Updated MultiQC to latest stable alpha version on bioconda, correcting the previously nonsensical AdapterRemoval plots (♥ to @NiemannJ for fixing in MultiQC)

### `Deprecated`

## [2.4.4] - 2022-04-08

### `Added`

### `Fixed`

- Fixed some auxiliary files (adapter list, snpcapture/pileupcaller/sexdeterrmine BED files, and pileupCaller SNP file, PMD reference mask) in some cases only being used against one sample (❤ to @meganemichel for reporting, fix by @jfy133)

### `Dependencies`

### `Deprecated`

## [2.4.3] - 2022-03-24

### `Added`

### `Fixed`

- [#828](https://github.com/nf-core/eager/issues/828) Improved error message if required metagenomic screening parameters not set correctly
- [#836](https://github.com/nf-core/eager/issues/836) Remove deprecated parameters from test profiles
- [#838](https://github.com/nf-core/eager/issues/838) Fix --snpcapture_bed files not being picked up by Nextflow (❤ to @meganemichel for reporting)
- [#843](https://github.com/nf-core/eager/issues/843) Re-add direct piping of AdapterRemovalFixPrefix to pigz
- [#844](https://github.com/nf-core/eager/issues/844) Fixed reference masking prior to pmdtools
- [#845](https://github.com/nf-core/eager/issues/845) Updates parameter documentation to specify `-s` preseq parameter also applies to lc_extrap
- [#851](https://github.com/nf-core/eager/issues/851) Fixes a file-name clash during additional_library_merge, post-BAM trimming of different UDG treated libraries of a sample (❤ to @alexandregilardet for reporting)
- Renamed a range of MultiQC general stats table headers to improve clarity, documentation has been updated accordingly
- [#857](https://github.com/nf-core/eager/issues/857) Corrected samtools fastq flag to _retain_ read-pair information when converting off-target BAM files to fastq in paired-end mapping (❤ to @alexhbnr for reporting)
- [#866](https://github.com/nf-core/eager/issues/866) Fixed a typo in the indexing step of BWA mem when not-collapsing (❤ to @alexhbnr for reporting)
- Corrected tutorials to reflect updated BAM trimming flags (❤ to @marcel-keller for reporting and correcting)

### `Dependencies`

- [#829](https://github.com/nf-core/eager/issues/829) Bumped sequencetools: 1.4.0.5 -> 1.5.2
- Bumped MultiQC: 1.11 -> 1.12 (for run-time optimisation and tool citation information)

### `Deprecated`

## [2.4.2] - 2022-01-24

### `Added`

### `Fixed`

- [#824](https://github.com/nf-core/eager/issues/824) Fixes large memory footprint of bedtools coverage calculation.
- [#822](https://github.com/nf-core/eager/issues/822) Fixed post-adapterremoval trimmed files not being lane-merged and included in downstream analyses
- Fixed a couple of software version reporting commands

### `Dependencies`

### `Deprecated`

## [2.4.1] - 2021-11-30

### `Added`

- [#805](https://github.com/nf-core/eager/issues/805) Changes to bam_trim options to allow flexible trimming by library strandedness (in addition to UDG treatment). (@TCLamnidis)
- [#808](https://github.com/nf-core/eager/issues/808) Retain read group information across bam merges. Sample set to sample name (rather than library name) in bwa output 'RG' readgroup tag. (@TCLamnidis)
- Map and base quality filters prior to genotyping with pileupcaller can now be specified. (@TCLamnidis)
- [#774](https://github.com/nf-core/eager/issues/774) Added support for multi-threaded Bowtie2 build reference genome indexing (@jfy133)
- [#804](https://github.com/nf-core/eager/issues/804) Improved output documentation description to add how 'cluster factor' is calculated (thanks to @meganemichel)

### `Fixed`

- [#803](https://github.com/nf-core/eager/issues/803) Fixed mistake in metro-map diagram (`samtools index` is now correctly `samtools faidx`) (@jfy133)

### `Dependencies`

### `Deprecated`

## [2.4.0] - Wangen - 2021-09-14

### `Added`

- [#317](https://github.com/nf-core/eager/issues/317) Added bcftools stats for general genotyping statistics of VCF files
- [#651](https://github.com/nf-core/eager/issues/651) - Adds removal of adapters specified in an AdapterRemoval adapter list file
- [#642](https://github.com/nf-core/eager/issues/642) and [#431](https://github.com/nf-core/eager/issues/431) adds post-adapter removal barcode/fastq trimming
- [#769](https://github.com/nf-core/eager/issues/769) - Adds lc_extrap mode to preseq (suggested by @roberta-davidson)

### `Fixed`

- Fixed some missing or incorrectly reported software versions
- [#771](https://github.com/nf-core/eager/issues/771) Remove legacy code
- Improved output documentation for MultiQC general stats table (thanks to @KathrinNaegele and @esalmela)
- Improved output documentation for Bowtie2 (thanks to @isinaltinkaya)
- [#612](https://github.com/nf-core/eager/issues/612) Updated BAM trimming defaults to 0 to ensure no unwanted trimming when mixing half-UDG with no-UDG (thanks to @scarlhoff)
- [#722](https://github.com/nf-core/eager/issues/722) Updated BWA mapping parameters to latest recommendations - primarily alnn back to 0.01 and alno to 2 as per Oliva et al. 2021 (10.1093/bib/bbab076)
- Updated workflow diagrams to reflect latest functionality
- [#787](https://github.com/nf-core/eager/issues/787) Adds memory specification flags for the GATK UnifiedGenotyper and HaplotypeCaller steps (thanks to @nylander)
- Fixed issue where MultiVCFAnalyzer would not pick up newly generated VCF files, when specifying additional VCF files.
- [#790](https://github.com/nf-core/eager/issues/790) Fixed kraken2 report file-name collision when sample names have `.` in them
- [#792](https://github.com/nf-core/eager/issues/792) Fixed java error messages for AdapterRemovalFixPrefix being hidden in output
- [#794](https://github.com/nf-core/eager/issues/794) Aligned default test profile with nf-core standards (`test_tsv` is now `test`)

### `Dependencies`

- Bumped python: 3.7.3 -> 3.9.4
- Bumped markdown: 3.2.2 -> 3.3.4
- Bumped pymdown-extensions: 7.1 -> 8.2
- Bumped pygments: 2.6.1 -> 2.9.0
- Bumped adapterremoval: 2.3.1 -> 2.3.2
- Bumped picard: 2.22.9 -> 2.26.0
- Bumped samtools: 1.9 -> 1.12
- Bumped angsd: 0.933 -> 0.935
- Bumped gatk4: 4.1.7.0 -> 4.2.0.0
- Bumped multiqc: 1.10.1 -> 1.11
- Bumped bedtools: 2.29.2 -> 2.30.0
- Bumped libiconv: 1.15 -> 1.16
- Bumped preseq: 2.0.3 -> 3.1.2
- Bumped bamutil: 1.0.14 -> 1.0.15
- Bumped pysam: 0.15.4 -> 0.16.0
- Bumped kraken2: 2.1.1 -> 2.1.2
- Bumped pandas: 1.0.4 -> 1.2.4
- Bumped freebayes: 1.3.2 -> 1.3.5
- Bumped biopython: 1.76 -> 1.79
- Bumped xopen: 0.9.0 -> 1.1.0
- Bumped bowtie2: 2.4.2 -> 2.4.4
- Bumped mapdamage2: 2.2.0 -> 2.2.1
- Bumped bbmap: 38.87 -> 38.92
- Added bcftools: 1.12

### `Deprecated`

## [2.3.5] - 2021-06-03

### `Added`

- [#722](https://github.com/nf-core/eager/issues/722) - Adds bwa `-o` flag for more flexibility in bwa parameters
- [#736](https://github.com/nf-core/eager/issues/736) - Add printing of multiqc run report location on successful completion
- New logo that is more visible when a user is using darkmode on GitHub or nf-core website!

### `Fixed`

- [#723](https://github.com/nf-core/eager/issues/723) - Fixes empty fields in TSV resulting in uninformative error
- Updated template to nf-core/tools 1.14
- [#688](https://github.com/nf-core/eager/issues/688) - Clarified the pipeline is not just for humans and microbes, but also plants and animals, and also for modern DNA
- [#751](https://github.com/nf-core/eager/pull/751) - Added missing label to mtnucratio
- General code cleanup and standardisation of parameters with no default setting
- [#750](https://github.com/nf-core/eager/issues/750) - Fixed piped commands requesting the same number of CPUs at each command step
- [#757](https://github.com/nf-core/eager/issues/757) - Removed confusing 'Data Type' variable from MultiQC workflow summary (not consistent with TSV input)
- [#759](https://github.com/nf-core/eager/pull/759) - Fixed malformed software scraping regex that resulted in N/A in MultiQC report
- [#761](https://github.com/nf-core/eager/pull/759) - Fixed issues related to instability of samtools filtering related CI tests

### `Dependencies`

### `Deprecated`

## [2.3.4] - 2021-05-05

### `Added`

- [#729](https://github.com/nf-core/eager/issues/729) - Added Bowtie2 flag `--maxins` for PE mapping in modern DNA contexts

### `Fixed`

- Corrected explanation of the "--min_adap_overlap" parameter for AdapterRemoval in the docs
- [#725](https://github.com/nf-core/eager/pull/725) - `bwa_index` doc update
- Re-adds gzip piping to AdapterRemovalFixPrefix to speed up process after reports of being very slow
- Updated DamageProfiler citation from bioRxiv to publication

### `Dependencies`

- Removed pinning of `tbb` (upstream bug in bioconda fixed)
- Bumped `pigz` to 2.6 to fix rare stall bug when compressing data after AdapterRemoval
- Bumped Bowtie2 to 2.4.2 to fix issues with `tbb` version

### `Deprecated`

## [2.3.3] - 2021-04-08

### `Added`

- [#349](https://github.com/nf-core/eager/issues/349) - Added option enabling platypus formatted output of pmdtools misincorporation frequencies.

### `Fixed`

- [#719](https://github.com/nf-core/eager/pull/719) - Fix filename for bam output of `mapdamage_rescaling`
- [#707](https://github.com/nf-core/eager/pull/707) - Fix typo in UnifiedGenotyper IndelRealigner command
- Fixed some Java tools not following process memory specifications
- Updated template to nf-core/tools 1.13.2
- [#711](https://github.com/nf-core/eager/pull/711) - Fix conditional execution preventing MultiVCFAnalyzer from running
- [#714](https://github.com/nf-core/eager/issues/714) - Fixes bug in nuc contamination by upgrading to latest MultiQC v1.10.1 bugfix release

### `Dependencies`

### `Deprecated`

## [2.3.2] - 2021-03-16

### `Added`

- [#687](https://github.com/nf-core/eager/pull/687) - Adds Kraken2 unique kmer counting report
- [#676](https://github.com/nf-core/eager/issues/676) - Refactor help message / summary message formatting to automatic versions using nf-core library
- [#682](https://github.com/nf-core/eager/issues/682) - Add AdapterRemoval `--qualitymax` flag to allow FASTQ Phred score range max more than 41

### `Fixed`

- [#666](https://github.com/nf-core/eager/issues/666) - Fixed input file staging for `print_nuclear_contamination`
- [#631](https://github.com/nf-core/eager/issues/631) - Update minimum Nextflow version to 20.07.1, due to unfortunate bug in Nextflow 20.04.1 causing eager to crash if patch pulled
- Made MultiQC crash behaviour stricter when dealing with large datasets, as reported by @ashildv
- [#652](https://github.com/nf-core/eager/issues/652) - Added note to documentation that when using `--skip_collapse` this will use _paired-end_ alignment mode with mappers when using PE data
- [#626](https://github.com/nf-core/eager/issues/626) - Add additional checks to ensure pipeline will give useful error if cells of a TSV column are empty
- [#673](https://github.com/nf-core/eager/pull/673) - Fix Kraken database loading when loading from directory instead of compressed file
- [#688](https://github.com/nf-core/eager/issues/668) - Allow pipeline to complete, even if Qualimap crashes due to an empty or corrupt BAM file for one sample/library
- [#683](https://github.com/nf-core/eager/pull/683) - Sets `--igenomes_ignore` to true by default, as rarely used by users currently and makes resolving configs less complex
- Added exit code `140` to re-tryable exit code list to account for certain scheduler wall-time limit fails
- [#672](https://github.com/nf-core/eager/issues/672) - Removed java parameter from picard tools which could cause memory issues
- [#679](https://github.com/nf-core/eager/issues/679) - Refactor within-process bash conditions to groovy/nextflow, due to incompatibility with some server environments
- [#690](https://github.com/nf-core/eager/pull/690) - Fixed ANGSD output mode for beagle by setting `-doMajorMinor 1` as default in that case
- [#693](https://github.com/nf-core/eager/issues/693) - Fixed broken TSV input validation for the Colour Chemistry column
- [#695](https://github.com/nf-core/eager/issues/695) - Fixed incorrect `-profile` order in tutorials (originally written reversed due to [nextflow bug](https://github.com/nextflow-io/nextflow/issues/1792))
- [#653](https://github.com/nf-core/eager/issues/653) - Fixed file collision errors with sexdeterrmine for two same-named libraries with different strandedness

### `Dependencies`

- Bumped MultiQC to 1.10 for improved functionality
- Bumped HOPS to 0.35 for MultiQC 1.10 compatibility

### `Deprecated`

## [2.3.1] - 2021-01-14

### `Added`

### `Fixed`

- [#654](https://github.com/nf-core/eager/issues/654) - Fixed some values in JSON schema (used in launch GUI) not passing validation checks during run
- [#655](https://github.com/nf-core/eager/issues/655) - Updated read groups for all mappers to allow proper GATK validation
- Fixed issue with Docker container not being pullable by Nextflow due to version-number inconsistencies

### `Dependencies`

### `Deprecated`

## [2.3.0] - Aalen - 2021-01-11

### `Added`

- [#640](https://github.com/nf-core/eager/issues/640) - Added a pre-metagenomic screening filtering of low-sequence complexity reads with `bbduk`
- [#583](https://github.com/nf-core/eager/issues/583) - Added `mapDamage2` rescaling of BAM files to remove damage
- Updated usage (merging files) and workflow images reflecting new functionality.

### `Fixed`

- Removed leftover old DockerHub push CI commands.
- [#627](https://github.com/nf-core/eager/issues/627) - Added de Barros Damgaard citation to README
- [#630](https://github.com/nf-core/eager/pull/630) - Better handling of Qualimap memory requirements and error strategy.
- Fixed some incomplete schema options to ensure users supply valid input values
- [#638](https://github.com/nf-core/eager/issues/638#issuecomment-748877567) Fixed inverted circularfilter filtering (previously filtering would happen by default, not when requested by user as originally recorded in documentation)
- [DeDup:](https://github.com/apeltzer/DeDup/commit/07d47868f10a6830da8c9161caa3755d9da155bf) Fixed Null Pointer Bug in DeDup by updating to 0.12.8 version
- [#650](https://github.com/nf-core/eager/pull/650) - Increased memory given to FastQC for larger files by making it multithreaded

### `Dependencies`

- Update: DeDup v0.12.7 to v0.12.8

### `Deprecated`

## [2.2.2] - 2020-12-09

### `Added`

- Added large scale 'stress-test' profile for AWS (using de Barros Damgaard et al. 2018's 137 ancient human genomes).
  - This will now be run automatically for every release. All processed data will be available on the nf-core website: <https://nf-co.re/eager/results>
    - You can run this yourself using `-profile test_full`

### `Fixed`

- Fixed AWS full test profile.
- [#587](https://github.com/nf-core/eager/issues/587) - Re-implemented AdapterRemovalFixPrefix for DeDup compatibility of including singletons
- [#602](https://github.com/nf-core/eager/issues/602) - Added the newly available GATK 3.5 conda package.
- [#610](https://github.com/nf-core/eager/issues/610) - Create bwa_index channel when specifying circularmapper as mapper
- Updated template to nf-core/tools 1.12.1
- General documentation improvements

### `Deprecated`

- Flag `--gatk_ug_jar` has now been removed as GATK 3.5 is now available within the nf-core/eager software environment.

## [2.2.1] - 2020-10-20

### `Fixed`

- [#591](https://github.com/nf-core/eager/issues/591) - Fixed offset underlines in lane merging diagram in docs
- [#592](https://github.com/nf-core/eager/issues/592) - Fixed issue where supplying Bowtie2 index reported missing bwamem_index error
- [#590](https://github.com/nf-core/eager/issues/592) - Removed redundant dockstore.yml from root
- [#596](https://github.com/nf-core/eager/issues/596) - Add workaround for issue regarding gzipped FASTAs and pre-built indices
- [#589](https://github.com/nf-core/eager/issues/582) - Updated template to nf-core/tools 1.11
- [#582](https://github.com/nf-core/eager/issues/582) - Clarify memory limit issue on FAQ

## [2.2.0] - Ulm - 2020-10-20

### `Added`

- **Major** Automated cloud tests with large-scale data on [AWS](https://aws.amazon.com/)
- **Major** Re-wrote input logic to accept a TSV 'map' file in addition to direct paths to FASTQ files
- **Major** Added JSON Schema, enabling web GUI for configuration of pipeline available [here](https://nf-co.re/launch?pipeline=eager&release=2.2.0)
- **Major** Lane and library merging implemented
  - When using TSV input, multiple _lanes_ of one library will be merged together before mapping
  - Strip FASTQ will also produce a lane merged 'raw' but 'stripped' FASTQ file
  - When using TSV input, one sample with multiple (same treatment) libraries will be merged together
  - Important: direct FASTQ paths will not have this functionality. TSV is required.
- [#40](https://github.com/nf-core/eager/issues/40) - Added the pileupCaller genotyper from [sequenceTools](https://github.com/stschiff/sequenceTools)
- Added validation check and clearer error message when `--fasta_index` is provided and filepath does not end in `.fai`.
- Improved error messages
- Added ability for automated emails using `mailutils` to also send MultiQC reports
- General documentation additions, cleaning, and updated figures with CC-BY license
- Added large 'full size' dataset test-profiles for ancient fish and human contexts
- [#257](https://github.com/nf-core/eager/issues/257) - Added the bowtie2 aligner as option for mapping, following Poullet and Orlando 2020 doi: [10.3389/fevo.2020.00105](https://doi.org/10.3389/fevo.2020.00105)
- [#451](https://github.com/nf-core/eager/issues/451) - Adds ANGSD genotype likelihood calculations as an alternative to typical 'genotypers'
- [#566](https://github.com/nf-core/eager/issues/466) - Add tutorials on how to set up nf-core/eager for different contexts
- Nuclear contamination results are now shown in the MultiQC report
- Tutorial on how to use profiles for reproducible science (i.e. parameter sharing between different groups)
- [#522](https://github.com/nf-core/eager/issues/522) - Added post-mapping length filter to assist in more realistic endogenous DNA calculations
- [#512](https://github.com/nf-core/eager/issues/512) - Added flexible trimming of BAMs by library type. 'half' and 'none' UDG libraries can now be trimmed differentially within a single eager run.
- Added a `.dockstore.yml` config file for automatic workflow registration with [dockstore.org](https://dockstore.org/)
- Updated template to nf-core/tools 1.10.2
- [#544](https://github.com/nf-core/eager/pull/544) - Add script to perform bam filtering on fragment length
- [#456](https://github.com/nf-core/eager/pull/546) - Bumps the base (default) runtime of all processes to 4 hours, and set shorter time limits for test profiles (1 hour)
- [#552](https://github.com/nf-core/eager/issues/552) - Adds optional creation of MALT SAM files alongside RMA6 files
- Added eigenstrat snp coverage statistics to MultiQC report. Process results are published in `genotyping/*_eigenstrat_coverage.txt`.

### `Fixed`

- [#368](https://github.com/nf-core/eager/issues/368) - Fixed the profile `test` to contain a parameter for `--paired_end`
- Mini bugfix for typo in line 1260+1261
- [#374](https://github.com/nf-core/eager/issues/374) - Fixed output documentation rendering not containing images
- [#379](https://github.com/nf-core/eager/issues/378) - Fixed insufficient memory requirements for FASTQC edge case
- [#390](https://github.com/nf-core/eager/issues/390) - Renamed clipped/merged output directory to be more descriptive
- [#398](https://github.com/nf-core/eager/issues/498) - Stopped incompatible FASTA indexes being accepted
- [#400](https://github.com/nf-core/eager/issues/400) - Set correct recommended bwa mapping parameters from [Schubert et al. 2012](https://doi.org/10.1186/1471-2164-13-178)
- [#410](https://github.com/nf-core/eager/issues/410) - Fixed nf-core/configs not being loaded properly
- [#473](https://github.com/nf-core/eager/issues/473) - Fixed bug in sexdet_process on AWS
- [#444](https://github.com/nf-core/eager/issues/444) - Provide option for preserving realigned bam + index
- Fixed deduplication output logic. Will now pass along only the post-rmdup bams if duplicate removal is not skipped, instead of both the post-rmdup and pre-rmdup bams
- [#497](https://github.com/nf-core/eager/issues/497) - Simplifies number of parameters required to run bam filtering
- [#501](https://github.com/nf-core/eager/issues/501) - Adds additional validation checks for MALT/MaltExtract database input files
- [#508](https://github.com/nf-core/eager/issues/508) - Made Markduplicates default dedupper due to narrower context specificity of dedup
- [#516](https://github.com/nf-core/eager/issues/516) - Made bedtools not report out of memory exit code when warning of inconsistent FASTA/Bed entry names
- [#504](https://github.com/nf-core/eager/issues/504) - Removed uninformative sexdeterrmine-snps plot from MultiQC report.
- Nuclear contamination is now reported with the correct library names.
- [#531](https://github.com/nf-core/eager/pull/531) - Renamed 'FASTQ stripping' to 'host removal'
- Merged all tutorials and FAQs into `usage.md` for display on [nf-co.re](https://www.nf-co.re)
- Corrected header of nuclear contamination table (`nuclear_contamination.txt`).
- Fixed a bug with `nSNPs` definition in `print_x_contamination.py`. Number of SNPs now correctly reported
- `print_x_contamination.py` now correctly converts all NA values to "N/A"
- Increased amount of memory MultiQC by default uses, to account for very large nf-core/eager runs (e.g. >1000 samples)

### `Dependencies`

- Added sequenceTools (1.4.0.6) that adds the ability to do genotyping with the 'pileupCaller'
- Latest version of DeDup (0.12.6) which now reports mapped reads after deduplication
- [#560](https://github.com/nf-core/eager/issues/560) Latest version of Dedup (0.12.7), which now correctly reports deduplication statistics based on calculations of mapped reads only (prior denominator was total reads of BAM file)
- Latest version of ANGSD (0.933) which doesn't seg fault when running contamination on BAMs with insufficient reads
- Latest version of MultiQC (1.9) with support for lots of extra tools in the pipeline (MALT, SexDetERRmine, DamageProfiler, MultiVCFAnalyzer)
- Latest versions of Pygments (2.6.1), Pymdown-Extensions (7.1) and Markdown (3.2.2) for documentation output
- Latest version of Picard (2.22.9)
- Latest version of GATK4 (4.1.7.0)
- Latest version of sequenceTools (1.4.0.6)
- Latest version of fastP (0.20.1)
- Latest version of Kraken2 (2.0.9beta)
- Latest version of FreeBayes (1.3.2)
- Latest version of xopen (0.9.0)
- Added Bowtie 2 (2.4.1)
- Latest version of Sex.DetERRmine (1.1.2)
- Latest version of endorS.py (0.4)

## [2.1.0] - Ravensburg - 2020-03-05

### `Added`

- Added support for automated tests using [GitHub Actions](https://github.com/features/actions), replacing Travis
- [#40](https://github.com/nf-core/eager/issues/40), [#231](https://github.com/nf-core/eager/issues/231) - Added genotyping capability through GATK UnifiedGenotyper (v3.5), GATK HaplotypeCaller (v4.1) and FreeBayes
- Added MultiVCFAnalyzer module
- [#240](https://github.com/nf-core/eager/issues/240) - Added human sex determination module
- [#226](https://github.com/nf-core/eager/issues/226) - Added `--preserve5p` function for AdapterRemoval
- [#212](https://github.com/nf-core/eager/issues/212) - Added ability to use only merged reads downstream from AdapterRemoval
- [#265](https://github.com/nf-core/eager/issues/265) - Adjusted full markdown linting in Travis CI
- [#247](https://github.com/nf-core/eager/issues/247) - Added nuclear contamination with angsd
- [#258](https://github.com/nf-core/eager/issues/258) - Added ability to report bedtools stats to features (e.g. depth/breadth of annotated genes)
- [#249](https://github.com/nf-core/eager/issues/249) - Added metagenomic classification of unmapped reads with MALT and aDNA authentication with MaltExtract
- [#302](https://github.com/nf-core/eager/issues/302) - Added mitochondrial to nuclear ratio calculation
- [#302](https://github.com/nf-core/eager/issues/302) - Added VCF2Genome for consensus sequence generation
- Fancy new logo from [ZandraFagernas](https://github.com/ZandraFagernas)
- [#286](https://github.com/nf-core/eager/issues/286) - Adds pipeline-specific profiles (loaded from nf-core configs)
- [#310](https://github.com/nf-core/eager/issues/310) - Generalises base.config
- [#326](https://github.com/nf-core/eager/pull/326) - Add Biopython and [xopen](https://github.com/marcelm/xopen/) dependencies
- [#336](https://github.com/nf-core/eager/issues/336) - Change default Y-axis maximum value of DamageProfiler to 30% to match popular (but slower) mapDamage, and allow user to set their own value.
- [#352](https://github.com/nf-core/eager/pull/352) - Add social preview image
- [#355](https://github.com/nf-core/eager/pull/355) - Add Kraken2 metagenomics classifier
- [#90](https://github.com/nf-core/eager/issues/90) - Added endogenous DNA calculator (original repository: [https://github.com/aidaanva/endorS.py/](https://github.com/aidaanva/endorS.py/))

### `Fixed`

- [#227](https://github.com/nf-core/eager/issues/227) - Large re-write of input/output process logic to allow maximum flexibility. Originally to address [#227](https://github.com/nf-core/eager/issues/227), but further expanded
- Fixed Travis-Ci.org to Travis-Ci.com migration issues
- [#266](https://github.com/nf-core/eager/issues/266) - Added sanity checks for input filetypes (i.e. only BAM files can be supplied if `--bam`)
- [#237](https://github.com/nf-core/eager/issues/237) - Fixed and Updated script scrape_software_versions
- [#322](https://github.com/nf-core/eager/pull/322) - Move extract map reads fastq compression to pigz
- [#327](https://github.com/nf-core/eager/pull/327) - Speed up strip_input_fastq process and make it more robust
- [#342](https://github.com/nf-core/eager/pull/342) - Updated to match nf-core tools 1.8 linting guidelines
- [#339](https://github.com/nf-core/eager/issues/339) - Converted unnecessary zcat + gzip to just cat for a performance boost
- [#344](https://github.com/nf-core/eager/issues/344) - Fixed pipeline still trying to run when using old nextflow version

### `Dependencies`

- adapterremoval=2.2.2 upgraded to 2.3.1
- adapterremovalfixprefix=0.0.4 upgraded to 0.0.5
- damageprofiler=0.4.3 upgraded to 0.4.9
- angsd=0.923 upgraded to 0.931
- gatk4=4.1.2.0 upgraded to 4.1.4.1
- mtnucratio=0.5 upgraded to 0.6
- conda-forge::markdown=3.1.1 upgraded to 3.2.1
- bioconda::fastqc=0.11.8 upgraded to 0.11.9
- bioconda::picard=2.21.4 upgraded to 2.22.0
- bioconda::bedtools=2.29.0 upgraded to 2.29.2
- pysam=0.15.3 upgraded to 0.15.4
- conda-forge::pandas=1.0.0 upgraded to 1.0.1
- bioconda::freebayes=1.3.1 upgraded to 1.3.2
- conda-forge::biopython=1.75 upgraded to 1.76

## [2.0.7] - 2019-06-10

### `Added`

- [#189](https://github.com/nf-core/eager/pull/189) - Outputting unmapped reads in a FASTQ file with the `--strip_input_fastq` flag
- [#186](https://github.com/nf-core/eager/pull/186) - Make FastQC skipping [possible](https://github.com/nf-core/eager/issues/182)
- Merged in [nf-core/tools](https://github.com/nf-core/tools) release V1.6 template changes
- A lot more automated tests using Travis CI
- Don't ignore DamageProfiler errors any more
- [#220](https://github.com/nf-core/eager/pull/220) - Added post-mapping filtering statistics module and corresponding MultiQC statistics [#217](https://github.com/nf-core/eager/issues/217)

### `Fixed`

- [#152](https://github.com/nf-core/eager/pull/152) - DamageProfiler errors [won't crash entire pipeline any more](https://github.com/nf-core/eager/issues/171)
- [#176](https://github.com/nf-core/eager/pull/176) - Increase runtime for DamageProfiler on [large reference genomes](https://github.com/nf-core/eager/issues/173)
- [#172](https://github.com/nf-core/eager/pull/152) - DamageProfiler errors [won't crash entire pipeline any more](https://github.com/nf-core/eager/issues/171)
- [#174](https://github.com/nf-core/eager/pull/190) - Publish DeDup files [properly](https://github.com/nf-core/eager/issues/183)
- [#196](https://github.com/nf-core/eager/pull/196) - Fix reference [issues](https://github.com/nf-core/eager/issues/150)
- [#196](https://github.com/nf-core/eager/pull/196) - Fix issues with PE data being mapped incompletely
- [#200](https://github.com/nf-core/eager/pull/200) - Fix minor issue with some [typos](https://github.com/nf-core/eager/pull/196)
- [#210](https://github.com/nf-core/eager/pull/210) - Fix PMDTools [encoding issue](https://github.com/pontussk/PMDtools/issues/6) from `samtools calmd` generated files by running through `samtools view` first
- [#221](https://github.com/nf-core/eager/pull/221) - Fix BWA Index [not being reused by multiple samples](https://github.com/nf-core/eager/issues/219)

### `Dependencies`

- Added DeDup v0.12.5 (json support)
- Added mtnucratio v0.5 (json support)
- Updated Picard 2.18.27 -> 2.20.2
- Updated GATK 4.1.0.0 -> 4.1.2.0
- Updated damageprofiler 0.4.4 -> 0.4.5
- Updated r-rmarkdown 1.11 -> 1.12
- Updated fastp 0.19.7 -> 0.20.0
- Updated qualimap 2.2.2b -> 2.2.2c

## [2.0.6] - 2019-03-05

### `Added`

- [#152](https://github.com/nf-core/eager/pull/152) - Clarified `--complexity_filter` flag to be specifically for poly G trimming.
- [#155](https://github.com/nf-core/eager/pull/155) - Added [Dedup log to output folders](https://github.com/nf-core/eager/issues/154)
- [#159](https://github.com/nf-core/eager/pull/159) - Added possibility to skip AdapterRemoval, skip merging, skip trimming, fixing [#64](https://github.com/nf-core/eager/issues/64), [#137](https://github.com/nf-core/eager/issues/137) - thanks to @maxibor, @jfy133

### `Fixed`

- [#151](https://github.com/nf-core/eager/pull/151) - Fixed [post-deduplication step errors](https://github.com/nf-core/eager/issues/128)
- [#147](https://github.com/nf-core/eager/pull/147) - Fix Samtools Index for [large references](https://github.com/nf-core/eager/issues/146)
- [#145](https://github.com/nf-core/eager/pull/145) - Added Picard Memory Handling [fix](https://github.com/nf-core/eager/issues/144)

### `Dependencies`

- Picard Tools 2.18.23 -> 2.18.27
- GATK 4.0.12.0 -> 4.1.0.0
- FastP 0.19.6 -> 0.19.7

## [2.0.5] - 2019-01-28

### `Added`

- [#127](https://github.com/nf-core/eager/pull/127) - Added a second test case for testing the pipeline properly
- [#129](https://github.com/nf-core/eager/pull/129) - Support BAM files as [input format](https://github.com/nf-core/eager/issues/41)
- [#131](https://github.com/nf-core/eager/pull/131) - Support different [reference genome file extensions](https://github.com/nf-core/eager/issues/130)

### `Fixed`

- [#128](https://github.com/nf-core/eager/issues/128) - Fixed reference genome handling errors

### `Dependencies`

- Picard Tools 2.18.21 -> 2.18.23
- R-Markdown 1.10 -> 1.11
- FastP 0.19.5 -> 0.19.6

## [2.0.4] - 2019-01-09

### `Added`

- [#111](https://github.com/nf-core/eager/pull/110) - Allow [Zipped FastA reference input](https://github.com/nf-core/eager/issues/91)
- [#113](https://github.com/nf-core/eager/pull/113) - All files are now staged via channels, which is considered best practice by Nextflow
- [#114](https://github.com/nf-core/eager/pull/113) - Add proper runtime defaults for multiple processes
- [#118](https://github.com/nf-core/eager/pull/118) - Add [centralized configs handling](https://github.com/nf-core/configs)
- [#115](https://github.com/nf-core/eager/pull/115) - Add DamageProfiler MultiQC support
- [#122](https://github.com/nf-core/eager/pull/122) - Add pulling from Dockerhub again

### `Fixed`

- [#110](https://github.com/nf-core/eager/pull/110) - Fix for [MultiQC Missing Second FastQC report](https://github.com/nf-core/eager/issues/107)
- [#112](https://github.com/nf-core/eager/pull/112) - Remove [redundant UDG options](https://github.com/nf-core/eager/issues/89)

## [2.0.3] - 2018-12-12

### `Added`

- [#80](https://github.com/nf-core/eager/pull/80) - BWA Index file handling
- [#77](https://github.com/nf-core/eager/pull/77) - Lots of documentation updates by [@jfy133](https://github.com/jfy133)
- [#81](https://github.com/nf-core/eager/pull/81) - Renaming of certain BAM options
- [#92](https://github.com/nf-core/eager/issues/92) - Complete restructure of BAM options

### `Fixed`

- [#84](https://github.com/nf-core/eager/pull/85) - Fix for [Samtools index issues](https://github.com/nf-core/eager/issues/84)
- [#96](https://github.com/nf-core/eager/issues/96) - Fix for [MarkDuplicates issues](https://github.com/nf-core/eager/issues/96) found by [@nilesh-tawari](https://github.com/nilesh-tawari)

### Other

- Added Slack button to repository readme

## [2.0.2] - 2018-11-03

### `Changed`

- [#70](https://github.com/nf-core/eager/issues/70) - Uninitialized `readPaths` warning removed

### `Added`

- [#73](https://github.com/nf-core/eager/pull/73) - Travis CI Testing of Conda Environment added

### `Fixed`

- [#72](https://github.com/nf-core/eager/issues/72) - iconv Issue with R in conda environment

## [2.0.1] - 2018-11-02

### `Fixed`

- [#69](https://github.com/nf-core/eager/issues/67) - FastQC issues with conda environments

## [2.0.0] "Kaufbeuren" - 2018-10-17

Initial release of nf-core/eager:

### `Added`

- FastQC read quality control
- (Optional) Read complexity filtering with FastP
- Read merging and clipping using AdapterRemoval v2
- Mapping using BWA / BWA Mem or CircularMapper
- Library Complexity Estimation with Preseq
- Conversion and Filtering of BAM files using Samtools
- Damage assessment via DamageProfiler, additional filtering using PMDTools
- Duplication removal via DeDup
- BAM Clipping with BamUtil for UDGhalf protocols
- QualiMap BAM quality control analysis

Furthermore, this already creates an interactive report using MultiQC, which will be upgraded in V2.1 "Ulm" to contain more aDNA specific metrics.


================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Code of Conduct at nf-core (v1.0)

## Our Pledge

In the interest of fostering an open, collaborative, and welcoming environment, we as contributors and maintainers of nf-core, pledge to making participation in our projects and community a harassment-free experience for everyone, regardless of:

- Age
- Body size
- Familial status
- Gender identity and expression
- Geographical location
- Level of experience
- Nationality and national origins
- Native language
- Physical and neurological ability
- Race or ethnicity
- Religion
- Sexual identity and orientation
- Socioeconomic status

Please note that the list above is alphabetised and is therefore not ranked in any order of preference or importance.

## Preamble

> Note: This Code of Conduct (CoC) has been drafted by the nf-core Safety Officer and been edited after input from members of the nf-core team and others. "We", in this document, refers to the Safety Officer and members of the nf-core core team, both of whom are deemed to be members of the nf-core community and are therefore required to abide by this Code of Conduct. This document will be amended periodically to keep it up-to-date, and in case of any dispute, the most current version will apply.

An up-to-date list of members of the nf-core core team can be found [here](https://nf-co.re/about). Our current safety officer is Renuka Kudva.

nf-core is a young and growing community that welcomes contributions from anyone with a shared vision for [Open Science Policies](https://www.fosteropenscience.eu/taxonomy/term/8). Open science policies encompass inclusive behaviours and we strive to build and maintain a safe and inclusive environment for all individuals.

We have therefore adopted this code of conduct (CoC), which we require all members of our community and attendees in nf-core events to adhere to in all our workspaces at all times. Workspaces include but are not limited to Slack, meetings on Zoom, Jitsi, YouTube live etc.

Our CoC will be strictly enforced and the nf-core team reserve the right to exclude participants who do not comply with our guidelines from our workspaces and future nf-core activities.

We ask all members of our community to help maintain a supportive and productive workspace and to avoid behaviours that can make individuals feel unsafe or unwelcome. Please help us maintain and uphold this CoC.

Questions, concerns or ideas on what we can include? Contact safety [at] nf-co [dot] re

## Our Responsibilities

The safety officer is responsible for clarifying the standards of acceptable behaviour and is expected to take appropriate and fair corrective action in response to any instances of unacceptable behaviour.

The safety officer, in consultation with the nf-core core team, has the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviours that they deem inappropriate, threatening, offensive, or harmful.

Members of the core team or the safety officer who violate the CoC will be required to recuse themselves pending investigation. They will not have access to any reports of the violations and will be subject to the same actions as others in violation of the CoC.

## When and where does this Code of Conduct apply?

Participation in the nf-core community is contingent on following these guidelines in all our workspaces and events. This includes but is not limited to the following listed alphabetically and therefore in no order of preference:

- Communicating with an official project email address.
- Communicating with community members within the nf-core Slack channel.
- Participating in hackathons organised by nf-core (both online and in-person events).
- Participating in collaborative work on GitHub, Google Suite, community calls, mentorship meetings, email correspondence.
- Participating in workshops, training, and seminar series organised by nf-core (both online and in-person events). This applies to events hosted on web-based platforms such as Zoom, Jitsi, YouTube live etc.
- Representing nf-core on social media. This includes both official and personal accounts.

## nf-core cares 😊

nf-core's CoC and expectations of respectful behaviours for all participants (including organisers and the nf-core team) include but are not limited to the following (listed in alphabetical order):

- Ask for consent before sharing another community member’s personal information (including photographs) on social media.
- Be respectful of differing viewpoints and experiences. We are all here to learn from one another and a difference in opinion can present a good learning opportunity.
- Celebrate your accomplishments at events! (Get creative with your use of emojis 🎉 🥳 💯 🙌 !)
- Demonstrate empathy towards other community members. (We don’t all have the same amount of time to dedicate to nf-core. If tasks are pending, don’t hesitate to gently remind members of your team. If you are leading a task, ask for help if you feel overwhelmed.)
- Engage with and enquire after others. (This is especially important given the geographically remote nature of the nf-core community, so let’s do this the best we can)
- Focus on what is best for the team and the community. (When in doubt, ask)
- Graciously accept constructive criticism, yet be unafraid to question, deliberate, and learn.
- Introduce yourself to members of the community. (We’ve all been outsiders and we know that talking to strangers can be hard for some, but remember we’re interested in getting to know you and your visions for open science!)
- Show appreciation and **provide clear feedback**. (This is especially important because we don’t see each other in person and it can be harder to interpret subtleties. Also remember that not everyone understands a certain language to the same extent as you do, so **be clear in your communications to be kind.**)
- Take breaks when you feel like you need them.
- Use welcoming and inclusive language. (Participants are encouraged to display their chosen pronouns on Zoom or in communication on Slack.)

## nf-core frowns on 😕

The following behaviours from any participants within the nf-core community (including the organisers) will be considered unacceptable under this code of conduct. Engaging or advocating for any of the following could result in expulsion from nf-core workspaces.

- Deliberate intimidation, stalking or following and sustained disruption of communication among participants of the community. This includes hijacking shared screens through actions such as using the annotate tool in conferencing software such as Zoom.
- “Doxing” i.e. posting (or threatening to post) another person’s personal identifying information online.
- Spamming or trolling of individuals on social media.
- Use of sexual or discriminatory imagery, comments, or jokes and unwelcome sexual attention.
- Verbal and text comments that reinforce social structures of domination related to gender, gender identity and expression, sexual orientation, ability, physical appearance, body size, race, age, religion or work experience.

### Online Trolling

The majority of nf-core interactions and events are held online. Unfortunately, holding events online comes with the added issue of online trolling. This is unacceptable, reports of such behaviour will be taken very seriously, and perpetrators will be excluded from activities immediately.

All community members are required to ask members of the group they are working within for explicit consent prior to taking screenshots of individuals during video calls.

## Procedures for Reporting CoC violations

If someone makes you feel uncomfortable through their behaviours or actions, report it as soon as possible.

You can reach out to members of the [nf-core core team](https://nf-co.re/about) and they will forward your concerns to the safety officer(s).

Issues directly concerning members of the core team will be dealt with by other members of the core team and the safety officer, and possible conflicts of interest will be taken into account. nf-core is also in discussions about having an ombudsperson, and details will be shared in due course.

All reports will be handled with utmost discretion and confidentiality.

## Attribution and Acknowledgements

- The [Contributor Covenant, version 1.4](http://contributor-covenant.org/version/1/4)
- The [OpenCon 2017 Code of Conduct](http://www.opencon2017.org/code_of_conduct) (CC BY 4.0 OpenCon organisers, SPARC and Right to Research Coalition)
- The [eLife innovation sprint 2020 Code of Conduct](https://sprint.elifesciences.org/code-of-conduct/)
- The [Mozilla Community Participation Guidelines v3.1](https://www.mozilla.org/en-US/about/governance/policies/participation/) (version 3.1, CC BY-SA 3.0 Mozilla)

## Changelog

### v1.0 - March 12th, 2021

- Complete rewrite from original [Contributor Covenant](http://contributor-covenant.org/) CoC.


================================================
FILE: Dockerfile
================================================
FROM nfcore/base:1.14
LABEL authors="The nf-core/eager community" \
      description="Docker image containing all software requirements for the nf-core/eager pipeline"

# Install the conda environment
COPY environment.yml /
RUN conda env create --quiet -f /environment.yml && conda clean -a

# Add conda installation dir to PATH (instead of doing 'conda activate')
ENV PATH=/opt/conda/envs/nf-core-eager-2.5.3/bin:$PATH

# Dump the details of the installed packages to a file for posterity
RUN conda env export --name nf-core-eager-2.5.3 > nf-core-eager-2.5.3.yml

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) The nf-core/eager community

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# ![nf-core/eager](docs/images/nf-core_eager_logo_outline_drop.png)

**A fully reproducible and state-of-the-art ancient DNA analysis pipeline**.

[![GitHub Actions CI Status](https://github.com/nf-core/eager/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/eager/actions)
[![GitHub Actions Linting Status](https://github.com/nf-core/eager/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/eager/actions)
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.07.1-brightgreen.svg)](https://www.nextflow.io/)
[![nf-core](https://img.shields.io/badge/nf--core-pipeline-brightgreen.svg)](https://nf-co.re/)
[![DOI](https://zenodo.org/badge/135918251.svg)](https://zenodo.org/badge/latestdoi/135918251)
[![Published in PeerJ](https://img.shields.io/badge/peerj-published-%2300B2FF)](https://peerj.com/articles/10947/)

[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](https://bioconda.github.io/)
[![Docker](https://img.shields.io/docker/automated/nfcore/eager.svg)](https://hub.docker.com/r/nfcore/eager)
![Singularity Container available](https://img.shields.io/badge/singularity-available-7E4C74.svg)

[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23eager-4A154B?logo=slack)](https://nfcore.slack.com/channels/eager)

>[!IMPORTANT]  
> nf-core/eager versions 2.* are only compatible with Nextflow versions up to 22.10.6!

## Introduction

<!-- nf-core: Write a 1-2 sentence summary of what data the pipeline is for and what it does -->
**nf-core/eager** is a scalable and reproducible bioinformatics best-practice processing pipeline for genomic NGS sequencing data, with a focus on ancient DNA (aDNA) data. It is ideal for the (palaeo)genomic analysis of humans, animals, plants, microbes and even microbiomes.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. The pipeline pre-processes raw data from FASTQ inputs or preprocessed BAM inputs. It can align reads and perform extensive general NGS and aDNA specific quality-control on the results. It comes with docker, singularity or conda containers making installation trivial and results highly reproducible.

<p align="center">
    <img src="docs/images/usage/eager2_workflow.png" alt="nf-core/eager schematic workflow" width="70%">
</p>

## Quick Start

1. Install [`nextflow`](https://nf-co.re/usage/installation) (`>=20.07.1` && `<=22.10.6`)

2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_

3. Download the pipeline and test it on a minimal dataset with a single command:

    ```bash
    nextflow run nf-core/eager -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
    ```

    > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.

4. Start running your own analysis!

    ```bash
    nextflow run nf-core/eager -profile <docker/singularity/podman/conda/institute> --input '*_R{1,2}.fastq.gz' --fasta '<your_reference>.fasta'
    ```

5. Once your run has completed successfully, clean up the intermediate files.

    ```bash
    nextflow clean -f -k
    ```

See [usage docs](https://nf-co.re/eager/usage) for all of the available options when running the pipeline.
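
The `--input` pattern in step 4 is quoted, so the shell passes it through unexpanded and the pipeline resolves it. Before launching a run it can help to preview which files such a pattern is meant to pick up. The following is a minimal sketch only: `preview_fastq_pairs` is a hypothetical helper that is not part of the pipeline, and it assumes the FASTQ files sit directly in one directory and follow the `_R1`/`_R2` naming used in the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/eager): list the files in directory $1
# whose names end in _R1.fastq.gz or _R2.fastq.gz, i.e. the files that the
# example pattern '*_R{1,2}.fastq.gz' is intended to match.
preview_fastq_pairs() {
    local dir="${1:-.}"
    # find's -name accepts a single glob, so the {1,2} alternative is written as [12].
    find "$dir" -maxdepth 1 -type f -name '*_R[12].fastq.gz' | sort
}
```

Calling `preview_fastq_pairs /path/to/fastqs` prints one matching file per line. An odd number of lines suggests that a read pair is incomplete.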

**N.B.** You can see an overview of the run in the MultiQC report located at `./results/MultiQC/multiqc_report.html`

Modifications to the default pipeline are easily made using various options as described in the documentation.
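
Because nf-core/eager 2.* only works with a bounded range of Nextflow versions (`>=20.07.1` and `<=22.10.6`), a quick check before launching can save a failed run. The sketch below is illustrative rather than official: `version_in_range` is a hypothetical helper, it relies on `sort -V` from GNU coreutils, and it assumes the first `X.Y.Z` string printed by `nextflow -version` is the Nextflow version.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/eager): succeeds when version $1
# lies within the inclusive range [$2, $3]. Uses GNU `sort -V` (version sort).
version_in_range() {
    local version="$1" min="$2" max="$3"
    local lowest highest
    # The smaller of (min, version) must be min ...
    lowest="$(printf '%s\n%s\n' "$min" "$version" | sort -V | head -n 1)"
    # ... and the larger of (version, max) must be max.
    highest="$(printf '%s\n%s\n' "$version" "$max" | sort -V | tail -n 1)"
    [ "$lowest" = "$min" ] && [ "$highest" = "$max" ]
}

# Only query Nextflow if it is actually installed.
if command -v nextflow >/dev/null 2>&1; then
    # Assumption: the first X.Y.Z string in the output is the Nextflow version.
    nf_version="$(nextflow -version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
    if version_in_range "$nf_version" 20.07.1 22.10.6; then
        echo "Nextflow ${nf_version} is within the supported range"
    else
        echo "Nextflow '${nf_version}' is outside the supported range 20.07.1 to 22.10.6" >&2
    fi
fi
```

The comparison is done on the version strings themselves, so the same function can be reused for other tools whose version bounds are documented.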

## Pipeline Summary

### Default Steps

By default the pipeline currently performs the following:

* Create reference genome indices for mapping (`bwa`, `samtools`, and `picard`)
* Sequencing quality control (`FastQC`)
* Sequencing adapter removal, paired-end data merging (`AdapterRemoval`)
* Read mapping to reference (`bwa aln`, `bwa mem`, `CircularMapper`, or `bowtie2`)
* Post-mapping processing, statistics and conversion to bam (`samtools`)
* Ancient DNA C-to-T damage pattern visualisation (`DamageProfiler` or `mapDamage`)
* PCR duplicate removal (`DeDup` or `MarkDuplicates`)
* Post-mapping statistics and BAM quality control (`Qualimap`)
* Library Complexity Estimation (`preseq`)
* Overall pipeline statistics summaries (`MultiQC`)

### Additional Steps

Additional functionality contained by the pipeline currently includes:

#### Input

* Automatic merging of complex sequencing setups (e.g. multiple lanes, sequencing configurations, library types)

#### Preprocessing

* Illumina two-coloured sequencer poly-G tail removal (`fastp`)
* Post-AdapterRemoval trimming of FASTQ files prior to mapping (`fastp`)
* Automatic conversion of unmapped reads to FASTQ (`samtools`)
* Host DNA (mapped reads) stripping from input FASTQ files (for sensitive samples)

#### aDNA Damage manipulation

* Damage removal/clipping for UDG+/UDG-half treatment protocols (`BamUtil`)
* Damaged reads extraction and assessment (`PMDTools`)
* Nuclear DNA contamination estimation of human samples (`angsd`)

#### Genotyping

* Creation of VCF genotyping files (`GATK UnifiedGenotyper`, `GATK HaplotypeCaller` and `FreeBayes`)
* Creation of EIGENSTRAT genotyping files (`pileupCaller`)
* Creation of Genotype Likelihood files (`angsd`)
* Consensus sequence FASTA creation (`VCF2Genome`)
* SNP Table generation (`MultiVCFAnalyzer`)

#### Biological Information

* Mitochondrial to Nuclear read ratio calculation (`MtNucRatioCalculator`)
* Statistical sex determination of human individuals (`Sex.DetERRmine`)

#### Metagenomic Screening

* Low-sequence-complexity filtering (`BBduk`)
* Taxonomic binner with alignment (`MALT`)
* Taxonomic binner without alignment (`Kraken2`)
* aDNA characteristic screening of taxonomically binned data from MALT (`MaltExtract`)

#### Functionality Overview

A graphical overview of suggested routes through the pipeline depending on context can be seen below.

<p align="center">
    <img src="docs/images/usage/eager2_metromap_complex.png" alt="nf-core/eager metro map" width="70%">
</p>

## Documentation

The nf-core/eager pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/eager/usage) and [output](https://nf-co.re/eager/output).

1. [Nextflow installation](https://nf-co.re/usage/installation)
2. Pipeline configuration
    * [Pipeline installation](https://nf-co.re/usage/local_installation)
    * [Adding your own system config](https://nf-co.re/usage/adding_own_config)
    * [Reference genomes](https://nf-co.re/usage/reference_genomes)
3. [Running the pipeline](https://nf-co.re/eager/usage)
   * This includes tutorials, FAQs, and troubleshooting instructions
4. [Output and how to interpret the results](https://nf-co.re/eager/output)

## Credits

This pipeline was mostly written by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)) and [James A. Fellows Yates](https://github.com/jfy133), with contributions from [Stephen Clayton](https://github.com/sc13-bioinf), [Thiseas C. Lamnidis](https://github.com/TCLamnidis), [Maxime Borry](https://github.com/maxibor), [Zandra Fagernäs](https://github.com/ZandraFagernas), [Aida Andrades Valtueña](https://github.com/aidaanva) and [Maxime Garcia](https://github.com/MaxUlysse) and the nf-core community.

We thank the following people for their extensive assistance in the development
of this pipeline:

## Authors (alphabetical)

* [Aida Andrades Valtueña](https://github.com/aidaanva)
* [Alexander Peltzer](https://github.com/apeltzer)
* [James A. Fellows Yates](https://github.com/jfy133)
* [Judith Neukamm](https://github.com/JudithNeukamm)
* [Maxime Borry](https://github.com/maxibor)
* [Maxime Garcia](https://github.com/MaxUlysse)
* [Stephen Clayton](https://github.com/sc13-bioinf)
* [Thiseas C. Lamnidis](https://github.com/TCLamnidis)
* [Zandra Fagernäs](https://github.com/ZandraFagernas)

## Additional Contributors (alphabetical)

Those who have provided conceptual guidance, suggestions, bug reports etc.

* [Alex Hübner](https://github.com/alexhbnr)
* [Alexandre Gilardet](https://github.com/alexandregilardet)
* Arielle Munters
* [Åshild Vågene](https://github.com/ashildv)
* [Asmaa Ali](https://github.com/asmaa-a-abdelwahab)
* [Charles Plessy](https://github.com/charles-plessy)
* [Elina Salmela](https://github.com/esalmela)
* [Fabian Lehmann](https://github.com/Lehmann-Fabian)
* [He Yu](https://github.com/paulayu)
* [Hester van Schalkwyk](https://github.com/hesterjvs)
* [Ido Bar](https://github.com/IdoBar)
* [Irina Velsko](https://github.com/ivelsko)
* [Işın Altınkaya](https://github.com/isinaltinkaya)
* [Johan Nylander](https://github.com/nylander)
* [Jonas Niemann](https://github.com/NiemannJ)
* [Katerine Eaton](https://github.com/ktmeaton)
* [Kathrin Nägele](https://github.com/KathrinNaegele)
* [Kevin Lord](https://github.com/lordkev)
* [Laura Lacher](https://github.com/neija2611)
* [Luc Venturini](https://github.com/lucventurini)
* [Mahesh Binzer-Panchal](https://github.com/mahesh-panchal)
* [Marcel Keller](https://github.com/marcel-keller)
* [Megan Michel](https://github.com/meganemichel)
* [Pierre Lindenbaum](https://github.com/lindenb)
* [Pontus Skoglund](https://github.com/pontussk)
* [Raphael Eisenhofer](https://github.com/EisenRa)
* [Roberta Davidson](https://github.com/roberta-davidson)
* [Rodrigo Barquera](https://github.com/RodrigoBarquera)
* [Selina Carlhoff](https://github.com/scarlhoff)
* [Torsten Günter](https://bitbucket.org/tguenther)

If you've contributed and you're missing in here, please let us know and we will add you in of course!

## Contributions and Support

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

For further information or help, don't hesitate to get in touch on the [Slack `#eager` channel](https://nfcore.slack.com/channels/eager) (you can join with [this invite](https://nf-co.re/join/slack)).

## Citations

If you use `nf-core/eager` for your analysis, please cite the `eager` publication as follows:

> Fellows Yates JA, Lamnidis TC, Borry M, Valtueña Andrades A, Fagernäs Z, Clayton S, Garcia MU, Neukamm J, Peltzer A. 2021. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:e10947. DOI: [10.7717/peerj.10947](https://doi.org/10.7717/peerj.10947).

You can cite the eager zenodo record for a specific version using the following [doi: 10.5281/zenodo.3698082](https://zenodo.org/badge/latestdoi/135918251)

You can cite the `nf-core` publication as follows:

> **The nf-core framework for community-curated bioinformatics pipelines.**
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).

In addition, references of tools and data used in this pipeline are as follows:

* **EAGER v1, CircularMapper, DeDup** Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. [https://doi.org/10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z). Download: [https://github.com/apeltzer/EAGER-GUI](https://github.com/apeltzer/EAGER-GUI) and [https://github.com/apeltzer/EAGER-CLI](https://github.com/apeltzer/EAGER-CLI)
* **FastQC** Download: [https://www.bioinformatics.babraham.ac.uk/projects/fastqc/](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
* **AdapterRemoval v2** Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9, 88. [https://doi.org/10.1186/s13104-016-1900-2](https://doi.org/10.1186/s13104-016-1900-2). Download: [https://github.com/MikkelSchubert/adapterremoval](https://github.com/MikkelSchubert/adapterremoval)
* **bwa** Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics , 25(14), 1754–1760. [https://doi.org/10.1093/bioinformatics/btp324](https://doi.org/10.1093/bioinformatics/btp324). Download: [http://bio-bwa.sourceforge.net/bwa.shtml](http://bio-bwa.sourceforge.net/bwa.shtml)
* **SAMtools** Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics , 25(16), 2078–2079. [https://doi.org/10.1093/bioinformatics/btp352](https://doi.org/10.1093/bioinformatics/btp352). Download: [http://www.htslib.org/](http://www.htslib.org/)
* **DamageProfiler** Neukamm, J., Peltzer, A., & Nieselt, K. (2020). DamageProfiler: Fast damage pattern calculation for ancient DNA. In Bioinformatics (btab190). [https://doi.org/10.1093/bioinformatics/btab190](https://doi.org/10.1093/bioinformatics/btab190). Download: [https://github.com/Integrative-Transcriptomics/DamageProfiler](https://github.com/Integrative-Transcriptomics/DamageProfiler)
* **QualiMap** Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics , 32(2), 292–294. [https://doi.org/10.1093/bioinformatics/btv566](https://doi.org/10.1093/bioinformatics/btv566). Download: [http://qualimap.bioinfo.cipf.es/](http://qualimap.bioinfo.cipf.es/)
* **preseq** Daley, T., & Smith, A. D. (2013). Predicting the molecular complexity of sequencing libraries. Nature Methods, 10(4), 325–327. [https://doi.org/10.1038/nmeth.2375](https://doi.org/10.1038/nmeth.2375). Download: [http://smithlabresearch.org/software/preseq/](http://smithlabresearch.org/software/preseq/)
* **PMDTools** Skoglund, P., Northoff, B. H., Shunkov, M. V., Derevianko, A. P., Pääbo, S., Krause, J., & Jakobsson, M. (2014). Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proceedings of the National Academy of Sciences of the United States of America, 111(6), 2229–2234. [https://doi.org/10.1073/pnas.1318934111](https://doi.org/10.1073/pnas.1318934111). Download: [https://github.com/pontussk/PMDtools](https://github.com/pontussk/PMDtools)
* **MultiQC** Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. [https://doi.org/10.1093/bioinformatics/btw354](https://doi.org/10.1093/bioinformatics/btw354). Download: [https://multiqc.info/](https://multiqc.info/)
* **BamUtils** Jun, G., Wing, M. K., Abecasis, G. R., & Kang, H. M. (2015). An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Research, 25(6), 918–925. [https://doi.org/10.1101/gr.176552.114](https://doi.org/10.1101/gr.176552.114). Download: [https://genome.sph.umich.edu/wiki/BamUtil](https://genome.sph.umich.edu/wiki/BamUtil)
* **FastP** Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics , 34(17), i884–i890. [https://doi.org/10.1093/bioinformatics/bty560](https://doi.org/10.1093/bioinformatics/bty560). Download: [https://github.com/OpenGene/fastp](https://github.com/OpenGene/fastp)
* **GATK 3.5** DePristo, M. A., Banks, E., Poplin, R., Garimella, K. V., Maguire, J. R., Hartl, C., … Daly, M. J. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5), 491–498. [https://doi.org/10.1038/ng.806](https://doi.org/10.1038/ng.806). Download: [https://console.cloud.google.com/storage/browser/gatk](https://console.cloud.google.com/storage/browser/gatk)
* **GATK 4.X** - no citation available yet. Download: [https://github.com/broadinstitute/gatk/releases](https://github.com/broadinstitute/gatk/releases)
* **VCF2Genome** - Alexander Herbig and Alex Peltzer (unpublished). Download: [https://github.com/apeltzer/VCF2Genome](https://github.com/apeltzer/VCF2Genome)
* **MultiVCFAnalyzer** Bos, K.I. et al., 2014. Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis. Nature, 514(7523), pp.494–497. Available at: [http://dx.doi.org/10.1038/nature13591](http://dx.doi.org/10.1038/nature13591). Download: [https://github.com/alexherbig/MultiVCFAnalyzer](https://github.com/alexherbig/MultiVCFAnalyzer)
* **MTNucRatioCalculator** Alex Peltzer (Unpublished). Download: [https://github.com/apeltzer/MTNucRatioCalculator](https://github.com/apeltzer/MTNucRatioCalculator)
* **Sex.DetERRmine.py** Lamnidis, T.C. et al., 2018. Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe. Nature communications, 9(1), p.5018. Available at: [http://dx.doi.org/10.1038/s41467-018-07483-5](http://dx.doi.org/10.1038/s41467-018-07483-5). Download: [https://github.com/TCLamnidis/Sex.DetERRmine.git](https://github.com/TCLamnidis/Sex.DetERRmine.git)
* **ANGSD** Korneliussen, T.S., Albrechtsen, A. & Nielsen, R., 2014. ANGSD: Analysis of Next Generation Sequencing Data. BMC bioinformatics, 15, p.356. Available at: [http://dx.doi.org/10.1186/s12859-014-0356-4](http://dx.doi.org/10.1186/s12859-014-0356-4). Download: [https://github.com/ANGSD/angsd](https://github.com/ANGSD/angsd)
* **bedtools** Quinlan, A.R. & Hall, I.M., 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics , 26(6), pp.841–842. Available at: [http://dx.doi.org/10.1093/bioinformatics/btq033](http://dx.doi.org/10.1093/bioinformatics/btq033). Download: [https://github.com/arq5x/bedtools2/releases](https://github.com/arq5x/bedtools2/)
* **MALT**. Download: [https://software-ab.informatik.uni-tuebingen.de/download/malt/welcome.html](https://software-ab.informatik.uni-tuebingen.de/download/malt/welcome.html)
  * Vågene, Å.J. et al., 2018. Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature ecology & evolution, 2(3), pp.520–528. Available at: [http://dx.doi.org/10.1038/s41559-017-0446-6](http://dx.doi.org/10.1038/s41559-017-0446-6).
  * Herbig, A. et al., 2016. MALT: Fast alignment and analysis of metagenomic DNA sequence data applied to the Tyrolean Iceman. bioRxiv, p.050559. Available at: [http://biorxiv.org/content/early/2016/04/27/050559](http://biorxiv.org/content/early/2016/04/27/050559).
* **MaltExtract** Huebler, R. et al., 2019. HOPS: Automated detection and authentication of pathogen DNA in archaeological remains. bioRxiv, p.534198. Available at: [https://www.biorxiv.org/content/10.1101/534198v1?rss=1](https://www.biorxiv.org/content/10.1101/534198v1?rss=1). Download: [https://github.com/rhuebler/MaltExtract](https://github.com/rhuebler/MaltExtract)
* **Kraken2** Wood, D et al., 2019. Improved metagenomic analysis with Kraken 2. Genome Biology volume 20, Article number: 257. Available at: [https://doi.org/10.1186/s13059-019-1891-0](https://doi.org/10.1186/s13059-019-1891-0). Download: [https://ccb.jhu.edu/software/kraken2/](https://ccb.jhu.edu/software/kraken2/)
* **endorS.py** Aida Andrades Valtueña (Unpublished). Download: [https://github.com/aidaanva/endorS.py](https://github.com/aidaanva/endorS.py)
* **Bowtie2** Langmead, B. and Salzberg, S. L. 2012 Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), p. 357–359. doi: [10.1038/nmeth.1923](https://dx.doi.org/10.1038/nmeth.1923).
* **sequenceTools** Stephan Schiffels (Unpublished). Download: [https://github.com/stschiff/sequenceTools](https://github.com/stschiff/sequenceTools)
* **EigenstratDatabaseTools** Thiseas C. Lamnidis (Unpublished). Download: [https://github.com/TCLamnidis/EigenStratDatabaseTools.git](https://github.com/TCLamnidis/EigenStratDatabaseTools.git)
* **mapDamage** Jónsson, H., et al 2013. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics , 29(13), 1682–1684. [https://doi.org/10.1093/bioinformatics/btt193](https://doi.org/10.1093/bioinformatics/btt193)
* **BBduk** Brian Bushnell (Unpublished). Download: [https://sourceforge.net/projects/bbmap/](https://sourceforge.net/projects/bbmap/)

## Data References

This repository uses test data from the following studies:

* Fellows Yates, J. A. et al. (2017) ‘Central European Woolly Mammoth Population Dynamics: Insights from Late Pleistocene Mitochondrial Genomes’, Scientific reports, 7(1), p. 17714. [doi: 10.1038/s41598-017-17723-1](https://doi.org/10.1038/s41598-017-17723-1).
* Gamba, C. et al. (2014) ‘Genome flux and stasis in a five millennium transect of European prehistory’, Nature communications, 5, p. 5257. [doi: 10.1038/ncomms6257](https://doi.org/10.1038/ncomms6257).
* Star, B. et al. (2017) ‘Ancient DNA reveals the Arctic origin of Viking Age cod from Haithabu, Germany’, Proceedings of the National Academy of Sciences of the United States of America, 114(34), pp. 9152–9157. [doi: 10.1073/pnas.1710186114](https://doi.org/10.1073/pnas.1710186114).
* de Barros Damgaard, P. et al. (2018). '137 ancient human genomes from across the Eurasian steppes.', Nature, 557(7705), 369–374. [doi: 10.1038/s41586-018-0094-2](https://doi.org/10.1038/s41586-018-0094-2)


================================================
FILE: assets/angsd_resources/README
================================================
**These files are originally part of angsd (release 0.931). They have been added here for convenience.**

This file describes how the 'hapmap' and mappability files used by angsd are generated

##download
wget http://hapmap.ncbi.nlm.nih.gov/downloads/frequencies/2010-08_phaseII+III/allele_freqs_chrX_CEU_r28_nr.b36_fwd.txt.gz
wget http://hapmap.ncbi.nlm.nih.gov/downloads/frequencies/2010-08_phaseII+III/allele_freqs_chr21_CEU_r28_nr.b36_fwd.txt.gz

#with the md5sum
a105316eaa2ebbdb3f8d62a9cb10a2d5  allele_freqs_chr21_CEU_r28_nr.b36_fwd.txt.gz
5a0f920951ce2ded4afe2f10227110ac  allele_freqs_chrX_CEU_r28_nr.b36_fwd.txt.gz


##create dummy bed file to use the liftover tools
gunzip -c allele_freqs_chrX_CEU_r28_nr.b36_fwd.txt.gz| awk '{print $2" "$3-1" "$3" "$11" "$12" "$4" "$14}'|sed 1d >allele.txt

##do the liftover
liftOver allele.txt /opt/liftover/hg18ToHg19.over.chain.gz hit nohit

##now remove invariable sites and redundant columns
cut -f1,3 --complement hit |grep -v -P "\t1.0"|grep -v -P "\t0\t"|gzip -c  >HapMapchrX.gz


##create dummy bed file to use the liftover tools
gunzip -c allele_freqs_chr21_CEU_r28_nr.b36_fwd.txt.gz| awk '{print $2" "$3-1" "$3" "$11" "$12" "$4" "$14}'|sed 1d >allele.txt

##do the liftover
liftOver allele.txt /opt/liftover/hg18ToHg19.over.chain.gz hit nohit

##now remove invariable sites and redundant columns
cut -f1,3 --complement hit |grep -v -P "\t1.0"|grep -v -P "\t0\t"|gzip -c  >HapMapchr21.gz


#######
##download 100kmer mappability
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign100mer.bigWig

#md5sum
a1b1a8c99431fedf6a3b4baef028cca4  wgEncodeCrgMapabilityAlign100mer.bigWig

##download convert program
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigWigToBedGraph

##convert
./bigWigToBedGraph wgEncodeCrgMapabilityAlign100mer.bigWig chrX -chrom=chrX
./bigWigToBedGraph wgEncodeCrgMapabilityAlign100mer.bigWig chr21 -chrom=chr21

##only keep unique regions and discard the chr* column
grep -P "\t1$" chr21 |cut -f2-3 |gzip -c >chr21.unique.gz
grep -P "\t1$" chrX |cut -f2-3 |gzip -c >chrX.unique.gz


================================================
FILE: assets/angsd_resources/getALL.txt
================================================
F="ASW CEU CHB CHD GIH JPT LWK MEX MKK TSI YRI"
for f in $F
do 
    echo $f
    wget http://hapmap.ncbi.nlm.nih.gov/downloads/frequencies/2010-08_phaseII+III/allele_freqs_chrX_${f}_r28_nr.b36_fwd.txt.gz
done

cat allele*.gz >allele_freqs_chrX_ALL_r28_nr.b36_fwd.txt.gz

gunzip -c allele_freqs_chrX_ALL_r28_nr.b36_fwd.txt.gz| awk '{print $2" "$3-1" "$3" "$11" "$12" "$4" "$14}'|grep -v pos >allele.txt


/opt/liftover/liftOver allele.txt /opt/liftover/hg18ToHg19.over.chain.gz hit nohit
cut -f1,3 --complement hit |grep -v -P "\t1.0"|grep -v -P "\t0\t"|gzip -c  >HapMapALL.gz



================================================
FILE: assets/email_template.html
================================================
<html>
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <meta name="description" content="nf-core/eager: A fully reproducible and state-of-the-art ancient DNA analysis pipeline">
  <title>nf-core/eager Pipeline Report</title>
</head>
<body>
<div style="font-family: Helvetica, Arial, sans-serif; padding: 30px; max-width: 800px; margin: 0 auto;">

<img src="cid:nfcorepipelinelogo">

<h1>nf-core/eager v${version}</h1>
<h2>Run Name: $runName</h2>

<% if (!success){
    out << """
    <div style="color: #a94442; background-color: #f2dede; border-color: #ebccd1; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
        <h4 style="margin-top:0; color: inherit;">nf-core/eager execution completed unsuccessfully!</h4>
        <p>The exit status of the task that caused the workflow execution to fail was: <code>$exitStatus</code>.</p>
        <p>The full error message was:</p>
        <pre style="white-space: pre-wrap; overflow: visible; margin-bottom: 0;">${errorReport}</pre>
    </div>
    """
} else {
    out << """
    <div style="color: #3c763d; background-color: #dff0d8; border-color: #d6e9c6; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
        nf-core/eager execution completed successfully!
    </div>
    """
}
%>

<p>The workflow was completed at <strong>$dateComplete</strong> (duration: <strong>$duration</strong>)</p>
<p>The command used to launch the workflow was as follows:</p>
<pre style="white-space: pre-wrap; overflow: visible; background-color: #ededed; padding: 15px; border-radius: 4px; margin-bottom:30px;">$commandLine</pre>

<h3>Pipeline Configuration:</h3>
<table style="width:100%; max-width:100%; border-spacing: 0; border-collapse: collapse; border:0; margin-bottom: 30px;">
    <tbody style="border-bottom: 1px solid #ddd;">
        <% out << summary.collect{ k,v -> "<tr><th style='text-align:left; padding: 8px 0; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'>$k</th><td style='text-align:left; padding: 8px; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'><pre style='white-space: pre-wrap; overflow: visible;'>$v</pre></td></tr>" }.join("\n") %>
    </tbody>
</table>

<p>nf-core/eager</p>
<p><a href="https://github.com/nf-core/eager">https://github.com/nf-core/eager</a></p>

</div>

</body>
</html>


================================================
FILE: assets/email_template.txt
================================================
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~\\
  |\\ | |__  __ /  ` /  \\ |__) |__         }  {
  | \\| |       \\__, \\__/ |  \\ |___     \\`-._,-`-,
                                        `._,._,'
  nf-core/eager v${version}
----------------------------------------------------

Run Name: $runName

<% if (success){
    out << "## nf-core/eager execution completed successfully! ##"
} else {
    out << """####################################################
## nf-core/eager execution completed unsuccessfully! ##
####################################################
The exit status of the task that caused the workflow execution to fail was: $exitStatus.
The full error message was:

${errorReport}
"""
} %>


The workflow was completed at $dateComplete (duration: $duration)

The command used to launch the workflow was as follows:

  $commandLine



Pipeline Configuration:
-----------------------
<% out << summary.collect{ k,v -> " - $k: $v" }.join("\n") %>

--
nf-core/eager
https://github.com/nf-core/eager


================================================
FILE: assets/multiqc_config.yaml
================================================
custom_logo: "nf-core_eager_logo_outline_drop.png"
custom_logo_url: https://github.com/nf-core/eager/
custom_logo_title: "nf-core/eager"

report_comment: >
  This report has been generated by the <a href="https://github.com/nf-core/eager" target="_blank">nf-core/eager</a>
  analysis pipeline. For information about how to interpret these results, please see the
  <a href="https://github.com/nf-core/eager" target="_blank">documentation</a>.
run_modules:
  - adapterRemoval
  - bowtie2
  - custom_content
  - damageprofiler
  - dedup
  - fastp
  - fastqc
  - gatk
  - kraken
  - malt
  - mapdamage
  - mtnucratio
  - multivcfanalyzer
  - picard
  - preseq
  - qualimap
  - samtools
  - sexdeterrmine
  - hops
  - bcftools

extra_fn_clean_exts:
  - "_fastp"
  - ".pe.settings"
  - ".se.settings"
  - ".settings"
  - ".pe.combined"
  - ".se.truncated"
  - ".mapped"
  - ".mapped_rmdup"
  - ".mapped_rmdup_stats"
  - "_libmerged_rg_rmdup"
  - "_libmerged_rg_rmdup_stats"
  - "_postfilterflagstat.stats"
  - "_flagstat.stat"
  - ".filtered"
  - ".filtered_rmdup"
  - ".filtered_rmdup_stats"
  - "_libmerged_rg_add"
  - "_libmerged_rg_add_stats"
  - "_rmdup"
  - ".unmapped"
  - ".fastq.gz"
  - ".fastq"
  - ".fq.gz"
  - ".fq"
  - ".bam"
  - ".kreport"
  - ".unifiedgenotyper"
  - ".trimmed_stats"
  - "_libmerged"
  - "_bt2"
  - type: "regex"
    pattern: "_udg(half|none|full)"

top_modules:
  - "fastqc":
      name: "FastQC (pre-Trimming)"
      path_filters:
        - "*_raw_fastqc.zip"
  - "fastp"
  - "adapterRemoval"
  - "fastqc":
      name: "FastQC (post-Trimming)"
      path_filters:
        - "*.truncated_fastqc.zip"
        - "*.combined*_fastqc.zip"
        - "*_postartrimmed_fastqc.zip"
  - "bowtie2":
      path_filters:
        - "*_bt2.log"
  - "malt"
  - "hops"
  - "kraken"
  - "samtools":
      name: "Samtools Flagstat (pre-samtools filter)"
      path_filters:
        - "*_flagstat.stats"
  - "samtools":
      name: "Samtools Flagstat (post-samtools filter)"
      path_filters:
        - "*_postfilterflagstat.stats"
  - "dedup"
  - "picard"
  - "preseq":
      path_filters:
        - "*.preseq"
  - "damageprofiler"
  - "mapdamage"
  - "mtnucratio"
  - "qualimap"
  - "sexdeterrmine"
  - "bcftools"
  - "multivcfanalyzer":
      path_filters:
        - "*MultiVCFAnalyzer.json"
qualimap_config:
  general_stats_coverage:
    - 1
    - 2
    - 3
    - 4
    - 5

remove_sections:
  - sexdeterrmine-snps

table_columns_visible:
  FastQC (pre-Trimming):
    percent_duplicates: False
    percent_gc: True
    avg_sequence_length: True
  fastp:
    pct_duplication: False
    after_filtering_gc_content: True
    pct_surviving: False
  Adapter Removal:
    aligned_total: False
    percent_aligned: True
  FastQC (post-Trimming):
    avg_sequence_length: True
    percent_duplicates: False
    total_sequences: True
    percent_gc: True
  bowtie2:
    overall_alignment_rate: True
  MALT:
    Taxonomic assignment success: False
    Assig. Taxonomy: False
    Mappability: True
    Total reads: False
    Num. of queries: False
  Kraken:
    "% Unclassified": True
    "% Top 5": False
  Samtools Flagstat (pre-samtools filter):
    flagstat_total: True
    mapped_passed: True
  Samtools Flagstat (post-samtools filter):
    mapped_passed: True
  DeDup:
    dup_rate: False
    clusterfactor: True
    mapped_after_dedup: True
  Picard:
    PERCENT_DUPLICATION: True
  DamageProfiler:
    5 Prime1: True
    5 Prime2: True
    3 Prime1: False
    3 Prime2: False
    mean_readlength: True
    median: True
  mapDamage:
    5 Prime1: True
    5 Prime2: True
    3 Prime1: False
    3 Prime2: False
  mtnucratio:
    mt_nuc_ratio: True
  QualiMap:
    mapped_reads: True
    mean_coverage: True
    1_x_pc: True
    5_x_pc: True
    percentage_aligned: False
    median_insert_size: False
  MultiVCFAnalyzer:
    Heterozygous SNP alleles (percent): True
  endorSpy:
    endogenous_dna: True
    endogenous_dna_post: True
  nuclear_contamination:
    Num_SNPs: True
    Method1_MOM_estimate: False
    Method1_MOM_SE: False
    Method1_ML_estimate: True
    Method1_ML_SE: True
    Method2_MOM_estimate: False
    Method2_MOM_SE: False
    Method2_ML_estimate: False
    Method2_ML_SE: False
  snp_coverage:
    Covered_Snps: True
    Total_Snps: False

table_columns_placement:
  FastQC (pre-Trimming):
    total_sequences: 100
    avg_sequence_length: 110
    percent_gc: 120
  fastp:
    after_filtering_gc_content: 200
  Adapter Removal:
    percent_aligned: 300
  FastQC (post-Trimming):
    total_sequences: 400
    avg_sequence_length: 410
    percent_gc: 420
  Bowtie 2 / HiSAT2:
    overall_alignment_rate: 450
  MALT:
    Num. of queries: 430
    Total reads: 440
    Mappability: 450
    Assig. Taxonomy: 460
    Taxonomic assignment success: 470
  Kraken:
    "% Unclassified": 480
  Samtools Flagstat (pre-samtools filter):
    flagstat_total: 551
    mapped_passed: 552
  Samtools Flagstat (post-samtools filter):
    flagstat_total: 600
    mapped_passed: 620
  endorSpy:
    endogenous_dna: 610
    endogenous_dna_post: 640
  nuclear_contamination:
    Num_SNPs: 1100
    Method1_MOM_estimate: 1110
    Method1_MOM_SE: 1120
    Method1_ML_estimate: 1130
    Method1_ML_SE: 1140
    Method2_MOM_estimate: 1150
    Method2_MOM_SE: 1160
    Method2_ML_estimate: 1170
    Method2_ML_SE: 1180
  snp_coverage:
    Covered_Snps: 1050
    Total_Snps: 1060
  DeDup:
    mapped_after_dedup: 620
    clusterfactor: 630
  Picard:
    PERCENT_DUPLICATION: 650
  DamageProfiler:
    5 Prime1: 700
    5 Prime2: 710
    3 Prime1: 720
    3 Prime2: 730
    mean_readlength: 740
    median: 750
  mapDamage:
    5 Prime1: 760
    5 Prime2: 765
    3 Prime1: 770
    3 Prime2: 775
  mtnucratio:
    mtreads: 780
    mt_cov_avg: 785
    mt_nuc_ratio: 790
  QualiMap:
    mapped_reads: 800
    mean_coverage: 805
    median_coverage: 810
    1_x_pc: 820
    2_x_pc: 830
    3_x_pc: 840
    4_x_pc: 850
    5_x_pc: 860
    avg_gc: 870
  sexdeterrmine:
    RateX: 1000
    RateY: 1010
  MultiVCFAnalyzer:
    Heterozygous SNP alleles (percent): 1200
read_count_multiplier: 1
read_count_prefix: ""
read_count_desc: ""
ancient_read_count_prefix: ""
ancient_read_count_desc: ""
ancient_read_count_multiplier: 1
decimalPoint_format: "."
thousandsSep_format: ","
report_section_order:
  software_versions:
    order: -1000
  nf-core-eager-summary:
    order: -1001
export_plots: true
table_columns_name:
  FastQC (pre-Trimming):
    total_sequences: "Nr. Input Reads"
    avg_sequence_length: "Length Input Reads"
    percent_gc: "% GC Input Reads"
    percent_duplicates: "% Dups Input Reads"
    percent_fails: "% Failed Input Reads"
  FastQC (post-Trimming):
    total_sequences: "Nr. Processed Reads"
    avg_sequence_length: "Length Processed Reads"
    percent_gc: "% GC Processed Reads"
    percent_duplicates: "% Dups Processed Reads"
    percent_fails: "% Failed Processed Reads"
  Samtools Flagstat (pre-samtools filter):
    flagstat_total: "Nr. Reads Into Mapping"
    mapped_passed: "Nr. Mapped Reads"
  Samtools Flagstat (post-samtools filter):
    flagstat_total: "Nr. Mapped Reads Post-Filter"
    mapped_passed: "Nr. Mapped Reads Passed Post-Filter"
  Endogenous DNA Post (%):
    endogenous_dna_post (%): "Endogenous DNA Post-Filter (%)"
  Picard:
    PERCENT_DUPLICATION: "% Dup. Mapped Reads"
  DamageProfiler:
    mean_readlength: "Mean Length Mapped Reads"
    median_readlength: "Median Length Mapped Reads"
  QualiMap:
    mapped_reads: "Nr. Dedup. Mapped Reads"
    total_reads: "Nr. Dedup. Total Reads"
    avg_gc: "% GC Dedup. Mapped Reads"
  Bcftools Stats:
    number_of_records: "Nr. Overall Variants"
    number_of_SNPs: "Nr. SNPs"
    number_of_indels: "Nr. InDels"
  MALT:
    Mappability: "% Metagenomic Mappability"
  SexDetErrmine:
    RateErrX: "SexDet Err X Chr"
    RateErrY: "SexDet Err Y Chr"
    RateX: "SexDet Rate X Chr"
    RateY: "SexDet Rate Y Chr"
custom_table_header_config:
  general_stats_table:
    median_coverage:
      format: "{:,.3f}"
    mean_coverage:
      format: "{:,.3f}"


================================================
FILE: assets/nf-core_eager_dummy.txt
================================================
This is a dummy file for when we need a 'fake' file to satisfy all nextflow channel inputs being filled, even if we actually only use one.

================================================
FILE: assets/nf-core_eager_dummy2.txt
================================================
This is a second dummy file for when we need a 'fake' file to satisfy all nextflow channel inputs being filled, even if we actually only use one.

================================================
FILE: assets/sendmail_template.txt
================================================
To: $email
Subject: $subject
Mime-Version: 1.0
Content-Type: multipart/related;boundary="nfcoremimeboundary"

--nfcoremimeboundary
Content-Type: text/html; charset=utf-8

$email_html

--nfcoremimeboundary
Content-Type: image/png;name="nf-core-eager_logo.png"
Content-Transfer-Encoding: base64
Content-ID: <nfcorepipelinelogo>
Content-Disposition: inline; filename="nf-core-eager_logo.png"

<% out << new File("$projectDir/assets/nf-core-eager_logo.png").
  bytes.
  encodeBase64().
  toString().
  tokenize( '\n' )*.
  toList()*.
  collate( 76 )*.
  collect { it.join() }.
  flatten().
  join( '\n' ) %>

<%
if (mqcFile){
def mqcFileObj = new File("$mqcFile")
if (mqcFileObj.length() < mqcMaxSize){
out << """
--nfcoremimeboundary
Content-Type: text/html; name=\"multiqc_report\"
Content-Transfer-Encoding: base64
Content-ID: <mqcreport>
Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"

${mqcFileObj.
  bytes.
  encodeBase64().
  toString().
  tokenize( '\n' )*.
  toList()*.
  collate( 76 )*.
  collect { it.join() }.
  flatten().
  join( '\n' )}
"""
}}
%>

--nfcoremimeboundary--


================================================
FILE: assets/where_are_my_files.txt
================================================
=====================
 Where are my files?
=====================

By default, the nf-core/eager pipeline does not save large intermediate files to the
results directory. This is to try to conserve disk space.

These files can be found in the pipeline `work` directory if needed.
Alternatively, re-run the pipeline using `-resume` in addition to one of
the below command-line options and they will be copied into the results directory:

`--saveReference`
Save any downloaded or generated reference genome files to your results folder.
These can then be used for future pipeline runs, reducing processing times.

-----------------------------------
 Setting defaults in a config file
-----------------------------------
If you would always like these files to be saved without having to specify this on
the command line, you can save the following to your personal configuration file
(e.g. `~/.nextflow/config`):

params.saveReference = true

For more help, see the following documentation:

https://github.com/nf-core/eager/blob/master/docs/usage.md
https://www.nextflow.io/docs/latest/getstarted.html
https://www.nextflow.io/docs/latest/config.html


================================================
FILE: bin/endorS.py
================================================
#!/usr/bin/env python3

# Written by Aida Andrades Valtueña and released under MIT license. 
# See git repository (https://github.com/aidaanva/endorS.py) for full license text.

"""Script to calculate the endogenous DNA in a sample from samtools flag stats.
It can accept up to two files: pre-quality and post-quality filtering. We recommend
using both files, but you can also supply only the pre-quality filtering file.
"""
import re
import sys
import json
import argparse
import textwrap

parser = argparse.ArgumentParser(prog='endorS.py',
   usage='python %(prog)s [-h] [--version] <samplesfile>.stats [<samplesfile>.stats]',
   formatter_class=argparse.RawDescriptionHelpFormatter,
   description=textwrap.dedent('''\
   author:
     Aida Andrades Valtueña (aida.andrades[at]gmail.com)

   description:
     %(prog)s calculates endogenous DNA from samtools flagstat files and prints it to screen.
     Use the --output flag to write the results to a file.
   '''))
parser.add_argument('samtoolsfiles', metavar='<samplefile>.stats', type=str, nargs='+',
                    help='output of samtools flagstat in a txt file (at least one required). If two files are supplied, the mapped reads of the second file are divided by the total reads in the first, since it assumes that the <samplefile.stats> are related to the same sample. Useful after BAM filtering')
parser.add_argument('-v','--version', action='version', version='%(prog)s 0.4')
parser.add_argument('--output', '-o', nargs='?', help='specify a file format for an output file. Options: <json> for a MultiQC json output. Default: none')
parser.add_argument('--name', '-n', nargs='?', help='specify name for the output file. Default: extracted from the first samtools flagstat file provided')
args = parser.parse_args()

#Open the samtools flag stats pre-quality filtering:
try:
    with open(args.samtoolsfiles[0], 'r') as pre:
        contentsPre = pre.read()
    #Extract number of total reads
    totalReads = float((re.findall(r'^([0-9]+) \+ [0-9]+ in total',contentsPre))[0])
    #Extract number of mapped reads pre-quality filtering:
    mappedPre = float((re.findall(r'([0-9]+) \+ [0-9]+ mapped ',contentsPre))[0])
    #Calculation of endogenous DNA pre-quality filtering:
    if totalReads == 0.0:
        endogenousPre = 0.000000
        print("WARNING: no reads in the fastq input, Endogenous DNA raw (%) set to 0.000000")
    elif mappedPre == 0.0:
        endogenousPre = 0.000000
        print("WARNING: no mapped reads, Endogenous DNA raw (%) set to 0.000000")
    else:
        endogenousPre = float("{0:.6f}".format(round((mappedPre / totalReads * 100), 6)))
except Exception:
    print("Incorrect input, please provide at least one samtools flagstat file as input\nRun:\npython endorS.py --help \nfor more information on how to run this script")
    #Exit with a non-zero status so that callers can detect the failure:
    sys.exit(1)
#Check if the samtools stats post-quality filtering have been provided:
try:
    #Open the samtools flag stats post-quality filtering:
    with open(args.samtoolsfiles[1], 'r') as post:
        contentsPost = post.read()
    #Extract number of mapped reads post-quality filtering:
    mappedPost = float((re.findall(r'([0-9]+) \+ [0-9]+ mapped',contentsPost))[0])
    #Calculation of endogenous DNA post-quality filtering:
    if totalReads == 0.0:
        endogenousPost = 0.000000
        print("WARNING: no reads in the fastq input, Endogenous DNA modified (%) set to 0.000000")
    elif mappedPost == 0.0:
        endogenousPost = 0.000000
        print("WARNING: no mapped reads, Endogenous DNA modified (%) set to 0.000000")
    else:
        endogenousPost = float("{0:.6f}".format(round((mappedPost / totalReads * 100),6)))
except Exception:
    print("Only one samtools flagstat file provided")
    #Mark the post-quality filtering mapped reads as not available ("NA")
    #when the second samtools flagstat file is not provided:
    mappedPost = "NA"

#Setting the name depending on the --name flag:
if args.name is not None:
    name = args.name
else:
    #Set up the name based on the first samtools flagstats:
    name= str(((args.samtoolsfiles[0].rsplit(".",1)[0]).rsplit("/"))[-1])
#print(name)


if mappedPost == "NA":
    #Creating the json file
    jsonOutput={
    "id": "endorSpy",
    "plot_type": "generalstats",
    "pconfig": {
        "endogenous_dna": { "max": 100, "min": 0, "title": "Endogenous DNA (%)", "format": '{:,.2f}'}
    },
    "data": {
        name : { "endogenous_dna": endogenousPre}
    }
    }
else:
    #Creating the json file
    jsonOutput={
    "id": "endorSpy",
    "plot_type": "generalstats",
    "pconfig": {
        "endogenous_dna": { "max": 100, "min": 0, "title": "Endogenous DNA (%)", "format": '{:,.2f}'},
        "endogenous_dna_post": { "max": 100, "min": 0, "title": "Endogenous DNA Post (%)", "format": '{:,.2f}'}
    },
    "data": {
        name : { "endogenous_dna": endogenousPre, "endogenous_dna_post": endogenousPost}
    },
    }
#Checking whether an output file was requested (--output):
if args.output is not None:
   #Creating a file named after the name variable
   #and writing the json output to it:
   fileName = name + "_endogenous_dna_mqc.json"
   #print(fileName)
   with open(fileName, "w+") as outfile:
      json.dump(jsonOutput, outfile)
      print(fileName,"has been generated")
else:
   if mappedPost == "NA":
      print("Endogenous DNA (%):",endogenousPre)
   else:
      print("Endogenous DNA raw (%):",endogenousPre)
      print("Endogenous DNA modified (%):",endogenousPost)


================================================
FILE: bin/extract_map_reads.py
================================================
#!/usr/bin/env python3

# Written by Maxime Borry and released under the MIT license.
# See git repository (https://github.com/nf-core/eager) for full license text.

import argparse
import pysam
from xopen import xopen
import logging
import os
from pathlib import Path


def _get_args():
    """This function parses and return arguments passed in"""
    parser = argparse.ArgumentParser(
        prog="extract_mapped_reads",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description="Remove reads mapped in bam file from fastq files",
    )
    parser.add_argument("bam_file", help="path to bam file")
    parser.add_argument("fwd", help="path to forward fastq file")
    parser.add_argument(
        "-merged",
        dest="merged",
        default=False,
        action="store_true",
        help="specify if bam file was created from merged fastq files",
    )
    parser.add_argument(
        "-rev", dest="rev", default=None, help="path to reverse fastq file"
    )
    parser.add_argument(
        "-of", dest="out_fwd", default=None, help="path to forward output fastq file"
    )
    parser.add_argument(
        "-or", dest="out_rev", default=None, help="path to reverse output fastq file"
    )
    parser.add_argument(
        "-m",
        dest="mode",
        default="remove",
        help="Read removal mode: remove reads (remove) or replace sequence by N (replace). Default = remove",
    )
    parser.add_argument(
        "-t", dest="threads", default=4, help="Number of parallel threads"
    )

    args = parser.parse_args()

    bam = args.bam_file
    in_fwd = args.fwd
    merged = args.merged
    in_rev = args.rev
    out_fwd = args.out_fwd
    out_rev = args.out_rev
    mode = args.mode
    threads = int(args.threads)

    return (bam, in_fwd, merged, in_rev, out_fwd, out_rev, mode, threads)


def extract_mapped(bamfile, merged):
    """Get the names of the mapped reads in a bam file
    Args:
        bamfile(str): path to bam alignment file
        merged(bool): True if bam file was created from merged fastq files
    Returns:
        mapped_reads(set): set of mapped reads name (str)
    """
    if bamfile.endswith(".bam") or bamfile.endswith(".gz"):
        read_mode = "rb"
    else:
        read_mode = "r"
    mapped_reads = set()
    bamfile = pysam.AlignmentFile(bamfile, mode=read_mode)
    for read in bamfile.fetch():
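        # Test the unmapped bit rather than flag == 4, so that unmapped reads
        # with other flag bits set (e.g. paired reads) are not counted as mapped
        if not read.is_unmapped: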
            if merged:
                if read.query_name.startswith("M_"):
                    mapped_reads.add(read.query_name[2:])
                elif read.query_name.startswith("MT_"):
                    mapped_reads.add(read.query_name[3:])
                else:
                    mapped_reads.add(read.query_name)
            else:
                mapped_reads.add(read.query_name)
    return mapped_reads


def read_write_fq(fq_in, fq_out, mapped_reads, mode, write_mode, proc):
    """
    Read and write fastq file with mapped reads removed
    Args:
        fq_in(str): path to input fastq file
        fq_out(str): path to output fastq file
        mapped_reads(set): set of mapped reads name (str)
        mode(str): read removal mode (remove or replace)
        write_mode(str): write mode (w or wb)
        proc(int): number of parallel processes
    """
    if write_mode == "w":
        cm = open(fq_out, write_mode)
    elif write_mode == "wb":
        cm = xopen(fq_out, mode=write_mode, threads=proc)
    with pysam.FastxFile(fq_in) as fh:
        with cm as fh_out:
            for read in fh:
                try:
                    if read.name in mapped_reads:
                        if mode == "replace":
                            read.sequence = "N" * len(read.sequence)
                            read = str(read) + "\n"
                            if write_mode == "w":
                                fh_out.write(read)
                            elif write_mode == "wb":
                                fh_out.write(read.encode())
                    else:
                        read = str(read) + "\n"
                        if write_mode == "w":
                            fh_out.write(read)
                        elif write_mode == "wb":
                            fh_out.write(read.encode())
                except Exception as e:
                    logging.error(f"Problem with {str(read)}")
                    logging.error(e)

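def check_remove_mode(mode):
    """Check that the read removal mode is valid and return it in lower case"""
    valid_modes = ["replace", "remove"]
    if mode.lower() not in valid_modes:
        logging.error(f"Mode must be {' or '.join(valid_modes)}")
        raise SystemExit(1)
    return mode.lower()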


if __name__ == "__main__":
    BAM, IN_FWD, MERGED, IN_REV, OUT_FWD, OUT_REV, MODE, PROC = _get_args()

    logging.basicConfig(level=logging.INFO, format="%(message)s")

    if OUT_FWD is None:
        out_fwd = os.path.join(os.getcwd(), Path(IN_FWD).stem + ".r1.fq.gz")
    else:
        out_fwd = OUT_FWD

    if out_fwd.endswith(".gz"):
        write_mode = "wb"
    else:
        write_mode = "w"

    remove_mode = check_remove_mode(MODE)

    # FORWARD OR SE FILE
    logging.info(f"- Extracting mapped reads from {BAM}")
    mapped_reads = extract_mapped(BAM, merged=MERGED)
    logging.info(f"- Checking forward fq file {IN_FWD}")
    read_write_fq(
        fq_in=IN_FWD,
        fq_out=out_fwd,
        mapped_reads=mapped_reads,
        mode=remove_mode,
        write_mode=write_mode,
        proc=PROC,
    )
    logging.info(f"- Cleaned forward FastQ file written to {out_fwd}")

    # REVERSE FILE
    if IN_REV:
        if OUT_REV is None:
            out_rev = os.path.join(os.getcwd(), Path(IN_REV).stem + ".r2.fq.gz")
        else:
            out_rev = OUT_REV
        logging.info(f"- Checking reverse fq file {IN_REV}")
        read_write_fq(
            fq_in=IN_REV,
            fq_out=out_rev,
            mapped_reads=mapped_reads,
            mode=remove_mode,
            write_mode=write_mode,
            proc=PROC,
        )
        logging.info(f"- Cleaned reverse FastQ file written to {out_rev}")


================================================
FILE: bin/filter_bam_fragment_length.py
================================================
#!/usr/bin/env python3

# Written by Maxime Borry and released under the MIT license. 
# See git repository (https://github.com/nf-core/eager) for full license text.

import argparse
import pysam


def get_args():
    """This function parses and return arguments passed in"""
    parser = argparse.ArgumentParser(
        prog="bam_filter", description="Filter bam on fragment length"
    )
    parser.add_argument("bam", help="Bam alignment file")
    parser.add_argument(
        "-l",
        dest="fraglen",
        default=35,
        type=int,
        help="Minimum fragment length. Default = 35",
    )
    parser.add_argument(
        "-a",
        dest="all",
        default=False,
        action="store_true",
        help="Include all reads, even unmapped",
    )
    parser.add_argument(
        "-o",
        dest="output",
        default=None,
        help="Output bam basename. Default = {bam_basename}.filtered.bam",
    )

    args = parser.parse_args()

    bam = args.bam
    fraglen = args.fraglen
    allreads = args.all
    outfile = args.output

    return (bam, fraglen, allreads, outfile)


def getBasename(file_name):
    if ("/") in file_name:
        basename = file_name.split("/")[-1].split(".")[0]
    else:
        basename = file_name.split(".")[0]
    return basename


def filter_bam(infile, outfile, fraglen, allreads):
    """Filter a bam file on fragment length and write the result to file

    Args:
        infile (str): Path to input bam
        outfile (str): Basename of output bam (".filtered.bam" is appended)
        fraglen(int): Minimum fragment length to keep
        allreads(bool): Apply on all reads, not only mapped
    """
    # Context managers make sure both files are closed (and the output flushed)
    with pysam.AlignmentFile(infile, "rb") as bamfile, pysam.AlignmentFile(
        outfile + ".filtered.bam", "wb", template=bamfile
    ) as bamwrite:
        for read in bamfile.fetch(until_eof=True):
            # Keep reads that are long enough; unmapped reads only if requested
            if read.query_length >= fraglen and (allreads or not read.is_unmapped):
                bamwrite.write(read)


if __name__ == "__main__":
    BAM, FRAGLEN, ALLREADS, OUTFILE = get_args()

    if OUTFILE is None:
        OUTFILE = getBasename(BAM)

    filter_bam(BAM, OUTFILE, FRAGLEN, ALLREADS)



================================================
FILE: bin/kraken_parse.py
================================================
#!/usr/bin/env python3

# Written by Maxime Borry and released under the MIT license. 
# See git repository (https://github.com/nf-core/eager) for full license text.

import argparse
import csv

def _get_args():
    '''This function parses and return arguments passed in'''
    parser = argparse.ArgumentParser(
        prog='kraken_parse',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description='Parsing kraken')
    parser.add_argument('krakenReport', help="path to kraken report file")
    parser.add_argument(
        '-c',
        dest="count",
        default=50,
        help="Minimum number of hits on clade to report it. Default = 50")
    parser.add_argument(
        '-or',
        dest="readout",
        default=None,
        help="Read count output file. Default = <basename>.read_kraken_parsed.csv")
    parser.add_argument(
        '-ok',
        dest="kmerout",
        default=None,
        help="Kmer Output file. Default = <basename>.kmer_kraken_parsed.csv")

    args = parser.parse_args()

    infile = args.krakenReport
    countlim = int(args.count)
    readout = args.readout
    kmerout = args.kmerout

    return(infile, countlim, readout, kmerout)


def _get_basename(file_name):
    if ("/") in file_name:
        basename = file_name.split("/")[-1].split(".")[0]
    else:
        basename = file_name.split(".")[0]
    return(basename)


def parse_kraken(infile, countlim):
    '''
    INPUT:
        infile (str): path to kraken report file
        countlim (int): lowest count threshold to report hit
    OUTPUT:
        read_dict (dict): key=taxid, value=readCount
        kmer_dict (dict): key=taxid, value=kmer duplicity (kmers / unique kmers)

    '''
    with open(infile, 'r') as f:
        read_dict = {}
        kmer_dict = {}
        csvreader = csv.reader(f, delimiter='\t')
        for line in csvreader:
            reads = int(line[1])
            if reads >= countlim:
                taxid = line[6]
                kmer = line[3]
                unique_kmer = line[4]
                try:
                    kmer_duplicity = float(kmer)/float(unique_kmer)
                except ZeroDivisionError:
                    kmer_duplicity = 0
                read_dict[taxid] = reads
                kmer_dict[taxid] = kmer_duplicity

        return(read_dict, kmer_dict)


def write_output(resdict, infile, outfile):
    with open(outfile, 'w') as f:
        basename = _get_basename(infile)
        f.write(f"TAXID,{basename}\n")
        for akey in resdict.keys():
            f.write(f"{akey},{resdict[akey]}\n")


if __name__ == '__main__':
    INFILE, COUNTLIM, readout, kmerout = _get_args()

    if not readout:
        read_outfile = _get_basename(INFILE)+".read_kraken_parsed.csv"
    else:
        read_outfile = readout
    if not kmerout:
        kmer_outfile = _get_basename(INFILE)+".kmer_kraken_parsed.csv"
    else:
        kmer_outfile = kmerout

    read_dict, kmer_dict = parse_kraken(infile=INFILE, countlim=COUNTLIM)
    write_output(resdict=read_dict, infile=INFILE, outfile=read_outfile)
    write_output(resdict=kmer_dict, infile=INFILE, outfile=kmer_outfile)


================================================
FILE: bin/markdown_to_html.py
================================================
#!/usr/bin/env python
from __future__ import print_function
import argparse
import markdown
import os
import sys
import io


def convert_markdown(in_fn):
    # Read the markdown through a context manager so the file handle is closed.
    with io.open(in_fn, mode="r", encoding="utf-8") as md_fh:
        input_md = md_fh.read()
    html = markdown.markdown(
        "[TOC]\n" + input_md,
        extensions=["pymdownx.extra", "pymdownx.b64", "pymdownx.highlight", "pymdownx.emoji", "pymdownx.tilde", "toc"],
        extension_configs={
            "pymdownx.b64": {"base_path": os.path.dirname(in_fn)},
            "pymdownx.highlight": {"noclasses": True},
            "toc": {"title": "Table of Contents"},
        },
    )
    return html


def wrap_html(contents):
    header = """<!DOCTYPE html><html>
    <head>
        <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
        <style>
            body {
              font-family: -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,"Noto Sans",sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol","Noto Color Emoji";
              padding: 3em;
              margin-right: 350px;
              max-width: 100%;
            }
            .toc {
              position: fixed;
              right: 20px;
              width: 300px;
              padding-top: 20px;
              overflow: scroll;
              height: calc(100% - 3em - 20px);
            }
            .toctitle {
              font-size: 1.8em;
              font-weight: bold;
            }
            .toc > ul {
              padding: 0;
              margin: 1rem 0;
              list-style-type: none;
            }
            .toc > ul ul { padding-left: 20px; }
            .toc > ul > li > a { display: none; }
            img { max-width: 800px; }
            pre {
              padding: 0.6em 1em;
            }
            h2 {

            }
        </style>
    </head>
    <body>
    <div class="container">
    """
    footer = """
    </div>
    </body>
    </html>
    """
    return header + contents + footer


def parse_args(args=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("mdfile", type=argparse.FileType("r"), nargs="?", help="File to convert. Defaults to stdin.")
    parser.add_argument(
        "-o", "--out", type=argparse.FileType("w"), default=sys.stdout, help="Output file name. Defaults to stdout."
    )
    return parser.parse_args(args)


def main(args=None):
    args = parse_args(args)
    converted_md = convert_markdown(args.mdfile.name)
    html = wrap_html(converted_md)
    args.out.write(html)


if __name__ == "__main__":
    sys.exit(main())


================================================
FILE: bin/merge_kraken_res.py
================================================
#!/usr/bin/env python

# Written by Maxime Borry and released under the MIT license. 
# See git repository (https://github.com/nf-core/eager) for full license text.

import argparse
import os
import pandas as pd
import numpy as np

def _get_args():
    '''This function parses and returns the arguments passed in'''
    parser = argparse.ArgumentParser(
        prog='merge_kraken_res',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description='Merging csv count files in one table')
    parser.add_argument(
        '-or',
        dest="readout",
        default="kraken_read_count_table.csv",
        help="Read count output file. Default = kraken_read_count_table.csv")
    parser.add_argument(
        '-ok',
        dest="kmerout",
        default="kraken_kmer_unicity_table.csv",
        help="Kmer unicity output file. Default = kraken_kmer_unicity_table.csv")

    args = parser.parse_args()

    readout = args.readout
    kmerout = args.kmerout

    return(readout, kmerout)


def get_csv():
    tmp = [i for i in os.listdir() if ".csv" in i]
    kmer = [i for i in tmp if '.kmer_' in i]
    read = [i for i in tmp if '.read_' in i]
    return(read, kmer)


def _get_basename(file_name):
    if ("/") in file_name:
        basename = file_name.split("/")[-1].split(".")[0]
    else:
        basename = file_name.split(".")[0]
    return(basename)


def merge_csv(all_csv):
    df = pd.read_csv(all_csv[0], index_col=0)
    for i in range(1, len(all_csv)):
        df_tmp = pd.read_csv(all_csv[i], index_col=0)
        df = pd.merge(left=df, right=df_tmp, on='TAXID', how='outer')
    df.fillna(0, inplace=True)
    return(df)


def write_csv(pd_dataframe, outfile):
    pd_dataframe.to_csv(outfile)


if __name__ == "__main__":
    READOUT, KMEROUT = _get_args()
    reads, kmers = get_csv()
    read_df = merge_csv(reads)
    kmer_df = merge_csv(kmers)
    write_csv(read_df, READOUT)
    write_csv(kmer_df, KMEROUT)

================================================
FILE: bin/parse_snp_cov.py
================================================
#!/usr/bin/env python3

# Written by Thiseas C. Lamnidis and released under the MIT license. 
# See git repository (https://github.com/nf-core/eager) for full license text.

import sys, json
from collections import OrderedDict

jsonOut = OrderedDict()
data = OrderedDict()


with open(sys.argv[1], 'r') as infile:
  for line in infile:
    ## Skip blank lines and comment/header lines before indexing into the fields.
    if not line.strip() or line.strip().startswith("#"):
      continue
    fields = line.strip().split()
    sample_id = fields[0]
    covered_snps = fields[1]
    total_snps = fields[2]

    data[sample_id] = {"Covered_Snps":covered_snps, "Total_Snps":total_snps}

jsonOut = {"plot_type": "generalstats", "id": "snp_coverage",
    "pconfig": {
        "Covered_Snps" : {"title" : "#SNPs Covered"},
        "Total_Snps" : {"title": "#SNPs Total"}
    }, 
    "data" : data
}

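## str.rstrip('.txt') strips any trailing '.', 't' or 'x' characters, not the suffix, so remove the extension explicitly.
out_prefix = sys.argv[1][:-4] if sys.argv[1].endswith('.txt') else sys.argv[1]
with open(out_prefix+'_mqc.json', 'w') as outfile: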
    json.dump(jsonOut, outfile)


================================================
FILE: bin/print_x_contamination.py
================================================
#!/usr/bin/env python3

# Written by Thiseas C. Lamnidis and released under the MIT license. 
# See git repository (https://github.com/nf-core/eager) for full license text.

import sys, re, json
from collections import OrderedDict

jsonOut=OrderedDict()
data=OrderedDict()

## Function to convert a set of elements into floating point numbers, when possible, else leave them be.
def make_float(x):
    # print (x)
    output=[None for i in range(len(x))]
    ## If value for an estimate/error is -nan or nan, replace with "N/A". JSON does not accept NaN as a valid value.
    for i in range(len(x)):
        if x[i] == "-nan" or x[i] == "nan":
            output[i]="N/A"
            continue
        try:
            output[i]=float(x[i])
        except (TypeError, ValueError):
            output[i]=x[i]
    
    return(tuple(output))


Input_files=sys.argv[1:]

output = open("nuclear_contamination.txt", 'w')
print ("Individual", "Num_SNPs", "Method1_MOM_estimate", "Method1_MOM_SE", "Method1_ML_estimate", "Method1_ML_SE", "Method2_MOM_estimate", "Method2_MOM_SE", "Method2_ML_estimate", "Method2_ML_SE", sep="\t", file=output)
for fn in Input_files:
    ## For each file, reset the values to "N/A" so they don't carry over from last file.
    mom1, err_mom1= "N/A","N/A"
    ml1, err_ml1="N/A","N/A"
    mom2, err_mom2= "N/A","N/A"
    ml2, err_ml2="N/A","N/A"
    nSNPs="0"
    with open(fn, 'r') as f:
        Estimates={}
        Ind=re.sub(r'\.X\.contamination\.out$', '', fn).split("/")[-1]
        for line in f:
            fields=line.strip().split()
            if line.strip()[0:19] == "We have nSNP sites:":
                nSNPs=fields[4].rstrip(",")
            elif line.strip()[0:7] == "Method1" and line.strip()[9:16] == 'new_llh':
                mom1=fields[3].split(":")[1]
                err_mom1=fields[4].split(":")[1]
                ml1=fields[5].split(":")[1]
                err_ml1=fields[6].split(":")[1]
                ## Sometimes angsd fails to run method 2, and the error is printed directly after the SE for ML. When that happens, exclude the first word in the error from the output. (Method 2 jsonOut will be shown as NA)
                if err_ml1.endswith("contamination"):
                    err_ml1 = err_ml1[:-13]
            elif line.strip()[0:7] == "Method2" and line.strip()[9:16] == 'new_llh':
                mom2=fields[3].split(":")[1]
                err_mom2=fields[4].split(":")[1]
                ml2=fields[5].split(":")[1]
                err_ml2=fields[6].split(":")[1]
        ## Convert estimates and errors to floating point numbers
        (ml1, err_ml1, mom1, err_mom1, ml2, err_ml2, mom2, err_mom2) = make_float((ml1, err_ml1, mom1, err_mom1, ml2, err_ml2, mom2, err_mom2))
        data[Ind]={ "Num_SNPs" : int(nSNPs), "Method1_MOM_estimate" : mom1, "Method1_MOM_SE" : err_mom1, "Method1_ML_estimate" : ml1, "Method1_ML_SE" : err_ml1, "Method2_MOM_estimate" : mom2, "Method2_MOM_SE" : err_mom2, "Method2_ML_estimate" : ml2, "Method2_ML_SE" : err_ml2 }
        print (Ind, nSNPs, mom1, err_mom1, ml1, err_ml1, mom2, err_mom2, ml2, err_ml2, sep="\t", file=output)

## Close the table so it is fully written before the MultiQC JSON is produced.
output.close()


jsonOut = {"plot_type": "generalstats", "id": "nuclear_contamination",
    "pconfig": {
        "Num_SNPs" : {"title" : "Number of SNPs"},
        "Method1_MOM_estimate" : {"title": "Contamination Estimate (Method1_MOM)"},
        "Method1_MOM_SE" : {"title": "Estimate Error (Method1_MOM)"},
        "Method1_ML_estimate" : {"title": "Contamination Estimate (Method1_ML)"},
        "Method1_ML_SE" : {"title": "Estimate Error (Method1_ML)"},
        "Method2_MOM_estimate" : {"title": "Contamination Estimate (Method2_MOM)"},
        "Method2_MOM_SE" : {"title": "Estimate Error (Method2_MOM)"},
        "Method2_ML_estimate" : {"title": "Contamination Estimate (Method2_ML)"},
        "Method2_ML_SE" : {"title": "Estimate Error (Method2_ML)"}
    }, 
    "data" : data
}
with open('nuclear_contamination_mqc.json', 'w') as outfile:
    json.dump(jsonOut, outfile)


================================================
FILE: bin/scrape_software_versions.py
================================================
#!/usr/bin/env python
from __future__ import print_function
from collections import OrderedDict
import re

regexes = {
    "nf-core/eager": ["v_pipeline.txt", r"(\S+)"],
    "Nextflow": ["v_nextflow.txt", r"(\S+)"],
    "FastQC": ["v_fastqc.txt", r"FastQC v(\S+)"],
    "MultiQC": ["v_multiqc.txt", r"multiqc, version (\S+)"],
    'AdapterRemoval':['v_adapterremoval.txt', r"AdapterRemoval ver. (\S+)"],
    'Picard MarkDuplicates': ['v_markduplicates.txt', r"Version:(\S+)"],
    'Samtools': ['v_samtools.txt', r"samtools (\S+)"],
    'Preseq': ['v_preseq.txt', r"Version: (\S+)"],
    'BWA': ['v_bwa.txt', r"Version: (\S+)"], 
    'Bowtie2': ['v_bowtie2.txt', r"bowtie2-([0-9]+\.[0-9]+\.[0-9]+) -fdebug"],
    'Qualimap': ['v_qualimap.txt', r"QualiMap v.(\S+)"],
    'GATK HaplotypeCaller': ['v_gatk.txt', r"The Genome Analysis Toolkit \(GATK\) v(\S+)"],
    'GATK UnifiedGenotyper': ['v_gatk3.txt', r"(\S+)"],
    'bamUtil' : ['v_bamutil.txt', r"Version: (\S+);"],
    'fastP': ['v_fastp.txt', r"([\d\.]+)"],
    'DamageProfiler' : ['v_damageprofiler.txt', r"DamageProfiler v(\S+)"],
    'angsd':['v_angsd.txt',r"version: (\S+)"],
    'bedtools':['v_bedtools.txt',r"bedtools v(\S+)"],
    'circulargenerator':['v_circulargenerator.txt',r"CircularGeneratorv(\S+)"],
    'DeDup':['v_dedup.txt',r"DeDup v(\S+)"],
    'freebayes':['v_freebayes.txt',r"v([0-9]\S+)"],
    'sequenceTools':['v_sequencetools.txt',r"(\S+)"],
    'maltextract':['v_maltextract.txt', r"version(\S+)"],
    'malt':['v_malt.txt',r"version (\S+)"],
    'multivcfanalyzer':['v_multivcfanalyzer.txt', r"MultiVCFAnalyzer - (\S+)"],
    'pmdtools':['v_pmdtools.txt',r"pmdtools v(\S+)"],
    'sexdeterrmine':['v_sexdeterrmine.txt',r"(\S+)"],
    'MTNucRatioCalculator':['v_mtnucratiocalculator.txt',r"Version: (\S+)"],
    'VCF2genome':['v_vcf2genome.txt', r"VCF2Genome \(v. ([0-9].[0-9]+) "],
    'endorS.py':['v_endorSpy.txt', r"endorS.py (\S+)"],
    'kraken':['v_kraken.txt', r"Kraken version (\S+)"],
    'eigenstrat_snp_coverage':['v_eigenstrat_snp_coverage.txt',r"(\S+)"],
    'mapDamage2':['v_mapdamage.txt',r"(\S+)"],
    'bbduk':['v_bbduk.txt',r"(.*)"],
    'bcftools':['v_bcftools.txt',r"(\S+)"]
}

results = OrderedDict()
results["nf-core/eager"] = '<span style="color:#999999;">N/A</span>'
results["Nextflow"] = '<span style="color:#999999;">N/A</span>'
results["FastQC"] = '<span style="color:#999999;">N/A</span>'
results["MultiQC"] = '<span style="color:#999999;">N/A</span>'
results['AdapterRemoval'] = '<span style="color:#999999;\">N/A</span>'
results['fastP'] = '<span style="color:#999999;\">N/A</span>'
results['BWA'] = '<span style="color:#999999;\">N/A</span>'
results['Bowtie2'] = '<span style="color:#999999;\">N/A</span>'
results['circulargenerator'] = '<span style="color:#999999;\">N/A</span>'
results['Samtools'] = '<span style="color:#999999;\">N/A</span>'
results['endorS.py'] = '<span style="color:#999999;\">N/A</span>'
results['DeDup'] = '<span style="color:#999999;\">N/A</span>'
results['Picard MarkDuplicates'] = '<span style="color:#999999;\">N/A</span>'
results['Qualimap'] = '<span style="color:#999999;\">N/A</span>'
results['Preseq'] = '<span style="color:#999999;\">N/A</span>'
results['GATK HaplotypeCaller'] = '<span style="color:#999999;\">N/A</span>'
results['GATK UnifiedGenotyper'] = '<span style="color:#999999;\">N/A</span>'
results['freebayes'] = '<span style="color:#999999;\">N/A</span>'
results['sequenceTools'] = '<span style="color:#999999;\">N/A</span>'
results['VCF2genome'] = '<span style="color:#999999;\">N/A</span>'
results['MTNucRatioCalculator'] = '<span style="color:#999999;\">N/A</span>'
results['bedtools'] = '<span style="color:#999999;\">N/A</span>'
results['DamageProfiler'] = '<span style="color:#999999;\">N/A</span>'
results['bamUtil'] = '<span style="color:#999999;\">N/A</span>'
results['pmdtools'] = '<span style="color:#999999;\">N/A</span>'
results['angsd'] = '<span style="color:#999999;\">N/A</span>'
results['sexdeterrmine'] = '<span style="color:#999999;\">N/A</span>'
results['multivcfanalyzer'] = '<span style="color:#999999;\">N/A</span>'
results['malt'] = '<span style="color:#999999;\">N/A</span>'
results['kraken'] = '<span style="color:#999999;\">N/A</span>'
results['maltextract'] = '<span style="color:#999999;\">N/A</span>'
results['eigenstrat_snp_coverage'] = '<span style="color:#999999;\">N/A</span>'
results['mapDamage2'] = '<span style="color:#999999;\">N/A</span>'
results['bbduk'] = '<span style="color:#999999;\">N/A</span>'
results['bcftools'] = '<span style="color:#999999;\">N/A</span>'

# Search each file using its regex
for k, v in regexes.items():
    try:
        with open(v[0]) as x:
            versions = x.read()
            match = re.search(v[1], versions)
            if match:
                results[k] = "v{}".format(match.group(1))
    except IOError:
        results[k] = False

# Remove software set to false in results
for k in list(results):
    if not results[k]:
        del results[k]

# Dump to YAML
print(
    """
id: 'software_versions'
section_name: 'nf-core/eager Software Versions'
section_href: 'https://github.com/nf-core/eager'
plot_type: 'html'
description: 'are collected at run time from the software output.'
data: |
    <dl class="dl-horizontal">
"""
)
for k, v in results.items():
    print("        <dt>{}</dt><dd><samp>{}</samp></dd>".format(k, v))
print("    </dl>")

# Write out the detected software versions as a tab-separated file:
with open("software_versions.csv", "w") as f:
    for k, v in results.items():
        f.write("{}\t{}\n".format(k, v))


================================================
FILE: conf/base.config
================================================
/*
 * -------------------------------------------------
 *  nf-core/eager Nextflow base config file
 * -------------------------------------------------
 * A 'blank slate' config file, appropriate for general
 * use on most high performance compute environments.
 * Assumes that all software is installed and available
 * on the PATH. Runs in `local` mode - all jobs will be
 * run on the logged in environment.
 */

process {
  cpus = { check_max( 1 * task.attempt, 'cpus' ) }
  memory = { check_max( 7.GB * task.attempt, 'memory' ) }
  time = { check_max( 24.h * task.attempt, 'time' ) }

  errorStrategy = { task.exitStatus in [143,137,104,134,139, 140] ? 'retry' : 'finish' }
  maxRetries = 3
  maxErrors = '-1'

  // Process-specific resource requirements
  // NOTE - Only one of the labels below is used in the fastqc process in the main script.
  //        If possible, it would be nice to keep the same label naming convention when
  //        adding in your processes.
  // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors

  // Generic resource requirements - s(ingle)c(ore)/m(ulti)c(ore)

  withLabel:'sc_tiny'{
      cpus = { check_max( 1, 'cpus' ) }
      memory = { check_max( 1.GB * task.attempt, 'memory' ) }
      time = { check_max( 4.h * task.attempt, 'time' ) }
  }

  withLabel:'sc_small'{
      cpus = { check_max( 1, 'cpus' ) }
      memory = { check_max( 4.GB * task.attempt, 'memory' ) }
      time = { check_max( 4.h * task.attempt, 'time' ) }
  }

  withLabel:'sc_medium'{
      cpus = { check_max( 1, 'cpus' ) }
      memory = { check_max( 8.GB * task.attempt, 'memory' ) }
      time = { check_max( 4.h * task.attempt, 'time' ) }
  }

  withLabel:'mc_small'{
      cpus = { check_max( 2 * task.attempt, 'cpus' ) }
      memory = { check_max( 4.GB * task.attempt, 'memory' ) }
      time = { check_max( 4.h * task.attempt, 'time' ) }
  }

  withLabel:'mc_medium' {
      cpus = { check_max( 4 * task.attempt, 'cpus' ) }
      memory = { check_max( 8.GB * task.attempt, 'memory' ) }
      time = { check_max( 4.h * task.attempt, 'time' ) }
  }

  withLabel:'mc_large'{
      cpus = { check_max( 8 * task.attempt, 'cpus' ) }
      memory = { check_max( 16.GB * task.attempt, 'memory' ) }
      time = { check_max( 4.h * task.attempt, 'time' ) }
  }

  withLabel:'mc_huge'{
      cpus = { check_max( 32 * task.attempt, 'cpus' ) }
      memory = { check_max( 256.GB * task.attempt, 'memory' ) }
      time = { check_max( 4.h * task.attempt, 'time' ) }
  }

  // Process-specific resource requirements (others leave at default, e.g. Fastqc)
  withName:get_software_versions {
    cache = false
  }

  withName:qualimap{
    errorStrategy = { task.exitStatus in [1,143,137,104,134,139, 140] ? 'retry' : task.exitStatus in [255] ? 'ignore' : 'finish' }
  }

  withName:preseq {
    errorStrategy = 'ignore'
  }

  withName:damageprofiler {
    errorStrategy = { task.exitStatus in [1,143,137,104,134,139, 140] ? 'retry' : 'finish' }
  }

  // Also retry on exit code 1 for certain Java tools, as Java 'not enough heap space' errors give exit code 1
  withName: dedup {
    errorStrategy = { task.exitStatus in [1,143,137,104,134,139, 140] ? 'retry' : 'finish' } 
  }
  
  withName: markduplicates {
    errorStrategy = { task.exitStatus in [143,137, 140] ? 'retry' : 'finish' } 
  }

  // Also retry on exit code 1, as Java 'not enough heap space' errors give exit code 1
  withName: malt {
    errorStrategy = { task.exitStatus in [1,143,137,104,134,139, 140] ? 'retry' : 'finish' } 
  }

  // other process specific exit statuses
  withName: nuclear_contamination {
    errorStrategy = { task.exitStatus in [143,137,104,134,139, 140] ? 'ignore' : 'retry' }
  }

}

params {
  // Defaults only, expecting to be overwritten
  max_memory = 128.GB
  max_cpus = 16
  max_time = 240.h
  igenomes_base = 's3://ngi-igenomes/igenomes/'
}


================================================
FILE: conf/benchmarking_human.config
================================================
/*
 * -------------------------------------------------
 *  Nextflow config file for running tests
 * -------------------------------------------------
 * Defines bundled input files and everything required
 * to run a fast and simple test. Use as follows:
 * nextflow run nf-core/eager -profile test,docker (or singularity, or conda)
 */

params {
   config_profile_name = 'nf-core/eager benchmarking - human profile'
   config_profile_description = "A 'fullsized' benchmarking profile for deepish Human sequencing aDNA data" 

   //Input data
   input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Benchmarking/benchmarking_human.tsv'
   // Genome reference
   fasta = 'https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz'

   run_bam_filtering = true
   bam_unmapped_type = 'discard'
   bam_mapping_quality_threshold = 30

   dedupper = 'markduplicates'
  
   run_trim_bam = true
   bamutils_clip_double_stranded_none_udg_left = 1
   bamutils_clip_double_stranded_none_udg_right = 1
   
   // JAR will need to be downloaded first!
   run_genotyping = true
   genotyping_tool = 'ug'
   genotyping_source = 'trimmed'
   gatk_call_conf = 20

   run_sexdeterrmine = true
   sexdeterrmine_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_HG19.0based.bed.gz'

   run_nuclear_contamination = true
   contamination_chrom_name = 'chrX'

   run_mtnucratio = true
}

process {
   withName:'makeBWAIndex'{
      time = { check_max( 4.h * task.attempt, 'time' ) }
   }
   withName:'adapter_removal'{
      cpus = { check_max( 8, 'cpus' ) }
      memory = { check_max( 16.GB * task.attempt, 'memory' ) }
      time = { check_max( 2.h * task.attempt, 'time' ) }
   }
   withName:'bwa'{
      cpus = { check_max( 8, 'cpus' ) }
      memory = { check_max( 16.GB * task.attempt, 'memory' ) }
      time = { check_max( 4.h * task.attempt, 'time' ) }
   }
   withName:'markDup'{
      cpus = { check_max( 16, 'cpus' ) }
      memory = { check_max( 64.GB * task.attempt, 'memory' ) }
      time = { check_max( 4.h * task.attempt, 'time' ) }
   }
   withName:'damageprofiler'{
      cpus = 1
      memory = { check_max( 8.GB * task.attempt, 'memory' ) }
      time = { check_max( 2.h * task.attempt, 'time' ) }
   }
}


================================================
FILE: conf/benchmarking_vikingfish.config
================================================
/*
 * -------------------------------------------------
 *  Nextflow config file for running tests
 * -------------------------------------------------
 * Defines bundled input files and everything required
 * to run a fast and simple test. Use as follows:
 * nextflow run nf-core/eager -profile test,docker (or singularity, or conda)
 */

params {
   config_profile_name = 'nf-core/eager benchmarking - Viking Fish profile'
   config_profile_description = "A 'fullsized' benchmarking profile for deepish sequencing aDNA data" 
   
   //Input data
   input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Benchmarking/benchmarking_vikingfish.tsv'   
   // Genome reference
   fasta = 's3://nf-core-awsmegatests/eager/ENA_Data_Fish/GCF_902167405.1_gadMor3.0_genomic.fna.gz'
   
   bwaalnn = 0.04
   bwaalnl = 1024
   
   run_bam_filtering = true
   bam_unmapped_type = 'discard'
   bam_mapping_quality_threshold = 25
     
   run_genotyping = true
   genotyping_tool = 'hc'
   genotyping_source = 'raw'
   gatk_ploidy = 2
}

process {
   withName:'adapter_removal'{
      cpus = { check_max( 8, 'cpus' ) }
      memory = { check_max( 16.GB * task.attempt, 'memory' ) }
      time = { check_max( 2.h * task.attempt, 'time' ) }
   }
   withName:'bwa'{
      cpus = { check_max( 8, 'cpus' ) }
      memory = { check_max( 16.GB * task.attempt, 'memory' ) }
      time = { check_max( 8.h * task.attempt, 'time' ) }
   }
   withName:'dedup'{
      cpus = { check_max( 8, 'cpus' ) }
      memory = { check_max( 16.GB * task.attempt, 'memory' ) }
      time = { check_max( 4.h * task.attempt, 'time' ) }
   }
   withName:'genotyping_hc'{
     cpus = { check_max( 8, 'cpus' ) }
     memory = { check_max( 16.GB * task.attempt, 'memory' ) }
     time = { check_max( 8.h * task.attempt, 'time' ) }
   }

}


================================================
FILE: conf/igenomes.config
================================================
/*
 * -------------------------------------------------
 *  Nextflow config file for iGenomes paths
 * -------------------------------------------------
 * Defines reference genomes, using iGenome paths
 * Can be used by any config that customises the base
 * path using $params.igenomes_base / --igenomes_base
 */

params {
  // illumina iGenomes reference file paths
  genomes {
    'GRCh37' {
      fasta       = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt"
      mito_name   = "MT"
      macs_gsize  = "2.7e9"
      blacklist   = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed"
    }
    'GRCh38' {
      fasta       = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed"
      mito_name   = "chrM"
      macs_gsize  = "2.7e9"
      blacklist   = "${projectDir}/assets/blacklists/hg38-blacklist.bed"
    }
    'GRCm38' {
      fasta       = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt"
      mito_name   = "MT"
      macs_gsize  = "1.87e9"
      blacklist   = "${projectDir}/assets/blacklists/GRCm38-blacklist.bed"
    }
    'TAIR10' {
      fasta       = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt"
      mito_name   = "Mt"
    }
    'EB2' {
      fasta       = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt"
    }
    'UMD3.1' {
      fasta       = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt"
      mito_name   = "MT"
    }
    'WBcel235' {
      fasta       = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed"
      mito_name   = "MtDNA"
      macs_gsize  = "9e7"
    }
    'CanFam3.1' {
      fasta       = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt"
      mito_name   = "MT"
    }
    'GRCz10' {
      fasta       = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed"
      mito_name   = "MT"
    }
    'BDGP6' {
      fasta       = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed"
      mito_name   = "M"
      macs_gsize  = "1.2e8"
    }
    'EquCab2' {
      fasta       = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt"
      mito_name   = "MT"
    }
    'EB1' {
      fasta       = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt"
    }
    'Galgal4' {
      fasta       = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed"
      mito_name   = "MT"
    }
    'Gm01' {
      fasta       = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt"
    }
    'Mmul_1' {
      fasta       = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt"
      mito_name   = "MT"
    }
    'IRGSP-1.0' {
      fasta       = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed"
      mito_name   = "Mt"
    }
    'CHIMP2.1.4' {
      fasta       = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/"
      bismark     = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/"
      gtf         = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf"
      bed12       = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed"
      readme      = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt"
      mito_name   = "MT"
    }
    'Rnor_6.0' {
      fasta       = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa"
      bwa         = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa"
      bowtie2     = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/"
      star        = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex
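
The genome entries above share one regular path layout under `params.igenomes_base`: species, then source, then build, then a fixed `Sequence/` or `Annotation/` suffix. A minimal Python sketch of that layout follows. The helper name `igenomes_paths` and the base directory `/data/igenomes` are assumptions made for illustration and are not part of nf-core/eager.

```python
from posixpath import join


def igenomes_paths(base, species, build, source="Ensembl"):
    """Rebuild the path layout seen in the config entries above.

    Hypothetical helper for illustration only; the pipeline itself reads
    these values directly from conf/igenomes.config.
    """
    root = join(base, species, source, build)
    return {
        "fasta": join(root, "Sequence/WholeGenomeFasta/genome.fa"),
        "bwa": join(root, "Sequence/BWAIndex/genome.fa"),
        "bowtie2": join(root, "Sequence/Bowtie2Index/"),
        "gtf": join(root, "Annotation/Genes/genes.gtf"),
        "bed12": join(root, "Annotation/Genes/genes.bed"),
    }


# "/data/igenomes" is an assumed local mirror, not a value from the config.
paths = igenomes_paths("/data/igenomes", "Danio_rerio", "GRCz10")
print(paths["fasta"])
```

Optional keys such as `readme`, `mito_name` and `macs_gsize` vary per genome in the entries above, so the sketch leaves them out rather than guessing them.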




SYMBOL INDEX (21 symbols across 6 files)

FILE: bin/extract_map_reads.py
  function _get_args (line 14) | def _get_args():
  function extract_mapped (line 63) | def extract_mapped(bamfile, merged):
  function read_write_fq (line 92) | def read_write_fq(fq_in, fq_out, mapped_reads, mode, write_mode, proc):
  function check_remove_mode (line 130) | def check_remove_mode(mode):

FILE: bin/filter_bam_fragment_length.py
  function get_args (line 10) | def get_args():
  function getBasename (line 47) | def getBasename(file_name):
  function filter_bam (line 55) | def filter_bam(infile, outfile, fraglen, allreads):

FILE: bin/kraken_parse.py
  function _get_args (line 9) | def _get_args():
  function _get_basename (line 42) | def _get_basename(file_name):
  function parse_kraken (line 50) | def parse_kraken(infile, countlim):
  function write_output (line 79) | def write_output(resdict, infile, outfile):

FILE: bin/markdown_to_html.py
  function convert_markdown (line 10) | def convert_markdown(in_fn):
  function wrap_html (line 24) | def wrap_html(contents):
  function parse_args (line 74) | def parse_args(args=None):
  function main (line 83) | def main(args=None):

FILE: bin/merge_kraken_res.py
  function _get_args (line 11) | def _get_args():
  function get_csv (line 36) | def get_csv():
  function _get_basename (line 43) | def _get_basename(file_name):
  function merge_csv (line 51) | def merge_csv(all_csv):
  function write_csv (line 60) | def write_csv(pd_dataframe, outfile):

FILE: bin/print_x_contamination.py
  function make_float (line 13) | def make_float(x):
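
The symbol index above uses a regular line format: a `FILE:` header followed by indented `kind name (line N) | signature` rows. The sketch below is one way to read that format into a dictionary, assuming it holds for every row. The function name `parse_symbol_index` and the regular expression are illustrative and are not part of the repository or the extraction tool.

```python
import re

# Matches rows such as:
#   function make_float (line 13) | def make_float(x):
SYMBOL_RE = re.compile(r"^\s*(\w+) (\w+) \(line (\d+)\) \| (.*)$")


def parse_symbol_index(text):
    """Group symbol rows under the FILE: header that precedes them."""
    index = {}
    current = None
    for line in text.splitlines():
        if line.startswith("FILE: "):
            current = line[len("FILE: "):].strip()
            index[current] = []
            continue
        match = SYMBOL_RE.match(line)
        if match and current is not None:
            kind, name, lineno, signature = match.groups()
            index[current].append(
                {"kind": kind, "name": name, "line": int(lineno), "signature": signature}
            )
    return index


sample = """FILE: bin/print_x_contamination.py
  function make_float (line 13) | def make_float(x):
"""
print(parse_symbol_index(sample))
```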


Condensed preview: 73 files, each showing path, character count, and a content snippet. The full structured content is 824K chars.

[
  {
    "path": ".gitattributes",
    "chars": 36,
    "preview": "*.config linguist-language=nextflow\n"
  },
  {
    "path": ".github/.dockstore.yml",
    "chars": 153,
    "preview": "# Dockstore config version, not pipeline version\nversion: 1.2\nworkflows:\n  - subclass: nfl\n    primaryDescriptorPath: /n"
  },
  {
    "path": ".github/CONTRIBUTING.md",
    "chars": 11269,
    "preview": "# nf-core/eager: Contributing Guidelines\n\nHi there!\nMany thanks for taking an interest in improving nf-core/eager.\n\nWe t"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 1812,
    "preview": "---\nname: Bug report\nabout: Report something that is broken or incorrect\nlabels: bug\n---\n\n<!--\n# nf-core/eager bug repor"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 284,
    "preview": "blank_issues_enabled: false\ncontact_links:\n  - name: Join nf-core\n    url: https://nf-co.re/join\n    about: Please join "
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 814,
    "preview": "---\nname: Feature request\nabout: Suggest an idea for the nf-core/eager pipeline\nlabels: enhancement\n---\n\n<!--\n# nf-core/"
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE/pull_request_template.md",
    "chars": 1123,
    "preview": "Many thanks to contributing to nf-core/eager!\n\nPlease fill in the appropriate checklist below (delete whatever is not re"
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.md",
    "chars": 1479,
    "preview": "<!--\n# nf-core/eager pull request\n\nMany thanks for contributing to nf-core/eager!\n\nPlease fill in the appropriate checkl"
  },
  {
    "path": ".github/markdownlint.yml",
    "chars": 226,
    "preview": "# Markdownlint configuration file\ndefault: true\nline-length: false\nno-duplicate-header:\n    siblings_only: true\nno-inlin"
  },
  {
    "path": ".github/workflows/awsfulltest.yml",
    "chars": 1865,
    "preview": "name: nf-core AWS full size tests\n# This workflow is triggered on published releases.\n# It can be additionally triggered"
  },
  {
    "path": ".github/workflows/awstest.yml",
    "chars": 1648,
    "preview": "name: nf-core AWS test\n# This workflow is triggered on push to the master branch.\n# It can be additionally triggered man"
  },
  {
    "path": ".github/workflows/branch.yml",
    "chars": 2244,
    "preview": "name: nf-core branch protection\n# This workflow is triggered on PRs to master branch on the repository\n# It fails when s"
  },
  {
    "path": ".github/workflows/ci.yml",
    "chars": 15583,
    "preview": "name: nf-core CI\n# This workflow runs the pipeline with the minimal test dataset to check that it completes without any "
  },
  {
    "path": ".github/workflows/linting.yml",
    "chars": 4708,
    "preview": "name: nf-core linting\n# This workflow is triggered on pushes and PRs to the repository.\n# It runs the `nf-core lint` and"
  },
  {
    "path": ".github/workflows/linting_comment.yml",
    "chars": 810,
    "preview": "\nname: nf-core linting comment\n# This workflow is triggered after the linting action is complete\n# It posts an automated"
  },
  {
    "path": ".github/workflows/push_dockerhub_dev.yml",
    "chars": 921,
    "preview": "name: nf-core Docker push (dev)\n# This builds the docker image and pushes it to DockerHub\n# Runs on nf-core repo release"
  },
  {
    "path": ".github/workflows/push_dockerhub_release.yml",
    "chars": 1102,
    "preview": "name: nf-core Docker push (release)\n# This builds the docker image and pushes it to DockerHub\n# Runs on nf-core repo rel"
  },
  {
    "path": ".github/yamllint.yml",
    "chars": 120,
    "preview": "rules:\n  document-start: disable\n  comments: disable\n  truthy: disable\n  line-length: disable\n  empty-lines: disable\n  \n"
  },
  {
    "path": ".gitignore",
    "chars": 131,
    "preview": ".nextflow*\nwork/\ndata/\nresults/\n.DS_Store\ntests/\ntesting/\ntesting*\n*.pyc\nmain_playground.nf\n.vscode\n*.code-workspace\nnf-"
  },
  {
    "path": ".gitpod.yml",
    "chars": 917,
    "preview": "image: nfcore/gitpod:latest\n\nvscode:\n  extensions: # based on nf-core.nf-core-extensionpack\n    - codezombiech.gitignore"
  },
  {
    "path": ".nf-core-lint.yml",
    "chars": 170,
    "preview": "files_unchanged:\n  - assets/multiqc_config.yaml\n  - .github/CONTRIBUTING.md\n  - .github/ISSUE_TEMPLATE/bug_report.md\n  -"
  },
  {
    "path": "CHANGELOG.md",
    "chars": 41402,
    "preview": "# nf-core/eager: Changelog\n\nThe format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)\nand this proj"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "chars": 9040,
    "preview": "# Code of Conduct at nf-core (v1.0)\n\n## Our Pledge\n\nIn the interest of fostering an open, collaborative, and welcoming e"
  },
  {
    "path": "Dockerfile",
    "chars": 564,
    "preview": "FROM nfcore/base:1.14\nLABEL authors=\"The nf-core/eager community\" \\\n      description=\"Docker image containing all softw"
  },
  {
    "path": "LICENSE",
    "chars": 1079,
    "preview": "MIT License\n\nCopyright (c) The nf-core/eager community\n\nPermission is hereby granted, free of charge, to any person obta"
  },
  {
    "path": "README.md",
    "chars": 22022,
    "preview": "# ![nf-core/eager](docs/images/nf-core_eager_logo_outline_drop.png)\n\n**A fully reproducible and state-of-the-art ancient"
  },
  {
    "path": "assets/angsd_resources/README",
    "chars": 2124,
    "preview": "**These files are originally part of angsd (release 0.931). They have been added here for convinence.**\n\nThis file descr"
  },
  {
    "path": "assets/angsd_resources/getALL.txt",
    "chars": 576,
    "preview": "F=\"ASW CEU CHB CHD GIH JPT LWK MEX MKK TSI YRI\"\nfor f in $F\ndo \n    echo $f\n    wget http://hapmap.ncbi.nlm.nih.gov/down"
  },
  {
    "path": "assets/email_template.html",
    "chars": 2496,
    "preview": "<html>\n<head>\n  <meta charset=\"utf-8\">\n  <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">\n  <meta name=\"viewport\" c"
  },
  {
    "path": "assets/email_template.txt",
    "chars": 1124,
    "preview": "----------------------------------------------------\n                                        ,--./,-.\n        ___     __"
  },
  {
    "path": "assets/multiqc_config.yaml",
    "chars": 8057,
    "preview": "custom_logo: \"nf-core_eager_logo_outline_drop.png\"\ncustom_logo_url: https://github.com/nf-core/eager/\ncustom_logo_title:"
  },
  {
    "path": "assets/nf-core_eager_dummy.txt",
    "chars": 138,
    "preview": "This is a dummy file for when we need a 'fake' file to satisfy all nextflow channel inputs being filled, even if we actu"
  },
  {
    "path": "assets/nf-core_eager_dummy2.txt",
    "chars": 145,
    "preview": "This is a second dummy file for when we need a 'fake' file to satisfy all nextflow channel inputs being filled, even if "
  },
  {
    "path": "assets/sendmail_template.txt",
    "chars": 1104,
    "preview": "To: $email\nSubject: $subject\nMime-Version: 1.0\nContent-Type: multipart/related;boundary=\"nfcoremimeboundary\"\n\n--nfcoremi"
  },
  {
    "path": "assets/where_are_my_files.txt",
    "chars": 1147,
    "preview": "=====================\n Where are my files?\n=====================\n\nBy default, the nfcore/eager pipeline does not save la"
  },
  {
    "path": "bin/endorS.py",
    "chars": 5404,
    "preview": "#!/usr/bin/env python3\n\n# Written by Aida Andrades Valtueña and released under MIT license. \n# See git repository (https"
  },
  {
    "path": "bin/extract_map_reads.py",
    "chars": 6043,
    "preview": "#!/usr/bin/env python3\n\n# Written by Maxime Borry and released under the MIT license.\n# See git repository (https://gith"
  },
  {
    "path": "bin/filter_bam_fragment_length.py",
    "chars": 2240,
    "preview": "#!/usr/bin/env python3\n\n# Written by Maxime Borry and released under the MIT license. \n# See git repository (https://git"
  },
  {
    "path": "bin/kraken_parse.py",
    "chars": 3076,
    "preview": "#!/usr/bin/env python\n\n# Written by Maxime Borry and released under the MIT license. \n# See git repository (https://gith"
  },
  {
    "path": "bin/markdown_to_html.py",
    "chars": 2729,
    "preview": "#!/usr/bin/env python\nfrom __future__ import print_function\nimport argparse\nimport markdown\nimport os\nimport sys\nimport "
  },
  {
    "path": "bin/merge_kraken_res.py",
    "chars": 1947,
    "preview": "#!/usr/bin/env python\n\n# Written by Maxime Borry and released under the MIT license. \n# See git repository (https://gith"
  },
  {
    "path": "bin/parse_snp_cov.py",
    "chars": 860,
    "preview": "#!/usr/bin/env python3\n\n# Written by Thiseas C. Lamnidis and released under the MIT license. \n# See git repository (http"
  },
  {
    "path": "bin/print_x_contamination.py",
    "chars": 3966,
    "preview": "#!/usr/bin/env python3\n\n# Written by Thiseas C. Lamnidis and released under the MIT license. \n# See git repository (http"
  },
  {
    "path": "bin/scrape_software_versions.py",
    "chars": 5536,
    "preview": "#!/usr/bin/env python\nfrom __future__ import print_function\nfrom collections import OrderedDict\nimport re\n\nregexes = {\n "
  },
  {
    "path": "conf/base.config",
    "chars": 3846,
    "preview": "/*\n * -------------------------------------------------\n *  nf-core/eager Nextflow base config file\n * -----------------"
  },
  {
    "path": "conf/benchmarking_human.config",
    "chars": 2298,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/benchmarking_vikingfish.config",
    "chars": 1821,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/igenomes.config",
    "chars": 31553,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for iGenomes paths\n * -----------------"
  },
  {
    "path": "conf/test.config",
    "chars": 911,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/test_direct.config",
    "chars": 938,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/test_full.config",
    "chars": 2014,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running full-size tests\n * --------"
  },
  {
    "path": "conf/test_resources.config",
    "chars": 1750,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/test_stresstest_human.config",
    "chars": 2285,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/test_tsv_bam.config",
    "chars": 900,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/test_tsv_complex.config",
    "chars": 931,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/test_tsv_fna.config",
    "chars": 901,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/test_tsv_humanbam.config",
    "chars": 1366,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/test_tsv_kraken.config",
    "chars": 1109,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "conf/test_tsv_pretrim.config",
    "chars": 911,
    "preview": "/*\n * -------------------------------------------------\n *  Nextflow config file for running tests\n * ------------------"
  },
  {
    "path": "docs/README.md",
    "chars": 1055,
    "preview": "# nf-core/eager: Documentation\n\nThe nf-core/eager documentation is split into the following pages:\n\n* [Usage](usage.md)\n"
  },
  {
    "path": "docs/images/README.md",
    "chars": 360,
    "preview": "# Documentation Images Information\n\nThe font used for all documentation images is Kalam by Indian Type Foundry and is re"
  },
  {
    "path": "docs/images/usage/nfcore-eager_tsv_template.tsv",
    "chars": 99,
    "preview": "Sample_Name\tLibrary_ID\tLane\tColour_Chemistry\tSeqType\tOrganism\tStrandedness\tUDG_Treatment\tR1\tR2\tBAM\n"
  },
  {
    "path": "docs/output.md",
    "chars": 77041,
    "preview": "# nf-core/eager: Output\n\n## Introduction\n\nThe output of nf-core/eager primarily consists of the following main component"
  },
  {
    "path": "docs/usage.md",
    "chars": 132765,
    "preview": "# nf-core/eager: Usage\n\n## :warning: Please read this documentation on the nf-core website: [https://nf-co.re/eager/usag"
  },
  {
    "path": "environment.yml",
    "chars": 1700,
    "preview": "# You can use this file to create a conda environment for this pipeline:\n#   conda env create -f environment.yml\nname: n"
  },
  {
    "path": "lib/Checks.groovy",
    "chars": 3986,
    "preview": "import org.yaml.snakeyaml.Yaml\n\n/*\n * This file holds several functions used to perform standard checks for the nf-core "
  },
  {
    "path": "lib/Completion.groovy",
    "chars": 6505,
    "preview": "/*\n * Functions to be run on completion of pipeline\n */\n\nclass Completion {\n    static void email(workflow, params, summ"
  },
  {
    "path": "lib/Headers.groovy",
    "chars": 2139,
    "preview": "/*\n * This file holds several functions used to render the nf-core ANSI header.\n */\n\nclass Headers {\n\n    private static"
  },
  {
    "path": "lib/NfcoreSchema.groovy",
    "chars": 24305,
    "preview": "/*\n * This file holds several functions used to perform JSON parameter validation, help and summary rendering for the nf"
  },
  {
    "path": "main.nf",
    "chars": 156722,
    "preview": "#!/usr/bin/env nextflow\n/*\n---------------------------------------------------------------------------------------------"
  },
  {
    "path": "nextflow.config",
    "chars": 13588,
    "preview": "/*\n * -------------------------------------------------\n *  nf-core/eager Nextflow config file\n * ----------------------"
  },
  {
    "path": "nextflow_schema.json",
    "chars": 149477,
    "preview": "{\n    \"$schema\": \"http://json-schema.org/draft-07/schema\",\n    \"$id\": \"https://raw.githubusercontent.com/nf-core/eager/m"
  }
]

// ... and 1 more files (download for full content)
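
Each record in the preview above carries `path`, `chars` and `preview` keys, so the array can be ranked by size with the standard `json` module. The sketch below assumes only the bracketed array is passed in, because the trailing `// ...` comment line is not valid JSON. The function name `largest_files` is illustrative, and the `preview` strings in the sample are shortened.

```python
import json


def largest_files(preview_json, n=3):
    """Return the n largest (path, chars) pairs from the preview array."""
    entries = json.loads(preview_json)
    ranked = sorted(entries, key=lambda entry: entry["chars"], reverse=True)
    return [(entry["path"], entry["chars"]) for entry in ranked[:n]]


# Three records taken from the preview above, with shortened snippets.
sample = json.dumps([
    {"path": ".gitattributes", "chars": 36, "preview": "*.config linguist-language=nextflow\n"},
    {"path": "main.nf", "chars": 156722, "preview": "#!/usr/bin/env nextflow\n"},
    {"path": "nextflow_schema.json", "chars": 149477, "preview": "{\n"},
])
print(largest_files(sample, n=2))
```

Sorting on `chars` gives a quick view of where most of the repository's text lives, which in this preview is `main.nf` and `nextflow_schema.json`.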

