[
  {
    "path": ".gitignore",
    "content": "# ignore ide folders\n.idea/\n.vscode/\n\n# ignore python generated files\n*.pyc\n"
  },
  {
    "path": ".gitmodules",
    "content": ""
  },
  {
    "path": ".travis.yml",
    "content": "language: python\n\npython:\n  - \"3.6\"\n\ninstall:\n  - pip install pytest\n  - pip install coverage\n  - pip install pytest-cov\n  - python setup.py install\n\nscript:\n  - py.test --cov=./\n\nafter_success:\n  - bash <(curl -s https://codecov.io/bash)"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "content": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nIn the interest of fostering an open and welcoming environment, we as\ncontributors and maintainers pledge to making participation in our project and\nour community a harassment-free experience for everyone, regardless of age, body\nsize, disability, ethnicity, sex characteristics, gender identity and expression,\nlevel of experience, education, socio-economic status, nationality, personal\nappearance, race, religion, or sexual identity and orientation.\n\n## Our Standards\n\nExamples of behavior that contributes to creating a positive environment\ninclude:\n\n* Using welcoming and inclusive language\n* Being respectful of differing viewpoints and experiences\n* Gracefully accepting constructive criticism\n* Focusing on what is best for the community\n* Showing empathy towards other community members\n\nExamples of unacceptable behavior by participants include:\n\n* The use of sexualized language or imagery and unwelcome sexual attention or\n advances\n* Trolling, insulting/derogatory comments, and personal or political attacks\n* Public or private harassment\n* Publishing others' private information, such as a physical or electronic\n address, without explicit permission\n* Other conduct which could reasonably be considered inappropriate in a\n professional setting\n\n## Our Responsibilities\n\nProject maintainers are responsible for clarifying the standards of acceptable\nbehavior and are expected to take appropriate and fair corrective action in\nresponse to any instances of unacceptable behavior.\n\nProject maintainers have the right and responsibility to remove, edit, or\nreject comments, commits, code, wiki edits, issues, and other contributions\nthat are not aligned to this Code of Conduct, or to ban temporarily or\npermanently any contributor for other behaviors that they deem inappropriate,\nthreatening, offensive, or harmful.\n\n## Scope\n\nThis Code of Conduct applies both within project spaces and in public spaces\nwhen an individual is representing the project or its community. Examples of\nrepresenting a project or community include using an official project e-mail\naddress, posting via an official social media account, or acting as an appointed\nrepresentative at an online or offline event. Representation of a project may be\nfurther defined and clarified by project maintainers.\n\n## Enforcement\n\nInstances of abusive, harassing, or otherwise unacceptable behavior may be\nreported by contacting the project team at odiogosilva@gmail.com. All\ncomplaints will be reviewed and investigated and will result in a response that\nis deemed necessary and appropriate to the circumstances. The project team is\nobligated to maintain confidentiality with regard to the reporter of an incident.\nFurther details of specific enforcement policies may be posted separately.\n\nProject maintainers who do not follow or enforce the Code of Conduct in good\nfaith may face temporary or permanent repercussions as determined by other\nmembers of the project's leadership.\n\n## Attribution\n\nThis Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,\navailable at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html\n\n[homepage]: https://www.contributor-covenant.org\n\nFor answers to common questions about this code of conduct, see\nhttps://www.contributor-covenant.org/faq\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "# Contributing to Assemblerflow\n\nThank you for your interest in contributing to Assemblerflow. All kinds of \ncontributions are welcome :tada:!\n\n## Issues\n\nFeel free to [submit issues](https://github.com/assemblerflow/assemblerflow/issues)\nand enhancement requests.\n\n## Git branch convention\n\nContributions with new code (not documentation), should follow this standard procedure:\n\n    <new_branch> >> dev >> master\n\n1. Create a new branch for the new feature/bug fix.\n2. One the new code is finished and **passes all automated tests**, it will be \nmerged into the `dev` branch. This branch is where all the new code lives and \nserves as an incubator stage while field tests are performed to ensure that everything\nis working correctly.\n3. Merging the `dev` code into `master` is associated with a new release. Therefore, \nthe `master` branch is basically the same of the latest official release in PyPI. \n\n## Contributing\n\nIn general, we follow the \"fork-and-pull\" Git workflow.\n\n 1. **Fork** the repo on GitHub\n 2. **Clone** the project to your own machine\n 3. **Commit** changes to your own branch\n 4. **Push** your work back up to your fork\n 5. Submit a **Pull request** so that we can review your changes. Pull requests will be merged first into the `dev` branch to perform some field tests before being merged into `master` \n\nNOTE: Be sure to merge the latest from \"upstream\" before making a pull request!\n  "
  },
  {
    "path": "LICENSE",
    "content": "                    GNU GENERAL PUBLIC LICENSE\n                       Version 3, 29 June 2007\n\n Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>\n Everyone is permitted to copy and distribute verbatim copies\n of this license document, but changing it is not allowed.\n\n                            Preamble\n\n  The GNU General Public License is a free, copyleft license for\nsoftware and other kinds of works.\n\n  The licenses for most software and other practical works are designed\nto take away your freedom to share and change the works.  By contrast,\nthe GNU General Public License is intended to guarantee your freedom to\nshare and change all versions of a program--to make sure it remains free\nsoftware for all its users.  We, the Free Software Foundation, use the\nGNU General Public License for most of our software; it applies also to\nany other work released this way by its authors.  You can apply it to\nyour programs, too.\n\n  When we speak of free software, we are referring to freedom, not\nprice.  Our General Public Licenses are designed to make sure that you\nhave the freedom to distribute copies of free software (and charge for\nthem if you wish), that you receive source code or can get it if you\nwant it, that you can change the software or use pieces of it in new\nfree programs, and that you know you can do these things.\n\n  To protect your rights, we need to prevent others from denying you\nthese rights or asking you to surrender the rights.  Therefore, you have\ncertain responsibilities if you distribute copies of the software, or if\nyou modify it: responsibilities to respect the freedom of others.\n\n  For example, if you distribute copies of such a program, whether\ngratis or for a fee, you must pass on to the recipients the same\nfreedoms that you received.  You must make sure that they, too, receive\nor can get the source code.  And you must show them these terms so they\nknow their rights.\n\n  Developers that use the GNU GPL protect your rights with two steps:\n(1) assert copyright on the software, and (2) offer you this License\ngiving you legal permission to copy, distribute and/or modify it.\n\n  For the developers' and authors' protection, the GPL clearly explains\nthat there is no warranty for this free software.  For both users' and\nauthors' sake, the GPL requires that modified versions be marked as\nchanged, so that their problems will not be attributed erroneously to\nauthors of previous versions.\n\n  Some devices are designed to deny users access to install or run\nmodified versions of the software inside them, although the manufacturer\ncan do so.  This is fundamentally incompatible with the aim of\nprotecting users' freedom to change the software.  The systematic\npattern of such abuse occurs in the area of products for individuals to\nuse, which is precisely where it is most unacceptable.  Therefore, we\nhave designed this version of the GPL to prohibit the practice for those\nproducts.  If such problems arise substantially in other domains, we\nstand ready to extend this provision to those domains in future versions\nof the GPL, as needed to protect the freedom of users.\n\n  Finally, every program is threatened constantly by software patents.\nStates should not allow patents to restrict development and use of\nsoftware on general-purpose computers, but in those that do, we wish to\navoid the special danger that patents applied to a free program could\nmake it effectively proprietary.  
To prevent this, the GPL assures that\npatents cannot be used to render the program non-free.\n\n  The precise terms and conditions for copying, distribution and\nmodification follow.\n\n                       TERMS AND CONDITIONS\n\n  0. Definitions.\n\n  \"This License\" refers to version 3 of the GNU General Public License.\n\n  \"Copyright\" also means copyright-like laws that apply to other kinds of\nworks, such as semiconductor masks.\n\n  \"The Program\" refers to any copyrightable work licensed under this\nLicense.  Each licensee is addressed as \"you\".  \"Licensees\" and\n\"recipients\" may be individuals or organizations.\n\n  To \"modify\" a work means to copy from or adapt all or part of the work\nin a fashion requiring copyright permission, other than the making of an\nexact copy.  The resulting work is called a \"modified version\" of the\nearlier work or a work \"based on\" the earlier work.\n\n  A \"covered work\" means either the unmodified Program or a work based\non the Program.\n\n  To \"propagate\" a work means to do anything with it that, without\npermission, would make you directly or secondarily liable for\ninfringement under applicable copyright law, except executing it on a\ncomputer or modifying a private copy.  Propagation includes copying,\ndistribution (with or without modification), making available to the\npublic, and in some countries other activities as well.\n\n  To \"convey\" a work means any kind of propagation that enables other\nparties to make or receive copies.  Mere interaction with a user through\na computer network, with no transfer of a copy, is not conveying.\n\n  An interactive user interface displays \"Appropriate Legal Notices\"\nto the extent that it includes a convenient and prominently visible\nfeature that (1) displays an appropriate copyright notice, and (2)\ntells the user that there is no warranty for the work (except to the\nextent that warranties are provided), that licensees may convey the\nwork under this License, and how to view a copy of this License.  If\nthe interface presents a list of user commands or options, such as a\nmenu, a prominent item in the list meets this criterion.\n\n  1. Source Code.\n\n  The \"source code\" for a work means the preferred form of the work\nfor making modifications to it.  \"Object code\" means any non-source\nform of a work.\n\n  A \"Standard Interface\" means an interface that either is an official\nstandard defined by a recognized standards body, or, in the case of\ninterfaces specified for a particular programming language, one that\nis widely used among developers working in that language.\n\n  The \"System Libraries\" of an executable work include anything, other\nthan the work as a whole, that (a) is included in the normal form of\npackaging a Major Component, but which is not part of that Major\nComponent, and (b) serves only to enable use of the work with that\nMajor Component, or to implement a Standard Interface for which an\nimplementation is available to the public in source code form.  
A\n\"Major Component\", in this context, means a major essential component\n(kernel, window system, and so on) of the specific operating system\n(if any) on which the executable work runs, or a compiler used to\nproduce the work, or an object code interpreter used to run it.\n\n  The \"Corresponding Source\" for a work in object code form means all\nthe source code needed to generate, install, and (for an executable\nwork) run the object code and to modify the work, including scripts to\ncontrol those activities.  However, it does not include the work's\nSystem Libraries, or general-purpose tools or generally available free\nprograms which are used unmodified in performing those activities but\nwhich are not part of the work.  For example, Corresponding Source\nincludes interface definition files associated with source files for\nthe work, and the source code for shared libraries and dynamically\nlinked subprograms that the work is specifically designed to require,\nsuch as by intimate data communication or control flow between those\nsubprograms and other parts of the work.\n\n  The Corresponding Source need not include anything that users\ncan regenerate automatically from other parts of the Corresponding\nSource.\n\n  The Corresponding Source for a work in source code form is that\nsame work.\n\n  2. Basic Permissions.\n\n  All rights granted under this License are granted for the term of\ncopyright on the Program, and are irrevocable provided the stated\nconditions are met.  This License explicitly affirms your unlimited\npermission to run the unmodified Program.  The output from running a\ncovered work is covered by this License only if the output, given its\ncontent, constitutes a covered work.  This License acknowledges your\nrights of fair use or other equivalent, as provided by copyright law.\n\n  You may make, run and propagate covered works that you do not\nconvey, without conditions so long as your license otherwise remains\nin force.  You may convey covered works to others for the sole purpose\nof having them make modifications exclusively for you, or provide you\nwith facilities for running those works, provided that you comply with\nthe terms of this License in conveying all material for which you do\nnot control copyright.  Those thus making or running the covered works\nfor you must do so exclusively on your behalf, under your direction\nand control, on terms that prohibit them from making any copies of\nyour copyrighted material outside their relationship with you.\n\n  Conveying under any other circumstances is permitted solely under\nthe conditions stated below.  Sublicensing is not allowed; section 10\nmakes it unnecessary.\n\n  3. Protecting Users' Legal Rights From Anti-Circumvention Law.\n\n  No covered work shall be deemed part of an effective technological\nmeasure under any applicable law fulfilling obligations under article\n11 of the WIPO copyright treaty adopted on 20 December 1996, or\nsimilar laws prohibiting or restricting circumvention of such\nmeasures.\n\n  When you convey a covered work, you waive any legal power to forbid\ncircumvention of technological measures to the extent such circumvention\nis effected by exercising rights under this License with respect to\nthe covered work, and you disclaim any intention to limit operation or\nmodification of the work as a means of enforcing, against the work's\nusers, your or third parties' legal rights to forbid circumvention of\ntechnological measures.\n\n  4. 
Conveying Verbatim Copies.\n\n  You may convey verbatim copies of the Program's source code as you\nreceive it, in any medium, provided that you conspicuously and\nappropriately publish on each copy an appropriate copyright notice;\nkeep intact all notices stating that this License and any\nnon-permissive terms added in accord with section 7 apply to the code;\nkeep intact all notices of the absence of any warranty; and give all\nrecipients a copy of this License along with the Program.\n\n  You may charge any price or no price for each copy that you convey,\nand you may offer support or warranty protection for a fee.\n\n  5. Conveying Modified Source Versions.\n\n  You may convey a work based on the Program, or the modifications to\nproduce it from the Program, in the form of source code under the\nterms of section 4, provided that you also meet all of these conditions:\n\n    a) The work must carry prominent notices stating that you modified\n    it, and giving a relevant date.\n\n    b) The work must carry prominent notices stating that it is\n    released under this License and any conditions added under section\n    7.  This requirement modifies the requirement in section 4 to\n    \"keep intact all notices\".\n\n    c) You must license the entire work, as a whole, under this\n    License to anyone who comes into possession of a copy.  This\n    License will therefore apply, along with any applicable section 7\n    additional terms, to the whole of the work, and all its parts,\n    regardless of how they are packaged.  This License gives no\n    permission to license the work in any other way, but it does not\n    invalidate such permission if you have separately received it.\n\n    d) If the work has interactive user interfaces, each must display\n    Appropriate Legal Notices; however, if the Program has interactive\n    interfaces that do not display Appropriate Legal Notices, your\n    work need not make them do so.\n\n  A compilation of a covered work with other separate and independent\nworks, which are not by their nature extensions of the covered work,\nand which are not combined with it such as to form a larger program,\nin or on a volume of a storage or distribution medium, is called an\n\"aggregate\" if the compilation and its resulting copyright are not\nused to limit the access or legal rights of the compilation's users\nbeyond what the individual works permit.  Inclusion of a covered work\nin an aggregate does not cause this License to apply to the other\nparts of the aggregate.\n\n  6. 
Conveying Non-Source Forms.\n\n  You may convey a covered work in object code form under the terms\nof sections 4 and 5, provided that you also convey the\nmachine-readable Corresponding Source under the terms of this License,\nin one of these ways:\n\n    a) Convey the object code in, or embodied in, a physical product\n    (including a physical distribution medium), accompanied by the\n    Corresponding Source fixed on a durable physical medium\n    customarily used for software interchange.\n\n    b) Convey the object code in, or embodied in, a physical product\n    (including a physical distribution medium), accompanied by a\n    written offer, valid for at least three years and valid for as\n    long as you offer spare parts or customer support for that product\n    model, to give anyone who possesses the object code either (1) a\n    copy of the Corresponding Source for all the software in the\n    product that is covered by this License, on a durable physical\n    medium customarily used for software interchange, for a price no\n    more than your reasonable cost of physically performing this\n    conveying of source, or (2) access to copy the\n    Corresponding Source from a network server at no charge.\n\n    c) Convey individual copies of the object code with a copy of the\n    written offer to provide the Corresponding Source.  This\n    alternative is allowed only occasionally and noncommercially, and\n    only if you received the object code with such an offer, in accord\n    with subsection 6b.\n\n    d) Convey the object code by offering access from a designated\n    place (gratis or for a charge), and offer equivalent access to the\n    Corresponding Source in the same way through the same place at no\n    further charge.  You need not require recipients to copy the\n    Corresponding Source along with the object code.  If the place to\n    copy the object code is a network server, the Corresponding Source\n    may be on a different server (operated by you or a third party)\n    that supports equivalent copying facilities, provided you maintain\n    clear directions next to the object code saying where to find the\n    Corresponding Source.  Regardless of what server hosts the\n    Corresponding Source, you remain obligated to ensure that it is\n    available for as long as needed to satisfy these requirements.\n\n    e) Convey the object code using peer-to-peer transmission, provided\n    you inform other peers where the object code and Corresponding\n    Source of the work are being offered to the general public at no\n    charge under subsection 6d.\n\n  A separable portion of the object code, whose source code is excluded\nfrom the Corresponding Source as a System Library, need not be\nincluded in conveying the object code work.\n\n  A \"User Product\" is either (1) a \"consumer product\", which means any\ntangible personal property which is normally used for personal, family,\nor household purposes, or (2) anything designed or sold for incorporation\ninto a dwelling.  In determining whether a product is a consumer product,\ndoubtful cases shall be resolved in favor of coverage.  For a particular\nproduct received by a particular user, \"normally used\" refers to a\ntypical or common use of that class of product, regardless of the status\nof the particular user or of the way in which the particular user\nactually uses, or expects or is expected to use, the product.  
A product\nis a consumer product regardless of whether the product has substantial\ncommercial, industrial or non-consumer uses, unless such uses represent\nthe only significant mode of use of the product.\n\n  \"Installation Information\" for a User Product means any methods,\nprocedures, authorization keys, or other information required to install\nand execute modified versions of a covered work in that User Product from\na modified version of its Corresponding Source.  The information must\nsuffice to ensure that the continued functioning of the modified object\ncode is in no case prevented or interfered with solely because\nmodification has been made.\n\n  If you convey an object code work under this section in, or with, or\nspecifically for use in, a User Product, and the conveying occurs as\npart of a transaction in which the right of possession and use of the\nUser Product is transferred to the recipient in perpetuity or for a\nfixed term (regardless of how the transaction is characterized), the\nCorresponding Source conveyed under this section must be accompanied\nby the Installation Information.  But this requirement does not apply\nif neither you nor any third party retains the ability to install\nmodified object code on the User Product (for example, the work has\nbeen installed in ROM).\n\n  The requirement to provide Installation Information does not include a\nrequirement to continue to provide support service, warranty, or updates\nfor a work that has been modified or installed by the recipient, or for\nthe User Product in which it has been modified or installed.  Access to a\nnetwork may be denied when the modification itself materially and\nadversely affects the operation of the network or violates the rules and\nprotocols for communication across the network.\n\n  Corresponding Source conveyed, and Installation Information provided,\nin accord with this section must be in a format that is publicly\ndocumented (and with an implementation available to the public in\nsource code form), and must require no special password or key for\nunpacking, reading or copying.\n\n  7. Additional Terms.\n\n  \"Additional permissions\" are terms that supplement the terms of this\nLicense by making exceptions from one or more of its conditions.\nAdditional permissions that are applicable to the entire Program shall\nbe treated as though they were included in this License, to the extent\nthat they are valid under applicable law.  If additional permissions\napply only to part of the Program, that part may be used separately\nunder those permissions, but the entire Program remains governed by\nthis License without regard to the additional permissions.\n\n  When you convey a copy of a covered work, you may at your option\nremove any additional permissions from that copy, or from any part of\nit.  (Additional permissions may be written to require their own\nremoval in certain cases when you modify the work.)  
You may place\nadditional permissions on material, added by you to a covered work,\nfor which you have or can give appropriate copyright permission.\n\n  Notwithstanding any other provision of this License, for material you\nadd to a covered work, you may (if authorized by the copyright holders of\nthat material) supplement the terms of this License with terms:\n\n    a) Disclaiming warranty or limiting liability differently from the\n    terms of sections 15 and 16 of this License; or\n\n    b) Requiring preservation of specified reasonable legal notices or\n    author attributions in that material or in the Appropriate Legal\n    Notices displayed by works containing it; or\n\n    c) Prohibiting misrepresentation of the origin of that material, or\n    requiring that modified versions of such material be marked in\n    reasonable ways as different from the original version; or\n\n    d) Limiting the use for publicity purposes of names of licensors or\n    authors of the material; or\n\n    e) Declining to grant rights under trademark law for use of some\n    trade names, trademarks, or service marks; or\n\n    f) Requiring indemnification of licensors and authors of that\n    material by anyone who conveys the material (or modified versions of\n    it) with contractual assumptions of liability to the recipient, for\n    any liability that these contractual assumptions directly impose on\n    those licensors and authors.\n\n  All other non-permissive additional terms are considered \"further\nrestrictions\" within the meaning of section 10.  If the Program as you\nreceived it, or any part of it, contains a notice stating that it is\ngoverned by this License along with a term that is a further\nrestriction, you may remove that term.  If a license document contains\na further restriction but permits relicensing or conveying under this\nLicense, you may add to a covered work material governed by the terms\nof that license document, provided that the further restriction does\nnot survive such relicensing or conveying.\n\n  If you add terms to a covered work in accord with this section, you\nmust place, in the relevant source files, a statement of the\nadditional terms that apply to those files, or a notice indicating\nwhere to find the applicable terms.\n\n  Additional terms, permissive or non-permissive, may be stated in the\nform of a separately written license, or stated as exceptions;\nthe above requirements apply either way.\n\n  8. Termination.\n\n  You may not propagate or modify a covered work except as expressly\nprovided under this License.  
Any attempt otherwise to propagate or\nmodify it is void, and will automatically terminate your rights under\nthis License (including any patent licenses granted under the third\nparagraph of section 11).\n\n  However, if you cease all violation of this License, then your\nlicense from a particular copyright holder is reinstated (a)\nprovisionally, unless and until the copyright holder explicitly and\nfinally terminates your license, and (b) permanently, if the copyright\nholder fails to notify you of the violation by some reasonable means\nprior to 60 days after the cessation.\n\n  Moreover, your license from a particular copyright holder is\nreinstated permanently if the copyright holder notifies you of the\nviolation by some reasonable means, this is the first time you have\nreceived notice of violation of this License (for any work) from that\ncopyright holder, and you cure the violation prior to 30 days after\nyour receipt of the notice.\n\n  Termination of your rights under this section does not terminate the\nlicenses of parties who have received copies or rights from you under\nthis License.  If your rights have been terminated and not permanently\nreinstated, you do not qualify to receive new licenses for the same\nmaterial under section 10.\n\n  9. Acceptance Not Required for Having Copies.\n\n  You are not required to accept this License in order to receive or\nrun a copy of the Program.  Ancillary propagation of a covered work\noccurring solely as a consequence of using peer-to-peer transmission\nto receive a copy likewise does not require acceptance.  However,\nnothing other than this License grants you permission to propagate or\nmodify any covered work.  These actions infringe copyright if you do\nnot accept this License.  Therefore, by modifying or propagating a\ncovered work, you indicate your acceptance of this License to do so.\n\n  10. Automatic Licensing of Downstream Recipients.\n\n  Each time you convey a covered work, the recipient automatically\nreceives a license from the original licensors, to run, modify and\npropagate that work, subject to this License.  You are not responsible\nfor enforcing compliance by third parties with this License.\n\n  An \"entity transaction\" is a transaction transferring control of an\norganization, or substantially all assets of one, or subdividing an\norganization, or merging organizations.  If propagation of a covered\nwork results from an entity transaction, each party to that\ntransaction who receives a copy of the work also receives whatever\nlicenses to the work the party's predecessor in interest had or could\ngive under the previous paragraph, plus a right to possession of the\nCorresponding Source of the work from the predecessor in interest, if\nthe predecessor has it or can get it with reasonable efforts.\n\n  You may not impose any further restrictions on the exercise of the\nrights granted or affirmed under this License.  For example, you may\nnot impose a license fee, royalty, or other charge for exercise of\nrights granted under this License, and you may not initiate litigation\n(including a cross-claim or counterclaim in a lawsuit) alleging that\nany patent claim is infringed by making, using, selling, offering for\nsale, or importing the Program or any portion of it.\n\n  11. Patents.\n\n  A \"contributor\" is a copyright holder who authorizes use under this\nLicense of the Program or a work on which the Program is based.  
The\nwork thus licensed is called the contributor's \"contributor version\".\n\n  A contributor's \"essential patent claims\" are all patent claims\nowned or controlled by the contributor, whether already acquired or\nhereafter acquired, that would be infringed by some manner, permitted\nby this License, of making, using, or selling its contributor version,\nbut do not include claims that would be infringed only as a\nconsequence of further modification of the contributor version.  For\npurposes of this definition, \"control\" includes the right to grant\npatent sublicenses in a manner consistent with the requirements of\nthis License.\n\n  Each contributor grants you a non-exclusive, worldwide, royalty-free\npatent license under the contributor's essential patent claims, to\nmake, use, sell, offer for sale, import and otherwise run, modify and\npropagate the contents of its contributor version.\n\n  In the following three paragraphs, a \"patent license\" is any express\nagreement or commitment, however denominated, not to enforce a patent\n(such as an express permission to practice a patent or covenant not to\nsue for patent infringement).  To \"grant\" such a patent license to a\nparty means to make such an agreement or commitment not to enforce a\npatent against the party.\n\n  If you convey a covered work, knowingly relying on a patent license,\nand the Corresponding Source of the work is not available for anyone\nto copy, free of charge and under the terms of this License, through a\npublicly available network server or other readily accessible means,\nthen you must either (1) cause the Corresponding Source to be so\navailable, or (2) arrange to deprive yourself of the benefit of the\npatent license for this particular work, or (3) arrange, in a manner\nconsistent with the requirements of this License, to extend the patent\nlicense to downstream recipients.  \"Knowingly relying\" means you have\nactual knowledge that, but for the patent license, your conveying the\ncovered work in a country, or your recipient's use of the covered work\nin a country, would infringe one or more identifiable patents in that\ncountry that you have reason to believe are valid.\n\n  If, pursuant to or in connection with a single transaction or\narrangement, you convey, or propagate by procuring conveyance of, a\ncovered work, and grant a patent license to some of the parties\nreceiving the covered work authorizing them to use, propagate, modify\nor convey a specific copy of the covered work, then the patent license\nyou grant is automatically extended to all recipients of the covered\nwork and works based on it.\n\n  A patent license is \"discriminatory\" if it does not include within\nthe scope of its coverage, prohibits the exercise of, or is\nconditioned on the non-exercise of one or more of the rights that are\nspecifically granted under this License.  
You may not convey a covered\nwork if you are a party to an arrangement with a third party that is\nin the business of distributing software, under which you make payment\nto the third party based on the extent of your activity of conveying\nthe work, and under which the third party grants, to any of the\nparties who would receive the covered work from you, a discriminatory\npatent license (a) in connection with copies of the covered work\nconveyed by you (or copies made from those copies), or (b) primarily\nfor and in connection with specific products or compilations that\ncontain the covered work, unless you entered into that arrangement,\nor that patent license was granted, prior to 28 March 2007.\n\n  Nothing in this License shall be construed as excluding or limiting\nany implied license or other defenses to infringement that may\notherwise be available to you under applicable patent law.\n\n  12. No Surrender of Others' Freedom.\n\n  If conditions are imposed on you (whether by court order, agreement or\notherwise) that contradict the conditions of this License, they do not\nexcuse you from the conditions of this License.  If you cannot convey a\ncovered work so as to satisfy simultaneously your obligations under this\nLicense and any other pertinent obligations, then as a consequence you may\nnot convey it at all.  For example, if you agree to terms that obligate you\nto collect a royalty for further conveying from those to whom you convey\nthe Program, the only way you could satisfy both those terms and this\nLicense would be to refrain entirely from conveying the Program.\n\n  13. Use with the GNU Affero General Public License.\n\n  Notwithstanding any other provision of this License, you have\npermission to link or combine any covered work with a work licensed\nunder version 3 of the GNU Affero General Public License into a single\ncombined work, and to convey the resulting work.  The terms of this\nLicense will continue to apply to the part which is the covered work,\nbut the special requirements of the GNU Affero General Public License,\nsection 13, concerning interaction through a network will apply to the\ncombination as such.\n\n  14. Revised Versions of this License.\n\n  The Free Software Foundation may publish revised and/or new versions of\nthe GNU General Public License from time to time.  Such new versions will\nbe similar in spirit to the present version, but may differ in detail to\naddress new problems or concerns.\n\n  Each version is given a distinguishing version number.  If the\nProgram specifies that a certain numbered version of the GNU General\nPublic License \"or any later version\" applies to it, you have the\noption of following the terms and conditions either of that numbered\nversion or of any later version published by the Free Software\nFoundation.  If the Program does not specify a version number of the\nGNU General Public License, you may choose any version ever published\nby the Free Software Foundation.\n\n  If the Program specifies that a proxy can decide which future\nversions of the GNU General Public License can be used, that proxy's\npublic statement of acceptance of a version permanently authorizes you\nto choose that version for the Program.\n\n  Later license versions may give you additional or different\npermissions.  However, no additional obligations are imposed on any\nauthor or copyright holder as a result of your choosing to follow a\nlater version.\n\n  15. 
Disclaimer of Warranty.\n\n  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY\nAPPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT\nHOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM \"AS IS\" WITHOUT WARRANTY\nOF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,\nTHE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR\nPURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM\nIS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF\nALL NECESSARY SERVICING, REPAIR OR CORRECTION.\n\n  16. Limitation of Liability.\n\n  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING\nWILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS\nTHE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY\nGENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE\nUSE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF\nDATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD\nPARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),\nEVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF\nSUCH DAMAGES.\n\n  17. Interpretation of Sections 15 and 16.\n\n  If the disclaimer of warranty and limitation of liability provided\nabove cannot be given local legal effect according to their terms,\nreviewing courts shall apply local law that most closely approximates\nan absolute waiver of all civil liability in connection with the\nProgram, unless a warranty or assumption of liability accompanies a\ncopy of the Program in return for a fee.\n\n                     END OF TERMS AND CONDITIONS\n\n            How to Apply These Terms to Your New Programs\n\n  If you develop a new program, and you want it to be of the greatest\npossible use to the public, the best way to achieve this is to make it\nfree software which everyone can redistribute and change under these terms.\n\n  To do so, attach the following notices to the program.  It is safest\nto attach them to the start of each source file to most effectively\nstate the exclusion of warranty; and each file should have at least\nthe \"copyright\" line and a pointer to where the full notice is found.\n\n    <one line to give the program's name and a brief idea of what it does.>\n    Copyright (C) <year>  <name of author>\n\n    This program is free software: you can redistribute it and/or modify\n    it under the terms of the GNU General Public License as published by\n    the Free Software Foundation, either version 3 of the License, or\n    (at your option) any later version.\n\n    This program is distributed in the hope that it will be useful,\n    but WITHOUT ANY WARRANTY; without even the implied warranty of\n    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n    GNU General Public License for more details.\n\n    You should have received a copy of the GNU General Public License\n    along with this program.  
If not, see <http://www.gnu.org/licenses/>.\n\nAlso add information on how to contact you by electronic and paper mail.\n\n  If the program does terminal interaction, make it output a short\nnotice like this when it starts in an interactive mode:\n\n    <program>  Copyright (C) <year>  <name of author>\n    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.\n    This is free software, and you are welcome to redistribute it\n    under certain conditions; type `show c' for details.\n\nThe hypothetical commands `show w' and `show c' should show the appropriate\nparts of the General Public License.  Of course, your program's commands\nmight be different; for a GUI interface, you would use an \"about box\".\n\n  You should also get your employer (if you work as a programmer) or school,\nif any, to sign a \"copyright disclaimer\" for the program, if necessary.\nFor more information on this, and how to apply and follow the GNU GPL, see\n<http://www.gnu.org/licenses/>.\n\n  The GNU General Public License does not permit incorporating your program\ninto proprietary programs.  If your program is a subroutine library, you\nmay consider it more useful to permit linking proprietary applications with\nthe library.  If this is what you want to do, use the GNU Lesser General\nPublic License instead of this License.  But first, please read\n<http://www.gnu.org/philosophy/why-not-lgpl.html>.\n"
  },
  {
    "path": "README.md",
    "content": "# FlowCraft :whale2::package:\n\n![Nextflow version](https://img.shields.io/badge/nextflow->0.27.0-brightgreen.svg)\n![Python version](https://img.shields.io/badge/python-3.6-brightgreen.svg)\n[![Build Status](https://travis-ci.org/assemblerflow/flowcraft.svg?branch=master)](https://travis-ci.org/assemblerflow/flowcraft)\n[![codecov](https://codecov.io/gh/assemblerflow/flowcraft/branch/master/graph/badge.svg)](https://codecov.io/gh/assemblerflow/flowcraft)\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/f518854f780b41a08ca2fb1c14e360f0)](https://www.codacy.com/app/o.diogosilva/assemblerflow?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=ODiogoSilva/assemblerflow&amp;utm_campaign=Badge_Grade)\n[![Documentation Status](https://readthedocs.org/projects/flowcraft/badge/?version=latest)](http://flowcraft.readthedocs.io/en/latest/?badge=latest)\n[![PyPI version](https://badge.fury.io/py/flowcraft.svg)](https://badge.fury.io/py/flowcraft)\n[![Anaconda-Server Badge](https://anaconda.org/bioconda/flowcraft/badges/version.svg)](https://anaconda.org/bioconda/flowcraft)\n[![Gitter](https://badges.gitter.im/flowcraft-community/community.svg)](https://gitter.im/flowcraft-community/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)\n\n<p align=\"center\">\n  <img width=\"360\" src=\"docs/resources/logo_large.png\" alt=\"nextflow_logo\"/>\n</p>\n\nA [Nextflow](https://www.nextflow.io/) pipeline assembler for genomics.\nPick your modules. Assemble them. Run the pipeline.\n\n(Previously known as Assemblerflow)\n\n## The premisse\n\n#### Build a pipeline\n\nWhat if building your own genomics pipeline would be as simple as:\n\n```\nflowcraft.py build -t \"trimmomatic fastqc skesa pilon\" -o my_pipeline.nf\n```\n\nSeems pretty simple right? What if we could run this pipeline with a single command on any linux machine or cluster by leveraging\nthe awesomeness of [nextflow](https://www.nextflow.io/) and [docker](https://www.docker.com/)/[singularity](http://singularity.lbl.gov/)\ncontainers without having to install any of the pipeline dependencies?\n\n#### Run the pipeline\n\n```\nnextflow run my_pipeline.nf --fastq path/to/fastq\n\nN E X T F L O W  ~  version 0.30.1\nLaunching `my_pipeline.nf` [admiring_lamarck] - revision: 82cc9cd2ed\n\n============================================================\n                M Y   P I P E L I N E\n============================================================\nBuilt using flowcraft v1.2.1\n\n Input FastQ                 : 2\n Input samples               : 1\n Reports are found in        : ./reports\n Results are found in        : ./results\n Profile                     : standard\n\nStarting pipeline at Tue Jun 12 19:38:26 WEST 2018\n\n[warm up] executor > local\n[7c/eb5f2f] Submitted process > integrity_coverage_1_1 (02AR0553)\n(...)\n[31/7d90a1] Submitted process > compile_pilon_report_1_6\n\nCompleted at: Tue Jun 12 19:58:32 WEST 2018\nDuration    : 20m 6s\nSuccess     : true\nExit status : 0\n```\n\nCongratulations! You just built and executed your own pipeline with\nonly two commands! 
:tada:\n\n## Installation\n\nFlowCraft is available as a bioconda package, which already comes with\nnextflow:\n\n```\nconda install flowcraft\n```\n\n#### Container engines\n\nPipelines built with FlowCraft require at least one container\nengine to be installed: `docker`, `singularity` or `shifter`.\nIf you already have any one of these installed, you're good to go.\nIf not, we recommend installing singularity, though it should be installed with\nroot privileges and be accessible on all compute nodes. \n\n## How to use it\n\nThe complete user guide for FlowCraft can be found on [readthedocs.org](http://flowcraft.readthedocs.io/en/latest/?badge=latest).\nFor a quick and dirty demonstration, see below.\n\n### Quick guide\n\n#### Building a pipeline\n\nFlowCraft comes with a number of [ready-to-use components](http://flowcraft.readthedocs.io/en/latest/user/available_components.html) to build your\nown pipeline. Following some basic rules, such as making sure the output type of one process\nmatches the input type of the next process, assembling a pipeline is done\nusing the `build` mode and the `-t` option:\n\n```\nflowcraft build -t \"trimmomatic spades abricate\" -o my_pipeline.nf -n \"assembly pipe\"\n```\n\nThis command will automatically generate everything that is necessary to run the\npipeline, and the main pipeline executable\nfile will be `my_pipeline.nf`. This file will contain a nextflow pipeline\nfor genome assembly that starts with `trimmomatic` and finishes with anti-microbial\ngene annotation using `abricate`.\n\n#### Wait... what about the software parameters?\n\nEach component in the pipeline has its own set of parameters that can be\nmodified before or when executing the pipeline. These parameters are\ndescribed in the documentation of each process and you can check the options\nof your particular pipeline using the `help` option:\n\n```\n$ nextflow run my_pipeline.nf --help\nN E X T F L O W  ~  version 0.30.1\nLaunching `my_pipeline.nf` [prickly_picasso] - revision: 2e1a226e6d\n\n============================================================\n                F L O W C R A F T\n============================================================\nBuilt using flowcraft v1.2.1\n\n\nUsage: \n    nextflow run my_pipeline.nf\n\n       --fastq                     Path expression to paired-end fastq files. (default: fastq/*_{1,2}.*) (default: 'fastq/*_{1,2}.*')\n       \n       Component 'INTEGRITY_COVERAGE_1_1'\n       ----------------------------------\n       --genomeSize_1_1            Genome size estimate for the samples in Mb. It is used to estimate the coverage and other assembly parameters andchecks (default: 1)\n       --minCoverage_1_1           Minimum coverage for a sample to proceed. By default it's setto 0 to allow any coverage (default: 0)\n       \n       Component 'TRIMMOMATIC_1_2'\n       ---------------------------\n       --adapters_1_2              Path to adapters files, if any. 
(default: 'None')\n       --trimSlidingWindow_1_2     Perform sliding window trimming, cutting once the average quality within the window falls below a threshold (default: '5:20')\n       --trimLeading_1_2           Cut bases off the start of a read, if below a threshold quality (default: 3)\n       --trimTrailing_1_2          Cut bases of the end of a read, if below a threshold quality (default: 3)\n       --trimMinLength_1_2         Drop the read if it is below a specified length  (default: 55)\n       \n       Component 'FASTQC_1_3'\n       ----------------------\n       --adapters_1_3              Path to adapters files, if any. (default: 'None')\n       \n       Component 'ASSEMBLY_MAPPING_1_5'\n       --------------------------------\n       --minAssemblyCoverage_1_5   In auto, the default minimum coverage for each assembled contig is 1/3 of the assembly mean coverage or 10x, if the mean coverage is below 10x (default: 'auto')\n       --AMaxContigs_1_5           A warning is issued if the number of contigs is overthis threshold. (default: 100)\n       --genomeSize_1_5            Genome size estimate for the samples. It is used to check the ratio of contig number per genome MB (default: 2.1)\n```\n\nThis help message is dynamically generated depending on the pipeline you build.\nSince this pipeline starts with `trimmomatic`, which receives fastq files as input,\n`--fastq` is the default parameter for providing paired-end fastq files.\n\n#### Running a pipeline\n\nNow that we have our nextflow pipeline built, we are ready to execute it by\nproviding input data. By default, FlowCraft pipelines will run locally and use\n`singularity` to run the containers of each component. This can be\nchanged in multiple ways, but for convenience FlowCraft has already defined\nprofiles for most configurations of `executors` and `container` engines.\n\nRunning a pipeline locally with `singularity` can be done with:\n\n```\n# Pattern for paired-end fastq is '<sample>_1.fastq.gz <sample>_2.fastq.gz'\nnextflow run my_pipeline --fastq \"path/to/fastq/*_{1,2}.*\"\n```\n\nIf you want to run a pipeline on a cluster with SLURM and singularity, just use\nthe appropriate profile:\n\n```\nnextflow run my_pipeline --fastq \"path/to/fastq/*_{1,2}.*\" -profile slurm_sing\n```\n\nDuring the execution of the pipeline, the results and reports for each component\nare continuously saved to the `results` and `reports` directories, respectively.\n\n#### Inspecting pipeline progress\n\nSince version 1.2.0, it is possible to inspect the progress of a nextflow pipeline\nusing the `flowcraft inspect` mode. To check the progress in a terminal, simply\ntype the following in the directory where the pipeline is running:\n\n```\nflowcraft inspect\n```\n\nAlternatively, you can view the progress\nin FlowCraft's web service by using the ``broadcast`` option:\n\n```\nflowcraft inspect -m broadcast\n```\n\n<img src=\"https://github.com/assemblerflow/flowcraft-webapp/raw/master/flowcraft-webapp/frontend/resources/fc_short_demo.gif\"/>\n\n## Why not just write a Nextflow pipeline?\n\nIn many cases, building a static nextflow pipeline is sufficient for our goals.\nHowever, when building our own pipelines, we often felt the need to add dynamism\nto this process, particularly if we take into account how fast new tools arise\nand existing ones change. Our biological goals also change over time and we\nmight need different pipelines to answer different questions. 
FlowCraft makes\nthis very easy by having a set of pre-made and ready-to-use components that can\nbe freely assembled.\n\nFor instance, changing the assembly software in a genome assembly pipeline becomes\nas easy as:\n\n```\n# Use spades\ntrimmomatic spades pilon\n# Use skesa\ntrimmomatic skesa pilon\n```\n\n![example1](https://github.com/assemblerflow/flowcraft/raw/master/docs/resources/example_3.png)\n\nIf you are interested in having some sort of genome annotation, simply add those\ncomponents at the end, using a fork syntax:\n\n```\n# Run prokka and abricate at the end of the assembly\ntrimmomatic spades pilon (prokka | abricate)\n```\n\n![example2](https://github.com/assemblerflow/flowcraft/raw/master/docs/resources/example_1.png)\n\nOn the other hand, if you are interested in just performing allele calling for wgMLST,\nsimply add `chewbbaca`:\n\n```\ntrimmomatic spades pilon chewbbaca\n```\n\n![example3](https://github.com/assemblerflow/flowcraft/raw/master/docs/resources/example_2.png)\n\nSince nextflow handles parallelism of large sets of data so well, simple pipelines\nof just two components are also worth building:\n\n```\ntrimmomatic fastqc\n```\n\nAs the number of existing components grows, so does your freedom to build pipelines.\n\n## Roadmap\n\nYou can see what we're planning next on our [roadmap guide](https://github.com/assemblerflow/flowcraft/wiki/Roadmap).\n\n## Developer guide\n\n### Adding new components\n\nIs there a missing component that you would like to see included? We would love\nto expand! You can make a component request in our\n[issue tracker](https://github.com/assemblerflow/flowcraft/issues).\n\nIf you want to be part of the team, you can contribute code as well. Each component\nin FlowCraft can be added independently without having to worry about\nthe rest of the code base. You'll just need to have some knowledge of python\nand nextflow. [Check the developer documentation for how-to guides](http://assemblerflow.readthedocs.io/en/latest/)\n"
  },
  {
    "path": "changelog.md",
    "content": "# Changelog\n\n## 1.4.2\n\n### New components\n\n- `Bwa`: align short paired-end sequencing reads to long reference sequences\n- `MarkDuplicates`: Identifies duplicate reads\n- `BaseRecalibrator`: Detects systematic errors in base quality scores\n- `Haplotypecaller`: Call germline SNPs and indels via local re-assembly of haplotypes\n\n- `Seroba`: Serotyping of *Streptococcus pneumoniae* sequencing data (FastQ)\n- `Concoct`: Clustering metagenomic assembled comtigs with coverage and composition\n- `MetaBAT2`: A robust statistical framework for reconstructing genomes from metagenomic data\n\n### Minor/Other changes\n\n- added manifest information to the `nextflow.config` file to allow for remote execution\n- Added checks for the DAG's dot files in the compile_reports component\n\n## 1.4.1\n\n### New features\n\n- Added support for the report system to:\n    - `maxbin2`\n- Added new `manifest.config` with the pipeline metadata\n\n### New components\n\n- `Kraken2`: Taxonomic identification on FastQ files\n\n### Bug fixes\n\n- Fix bug in `momps`component related to added in the introduction of the clear input parameter\n- Fixed bug with the `-ft` parameters not retrieving the dockerhub tags for \nall the components.\n- Fixed bug in the `megahit` process where the fastg mode would break the process\n- Fix inspect and report mode to fetch the nextflow file independently of its \nposition in the `nextflow run` command inside the .nextflow.log file.\n- Fix parsing of .nextflow.log file when searching for `nextflow run` command.\n- Fixed bug between mash_sketch_fasta and mash_dist.\n\n### Minor/Other changes\n\n- Added option to `dengue_typing` to retrieve closest reference sequence and link it \nwith a secondary channel into `mafft`\n- New version of DEN-IM recipe\n- Now prints an ordered list of components\n- Moved taxonomy results from `results/annotation/` to `results/taxonomy/`\n\n\n## 1.4.0\n\n### New features\n\n- Added new `recipe` system to flowcraft along with 6 starting recipes.\nRecipes are pre-made and curated pipelines that address specific questions.\nTo create a recipe, the `-r <recipe_name>` can be used. To list available\nrecipes, the `--recipe-list` and `--recipe-list-short` options were added. \n- Added `-ft` or `--fetch-tags` which allows to retrieve all DockerHub \ncontainer tags.\n- Added function to collect all the components from the components classes,\nreplacing the current process_map dictionary implementation. Now, it will be\ngenerated from the engine rather than hardcoded into the dict.\n\n### Components changes\n\n- Added new `disableRR` param in the `spades` component that disables repeat\nresolution\n- The `abyss` and `spades` components emit GFA in a secondary channel.\n- The new `bandage` component can accept either FASTA from a primary channel\n  or GFA from a secondary channel.\n- Updated skesa to version 2.3.0.\n- Updated mash based components for the latest version - 1.6.0-1.\n\n### New components\n\n- Added component `abyss`.\n- Added component `bandage`.\n- Added component `unicycler`.\n- Added component `prokka`.\n- Added component `bcalm`.\n- Added component `diamond`.\n\n### Minor/Other changes\n\n- Added removal of duplicate IDs from `reads_download` component input.\n- Added seed parameter to `downsample_fastq` component.\n- Added bacmet database to `abricate` component.\n- Added default docker option to avoid docker permission errors.\n- Changed the default URL generated by inspect and report commands. 
\n- Changed the default URL generated by inspect and report commands.\n- Added directives to the `-L` parameter of the build module.\n\n\n### Bug fixes\n\n- Fixed forks with same source process name.\n- Fixed `inspect` issue when tasks took more than a day in duration.\n- Added hardware address to `inspect` and `report` hash.\n\n## 1.3.1\n\n### Features\n\n- Added a new `clearInput` parameter to components that change their input.\nThe aim of this option is to allow the controlled removal of temporary files,\nwhich is particularly useful in very large workflows.\n\n### Components changes\n\n- Updated images for components `mash_dist`, `mash_screen` and \n`mapping_patlas`.\n\n### New components\n\n- Added component `fast_ani`.\n\n### Minor/Other changes\n\n- Added `--export-directives` option to `build` mode to export a component's \ndirectives in JSON format to standard output.\n- Added more date information in `inspect` mode, including the year and the\nlocale of the executing system.\n\n## 1.3.0\n\n### Features\n- Added `report` run mode to Flowcraft that displays the report of any given\npipeline in Flowcraft's web application. The `report` mode can be executed\nafter a pipeline has ended or during the pipeline execution using the `--watch`\noption.\n- Added standalone report HTML at the end of the pipeline execution.\n- Components with support for the new report system:\n    - `abricate`\n    - `assembly_mapping`\n    - `check_coverage`\n    - `chewbbaca`\n    - `dengue_typing`\n    - `fastqc`\n    - `fastqc_trimmomatic`\n    - `integrity_coverage`\n    - `mlst`\n    - `patho_typing`\n    - `pilon`\n    - `process_mapping`\n    - `process_newick`\n    - `process_skesa`\n    - `process_spades`\n    - `process_viral_assembly`\n    - `seq_typing`\n    - `trimmomatic`\n    - `true_coverage`\n\n### Minor/Other changes\n\n- Refactored report json for components `mash_dist`, `mash_screen` and \n`mapping_patlas`\n\n### Bug fixes\n- Fixed issue where `seq_typing` and `patho_typing` processes were not feeding\nreport data to the report compiler.\n- Fixed fail messages for `process_assembly` and `process_viral_assembly` \ncomponents\n\n## 1.2.2\n\n### Components changes\n\n- `mapping_patlas`: refactored to remove temporary files used to create\nsam and bam files and added data to .report.json. Updated databases to pATLAS\nversion 1.5.2.\n- `mash_screen` and `mash_dist`: added data to .report.json. Updated databases \nto pATLAS version 1.5.2.\n- Added new options to the `abricate` component. Users can now provide custom database\ndirectories, minimum coverage and minimum identity parameters.\n\n### New components\n\n- Added component `fasterq_dump`\n- Added component `mash_sketch_fasta`\n- Added component `mash_sketch_fastq`\n- Added component `downsample_fastq` for FastQ read subsampling using seqtk\n- Added component `momps` for typing of *Legionella pneumophila*\n- Added component `split_assembly`\n- Added component `mafft`\n- Added component `raxml`\n- Added component `viral_assembly`\n- Added component `progressive_mauve`\n- Added component `dengue_typing`\n\n### Minor/Other changes\n\n- Added check for `params.accessions` that enables reporting a proper\nerror when it is set to `null`.\n- Added `build` option to export component parameter information in JSON format. \n- Fixed minor issue preventing the `maxbin2` and `split_assembly` components \nfrom being used multiple times in a pipeline\n- Added a catch to the `filter_poly` process for cases where the input file is empty. 
\n- spades template now reports the exit code of spades' execution\n\n### Bug fixes\n\n- Removed the need for the nf process templates to have an empty line\nat the beginning of the template files.\n- Fixed issue when the `inspect` mode was executed on a pipeline directory\nwith failed processes but with the work directory removed (the log files\nwhere no longer available).\n- Fixed issue when the `inspect` mode was executed on a pipeline without the \nmemory directory defined.\n- Fixed issue in the `inspect` mode, where there is a rare race condition between\ntags in the log and trace files.\n- Fixed bug on `midas_species` process where the output file was not being \nlinked correctly, causing the process to fail\n- Fixed bug on `bowtie` where the reference parameter was missing the pid\n- Fixed bug on `filter_poly` where the tag was missing\n\n## 1.2.1\n\n### Improvements\n\n- The parameter system has been revamped, and parameters are now component-specific\nand independent by default. This allows a better fine-tuning of the parameters\nand also the execution of the same component multiple times (for instance in a fork)\nwith different parameters. The old parameter system that merged identical parameters\nis still available by using the `--merge-params` flag when building the pipeline.\n- Added a global `--clearAtCheckpoint` parameter that, when set to true, will remove\ntemporary files that are no longer necessary for downstream steps of the pipeline\nfrom the work directory. This option is currently supported for the `trimmomatic`,\n`fastqc_trimmomatic`, `skesa` and `spades` components. \n\n### New components\n\n- `maxbin2`: An automatic tool for binning metagenomic sequences.\n- `bowtie2`: Align short paired-end sequencing reads to long reference\nsequences.\n- `retrieve_mapped`: Retrieves the mapped reads of a previous bowtie2 mapping process.\n\n### New recipes\n\n- `plasmids`: A recipe to perform mapping, mash screen on reads\nand also mash dist for assembly based approaches (all to detect\nplasmids). 
This also includes annotation with abricate for the assembly.\n- `plasmids_mapping`: A recipe to perform mapping for plasmids.\n- `plasmids_mash`: A recipe to perform mash screen for plasmids.\n- `plasmids_assembly`: A recipe to perform mash dist for plasmid\nassemblies.\n\n### Minor/Other changes\n\n- Added a \"smart\" check for when the user makes a typo in the pipeline string\nfor a given process, outputting some \"educated\" guesses to the\nterminal.\n- Added \"-cr\" option to show current recipe `pipeline_string`.\n- Changed the way recipes are parsed by `proc_collector` when the\n`-l` and `-L` options are used.\n- Added check for non-ASCII characters in colored_print.\n- Fixed logging when a file with the pipeline is provided to the -t option\ninstead of a string.\n\n### Bug fixes\n\n- Fixed pipeline names that contain new line characters.\n- Fixed pipeline generation when automatic dependencies were added right after a fork.\n- **Template: sistr.nf**: Fixed comparison that determined process status.\n- Fixed issue with `--version` option.\n\n## 1.2.0\n\n### New components\n\n- `card_rgi`: Anti-microbial resistance gene screening for assemblies\n- `filter_poly`: Runs PrinSeq on paired-end FastQ files to remove low complexity sequences\n- `kraken`: Taxonomic identification on FastQ files\n- `megahit`: Metagenomic assembler for paired-end FastQ files\n- `metaprob`: Performs read binning on metagenomic FastQ files\n- `metamlst`: Checks the Sequence Type of metagenomic FastQ reads using Multilocus Sequence Typing\n- `metaspades`: Metagenomic assembler for paired-end FastQ files\n- `midas_species`: Taxonomic identification on FastQ files at the species level\n- `remove host`: Maps reads with Bowtie2 against the target host genome (default hg19) and removes the mapped reads\n- `sistr`: Salmonella *in silico* typing component for assemblies. \n\n### Features\n\n- Added `inspect` run mode to flowcraft for displaying the progress overview\n  during a nextflow run. This run mode has `overview` and `broadcast` options\n  for viewing the progress of a pipeline.\n\n### Minor/Other changes\n\n- Changed `mapping_patlas` docker container tag and variable\n(PR [#76](https://github.com/assemblerflow/assemblerflow/pull/76)).\n- The `env` scope of nextflow.config now extends the `PYTHONPATH`\nenvironmental variable.\n- Updated indexes for both `mapping_patlas` and `mash` based processes.\n- New logo!\n\n### Bug Fixes\n\n- **Template: fastqc_report.py**: Added fix to trim range evaluation.\n- **Script: merge_json.py**: Fixed chewbbaca JSON merge function.\n"
  },
  {
    "path": "docker/Dockerfile",
    "content": "FROM python:3.6-alpine3.7\nMAINTAINER Bruno Gonçalves <bfgoncalves@medicina.ulisboa.pt>\n\nRUN apk add --no-cache git\n\nWORKDIR /flowcraft\n\n# Clone FlowCraft\nRUN git clone https://github.com/assemblerflow/flowcraft.git\nWORKDIR ./flowcraft\n\n# Install flowcraft\nRUN python setup.py install\n\nWORKDIR /flowcraft\n\n# Remove unnecessary packages\nRUN apk del git"
  },
  {
    "path": "docs/Makefile",
    "content": "# Makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line.\nSPHINXOPTS   ?=\nSPHINXBUILD  ?= sphinx-build\nPAPER        ?=\nBUILDDIR      = _build\n\n# Internal variables.\nPAPEROPT_a4     = -D latex_elements.papersize=a4\nPAPEROPT_letter = -D latex_elements.papersize=letter\nALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .\n# the i18n builder cannot share the environment and doctrees with the others\nI18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .\n\n.PHONY: help\nhelp:\n\t@echo \"Please use \\`make <target>' where <target> is one of\"\n\t@echo \"  html        to make standalone HTML files\"\n\t@echo \"  dirhtml     to make HTML files named index.html in directories\"\n\t@echo \"  singlehtml  to make a single large HTML file\"\n\t@echo \"  pickle      to make pickle files\"\n\t@echo \"  json        to make JSON files\"\n\t@echo \"  htmlhelp    to make HTML files and an HTML help project\"\n\t@echo \"  qthelp      to make HTML files and a qthelp project\"\n\t@echo \"  applehelp   to make an Apple Help Book\"\n\t@echo \"  devhelp     to make HTML files and a Devhelp project\"\n\t@echo \"  epub        to make an epub\"\n\t@echo \"  epub3       to make an epub3\"\n\t@echo \"  latex       to make LaTeX files, you can set PAPER=a4 or PAPER=letter\"\n\t@echo \"  latexpdf    to make LaTeX files and run them through pdflatex\"\n\t@echo \"  latexpdfja  to make LaTeX files and run them through platex/dvipdfmx\"\n\t@echo \"  lualatexpdf to make LaTeX files and run them through lualatex\"\n\t@echo \"  xelatexpdf  to make LaTeX files and run them through xelatex\"\n\t@echo \"  text        to make text files\"\n\t@echo \"  man         to make manual pages\"\n\t@echo \"  texinfo     to make Texinfo files\"\n\t@echo \"  info        to make Texinfo files and run them through makeinfo\"\n\t@echo \"  gettext     to make PO message catalogs\"\n\t@echo \"  changes     to make an overview of all changed/added/deprecated items\"\n\t@echo \"  xml         to make Docutils-native XML files\"\n\t@echo \"  pseudoxml   to make pseudoxml-XML files for display purposes\"\n\t@echo \"  linkcheck   to check all external links for integrity\"\n\t@echo \"  doctest     to run all doctests embedded in the documentation (if enabled)\"\n\t@echo \"  coverage    to run coverage check of the documentation (if enabled)\"\n\t@echo \"  dummy       to check syntax errors of document sources\"\n\n.PHONY: clean\nclean:\n\trm -rf $(BUILDDIR)/*\n\n.PHONY: html\nhtml:\n\t$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html\n\t@echo\n\t@echo \"Build finished. The HTML pages are in $(BUILDDIR)/html.\"\n\n.PHONY: dirhtml\ndirhtml:\n\t$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml\n\t@echo\n\t@echo \"Build finished. The HTML pages are in $(BUILDDIR)/dirhtml.\"\n\n.PHONY: singlehtml\nsinglehtml:\n\t$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml\n\t@echo\n\t@echo \"Build finished. 
The HTML page is in $(BUILDDIR)/singlehtml.\"\n\n.PHONY: pickle\npickle:\n\t$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle\n\t@echo\n\t@echo \"Build finished; now you can process the pickle files.\"\n\n.PHONY: json\njson:\n\t$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json\n\t@echo\n\t@echo \"Build finished; now you can process the JSON files.\"\n\n.PHONY: htmlhelp\nhtmlhelp:\n\t$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp\n\t@echo\n\t@echo \"Build finished; now you can run HTML Help Workshop with the\" \\\n\t      \".hhp project file in $(BUILDDIR)/htmlhelp.\"\n\n.PHONY: qthelp\nqthelp:\n\t$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp\n\t@echo\n\t@echo \"Build finished; now you can run \"qcollectiongenerator\" with the\" \\\n\t      \".qhcp project file in $(BUILDDIR)/qthelp, like this:\"\n\t@echo \"# qcollectiongenerator $(BUILDDIR)/qthelp/Templates.qhcp\"\n\t@echo \"To view the help file:\"\n\t@echo \"# assistant -collectionFile $(BUILDDIR)/qthelp/Templates.qhc\"\n\n.PHONY: applehelp\napplehelp:\n\t$(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp\n\t@echo\n\t@echo \"Build finished. The help book is in $(BUILDDIR)/applehelp.\"\n\t@echo \"N.B. You won't be able to view it unless you put it in\" \\\n\t      \"~/Library/Documentation/Help or install it in your application\" \\\n\t      \"bundle.\"\n\n.PHONY: devhelp\ndevhelp:\n\t$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp\n\t@echo\n\t@echo \"Build finished.\"\n\t@echo \"To view the help file:\"\n\t@echo \"# mkdir -p $$HOME/.local/share/devhelp/Templates\"\n\t@echo \"# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Templates\"\n\t@echo \"# devhelp\"\n\n.PHONY: epub\nepub:\n\t$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub\n\t@echo\n\t@echo \"Build finished. The epub file is in $(BUILDDIR)/epub.\"\n\n.PHONY: epub3\nepub3:\n\t$(SPHINXBUILD) -b epub3 $(ALLSPHINXOPTS) $(BUILDDIR)/epub3\n\t@echo\n\t@echo \"Build finished. 
The epub3 file is in $(BUILDDIR)/epub3.\"\n\n.PHONY: latex\nlatex:\n\t$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex\n\t@echo\n\t@echo \"Build finished; the LaTeX files are in $(BUILDDIR)/latex.\"\n\t@echo \"Run \\`make' in that directory to run these through (pdf)latex\" \\\n\t      \"(use \\`make latexpdf' here to do that automatically).\"\n\n.PHONY: latexpdf\nlatexpdf:\n\t$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex\n\t@echo \"Running LaTeX files through pdflatex...\"\n\t$(MAKE) -C $(BUILDDIR)/latex all-pdf\n\t@echo \"pdflatex finished; the PDF files are in $(BUILDDIR)/latex.\"\n\n.PHONY: latexpdfja\nlatexpdfja:\n\t$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex\n\t@echo \"Running LaTeX files through platex and dvipdfmx...\"\n\t$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja\n\t@echo \"pdflatex finished; the PDF files are in $(BUILDDIR)/latex.\"\n\n.PHONY: lualatexpdf\nlualatexpdf:\n\t$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex\n\t@echo \"Running LaTeX files through lualatex...\"\n\t$(MAKE) PDFLATEX=lualatex -C $(BUILDDIR)/latex all-pdf\n\t@echo \"lualatex finished; the PDF files are in $(BUILDDIR)/latex.\"\n\n.PHONY: xelatexpdf\nxelatexpdf:\n\t$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex\n\t@echo \"Running LaTeX files through xelatex...\"\n\t$(MAKE) PDFLATEX=xelatex -C $(BUILDDIR)/latex all-pdf\n\t@echo \"xelatex finished; the PDF files are in $(BUILDDIR)/latex.\"\n\n.PHONY: text\ntext:\n\t$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text\n\t@echo\n\t@echo \"Build finished. The text files are in $(BUILDDIR)/text.\"\n\n.PHONY: man\nman:\n\t$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man\n\t@echo\n\t@echo \"Build finished. The manual pages are in $(BUILDDIR)/man.\"\n\n.PHONY: texinfo\ntexinfo:\n\t$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo\n\t@echo\n\t@echo \"Build finished. The Texinfo files are in $(BUILDDIR)/texinfo.\"\n\t@echo \"Run \\`make' in that directory to run these through makeinfo\" \\\n\t      \"(use \\`make info' here to do that automatically).\"\n\n.PHONY: info\ninfo:\n\t$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo\n\t@echo \"Running Texinfo files through makeinfo...\"\n\tmake -C $(BUILDDIR)/texinfo info\n\t@echo \"makeinfo finished; the Info files are in $(BUILDDIR)/texinfo.\"\n\n.PHONY: gettext\ngettext:\n\t$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale\n\t@echo\n\t@echo \"Build finished. The message catalogs are in $(BUILDDIR)/locale.\"\n\n.PHONY: changes\nchanges:\n\t$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes\n\t@echo\n\t@echo \"The overview file is in $(BUILDDIR)/changes.\"\n\n.PHONY: linkcheck\nlinkcheck:\n\t$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck\n\t@echo\n\t@echo \"Link check complete; look for any errors in the above output \" \\\n\t      \"or in $(BUILDDIR)/linkcheck/output.txt.\"\n\n.PHONY: doctest\ndoctest:\n\t$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest\n\t@echo \"Testing of doctests in the sources finished, look at the \" \\\n\t      \"results in $(BUILDDIR)/doctest/output.txt.\"\n\n.PHONY: coverage\ncoverage:\n\t$(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage\n\t@echo \"Testing of coverage in the sources finished, look at the \" \\\n\t      \"results in $(BUILDDIR)/coverage/python.txt.\"\n\n.PHONY: xml\nxml:\n\t$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml\n\t@echo\n\t@echo \"Build finished. 
The XML files are in $(BUILDDIR)/xml.\"\n\n.PHONY: pseudoxml\npseudoxml:\n\t$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml\n\t@echo\n\t@echo \"Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml.\"\n\n.PHONY: dummy\ndummy:\n\t$(SPHINXBUILD) -b dummy $(ALLSPHINXOPTS) $(BUILDDIR)/dummy\n\t@echo\n\t@echo \"Build finished. Dummy builder generates no files.\"\n"
  },
  {
    "path": "docs/_static/custom.css",
    "content": "div.wy-side-nav-search, div.wy-nav-top {\n  background: #5c6bc0;\n}\n\n.wy-menu > .caption > .caption-text {\n  color: #5c6bc0;\n}\n\n.wy-nav-content {\n  max-width: 100%\n}"
  },
  {
    "path": "docs/about/about.rst",
    "content": "About\n=====\n\nFlowCraft is developed by the Molecular `Microbiology and Infection Unit (UMMI) <http://darwin.phyloviz.net/wiki/doku.php>`_\nat the `Instituto de Medicina Molecular Joao Antunes <https://imm.medicina.ulisboa.pt/en/>`_.\n\nThis project is licensed under the `GPLv3 license <https://github.com/assemblerflow/flowcraft/blob/master/LICENSE>`_.\nThe source code of FlowCraft is available at `<https://github.com/assemblerflow/flowcraft>`_ and the\nwebservice is available at `<https://github.com/assemblerflow/flowcraft-webapp>`_."
  },
  {
    "path": "docs/conf.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n#\n# Templates documentation build configuration file, created by\n# sphinx-quickstart on Mon Feb  5 14:24:12 2018.\n#\n# This file is execfile()d with the current directory set to its\n# containing dir.\n#\n# Note that not all possible configuration values are present in this\n# autogenerated file.\n#\n# All configuration values have a default; values that are commented out\n# serve to show the default.\n\n# If extensions (or modules to document with autodoc) are in another directory,\n# add these directories to sys.path here. If the directory is relative to the\n# documentation root, use os.path.abspath to make it absolute, like shown here.\n#\nimport os\nimport sys\nsys.path.insert(0, os.path.abspath(\"..\"))\nsys.path.insert(0, os.path.abspath(\"../flowcraft/templates\"))\nimport flowcraft\n\n# -- General configuration ------------------------------------------------\n\n# If your documentation needs a minimal Sphinx version, state it here.\n#\n# needs_sphinx = '1.0'\n\n# Add any Sphinx extension module names here, as strings. They can be\n# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom\n# ones.\nextensions = [\n    'sphinx.ext.autodoc',\n    'sphinx.ext.todo',\n    'sphinx.ext.viewcode',\n    'sphinx.ext.githubpages',\n    'numpydoc',\n    'sphinx.ext.autosummary',\n    'sphinx.ext.mathjax'\n]\n\nautodoc_member_order = 'bysource'\n\n# Add any paths that contain templates here, relative to this directory.\ntemplates_path = ['_templates']\n\n# The suffix(es) of source filenames.\n# You can specify multiple suffix as a list of string:\n#\n# source_suffix = ['.rst', '.md']\nsource_suffix = '.rst'\n\n# The master toctree document.\nmaster_doc = 'index'\n\n# General information about the project.\nproject = 'FlowCraft'\ncopyright = '2018, FlowCraft team'\nauthor = 'Diogo N. Silva, Tiago F. Jesus, Ines Mendes, Bruno Ribeiro-Goncalves'\n\n# The version info for the project you're documenting, acts as replacement for\n# |version| and |release|, also used in various other places throughout the\n# built documents.\n#\n# The short X.Y version.\nversion = flowcraft.__version__\n# The full version, including alpha/beta/rc tags.\nrelease = '1'\n\n# The language for content autogenerated by Sphinx. Refer to documentation\n# for a list of supported languages.\n#\n# This is also used if you do content translation via gettext catalogs.\n# Usually you set \"language\" from the command line for these cases.\nlanguage = 'en'\n\n# List of patterns, relative to source directory, that match files and\n# directories to ignore when looking for source files.\n# This patterns also effect to html_static_path and html_extra_path\nexclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']\n\n# The name of the Pygments (syntax highlighting) style to use.\npygments_style = 'sphinx'\n\n# If true, `todo` and `todoList` produce output, else they produce nothing.\ntodo_include_todos = True\n\n\n# -- Options for HTML output ----------------------------------------------\n\n# The theme to use for HTML and HTML Help pages.  See the documentation for\n# a list of builtin themes.\n#\nhtml_theme = 'sphinx_rtd_theme'\n\n# Theme options are theme-specific and customize the look and feel of a theme\n# further.  For a list of options available for each theme, see the\n# documentation.\n#\nhtml_theme_options = {\"collapse_navigation\": True}\n\n# Add any paths that contain custom static files (such as style sheets) here,\n# relative to this directory. 
They are copied after the builtin static files,\n# so a file named \"default.css\" will overwrite the builtin \"default.css\".\nhtml_static_path = ['_static']\n\n# -- Options for HTMLHelp output ------------------------------------------\n\n# Output file base name for HTML help builder.\nhtmlhelp_basename = 'Templatesdoc'\n\n\n# -- Options for LaTeX output ---------------------------------------------\n\nlatex_elements = {\n    # The paper size ('letterpaper' or 'a4paper').\n    #\n    # 'papersize': 'letterpaper',\n\n    # The font size ('10pt', '11pt' or '12pt').\n    #\n    # 'pointsize': '10pt',\n\n    # Additional stuff for the LaTeX preamble.\n    #\n    # 'preamble': '',\n\n    # Latex figure (float) alignment\n    #\n    # 'figure_align': 'htbp',\n}\n\n# Grouping the document tree into LaTeX files. List of tuples\n# (source start file, target name, title,\n#  author, documentclass [howto, manual, or own class]).\nlatex_documents = [\n    (master_doc, 'Templates.tex', 'Templates Documentation',\n     'Diogo N. Silva', 'manual'),\n]\n\n\n# -- Options for manual page output ---------------------------------------\n\n# One entry per manual page. List of tuples\n# (source start file, name, description, authors, manual section).\nman_pages = [\n    (master_doc, 'templates', 'Templates Documentation',\n     [author], 1)\n]\n\n\n# -- Options for Texinfo output -------------------------------------------\n\n# Grouping the document tree into Texinfo files. List of tuples\n# (source start file, target name, title, author,\n#  dir menu entry, description, category)\ntexinfo_documents = [\n    (master_doc, 'Templates', 'Templates Documentation',\n     author, 'Templates', 'One line description of project.',\n     'Miscellaneous'),\n]\n\n# -- Options for Epub output ----------------------------------------------\n\n# Bibliographic Dublin Core info.\nepub_title = project\nepub_author = author\nepub_publisher = author\nepub_copyright = copyright\n\n# The unique identifier of the text. This can be a ISBN number\n# or the project homepage.\n#\n# epub_identifier = ''\n\n# A unique identification for the text.\n#\n# epub_uid = ''\n\n# A list of files that should not be packed into the epub file.\nepub_exclude_files = ['search.html']\n\n\ndef setup(app):\n    app.add_stylesheet('custom.css')"
  },
  {
    "path": "docs/dev/containers.rst",
    "content": "Docker containers guidelines\n============================\n\nAll FlowCraft components require a docker container in order to be executed,\nthus if a new component is added, a docker image should be added as well and\nuploaded to\n.. _docker hub: https://hub.docker.com/ in order to be available to pull in\nother machines. Although this can be done in any personal\nrepository, we recommend that this docker images are added to an already\nexisting .. _FlowCraft github repository: https://github.com/assemblerflow/docker-imgs\n(called here ``Official``) so that docker builds can be automated with github\nintegration. Also, the centralization of all images will allow other\ncontributors to easily access and edit these containers instead of forking from\none side to another every time a container needs to be changed/updated.\n\nOfficial FlowCraft Docker images\n--------------------------------\n\nWriting docker images\n:::::::::::::::::::::\n\nOfficial FlowCraft Docker images are available in\n.. _this github repository: https://github.com/assemblerflow/docker-imgs .\nIf you want to add your image to this repository please fork it and make a\nPull Request (PR) with the requested new image or create an issue asking to be\nadded to the organization as a contributor.\n\n\nBuilding docker images\n::::::::::::::::::::::\n\nThen, after the image has been added to the FlowCraft\n.. _docker-imgs https://github.com/assemblerflow/docker-imgs\ngithub repository, they can be built through\n.. _FlowCraft docker hub https://hub.docker.com/u/flowcraft/dashboard/ .\n\nTag naming\n^^^^^^^^^^\n\nEach time a docker image is built using the automated build of docker hub it\nshould follow this nomenclature: ``version-patch``.\nThis is used to avoid the override of previous builds for the same images,\nallowing for instance users to use different version of the same software using\nthe same docker image but with different tags.\n\n- ``Version``: Is a string with tree letters like this: ``1.1.1``. Versions should\nchange every time a new software is added the container.\n\n- ``Patch``: Is a number that follows a ``-`` after the version. Patches should\nchange every time a change does not affect\nthe software inside it. For example, updates to database related files required\nby some of the software inside the container.\n\nUnofficial FlowCraft Docker images\n----------------------------------\n\nAlthough we **strongly** recommend that all images are stored in FlowCraft\n.. _docker-imgs https://github.com/assemblerflow/docker-imgs github repo, it is\nnot mandatory to do it. Images can be built in another github repo and\nalso use another docker hub repository to build the images.\nHowever, do make sure that you define it correctly in the directives of the\nprocess as explained in :ref:`DirectivesAnchor`.\n"
  },
  {
    "path": "docs/dev/create_process.rst",
    "content": "Process creation guidelines\n===========================\n\nBasic process creation\n----------------------\n\nThe addition of a new process to FlowCraft requires three main steps:\n\n#. `Create process template`_: Create a jinja2 template in ``flowcraft.generator.templates`` with the\n   nextflow code.\n\n#. `Create Process class`_: Create a :class:`~flowcraft.generator.process.Process` subclass in\n   :class:`flowcraft.generator.process` with\n   information about the process (e.g., expected input/output, secondary inputs,\n   etc.).\n\n.. _create-process:\n\nCreate process template\n:::::::::::::::::::::::\n\nFirst, create the nextflow template that will be integrated into the pipeline\nas a process. This file must be placed in ``flowcraft.generator.templates``\nand have the ``.nf`` extension. In order to allow the template to be\ndynamically added to a pipeline file, we use the jinja2_ template language to\nsubstitute key variables in the process, such as input/output channels.\n\nAn example created as a ``my_process.nf`` file is as follows::\n\n    some_channel_{{ pid }} = Channel.value(params.param1{{ param_id}})\n    other_channel_{{ pid }} = Channel.fromPath(params.param2{{ param_id}})\n\n    process myProcess_{{ pid }} {\n\n        {% include \"post.txt\" ignore missing %}\n\n        publishDir \"results/myProcess_{{ pid }}\", pattern: \"*.tsv\"\n\n        input:\n        set sample_id, <data> from {{ input_channel }}\n        val x from some_channel_{{ pid }}\n        file y from other_channel_{{ pid }}\n        val direct_from_parms from Channel.value(params.param3{{param_id}}\n\n        // The output is optional\n        output:\n        set sample_id, <data> into {{ output_channel }}\n        {% with task_name=\"abricate\" %}\n        {%- include \"compiler_channels.txt\" ignore missing -%}\n        {% endwith %}\n\n        \"\"\"\n        <process code/commands>\n        \"\"\"\n    }\n\n    {{ forks }}\n\nThe fields surrounded by curly brackets are jinja placeholders that will be\ndynamically substituted when building the pipeline. They will ensure that the\nprocesses and potential forks correctly link with each other and that\nchannels are unique and correctly linked. This example contains all\nplaceholder variables that are currently supported by FlowCraft.\n\n{{pid}}\n^^^^^^^\n\nUsed as a unique process identifier that prevent issues\nfrom process and channel duplication in the pipeline. Therefore, is should be\nappended to each process and channel name as ``_{{ pid }}`` (note the underscore)::\n\n    some_channel_{{ pid }}\n    process myProcess_{{ pid }}\n\n{{param_id}}\n^^^^^^^^^^^^\n\nSame as the **{{ pid }}**, but sets the identified for nextflow ``params``. It should\nbe appended to each ``param`` as ``{{ param_id }}``. This will allow parameters\nto be specific to each component in the pipeline::\n\n    Channel.value(params.param1{{ param_id}})\n\nNote that the parameters used in the template, should also be defined in the\nProcess class params attribute (see `Parameters`_).\n\n{% include \"post.txt\" %}\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nInserts ``beforeScript`` and ``afterScript`` statements to the process that\nsets environmental variables and a series of *dotfiles* for the process to\nlog their status, warnings, fails and reports (see :ref:`dotfiles` for\nmore information). 
It also includes scripts for sending requests to\nREST APIs (only when certain pipeline parameters are used).\n\n{{input_channel}}\n^^^^^^^^^^^^^^^^^\n\nAll processes must include **one and only one** input channel. In most cases,\nthis channel should be defined with a two element tuple that contains the\nsample ID and then the actual data file/stream. We suggest the sample ID\nvariable to be named ``sample_id`` as a standard. If another variable name\nis specified and you include the ``compiler_channels.txt`` in the process,\nyou'll need to change the sample ID variable (see `Sample ID variable`_).\n\n{{output_channel}}\n^^^^^^^^^^^^^^^^^^\n\nTerminal processes may skip the output channel entirely. However, if you want\nto link the main output of this process with subsequent ones, this placeholder\nmust be used **only once**. As with the input channel, this channel should\nbe defined with a two element tuple with the sample ID and the data. The\nsample ID must match the one specified in the ``input_channel``.\n\n.. _compiler:\n\n{% include \"compiler_channels.txt\" %}\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis will include the special channels that will compile the status/logging\nof the processes throughout the pipeline. **You must include the whole\nblock** (see `Status channels`_)::\n\n    {% with task_name=\"abricate\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n\n{{forks}}\n^^^^^^^^^\n\nInserts potential forks of the main output channel. It is **mandatory** if\nthe ``output_channel`` is set.\n\nComplete example\n^^^^^^^^^^^^^^^^\n\nAs an example of a complete process, this is the template of ``spades.nf``::\n\n    IN_spades_opts_{{ pid }} = Channel.value([params.spadesMinCoverage{{ param_id }},params.spadesMinKmerCoverage{{ param_id }}])\n    IN_spades_kmers_{{ pid }} = Channel.value(params.spadesKmers{{ param_id }})\n\n    process spades_{{ pid }} {\n\n        // Send POST request to platform\n        {% include \"post.txt\" ignore missing %}\n\n        tag { fastq_id + \" getStats\" }\n        publishDir 'results/assembly/spades/', pattern: '*_spades.assembly.fasta', mode: 'copy'\n\n        input:\n        set fastq_id, file(fastq_pair), max_len from {{ input_channel }}.join(SIDE_max_len_{{ pid }})\n        val opts from IN_spades_opts_{{ pid }}\n        val kmers from IN_spades_kmers_{{ pid }}\n\n        output:\n        set fastq_id, file('*_spades.assembly.fasta') optional true into {{ output_channel }}\n        set fastq_id, val(\"spades\"), file(\".status\"), file(\".warning\"), file(\".fail\") into STATUS_{{ pid }}\n        file \".report.json\"\n\n        script:\n        template \"spades.py\"\n    }\n\n    {{ forks }}\n\n\nCreate Process class\n::::::::::::::::::::\n\nThe process class will contain the information that FlowCraft\nwill use to build the pipeline and assess potential conflicts/dependencies\nbetween processes. This class should be created in one of the category files in the\n:mod:`flowcraft.generator.components` module (e.g.: ``assembly.py``). If\nthe new component does not fit in any of the existing categories, create a\nnew one that imports :mod:`flowcraft.generator.process.Process` and add\nyour new class. 
This class should inherit from the\n:class:`~flowcraft.generator.process.Process` base\nclass::\n\n    class MyProcess(Process):\n\n        def __init__(self, **kwargs):\n\n            super().__init__(**kwargs)\n\n            self.input_type = \"fastq\"\n            self.output_type = \"fasta\"\n\nThis is the simplest working example of a process class, which basically needs\nto inherit the parent class attributes (the ``super`` part).\nThen we only need to define the expected input\nand output types of the process. There are no limitations to the\ninput/output types.\nHowever, a pipeline will only build successfully when all processes correctly\nlink the output with the input type.\n\nDepending on the process, other attributes may be required:\n\n    - `Parameters`_: Parameters provided by the user to be used in the process.\n    - `Secondary inputs`_: Channels created from parameters provided by the\n      user.\n    - Secondary `Link start`_ and `Link end`_: Secondary links that connect\n      secondary information between two processes.\n    - `Dependencies`_: List of other processes that may be required for\n      the current process.\n    - `Directives`_: Default information for RAM/CPU/Container directives\n      and more.\n\nAdd to available components\n:::::::::::::::::::::::::::\n\nContrary to the previous implementation (versions <= 1.3.1), the available components\nare now retrieved automatically by FlowCraft and there is no need to add the\nprocess to any dictionary (previous ``process_map``). In order for the component\nto be accessible to ``flowcraft build``, the process template name in\n``snake_case`` must match the process class in ``CamelCase``. For instance,\nif the process template is named ``my_process.nf``, the process class must\nbe ``MyProcess``; FlowCraft will then be able to automatically add it to the\nlist of available components.\n\n.. note::\n    Note that the template string does not include the ``.nf`` extension.\n\nProcess attributes\n------------------\n\nThis section describes the main attributes of the\n:mod:`~flowcraft.generator.process.Process` class: what they\ndo and how they impact the pipeline generation.\n\nInput/Output types\n::::::::::::::::::\n\nThe :attr:`~flowcraft.generator.process.Process.input_type` and\n:attr:`~flowcraft.generator.process.Process.output_type` attributes\nset the expected type of input and output of the process. There are no\nlimitations to the type of input/output that are provided. However, processes\nwill only link when the output of one process matches the input of the\nsubsequent process (unless the\n:attr:`~flowcraft.generator.process.Process.ignore_type` attribute is set\nto ``True``). Otherwise, FlowCraft will raise an exception stating that\ntwo processes could not be linked.\n\n.. note::\n\n    The input/output types that are currently used are ``fastq`` and ``fasta``.\n\nParameters\n::::::::::\n\nThe :attr:`~flowcraft.generator.process.Process.params` attribute sets\nthe parameters that can be used by the process. For each parameter, a default\nvalue and a description should be provided. 
The default value will be set\nin the ``params.config`` file in the pipeline directory and the description\nwill be used to generate the custom help message of the pipeline::\n\n    self.params = {\n        \"genomeSize\": {\n            \"default\": 2.1,\n            \"description\": \"Expected genome size (default: params.genomeSize)\"\n        },\n        \"minCoverage\": {\n            \"default\": 15,\n            \"description\": \"Minimum coverage to proceed (default: params.minCoverage)\"\n        }\n    }\n\nThese parameters can be simple values that are not fed into\nany channel, or can be automatically set to a secondary input channel via\n`Secondary inputs`_ (see below).\n\nThey can be specified when running the pipeline like any nextflow parameter\n(e.g.: ``--genomeSize 5``) and used in the nextflow process as usual\n(e.g.: ``params.genomeSize``).\n\n.. note::\n    These pairs are then used to populate the ``params.config`` file that is\n    generated in the pipeline directory. Note that the values are replaced\n    literally in the config file. For instance, ``\"genomeSize\": 2.1,`` will appear\n    as ``genomeSize = 2.1``, whereas ``\"adapters\": \"'None'\"`` will appear as\n    ``adapters = 'None'``. If you want a value to appear as a string, the double\n    and single quotes are necessary.\n\n\nSecondary inputs\n::::::::::::::::\n\n.. warning::\n    The ``secondary_inputs`` attribute has been deprecated since **v1.2.1.**\n    Instead, specify the secondary channels directly in the nextflow template\n    files.\n\nAny process can receive one or more input channels in addition to the main\nchannel. These are particularly useful when the process needs to receive\nadditional options from the ``parameters`` scope of nextflow.\nThese additional inputs can be specified via the\n:attr:`~flowcraft.generator.process.Process.secondary_inputs` attribute,\nwhich should store a list of dictionaries (a dictionary for each input). Each dictionary should\ncontain a key:value pair with the name of the parameter (``params``) and the\ndefinition of the nextflow channel (``channel``). Consider the example below::\n\n    self.secondary_inputs = [\n            {\n                \"params\": \"genomeSize\",\n                \"channel\": \"IN_genome_size = Channel.value(params.genomeSize)\"\n            },\n            {\n                \"params\": \"minCoverage\",\n                \"channel\": \"IN_min_coverage = Channel.value(params.minCoverage)\"\n            }\n        ]\n\nThis process will receive two secondary inputs that are given by the\n``genomeSize`` and ``minCoverage`` parameters. These should also be specified\nin the :attr:`~flowcraft.generator.process.Process.params` attribute\n(See `Parameters`_ above).\n\nFor each of these parameters, the dictionary\nalso stores how the channel should be defined at the beginning of the pipeline\nfile. Note that this channel definition mentions the parameters (e.g.\n``params.genomeSize``). An additional best practice for channel definition\nis to include one or more sanity checks to ensure that the provided arguments\nare correct. These checks can be added in the nextflow template file, or\nliterally in the ``channel`` string::\n\n    self.secondary_inputs = [\n        {\n            \"params\": \"genomeSize\",\n            \"channel\":\n                    \"IN_genome_size = Channel.value(params.genomeSize)\"\n                    \".map{it -> it.toString().isNumber() ? it : exit(1, \\\"The genomeSize parameter must be a number or a float. 
Provided value: '${params.genomeSize}'\\\")}\"\n            }\n\nExtra input\n:::::::::::\n\nThe :attr:`~flowcraft.generator.process.Process.extra_input` attribute\nis mostly a user specified directive that allows the injection of additional\ninput data from a parameter into the main input channel of the process.\nWhen a pipeline is defined as::\n\n    process1 process2={'extra_input':'var'}\n\nFlowCraft will expose a new ``var`` parameter, setup an extra input\nchannel and mix it with ``process2`` main input channel. A more detailed\nexplanation follows below.\n\nFirst, FlowCraft will create a nextflow channel from the parameter name\nprovided via the ``extra_input`` directive. The channel string will depend\non the input type of the process (this string is fetched from the\n:attr:`~flowcraft.generator.process.Process.RAW_MAPPING` attribute).\nFor instance, if the input type of\n``process2`` is ``fastq``, the new extra channel will be::\n\n    IN_var_extraInput = Channel.fromFilePairs(params.var)\n\nSince the same extra input parameter may be used by more than one process,\nthe ``IN_var_extraInput`` channel will be automatically forked into the\nfinal destination channels::\n\n    // When there is a single destination channel\n    IN_var_extraInput.set{ EXTRA_process2_1_2 }\n    // When there are multiple destination channels for the same parameter\n    IN_var_extraInput.into{ EXTRA_process2_1_2; EXTRA_process3_1_3 }\n\nThe destination channels are the ones that will be actually mixed with\nthe main input channels::\n\n    process process2 {\n        input:\n        (...) main_channel.mix(EXTRA_process2_1_2)\n    }\n\nIn these cases, the processes that receive the extra input will process the\ndata provided by the preceding channel **AND** by the parameter. The data\nprovided via the extra input parameter does not have to wait for the\n``main_channel``, which means that they can run in parallel, if there are\nenough resources.\n\nCompiler\n::::::::\n\nThe :attr:`~flowcraft.generator.process.Process.compiler` attribute\nallows one or more channels of the process to be fed into a compiler process\n(See `Compiler processes`_). These are special processes that collect\ninformation from one or more processes to execute a given task. Therefore,\nthis parameter can only be used when there is an appropriate compiler process\navailable (the available compiler processes are set in the\n:attr:`~flowcraft.generator.engine.NextflowGenerator.compilers` dictionary). In order to\nprovide one or more channels to a compiler process, simply add a key:value to the\nattribute, where the key is the id of the compiler process present in the\n:attr:`~flowcraft.generator.engine.NextflowGenerator.compilers` dictionary and the value\nis the list of channels::\n\n    self.compiler[\"patlas_consensus\"] = [\"mappingOutputChannel\"]\n\nLink start\n::::::::::\n\nThe :attr:`~flowcraft.generator.process.Process.link_start` attribute\nstores a list of strings of channel names that can be used as secondary\nchannels in the pipeline (See the `Secondary links between process`_ section).\nBy default, this attribute contains the main output channel, which means\nthat every process can fork the main channel to one or more receiving\nprocesses.\n\nLink end\n::::::::\n\nThe :attr:`~flowcraft.generator.process.Process.link_end` attribute\nstores a list of dictionaries with channel names that are meant to be\nreceived by the process as secondary channel **if** the corresponding\n`Link start`_ exists in the pipeline. 
Each dictionary in this list will define\none secondary channel and requires two key:value pairs::\n\n    self.link_end.append({\n        \"link\": \"SomeChannel\",\n        \"alias\": \"OtherChannel\"\n    })\n\nIf another process exists in the pipeline with\n``self.link_start.extend([\"SomeChannel\"])``, FlowCraft will automatically\nestablish a secondary channel between the two processes. If there are multiple\nprocesses receiving from a single one, the channel from the latter will\nfork into any number of receiving processes.\n\nDependencies\n::::::::::::\n\nIf a process depends on the presence of one or more processes upstream in the\npipeline, these can be specified via the\n:attr:`~flowcraft.generator.process.Process.dependencies` attribute.\nWhen building the pipeline, if at least one of the dependencies is absent,\nFlowCraft will raise an exception informing of the missing dependency.\n\n.. _DirectivesAnchor:\n\nDirectives\n::::::::::\n\nThe :attr:`~flowcraft.generator.process.Process.directives` attribute\nallows for information about cpu/RAM usage and container to be specified\nfor each nextflow process in the template file. For instance, considering\nthe case where a ``Process`` has a template with two nextflow processes::\n\n    process proc_A_{{ pid }} {\n        // stuff\n    }\n\n    process proc_B_{{ pid }} {\n        // stuff\n    }\n\nThen, information about each process can be specified individually in the\n:attr:`~flowcraft.generator.process.Process.directives` attribute::\n\n\n    class myProcess(Process):\n        (...)\n        self.directives = {\n            \"proc_A\": {\n                \"cpus\": 1,\n                \"memory\": \"4GB\"\n            },\n            \"proc_B\": {\n                \"cpus\": 4,\n                \"container\": \"my/container\",\n                \"version\": \"1.0.0\"\n            }\n        }\n\nThe information in this attribute will then be used to build the\n``resources.config`` (containing the information about cpu/RAM) and\n``containers.config`` (containing the container images) files. Whenever a\ndirective is missing, such as the ``container`` and ``version`` from ``proc_A``\nand ``memory`` from ``proc_B``, nothing about them will be written into the\nconfig files and they will use the **default pipeline values**:\n\n- ``cpus``: ``1``\n- ``memory``: ``1GB``\n- ``container``: `flowcraft_base`_ image\n\n.. _flowcraft_base: https://hub.docker.com/r/ummidock/assemblerflow_base/~/dockerfile/\n\nIgnore type\n:::::::::::\n\nThe :attr:`~flowcraft.generator.process.Process.ignore_type` attribute\ncontrols whether a match between the input of the current process and the\noutput of the previous one is enforced or not. When there are multiple\nterminal processes that fork from the main channel, there is no need to\nenforce the type match and in that case this attribute can be set to ``True``.\n\nProcess ID\n::::::::::\n\nThe process ID, set via the\n:attr:`~flowcraft.generator.process.Process.pid` attribute, is an\narbitrary and incremental number that is awarded to each process depending\non its position in the pipeline. It is mainly used to ensure that there are\nno duplicated channels even when the same process is used multiple times\nin the same pipeline.\n\nTemplate\n::::::::\n\nThe :attr:`~flowcraft.generator.process.Process.template` attribute\nis used to fetch the jinja2 template file that corresponds to the current\nprocess. 
The path to the template file is determined as follows::\n\n    join(<template directory>, template + \".nf\")\n\n\nStatus channels\n:::::::::::::::\n\nThe status channels are special channels dedicated to passing information\nregarding the status, warnings, fails and logging from each process\n(see :ref:`dotfiles` for more information). They are used only when the\nnextflow template file contains the appropriate jinja2 placeholder::\n\n    output:\n    {% with task_name=\"<nextflow_template_name>\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\nBy default,\nevery ``Process`` class contains a\n:attr:`~flowcraft.generator.process.Process.status_channels` list\nattribute that contains the\n:attr:`~flowcraft.generator.process.Process.template` string::\n\n    self.status_channels = [\"STATUS_{}\".format(template)]\n\nIf there is only one nextflow process in the template and the ``task_name``\nvariable in the template matches the\n:attr:`~flowcraft.generator.process.Process.template` attribute, then\nit's all automatically set up.\n\nIf the template file contains **more than one nextflow process**\ndefinition, multiple placeholders can be provided in the template::\n\n    process A {\n        (...)\n        output:\n        {% with task_name=\"A\" %}\n        {%- include \"compiler_channels.txt\" ignore missing -%}\n        {% endwith %}\n    }\n\n    process B {\n        (...)\n        output:\n        {% with task_name=\"B\" %}\n        {%- include \"compiler_channels.txt\" ignore missing -%}\n        {% endwith %}\n    }\n\nIn this case, the\n:attr:`~flowcraft.generator.process.Process.status_channels` attribute\nwould need to be changed to::\n\n    self.status_channels = [\"A\", \"B\"]\n\nSample ID variable\n^^^^^^^^^^^^^^^^^^\n\nIn case you change the standard nextflow variable that stores the sample ID\nin the input of the process (``sample_id``), you also need to change it for\nthe ``compiler_channels`` placeholder::\n\n    process A {\n\n    input:\n    set other_id, data from {{ input_channel }}\n\n    output:\n    {% with task_name=\"B\", sample_id=\"other_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    }\n\nAdvanced use cases\n------------------\n\nCompiler processes\n::::::::::::::::::\n\nCompilers are special processes that collect data from one or more processes\nand perform a given task with that compiled data. They are automatically\nincluded in the pipeline when at least one of the source channels is present.\nIn the case there are multiple source channels, they are merged according\nto a specified operator.\n\nCreating a compiler process\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe creation of the compiler process is simpler than that of a regular process\nbut follows the same three steps.\n\n1. Create a nextflow template file in ``flowcraft.generator.templates``::\n\n    process fullConsensus {\n\n        input:\n        set id, file(infile_list) from {{ compile_channels }}\n\n        output:\n        <output channels>\n\n        script:\n        \"\"\"\n        <commands/code/template>\n        \"\"\"\n\n    }\n\nThe only requirement is the inclusion of a ``compiler_channels`` jinja\nplaceholder in the main input channel.\n\n2. 
Create a Compiler class in the :mod:`flowcraft.generator.process`\n   module::\n\n    class PatlasConsensus(Compiler):\n\n        def __init__(self, **kwargs):\n\n            super().__init__(**kwargs)\n\nThis class must inherit from\n:mod:`~flowcraft.generator.process.Compiler` and does not require any\nmore changes.\n\n3. Map the compiler template file to the class in the\n:attr:`~flowcraft.generator.engine.NextflowGenerator.compilers` attribute::\n\n    self.compilers = {\n        \"patlas_consensus\": {\n            \"cls\": pc.PatlasConsensus,\n            \"template\": \"patlas_consensus\",\n            \"operator\": \"join\"\n        }\n    }\n\nEach compiler should contain a key:value entry. The key is the compiler\nid that is then specified in the :attr:`~flowcraft.generator.process.Process.compiler`\nattribute of the component classes. The value is a json/dict object that\nspecifies the compiler class in the ``cls`` key, the template string in the\n``template`` key and the operator used to join the channels into the\ncompiler via the ``operator`` key.\n\nHow a compiler process works\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nConsider the case where you have a compiler process named ``compiler_1`` and\ntwo processes, ``process_1`` and ``process_2``, both of which feed a single\nchannel to ``compiler_1``. This means that the class definitions of these\nprocesses include::\n\n    class Process_1(Process):\n        (...)\n        self.compiler[\"compiler_1\"] = [\"channel1\"]\n\n    class Process_2(Process):\n        (...)\n        self.compiler[\"compiler_1\"] = [\"channel2\"]\n\nIf a pipeline is built with at least one of these processes, the ``compiler_1``\nprocess will be automatically included in the pipeline. If more than one\nchannel is provided to the compiler, they will be merged with the specified\noperator::\n\n    process compiler_1 {\n\n        input:\n        set sample_id, file(infile_list) from channel2.join(channel1)\n\n    }\n\nThis will allow the output of multiple separate processes to be processed by\na single process in the pipeline, and it automatically adjusts according\nto the channels provided to the compiler.\n\nSecondary links between process\n:::::::::::::::::::::::::::::::\n\nIn some cases, it might be necessary to perform additional links between\ntwo or more processes.\nFor example, the maximum read length might be gathered in one process, and\nthat information may be required by a subsequent process. 
These secondary\nchannels allow this information to be passed between these processes.\n\nThese additional links are called secondary channels and\nthey may be explicitly or implicitly declared.\n\nExplicit secondary channels\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo create an explicit secondary channel, the origin or source of this channel\nmust be declared in the nextflow process that sends it::\n\n    // secondary channels can be created inside the process\n    output:\n    <main output> into {{ output_channel }}\n    <secondary output> into SIDE_max_read_len_{{ pid }}\n\n    // or outside\n    SIDE_phred_{{ pid }} = Channel.create()\n\nThen, we add the information that this process has a secondary channel start\nvia the ``link_start`` list attribute in the corresponding\n``flowcraft.generator.process.Process`` class::\n\n    class MyProcess(Process):\n\n        (...)\n\n        self.link_start.extend([\"SIDE_max_read_len\", \"SIDE_phred\"])\n\nNotice that we extend the ``link_start`` list, instead of simply assigning.\nThis is because all processes already have the main channel as an implicit\nlink start (See `Implicit secondary channels`_).\n\n**Now, any process that is executed after this one can receive this secondary\nchannel.**\n\nFor another process to receive this channel, it will be necessary to add this\ninformation to the process class(es) via the ``link_end`` list attribute::\n\n    class OtherProcess(Process):\n\n        (...)\n\n        self.link_end.append({\n            \"link\": \"SIDE_phred\",\n            \"alias\": \"OtherName\"\n        })\n\nNotice that now we append a dictionary with two key:value pairs. The first, `link`,\nmust match a string from the `link_start` list (in this case, `SIDE_phred`).\nThe second, `alias`, will be the channel name in the receiving process nextflow\ntemplate (which can be the same as the `link` value).\n\nNow, we only need to add the secondary channel to the nextflow template, as in\nthe example below::\n\n    input:\n    <main_input> from {{ input_channel }}.mix(OtherName_{{ pid }})\n\nImplicit secondary channels\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBy default, the main output channel of each process is declared as a secondary channel\nstart. This means that any process can receive the main output channel as a\nsecondary channel of a subsequent process. This can be useful in situations\nwhere a post-assembly process (has ``assembly`` as expected input and output)\nneeds to receive the last channel with fastq files::\n\n    class AssemblyMapping(Process):\n\n        (...)\n\n        self.link_end.append({\n            \"link\": \"MAIN_fq\",\n            \"alias\": \"_MAIN_assembly\"\n        })\n\nIn this example, the ``AssemblyMapping`` process will receive a secondary\nchannel from the last process that outputs fastq files into a channel\ncalled ``_MAIN_assembly``. Then, this channel is received in the nextflow\ntemplate like this::\n\n    input:\n    <main input> from {{ input_channel }}.join(_{{ input_channel }})\n\nImplicit secondary channels can also be used to\nfork the last output channel into multiple terminal processes::\n\n    class Abricate(Process):\n\n        (...)\n\n        self.link_end.append({\n            \"link\": \"MAIN_assembly\",\n            \"alias\": \"MAIN_assembly\"\n        })\n\nIn this case, since ``MAIN_assembly`` is already the prefix of the main\noutput channel of this process, there is no need for changes in the process\ntemplate::\n\n    input:\n    <main input> from {{ input_channel }}\n\n\n.. 
_jinja2: http://jinja.pocoo.org/docs/2.10/\n"
  },
  {
    "path": "docs/dev/create_recipe.rst",
    "content": "Recipe creation guidelines\n===========================\n\nRecipes are pre-made pipeline strings that may be associated with specific\nparameters and directives and are used to rapidly build a certain type of\npipeline.\n\nInstead of building a pipeline like::\n\n    -t \"integrity_coverage fastqc_trimmomatic fastqc spades pilon\"\n\nThe user simply can specific a recipe with that pipeline::\n\n    -r assembly\n\nRecipe creation\n---------------\n\nThe creation of new recipes is a very simple and straightforward process.\nYou need to create a new file in the ``flowcraft/generator/recipes`` folder\nwith any name and create a basic class with three attributes::\n\n    try:\n        from generator.recipe import Recipe\n    except ImportError:\n        from flowcraft.generator.recipe import Recipe\n\n\n    class Innuca(Recipe):\n\n        def __init__(self):\n            super().__init__()\n\n            # Recipe name\n            self.name = \"innuca\"\n\n            # Recipe pipeline\n            self.pipeline_str = <pipeline string>\n\n            # Recipe parameters and directives\n            self.directives = { <directives> }\n\nAnd that's it! Now there is a new recipe available with the ``innuca`` name and\nwe can build this pipeline using the option ``-r innuca``.\n\nName\n^^^^\n\nThis is the name of the recipe, which is used to make a match with the recipe\nname provided by the user via the ``-r`` option.\n\nPipeline_str\n^^^^^^^^^^^^\n\nThe pipeline string as if provided via the ``-t`` option.\n\nDirectives\n^^^^^^^^^^\n\nA dictionary containing the parameters and directives for each process in the\npipeline string. **Setting this attribute is optional and components\nthat are not specified here will assume their default values**. In general, each\nelement in this dictionary should have the following format::\n\n    self.directives = {\n        \"component_name\": {\n            \"params\": {\n                \"paramA\": \"value\"\n            },\n            \"directives\": {\n                \"directiveA\": \"value\"\n            }\n        }\n    }\n\nThis will set the provided parameters and directives to the component, but it is\npossible to provide only one.\n\nA more concrete example of a real component and directives follows::\n\n    self.pipeline_str = \"integrity_coverage fastqc\"\n\n    # Set parameters and directives only for integrity_coverage\n    # and leave fastqc with the defaults\n    self.directives = {\n        \"integrity_coverage\": {\n            \"params\": {\n                \"minCoverage\": 20\n            },\n            \"directives\": {\n                \"memory\": \"1GB\"\n            }\n        }\n    }\n\nDuplicate components\n~~~~~~~~~~~~~~~~~~~~\n\nIn some cases, the same component may be present multiple times in the pipeline\nstring of a recipe. In these cases, directives can be assigned to each individual\ncomponent by adding a ``#<id>`` suffix to the component::\n\n    self.pipeline_str = \"integrity_coverage ( trimmomatic spades#1 | spades#2)\"\n\n    self.directives = {\n        \"spades#1\": {\n            \"directives\": {\n                \"memory\": \"10GB\"\n            }\n        },\n        \"spades#2\": {\n            \"directives\": {\n                \"version\": \"3.7.0\"\n            }\n        }\n    }\n"
  },
  {
    "path": "docs/dev/create_recipes.rst",
    "content": "Recipe creation guidelines\n==========================\n\nUnder construction."
  },
  {
    "path": "docs/dev/create_template.rst",
    "content": "Template creation guidelines\n============================\n\nThough none of these guidelines are mandatory nor required, their usage is\nhighly recommended for several reasons:\n\n- Consistency in the outputs of the templates throughout the pipeline,\n  particularly the status and report dotfiles (see :ref:`dotfiles` section);\n- Debugging purposes;\n- Versioning;\n- Proper documentation of the template scripts.\n\nPreface header\n--------------\n\nAfter the script shebang, a header with a brief description of the purpose and\nexpected inputs and outputs should be provided. A complete example of such\ndescription can be viewed in :mod:`flowcraft.templates.integrity_coverage`.\n\nPurpose\n^^^^^^^\n\nPurpose section contains a brief description of the script's objective. E.g.::\n\n    Purpose\n    -------\n\n    This module is intended parse the results of FastQC for paired end FastQ \\\n    samples.\n\nExpected input\n^^^^^^^^^^^^^^\n\nExpected input section contains a description of the variables that are\nprovided to the main function of the template script. These variables are\ndefined in the input channels of the process in which the template is supposed\nto be executed. E.g.::\n\n    Expected input\n    --------------\n\n    The following variables are expected whether using NextFlow or the\n    :py:func:`main` executor.\n\n    - ``mash_output`` : String with the name of the mash screen output file.\n        - e.g.: ``'sortedMashScreenResults_SampleA.txt'``\n\nThis means that the process that will execute this channel will have the input\ndefined as::\n\n    input:\n    file(mash_output) from <channel>\n\nGenerated output\n^^^^^^^^^^^^^^^^\n\nGenerated output section contains a description of the output files that the\ntemplate script is intended to generated. E.g.::\n\n    Generated output\n    ----------------\n\n    The generated output are output files that contain an object, usually a string.\n\n    - ``fastqc_health`` : Stores the health check for the current sample. If it\n        passes all checks, it contains only the string 'pass'. Otherwise, contains\n        the summary categories and their respective results\n\nThese can then be passed to the output channel(s) in the nextflow process::\n\n    output:\n    file(fastqc_health) into <channel>\n\n.. note ::\n\n    Since templates can be re-used by multiple processes, not all generated\n    outputs need to be passed to output channels. Depending on the job of\n    the nextflow process, it may catch none or all of the output files\n    generated by the template.\n\n\nVersioning and logging\n----------------------\n\nFlowCraft has a specific ``logger``\n(:func:`~flowcraft.templates.flowcraft_utils.flowcraft_base.get_logger`) and\nversioning system that can be imported from\n:mod:`flowcraft.templates.flowcraft_utils`: ::\n\n    # the module that imports the logger and the decorator class for versioning\n    # of the script itself and other software used in the script\n    from flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\n\n\nLogger\n^^^^^^\n\nA `logger` function is also required to add logs to the script. The logs\nare written to the ``.command.log`` file in the work directory of each process.\n\nFirst, the logger must be called, for example, after the **imports** as follows::\n\n    logger = get_logger(__file__)\n\nThen, it may be used at will, using the default `logging levels\n<https://docs.python.org/3.6/library/logging.html#levels>`_ . 
E.g.::\n\n    logger.debug(\"Information tha may be important for debugging\")\n    logger.info(\"Information related to the normal execution steps\")\n    logger.warning(\"Events that may require the attention of the developer\")\n    logger.error(\"Module exited unexpectedly with error:\\\\n{}\".format(\n                traceback.format_exc()))\n\nMainWrapper decorator\n^^^^^^^^^^^^^^^^^^^^^\n\nThis :class:`~flowcraft.templates.flowcraft_utils.flowcraft_base.MainWrapper`\nclass decorator allows the program to fetch information on the script version,\nbuild and template name. For example::\n\n    # This can also be declared after the imports\n    __version__ = \"1.0.0\"\n    __build__ = \"15012018\"\n    __template__ = \"process_abricate-nf\"\n\nThe :class:`~flowcraft.templates.flowcraft_utils.flowcraft_base.MainWrapper`\nshould decorate the main function of the script.\nE.g.::\n\n    @MainWrapper\n    def main():\n        #some awesome code\n        ...\n\nBesides searching for the script's version, build and template name this decorator\nwill also search for a specific set of functions that start with the\nsubstring ``__get_version``. For example::\n\n    def __get_version_fastqc():\n\n        try:\n\n        cli = [\"fastqc\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout, _ = p.communicate()\n\n        version = stdout.strip().split()[1][1:].decode(\"utf8\")\n\n        except Exception as e:\n            logger.debug(e)\n            version = \"undefined\"\n\n        # Note that it returns a dictionary that will then be written to the .versions\n        # dotfile\n        return {\n            \"program\": \"FastQC\",\n            \"version\": version,\n            # some programs may also contain build.\n        }\n\nThese functions are used to fetch the version, name and other relevant\ninformation from third-party software and the only requirement is that they\nreturn a dictionary with **at least** two key:value pairs:\n\n- ``program``: String with the name of the program.\n- ``version``: String with the version of the program.\n\nFor more information, refer to the\n:func:`~flowcraft.templates.flowcraft_utils.flowcraft_base.MainWrapper.build_versions`\nmethod.\n\nNextflow `.command.sh`\n----------------------\n\nWhen these templates are used as a  Nextflow `template <https://www.nextflow.io/docs/latest/process.html#template>`_\nthey are executed as a ``.command.sh`` file in the work directory of each\nprocess. In this case, we recommended the inclusion of\nan **if statement** to parse the arguments sent from nextflow to the python\ntemplate. 
For example, imagine we have a path to a file name to pass as\nargument between nextflow and the required template::\n\n    # code check for nextflow execution\n    if __file__.endswith(\".command.sh\"):\n        FILE_NAME = '$Nextflow_file_name'\n        # logger output can also be included here, for example:\n        logger.debug(\"Running {} with parameters:\".format(\n            os.path.basename(__file__)))\n        logger.debug(\"FILE_NAME: {}\".format(FILE_NAME))\n\nThen, we could use this variable as the argument of a function, such as::\n\n    def main(FILE_NAME):\n        #some awesome code\n        ...\n\n\nThis way, we can use this function with nextflow arguments or without them,\nas is the case when the templates are used as standalone modules.\n\nUse numpy docstrings\n--------------------\n\n``FlowCraft`` uses numpy docstrings to document code.\nUse\n`this link <http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html>`_\nfor reference."
  },
  {
    "path": "docs/dev/general_orientation.rst",
    "content": "General orientation\n===================\n\nCodebase structure\n------------------\n\nThe most important elements of FlowCraft's directory structure are:\n\n- ``generator``:\n    - ``components``: Contains the ``Process`` classes for each component\n    - ``templates``: Contains the nextflow jinja template files for each component\n    - ``engine.py``: The engine of FlowCraft that builds the pipeline\n    - ``process.py``: Contains the abstract ``Process`` class that is inherited\n    - by all component classes\n    - ``pipeline_parser.py``: Functions that parse and check the pipeline string\n    - ``recipe.py``: Class responsible for creating recipes\n- ``templates``: A git submodule of the `templates`_ repository that contain\n  the template scripts for the components.\n\n.. _templates: https://github.com/ODiogoSilva/templates\n\n\nCode style\n----------\n\n- **Style**:  the code base of flowcraft should adhere (the best it can) to\n  the `PEP8`_ style guidelines.\n- **Docstrings**: code should be generally well documented following the\n  `numpy docstring`_ style.\n- **Quality**: there is also an integration with the `codacy`_ service to\n  evaluate code quality, which is useful for detecting several coding\n  issues that may appear.\n\n\nTesting\n-------\n\nTests are performed using `pytest`_ and the source files are stored in the\n``flowcraft/tests`` directory. Tests must be executed on the root directory\nof the repository\n\nDocumentation\n-------------\n\nDocumentation source files are stored in the ``docs`` directory. The general\nconfiguration file is found in ``docs/conf.py`` and the entry\npoint to the documentation is ``docs/index.html``.\n\n\n.. _pytest: https://docs.pytest.org/en/latest/\n.. _PEP8: https://www.python.org/dev/peps/pep-0008/\n.. _numpy docstring: https://numpydoc.readthedocs.io/en/latest/format.html\n.. _codacy: https://app.codacy.com/app/o.diogosilva/assemblerflow/dashboard"
  },
  {
    "path": "docs/dev/pipeline_reporting.rst",
    "content": "Pipeline reporting\n==================\n\nThis section describes how the reports of a FlowCraft pipeline are generated\nand collected at the end of a run. These reports can then be sent to the\n`FlowCraft web application <https://github.com/assemblerflow/flowcraft-webapp>`_\nwhere the results are visualized.\n\n.. important::\n    Note that if the nextflow process reports add new types of data, one or\n    more React components need to be added to the web application for them\n    to be rendered.\n\nData collection\n---------------\n\nThe data for the pipeline reports is collected from three dotfiles in each nextflow\nprocess (they should be present in each work sub directory):\n\n- **.report.json**: Contains report data (See :ref:`report-json` for more information).\n- **.versions**: Contains information about the versions of the software used\n  (See :ref:`versions` for more information).\n- **.command.trace**: Contains resource usage information.\n\nThe **.command.trace** file is generated by nextflow when the **trace** scope\nis active. The **.report.json** and **.version** files are specific to\nFlowCraft pipelines. \n\nGeneration of dotfiles\n^^^^^^^^^^^^^^^^^^^^^^\n\nBoth **report.json** and **.versions** empty dotfiles are automatically generated\nby the ``{% include \"post.txt\" ignore missing %}`` placeholder, specified in the\n:ref:`create-process` section. Using this placeholder in your processes is all\nthat is needed.\n\nCollection of dotfiles\n^^^^^^^^^^^^^^^^^^^^^^\n\nThe **.report.json**, **.versions** and **.command.trace** files are automatically\ncollected and sent to dedicated report channels in the pipeline by the\n``{%- include \"compiler_channels.txt\" ignore missing -%}`` placeholder, specified\nin the :ref:`process creation <compiler>` section. Placing this placeholder in your\nprocesses will generate the following line in the output channel specification::\n\n    set {{ sample_id|default(\"sample_id\") }}, val(\"{{ task_name }}_{{ pid }}\"), val(\"{{ pid }}\"), file(\".report.json\"), file(\".versions\"), file(\".command.trace\") into REPORT_{{task_name}}_{{ pid }}\n\nThis line collects several metadata associated with the process along with the three\ndotfiles.\n\nCompilation of dotfiles\n^^^^^^^^^^^^^^^^^^^^^^^\n\nAs mentioned in the previous section, the dotfiles and other relevant metadata\nfor are sent through special report channels to a FlowCraft component that is\nresponsible for compiling all the information and generate a single report\nfile at the end of each pipeline run.\n\nThis component is specified in ``flowcraft.generator.templates.report_compiler.nf``\nand it consists of two nextflow processes:\n\n- First, the **report** process receives the data from each executed process that\n  sends report data and runs the ``flowcraft/bin/prepare_reports.py`` script\n  on that data. This script will simply merge metadata and dotfiles information\n  in a single JSON file. 
This file contains the following keys:\n\n    - ``reportJson``: The data in **.report.json** file.\n    - ``versions``: The data in **.versions** file.\n    - ``trace``: The data in **.command.trace** file.\n    - ``processId``: The process ID\n    - ``pipelineId``: The pipeline ID that defaults to one, unless specified in\n      the parameters.\n    - ``projectid``: The project ID that defaults to one, unless specified in\n      the parameters.\n    - ``userId``: The user ID that defaults to one, unless specified in\n      the parameters.\n    - ``username``: The user name that defaults to *user*, unless specified in\n      the parameters\n    - ``processName``: The name of the flowcraft component.\n    - ``workdir``: The work directory where the process was executed.\n\n- Second, all JSON files created in the process above are merged\n  and a single reports JSON file is created. This file will contains the\n  following structure::\n\n    reportJSON = {\n        \"data\": {\n            \"results\": [<array of report JSONs>]\n        }\n    }\n"
  },
  {
    "path": "docs/dev/process_dotfiles.rst",
    "content": ".. _dotfiles:\n\nDotfiles\n========\n\nSeveral dotfiles (files prefixed by a single ``.``, as in ``.status``) are\ncreated at the beginning of every nextflow process that has the following\nplaceholder (see :ref:`create-process`): ::\n\n    process myProcess {\n        {% include \"post.txt\" ignore missing %}\n        (...)\n    }\n\nThe actual script that creates the dotfiles is found in\n``flowcraft/bin``, is called ``set_dotfiles.sh`` and executes the\nfollowing command::\n\n    touch .status .warning .fail .report.json .versions\n\nStatus\n------\n\nThe ``.status`` file simply stores a string with the run status of the process.\nThe supported status are:\n\n- ``pass``: The process finished successfully\n- ``fail``: The process ran without unexpected issues but failed due to some\n  quality control check\n- ``error``: The process exited with an unexpected error.\n\nWarning\n-------\n\nThe ``.warning`` file stores any warnings that may occur during the execution\nof the process. There is no particular format for the warning messages other\nthan that each individual warning should be in a separate line.\n\nFail\n----\n\nThe ``.fail`` file stores any fail messages that may occur during the\nexecution of the process. When this occurs, the ``.status`` channel must have\nthe ``fail`` string as well. As in the warning dotfile, there is no\nparticular format for the fail message.\n\n.. _report-json:\n\nReport JSON\n-----------\n\n.. important::\n    The general specification of the report JSON changed in version 1.2.2.\n    See the `issue tracker <https://github.com/assemblerflow/flowcraft/issues/96>`_\n    for details.\n\nThe ``.report.json`` file stores any information from a given process that is\ndeemed worthy of being reported and displayed at the end of the pipeline.\nAny information can be stored in this file, as long as it is in JSON format,\nbut there are a couple of recommendations that are necessary to follow\nfor them to be processed by a reporting web app (Currently hosted at\n`flowcraft-webapp <https://github.com/assemblerflow/flowcraft-webapp>`_). However, if\ndata processing will be performed with custom scripts, feel free to specify\nyour own format.\n\nInformation for tables\n^^^^^^^^^^^^^^^^^^^^^^\n\nInformation meant to be displayed in tables should be in the following\nformat::\n\n    json_dic = {\n        \"tableRow\": [{\n            \"sample\": \"A\",\n            \"data\": [{\n                \"header\": \"Raw BP\",\n                \"value\": 123,\n                \"table\": \"qc\"\n            }, {\n                \"header\": \"Coverage\",\n                \"value\": 32,\n                \"table\": \"qc\"\n            }]\n        }, {\n            \"sample\": \"B\",\n            \"data\": [{\n                \"header\": \"Coverage\",\n                \"value\": 35,\n                \"table\": \"qc\"\n            }]\n        }]\n    }\n\nThis provides table information for multiple samples in the same process. In\nthis case, data for two samples is provided. For each sample, values for\none or more headers can be provided. For instance, this report provides\ninformation about the **Raw BP** and **Coverage** for sample **A** and this\ninformation should go to the **qc** table. 
If any other information is relevant\nto build the table, feel free to add more elements to the JSON.\n\nInformation for plots\n^^^^^^^^^^^^^^^^^^^^^\n\nInformation meant to be displayed in plots should be in the following format::\n\n    json_dic = {\n        \"plotData\": [{\n            \"sample\": \"strainA\",\n            \"data\": {\n                \"sparkline\": 23123,\n                \"otherplot\": [1,2,3]\n             }\n        }],\n    }\n\nAs in the table JSON, *plotData* should be an array with an entry for each\nsample. The data for each sample should be another JSON where the keys are\nthe *plot signatures*, so that we know to which plot the data belongs. The\ncorresponding values are whatever data object you need.\n\nOther information\n^^^^^^^^^^^^^^^^^\n\nOther than tables and plots, which have a somewhat predefined format, there\nis not particular format for other information. They will simply store the\ndata of interest to report and it will be the job of a downstream report app\nto process that data into an actual visual report.\n\n.. _versions:\n\nVersions\n--------\n\nThe ``.version`` dotfile should contain a list of JSON objects with the\nversion information of the programs used in any given process. There are\nonly two required key:value pairs:\n\n- ``program``: String with the name of the software/script/template\n- ``version``: String with the version of said software.\n\nAs an example::\n\n    version = {\n        \"program\": \"abricate\"\n        \"version\": \"0.3.7\"\n    }\n\nKey:value pairs with other metadata can be included at will for downstream\nprocessing."
  },
  {
    "path": "docs/dev/reports.rst",
    "content": "Reports\n=======\n\nReport JSON specification\n-------------------------\n\nThe report JSON is quite flexibly on the information it can contain. Here are\nsome guidelines to promote consistency on the reports generated by each component.\nIn general, the reports file is an array of JSON objects that contain relevant\ninformation for each executed process in the pipeline::\n\n    reportFile = [{<processA/tagA reports>}, {<processB/tagB reports>}, ... ]\n\n\nNextflow metadata\n^^^^^^^^^^^^^^^^^\n\nThe nextflow metada is automatically added to the reportFile as a single JSON entry\nwith the ``nfMetadata`` key that contains the following information::\n\n        \"nfMetadata\": {\n            \"scriptId\": \"${workflow.scriptId}\",\n            \"scriptName\": \"${workflow.scriptId}\",\n            \"profile\": \"${workflow.profile}\",\n            \"container\": \"${workflow.container}\",\n            \"containerEngine\": \"${workflow.containerEngine}\",\n            \"commandLine\": \"${workflow.commandLine}\",\n            \"runName\": \"${workflow.runName}\",\n            \"sessionId\": \"${workflow.sessionId}\",\n            \"projectDir\": \"${workflow.projectDir}\",\n            \"launchDir\": \"${workflow.launchDir}\",\n            \"start_time\": \"${workflow.start}\"\n        }\n\n.. note::\n    Unlike the remaining JSON entries in the report file, which are generated for\n    each process execution, the ``nfMetadata`` entry is generated only once per\n    project execution.\n\nRoot\n^^^^\n\nThe reports contained in the ``reports.json`` file for each process execution\nare added to the root object::\n\n    {\n        \"pipelineId\": 1,\n        \"processId\": pid,\n        \"processName\": task_name,\n        \"projectid\": RUN_NAME,\n        \"reportJson\": reports,\n        \"runName\": RUN_NAME,\n        \"scriptId\": SCRIPT_ID,\n        \"versions\": versions,\n        \"trace\": trace,\n        \"userId\": 1,\n        \"username\": \"user\",\n        \"workdir\": dirname(abspath(report_json))\n    }\n\nThe other key:values are added automatically when the reports are compiled for each\nprocess execution.\n\nVersions\n^^^^^^^^\n\nInside the root, the signature key for software version information is ``versions``::\n\n    \"versions\": [{\n        \"program\": \"progA\",\n        \"version\": \"1.0.0\",\n        \"build\": \"1\"\n    }, {\n        \"program\": \"progB\",\n        \"version\": \"2.1\"\n    }]\n\nOnly the ``program`` and ``version`` keys are mandatory.\n\nReportJson\n^^^^^^^^^^\n\nTable data\n~~~~~~~~~~\n\nInside ``reportJson``, the signature key for table data is ``tableRow``::\n\n    \"reportJson\": {\n        \"tableRow\": [{\n            \"sample\": \"strainA\",\n            \"data\": [{\n                \"header\": \"Raw BP\",\n                \"value\": 123,\n                \"table\": \"qc\",\n            }, {\n                \"header\": \"Coverage\",\n                \"value\": 32,\n                \"table\": \"qc\"\n            }],\n            \"sample\": \"strainB\",\n            \"data\": [{\n                \"header\": \"Raw BP\",\n                \"value\": 321,\n                \"table\": \"qc\",\n            }, {\n                \"header\": \"Coverage\",\n                \"value\": 22,\n                \"table\": \"qc\"\n            }]\n        }]\n   }\n\n``tableRow`` should contain an array of JSON for each sample with two key:value pairs:\n\n    - ``sample``: Sample name\n    - ``data``: Table data (see below).\n\n``data`` should 
be an array of JSON with at least three key:value pairs:\n\n    - ``header``: Column header\n    - ``value``: The data value\n    - ``table``: Informs to which table this data should go.\n\n.. note::\n    Available ``table`` keys: ``typing``, ``qc``, ``assembly``, ``abricate``,\n    ``chewbbaca``.\n\n\nPlot data\n~~~~~~~~~\n\nInside ``reportJson``, the signature key for plot data is ``plotData``::\n\n    \"reportJson\": {\n        \"plotData\": [{\n            \"sample\": \"strainA\",\n            \"data\": {\n                \"sparkline\": 23123,\n                \"otherplot\": [1,2,3]\n             }\n        }],\n    }\n\n``plotData`` should contain an array of JSON for each sample with two key:value pairs:\n\n    - ``sample``: Sample name\n    - ``data``: Plot data (see below).\n\n``data`` should contain a JSON object with the plot signatures as keys, and the relevant\nplot data as value. This data can be any object (integer, float, array, JSON, etc).\n**It will be up to the components in the flowcraft web application to parse this data\nand generate the appropriate chart.**\n\nWarnings and fails\n~~~~~~~~~~~~~~~~~~\n\nInside ``reportJson``, the signature key for warnings is ``warnings`` and for\nfailures is ``fail``::\n\n    \"reportJson\": {\n        \"warnings\": [{\n            \"sample\": \"strainA\",\n            \"table\": \"qc\",\n            \"value\": [\"message 1\", \"message 2\"]\n        }],\n        \"fail\": [{\n            \"sample\": \"strainA\",\n            \"table\": \"assembly\",\n            \"value\": [\"message 1\"]\n        }]\n    }\n\n\n``warnings``/``fail`` should contain an array of JSON for each sample with\ntwo key:value pairs:\n\n    - ``sample``: Sample name\n    - ``value``: An array with one or more string messages.\n    - ``table`` **[optional]**: If a table signature is provided, the warning/fail\n      messages information will appear on that table. Otherwise, it will appear as\n      a general warning/error that is associated to the sample but not to any particular\n      table.\n"
  },
  {
    "path": "docs/flowcraft.flowcraft.rst",
    "content": "flowcraft\\.flowcraft module\n===========================\n\n.. automodule:: flowcraft.flowcraft\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.annotation.rst",
    "content": "flowcraft\\.generator\\.components\\.annotation module\n===================================================\n\n.. automodule:: flowcraft.generator.components.annotation\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.assembly.rst",
    "content": "flowcraft\\.generator\\.components\\.assembly module\n=================================================\n\n.. automodule:: flowcraft.generator.components.assembly\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.assembly_processing.rst",
    "content": "flowcraft\\.generator\\.components\\.assembly\\_processing module\n=============================================================\n\n.. automodule:: flowcraft.generator.components.assembly_processing\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.distance_estimation.rst",
    "content": "flowcraft\\.generator\\.components\\.distance\\_estimation module\n=============================================================\n\n.. automodule:: flowcraft.generator.components.distance_estimation\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.downloads.rst",
    "content": "flowcraft\\.generator\\.components\\.downloads module\n==================================================\n\n.. automodule:: flowcraft.generator.components.downloads\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.metagenomics.rst",
    "content": "flowcraft\\.generator\\.components\\.metagenomics module\n=====================================================\n\n.. automodule:: flowcraft.generator.components.metagenomics\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.mlst.rst",
    "content": "flowcraft\\.generator\\.components\\.mlst module\n=============================================\n\n.. automodule:: flowcraft.generator.components.mlst\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.patlas_mapping.rst",
    "content": "flowcraft\\.generator\\.components\\.patlas\\_mapping module\n========================================================\n\n.. automodule:: flowcraft.generator.components.patlas_mapping\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.reads_quality_control.rst",
    "content": "flowcraft\\.generator\\.components\\.reads\\_quality\\_control module\n================================================================\n\n.. automodule:: flowcraft.generator.components.reads_quality_control\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.rst",
    "content": "flowcraft\\.generator\\.components package\n========================================\n\nSubmodules\n----------\n\n.. toctree::\n\n   flowcraft.generator.components.annotation\n   flowcraft.generator.components.assembly\n   flowcraft.generator.components.assembly_processing\n   flowcraft.generator.components.distance_estimation\n   flowcraft.generator.components.downloads\n   flowcraft.generator.components.metagenomics\n   flowcraft.generator.components.mlst\n   flowcraft.generator.components.patlas_mapping\n   flowcraft.generator.components.reads_quality_control\n   flowcraft.generator.components.typing\n\nModule contents\n---------------\n\n.. automodule:: flowcraft.generator.components\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.components.typing.rst",
    "content": "flowcraft\\.generator\\.components\\.typing module\n===============================================\n\n.. automodule:: flowcraft.generator.components.typing\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.engine.rst",
    "content": "flowcraft\\.generator\\.engine module\n===================================\n\n.. automodule:: flowcraft.generator.engine\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.error_handling.rst",
    "content": "flowcraft\\.generator\\.error\\_handling module\n============================================\n\n.. automodule:: flowcraft.generator.error_handling\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.footer_skeleton.rst",
    "content": "flowcraft\\.generator\\.footer\\_skeleton module\n=============================================\n\n.. automodule:: flowcraft.generator.footer_skeleton\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.header_skeleton.rst",
    "content": "flowcraft\\.generator\\.header\\_skeleton module\n=============================================\n\n.. automodule:: flowcraft.generator.header_skeleton\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.inspect.rst",
    "content": "flowcraft\\.generator\\.inspect module\n====================================\n\n.. automodule:: flowcraft.generator.inspect\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.pipeline_parser.rst",
    "content": "flowcraft\\.generator\\.pipeline\\_parser module\n=============================================\n\n.. automodule:: flowcraft.generator.pipeline_parser\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.process.rst",
    "content": "flowcraft\\.generator\\.process module\n====================================\n\n.. automodule:: flowcraft.generator.process\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.process_details.rst",
    "content": "flowcraft\\.generator\\.process\\_details module\n=============================================\n\n.. automodule:: flowcraft.generator.process_details\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.recipe.rst",
    "content": "flowcraft\\.generator\\.recipe module\n===================================\n\n.. automodule:: flowcraft.generator.recipe\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.generator.rst",
    "content": "flowcraft\\.generator package\n============================\n\nSubpackages\n-----------\n\n.. toctree::\n\n    flowcraft.generator.components\n\nSubmodules\n----------\n\n.. toctree::\n\n   flowcraft.generator.engine\n   flowcraft.generator.error_handling\n   flowcraft.generator.footer_skeleton\n   flowcraft.generator.header_skeleton\n   flowcraft.generator.inspect\n   flowcraft.generator.pipeline_parser\n   flowcraft.generator.process\n   flowcraft.generator.process_details\n   flowcraft.generator.recipe\n\nModule contents\n---------------\n\n.. automodule:: flowcraft.generator\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.rst",
    "content": "flowcraft package\n=================\n\nSubpackages\n-----------\n\n.. toctree::\n\n    flowcraft.generator\n    flowcraft.templates\n    flowcraft.tests\n\nSubmodules\n----------\n\n.. toctree::\n\n   flowcraft.flowcraft\n\nModule contents\n---------------\n\n.. automodule:: flowcraft\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.assembly_report.rst",
    "content": "flowcraft\\.templates\\.assembly\\_report module\n=============================================\n\n.. automodule:: flowcraft.templates.assembly_report\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.fastqc.rst",
    "content": "flowcraft\\.templates\\.fastqc module\n===================================\n\n.. automodule:: flowcraft.templates.fastqc\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.fastqc_report.rst",
    "content": "flowcraft\\.templates\\.fastqc\\_report module\n===========================================\n\n.. automodule:: flowcraft.templates.fastqc_report\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.flowcraft_utils.flowcraft_base.rst",
    "content": "flowcraft\\.templates\\.flowcraft\\_utils\\.flowcraft\\_base module\n==============================================================\n\n.. automodule:: flowcraft.templates.flowcraft_utils.flowcraft_base\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.flowcraft_utils.rst",
    "content": "flowcraft\\.templates\\.flowcraft\\_utils package\n==============================================\n\nSubmodules\n----------\n\n.. toctree::\n\n   flowcraft.templates.flowcraft_utils.flowcraft_base\n\nModule contents\n---------------\n\n.. automodule:: flowcraft.templates.flowcraft_utils\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.integrity_coverage.rst",
    "content": "flowcraft\\.templates\\.integrity\\_coverage module\n================================================\n\n.. automodule:: flowcraft.templates.integrity_coverage\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.mapping2json.rst",
    "content": "flowcraft\\.templates\\.mapping2json module\n=========================================\n\n.. automodule:: flowcraft.templates.mapping2json\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.mashdist2json.rst",
    "content": "flowcraft\\.templates\\.mashdist2json module\n==========================================\n\n.. automodule:: flowcraft.templates.mashdist2json\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.mashscreen2json.rst",
    "content": "flowcraft\\.templates\\.mashscreen2json module\n============================================\n\n.. automodule:: flowcraft.templates.mashscreen2json\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.megahit.rst",
    "content": "flowcraft\\.templates\\.megahit module\n====================================\n\n.. automodule:: flowcraft.templates.megahit\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.metaspades.rst",
    "content": "flowcraft\\.templates\\.metaspades module\n=======================================\n\n.. automodule:: flowcraft.templates.metaspades\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.pATLAS_consensus_json.rst",
    "content": "flowcraft\\.templates\\.pATLAS\\_consensus\\_json module\n====================================================\n\n.. automodule:: flowcraft.templates.pATLAS_consensus_json\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.pipeline_status.rst",
    "content": "flowcraft\\.templates\\.pipeline\\_status module\n=============================================\n\n.. automodule:: flowcraft.templates.pipeline_status\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.process_abricate.rst",
    "content": "flowcraft\\.templates\\.process\\_abricate module\n==============================================\n\n.. automodule:: flowcraft.templates.process_abricate\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.process_assembly.rst",
    "content": "flowcraft\\.templates\\.process\\_assembly module\n==============================================\n\n.. automodule:: flowcraft.templates.process_assembly\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.process_assembly_mapping.rst",
    "content": "flowcraft\\.templates\\.process\\_assembly\\_mapping module\n=======================================================\n\n.. automodule:: flowcraft.templates.process_assembly_mapping\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.rst",
    "content": "flowcraft\\.templates package\n============================\n\nSubpackages\n-----------\n\n.. toctree::\n\n    flowcraft.templates.flowcraft_utils\n\nSubmodules\n----------\n\n.. toctree::\n\n   flowcraft.templates.assembly_report\n   flowcraft.templates.fastqc\n   flowcraft.templates.fastqc_report\n   flowcraft.templates.integrity_coverage\n   flowcraft.templates.mapping2json\n   flowcraft.templates.mashdist2json\n   flowcraft.templates.mashscreen2json\n   flowcraft.templates.megahit\n   flowcraft.templates.metaspades\n   flowcraft.templates.pATLAS_consensus_json\n   flowcraft.templates.pipeline_status\n   flowcraft.templates.process_abricate\n   flowcraft.templates.process_assembly\n   flowcraft.templates.process_assembly_mapping\n   flowcraft.templates.skesa\n   flowcraft.templates.spades\n   flowcraft.templates.trimmomatic\n   flowcraft.templates.trimmomatic_report\n\nModule contents\n---------------\n\n.. automodule:: flowcraft.templates\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.skesa.rst",
    "content": "flowcraft\\.templates\\.skesa module\n==================================\n\n.. automodule:: flowcraft.templates.skesa\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.spades.rst",
    "content": "flowcraft\\.templates\\.spades module\n===================================\n\n.. automodule:: flowcraft.templates.spades\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.trimmomatic.rst",
    "content": "flowcraft\\.templates\\.trimmomatic module\n========================================\n\n.. automodule:: flowcraft.templates.trimmomatic\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.templates.trimmomatic_report.rst",
    "content": "flowcraft\\.templates\\.trimmomatic\\_report module\n================================================\n\n.. automodule:: flowcraft.templates.trimmomatic_report\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.tests.data_pipelines.rst",
    "content": "flowcraft\\.tests\\.data\\_pipelines module\n========================================\n\n.. automodule:: flowcraft.tests.data_pipelines\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.tests.rst",
    "content": "flowcraft\\.tests package\n========================\n\nSubmodules\n----------\n\n.. toctree::\n\n   flowcraft.tests.data_pipelines\n   flowcraft.tests.test_assemblerflow\n   flowcraft.tests.test_engine\n   flowcraft.tests.test_pipeline_parser\n   flowcraft.tests.test_process_details\n   flowcraft.tests.test_processes\n   flowcraft.tests.test_sanity\n\nModule contents\n---------------\n\n.. automodule:: flowcraft.tests\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.tests.test_assemblerflow.rst",
    "content": "flowcraft\\.tests\\.test\\_assemblerflow module\n============================================\n\n.. automodule:: flowcraft.tests.test_assemblerflow\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.tests.test_engine.rst",
    "content": "flowcraft\\.tests\\.test\\_engine module\n=====================================\n\n.. automodule:: flowcraft.tests.test_engine\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.tests.test_pipeline_parser.rst",
    "content": "flowcraft\\.tests\\.test\\_pipeline\\_parser module\n===============================================\n\n.. automodule:: flowcraft.tests.test_pipeline_parser\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.tests.test_process_details.rst",
    "content": "flowcraft\\.tests\\.test\\_process\\_details module\n===============================================\n\n.. automodule:: flowcraft.tests.test_process_details\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.tests.test_processes.rst",
    "content": "flowcraft\\.tests\\.test\\_processes module\n========================================\n\n.. automodule:: flowcraft.tests.test_processes\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/flowcraft.tests.test_sanity.rst",
    "content": "flowcraft\\.tests\\.test\\_sanity module\n=====================================\n\n.. automodule:: flowcraft.tests.test_sanity\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/getting_started/installation.rst",
    "content": "Installation\n============\n\nUser installation\n-----------------\n\nFlowCraft is available as a bioconda package, which already comes with\nnextflow::\n\n    conda install flowcraft\n\nAlternatively, you can install only FlowCraft, via pip::\n\n    pip install flowcraft\n\nYou will also need a container engine (see `Container engine`_ below)\n\nContainer engine\n----------------\n\nAll components of FlowCraft are executed in docker containers, which\nmeans that you'll need to have a container engine installed. The container\nengines available are the ones supported by Nextflow:\n\n- `Docker`_,\n- `Singularity`_\n- Shifter (undocumented)\n\nIf you already have any one of these installed, you are good to go. If not,\nyou'll need to install one. We recommend singularity because it does not\nrequire the processes to run on a separate root daemon.\n\nSingularity\n:::::::::::\n\nSingularity is available to download and install `here <http://singularity.lbl.gov/install-linux>`_.\nMake sure that you have singularity v2.5.x or higher.\nNote that singularity should be installed as root and available on the machine(s) that\nwill be running the nextflow processes.\n\n.. important::\n\n    Singularity is available as a bioconda package. However, conda installs singularity\n    in user space without root privileges, which may prevent singularity images from\n    being correctly downloaded. **Therefore it is not recommended that you install\n    singularity via bioconda**.\n\nDocker\n::::::\n\nDocker can be installed following the instructions on the website:\nhttps://www.docker.com/community-edition#/download.\nTo run docker as anon-root user, you'll need to following the instructions\non the website: https://docs.docker.com/install/linux/linux-postinstall/#manage-docker-as-a-non-root-user\n\n\nDeveloper installation\n----------------------\n\nIf you are looking to contribute to FlowCraft or simply interested in\ntweaking it, clone the github repository and its submodule and then run\nsetup.py::\n\n    git clone https://github.com/assemblerflow/flowcraft.git\n    cd flowcraft\n    python3 setup.py install\n\n\n.. _Docker: https://www.nextflow.io/docs/latest/docker.html\n.. _Singularity: https://www.nextflow.io/docs/latest/singularity.html\n\n"
  },
  {
    "path": "docs/getting_started/overview.rst",
    "content": "..    include:: <isonum.txt>\n\nOverview\n========\n\nFlowCraft is an assembler of pipelines written in  nextflow_ for\nanalyses of genomic data. The premisse is simple:\n\nSoftware are container blocks |rarr| Build your lego-like pipeline |rarr| Execute it (almost) anywhere.\n\nWhat is Nextflow\n::::::::::::::::\n\nIf you do not know nextflow, be sure to check it out. It's an awesome\nframework based on the dataflow programming model used for building\nparallelized, scalable and reproducible workflows using software containers.\nIt provides an abstraction layer between the execution and the logic of the\npipeline, which means that the same pipeline code can be executed on\nmultiple platforms, from a local laptop to clusters managed with SLURM, SGE,\netc. These are quite attractive features since genomic pipelines are\nincreasingly executed on large computer clusters to handle large volumes\nof data and/or tasks. Moreover, portability and reproducibility are becoming\ncentral pillars in modern data science.\n\nWhat FlowCraft does\n:::::::::::::::::::\n\nFlowCraft is a python engine that automatically builds nextflow pipelines\nby assembling pre-made ready-to-use :ref:`components <components>`. These components are modular\npieces of software or scripts, such as ``fastqc``, ``trimmomatic``, ``spades``,\netc, that are written for nextflow and have a set of attributes, such as\ninput and output types, parameters, directives, etc. This modular nature\nallows them to be freely connected as long as they respect some basic rules,\nsuch as the input type of a component must match with the output type of\nthe preceding component. In this way, nextflow processes can be\nwritten only once, and FlowCraft is the magic glue that connects them,\nhandling the linking and forking of channels automatically. Moreover, each\ncomponent is associated with a docker image, which means that there is no\nneed to install any dependencies at all and all software runs on a\ntransparent and reliable box. To illustrate:\n\n- A linear genome assembly pipeline can be easily built using FlowCraft\n  with the following pipeline string::\n\n    trimmomatic fastqc spades\n\nWhich will generate all the necessary files to run the nextflow\npipeline on any linux system that has nextflow and a container engine.\n\n- You can easily add more components to perform assembly polishing, in this\n  case, ``pilon``::\n\n    trimmomatic fastqc spades pilon\n\n- If a new assembler comes along and you want to switch that component in the\n  pipeline, its as easy as replacing ``spades`` (or any other component)::\n\n    trimmomatic fastqc skesa pilon\n\n- And you can also fork the output of a component into multiple ones. For\n  instance, we could annotate the resulting assemblies with multiple software::\n\n    trimmomatic fastqc spades pilon (abricate | prokka)\n\n- Or fork the execution of a pipeline early on to compare different software::\n\n    trimmomatic fastqc (spades pilon | skesa pilon)\n\nThis will fork the output of ``fastqc`` into ``spades`` and ``skesa``, and\nthe pipeline will proceed independently in these two new 'lanes'.\n\n- Directives for each process can be dynamically set when building the pipeline,\n  such as the cpu/RAM usage or the software version::\n\n    trimmomatic={'cpus':'4'} fastqc={'version':'0.11.5'} skesa={'memory':'10GB'} pilon (abricate | prokka)\n\n- And extra input can be directly inserted in any part of the pipeline. 
For\n  example, it is possible to assemble genomes from both fastq files and SRR\n  accessions (downloaded from public databases) in a single workflow::\n\n    download_reads trimmomatic={'extra_input':'reads'} fastqc skesa pilon\n\nThis pipeline can be executed by providing a file with accession numbers\n(``--accessions`` parameter by default) **and** fastq reads, using the\n``--reads`` parameter defined with the ``extra_input`` directive.\n\n\nWho is FlowCraft for\n::::::::::::::::::::\n\nFlowCraft can be useful for bioinformaticians with varied levels of expertise\nthat need to executed genomic pipelines often and potentially in different\nplatforms. Building and executing pipelines requires no programming knowledge,\nbut familiarization with nextflow is highly recommended to take full advantage\nof the generated pipelines.\n\nAt the moment, the available pre-made processes are mainly focused on\nbacterial genome assembly simply because that was how we started.\nHowever, our goal is to expand the library of existing components to other\ncommonly used tools in the field of genomics and to widen the applicability\nand usefulness of FlowCraft pipelines.\n\nWhy not just write a Nextflow pipeline?\n:::::::::::::::::::::::::::::::::::::::\n\nIn many cases, building a static nextflow pipeline is sufficient for our goals.\nHowever, when building our own pipelines, we often felt the need to add\ndynamism to this process, particularly if we take into account how fast new\ntools arise and existing ones change. Our biological goals also change over\ntime and we might need different pipelines to answer different questions.\nFlowCraft makes this very easy by having a set of pre-made and ready-to-use\ncomponents that can be freely assembled. By using components (``fastqc``,\n``trimmomatic``) as its atomic elements, very complex pielines that take\nfull advantage of nextflow can be built with little effort. Moreover,\nthese components have explicit and standardized\ninput and output types, which means that the addition of new modules does not\nrequire any changes in the existing code base. They just need to take into\naccount how data will be received by the process and how data may be emitted\nfrom the process, to ensure that it can link with other components.\n\n**However, why not both?**\n\nFlowCraft generates a complete Nextflow pipeline file, which ca be used\nas a starting point for your customized processes!\n\n.. _nextflow: https://www.nextflow.io/"
  },
  {
    "path": "docs/index.rst",
    "content": ".. Templates documentation master file, created by\n   sphinx-quickstart on Thu Feb  8 09:51:21 2018.\n   You can adapt this file completely to your liking, but it should at least\n   contain the root `toctree` directive.\n\nFlowCraft\n=========\n\n.. image:: resources/logo_large.png\n   :scale: 20 %\n   :align: center\n\nA NextFlow pipeline assembler for genomics.\n\n.. _Getting Started:\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Getting Started\n\n   getting_started/overview\n   getting_started/installation\n   about/about\n\n.. _User Guide:\n\n.. toctree::\n   :maxdepth: 1\n   :caption: User Guide\n\n   user/basic_usage\n   user/pipeline_building\n   user/pipeline_configuration\n   user/pipeline_inspect\n   user/pipeline_reports\n   user/available_components\n\n.. _Developer Guide:\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Developer Guide\n\n   dev/general_orientation\n   dev/create_process\n   dev/create_template\n   dev/create_recipe\n   dev/containers\n   dev/process_dotfiles\n   dev/pipeline_reporting\n   dev/reports\n\n.. _Source API:\n\n.. toctree::\n   :maxdepth: 2\n   :caption: Source API\n\n   flowcraft"
  },
  {
    "path": "docs/make.bat",
    "content": "@ECHO OFF\n\nREM Command file for Sphinx documentation\n\npushd %~dp0\n\nif \"%SPHINXBUILD%\" == \"\" (\n\tset SPHINXBUILD=sphinx-build\n)\nset BUILDDIR=_build\nset ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .\nset I18NSPHINXOPTS=%SPHINXOPTS% .\nif NOT \"%PAPER%\" == \"\" (\n\tset ALLSPHINXOPTS=-D latex_elements.papersize=%PAPER% %ALLSPHINXOPTS%\n\tset I18NSPHINXOPTS=-D latex_elements.papersize=%PAPER% %I18NSPHINXOPTS%\n)\n\nif \"%1\" == \"\" goto help\n\nif \"%1\" == \"help\" (\n\t:help\n\techo.Please use `make ^<target^>` where ^<target^> is one of\n\techo.  html       to make standalone HTML files\n\techo.  dirhtml    to make HTML files named index.html in directories\n\techo.  singlehtml to make a single large HTML file\n\techo.  pickle     to make pickle files\n\techo.  json       to make JSON files\n\techo.  htmlhelp   to make HTML files and an HTML help project\n\techo.  qthelp     to make HTML files and a qthelp project\n\techo.  devhelp    to make HTML files and a Devhelp project\n\techo.  epub       to make an epub\n\techo.  epub3      to make an epub3\n\techo.  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter\n\techo.  text       to make text files\n\techo.  man        to make manual pages\n\techo.  texinfo    to make Texinfo files\n\techo.  gettext    to make PO message catalogs\n\techo.  changes    to make an overview over all changed/added/deprecated items\n\techo.  xml        to make Docutils-native XML files\n\techo.  pseudoxml  to make pseudoxml-XML files for display purposes\n\techo.  linkcheck  to check all external links for integrity\n\techo.  doctest    to run all doctests embedded in the documentation if enabled\n\techo.  coverage   to run coverage check of the documentation if enabled\n\techo.  dummy      to check syntax errors of document sources\n\tgoto end\n)\n\nif \"%1\" == \"clean\" (\n\tfor /d %%i in (%BUILDDIR%\\*) do rmdir /q /s %%i\n\tdel /q /s %BUILDDIR%\\*\n\tgoto end\n)\n\n\nREM Check if sphinx-build is available and fallback to Python version if any\n%SPHINXBUILD% 1>NUL 2>NUL\nif errorlevel 9009 goto sphinx_python\ngoto sphinx_ok\n\n:sphinx_python\n\nset SPHINXBUILD=python -m sphinx.__init__\n%SPHINXBUILD% 2> nul\nif errorlevel 9009 (\n\techo.\n\techo.The 'sphinx-build' command was not found. Make sure you have Sphinx\n\techo.installed, then set the SPHINXBUILD environment variable to point\n\techo.to the full path of the 'sphinx-build' executable. Alternatively you\n\techo.may add the Sphinx directory to PATH.\n\techo.\n\techo.If you don't have Sphinx installed, grab it from\n\techo.http://sphinx-doc.org/\n\texit /b 1\n)\n\n:sphinx_ok\n\n\nif \"%1\" == \"html\" (\n\t%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. The HTML pages are in %BUILDDIR%/html.\n\tgoto end\n)\n\nif \"%1\" == \"dirhtml\" (\n\t%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.\n\tgoto end\n)\n\nif \"%1\" == \"singlehtml\" (\n\t%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. 
The HTML pages are in %BUILDDIR%/singlehtml.\n\tgoto end\n)\n\nif \"%1\" == \"pickle\" (\n\t%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished; now you can process the pickle files.\n\tgoto end\n)\n\nif \"%1\" == \"json\" (\n\t%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished; now you can process the JSON files.\n\tgoto end\n)\n\nif \"%1\" == \"htmlhelp\" (\n\t%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished; now you can run HTML Help Workshop with the ^\n.hhp project file in %BUILDDIR%/htmlhelp.\n\tgoto end\n)\n\nif \"%1\" == \"qthelp\" (\n\t%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished; now you can run \"qcollectiongenerator\" with the ^\n.qhcp project file in %BUILDDIR%/qthelp, like this:\n\techo.^> qcollectiongenerator %BUILDDIR%\\qthelp\\Templates.qhcp\n\techo.To view the help file:\n\techo.^> assistant -collectionFile %BUILDDIR%\\qthelp\\Templates.ghc\n\tgoto end\n)\n\nif \"%1\" == \"devhelp\" (\n\t%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished.\n\tgoto end\n)\n\nif \"%1\" == \"epub\" (\n\t%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. The epub file is in %BUILDDIR%/epub.\n\tgoto end\n)\n\nif \"%1\" == \"epub3\" (\n\t%SPHINXBUILD% -b epub3 %ALLSPHINXOPTS% %BUILDDIR%/epub3\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. The epub3 file is in %BUILDDIR%/epub3.\n\tgoto end\n)\n\nif \"%1\" == \"latex\" (\n\t%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished; the LaTeX files are in %BUILDDIR%/latex.\n\tgoto end\n)\n\nif \"%1\" == \"latexpdf\" (\n\t%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex\n\tcd %BUILDDIR%/latex\n\tmake all-pdf\n\tcd %~dp0\n\techo.\n\techo.Build finished; the PDF files are in %BUILDDIR%/latex.\n\tgoto end\n)\n\nif \"%1\" == \"latexpdfja\" (\n\t%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex\n\tcd %BUILDDIR%/latex\n\tmake all-pdf-ja\n\tcd %~dp0\n\techo.\n\techo.Build finished; the PDF files are in %BUILDDIR%/latex.\n\tgoto end\n)\n\nif \"%1\" == \"text\" (\n\t%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. The text files are in %BUILDDIR%/text.\n\tgoto end\n)\n\nif \"%1\" == \"man\" (\n\t%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. The manual pages are in %BUILDDIR%/man.\n\tgoto end\n)\n\nif \"%1\" == \"texinfo\" (\n\t%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.\n\tgoto end\n)\n\nif \"%1\" == \"gettext\" (\n\t%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. 
The message catalogs are in %BUILDDIR%/locale.\n\tgoto end\n)\n\nif \"%1\" == \"changes\" (\n\t%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.The overview file is in %BUILDDIR%/changes.\n\tgoto end\n)\n\nif \"%1\" == \"linkcheck\" (\n\t%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Link check complete; look for any errors in the above output ^\nor in %BUILDDIR%/linkcheck/output.txt.\n\tgoto end\n)\n\nif \"%1\" == \"doctest\" (\n\t%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Testing of doctests in the sources finished, look at the ^\nresults in %BUILDDIR%/doctest/output.txt.\n\tgoto end\n)\n\nif \"%1\" == \"coverage\" (\n\t%SPHINXBUILD% -b coverage %ALLSPHINXOPTS% %BUILDDIR%/coverage\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Testing of coverage in the sources finished, look at the ^\nresults in %BUILDDIR%/coverage/python.txt.\n\tgoto end\n)\n\nif \"%1\" == \"xml\" (\n\t%SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. The XML files are in %BUILDDIR%/xml.\n\tgoto end\n)\n\nif \"%1\" == \"pseudoxml\" (\n\t%SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml.\n\tgoto end\n)\n\nif \"%1\" == \"dummy\" (\n\t%SPHINXBUILD% -b dummy %ALLSPHINXOPTS% %BUILDDIR%/dummy\n\tif errorlevel 1 exit /b 1\n\techo.\n\techo.Build finished. Dummy builder generates no files.\n\tgoto end\n)\n\n:end\npopd\n"
  },
  {
    "path": "docs/setup.rst",
    "content": "setup module\n============\n\n.. automodule:: setup\n    :members:\n    :undoc-members:\n    :show-inheritance:\n"
  },
  {
    "path": "docs/user/available_components.rst",
    "content": ".. _components:\n\nComponents\n==========\n\nThese are the currently available FlowCraft components with a short\ndescription of their tasks. For a more detailed information, follow the\nlinks of each component.\n\n\nDownload\n--------\n\n- :doc:`components/reads_download`: Downloads reads from the SRA/ENA public\n  databases from a list of accessions.\n\n- :doc:`components/fasterq_dump`: Downloads reads from the SRA public databases\n  from a list of accessions, using ``fasterq-dump``.\n\nReads Quality Control\n--------------------\n\n- :doc:`components/check_coverage`: Estimates the coverage for each sample and\n  filters FastQ files according to a specified minimum coverage threshold.\n\n- :doc:`components/fastqc`: Runs FastQC on paired-end FastQ files.\n\n- :doc:`components/fastqc_trimmomatic`: Runs Trimmomatic on\n  paired-end FastQ files informed by the FastQC report.\n\n- :doc:`components/filter_poly`:  Runs PrinSeq on paired-end\n  FastQ files to remove low complexity sequences.\n\n- :doc:`components/integrity_coverage`: Tests the integrity\n  of the provided FastQ files, provides the option to filter FastQ files\n  based on the expected assembly coverage and provides information about\n  the maximum read length and sequence encoding.\n\n- :doc:`components/trimmomatic`: Runs Trimmomatic on paired-end FastQ files.\n\n- :doc:`components/downsample_fastq`: Subsamples fastq files up to a target coverage\n  depth.\n\n\nAssembly\n--------\n\n- :doc:`components/megahit`: Assembles metagenomic paired-end FastQ files\n  using megahit.\n\n- :doc:`components/metaspades`: Assembles metagenomic paired-end FastQ files\n  using metaSPAdes.\n\n- :doc:`components/skesa`: Assembles paired-end FastQ files using\n  skesa.\n\n- :doc:`components/spades`: Assembles paired-end FastQ files\n  using SPAdes.\n\nPost-assembly\n-------------\n\n- :doc:`components/pilon`: Corrects and filters assemblies using Pilon.\n\n- :doc:`components/process_skesa`: Processes the assembly output\n  from Skesa and performs filtering base on quality criteria of GC content\n  k-mer coverage and read length.\n\n- :doc:`components/process_spades`: Processes the assembly output\n  from Spades and performs filtering base on quality criteria of GC content\n  k-mer coverage and read length.\n\nBinning\n-------\n\n- :doc:`components/maxbin2`: An automatic tool for binning metagenomic sequences\n\nAnnotation\n----------\n\n- :doc:`components/abricate`: Performs anti-microbial gene screening using\n  abricate.\n\n- :doc:`components/card_rgi`: Performs anti-microbial resistance gene screening using\n  CARD rgi (with contigs as input).\n\n- :doc:`components/prokka`: Performs assembly annotation using prokka.\n\nDistance Estimation\n-------------------\n\n- :doc:`components/mash_dist`: Executes mash distance against a reference index\n  plasmid database and generates a `JSON` for pATLAS. This component calculates\n  pairwise distances between sequences (one from the database and the query\n  sequence). However if a different database is provided it can use mash dist\n  for other purposes.\n\n- :doc:`components/mash_screen`: Performs mash screen against a reference index\n  plasmid database and generates a JSON input file for pATLAS. 
This component\n  searches for containment of a given sequence in read sequencing data.\n  However, if a different database is provided, it can use mash screen for other\n  purposes.\n\n- :doc:`components/fast_ani`: Performs pairwise comparisons between fastas,\n  given a multifasta as input for fastANI. It will split the multifasta into\n  single fastas that will then be provided as a matrix. The output will be\n  all pairwise comparisons that pass the minimum of 50 aligned sequences with a\n  default length of 200 bp.\n\n- :doc:`components/mash_sketch_fasta`: Performs mash sketch for fasta files.\n\n- :doc:`components/mash_sketch_fastq`: Performs mash sketch for fastq files.\n\nMapping\n-------\n\n- :doc:`components/assembly_mapping`: Performs a mapping\n  procedure of FastQ files onto their assembly and performs filtering\n  based on quality criteria of read coverage and genome size.\n\n- :doc:`components/bowtie`: Aligns short paired-end sequencing reads to long reference sequences.\n\n- :doc:`components/mapping_patlas`: Performs read mapping and generates a JSON\n  input file for pATLAS.\n\n- :doc:`components/remove_host`: Performs read mapping with bowtie2\n  against the target host genome (default hg19) and removes the mapped reads.\n\n- :doc:`components/retrieve_mapped`: Retrieves the mapped reads of a previous\n  bowtie2 mapping process.\n\nTaxonomic Profiling\n---------------------\n\n- :doc:`components/kraken`: Performs taxonomic identification with kraken on FastQ files\n  (minikrakenDB2017 as default database).\n\n- :doc:`components/kraken2`: Performs taxonomic identification with kraken2 on FastQ files\n  (minikraken2_v1_8GB as default database).\n\n- :doc:`components/midas_species`: Performs taxonomic identification on FastQ files at the\n  species level with midas (requires database).\n\nTyping\n------\n\n- :doc:`components/chewbbaca`: Performs a core-genome/whole-genome Multilocus\n  Sequence Typing analysis on an assembly using ChewBBACA.\n\n- :doc:`components/metamlst`: Checks the Sequence Type of metagenomic reads using\n  Multilocus Sequence Typing.\n\n- :doc:`components/mlst`: Checks the Sequence Type of an assembly using\n  Multilocus Sequence Typing.\n\n- :doc:`components/patho_typing`: *In silico* pathogenic typing from raw\n  illumina reads.\n\n- :doc:`components/seq_typing`: Determines the type of a given sample from a set\n  of reference sequences.\n\n- :doc:`components/sistr`: Serovar predictions from whole-genome sequence assemblies\n  by determination of antigen gene and cgMLST gene alleles.\n\n- :doc:`components/momps`: Multi-locus sequence typing for Legionella pneumophila\n  from assemblies and reads.\n
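\nExample\n-------\n\nAs an illustration of how components from this list can be combined (the\nparticular choice of components here is arbitrary), a pipeline is assembled by\npassing them to the ``build`` mode::\n\n    flowcraft build -t \"trimmomatic fastqc spades abricate\" -o my_pipe.nf\n\nSee :doc:`basic_usage` for details on building and running pipelines.\n"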
  },
  {
    "path": "docs/user/basic_usage.rst",
    "content": "Basic Usage\n===========\n\nFlowCraft has currently two execution modes, ``build`` and ``inspect``, that are\nused to build and inspect the nextflow pipeline, respectively. However, a\n``report`` mode is also being developed.\n\nBuild\n-----\n\nAssembling a pipeline\n:::::::::::::::::::::\n\nPipelines are generated using the ``build`` mode of FlowCraft\nand the ``-t`` parameter to specify the :ref:`components <components>` inside quotes::\n\n    flowcraft build -t \"trimmomatic fastqc spades\" -o my_pipe.nf\n\nAll components should be written inside quotes and be space separated.\nThis command will generate a linear pipeline with three components on the\ncurrent working directory (for more features and tips on how pipelines can be\nbuilt, see the :doc:`pipeline building <pipeline_building>` section). **A linear pipeline means that\nthere are no bifurcations between components, and the input data will flow\nlinearly.**\n\nThe rationale of how the data flows across the pipeline is simple and intuitive.\nData enters a component and is processed in some way, which may result on the\ncreation of result files (stored in the ``results`` directory) and reports\nfiles (stored in the ``reports`` directory) (see `Results and reports`_ below). If that\ncomponent has an ``output_type``, it will feed the processed data into the\nnext component (or components) and this will repeated until the end of the\npipeline.\n\nIf you are interesting in checking the pipeline DAG tree, open the\n``my_pipe.html`` file (same name as the pipeline with the html extension)\nin any browser.\n\n.. image:: ../resources/fork_4.png\n   :scale: 80 %\n   :align: center\n\nThe ``integrity_coverage`` component is a dependency of ``trimmomatic``, so\nit was automatically added to the pipeline.\n\n.. important::\n    Not all pipeline configurations will work. **You always need to ensure\n    that the output type of a component matches the input type of the next\n    component**, otherwise FlowCraft will exit with an error.\n\nPipeline directory\n::::::::::::::::::\n\nIn addition to the main nextflow pipeline file (``my_pipe.nf``),\nFlowCraft will write several auxiliary files that are necessary for\nthe pipeline to run. The contents of the directory should look something like\nthis::\n\n    $ ls\n    bin                lib           my_pipe.nf       params.config     templates\n    containers.config  my_pipe.html  nextflow.config  profiles.config   resources.config  user.config\n\nYou do not have to worry about most of these files. However, the\n``*.config`` files can be modified to change several aspects of the pipeline run\n(see :doc:`pipeline_configuration` for more details). Briefly:\n\n- ``params.config``: Contains all the available parameters of the pipeline (see\n  `Parameters`_ below). These can be changed here, or provided directly on\n  run-time (e.g.: ``nextflow run --fastq value``).\n- ``resources.config``: Contains the resource directives of the pipeline processes,\n  such as cpus, allocated RAM and other nextflow process directives.\n- ``containers.config``: Specifies the container and version tag of each process\n  in the pipeline.\n- ``profiles.config``: Contains a number of predefined profiles of executor and\n  container engine.\n- ``user.config``: Empty configuration file that is not over-written if you build\n  another pipeline in the same directory. 
\nParameters\n::::::::::\n\nThe parameters of the pipeline can be viewed by running the pipeline file\nwith ``nextflow`` and using the ``--help`` option::\n\n    $ nextflow run my_pipe.nf --help\n    N E X T F L O W  ~  version 0.30.1\n    Launching `my_pipe.nf` [kickass_mcclintock] - revision: 480b3455ba\n\n    ============================================================\n                    F L O W C R A F T\n    ============================================================\n    Built using flowcraft v1.2.1.dev1\n\n\n    Usage:\n        nextflow run my_pipe.nf\n\n           --fastq                     Path expression to paired-end fastq files. (default: fastq/*_{1,2}.*) (default: 'fastq/*_{1,2}.*')\n\n           Component 'INTEGRITY_COVERAGE_1_1'\n           ----------------------------------\n           --genomeSize_1_1            Genome size estimate for the samples in Mb. It is used to estimate the coverage and other assembly parameters and checks (default: 1)\n           --minCoverage_1_1           Minimum coverage for a sample to proceed. By default it's set to 0 to allow any coverage (default: 0)\n\n           Component 'TRIMMOMATIC_1_2'\n           ---------------------------\n           --adapters_1_2              Path to adapters files, if any. (default: 'None')\n           --trimSlidingWindow_1_2     Perform sliding window trimming, cutting once the average quality within the window falls below a threshold (default: '5:20')\n           --trimLeading_1_2           Cut bases off the start of a read, if below a threshold quality (default: 3)\n           --trimTrailing_1_2          Cut bases off the end of a read, if below a threshold quality (default: 3)\n           --trimMinLength_1_2         Drop the read if it is below a specified length  (default: 55)\n\n           Component 'FASTQC_1_3'\n           ----------------------\n           --adapters_1_3              Path to adapters files, if any. (default: 'None')\n\n           Component 'SPADES_1_4'\n           ----------------------\n           --spadesMinCoverage_1_4     The minimum number of reads to consider an edge in the de Bruijn graph during the assembly (default: 2)\n           --spadesMinKmerCoverage_1_4 Minimum contigs K-mer coverage. After assembly only keep contigs with reported k-mer coverage equal or above this value (default: 2)\n           --spadesKmers_1_4           If 'auto' the SPAdes k-mer lengths will be determined from the maximum read length of each assembly. If 'default', SPAdes will use the default k-mer lengths.  (default: 'auto')\n\nAll these parameters are specific to the components of the pipeline. However,\nthe main input parameter (or parameters) of the pipeline is always available.\n**In this case, since the pipeline started with fastq paired-end files as the\nmain input, the** ``--fastq`` **parameter is available.** If the pipeline started\nwith any other input type or with more than one input type, the appropriate\nparameters will appear (more information in the :ref:`raw input types<rawInput>` section).\n\nThe parameters are composed of their name (``adapters``) followed by the ID of\nthe process it refers to (``_1_2``). The IDs can be consulted in the DAG tree\n(See `Assembling a pipeline`_). This is done to prevent issues when duplicating\ncomponents and, as such, **all parameters will be independent between different\ncomponents**. This behaviour can be changed when building the pipeline by using the\n``--merge-params`` option (See :ref:`mergeParams`).\n\n.. note::\n    The ``--merge-params`` option of the ``build`` mode will merge all parameters\n    with identical names (`e.g.:` ``--genomeSize_1_1`` and ``--genomeSize_1_5``\n    become simply ``--genomeSize``). This is usually more appropriate and useful\n    in linear pipelines without component duplication.
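\n\nFor instance (an illustrative command, reusing the components from the example\nabove), merged parameters can be requested when building the pipeline::\n\n    flowcraft build -t \"trimmomatic fastqc spades\" -o my_pipe.nf --merge-params\n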
\nProviding/modifying parameters\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThese parameters can be provided on run-time::\n\n    nextflow run my_pipe.nf --genomeSize_1_1 5 --adapters_1_2 \"/path/to/adapters\"\n\nor edited in the ``params.config`` file::\n\n    params {\n        genomeSize_1_1 = 5\n        adapters_1_2 = \"path/to/adapters\"\n    }\n\nMost parameters in FlowCraft's components already come with sensible\ndefaults, which means that usually you'll only need to provide a small number\nof arguments. In the example above, ``--fastq`` is the only required parameter.\nFor this example, the fastq files were placed in the ``data`` directory::\n\n    $ ls data\n    sample_1.fastq.gz  sample_2.fastq.gz\n\nWe'll need to provide the pattern to the fastq files. This pattern is perhaps\na bit confusing at first, but it's necessary for the correct inference of the\nread pairs::\n\n    --fastq \"data/*_{1,2}.*\"\n\nIn this case, the pairs are separated by the \"_1.\" or \"_2.\" substring, which leads\nto the pattern ``*_{1,2}.*``. Another common nomenclature for paired fastq\nfiles is something like ``sample_R1_L001.fastq.gz``. In this case, an\nacceptable pattern would be ``*_R{1,2}_*``.\n\n.. important::\n\n    Note the quotes around the fastq path pattern. These quotes are necessary\n    to allow nextflow to resolve the pattern, otherwise your shell might try\n    to resolve it and provide the wrong input to nextflow.\n\nExecution\n---------\n\nOnce you build your pipeline with FlowCraft you have a standard nextflow pipeline\nready to run. Therefore, all you need to do is::\n\n    nextflow run my_pipe.nf --fastq \"data/*_{1,2}.*\"\n\nChanging executor and container engine\n::::::::::::::::::::::::::::::::::::::\n\nThe default run mode of a FlowCraft pipeline is to be executed locally\nusing the singularity container engine. In nextflow terms, this is\nequivalent to having ``executor = \"local\"`` and ``singularity.enabled = true``.\nIf you want to change these settings, you can modify the\n``nextflow.config`` file, or use one of the available profiles in the\n``profiles.config`` file. These profiles provide combinations of common\n``<executor>_<container_engine>`` pairs that are `supported by nextflow`_. Therefore,\nif you want to run the pipeline on a cluster with SLURM and shifter, you'll\njust need to specify the ``slurm_shifter`` profile::\n\n    nextflow run my_pipe.nf --fastq \"data/*_{1,2}.*\" -profile slurm_shifter\n\nCommon executors include:\n\n- ``slurm``\n- ``sge``\n- ``lsf``\n- ``pbs``\n\nAvailable container engines are:\n\n- ``docker``\n- ``singularity``\n- ``shifter``\n\n.. _supported by nextflow: https://www.nextflow.io/docs/latest/executor.html\n
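\nIn nextflow configuration terms, such a profile is roughly equivalent to the\nfollowing sketch (the ``profiles.config`` generated by FlowCraft may differ in\nits details)::\n\n    profiles {\n        slurm_shifter {\n            process.executor = \"slurm\"\n            shifter.enabled = true\n        }\n    }\n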
\nDocker images\n:::::::::::::\n\nAll components of FlowCraft are executed in containers, which means that\nthe first time they are executed on a machine, **the corresponding image will have\nto be downloaded**. In the case of docker, images are pulled and stored in\n``/var/lib/docker`` by default. In the case of singularity, the\n``nextflow.config`` generated by FlowCraft sets the cache dir for the\nimages at ``$HOME/.singularity_cache``. Note that when an image is downloading,\nnextflow does not display any informative message, except for singularity where you'll\nget something like::\n\n    Pulling Singularity image docker://ummidock/trimmomatic:0.36-2 [cache /home/diogosilva/.singularity_cache/ummidock-trimmomatic-0.36-2.img]\n\nSo, if a process seems to take too long to run the first time, it's probably\nbecause the image is being downloaded.\n\nResults and reports\n:::::::::::::::::::\n\nAs the pipeline runs, processes may write result and report files to the\n``results`` and ``reports`` directories, respectively. For example, the\nreports of the pipeline above would look something like this::\n\n    reports\n    ├── coverage_1_1\n    │   └── estimated_coverage_initial.csv\n    ├── fastqc_1_3\n    │   ├── FastQC_2run_report.csv\n    │   ├── run_2\n    │   │   ├── sample_1_0_summary.txt\n    │   │   └── sample_1_1_summary.txt\n    │   ├── sample_1_1_trim_fastqc.html\n    │   └── sample_1_2_trim_fastqc.html\n    └── status\n        ├── master_fail.csv\n        ├── master_status.csv\n        └── master_warning.csv\n\nThe ``estimated_coverage_initial.csv`` file contains a very rough coverage\nestimation for each sample, the ``fastqc*`` directory contains the html\nreports and summary files of FastQC for each sample, and the ``status``\ndirectory contains a log of the status, warnings and fails of each process for\neach sample.\n\nThe actual results for each process that produces them are stored in the\n``results`` directory::\n\n    results\n    ├── assembly\n    │   └── spades_1_4\n    │       └── sample_1_trim_spades3111.fasta\n    └── trimmomatic_1_2\n        ├── sample_1_1_trim.fastq.gz\n        └── sample_1_2_trim.fastq.gz\n\nIf you are interested in checking the actual environment where the execution\nof a particular process occurred for any given sample, you can inspect the\n``pipeline_stats.txt`` file in the root of the pipeline directory. This file\ncontains rich information about the execution of each process, including\nthe working directory::\n\n    task_id hash        process         tag         status      exit    start                   container                           cpus    duration    realtime    queue   %cpu    %mem    rss     vmem\n    5       7c/cae270   trimmomatic_1_2 sample_1    COMPLETED   0       2018-04-12 11:42:29.599 docker:ummidock/trimmomatic:0.36-2  2       1m 25s      1m 17s      -       329.3%  1.1%    1.5 GB  33.3 GB\n\nThe ``hash`` column contains the start of the current working directory of that\nprocess. In the example above, the directory would be::\n\n    work/7c/cae270*\n\nInspect\n-------\n\nFlowCraft has two options (``overview`` and ``broadcast``) for inspecting the\nprogress of a pipeline that is running locally, either on a personal computer\nor a server machine. In both cases, the progress of the pipeline will be\ncontinuously updated in real-time.\n\nIn a terminal\n:::::::::::::\n\nTo open inspect in the terminal, just run the following command **in the folder\nwhere the pipeline is running**::\n\n    flowcraft inspect\n\n.. image:: ../resources/flowcraft_inspect_terminal.png\n   :align: center\n\n``overview`` is the default behavior of this module, but it can also be called\nlike this::\n\n    flowcraft inspect -m overview\n
\n.. note::\n    To exit the inspection just type ``q`` or ``ctrl+c``.\n\nIn a browser\n::::::::::::\n\nIt is also possible to track the pipeline progress in a browser on any\ndevice using the flowcraft web application. **To do so, the following command\nshould be run in the folder where the pipeline is running**::\n\n    flowcraft inspect -m broadcast\n\n\nThis will output a URL to the terminal that can be opened in a browser.\nThis is an example of the screen that is displayed once the URL is opened:\n\n.. image:: ../resources/flowcraft_inspect_broadcast.png\n   :align: center\n\n.. important::\n    This pipeline inspection will be available for **anyone** via the provided URL,\n    which means that the URL can be shared with anyone and/or any device with\n    a browser. **However, the inspection section will only be available while\n    the** ``flowcraft inspect -m broadcast`` **command is running. Once this command\n    is cancelled, the data will be erased from the service and the URL will\n    no longer be available**.\n\nWant to know more?\n::::::::::::::::::\n\n:doc:`pipeline_inspect` is the full documentation of the ``inspect`` mode.\n\n\nReports\n-------\n\nThe reporting of a FlowCraft pipeline is saved in a JSON file that is stored\nin ``pipeline_reports/pipeline_report.json``. To visualize the reports you'll just\nneed to execute the following command in the folder where the pipeline was executed::\n\n    flowcraft report\n\nThis will output a URL to the terminal that can be opened in a browser.\nThis is an example of the screen that is displayed once the URL is opened:\n\n.. image:: ../resources/flowcraft_report.png\n   :align: center\n\n**The actual layout and content of the reports will depend on the pipeline you\nbuild, and it will only provide the information that is directly related to\nyour pipeline components.**\n\n.. important::\n    This pipeline report will be available for **anyone** via the provided URL,\n    which means that the URL can be shared with anyone and/or any device with\n    a browser. **However, the report section will only be available while\n    the** ``flowcraft report`` **command is running. Once this command\n    is cancelled, the data will be erased from the service and the URL will\n    no longer be available**.\n\nReal time reports\n:::::::::::::::::\n\nThe reports of any FlowCraft pipeline can be monitored in real-time using the\n``--watch`` option::\n\n    flowcraft report --watch\n\nThis will output a URL exactly as in the previous section and will render the\nsame reports page with a small addition. In the top right of the screen in the\nnavigation bar, there will be a new icon that informs the user when new\nreports are available:\n\n.. image:: ../resources/flowcraft_report_watch.png\n   :align: center\n\nLocal visualization\n:::::::::::::::::::\n\nThe FlowCraft report JSON file can also be visualized locally by dragging and dropping\nit into the FlowCraft web application page, currently hosted at http://www.flowcraft.live/reports\n\nOffline visualization\n:::::::::::::::::::::\n\nThe complete FlowCraft report is also available as a standalone HTML file that\ncan be visualized offline. This HTML file, stored in\n``pipeline_reports/pipeline_report.html``, can be opened in any modern browser."
  },
  {
    "path": "docs/user/components/abricate.rst",
    "content": "abricate\n========\n\nPurpose\n-------\n\nThis component performs anti-microbial gene screening using abricate. It\nincludes the default databases plus the ``virulencefinder`` database.\n\n.. note::\n    Software page: https://github.com/tseemann/abricate\n\nInput/Output type\n------------------\n\n- Input type: ``Fasta``\n- Output type: None\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``.\n\nParameters\n----------\n\n- ``abricateDatabases``: Specify the databases for abricate.\n\nPublished results\n-----------------\n\n- ``results/annotation/abricate``: Stores the results of the abricate screening\n  for each sample and for each specified database.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``abricate``:\n    - ``container``: ummidock/abricate\n    - ``version``: 0.8.0-1\n- ``process_assembly_mapping``:\n    - ``container``: ummidock/abricate\n    - ``version``: 0.8.0-1\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.process_abricate`\n\n\nReports JSON\n^^^^^^^^^^^^\n\n``tableRow``:\n    - ``<database>``: List of gene names\n``plotData``:\n    - ``<database>``:\n        - ``contig``: Contig ID\n        - ``seqRange``: Genomic range of the contig\n        - ``gene``: Gene name\n        - ``accession``: Accession number\n        - ``coverage``: Coverage of the match\n        - ``identity``: Identity of the match"
  },
  {
    "path": "docs/user/components/assembly_mapping.rst",
    "content": "assembly_mapping\n================\n\nPurpose\n-------\n\nThis component performs a mapping procedure of FastQ files using their assembly\nas reference. The procedure is carried out with bowtie2 and samtools and aims\nto filter the assembly based on quality criteria of read coverage\nand expected genome size.\n\n.. note::\n    - bowtie2 documentation can be found `here <http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml>`_.\n    - samtools documentation can be found `here <http://www.htslib.org/doc/samtools-1.2.html>`_.\n\nInput/Output type\n------------------\n\n- Input type: ``Fasta`` and ``FastQ``\n- Output type: ``Fasta``\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``.\n\nParameters\n----------\n\n- ``minAssemblyCoverage``: In auto, the default minimum coverage for each\n  assembled contig is 1/3 of the assembly mean coverage or 10x, if the mean\n  coverage is below 10x.\n- ``AMaxContigs``: A warning is issues if the number of contigs is over\n  this threshold.\n- ``genomeSize``: Genome size estimate for the samples. It is used to check\n  the ratio of contig number per genome MB.\n\nPublished results\n-----------------\n\nNone.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``assembly_mapping``:\n    - ``cpus``: 4\n    - ``memory``: 5GB (dynamically increased on retry)\n    - ``container``: ummidock/bowtie2_samtools\n    - ``version``: 1.0.0-2\n- ``process_assembly_mapping``:\n    - ``cpus``: 1\n    - ``memory``: 5GB (dynamically increased on retry)\n    - ``container``: ummidock/bowtie2_samtools\n    - ``version``: 1.0.0-2\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.process_assembly_mapping`\n\nReports JSON\n^^^^^^^^^^^^\n\n``plotData``:\n    - ``sparkline``: Total number of base pairs.\n``warnings``:\n    - When the number of contigs exceeds a provided threshold.\n``fail``:\n    - When the genome size is below 80% or above 150% of the expected genome size."
  },
  {
    "path": "docs/user/components/bowtie.rst",
    "content": "bowtie\n======\n\nPurpose\n-------\n\nThis component performs a mapping procedure of FastQ files with a given reference.\nThe procedure is carried out with Bowtie2.\nThe reference can a set of Bowtie2 index files or a Fasta file. In the latter, the\nnecessary index will be created with Bowtie2-build and passed through to Bowtie2.\n\n.. note::\n    - Bowtie2 documentation can be found `here <http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml>`_.\n    - Software page: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``Bam``\n\n.. note::\n    The default input parameter for Fastq data is ``--fastq``.\n\nParameters\n----------\n\n- ``reference``: Specifies the reference genome to be provided to to bowtie2-build.\n- ``index``: Specifies the reference indexes to be provided to bowtie2.\n\n.. note::\n    An ``index`` OR a ``reference`` fasta file must be provided\n\nPublished results\n-----------------\n\n- ``results/mapping/bowtie``: Stores the results of the mapping for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``bowtie_build``:\n    - ``cpus``: 1\n    - ``memory``: 5GB (dynamically increased on retry)\n    - ``container``: flowcraft/bowtie2_samtools\n    - ``version``: 1.0.0-1\n- ``bowtie``:\n    - ``cpus``: 4\n    - ``memory``: 5GB (dynamically increased on retry)\n    - ``container``:flowcraft/bowtie2_samtools\n    - ``version``: 1.0.0-1\n"
  },
  {
    "path": "docs/user/components/card_rgi.rst",
    "content": "card_rgi\n========\n\nPurpose\n-------\n\nThis component performs anti-microbial gene screening using CARD rgi.\nIt uses data from CARD database.\n\n.. note::\n    Software page: https://github.com/arpcard/rgi\n\nInput/Output type\n------------------\n\n- Input type: ``Fasta``\n- Output type: None\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``.\n\nParameters\n----------\n\n- ``alignmentTool``: Specifies the alignment tool. Options: DIAMOND or BLAST\n\nPublished results\n-----------------\n\n- ``results/annotation/card_rgi``: Stores the results of the screening\n  for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``container``: flowcraft/card_rgi\n- ``version``: 4.0.2-0.1\n\n\n"
  },
  {
    "path": "docs/user/components/check_coverage.rst",
    "content": "check_coverage\n==============\n\nPurpose\n-------\n\nThis components estimates the coverage of a given sample based on the number\nof base pairs in the FastQ files of a sample and on the expected genome size:\n\n.. math::\n    \\frac{\\text{number of base pairs}}{(\\text{genome size} \\times 1e^{6})}\n\nIf the estimated coverage of a given sample falls bellow the provided\nminimum coverage threshold, the sample is filtered and does not proceed in the\npipeline.\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``FastQ``\n\n.. note::\n    The default input parameter for FastQ data is ``--fastq``. You can change\n    the ``--fastq`` parameter default pattern (``fastq/*_{1,2}.*``) according\n    to input file names (e.g.: ``--fastq \"path/to/fastq/*R{1,2}.*\"``).\n\nParameters\n----------\n\n- ``genomeSize``: Genome size estimate for the samples. It is used to\n  estimate the coverage and other assembly parameters and\n  checks.\n- ``minCoverage``: Minimum coverage for a sample to proceed. Can be set to\n  0 to allow any coverage.\n\nPublished results\n-----------------\n\nNone.\n\nPublished reports\n-----------------\n\n- ``reports/coverage``: CSV table with estimated sequencing coverage for\n  each sample.\n\nDefault directives\n------------------\n\nNone.\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.integrity_coverage`\n\nReports JSON\n^^^^^^^^^^^^\n\n``tableRow``:\n    - ``Coverage``: Estimated coverage.\n``fail``:\n    - When estimated coverage is below the provided threshold."
  },
  {
    "path": "docs/user/components/chewbbaca.rst",
    "content": "chewbbaca\n=========\n\nPurpose\n-------\n\nThis components runs the allele calling operation of ChewBBACA on a set\nof fasta samples to perform a cg/wgMLST analysis\n\n.. note::\n    Software page: https://github.com/B-UMMI/chewBBACA\n\nInput/Output type\n------------------\n\n- Input type: ``Fasta``\n- Output type: None\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``.\n\nParameters\n----------\n\n- ``chewbbacaQueue``: Specifiy a queue/partition for chewbbaca. This option\n  is only used for grid schedulers.\n- ``chewbbacaTraining``: Specify the full path to the prodigal training file\n  of the corresponding species.\n- ``schemaPath``: The path to the chewbbaca schema directory.\n- ``schemaSelectedLoci``: The path to the selection of loci in the schema\n  directory to be used. If not specified, all loci in the schema will be used.\n- ``chewbbacaJson``: If set to True, chewbbaca's allele call output will be\n  set to JSON format.\n- ``chewbbacaToPhyloviz``: If set to True, the ExtractCgMLST module of\n  chewbbaca will be executed after the allele calling.\n- ``chewbbacaProfilePercentage``: Specifies the proportion of samples that\n  must be present in a locus to save the profile.\n- ``chewbbacaBatch``: Specifies whther a chewbbaca run will be performed on\n  the complete input batch (all at the same time) or one by one.\n\nPublished results\n-----------------\n\n- ``results/chewbbaca_alleleCall``: The results of the allelecall for each\n sample.\n- ``results/chewbbaca``: The cg/wgMLST schema prepared for phyloviz.\n\nPublished reports\n-----------------\n\n None. \n\nDefault directives\n------------------\n\n- ``chewbbaca``:\n    - ``cpus``: 4\n    - ``container``: mickaelsilva/chewbbaca_py3\n    - ``version``: latest\n- ``chewbbaca_batch``:\n    - ``cpus``: 4\n    - ``container``: mickaelsilva/chewbbaca_py3\n    - ``version``: latest\n- ``chewbbacaExtractMLST``:\n    - ``container``: mickaelsilva/chewbbaca_py3\n    - ``version``: latest\n"
  },
  {
    "path": "docs/user/components/diamond.rst",
    "content": "diamond\n=======\n\nPurpose\n-------\n\nThis component performs ``blastx`` or ``blastp`` with diamond. The database\nused by diamond can be provided from the local disk or generated in the process.\nThis component uses the same output type as abricate with the same blast output\nfields.\n\n.. note::\n    Software page: https://github.com/bbuchfink/diamond\n\n\nInput/Output type\n-----------------\n\n- Input type: ``Fasta``\n- Output type: None\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``.\n\nParameters\n----------\n\n- ``pathToDb``: Provide full path for the diamond database. If none is provided\n  then will try to fetch from the previous process. Default: None\n\n- ``fastaToDb``: Provide the full path for the fasta to construct a diamond\n  database. Default: None\n\n- ``blastType``: Defines the type of blast that diamond will do. Can wither be\n  blastx or blastp. Default: blastx\n\nPublished results\n-----------------\n\n- ``results/annotation/diamond*``: Stores the results of the abricate screening\n  for each sample and for each specified database.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``diamond``:\n    - ``container``: flowcraft/diamond\n    - ``version``: 0.9.22-1\n    - ``memory``: { 4.GB * task.attempt }\n    - ``cpus``: 2"
  },
  {
    "path": "docs/user/components/downsample_fastq.rst",
    "content": "downsample_fastq\n================\n\nPurpose\n-------\n\ndownsample_fastq uses seqtk to subsample fastq read data to a target coverage depth\nif the estimated coverage is higher than the provided target depth. When\nno subsample is required, it outputs the original FastQ files.\n\n.. note::\n    Software page: https://github.com/lh3/seqtk\n\nInput/Output type\n------------------\n\n- Input type: ``fastq``\n- Output type: ``fastq``\n\nParameters\n----------\n\n- ``genomeSize``: Genome size estimate for the samples. It is used to\n  estimate the coverage.\n- ``depth``: The target depth to which the reads should be subsampled.\n- ``seed``: The seed number for seqtk. By default it is 100.\n\nPublished results\n-----------------\n\n- ``results/sample_fastq``: Stores the subsampled FastQ files\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``cpus``: 1\n- ``memory``: 4GB\n- ``container``: flowcraft/seqtk\n- ``version``: 1.3.0-3\n\nAdvanced\n--------\n\nReports JSON\n^^^^^^^^^^^^\n\n``tableRow``:\n    - ``Coverage``: Estimated coverage."
  },
  {
    "path": "docs/user/components/fast_ani.rst",
    "content": "fast_ani\n========\n\nPurpose\n-------\n\nThis component performs pairwise comparisons between fastas,\ngiven a multifasta as input for fastANI. It will split the multifasta into\nsingle fastas that will then be provided as a matrix. The output will be the\nall pairwise comparisons that pass the minimum of 50 aligned sequences with a\ndefault length of 200 bp.\n\nInput/Output type\n------------------\n\n- Input type: ``fasta``\n- Output type: ``None``\n\n\nParameters\n----------\n\n- ``fragLen``: Sets the minimum size of the fragment to be passed to\n`--fragLen` argument of fastANI.\n\n\nPublished results\n-----------------\n\n- ``results/fast_ani/``: A text file with the extension `.out`, which has all\nthe pairwise comparisons between sequences, reporting ANI.\n\n\nPublished reports\n-----------------\n\nNone.\n\n\nDefault directives\n------------------\n\n- ``fastAniMatrix``:\n    - ``container``: flowcraft/fast_ani\n    - ``version``: 1.1.0-2\n    - ``cpus``: 20\n    - ``memory``: { 30.GB * task.attempt }"
  },
  {
    "path": "docs/user/components/fasterq_dump.rst",
    "content": "fasterq_dump\n============\n\nPurpose\n-------\n\nThis component downloads reads from the SRA public databases from a\nlist of accessions. This component uses ``fasterq-dump`` from\n`NCBI sra-tools <https://github.com/ncbi/sra-tools>`_. ``fasterq-dump``\nincreases the download speed in comparison from ``fastq-dump`` by\n**multi-threading** the extraction of FASTQ from SRA-accessions.\nThe reads for each accession are then emitted through\nthe main output of this component to any other component (or components) that\nreceive FastQ data.\n\nInput/Output type\n------------------\n\n- Input type: ``accessions``\n- Output type: ``fastq``\n\n.. note::\n    The default input parameter for Accessions data is ``--accessions``.\n\nParameters\n----------\n\n- ``option_file``: This options enables the *option-file* parameter of\n``fasterq-dump``, allowing parameters to be passed.\n- ``compress_fastq``: This options allows users to disable the compression of\nthe fastq files resulting from this component. The default (``true``) behavior\ncompresses the fastq files to *fastq.gz*.\n\nPublished results\n-----------------\n\n- ``reads/<accession>``: Stores the reads for each provided accession.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``cpus``: 1\n- ``memory``: 1GB\n- ``container``: flowcraft/sra-tools\n- ``version``: 2.9.1-1\n"
  },
  {
    "path": "docs/user/components/fastqc.rst",
    "content": "fastqc\n======\n\nPurpose\n-------\n\nThis components runs FastQC on paired-end FastQ files.\n\n.. note::\n    Software page: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``FastQ``\n\n.. note::\n    The default input parameter for FastQ data is ``--fastq``. You can change\n    the ``--fastq`` parameter default pattern (``fastq/*_{1,2}.*``) according\n    to input file names (e.g.: ``--fastq \"path/to/fastq/*R{1,2}.*\"``).\n\nParameters\n----------\n\n- ``adapters``: Provide a non-default fasta file containing the adapter\n  sequences to screen overrepresented sequences against.\n\nPublished results\n-----------------\n\nNone.\n\nPublished reports\n-----------------\n\n- ``reports/fastqc``: Stores the FastQC HTML reports for each sample.\n- ``reports/fastqc/run_2/``: Stores the summary text files with the category\n  results of FastQC for each sample.\n\nDefault directives\n------------------\n\n- ``cpus``: 2\n- ``memory``: 4GB\n- ``container``: ummidock/fastqc\n- ``version``: 0.11.7-1\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.fastqc_report`\n\nReports JSON\n^^^^^^^^^^^^\n\n``plotData``:\n    - ``base_sequence_quality``: Per base sequence quality data\n        - (This structure is repeated for the other entries)\n        - ``status``: Status of the category (PASS, WARN, etc)\n        - ``data``: Plot data\n    - ``sequence_quality``: Per sequence quality data\n    - ``base_gc_content``: GC content distribution\n    - ``base_n_content``: Per base N content\n    - ``sequence_length_dist``: Distribution of sequence read length\n    - ``per_base_sequence_content``: Per base sequence content\n``warnings``:\n    - List of failures or warnings for some non-sensitive FastQC categories\n``fail``:\n    - Failure message when sensitive FastQC categories fail or do not pass.\n"
  },
  {
    "path": "docs/user/components/fastqc_trimmomatic.rst",
    "content": "fastqc_trimmomatic\n==================\n\nPurpose\n-------\n\nThis component runs Trimmomatic on paired-end FastQ files but uses information\non the per-base GC content variation reported by FastQC to guide the trimming\nof the FastQ reads.\n\n.. note::\n    Software pages: FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/);\n    Trimmoatic (http://www.usadellab.org/cms/?page=trimmomatic)\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``FastQ``\n\n.. note::\n    The default input parameter for FastQ data is ``--fastq``. You can change\n    the ``--fastq`` parameter default pattern (``fastq/*_{1,2}.*``) according\n    to input file names (e.g.: ``--fastq \"path/to/fastq/*R{1,2}.*\"``).\n\nParameters\n----------\n\n- ``adapters``: Provide a non-default fasta file containing the adapter\n  sequences used to screen overrepresented sequences against and to filter\n  the FastQ files.\n- ``trimSlidingWindow``: Perform sliding window trimming, cutting once the\n  average quality within the window falls below a threshold.\n- ``trimLeading``: Cut bases off the start of a read, if below a threshold\n  quality.\n- ``trimTrailing``: Cut bases of the end of a read, if below a threshold\n  quality.\n- ``trimMinLength``: Drop the read if it is below a specified length.\n\nPublished results\n-----------------\n\n- ``results/trimmomatic``: The trimmed FastQ files for each sample.\n\nPublished reports\n-----------------\n\n- ``reports/fastqc``: Stores the FastQC HTML reports for each sample and a\n  ``FastQC_trim_report.csv`` file containing the trimming values suggested\n  by the analysis of the FastQC report.\n- ``reports/fastqc/run_1/``: Stores the summary text files with the category\n  results of FastQC for each sample.\n\nDefault directives\n------------------\n\n- ``fastqc``:\n    - ``cpus``: 2\n    - ``memory``: 4GB\n    - ``container``: ummidock/fastqc\n    - ``version``: 0.11.7-1\n\n- ``trimmomatic``:\n    - ``cpus``: 2\n    - ``memory``: 4GB (dynamically increased on retry)\n    - ``container``: ummidock/trimmomatic\n    - ``version``: 0.36-2\n\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.fastqc_report`\n:mod:`flowcraft.templates.trimmomatic`\n:mod:`flowcraft.templates.trimmomatic_report`\n\nReports JSON\n^^^^^^^^^^^^\n\n``tableRow``:\n    ``Trimmed (%)``: Percentage of trimmed nucleotides\n``plotData``:\n    ``sparkline``: Number of nucleotides after trimming\n``badReads``: Number of discarded reads\n"
  },
  {
    "path": "docs/user/components/filter_poly.rst",
    "content": "filter_poly\n===========\n\nPurpose\n-------\n\nThis component removes low complexity sequence from read data\nusing PrinSeq.\n\n.. note::\n    Software page: http://prinseq.sourceforge.net/\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``FastQ`\n\n.. note::\n    The default input parameter for fastq data is ``--fastq``.\n\nParameters\n----------\n\n- ``adapter``: Pattern to filter the reads. Please separate parameter values with a space\n    and separate new parameter sets with semicolon (;). Parameters are defined by two values:\n    the pattern (any combination of the letters ATCGN), and the number of repeats or percentage\n    of occurence. Default: A 50%; T 50%; N 50%\n\nPublished results\n-----------------\n\nNone.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``container``: flowcraft/prinseq\n- ``version``: 0.20.4-1\n- ``memory``: 4.GB * task.attempt\n- ``cpus``: 1\n\n\n"
  },
  {
    "path": "docs/user/components/integrity_coverage.rst",
    "content": "integrity_coverage\n==================\n\nPurpose\n-------\n\nThis component is intended to test the integrity of the provided FastQ files.\nIt does so by attempting to parse uncompressed or compressed (``gz``, ``bz2``\nor ``zip``) FastQ files (paired-end or single-end). During this parse, if the\nFastQ files are not corrupt, it retrieves the following information:\n\n- **sequence encoding**: Estimates the sequence encoding based on the quality\n  scores. This information can then be passed to other components that might\n  required it.\n- **estimated coverage**: Provides a rough coverage estimation for each sample\n  based on a user-provided genome size (see `Parameters`_). This estimation\n  is essentially\n\n  .. math::\n      \\frac{\\text{number of base pairs}}{(\\text{genome size} \\times 1e^{6})}\n\n  This information is written to the ``reports`` directory (See\n  `Published reports`_)\n- **maximum read length.**: Retrieves the maximum read length for each sample.\n\n.. important::\n    If the ``minCoverage`` parameter value is set to higher than 0, this\n    component will filter samples with an estimated coverage below that\n    threshold.\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``FastQ``\n\n.. note::\n    The default input parameter for FastQ data is ``--fastq``. You can change\n    the ``--fastq`` parameter default pattern (``fastq/*_{1,2}.*``) according\n    to input file names (e.g.: ``--fastq \"path/to/fastq/*R{1,2}.*\"``).\n\nParameters\n----------\n\n- ``genomeSize``: Genome size estimate for the samples. It is used to\n  estimate the coverage and other assembly parameters and\n  checks.\n- ``minCoverage``: Minimum coverage for a sample to proceed. Can be set to\n  0 to allow any coverage.\n\n.. note::\n    You can use these parameters as in the following example:\n    ``--genomeSize 3``.\n\nPublished results\n-----------------\n\nNone.\n\nPublished reports\n-----------------\n\n- ``reports/coverage``: CSV table with estimated sequencing coverage for\n  each sample.\n- ``reports/corrupted``: Text file with list of corrupted samples.\n\nDefault directives\n------------------\n\nNone.\n\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.integrity_coverage`\n\nReports JSON\n^^^^^^^^^^^^\n\n``tableRow``:\n    - ``Raw BP``: Number of nucleotides.\n    - ``Reads``: Number of reads.\n    - ``Coverage``: Estimated coverage.\n``plotData``:\n    - ``sparkline``: Number of nucleotides.\n``warnings``:\n    - When the enconding and/or phred score cannot be inferred from FastQ files.\n``fail``:\n    - When estimated coverage is below the provided threshold."
  },
  {
    "path": "docs/user/components/kraken.rst",
    "content": "kraken\n======\n\nPurpose\n-------\n\nThis component performs Kraken to assign taxonomic labels to short DNA\nsequences, usually obtained through metagenomic studies.\n\n.. note::\n    Software page: https://ccb.jhu.edu/software/kraken/\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: None\n\n.. note::\n    The default input parameter for fastq data is ``--fastq``.\n\nParameters\n----------\n\n- ``krakenDB``: Specifies kraken database. Default: minikraken_20171013_4GB (in path)\n\nPublished results\n-----------------\n\n- ``results/taxonomy/kraken``: Stores the results of the screening\n  for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``container``: flowcraft/kraken\n- ``version``: 1.0-0.1"
  },
  {
    "path": "docs/user/components/kraken2.rst",
    "content": "kraken2\n=======\n\nPurpose\n-------\n\nThis component performs Kraken2 to assign taxonomic labels to short DNA\nsequences, usually obtained through metagenomic studies.\n\n.. note::\n    Software page: https://ccb.jhu.edu/software/kraken2/\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: txt\n\n.. note::\n    The default input parameter for fastq data is ``--fastq``.\n\nParameters\n----------\n\n- ``kraken2DB``: Specifies kraken2 database. Default: minikraken2_v1_8GB (in path inside the\ndefault container)\n\nPublished results\n-----------------\n\n- ``results/taxonomy/kraken2``: Stores the results of the screening\n  for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``container``: flowcraft/kraken2\n- ``version``: 2.0.7-1\n- ``cpus``: 3\n- ``memory``: 5GB (dynamically increased on retry)\n"
  },
  {
    "path": "docs/user/components/mapping_patlas.rst",
    "content": "mapping_patlas\n==============\n\nPurpose\n-------\n\nThis component performs mapping (using `bowtie2` and `samtools`) against a\nplasmid database in order to find\nplasmids contained in high throughoput sequencing data. Then, the resulting file\ncan be imported into `pATLAS <http://www.patlas.site/>`_.\n\n.. note::\n    - pATLAs documentation can be found `here <https://tiagofilipe12.gitbooks.io/patlas/content/>`_.\n    - bowtie2 documentation can be found `here <http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml>`_.\n    - samtools documentation can be found `here <http://www.htslib.org/doc/samtools-1.2.html>`_.\n\nInput/Output type\n------------------\n\n- Input type: ``fastq``\n- Output type: ``json``\n\n\nParameters\n----------\n\n- ``max_k``: Sets the k parameter for bowtie2 allowing to make multiple mappings\n  of the same read against several hits on the query sequence or sequences.\n  Default: 10949.\n\n- ``trim5``: Sets trim5 option for bowtie. This will become legacy with QC\n  integration, but it enables to trim 5' end of reads to be mapped with bowtie2.\n  Default: 0\n\n- ``lengthJson``: A dictionary of all the lengths of reference sequences.\n  Default: 'jsons/*_length.json' (from docker image).\n\n- ``refIndex``: Specifies the reference indexes to be provided to bowtie2.\n  Default: '/ngstools/data/indexes/bowtie2idx/bowtie2.idx' (from docker image).\n\n- ``samtoolsIndex``: Specifies the reference indexes to be provided to samtools.\n  Default: '/ngstools/data/indexes/fasta/samtools.fasta.fai' (from docker image).\n\n\nPublished results\n-----------------\n\n- ``results/mapping/``: A `JSON` file that can be imported to `pATLAS <http://www.patlas.site/>`_\n  with the results from mapping.\n\n\nPublished reports\n-----------------\n\nNone.\n\n\nDefault directives\n------------------\n\n- ``mappingBowtie``:\n    - ``container``: flowcraft/mapping-patlas\n    - ``version``: 1.6.0-1\n- ``samtoolsView``:\n    - ``container``: flowcraft/mapping-patlas\n    - ``version``: 1.6.0-1\n- ``jsonDumpingMapping``:\n    - ``container``: flowcraft/mapping-patlas\n    - ``version``: 1.6.0-1\n"
  },
  {
    "path": "docs/user/components/mash_dist.rst",
    "content": "mash_dist\n=========\n\nPurpose\n-------\n\nThis component executes mash dist to find plasmids\nwithin high throughoput sequencing data, using as inputs fasta files\n(e.g. contigs). Then, the resulting file can\nbe imported into `pATLAS <http://www.patlas.site/>`_.\nThis component calculates pairwise distances between sequences\n(one from the database and the query sequence).\nHowever, this process can be user for other purposes, by providing a different\ndatabase than the default that is intended for plasmid searches.\n\n.. note::\n    - pATLAs documentation can be found `here <https://tiagofilipe12.gitbooks.io/patlas/content/>`_.\n    - MASH documentation can be found `here <https://mash.readthedocs.io/en/latest/>`_.\n\n\nInput/Output type\n------------------\n\n- Input type: ``fasta``\n- Output type: ``json``\n\n\nParameters\n----------\n\n- ``mash_distance``: Sets the maximum distance between two sequences to be\n  included in the output. Default: 0.1.\n\n.. note::\n    The subtraction of 1 - `mash_distance` can be used as an approximation to\n    Average Nucleotide Identity (ANI). For instance a mash distance of 0.1 well\n    correlates with ANI at 0.9 (90%).\n\n- ``pValue``: P-value cutoff for the distance estimation between two sequences\n  to be included in the output. Default: 0.05.\n\n- ``shared_hashes``: Sets a minimum percentage of hashes shared between two\n  sequences in order to include its result in the output. Default: 0.8.\n\n- ``refFile``: Specifies the reference file to be provided to mash. It can either\n  be a fasta or a .msh reference sketch generated by mash.\n  Default: '/ngstools/data/patlas.msh'. If the component ``mash_sketch_fasta``\n  is executed before this component, this parameter will be ignored and instead\n  the secondary link between the two processes will be used to feed this\n  component with the reference sketch.\n\n\nPublished results\n-----------------\n\n- ``results/mashdist/``: A `JSON` file that can be imported to `pATLAS <http://www.patlas.site/>`_\n  with the results from mash dist.\n\n\nPublished reports\n-----------------\n\nNone.\n\n\nDefault directives\n------------------\n\n- ``runMashDist``:\n    - ``container``: flowcraft/mash-patlas\n    - ``version``: 1.6.0-1\n- ``mashDistOutputJson``:\n    - ``container``: flowcraft/mash-patlas\n    - ``version``: 1.6.0-1\n"
  },
  {
    "path": "docs/user/components/mash_screen.rst",
    "content": "mash_screen\n===========\n\nPurpose\n-------\n\nThis component performes mash screen to find plasmids\ncontained in high throughout sequencing data, using as inputs read files\n(FastQ files). Then, the resulting file can\nbe imported into `pATLAS <http://www.patlas.site/>`_.\nThis component searches for containment of a given sequence in read sequencing\ndata.\nHowever, this process can be user for other purposes, by providing a different\ndatabase than the default that is intended for plasmid searches.\n\n.. note::\n    - pATLAs documentation can be found `here <https://tiagofilipe12.gitbooks.io/patlas/content/>`_.\n    - MASH documentation can be found `here <https://mash.readthedocs.io/en/latest/>`_.\n\n\nInput/Output type\n------------------\n\n- Input type: ``fastq``\n- Output type: ``json``\n\n\nParameters\n----------\n\n- ``noWinner``: A variable that enables the use of -w option for mash screen.\n  Default: false.\n\n- ``pValue``: P-value cutoff for the distance estimation between two sequences to\n  be included in the output. Default: 0.05.\n\n- ``identity``: The percentage of identity between the reads input and the\n  reference sequence. Default: 0.9.\n\n- ``refFile``: \"Specifies the reference file to be provided to mash. It can\n  either be a fastq or a .msh reference sketch generated by mash.\n  Default: '/ngstools/data/patlas.msh'. If the component ``mash_sketch_fastq``\n  is executed before this component, this parameter will be ignored and instead\n  the secondary link between the two processes will be used to feed this\n  component with the reference sketch.\n\n\nPublished results\n-----------------\n\n- ``results/mashscreen/``: A `JSON` file that can be imported to `pATLAS <http://www.patlas.site/>`_\n  with the results from mash screen.\n\n\nPublished reports\n-----------------\n\nNone.\n\n\nDefault directives\n------------------\n\n- ``mashScreen``:\n    - ``container``: flowcraft/mash-patlas\n    - ``version``: 1.6.0-1\n- ``mashOutputJson``:\n    - ``container``: flowcraft/mash-patlas\n    - ``version``: 1.6.0-1\n"
  },
  {
    "path": "docs/user/components/mash_sketch_fasta.rst",
    "content": "mash_sketch_fasta\n=================\n\nPurpose\n-------\n\nThis component performs mash sketch for fasta input files.\n\n.. note::\n    - MASH documentation can be found `here <https://mash.readthedocs.io/en/latest/>`_.\n\n\nInput/Output type\n------------------\n\n- Input type: ``fasta``\n- Output type: ``msh``\n\n\nParameters\n----------\n\n- ``kmerSize``: Parameter to set the kmer size for hashing. Default: 21.\n  Default: false.\n\n- ``sketchSize``: Parameter to set the number of hashes per sketch.\n  Default: 1000.\n\n\nPublished results\n-----------------\n\nNone.\n\n\nPublished reports\n-----------------\n\nNone.\n\n\nDefault directives\n------------------\n\n- ``mashSketchFasta``:\n    - ``container``: flowcraft/mash-patlas\n    - ``version``: 1.6.0-1\n"
  },
  {
    "path": "docs/user/components/mash_sketch_fastq.rst",
    "content": "mash_sketch_fastq\n=================\n\nPurpose\n-------\n\nThis component performs mash sketch for fastq input files. These sketches can\nbe used by ``mash_dist`` and ``mash_screen`` components to fetch the\nreference file for mash.\n\n.. note::\n    - MASH documentation can be found `here <https://mash.readthedocs.io/en/latest/>`_.\n\n\nInput/Output type\n------------------\n\n- Input type: ``fastq``\n- Output type: ``msh``\n\n\nParameters\n----------\n\n- ``kmerSize``: Parameter to set the kmer size for hashing. Default: 21.\n  Default: false.\n\n- ``sketchSize``: Parameter to set the number of hashes per sketch.\n  Default: 1000.\n\n- ``minKmer``: Minimum copies of each k-mer required to pass noise filter for\n  reads. Default: 1.\n\n- ``genomeSize``: Genome size (raw bases or with K/M/G/T). If specified, will\n  be used for p-value calculation instead of an estimated size from k-mer\n  content. Default: *false*, meaning that it won't be used. If you want to use\n  it pass a number to this parameter.\n\n\nPublished results\n-----------------\n\nNone.\n\n\nPublished reports\n-----------------\n\nNone.\n\n\nDefault directives\n------------------\n\n- ``mashSketchFastq``:\n    - ``container``: flowcraft/mash-patlas\n    - ``version``: 1.6.0-1\n"
  },
  {
    "path": "docs/user/components/maxbin2.rst",
    "content": "maxbin2\n=======\n\nPurpose\n-------\n\nThis component is an automated binning algorithm to recover genomes from multiple metagenomic datasets\n\n.. note::\n    Software page: https://sourceforge.net/projects/maxbin2/\n\nInput/Output type\n------------------\n\n- Input type: ``Fasta``  and ``FastQ``\n- Output type: ``Fasta``\n\n.. note::\n    The default input parameter for fasta is ``--fasta``. This process also requires FastQ files.\n    If the FastQ files are input to any upstream process, those will be provided to maxbin2 automatically,\n    if not, they can be provided with the parameter ``--fastq``.\n\nParameters\n----------\n\n- ``min_contig_lenght``: Minimum contig length. Default: 1000\n\n- ``max_iteration``: Maximum Expectation-Maximization algorithm iteration number. Default: 50\n\n- ``prob_threshold``: Probability threshold for EM final classification. Default: 0.9\n\nPublished results\n-----------------\n\n- ``results/maxbin2/``: Stores the results of the binning in a folder\n  for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``container``: flowcraft/maxbin2\n- ``version``: 2.2.4-1\n- ``cpus``: 4\n- ``memory``: 8.GB (dynamically increased on retry)\n\n\nTemplate\n^^^^^^^^\n\n:mod:`assemblerflow.templates.maxbin2`"
  },
  {
    "path": "docs/user/components/megahit.rst",
    "content": "megahit\n=======\n\nPurpose\n-------\n\nThis components assembles metagenomic paired-end FastQ files using the megahit assembler.\n\n.. note::\n    Software page: https://github.com/voutcn/megahit\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``Fasta``\n\n.. note::\n    The default input parameter for FastQ data is ``--fastq``. You can change\n    the ``--fastq`` parameter default pattern (``fastq/*_{1,2}.*``) according\n    to input file names (e.g.: ``--fastq \"path/to/fastq/*R{1,2}.*\"``).\n\nParameters\n----------\n\n- ``megahitKmers``: If 'auto' the megahit k-mer lengths will be determined\n  from the maximum read length of each assembly. If 'default', megahit will\n  use the default k-mer lengths.\n\n- ``fastg``: When true, it converts megahit intermediate contigs into fastg.\n  Default: False\n\n\n\nPublished results\n-----------------\n\n- ``results/assembly/megahit``: Stores the fasta assemblies for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``cpus``: 4\n- ``memory``: 5GB (dynamically increased on retry)\n- ``container``: cimendes/megahit\n- ``version``: v1.1.3-0.1\n- ``scratch``: true\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`assemblerflow.templates.megahit`"
  },
  {
    "path": "docs/user/components/metamlst.rst",
    "content": "metamlst\n========\n\nPurpose\n-------\n\nChecks the ST of metagenomic reads using mlst.\n\n.. note::\n    Software page: https://bitbucket.org/CibioCM/metamlst\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: None\n\n.. note::\n    The default input parameter for fastq data is ``--fastq``.\n\nParameters\n----------\n\n- ``metamlstDB``: Specifiy the metamlst database (full path) for MLST checking\n\n- ``metamlstDB_index``: Specifiy the Bowtie2 metamlst database index (full path) for MLST checking\n\nPublished results\n-----------------\n\n- ``results/annotation/metamlst``: Stores the results of the ST for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``container``: flowcraft/metamlst\n- ``version``: 1.1-1\n- ``memory``: 4.Gb * task.attempt\n\n"
  },
  {
    "path": "docs/user/components/metaspades.rst",
    "content": "metaspades\n==========\n\nPurpose\n-------\n\nThis components assembles metagenomic paired-end FastQ files using the metaSPAdes assembler.\n\n.. note::\n    Software page: http://bioinf.spbau.ru/spades\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``Fasta``\n\n.. note::\n    The default input parameter for FastQ data is ``--fastq``. You can change\n    the ``--fastq`` parameter default pattern (``fastq/*_{1,2}.*``) according\n    to input file names (e.g.: ``--fastq \"path/to/fastq/*R{1,2}.*\"``).\n\nParameters\n----------\n\n- ``metaspadesKmers``: If 'auto' the metaSPAdes k-mer lengths will be determined\n  from the maximum read length of each assembly. If 'default', metaSPAdes will\n  use the default k-mer lengths.\n\nPublished results\n-----------------\n\n- ``results/assembly/metaspades``: Stores the fasta assemblies for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``cpus``: 4\n- ``memory``: 5GB (dynamically increased on retry)\n- ``container``: ummidock/spades\n- ``version``: 3.11.1-1\n- ``scratch``: true\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`assemblerflow.templates.metaspades`"
  },
  {
    "path": "docs/user/components/midas_species.rst",
    "content": "midas_species\n=============\n\nPurpose\n-------\n\nThis component performs MIDAS to assign taxonomic labels fro species to short DNA\nsequences, usually obtained through metagenomic studies.\n\n.. note::\n    Software page: https://github.com/snayfach/MIDAS\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: None\n\n.. note::\n    The default input parameter for fastq data is ``--fastq``.\n\nParameters\n----------\n\n- ``midasDB``: Specifies MIDAS database. Default: /MidasDB/midas_db_v1.2\n\nPublished results\n-----------------\n\n- ``results/taxonomy/midas``: Stores the results of the screening\n  for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``container``: flowcraft/midas\n- ``version``: 1.3.2-0.1\n- ``memory``: 2.Gb*task.attempt\n- ``cpus``: 3"
  },
  {
    "path": "docs/user/components/mlst.rst",
    "content": "mlst\n====\n\nPurpose\n-------\n\nChecks the ST of an assembly using mlst.\n\n.. note::\n    Software page: https://github.com/tseemann/mlst\n\nInput/Output type\n------------------\n\n- Input type: ``Fasta``\n- Output type: None\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``.\n\nParameters\n----------\n\n- ``mlstSpecies``: Specifiy the expected species for MLST.\n\nPublished results\n-----------------\n\n- ``results/annotation/mlst``: Stores the results of the ST for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``container``: ummidock/mlst\n\n\nAdvanced\n--------\n\nReports JSON\n^^^^^^^^^^^^\n\n``tableRow``:\n    - ``mlst``: Predicted species.\n``expectedSpecies``: Name of the expected species.\n\n``species``: Name of inferred species.\n\n"
  },
  {
    "path": "docs/user/components/momps.rst",
    "content": "momps\n========\n\nPurpose\n-------\n\nThis component performs Multi-Locus Sequence Typing (MLST) on Legionella pneumophila\nfrom reads and assemblies.\n\n.. note::\n    Software page: https://github.com/bioinfo-core-BGU/mompS\n\nInput/Output type\n------------------\n\n- Input type: ``fasta``\n- Output type: None\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``. This process\n    also requires FastQ reads provided via the ``--fastq`` parameter.\n\nParameters\n----------\n\nNone.\n\nPublished results\n-----------------\n\n- ``results/typing/momps``: Stores TSV files with the ST and allelic profiles\n  for each strain.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``momps``:\n    - ``container``: flowcraft/momps\n    - ``version``: 0.1.0-4\n\nAdvanced\n--------\n\nReports JSON\n^^^^^^^^^^^^\n\n``typing``:\n    - ``momps``: <typing result>"
  },
  {
    "path": "docs/user/components/patho_typing.rst",
    "content": "patho_typing\n==========\n\nPurpose\n-------\n\nPatho_typing is a software for *in silico* pathogenic typing\ndirectly from raw Illumina reads.\n\n.. note::\n    Software page: https://github.com/B-UMMI/patho_typing\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: None\n\nParameters\n----------\n\n- ``species``: Species name. Must be the complete species name with genus\n  and species, e.g.: 'Yersinia enterocolitica'.\n\nPublished results\n-----------------\n\n- ``results/pathotyping/<sample id>``: Stores the results of patho_typing in\n  text and tabular format.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``cpus``: 4\n- ``memory``: 4GB\n- ``container``: ummidock/patho_typing\n- ``version``: 0.3.0-1\n\nAdvanced\n--------\n\nReports JSON\n^^^^^^^^^^^^\n\n``typing``:\n    - ``pathotyping``: <typing result>"
  },
  {
    "path": "docs/user/components/pilon.rst",
    "content": "pilon\n=====\n\nPurpose\n-------\n\nThis components Performs a mapping procedure of FastQ files into a their\nassembly and performs filtering based on quality criteria of read coverage\nand genome size.\n\n.. note::\n    Software page: https://github.com/broadinstitute/pilon\n\nInput/Output type\n------------------\n\n- Input type: ``Fasta`` and ``FastQ``\n- Output type: ``Fasta``\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``.\n\nParameters\n----------\n\nNone.\n\nPublished results\n-----------------\n\n- ``results/assembly/pilon``: Stores the polished fasta assemblies for each\n  sample.\n\nPublished reports\n-----------------\n\n- ``reports/assembly/pilon``: Table with several summary statistics about the\n  assembly for each sample.\n\nDefault directives\n------------------\n\n- ``pilon``:\n    - ``cpus``: 4\n    - ``memory``: 7GB (dynamically increased on retry)\n    - ``container``: ummidock/pilon\n    - ``version``: 1.22.0-2\n- ``process_assembly_mapping``:\n    - ``cpus``: 1\n    - ``memory``: 7GB (dynamically increased on retry)\n    - ``container``: ummidock/pilon\n    - ``version``: 1.22.0-2\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.assembly_report`\n\nReports JSON\n^^^^^^^^^^^^\n``tableRow``:\n    - ``Contigs``: Number of contigs.\n    - ``Assembled BP``: Number of assembled base pairs.\n``plotData``:\n    - ``size_dist``: Distribution of contig size.\n    - ``sparkline``: Number of assembled base pairs.\n    - ``genomeSliding``:\n        - ``gcData``: Genome sliding window of GC content.\n        - ``covData``: Genome sliding window of read coverage depth.\n        - ``window``: Size of sliding window\n        - ``xbars``: Position of contigs along the genome sliding window.\n        - ``assemblyFile``: Name of the input assembly file.\n``warnings``:\n    - When the number of contigs exceeds a given threshold.\n``fail``:\n    - When the genome size is below 80% or above 150% of the expected genome size.\n\n"
  },
  {
    "path": "docs/user/components/process_skesa.rst",
    "content": "process_skesa\n==============\n\nPurpose\n-------\n\nThis components processes the assembly resulting from the Skesa software and,\noptionally, filters contigs based on user-provide parameters.\n\nInput/Output type\n------------------\n\n- Input type: ``Fasta``\n- Output type: ``Fasta``\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``.\n\nParameters\n----------\n\n- ``skesaMinKmerCoverage``: Minimum contigs K-mer coverage. After assembly\n  only keep contigs with reported k-mer coverage equal or above this value.\n- ``skesaMinContigLen``: Filter contigs for length greater or equal than\n  this value.\n- ``skesaMaxContigs``: Maximum number of contigs per 1.5 Mb of expected\n  genome size.\n\nPublished results\n-----------------\n\nNone.\n\nPublished reports\n-----------------\n\n- ``reports/assembly/skesa_filter``: The filter status for each contig and\n  each sample. If any contig does not pass the filters, it reports which \n  filter type it failed and the corresponding value.\n\nDefault directives\n------------------\n\n- ``container``: ummidock/skesa\n- ``version``: 0.2.0-3\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.process_assembly`\n\nReports JSON\n^^^^^^^^^^^^\n\n``tableRow``:\n    - ``Contigs (<assembler>)``: Number of contigs.\n    - ``Assembled BP (<assembler>)``: Number of assembled base pairs.\n``warnings``:\n    - When the number of contigs exceeds a given threshold.\n``fail``:\n    - When the genome size is below 80% or above 150% of the expected genome size.\n"
  },
  {
    "path": "docs/user/components/process_spades.rst",
    "content": "process_spades\n==============\n\n\nPurpose\n-------\n\nThis components processes the assembly resulting from the Spades software and,\noptionally, filters contigs based on user-provide parameters.\n\nInput/Output type\n------------------\n\n- Input type: ``Fasta``\n- Output type: ``Fasta``\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``.\n\nParameters\n----------\n\n- ``spadesMinKmerCoverage``: Minimum contigs K-mer coverage. After assembly\n  only keep contigs with reported k-mer coverage equal or above this value.\n- ``spadesMinContigLen``: Filter contigs for length greater or equal than\n  this value.\n- ``spadesMaxContigs``: Maximum number of contigs per 1.5 Mb of expected\n  genome size.\n\nPublished results\n-----------------\n\nNone.\n\nPublished reports\n-----------------\n\n- ``reports/assembly/spades_filter``: The filter status for each contig and\n  each sample. If any contig does not pass the filters, it reports which\n  filter type it failed and the corresponding value.\n\nDefault directives\n------------------\n\n- ``container``: ummidock/spades\n- ``version``: 3.11.1-1\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.process_assembly`\n\nReports JSON\n^^^^^^^^^^^^\n\n``tableRow``:\n    - ``Contigs (<assembler>)``: Number of contigs.\n    - ``Assembled BP (<assembler>)``: Number of assembled base pairs.\n``warnings``:\n    - When the number of contigs exceeds a given threshold.\n``fail``:\n    - When the genome size is below 80% or above 150% of the expected genome size.\n ``process_assembly``: Failure messages"
  },
  {
    "path": "docs/user/components/prokka.rst",
    "content": "prokka\n======\n\n\nPurpose\n-------\n\nThis component performs annotations using the annotations available in\n`prokka <https://github.com/tseemann/prokka>`_.\n\n\nInput/Output type\n-----------------\n\n- Input type: ``fasta``\n- Output type: ``None``\n\n.. note::\n    - Although the component doesn't have an output channel it writes the results into the ``publishDir``.\n\n\nParameters\n----------\n\n- ``centre``: sets the center to which the sequencing center id.\n  Default: 'UMMI'.\n\n- ``kingdom``: Selects the annotation mode between Archaea, Bacteria,\n  Mitochondria, Viruses. Default: Bacteria).\n\n- ``genus``: Allows user to select a genus name. Default: 'Genus' (same\n  as prokka). This also adds the use of the --usegenus flag to prokka.\n\n\nPublished results\n-----------------\n\n- ``results/annotation/prokka_<pid>/<sample_id>``: All the outputs from prokka\n  will be available in these directories.\n\n\nPublished reports\n-----------------\n\nNone.\n\n\nDefault directives\n------------------\n\n- ``prokka``:\n    - ``cpus``: 2\n    - ``container``: ummidock/prokka\n    - ``version``: 1.12\n"
  },
  {
    "path": "docs/user/components/reads_download.rst",
    "content": "reads_download\n==============\n\nPurpose\n-------\n\nThis component downloads reads from the SRA/ENA public databases from a\nlist of accessions. First, it tries to use `aspera connect`_ to download\nreads, if a valid aspera key is provided. Otherwise it uses curl, which is\nsubstantially slower. The reads for each accession are then emitted through\nthe main output of this component to any other component (or components) that\nreceive FastQ data.\n\n.. _aspera connect: http://asperasoft.com/download_connect/\n\nInput/Output type\n------------------\n\n- Input type: ``accessions``\n- Output type: ``fastq``\n\n.. note::\n    The default input parameter for Accessions data is ``--accessions``.\n\nParameters\n----------\n\n- ``asperaKey``: Downloads fastq accessions using Aspera Connect\n  by providing the private-key file 'asperaweb_id_dsa.openssh' normally found\n  in ~/.aspera/connect/etc/asperaweb_id_dsa.openssh after the installation.\n\nPublished results\n-----------------\n\n- ``reads/<accession>``: Stores the reads for each provided accession.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``cpus``: 1\n- ``memory``: 1GB\n- ``container``: flowcraft/getseqena\n- ``version``: 0.4.0-2\n"
  },
  {
    "path": "docs/user/components/remove_host.rst",
    "content": "remove_host\n===========\n\nPurpose\n-------\n\nThis component performs a mapping procedure of FastQ files using a host\ngenome as referece (default: hg19). The procedure is carried out with\nbowtie2 and samtools and aims to filter the reads that map to host genome.\n\n.. note::\n    - bowtie2 documentation can be found `here <http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml>`_.\n    - samtools documentation can be found `here <http://www.htslib.org/doc/samtools-1.2.html>`_.\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``FastQ``\n\n.. note::\n    The default input parameter for fastq data is ``--fastq``.\n\nParameters\n----------\n\n- ``refIndex``: Specifies the reference indexes to be provided to bowtie2.\nDefault: '/index_hg19/hg19' (from docker image).\n\n\nPublished results\n-----------------\n\n- ``results/mapping/``: A `txt` file from bowtie2 with the mapping statistics.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``remove_host``:\n    - ``cpus``: 3\n    - ``memory``: 5GB (dynamically increased on retry)\n    - ``container``: flowcraft/remove_host\n    - ``version``: 2-0.1\n\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`assemblerflow.templates.remove_host`\n"
  },
  {
    "path": "docs/user/components/retrieve_mapped.rst",
    "content": "retrieve_mapped\n===============\n\nPurpose\n-------\n\nThis component retrieves the mapping reads of a previous bowtie mapping process.\nThe procedure is carried out with samtools and aims to retrieve the reads that map to target reference.\n\n.. note::\n    - samtools documentation can be found `here <http://www.htslib.org/doc/samtools-1.2.html>`_.\n\nInput/Output type\n------------------\n\n- Input type: ``bam``\n- Output type: ``FastQ``\n\n.. note::\n    This process has the ``bowtie2`` process as a dependency.\n\nParameters\n----------\n\nNone\n\nPublished results\n-----------------\n\n- ``results/mapping/retrieve_mapped``: Contains the resulting ``FastQ`` files.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``remove_host``:\n    - ``cpus``: 2\n    - ``memory``: 5GB (dynamically increased on retry)\n    - ``container``: flowcraft/bowtie2_samtools\n    - ``version``: 1.0.0-1\n\n"
  },
  {
    "path": "docs/user/components/seq_typing.rst",
    "content": "seq_typing\n==========\n\nPurpose\n-------\n\nSeq_typing is a software that determines the type of a given sample using a\nread mapping approach against a set of reference sequences. Sample's reads\nare mapped to the given reference sequences and, based on the length of the\nsequence covered and it's depth of coverage, seq_typing decides which reference\nsequence is more likely to be present and returns the type associated with\nsuch sequences.\n\n.. note::\n    Software page: https://github.com/B-UMMI/seq_typing\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: None\n\nParameters\n----------\n\n- ``referenceFileO``: Fasta file containing reference sequences. If more\n  than one file is passed via the 'referenceFileH parameter, a reference\n  sequence for each file will be determined.\n- ``referenceFileH``: Fasta file containing reference sequences. If more\n  than one file is passed via the 'referenceFileO parameter, a reference\n  sequence for each file will be determined.\n\nPublished results\n-----------------\n\n- ``results/seqtyping/<sample id>``: Stores the results of seq_typing in\n  text and tabular format.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``cpus``: 4\n- ``memory``: 4GB\n- ``container``: ummidock/seq_typing\n- ``version``: 0.1.0-1\n\nAdvanced\n--------\n\nReports JSON\n^^^^^^^^^^^^\n\n``typing``:\n    - ``seqtyping``: <typing result>"
  },
  {
    "path": "docs/user/components/sistr.rst",
    "content": "sistr\n=====\n\nPurpose\n-------\n\nSistr (Salmonella In Silico Typing Resource) is a software for Serovar\npredictions from whole-genome sequence assemblies by determination\nof antigen gene and cgMLST gene alleles using BLAST. Mash MinHash can also be\nused for serovar prediction.\n\n.. note::\n    Software page: https://github.com/peterk87/sistr_cmd\n\nInput/Output type\n------------------\n\n- Input type: ``Fasta``\n- Output type: None\n\n.. note::\n    The default input parameter for fasta data is ``--fasta``.\n\nParameters\n----------\n\nNone\n\nPublished results\n-----------------\n\n- ``results/typing/sistr``: Stores the results of sistr in a tab file\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``cpus``: 4\n- ``memory``: 4GB\n- ``container``: ummidock/sistr_cmd\n- ``version``: 1.0.2\n"
  },
  {
    "path": "docs/user/components/skesa.rst",
    "content": "skesa\n=====\n\nPurpose\n-------\n\nThis components assembles paired-end FastQ files using the Skesa assembler.\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``Fasta``\n\n.. note::\n    The default input parameter for FastQ data is ``--fastq``. You can change\n    the ``--fastq`` parameter default pattern (``fastq/*_{1,2}.*``) according\n    to input file names (e.g.: ``--fastq \"path/to/fastq/*R{1,2}.*\"``).\n\nParameters\n----------\n\nNone.\n\nPublished results\n-----------------\n\n- ``results/assembly/skesa``: Stores the fasta assemblies for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``cpus``: 4\n- ``memory``: 5GB (dynamically increased on retry)\n- ``container``: flowcraft/skesa\n- ``version``: 2.3.0-1\n- ``scratch``: true\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.skesa`"
  },
  {
    "path": "docs/user/components/spades.rst",
    "content": "spades\n======\n\nPurpose\n-------\n\nThis components assembles paired-end FastQ files using the Spades assembler.\n\n.. note::\n    Software page: http://bioinf.spbau.ru/spades\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``Fasta``\n\n.. note::\n    The default input parameter for FastQ data is ``--fastq``. You can change\n    the ``--fastq`` parameter default pattern (``fastq/*_{1,2}.*``) according\n    to input file names (e.g.: ``--fastq \"path/to/fastq/*R{1,2}.*\"``).\n\nParameters\n----------\n\n- ``spadesMinCoverage``: The minimum number of reads to consider an edge in\n  the de Bruijn graph during the assembly\n- ``spadesMinKmerCoverage``: Minimum contigs K-mer coverage. After assembly\n  only keep contigs with reported k-mer coverage equal or above this value\n- ``spadesKmers``: If 'auto' the SPAdes k-mer lengths will be determined\n  from the maximum read length of each assembly. If 'default', SPAdes will\n  use the default k-mer lengths.\n\nPublished results\n-----------------\n\n- ``results/assembly/spades``: Stores the fasta assemblies for each sample.\n\nPublished reports\n-----------------\n\nNone.\n\nDefault directives\n------------------\n\n- ``cpus``: 4\n- ``memory``: 5GB (dynamically increased on retry)\n- ``container``: ummidock/spades\n- ``version``: 3.13.0-1\n- ``scratch``: true\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.spades`\n"
  },
  {
    "path": "docs/user/components/trimmomatic.rst",
    "content": "trimmomatic\n===========\n\nPurpose\n-------\n\nThis component runs Trimmomatic on paired-end FastQ files.\n\n.. note::\n    Software page: http://www.usadellab.org/cms/?page=trimmomatic\n\nInput/Output type\n------------------\n\n- Input type: ``FastQ``\n- Output type: ``FastQ``\n\n.. note::\n    The default input parameter for FastQ data is ``--fastq``. You can change\n    the ``--fastq`` parameter default pattern (``fastq/*_{1,2}.*``) according\n    to input file names (e.g.: ``--fastq \"path/to/fastq/*R{1,2}.*\"``).\n\nParameters\n----------\n\n- ``adapters``: Provide a non-default fasta file containing the adapter\n  sequences used to filter the FastQ files.\n- ``trimSlidingWindow``: Perform sliding window trimming, cutting once the\n  average quality within the window falls below a threshold.\n- ``trimLeading``: Cut bases off the start of a read, if below a threshold\n  quality.\n- ``trimTrailing``: Cut bases of the end of a read, if below a threshold\n  quality.\n- ``trimMinLength``: Drop the read if it is below a specified length.\n\nPublished results\n-----------------\n\n- ``results/trimmomatic``: The trimmed FastQ files for each sample.\n\nPublished reports\n-----------------\n\n- ``reports/fastqc``: Stores the FastQC HTML reports for each sample.\n- ``reports/fastqc/run_2/``: Stores the summary text files with the category\n  results of FastQC for each sample.\n\nDefault directives\n------------------\n\n- ``cpus``: 2\n- ``memory``: 4GB (dynamically increased on retry)\n- ``container``: ummidock/trimmomatic\n- ``version``: 0.36-2\n\n\nAdvanced\n--------\n\nTemplate\n^^^^^^^^\n\n:mod:`flowcraft.templates.trimmomatic`\n:mod:`flowcraft.templates.trimmomatic_report`\n\nReports JSON\n^^^^^^^^^^^^\n\n``tableRow``:\n    ``Trimmed (%)``: Percentage of trimmed nucleotides\n``plotData``:\n    ``sparkline``: Number of nucleotides after trimming\n``badReads``: Number of discarded reads"
  },
  {
    "path": "docs/user/pipeline_building.rst",
    "content": "Pipeline building\n=================\n\nFlowCraft offers a few extra features when building pipelines using the\n``build`` execution mode.\n\n.. _rawInput:\n\nRaw input types\n---------------\n\nThe first component (or components) you place at the start of the pipeline\ndetermine the raw input type, and the parameter for providing input data.\nThe input type information is provided in the documentation page of each\ncomponent. For instance, if the first component is FastQC, which has an input\ntype of ``FastQ``, the parameter for providing the raw input data will be\n``--fastq``. Here are the currently supported input types and their\nrespective parameters:\n\n- ``FastQ``: ``--fastq``\n- ``Fasta``: ``--fasta``\n- ``Accessions``: ``--accessions``\n\n\n.. _mergeParams:\n\nMerge parameters\n----------------\n\nBy default, parameters in a FlowCraft pipeline are unique and independent\nbetween different components, even if the parameters have the same name and/or\nthe components are the same. This allows for the execution of the same software\nusing different parameters in a single workflow. The ``params.config`` of these\npipelines will look something like::\n\n    params {\n        /*\n        Component 'trimmomatic_1_2'\n        --------------------------\n        */\n        adapters_1_2 = 'None'\n        trimSlidingWindow_1_2 = '5:20'\n        trimLeading_1_2 = 3\n        trimTrailing_1_2 = 3\n        trimMinLength_1_2 = 55\n\n        /*\n        Component 'fastqc_1_3'\n        ---------------------\n        */\n        adapters_1_3 = 'None'\n    }\n\nNotice that the ``adapters`` parameter occurs twice and can be independently set\nin each component.\n\nIf you want to override this behaviour, FlowCraft has a ``--merge-params`` option\nthat merges all parameters with the same name in a single parameter, which is then\nequally applied to all components. So, if we generate the pipeline above\nwith this option::\n\n    flowcraft build -t \"trimmomatic fastqc\" -o pipe.nf --merge-params\n\nThen, the ``params.config`` will become::\n\n    params {\n        adapters = 'None'\n        trimSlidingWindow = '5:20'\n        trimLeading = 3\n        trimTrailing = 3\n        trimMinLength = 5\n    }\n\nForks\n-----\n\nThe output of any component in an FlowCraft pipeline can be forked into\ntwo or more components, using the following fork syntax::\n\n    trimmomatic fastqc (spades | skesa)\n\n.. image:: ../resources/fork_1.png\n   :scale: 80 %\n   :align: center\n\nIn this example, the output of ``fastqc`` will be fork into two new *lanes*,\nwhich will proceed independently from each other. In this syntax, a fork is\ntriggered by the ``(`` symbol (and the corresponding closing ``)``) and each\nlane will be separated by a ``|`` symbol. There is no limitation to the number\nof forks or lanes that a pipeline has. For instance, we could add more\ncomponents after the ``skesa`` module, including another fork::\n\n    trimmomatic fastqc (spades | skesa pilon (abricate | prokka | chewbbaca))\n\n.. image:: ../resources/fork_2.png\n   :scale: 80 %\n   :align: center\n\nIn this example, data will be forked after ``fastqc`` into two new lanes,\nprocessed by ``spades`` and ``skesa``. In the skesa lane, data will continue\nto flow into the ``pilon`` component and its output will fork into three new\nlanes.\n\nIt is also possible to start a fork at the beggining of the pipeline, which\nbasically means that the pipeline will have multiple starting points. 
If we\nwant to provide the raw input to multiple processes, the fork syntax can start\nat the beginning of the pipeline::\n\n    (seq_typing | trimmomatic fastqc (spades | skesa))\n\n.. image:: ../resources/fork_3.png\n   :scale: 80 %\n   :align: center\n\nIn this case, since both initial components (``seq_typing`` and\n``integrity_coverage``) receive fastq files as input, the data provided\nvia the ``--fastq`` parameter will be forked and provided to both processes.\n\n.. note::\n    Some components have dependencies which need to be included previously\n    in the pipeline. For instance, ``trimmomatic`` requires\n    ``integrity_coverage`` and ``pilon`` requires ``assembly_mapping``. By\n    default, FlowCraft will insert any missing dependencies right before\n    the process, which is why these components appear in the figures above.\n\n.. warning::\n    Pay special attention to the syntax of the pipeline string when using\n    forks. When FlowCraft is unable to parse it, it will do its best\n    to inform you where the parsing error occurred.\n\nDirectives\n----------\n\nSeveral directives with information on cpu usage, RAM, version, etc. can be\nspecified for each individual component when building the pipeline using the\n``={}`` notation. These\ndirectives are written to the ``resources.config`` and\n``containers.config`` files that are generated in the pipeline directory. You\ncan pass any of the directives already supported by nextflow (https://www.nextflow.io/docs/latest/process.html#directives),\nbut the most commonly used include:\n\n    - ``cpus``\n    - ``memory``\n    - ``queue``\n\nIn addition, you can also pass the ``container`` and ``version`` directives\nwhich are parsed by FlowCraft to dynamically change the container and/or\nversion tag of any process.\n\nHere is an example where we specify cpu usage, allocated memory and container\nversion in the pipeline string::\n\n    flowcraft build -t \"fastqc={'version':'0.11.5'} \\\n                            trimmomatic={'cpus':'2'} \\\n                            spades={'memory':'\\'10GB\\''}\" -o my_pipeline.nf\n\nWhen a directive is not specified, it will assume the default value of the\nnextflow directive.\n\n.. warning::\n    Take special care not to include any white space characters inside the\n    directives field. Common mistakes occur when specifying directives like\n    ``fastqc={'version': '0.11.5'}``.\n\n.. note::\n    The values specified in these directives are placed in the\n    respective config files exactly as they are. For instance,\n    ``spades={'memory':'10GB'}`` will appear in the config as\n    ``spades.memory = 10GB``, which will raise an error in nextflow because\n    ``10GB`` should be a string. Therefore, if you want a string you'll need to add\n    the ``'`` as in this example: ``spades={'memory':'\\'10GB\\''}``. The\n    reason why these directives are not automatically converted is to allow\n    the specification of dynamic computing resources, such as\n    ``spades={'memory':'{10.Gb*task.attempt}'}``.\n\nExtra inputs\n------------\n\nBy default, only the first process (or processes) in a pipeline will receive\nthe raw input data provided by the user. 
However, the ``extra_input`` special\ndirective allows one or more processes to receive input from an additional parameter\nthat is provided by the user::\n\n    reads_download integrity_coverage={'extra_input':'local'} trimmomatic spades\n\nThe default main input of this pipeline is a text file with accession numbers\nfor the ``reads_download`` component. The ``extra_input`` creates\na new parameter, named ``local`` in this example, that allows us to provide\nadditional input data to the ``integrity_coverage`` component directly::\n\n    nextflow run pipe.nf --accessions accession_list.txt --local \"fastq/*_{1,2}.*\"\n\nWhat will happen in this pipeline is that the fastq files provided to the\n``integrity_coverage`` component will be mixed with the ones provided by the\n``reads_download`` component. Therefore, if we provide 10 accessions and 10\nfastq samples, we'll end up with 20 samples being processed by the end of the\npipeline.\n\n**It is important to note that the extra input parameter expects data\ncompliant with the input type of the process.** If files other than fastq files\nwere provided in the pipeline above, this would result in a pipeline error.\n\nIf the ``extra_input`` directive is used on a component that has a different\ninput type from the first component in the pipeline, it is possible to use\nthe ``default`` value::\n\n    trimmomatic spades abricate={'extra_input':'default'}\n\nIn this case, the input type of the first component is fastq and the input\ntype of ``abricate`` is fasta. The ``default`` value will make available the\ndefault parameter for fasta raw input, which is ``--fasta``::\n\n    nextflow run pipe.nf --fastq \"fastq/*_{1,2}.*\" --fasta \"fasta/*.fasta\"\n\nPipeline file\n-------------\n\nInstead of providing the pipeline components via the command line, you can\nspecify them in a text file::\n\n    # my_pipe.txt\n    trimmomatic fastqc spades\n\nAnd then provide the pipeline file to the ``-t`` parameter::\n\n    flowcraft build -t my_pipe.txt -o my_pipe.nf\n\nPipeline files are usually more readable, particularly when they become more\ncomplex. Consider the following example::\n\n    integrity_coverage (\n        spades={'memory':'\\'50GB\\''} |\n        skesa={'memory':'\\'40GB\\'','cpus':'4'} |\n        trimmomatic fastqc (\n            spades pilon (abricate={'extra_input':'default'} | prokka) |\n            skesa pilon (abricate | prokka)\n        )\n    )\n\nIn addition to being more readable, it is also easier to edit, re-use and share.\n\n"
  },
  {
    "path": "docs/user/pipeline_configuration.rst",
    "content": "Pipeline configuration\n======================\n\nWhen a nextflow pipeline is built with FlowCraft, a number of configuration\nfiles are automatically generated in the same directory. They are all imported\nat the end of the ``nextflow.config`` file and are sorted by their configuration\nrole. All configuration files are overwritten if you build another pipeline\nin the same directory, with the exception of the ``user.config`` file, which\nis meant to be a persistent configuration file.\n\nParameters\n----------\n\nThe ``params.config`` file includes all available paramenters for the pipeline\nand their respective default values. Most of these parameters already contain\nsensible defaults.\n\nResources\n---------\n\nThe ``resources.config`` file includes the majority of the directives provided\nfor each process, including ``cpus`` and ``memory``. You'll note that each\nprocess name has a suffix like ``_1_1``, which is a unique process identifier\ncomposed of ``<lane>_<process_number>``. This ensures that even when the same\ncomponent is specified multiple times in a pipeline, you'll still be able to\nset directives for each one individually.\n\nContainers\n----------\n\nThe ``containers.config`` file includes the container directive for each\nprocess in the pipeline. These containers are retrieved from dockerhub, if they\ndo not exist locally yet. You can change the container string to any other\nvalue, but it should point to an image that exist on dockerhub or locally.\n\nProfiles\n--------\n\nThe ``profiles.config`` file includes a set of pre-made profiles with all\npossible combinations of executors and container engines. You can add new ones\nor modify existing one.\n\nUser configutations\n-------------------\n\nThe ``user.config`` file is configuration file that is not overwritten when a\nnew pipeline is build in the same directory. It can contain any configuration\nthat is supported by nextflow and will overwrite all other configuration files."
  },
  {
    "path": "docs/user/pipeline_inspect.rst",
    "content": "Pipeline inspection\n===================\n\nFlowCraft offers an ``inspect`` mode for tracking the progress of a nextflow\npipeline either directly in a terminal (``overview``) or by broadcasting information to\nthe `flowcraft web application <https://github.com/assemblerflow/flowcraft-webapp>`_\n(``broadcast``).\n\n.. note::\n    This mode was design for nextflow pipelines generated by FlowCraft. It should\n    be possible to inspect any nextflow pipeline, provided that the requirements\n    below are met, but compatibility it's not guaranteed.\n\n**How it works:** Simply run ``flowcraft inspect -m <mode>`` in the directory\nwhere the pipeline is running. In either run mode, FlowCraft will keep running\n(until you cancel it) and continuously update the progress of a pipeline. If\nthe pipeline is interrupted or fails for some reason, FlowCraft should be able\nto correctly reset the inspection automatically when resuming its execution.\n\nRequirements for inspect\n------------------------\n\nWhile the ``inspect`` mode is running, it will parse the information written\ninto two files that are generated by nextflow:\n\n- ``.nextflow.log``: The log file that is automatically generated by nextflow.\n- ``trace file``: The trace file that is generated by nextflow when using the\n  ``-with-trace`` option. By default, it searches for the ``pipeline_stats.txt`` file,\n  but this can be changed using the ``-i`` option.\n\nTrace fields\n------------\n\nFlowCraft parses several fields of the trace file, but only a few are mandatory\nfor its execution. If the trace file does not contain any of the optional fields,\nthat information will simply not appear on the terminal or web app. Nevertheless, to take\nfull advantage of the inspect mode, the following trace fields should be present:\n\n- **Mandatory:**\n    - ``tag``: The tag of the nextflow process. Flowcraft assumes that this is a string\n      with only the sample name (e.g.: *SampleA*). While this is not strictly required,\n      providing strings with other information (e.g.: *Running bowtie for sampleA*)\n      may result in some inconsistencies in the inspection.\n    - ``task_id``: The task ID is used to skip entries that have already been parsed.\n- **Optional:**\n    - ``hash``: Used to get the work directory the process execution.\n    - ``cpus``, ``%cpu``, ``memory``, ``rss``, ``rchar`` and ``wchar``: Used for statistics\n      of computational resources.\n\n.. note::\n    Any additional fields present in the trace file are ignored.\n\nUsage\n-----\n\n::\n\n    flowcraft inspect --help\n    usage: flowcraft inspect [-h] [-i TRACE_FILE] [-r REFRESH_RATE]\n                             [-m {overview,broadcast}] [-u URL] [--pretty]\n\n    optional arguments:\n      -h, --help            show this help message and exit\n      -i TRACE_FILE         Specify the nextflow trace file.\n      -r REFRESH_RATE       Set the refresh frequency for the continuous inspect\n                            functions\n      -m {overview,broadcast}, --mode {overview,broadcast}\n                            Specify the inspection run mode.\n      -u URL, --url URL     Specify the URL to where the data should be broadcast\n      --pretty              Pretty inspection mode that removes usual reporting\n                            processes.\n\n- ``-i``: Used to specify the path to the trace file that should be parsed. 
By\n  default, FlowCraft will try to parse the ``pipeline_stats.txt`` file in the current\n  working directory.\n- ``-r``: Sets the time interval in seconds between each parsing of the\n  relevant nextflow files. By default it is set to ``0.01``.\n- ``-m``: The inspection mode. ``overview`` is the terminal display while\n  ``broadcast`` sends the data to FlowCraft's web service.\n- ``-u``: The URL of FlowCraft's web service. By default it is already set to the\n  main service and you do not need to specify it. It is only useful when the service\n  is running on localhost or in another custom instance.\n- ``--pretty``: By default the inspection shows the progress of all processes in\n  the pipeline. Using this option filters the processes to the most relevant ones\n  of FlowCraft's pipelines.\n"
  },
  {
    "path": "docs/user/pipeline_reports.rst",
    "content": "Pipeline reports\n================\n\n.. include:: reports/abricate.rst\n.. include:: reports/assembly_mapping.rst\n.. include:: reports/check_coverage.rst\n.. include:: reports/chewbbaca.rst\n.. include:: reports/dengue_typing.rst\n.. include:: reports/fastqc.rst\n.. include:: reports/fastqc_trimmomatic.rst\n.. include:: reports/integrity_coverage.rst\n.. include:: reports/mash_dist.rst\n.. include:: reports/mlst.rst\n.. include:: reports/patho_typing.rst\n.. include:: reports/pilon.rst\n.. include:: reports/process_mapping.rst\n.. include:: reports/process_newick.rst\n.. include:: reports/process_skesa.rst\n.. include:: reports/process_spades.rst\n.. include:: reports/process_viral_assembly.rst\n.. include:: reports/seq_typing.rst\n.. include:: reports/sistr.rst\n.. include:: reports/trimmomatic.rst\n.. include:: reports/true_coverage.rst\n\n"
  },
  {
    "path": "docs/user/reports/abricate.rst",
    "content": "abricate\n--------\n\nTable data\n^^^^^^^^^^\n\nAMR table:\n    - **<abricate database>**: Number of hits for a particular given database\n\n.. image:: ../resources/reports/abricate_table.png\n    :align: center\n\nPlot data\n^^^^^^^^^\n\n- **Sliding window AMR annotation**: Provides annotation of Abricate hits for\n  each database along the genome. This report component is only available when\n  the ``pilon`` component was used downstream of ``abricate``.\n\n.. image:: ../resources/reports/sliding_window_amr.png"
  },
  {
    "path": "docs/user/reports/assembly_mapping.rst",
    "content": "assembly_mapping\n----------------\n\nPlot data\n^^^^^^^^^\n\n- **Data loss chart**: Gives a trend of the data loss\n  (in total number of base pairs) across components that may filter this data.\n\n.. image:: ../resources/reports/sparkline.png\n\nWarnings\n^^^^^^^^\n\nAssembly table:\n    - When the number of contigs exceeds the threshold of 100 contigs per 1.5Mb.\n\nFails\n^^^^^\n\nAssembly table:\n    - When the assembly size if smaller than 80% or larger than 150% of the\n      expected genome size."
  },
  {
    "path": "docs/user/reports/check_coverage.rst",
    "content": "check_coverage\n--------------\n\nTable data\n^^^^^^^^^^\n\nQuality control table:\n    - **Coverage**: Estimated coverage based on the number of base pairs and the expected\n      genome size.\n\n.. image:: ../resources/reports/quality_control_table.png\n    :align: center\n\nWarnings\n^^^^^^^^\n\nQuality control table:\n    - When the enconding and phred score cannot be guessed from the FastQ file(s).\n\nFails\n^^^^^\n\nQuality control table:\n    - When the sample has lower estimated coverage than the provided coverage threshold."
  },
  {
    "path": "docs/user/reports/chewbbaca.rst",
    "content": "chewbbaca\n---------\n\nTable data\n^^^^^^^^^^\n\nChewbbaca table:\n    - Table with the summary statistics of ChewBBACA allele calling, including\n      the number of exact matches, inferred loci, loci not found, etc.\n\n.. image:: ../resources/reports/chewbbaca_table.png\n    :align: center"
  },
  {
    "path": "docs/user/reports/dengue_typing.rst",
    "content": "dengue_typing\n-------------\n\nTable data\n^^^^^^^^^^\n\nTyping table:\n    - **seqtyping**: The sequence typing result (serotypy-genotype).\n\n.. image:: ../resources/reports/typing_table_dengue.png\n    :align: center"
  },
  {
    "path": "docs/user/reports/fastqc.rst",
    "content": "fastqc\n------\n\nPlot data\n^^^^^^^^^\n\n- **Base sequence quality**: The average quality score across the read length.\n\n.. image:: ../resources/reports/fastqc_base_sequence_quality.png\n\n- **Sequence quality**: Distribution of the mean sequence quality score.\n\n.. image:: ../resources/reports/fastqc_per_base_sequence_quality.png\n\n- **Base GC content**: Distribution of the GC content of each sequence.\n\n.. image:: ../resources/reports/fastqc_base_gc_content.png\n\n- **Sequence length**: Distribution of the read sequence length.\n\n.. image:: ../resources/reports/fastqc_sequence_length.png\n\n- **Missing data**: Normalized count of missing data across the read length.\n\n.. image:: ../resources/reports/fastqc_missing_data.png\n\n\nWarnings\n^^^^^^^^\n\nThe following FastQC categories will issue a warning when they have a ``WARN`` flag:\n    - Per base sequence quality.\n    - Overrepresented sequences.\n\nThe following FastQC categories will issue a warning when do not have a ``PASS`` flag:\n    - Per base sequence content.\n\nFails\n^^^^^\n\nThe following FastQC categories will issue a fail when they have  a ``FAIL`` flag:\n    - Per base sequence quality.\n    - Overrepresented sequences.\n    - Sequence length distribution.\n    - Per sequence GC content.\n\nThe following FastQC categories will issue a fail when the do not have a ``PASS`` flag:\n    - Per base N content.\n    - Adapter content.\n"
  },
  {
    "path": "docs/user/reports/fastqc_trimmomatic.rst",
    "content": "fastqc_trimmomatic\n------------------\n\nTable data\n^^^^^^^^^^\n\nQuality control table:\n    - **Trimmed (%)**: Percentage of trimmed base pairs.\n\n.. image:: ../resources/reports/quality_control_table.png\n    :scale: 80 %\n    :align: center\n\nPlot data\n^^^^^^^^^\n\n- **Data loss chart**: Gives a trend of the data loss\n  (in total number of base pairs) across components that may filter this data.\n\n.. image:: ../resources/reports/sparkline.png\n\n"
  },
  {
    "path": "docs/user/reports/integrity_coverage.rst",
    "content": "integrity_coverage\n------------------\n\nTable data\n^^^^^^^^^^\n\nQuality control table:\n    - **Raw BP**: Number of raw base pairs from the FastQ file(s).\n    - **Reads**: Number of reads in the FastQ file(s)\n    - **Coverage**: Estimated coverage based on the number of base pairs and the expected\n      genome size.\n\n.. image:: ../resources/reports/quality_control_table.png\n    :align: center\n\nPlot data\n^^^^^^^^^\n\n- **Data loss chart**: Gives a trend of the data loss\n  (in total number of base pairs) across components that may filter this data.\n\n.. image:: ../resources/reports/sparkline.png\n\nWarnings\n^^^^^^^^\n\nQuality control table:\n    - When the enconding and phred score cannot be guessed from the FastQ file(s).\n\nFails\n^^^^^\n\nQuality control table:\n    - When the sample has lower estimated coverage than the provided coverage threshold."
  },
  {
    "path": "docs/user/reports/mash_dist.rst",
    "content": "mash_dist\n---------\n\nTable data\n^^^^^^^^^^\n\nPlasmids table:\n    - **Mash Dist**: Number of plasmid hits\n\n.. image:: ../resources/reports/mash_dist_table.png\n    :align: center\n\nPlot data\n^^^^^^^^^\n\n- **Sliding window Plasmid annotation**: Provides annotation of plasmid\n  hits along the genome assembly. This report component is only available\n  when the ``mash_dist`` component is used.\n\n.. image:: ../resources/reports/sliding_window_mash_dist.png"
  },
  {
    "path": "docs/user/reports/maxbin2.rst",
    "content": "maxbin2\n----\n\nTable data\n^^^^^^^^^^\n\nMetagenomic Binning (sample specific):\n    - **Bin name**: The number of bin.\n    - **Completness**: Estimation of completion of genome in bin (% of Single copy genes present)\n    - **Genome size**: Total size of the bin\n    - **GC content**: Percentage of GC in the bin\n\n.. image:: ../resources/reports/binning.png\n    :scale: 80 %\n    :align: center"
  },
  {
    "path": "docs/user/reports/mlst.rst",
    "content": "mlst\n----\n\nTable data\n^^^^^^^^^^\n\nTyping table:\n    - **MLST species**: The inferred species name.\n    - **MLST ST**: The inferred sequence type.\n\n.. image:: ../resources/reports/typing_table.png\n    :scale: 80 %\n    :align: center"
  },
  {
    "path": "docs/user/reports/patho_typing.rst",
    "content": "patho_typing\n------------\n\nTable data\n^^^^^^^^^^\n\nTyping table:\n    - **Patho_typing**: The pathotyping result.\n\n.. image:: ../resources/reports/typing_table.png\n    :scale: 80 %\n    :align: center"
  },
  {
    "path": "docs/user/reports/pilon.rst",
    "content": "pilon\n-----\n\nTable data\n^^^^^^^^^^\n\nQuality control table:\n    - **Contigs**: Number of assembled contigs.\n    - **Assembled BP**: Total number of assembled base pairs.\n\n.. image:: ../resources/reports/assembly_table_skesa.png\n    :scale: 80 %\n    :align: center\n\nPlot data\n^^^^^^^^^\n\n- **Contig size distribution**: Distribution of the size of each assembled contig.\n\n.. image:: ../resources/reports/contig_size_distribution.png\n\n- **Sliding window coverage and GC content**: Provides coverage and GC content\n  metrics along the genome using a sliding window approach and two synchronised\n  charts.\n\n.. image:: ../resources/reports/sliding_window_amr.png\n\nWarnings\n^^^^^^^^\n\nQuality control table:\n    - When the enconding and phred score cannot be guessed from the FastQ file(s).\n\nFails\n^^^^^\n\nQuality control table:\n    - When the sample has lower estimated coverage than the provided coverage threshold."
  },
  {
    "path": "docs/user/reports/process_mapping.rst",
    "content": "process_mapping\n---------------\n\nTable data\n^^^^^^^^^^\n\nRead mapping table:\n    - **Reads**: Number reads in the the FastQ file(s).\n    - **Unmapped**: Number of unmapped reads\n    - **Mapped 1x**: Number of reads that aligned, concordantly and discordantly, exactly 1 time\n    - **Mapped >1x**: Number of reads that aligned, concordantly or disconrdantly, more than 1 times\n    - **Overall alignment rate (%)**: Overall alignment rate\n\n.. image:: ../resources/reports/read_mapping_remove_host.png\n    :align: center\n"
  },
  {
    "path": "docs/user/reports/process_newick.rst",
    "content": "process_newick\n--------------\n\nTree data\n^^^^^^^^^^\n\nPhylogenetic reconstruction with bootstrap values for the provided tree.\n\n\n.. image:: ../resources/reports/phylogenetic_tree.png\n    :align: center"
  },
  {
    "path": "docs/user/reports/process_skesa.rst",
    "content": "process_skesa\n-------------\n\nTable data\n^^^^^^^^^^\n\nQuality control table:\n    - **Contigs (skesa)**: Number of assembled contigs.\n    - **Assembled BP**: Total number of assembled base pairs.\n\n.. image:: ../resources/reports/assembly_table_skesa.png\n    :scale: 80 %\n    :align: center\n\nWarnings\n^^^^^^^^\n\nAssembly table:\n    - When the number of contigs exceeds the threshold of 100 contigs per 1.5Mb.\n\nFails\n^^^^^\n\nAssembly table:\n    - When the assembly size if smaller than 80% or larger than 150% of the\n      expected genome size.\n"
  },
  {
    "path": "docs/user/reports/process_spades.rst",
    "content": "process_spades\n-------------\n\nTable data\n^^^^^^^^^^\n\nQuality control table:\n    - **Contigs (spades)**: Number of assembled contigs.\n    - **Assembled BP**: Total number of assembled base pairs.\n\n.. image:: ../resources/reports/assembly_table_spades.png\n    :scale: 80 %\n    :align: center\n\nWarnings\n^^^^^^^^\n\nAssembly table:\n    - When the number of contigs exceeds the threshold of 100 contigs per 1.5Mb.\n\nFails\n^^^^^\n\nAssembly table:\n    - When the assembly size if smaller than 80% or larger than 150% of the\n      expected genome size.\n"
  },
  {
    "path": "docs/user/reports/process_viral_assembly.rst",
    "content": "process_viral_assembly\n----------------------\n\nTable data\n^^^^^^^^^^\n\nQuality control table:\n    - **Contigs (SPAdes)**: Number of assembled contigs.\n    - **Assembled BP (SPAdes)**: Total number of assembled base pairs.\n    - **ORFs**: Number of complete ORFs in the assembly.\n    - **Contigs (MEGAHIT)**: Number of assembled contigs.\n    - **Assembled BP (MEGAHIT)**: Total number of assembled base pairs.\n\n\n.. image:: ../resources/reports/assembly_table_viral_assembly.png\n    :align: center\n\nFails\n^^^^^\n\nAssembly table:\n    - When the assembly size if smaller than 80% or larger than 150% of the\n      expected genome size.\n"
  },
  {
    "path": "docs/user/reports/seq_typing.rst",
    "content": "seq_typing\n----------\n\nTable data\n^^^^^^^^^^\n\nTyping table:\n    - **seqtyping**: The sequence typing result.\n\n.. image:: ../resources/reports/typing_table.png\n    :align: center"
  },
  {
    "path": "docs/user/reports/sistr.rst",
    "content": "sistr\n-----\n\nTable data\n^^^^^^^^^^\n\nTyping table:\n    - **sistr**: The sequence typing result.\n\n.. image:: ../resources/reports/typing_table.png\n    :align: center"
  },
  {
    "path": "docs/user/reports/trimmomatic.rst",
    "content": "trimmomatic\n-----------\n\nTable data\n^^^^^^^^^^\n\nQuality control table:\n    - **Trimmed (%)**: Percentage of trimmed base pairs.\n\n.. image:: ../resources/reports/quality_control_table.png\n    :align: center\n\nPlot data\n^^^^^^^^^\n\n- **Data loss chart**: Gives a trend of the data loss\n  (in total number of base pairs) across components that may filter this data.\n\n.. image:: ../resources/reports/sparkline.png\n\n"
  },
  {
    "path": "docs/user/reports/true_coverage.rst",
    "content": "true_coverage\n-------------\n\nTable data\n^^^^^^^^^^\n\nQuality control table:\n    - **True Coverage**: Estimated coverage based on read mapping on MLST genes.\n\n.. image:: ../resources/reports/quality_control_table.png\n    :align: center\n\nFails\n^^^^^\n\nQuality control table:\n    - When the sample has lower estimated coverage than the provided coverage threshold."
  },
  {
    "path": "flowcraft/__init__.py",
    "content": "\n__version__ = \"1.4.2\"\n__build__ = \"18062019\"\n__author__ = \"Diogo N. Silva, Tiago F. Jesus, Ines Mendes, Bruno Ribeiro-Goncalves\"\n__copyright__ = \"Diogo N. Silva\"\n__license__ = \"GPL3\"\n__maintainer__ = \"Diogo N. Silva\"\n__email__ = \"o.diogosilva@gmail.com\""
  },
  {
    "path": "flowcraft/bin/final_POST.sh",
    "content": "#!/usr/bin/env sh\n\nst=$(cat $(pwd)/.status)\n\njson=\"{'project_id':'$1','pipeline_id':'$2','process_id':'$3','run_info':'None','run_output':'None','warnings':'$(pwd)/.warning','log_file':'$(pwd)/.command.log','status':'$st','type':'output'}\"\n\n{\n    curl -H  \"Content-Type: application/json\" -L -X POST -d \\\"$json\\\" $4 > /dev/null\n} || {\n    echo Curl request failed\n}"
  },
  {
    "path": "flowcraft/bin/merge_json.py",
    "content": "#!/usr/bin/env python3\n\nimport sys\nimport json\n\ncore_file, f1, f2 = sys.argv[1:4]\n\ntry:\n    sample_id = sys.argv[4]\nexcept IndexError:\n    sample_id = None\n\n\ndef get_core_genes(core_file):\n\n    with open(core_file) as fh:\n        core_genes = [x.strip() for x in fh.readlines()[1:]\n                      if x.strip() != \"\"]\n\n    return core_genes\n\n\ndef filter_core_genes(locus_array, info_array, core_genes):\n\n    core_array = []\n\n    for gene, info in zip(*[info_array, locus_array]):\n        if gene in core_genes:\n            core_array.append(info)\n\n    return core_array\n\n\ndef assess_quality(core_array, core_genes):\n\n    # Get the total number of missing loci. The sum/map approach aggretates\n    # the sum of all possible missing loci symbols.\n    missing_loci = [\"LNF\", \"PLOT3\", \"PLOT5\", \"NIPH\", \"ALM\", \"ASM\"]\n    locus_not_found = sum(map(core_array.count, missing_loci))\n\n    perc = float(locus_not_found) / float(len(core_genes))\n\n    # Fail sample with higher than 2% missing loci\n    with open(\".status\", \"w\") as fh:\n        if perc > 0.02:\n            status = \"fail\"\n        elif perc > 0.003:\n            status = \"warning\"\n        else:\n            status = \"pass\"\n\n        fh.write(status)\n\n    return status, perc\n\n\ndef get_table_data(data_obj, sample_id=None):\n\n    header_map = dict((p, h) for p, h in enumerate(data_obj[\"header\"]))\n    table_data = []\n\n    for sample, data in data_obj.items():\n\n        if sample == \"header\":\n            continue\n\n        cur_data = []\n        for pos, d in enumerate(data):\n            cur_data.append({\n                \"header\": header_map[pos],\n                \"value\": d,\n                \"table\": \"chewbbaca\"\n            })\n\n        table_data.append({\n            \"sample\": sample_id if sample_id else sample,\n            \"data\": cur_data\n        })\n\n    return table_data\n\n\ndef main():\n    core_genes = get_core_genes(core_file)\n\n    with open(f1) as f1h, open(f2) as f2h:\n\n        j1 = json.load(f1h)\n        j2 = json.load(f2h)\n\n        sample_info = [(k, v) for k, v in j1.items() if \"header\" not in k]\n        current_array = j1[\"header\"]\n        status_info = []\n        for sample, info in sample_info:\n\n            sample_name = sample_id if sample_id else sample\n\n            core_results = filter_core_genes(info, current_array, core_genes)\n            status, perc = assess_quality(core_results, core_genes)\n            status_info.append({\n                \"sample\": sample_name,\n                \"status\": status,\n                \"lnfPercentage\": perc\n            })\n\n        table_data = get_table_data(j2, sample_name)\n        res = {\"cagao\": [j1, j2], \"status\": status_info,\n               \"tableRow\": table_data}\n\n        with open(\".report.json\", \"w\") as fh:\n            fh.write(json.dumps(res, separators=(\",\", \":\")))\n\n\nmain()\n"
  },
  {
    "path": "flowcraft/bin/metadata_POST.sh",
    "content": "#!/usr/bin/env sh\n\nset -ex\n\nprojectid=$1\npipelineid=$2\nprocessid=$3\nsample=$4\nurl=$5\nusername=$6\nuserid=$7\ntask=$8\nspecies=$9\n\nmetadata_str=\"{}\"\n\n# If a .report.json file was populated, set the json_str variable\nif [ -s .metadata.json ];\nthen\n    metadata_str=$(cat $(pwd)/.metadata.json | sed 's/ /%20/g' | sed s/\\\"/\\'/g)\nfi\n\n# If a .versions OR .report.json file was populated send the request\nif [ ! \"$metadata_str\" = \"{}\" ];\nthen\n    workdir=$(pwd)\n    json=\"{'projectid':'$projectid','pipelineId':'$pipelineid','processId':'nfMetadata','sample_name':'$sample','nfMetadata':$metadata_str,'username':'$username','userId':'$userid','workdir':'$workdir','task':'nfMetadata','processName':'nfMetadata','species':'$species','overwrite':'false'}\"\n    echo \\\"${json}\\\" > .final.json\n    {\n        cat .final.json | curl -H  \"Content-Type: application/json\" -k -L -X POST -d @- $url > /dev/null\n    } || {\n        echo Curl request failed\n    }\n\nfi\n"
  },
  {
    "path": "flowcraft/bin/parse_fasta.py",
    "content": "#!/usr/bin/env python3\n\n\nimport argparse\nfrom itertools import groupby\nimport os\n\n\ndef replace_char(text):\n    for ch in ['/', '`', '*', '{', '}', '[', ']', '(', ')', '#', '+', '-', '.', '!', '$', ':']:\n        text = text.replace(ch, \"_\")\n    return text\n\ndef getSequence(ref, fasta):\n\n    entry = (x[1] for x in groupby(fasta, lambda line: line[0] == \">\"))\n\n    for header in entry:\n        headerStr = header.__next__()[1:].strip()\n        seq = \"\".join(s.strip() for s in entry.__next__())\n\n        if ref == headerStr.replace('>',''):\n            filename = os.path.join(os.getcwd(), ref.replace('/','_').split('|')[0])\n            fasta_header = replace_char(headerStr)\n            output_file = open(filename + '.fa', \"w\")\n            output_file.write(\">\" + fasta_header + \"\\n\" + seq.upper() + \"\\n\")\n            output_file.close()\n            header_file = open(\"header.txt\", \"w\")\n            header_file.write(fasta_header)\n            header_file.close()\n\ndef main():\n\n    parser = argparse.ArgumentParser(prog='parse_fasta.py', description=\"Parse FASTA files for a specific header\", formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n    parser.add_argument('--version', help='Version information', action='version', version=str('%(prog)s v0.1'))\n\n    parser_required = parser.add_argument_group('Required options')\n    parser_required.add_argument('-t', type=str, metavar='header of sequence to be retrieved',\n                             help='Uncompressed fastq file containing mate 1 reads', required=True)\n    parser_required.add_argument('-f', type=argparse.FileType('r'), metavar='/path/to/input/file.fasta',\n                             help='Fasta with the sequences', required=True)\n\n    args = parser.parse_args()\n\n    getSequence(args.t, args.f)\n\n\n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "flowcraft/bin/parse_true_coverage.py",
    "content": "#!/usr/bin/env python\n\nimport sys\nimport json\n\n\ndef parse_true_coverage(report_json, fail_json=None):\n\n    with open(report_json) as fh:\n        res = json.load(fh)\n        print(\"Report JSON: {}\".format(res))\n\n    with open(\".report.json\", \"w\") as report_fh:\n\n        json_dic = {\n            \"tableRow\": [\n                {\"header\": \"True Coverage\",\n                 \"value\": res[\"mean_sample_coverage\"],\n                 \"table\": \"assembly\",\n                 \"columnBar\": True},\n            ]\n        }\n\n        if fail_json:\n            with open(fail_json) as fail_fh:\n                fail = json.load(fail_fh)\n                print(\"Fail JSON: {}\".format(fail))\n\n            json_dic[\"fail\"] = {\n                \"process\": \"true_coverage\",\n                \"value\": []\n            }\n\n            for v in fail.values():\n                json_dic[\"fail\"][\"value\"].append(v)\n\n        report_fh.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n\ndef main():\n\n    args = sys.argv[1:]\n    report_json = args[0]\n    try:\n        fail_json = args[1]\n    except IndexError:\n        fail_json = None\n\n    print(\"Parsing report {} and fail {}\".format(report_json, fail_json))\n\n    parse_true_coverage(report_json, fail_json)\n\n\nmain()\n"
  },
  {
    "path": "flowcraft/bin/prepare_reports.py",
    "content": "#!/usr/bin/env python3\n\nimport sys\nimport json\nimport logging\n\nfrom os.path import dirname, abspath\n\nlogger = logging.getLogger(\"main.{}\".format(__name__))\n\n\ndef write_json(report_json, version_json, trace_file, task_name,\n               project_name, sample_name, pid, script_id, run_name):\n\n    logging.info(\"Parsing report JSON\")\n    try:\n        with open(report_json) as fh:\n            _reports = fh.read().replace(\"'\", '\"')\n            reports = json.loads(_reports)\n            if \"task\" in reports:\n                del reports[\"task\"]\n    except json.JSONDecodeError:\n        logging.warning(\"Could not parse report JSON: {}\".format(report_json))\n        reports = {}\n\n    logging.info(\"Parsing versions JSON\")\n    try:\n        with open(version_json) as fh:\n            _version = fh.read().replace(\"'\", '\"')\n            versions = json.loads(_version)\n    except json.JSONDecodeError:\n        logging.warning(\"Could not parse versions JSON: {}\".format(\n            version_json))\n        versions = []\n\n    logging.info(\"Parsing trace file\")\n    with open(trace_file) as fh:\n        trace = fh.readlines()\n\n    report = {\n        \"pipelineId\": run_name,\n        \"processId\": pid,\n        \"processName\": task_name,\n        \"projectid\": run_name,\n        \"reportJson\": reports,\n        \"runName\": run_name,\n        \"scriptId\": script_id,\n        \"versions\": versions,\n        \"sampleName\": sample_name,\n        \"trace\": trace,\n        \"userId\": 1,\n        \"username\": \"user\",\n        \"workdir\": dirname(abspath(report_json))\n    }\n\n    logging.info(\"Dumping final report JSON file\")\n    logging.debug(\"Final JSON file: {}\".format(report))\n    with open(\"{}_{}_report.json\".format(task_name, sample_name), \"w\") \\\n            as report_fh:\n        report_fh.write(json.dumps(report, separators=(\",\", \":\")))\n\n\ndef main():\n\n    # Fetch arguments\n    args = sys.argv[1:]\n    report_json = args[0]\n    version_json = args[1]\n    trace = args[2]\n    sample_name = args[3]\n    task_name = args[4]\n    project_name = args[5]\n    pid = args[6]\n    script_id = args[7]\n    run_name = args[8]\n    logging.debug(\"Report JSON: {}\".format(report_json))\n    logging.debug(\"Version JSON: {}\".format(version_json))\n    logging.debug(\"Trace file: {}\".format(trace))\n    logging.debug(\"Sample name: {}\".format(sample_name))\n    logging.debug(\"Task name: {}\".format(task_name))\n    logging.debug(\"Project name: {}\".format(project_name))\n    logging.debug(\"Process ID: {}\".format(pid))\n    logging.debug(\"Script ID: {}\".format(script_id))\n    logging.debug(\"Run name: {}\".format(run_name))\n\n    # Write the final report JSON that compiles all information\n    write_json(report_json, version_json, trace, task_name,\n               project_name, sample_name, pid, script_id, run_name)\n\n\nmain()\n"
  },
  {
    "path": "flowcraft/bin/renamePE_samtoolsFASTQ.py",
    "content": "#!/usr/bin/env python2\n\n#TODO - change to py3\n# -*- coding: utf-8 -*-\n\n\"\"\"\nrenamePE_samtoolsFASTQ.py - Rename the fastq headers with PE terminations\nthat were not include in samtools fastq command\n<https://github.com/miguelpmachado/pythonScripts>\nCopyright (C) 2017 Miguel Machado <mpmachado@medicina.ulisboa.pt>\nLast modified: January 10, 2017\nThis program is free software: you can redistribute it and/or modify\nit under the terms of the GNU General Public License as published by\nthe Free Software Foundation, either version 3 of the License, or\n(at your option) any later version.\nThis program is distributed in the hope that it will be useful,\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\nGNU General Public License for more details.\nYou should have received a copy of the GNU General Public License\nalong with this program.  If not, see <http://www.gnu.org/licenses/>.\n\"\"\"\n\nimport os\nimport sys\nimport time\nimport argparse\nimport itertools\n\n\nversion = '0.1'\n\n\ndef formartFastqHeaders(in_fastq_1, in_fastq_2, outdir):\n\tout_fastq_1 = os.path.join(outdir, os.path.splitext(os.path.basename(in_fastq_1))[0] + '.headersRenamed_1.fq')\n\tout_fastq_2 = os.path.join(outdir, os.path.splitext(os.path.basename(in_fastq_2))[0] + '.headersRenamed_2.fq')\n\twriter_in_fastq_1 = open(out_fastq_1, 'wt')\n\twriter_in_fastq_2 = open(out_fastq_2, 'wt')\n\toutfiles = [out_fastq_1, out_fastq_2]\n\twith open(in_fastq_1, 'rtU') as reader_in_fastq_1, open(in_fastq_2, 'rtU') as reader_in_fastq_2:\n\t\tplus_line = True\n\t\tquality_line = True\n\t\tnumber_reads = 0\n\t\tfor in_1, in_2 in itertools.izip(reader_in_fastq_1, reader_in_fastq_2):\n\t\t\tif len(in_1) > 0:\n\t\t\t\tin_1 = in_1.splitlines()[0]\n\t\t\t\tin_2 = in_2.splitlines()[0]\n\t\t\t\tif in_1.startswith('@') and plus_line and quality_line:\n\t\t\t\t\tif in_1 != in_2:\n\t\t\t\t\t\tsys.exit('The PE fastq files are not aligned properly!')\n\t\t\t\t\tin_1 += '/1' + '\\n'\n\t\t\t\t\tin_2 += '/2' + '\\n'\n\t\t\t\t\twriter_in_fastq_1.write(in_1)\n\t\t\t\t\twriter_in_fastq_2.write(in_2)\n\t\t\t\t\tplus_line = False\n\t\t\t\t\tquality_line = False\n\t\t\t\telif in_1.startswith('+') and not plus_line:\n\t\t\t\t\tin_1 += '\\n'\n\t\t\t\t\twriter_in_fastq_1.write(in_1)\n\t\t\t\t\twriter_in_fastq_2.write(in_1)\n\t\t\t\t\tplus_line = True\n\t\t\t\telif plus_line and not quality_line:\n\t\t\t\t\tin_1 += '\\n'\n\t\t\t\t\tin_2 += '\\n'\n\t\t\t\t\twriter_in_fastq_1.write(in_1)\n\t\t\t\t\twriter_in_fastq_2.write(in_2)\n\t\t\t\t\twriter_in_fastq_1.flush()\n\t\t\t\t\twriter_in_fastq_2.flush()\n\t\t\t\t\tnumber_reads += 1\n\t\t\t\t\tquality_line = True\n\t\t\t\telse:\n\t\t\t\t\tin_1 += '\\n'\n\t\t\t\t\tin_2 += '\\n'\n\t\t\t\t\twriter_in_fastq_1.write(in_1)\n\t\t\t\t\twriter_in_fastq_2.write(in_2)\n\treturn number_reads, outfiles\n\n\ndef compressionType(file_to_test):\n\tmagic_dict = {'\\x1f\\x8b\\x08': ['gzip', 'gunzip'], '\\x42\\x5a\\x68': ['bzip2', 'bunzip2']}\n\n\tmax_len = max(len(x) for x in magic_dict)\n\n\twith open(file_to_test, 'r') as reader:\n\t\tfile_start = reader.read(max_len)\n\n\tfor magic, filetype in magic_dict.items():\n\t\tif file_start.startswith(magic):\n\t\t\treturn filetype\n\treturn None\n\n\ndef runTime(start_time):\n\tend_time = time.time()\n\ttime_taken = end_time - start_time\n\thours, rest = divmod(time_taken, 3600)\n\tminutes, seconds = divmod(rest, 60)\n\tprint 'Runtime :' + str(hours) + 'h:' + str(minutes) + 'm:' + 
str(round(seconds, 2)) + 's'\n\treturn time_taken\n\n\ndef main():\n\tparser = argparse.ArgumentParser(prog='renamePE_samtoolsFASTQ.py', description='Rename the fastq headers with PE terminations that were not include in samtools fastq command', formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n\tparser.add_argument('--version', help='Version information', action='version', version=str('%(prog)s v' + version))\n\n\tparser_required = parser.add_argument_group('Required options')\n\tparser_required.add_argument('-1', '--fastq_1', type=argparse.FileType('r'), metavar='/path/to/input/file_1.fq', help='Uncompressed fastq file containing mate 1 reads', required=True)\n\tparser_required.add_argument('-2', '--fastq_2', type=argparse.FileType('r'), metavar='/path/to/input/file_2.fq', help='Uncompressed fastq file containing mate 2 reads', required=True)\n\n\tparser_optional_general = parser.add_argument_group('General facultative options')\n\tparser_optional_general.add_argument('-o', '--outdir', type=str, metavar='/output/directory/', help='Path for output directory', required=False, default='.')\n\n\targs = parser.parse_args()\n\n\tprint '\\n' + 'STARTING renamePE_samtoolsFASTQ.py' + '\\n'\n\tstart_time = time.time()\n\n\tfastq_files = [os.path.abspath(args.fastq_1.name), os.path.abspath(args.fastq_2.name)]\n\n\tprint 'Check if files are compressed' + '\\n'\n\tfor fastq in fastq_files:\n\t\tif compressionType(fastq) is not None:\n\t\t\tsys.exit('Compressed fastq files found')\n\n\toutdir = os.path.abspath(args.outdir)\n\tif not os.path.isdir(outdir):\n\t\tos.makedirs(outdir)\n\n\tprint 'Renaming fastq headers' + '\\n'\n\tnumber_reads, outfiles = formartFastqHeaders(fastq_files[0], fastq_files[1], outdir)\n\n\tprint 'It was written ' + str(number_reads) + ' read pairs in ' + str(outfiles) + ' files' + '\\n'\n\n\tprint '\\n' + 'END renamePE_samtoolsFASTQ.py'\n\ttime_taken = runTime(start_time)\n\tdel time_taken\n\n\nif __name__ == \"__main__\":\n\tmain()"
  },
  {
    "path": "flowcraft/bin/report_POST.sh",
    "content": "#!/usr/bin/env sh\n\nset -ex\n\nprojectid=$1\npipelineid=$2\nprocessid=$3\nsample=$4\nurl=$5\nusername=$6\nuserid=$7\ntask=$8\nspecies=$9\noverwrite=${10}\n\njson_str=\"{}\"\nversion_str=\"[]\"\ntrace_str=\"\"\n\n# If a .report.json file was populated, set the json_str variable\nif [ -s .report.json ];\nthen\n\n    # Modification of the JSON string should be different for chewbbaca\n    # output\n    if [ $task = \"chewbbaca\" ];\n    then\n        json_str=$(cat $(pwd)/.report.json | sed 's/ //g' | sed s/\\\"/\\'/g)\n    else\n        json_str=$(cat $(pwd)/.report.json | sed 's/ /%20/g' | sed s/\\\"/\\'/g)\n    fi\nfi\n\n# If a .versions file was populated, set the version_str variable\nif [ -s .versions ];\nthen\n    version_str=$(< $(pwd)/.versions sed 's/ /%20/g' | sed s/\\\"/\\'/g)\nfi\n\nif [ -s .command.trace ];\nthen\n    trace_str=\"$(< $(pwd)/.command.trace tr \"\\n\" \";\")\"\nfi\n\n# If a .versions OR .report.json file was populated send the request\nif [ ! \"$json_str\" = \"{}\" ] || [ ! \"$version_str\" = \"[]\" ] || [ ! \"$trace_str\" = \"\" ];\nthen\n    workdir=$(pwd)\n    json=\"{'projectid':'$projectid','pipelineId':'$pipelineid','processId':'$processid','sample_name':'$sample','reportJson':$json_str,'username':'$username','userId':'$userid','workdir':'$workdir','task':'$task','processName':'$task','species':'$species','versions':$version_str,'trace':'$trace_str', 'overwrite': '$overwrite'}\"\n    echo \\\"${json}\\\" > .final.json\n    {\n        cat .final.json | curl -H  \"Content-Type: application/json\" -k -L -X POST -d @- $url > /dev/null\n    } || {\n        echo Curl request failed\n    }\n\nfi\n"
  },
  {
    "path": "flowcraft/bin/set_dotfiles.sh",
    "content": "#!/usr/bin/env bash\n\ntouch .status .warning .fail .report.json .versions"
  },
  {
    "path": "flowcraft/bin/startup_POST.sh",
    "content": "#!/usr/bin/env bash\n\njson=\"{'project_id':'$1','pipeline_id':'$2','process_id':'$3','run_property':'log_file,status','run_property_value':'$(pwd)/.command.log,running','type':'output'}\"\n\n{\n    curl -H  \"Content-Type: application/json\" -L -X PUT -d \\\"$json\\\" $4 > /dev/null\n} || {\n    echo Curl request failed\n}"
  },
  {
    "path": "flowcraft/flowcraft.py",
    "content": "#!/usr/bin/env python3\n\nimport os\nimport sys\nimport shutil\nimport logging\nimport argparse\nimport logging.config\n\nfrom distutils.dir_util import copy_tree\n\nfrom os.path import join, dirname\n\ntry:\n    from __init__ import __version__, __build__\n    from generator.engine import NextflowGenerator\n    from generator.inspect import NextflowInspector\n    from generator.report import FlowcraftReport\n    from generator.process_collector import collect_process_map\n    from generator.recipe import brew_innuendo, brew_recipe, list_recipes\n    from generator.pipeline_parser import parse_pipeline, SanityError\n    from generator.process_details import proc_collector, colored_print\n    import generator.error_handling as eh\nexcept ImportError as e:\n    from flowcraft import __version__, __build__\n    from flowcraft.generator.engine import NextflowGenerator\n    from flowcraft.generator.inspect import NextflowInspector\n    from flowcraft.generator.report import FlowcraftReport\n    from flowcraft.generator.recipe import brew_innuendo, \\\n        brew_recipe, list_recipes\n    from flowcraft.generator.pipeline_parser import parse_pipeline, \\\n        SanityError\n    from flowcraft.generator.process_details import proc_collector, \\\n        colored_print\n    import flowcraft.generator.error_handling as eh\n    from flowcraft.generator.process_collector import collect_process_map\n\nlogger = logging.getLogger(\"main\")\n\n\ndef get_args(args=None):\n\n    parser = argparse.ArgumentParser(\n        description=\"A Nextflow pipeline generator\")\n\n    subparsers = parser.add_subparsers(help=\"Select which mode to run\",\n                                       dest=\"main_op\")\n\n    # BUILD MODE\n    build_parser = subparsers.add_parser(\"build\",\n                                         help=\"Build a nextflow pipeline\")\n\n    group_lists = build_parser.add_mutually_exclusive_group()\n\n    build_parser.add_argument(\n        \"-t\", \"--tasks\", type=str, dest=\"tasks\",\n        help=\"Space separated tasks of the pipeline\")\n    build_parser.add_argument(\n        \"-r\", \"--recipe\", dest=\"recipe\",\n        help=\"Use one of the available recipes\")\n    build_parser.add_argument(\n        \"-o\", dest=\"output_nf\", help=\"Name of the pipeline file\")\n    build_parser.add_argument(\n        \"-n\", dest=\"pipeline_name\", default=\"flowcraft\",\n        help=\"Provide a name for your pipeline.\")\n    build_parser.add_argument(\n        \"--merge-params\", dest=\"merge_params\", action=\"store_true\",\n        help=\"Merges identical parameters from multiple components into the \"\n             \"same one. 
Otherwise, the parameters will be separated and unique\"\n             \" to each component.\")\n    build_parser.add_argument(\n        \"--pipeline-only\", dest=\"pipeline_only\", action=\"store_true\",\n        help=\"Write only the pipeline files and not the templates, bin, and\"\n             \" lib folders.\")\n    build_parser.add_argument(\n        \"-nd\", \"--no-dependecy\", dest=\"no_dep\", action=\"store_false\",\n        help=\"Do not automatically add dependencies to the pipeline.\")\n    build_parser.add_argument(\n        \"-c\", \"--check-pipeline\", dest=\"check_only\", action=\"store_const\",\n        const=True, help=\"Check only the validity of the pipeline \"\n                         \"string and exit.\")\n    group_lists.add_argument(\n        \"-L\", \"--component-list\", action=\"store_const\", dest=\"detailed_list\",\n        const=True, help=\"Print a detailed description for all the \"\n                         \"currently available processes.\")\n    group_lists.add_argument(\n        \"-l\", \"--component-list-short\", action=\"store_const\", dest=\"short_list\",\n        const=True, help=\"Print a short list of the currently \"\n                         \"available processes.\")\n    group_lists.add_argument(\n        \"--recipe-list\", dest=\"recipe_list\", action=\"store_const\", const=True,\n        help=\"Print a short list of the currently available recipes.\"\n    )\n    group_lists.add_argument(\n        \"--recipe-list-short\", dest=\"recipe_list_short\", action=\"store_const\",\n        const=True, help=\"Print a condensed list of the currently available \"\n                         \"recipes\"\n    )\n    build_parser.add_argument(\n        \"-cr\", \"--check-recipe\", dest=\"check_recipe\",\n        action=\"store_const\", const=True,\n        help=\"Check tasks that the recipe contain and \"\n             \"their flow. This option might be useful \"\n             \"if a user wants to change some components \"\n             \"of a given recipe, by using the -t option.\")\n    build_parser.add_argument(\n        \"--export-params\", dest=\"export_params\", action=\"store_const\",\n        const=True, help=\"Only export the parameters for the provided \"\n                         \"components (via -t option) in JSON format to stdout. \"\n                         \"No pipeline will be generated with this option.\"\n    )\n    build_parser.add_argument(\n        \"--export-directives\", dest=\"export_directives\", action=\"store_const\",\n        const=True, help=\"Only export the directives for the provided \"\n                         \"components (via -t option) in JSON format to stdout. 
\"\n                         \"No pipeline will be generated with this option.\"\n    )\n    build_parser.add_argument(\n        \"-ft\", \"--fetch-tags\", dest=\"fetch_docker_tags\",\n        action=\"store_const\", const=True, help=\"Allows to fetch all docker tags\"\n                                               \" for the components listed with\"\n                                               \" the -t flag.\"\n    )\n\n    # GENERAL OPTIONS\n    parser.add_argument(\n        \"--debug\", dest=\"debug\", action=\"store_const\", const=True,\n        help=\"Set log to debug mode\")\n    parser.add_argument(\n        \"-v\", \"--version\", dest=\"version\", action=\"store_const\", const=True,\n        help=\"Show version and exit.\")\n\n    # INSPECT MODE\n    inspect_parser = subparsers.add_parser(\"inspect\",\n                                           help=\"Inspect the progress of a \"\n                                                \"pipeline execution\")\n    inspect_parser.add_argument(\n        \"-i\", dest=\"trace_file\", default=\"pipeline_stats.txt\",\n        help=\"Specify the nextflow trace file.\"\n    )\n    inspect_parser.add_argument(\n        \"-r\", dest=\"refresh_rate\", default=0.02,\n        help=\"Set the refresh frequency for the continuous inspect functions\"\n    )\n    inspect_parser.add_argument(\n        \"-m\", \"--mode\", dest=\"mode\", default=\"overview\",\n        choices=[\"overview\", \"broadcast\"],\n        help=\"Specify the inspection run mode.\"\n    )\n    inspect_parser.add_argument(\n        \"-u\", \"--url\", dest=\"url\", default=\"http://www.flowcraft.live:80/\",\n        help=\"Specify the URL to where the data should be broadcast\"\n    )\n    inspect_parser.add_argument(\n        \"--pretty\", dest=\"pretty\", action=\"store_const\", const=True,\n        help=\"Pretty inspection mode that removes usual reporting processes.\"\n    )\n\n    # REPORT MODE\n    reports_parser = subparsers.add_parser(\"report\",\n                                           help=\"Broadcast the report of \"\n                                                \"a pipeline\")\n    reports_parser.add_argument(\n        \"-i\", dest=\"report_file\",\n        default=\"pipeline_report/pipeline_report.json\",\n        help=\"Specify the path to the pipeline report JSON file.\"\n    )\n    reports_parser.add_argument(\n        \"-u\", \"--url\", dest=\"url\", default=\"http://www.flowcraft.live:80/\",\n        help=\"Specify the URL to where the data should be broadcast\"\n    )\n    reports_parser.add_argument(\n        \"--trace-file\", dest=\"trace_file\", default=\"pipeline_stats.txt\",\n        help=\"Specify the nextflow trace file. Only applicable in combination \"\n             \"with --watch option.\"\n    )\n    reports_parser.add_argument(\n        \"--log-file\", dest=\"log_file\", default=\".nextflow.log\",\n        help=\"Specify the nextflow log file. Only applicable in combination \"\n             \"with --watch option.\"\n    )\n    reports_parser.add_argument(\n        \"-w\", \"--watch\", dest=\"watch\",  action=\"store_const\", const=True,\n        help=\"Run the report in watch mode. 
This option will track the \"\n             \"generation of reports during the execution of the pipeline, \"\n             \"allowing for the visualization of the reports in real-time\"\n    )\n\n    if len(sys.argv) == 1:\n        parser.print_help()\n        sys.exit(1)\n\n    return parser.parse_args(args)\n\n\ndef validate_build_arguments(args):\n\n    # Skip all checks when listing the processes\n    if args.detailed_list or args.short_list:\n        return\n\n    # Skip all checks when exporting parameters AND providing at least one\n    # component\n    if args.export_params or args.export_directives or args.fetch_docker_tags:\n        # Check if components provided\n        if not args.tasks:\n            logger.error(colored_print(\n                \"At least one component needs to be provided via the -t option\"\n                \" when exporting parameters in JSON format\"\n            ))\n            sys.exit(1)\n        return\n\n    # When none of the main run options is specified\n    if not args.tasks and not args.recipe and not args.check_only \\\n            and not args.detailed_list and not args.short_list:\n        logger.error(colored_print(\n            \"At least one of these options is required: -t, -r, -c, \"\n            \"-l, -L\", \"red_bold\"))\n        sys.exit(1)\n\n    # When the build mode is active via tasks or recipe, but no output file\n    # option has been provided\n    if (args.tasks or args.recipe) and not args.check_recipe \\\n            and not args.output_nf:\n        logger.error(colored_print(\n            \"Please provide the path and name of the pipeline file using the\"\n            \" -o option.\", \"red_bold\"))\n        sys.exit(1)\n\n    if args.output_nf:\n        if not os.path.basename(args.output_nf):\n            logger.error(colored_print(\n                \"Output pipeline path '{}' missing a name (only the directory \"\n                \"path was provided)\".format(args.output_nf), \"red_bold\"))\n            sys.exit(1)\n\n        parsed_output_nf = (args.output_nf if args.output_nf.endswith(\".nf\")\n                            else \"{}.nf\".format(args.output_nf.strip()))\n        opath = parsed_output_nf\n        if os.path.dirname(opath):\n            parent_dir = os.path.dirname(opath)\n            if not os.path.exists(parent_dir):\n                logger.error(colored_print(\n                    \"The provided directory '{}' does not exist.\".format(\n                        parent_dir), \"red_bold\"))\n                sys.exit(1)\n\n        return parsed_output_nf\n\n\ndef copy_project(path):\n    \"\"\"\n\n    Parameters\n    ----------\n    path\n\n    Returns\n    -------\n\n    \"\"\"\n\n    # Get nextflow repo directory\n    repo_dir = dirname(os.path.abspath(__file__))\n\n    # Get target directory\n    target_dir = dirname(path)\n\n    # Copy templates\n    copy_tree(join(repo_dir, \"templates\"), join(target_dir, \"templates\"))\n\n    # Copy Helper scripts\n    copy_tree(join(repo_dir, \"lib\"), join(target_dir, \"lib\"))\n\n    # Copy resources dir\n    copy_tree(join(repo_dir, \"resources\"), join(target_dir, \"resources\"))\n\n    # Copy bin scripts\n    copy_tree(join(repo_dir, \"bin\"), join(target_dir, \"bin\"))\n\n    # Copy static profiles file\n    shutil.copy(join(repo_dir, \"profiles.config\"),\n                join(target_dir, \"profiles.config\"))\n\n\ndef build(args):\n\n    # Disable standard logging for stdout when the following modes are\n    #  executed:\n    if args.export_params or 
args.export_directives or args.fetch_docker_tags:\n        logger.setLevel(logging.ERROR)\n\n    if args.recipe_list_short:\n        list_recipes()\n\n    if args.recipe_list:\n        list_recipes(full=True)\n\n    welcome = [\n        \"========= F L O W C R A F T =========\",\n        \"Build mode\\n\"\n        \"version: {}\".format(__version__),\n        \"build: {}\".format(__build__),\n        \"=====================================\"\n    ]\n\n    parsed_output_nf = validate_build_arguments(args)\n\n    logger.info(colored_print(\"\\n\".join(welcome), \"green_bold\"))\n\n    # If a recipe is specified, build pipeline based on the\n    # appropriate recipe\n    if args.recipe:\n        if args.recipe == \"innuendo\":\n            pipeline_string = brew_innuendo(args)\n        else:\n            # pipeline_string = available_recipes[args.recipe]\n            pipeline_string = brew_recipe(args.recipe)\n            if args.tasks:\n                logger.warning(colored_print(\n                    \"-t parameter will be ignored for recipe: {}\\n\".format(\n                        args.recipe), \"yellow_bold\")\n                )\n\n        if args.check_recipe:\n            logger.info(colored_print(\"Pipeline string for recipe: {}\"\n                                      .format(args.recipe), \"purple_bold\"))\n            logger.info(pipeline_string)\n            sys.exit(0)\n    else:\n        pipeline_string = args.tasks\n\n    process_map = collect_process_map()\n\n    # used for lists print\n    proc_collector(process_map, args, pipeline_string)\n\n    try:\n        logger.info(colored_print(\"Checking pipeline for errors...\"))\n        pipeline_list = parse_pipeline(pipeline_string)\n    except SanityError as e:\n        logger.error(colored_print(e.value, \"red_bold\"))\n        sys.exit(1)\n    logger.debug(\"Pipeline successfully parsed: {}\".format(pipeline_list))\n\n    # Exit if only the pipeline parser needs to be checked\n    if args.check_only:\n        sys.exit()\n\n    nfg = NextflowGenerator(process_connections=pipeline_list,\n                            nextflow_file=parsed_output_nf,\n                            process_map=process_map,\n                            pipeline_name=args.pipeline_name,\n                            auto_dependency=args.no_dep,\n                            merge_params=args.merge_params,\n                            export_params=args.export_params)\n\n    logger.info(colored_print(\"Building your awesome pipeline...\"))\n\n    if args.export_params:\n        nfg.export_params()\n        sys.exit(0)\n    elif args.export_directives:\n        nfg.export_directives()\n        sys.exit(0)\n    elif args.fetch_docker_tags:\n        nfg.fetch_docker_tags()\n        sys.exit(0)\n    else:\n        # building the actual pipeline nf file\n        nfg.build()\n\n    # copy template to cwd, to allow for immediate execution\n    if not args.pipeline_only:\n        copy_project(parsed_output_nf)\n\n    logger.info(colored_print(\"DONE!\", \"green_bold\"))\n\n\ndef inspect(args):\n\n    try:\n        nf_inspect = NextflowInspector(args.trace_file, args.refresh_rate,\n                                       args.pretty, args.url)\n        if args.mode == \"overview\":\n            nf_inspect.display_overview()\n\n        if args.mode == \"broadcast\":\n            nf_inspect.broadcast_status()\n\n    except eh.InspectionError as ie:\n        logger.error(colored_print(ie.value, \"red_bold\"))\n        sys.exit(1)\n\n    except eh.LogError as le:\n      
  logger.error(colored_print(le.value, \"red_bold\"))\n        sys.exit(1)\n\n\n\ndef report(args):\n\n    try:\n        fc_report = FlowcraftReport(\n            report_file=args.report_file,\n            trace_file=args.trace_file,\n            log_file=args.log_file,\n            watch=args.watch,\n            ip_addr=args.url)\n\n        fc_report.broadcast_report()\n\n    except eh.ReportError as re:\n        logger.error(colored_print(re.value, \"red_bold\"))\n        sys.exit(1)\n\n    except eh.LogError as le:\n        logger.error(colored_print(le.value, \"red_bold\"))\n        sys.exit(1)\n\n\ndef main():\n\n    args = get_args()\n\n    if args.version:\n        print(__version__)\n\n    if args.debug:\n        logger.setLevel(logging.DEBUG)\n\n        # create formatter\n        formatter = logging.Formatter(\n            '%(asctime)s - %(name)s - %(levelname)s - %(message)s')\n\n    else:\n        logger.setLevel(logging.INFO)\n\n        # create special formatter for info logs\n        formatter = logging.Formatter('%(message)s')\n\n    # create console handler and set level to debug\n    ch = logging.StreamHandler(sys.stdout)\n    ch.setLevel(logging.DEBUG)\n\n    # add formatter to ch\n    ch.setFormatter(formatter)\n    logger.addHandler(ch)\n\n    if args.main_op == \"build\":\n        build(args)\n\n    if args.main_op == \"inspect\":\n        inspect(args)\n\n    if args.main_op == \"report\":\n        report(args)\n\n\nif __name__ == '__main__':\n\n    main()\n"
  },
  {
    "path": "flowcraft/generator/__init__.py",
    "content": "\"\"\"\nPlaceholder for Process creation docs\n\"\"\""
  },
  {
    "path": "flowcraft/generator/components/__init__.py",
    "content": ""
  },
  {
    "path": "flowcraft/generator/components/alignment.py",
    "content": "try:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass Mafft(Process):\n    \"\"\"mafft to align sequences\n\n            This process is set with:\n\n                - ``input_type``: fasta\n                - ``output_type``: align\n                - ``ptype``: sequence alignment\n\n            \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"align\"\n\n        self.params = {\n        }\n\n        self.link_end.append({\"link\": \"_ref_seqTyping\", \"alias\": \"_ref_seqTyping\"})\n\n\n        self.directives = {\n            \"mafft\": {\n                \"container\": \"flowcraft/mafft\",\n                \"version\": \"7.402-1\",\n                \"cpus\": 4,\n                \"memory\": \"{ 4.GB * task.attempt }\"\n            }\n        }\n\n        self.status_channels = [\n            \"mafft\"\n        ]\n\n\nclass ProgressiveMauve(Process):\n    \"\"\"Mauve to align sequences\n\n            This process is set with:\n\n                - ``input_type``: fasta\n                - ``output_type``: align\n                - ``ptype``: sequence alignment\n\n            \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"align\"\n\n        self.params = {\n        }\n\n        self.directives = {\n            \"progressive_mauve\": {\n                \"container\": \"flowcraft/mauve\",\n                \"version\": \"2015.02.13-1\",\n                \"cpus\": 4,\n                \"memory\": \"{ 4.GB * task.attempt }\"\n            }\n        }\n\n        self.status_channels = [\n            \"progressive_mauve\"\n        ]"
  },
  {
    "path": "flowcraft/generator/components/annotation.py",
    "content": "\ntry:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass Abricate(Process):\n    \"\"\"Abricate mapping process template interface\n\n    This process is set with:\n\n        - ``input_type``: assembly\n        - ``output_type``: None\n        - ``ptype``: post_assembly\n\n    It contains one **secondary channel link end**:\n\n        - ``MAIN_assembly`` (alias: ``MAIN_assembly``): Receives the last\n        assembly.\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = None\n\n        self.ignore_type = True\n\n        self.status_channels = [\"STATUS_abricate\", \"STATUS_process_abricate\"]\n\n        self.params = {\n            \"abricateDatabases\": {\n                \"default\": '[\"resfinder\", \"card\", \"vfdb\", \"plasmidfinder\", '\n                           '\"virulencefinder\", \"bacmet\"]',\n                \"description\": \"Specify the databases for abricate.\"\n            },\n            \"abricateDataDir\": {\n                \"default\": 'null',\n                \"description\": \"Specify the full path location of the database \"\n                               \"folders.\"\n            },\n            \"abricateMinId\": {\n                \"default\": '75',\n                \"description\": \"Minimum DNA %identity.\"\n            },\n            \"abricateMinCov\": {\n                \"default\": '0',\n                \"description\": \"Minimum DNA %coverage.\"\n            }\n        }\n\n        self.link_start = None\n        self.link_end.append({\"link\": \"MAIN_assembly\",\n                              \"alias\": \"MAIN_assembly\"})\n\n        self.directives = {\n            \"abricate\": {\n                \"container\": \"flowcraft/abricate\",\n                \"version\": \"0.8.0-3\"\n            },\n            \"process_abricate\": {\n                \"container\": \"flowcraft/abricate\",\n                \"version\": \"0.8.0-3\"\n            }\n        }\n\n\nclass CardRgi(Process):\n    \"\"\"card's rgi process template interface\n\n        This process is set with:\n\n            - ``input_type``: fasta\n            - ``output_type``: txt\n            - ``ptype``: resistance gene detection (assembly)\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"txt\"\n\n        self.params = {\n            \"alignmentTool\": {\n                \"default\": \"'DIAMOND'\",\n                \"description\": \"Specifies the alignment tool.\"\n                               \"Options: DIAMOND or BLAST\"\n            }\n        }\n\n        self.directives = {\n            \"card_rgi\": {\n                \"container\": \"flowcraft/card_rgi\",\n                \"version\": \"4.0.2-0.1\",\n                \"memory\": \"{10.Gb*task.attempt}\"\n            }\n        }\n\n        self.status_channels = [\n            \"card_rgi\"\n        ]\n\n\nclass Prokka(Process):\n    \"\"\"Prokka mapping process template interface\n\n    This process is set with:\n\n        - ``input_type``: assembly\n        - ``output_type``: None\n        - ``ptype``: post_assembly\n\n    It contains one **secondary channel link end**:\n\n        - ``MAIN_assembly`` (alias: ``MAIN_assembly``): Receives the last\n        assembly.\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        
super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = None\n\n        self.ignore_type = True\n\n        self.link_start = None\n        self.link_end.append({\"link\": \"MAIN_assembly\",\n                              \"alias\": \"MAIN_assembly\"})\n\n        self.params = {\n            \"centre\": {\n                \"default\": \"'UMMI'\",\n                \"description\": \"sequencing centre ID\"\n            },\n            \"kingdom\": {\n                \"default\": \"'Bacteria'\",\n                \"description\": \"Annotation mode: Archaea|Bacteria|Mitochondria\"\n                               \"|Viruses (default 'Bacteria')\"\n            },\n            \"genus\": {\n                \"default\": \"false\",\n                \"description\": \"Genus name (default 'Genus'). This also adds\"\n                               \"the --usegenus flag to prokka\"\n            },\n        }\n\n        self.directives = {\n            \"prokka\": {\n                \"cpus\": 2,\n                \"container\": \"ummidock/prokka\",\n                \"version\": \"1.12\"\n            }\n        }\n\n\nclass Diamond(Process):\n    \"\"\"diamond process for protein database queries\n\n        This process is set with:\n\n            - ``input_type``: fasta\n            - ``output_type``: None\n            - ``ptype``: post_assembly\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = None\n\n        self.params = {\n            \"pathToDb\": {\n                \"default\": 'null',\n                \"description\": \"Provide full path for the diamond database. \"\n                               \"If none is provided then will try to fetch from\"\n                               \" the previous process. Default: None\"\n            },\n            \"fastaToDb\": {\n                \"default\": 'null',\n                \"description\": \"Provide the full path for the fasta to \"\n                               \"construct a diamond database. Default: None\"\n            },\n            \"blastType\": {\n                \"default\": \"'blastx'\",\n                \"description\": \"Defines the type of blast that diamond will do.\"\n                               \"Can wither be blastx or blastp. Default: blastx\"\n            }\n        }\n\n        self.directives = {\n            \"diamond\": {\n                \"container\": \"flowcraft/diamond\",\n                \"version\": \"0.9.22-1\",\n                \"memory\": \"{ 4.GB * task.attempt }\",\n                \"cpus\": 2\n            }\n        }\n"
  },
  {
    "path": "flowcraft/generator/components/assembly.py",
    "content": "\ntry:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass Bcalm(Process):\n    \"\"\"Bcalm process template interface\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: assembly\n        - ``ptype``: assembly\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fasta\"\n\n        self.params = {\n            \"bcalmKmerSize\": {\n                \"default\": 31,\n                \"description\":\n                    \"size of a kmer\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. \"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\"bcalm\": {\n            \"cpus\": 4,\n            \"memory\": \"{ 5.GB * task.attempt }\",\n            \"container\": \"quay.io/biocontainers/bcalm\",\n            \"version\": \"2.2.0--hd28b015_2\",\n            \"scratch\": \"true\"\n        }}\n\n\nclass Spades(Process):\n    \"\"\"Spades process template interface\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: assembly\n        - ``ptype``: assembly\n\n    It contains one **secondary channel link end**:\n\n        - ``SIDE_max_len`` (alias: ``SIDE_max_len``): Receives max read length\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fasta\"\n\n        self.link_end.append({\"link\": \"SIDE_max_len\", \"alias\": \"SIDE_max_len\"})\n        self.link_start.append(\"gfa1\")\n\n        self.dependencies = [\"integrity_coverage\"]\n\n        self.params = {\n            \"spadesMinCoverage\": {\n                \"default\": 2,\n                \"description\":\n                    \"The minimum number of reads to consider an edge in the\"\n                    \" de Bruijn graph during the assembly\"\n            },\n            \"spadesMinKmerCoverage\": {\n                \"default\": 2,\n                \"description\":\n                    \"Minimum contigs K-mer coverage. After assembly only \"\n                    \"keep contigs with reported k-mer coverage equal or \"\n                    \"above this value\"\n            },\n            \"spadesKmers\": {\n                \"default\": \"'auto'\",\n                \"description\":\n                    \"If 'auto' the SPAdes k-mer lengths will be determined \"\n                    \"from the maximum read length of each assembly. If \"\n                    \"'default', SPAdes will use the default k-mer lengths. \"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. 
\"\n                    \"Use with caution.\"\n            },\n            \"disableRR\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"disables repeat resolution stage of assembling.\"\n            }\n        }\n\n        self.directives = {\"spades\": {\n            \"cpus\": 4,\n            \"memory\": \"{ 5.GB * task.attempt }\",\n            \"container\": \"flowcraft/spades\",\n            \"version\": \"3.13.0-1\",\n            \"scratch\": \"true\"\n        }}\n\n\nclass Skesa(Process):\n    \"\"\"Skesa process template interface\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fasta\"\n\n        self.directives = {\"skesa\": {\n            \"cpus\": 4,\n            \"memory\": \"{ 5.GB * task.attempt }\",\n            \"container\": \"flowcraft/skesa\",\n            \"version\": \"2.3.0-1\",\n            \"scratch\": \"true\"\n        }}\n\n        self.params = {\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. \"\n                    \"Use with caution.\"\n            }\n        }\n\n\nclass ViralAssembly(Process):\n    \"\"\"\n    Process to assemble viral genomes, based on SPAdes and megahit\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fasta\"\n\n        self.dependencies = [\"integrity_coverage\"]\n\n        self.status_channels = [\"va_spades\", \"va_megahit\",\n                                \"report_viral_assembly\"]\n\n        self.link_end.append({\"link\": \"SIDE_max_len\", \"alias\": \"SIDE_max_len\"})\n\n        self.directives = {\"va_spades\": {\n            \"cpus\": 4,\n            \"memory\": \"{ 5.GB * task.attempt }\",\n            \"container\": \"flowcraft/viral_assembly\",\n            \"version\": \"0.1-1\",\n            \"scratch\": \"true\"\n        }, \"va_megahit\": {\n            \"cpus\": 4,\n            \"memory\": \"{ 5.GB * task.attempt }\",\n            \"container\": \"flowcraft/viral_assembly\",\n            \"version\": \"0.1-1\",\n            \"scratch\": \"true\"\n        }}\n\n        self.params = {\n            \"minimumContigSize\": {\n                \"default\": 10000,\n                \"description\":\n                    \"Expected genome size in bases\"\n            },\n            \"spadesMinCoverage\": {\n                \"default\": 2,\n                \"description\":\n                    \"The minimum number of reads to consider an edge in the\"\n                    \" de Bruijn graph during the assembly\"\n            },\n            \"spadesMinKmerCoverage\": {\n                \"default\": 2,\n                \"description\":\n                    \"Minimum contigs K-mer coverage. 
After assembly only \"\n                    \"keep contigs with reported k-mer coverage equal or \"\n                    \"above this value\"\n            },\n            \"spadesKmers\": {\n                \"default\": \"'auto'\",\n                \"description\":\n                    \"If 'auto' the SPAdes k-mer lengths will be determined \"\n                    \"from the maximum read length of each assembly. If \"\n                    \"'default', SPAdes will use the default k-mer lengths. \"\n            },\n            \"megahitKmers\": {\n                \"default\": \"'auto'\",\n                \"description\":\n                    \"If 'auto' the megahit k-mer lengths will be determined \"\n                    \"from the maximum read length of each assembly. If \"\n                    \"'default', megahit will use the default k-mer lengths. \"\n                    \"(default: $params.megahitKmers)\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. \"\n                    \"Use with caution.\"\n            }\n        }\n\n\nclass Abyss(Process):\n    \"\"\"ABySS process template interface\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: assembly\n        - ``ptype``: assembly\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fasta\"\n        self.link_start.append(\"gfa1\")\n\n        self.params = {\n            \"abyssKmer\": {\n                \"default\": \"96\",\n                \"description\":\n                    \"kmer size for assembly.\"\n            }\n        }\n\n        self.directives = {\"abyss\": {\n            \"cpus\": 4,\n            \"memory\": \"{ 5.GB * task.attempt }\",\n            \"container\": \"flowcraft/abyss\",\n            \"version\": \"2.1.1\",\n            \"scratch\": \"true\"\n        }}\n\n\nclass Unicycler(Process):\n    \"\"\"Unicycler process template interface\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: assembly\n        - ``ptype``: assembly\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fasta\"\n        self.link_start.append(\"gfa1\")\n\n        self.directives = {\"unicycler\": {\n            \"cpus\": 4,\n            \"container\": \"quay.io/biocontainers/unicycler\",\n            \"version\": \"0.4.7--py36hdbcaa40_0\",\n            \"scratch\": \"true\"\n        }}\n"
  },
  {
    "path": "flowcraft/generator/components/assembly_processing.py",
    "content": "\ntry:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass ProcessSkesa(Process):\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"fasta\"\n\n        self.params = {\n            \"genomeSize\": {\n                \"default\": 1,\n                \"description\":\n                    \"Genome size estimate for the samples in Mb. It is used \"\n                    \"to assess whether an assembly is much larger or smaller \"\n                    \"than expected\",\n            },\n            \"skesaMinKmerCoverage\": {\n                \"default\": 2,\n                \"description\":\n                    \"Minimum contigs K-mer coverage. After assembly only keep\"\n                    \" contigs with reported k-mer coverage equal or above \"\n                    \"this value\"\n            },\n            \"skesaMinContigLen\": {\n                \"default\": 200,\n                \"description\":\n                    \"Filter contigs for length greater or equal than this \"\n                    \"value\"\n            },\n            \"skesaMaxContigs\": {\n                \"default\": 100,\n                \"description\":\n                    \"Maximum number of contigs per 1.5 Mb of expected \"\n                    \"genome size\"\n            }\n        }\n\n        self.directives = {\"skesa\": {\n            \"cpus\": 1,\n            \"memory\": \"'2GB'\",\n            \"container\": \"flowcraft/skesa\",\n            \"version\": \"2.1-1\",\n        }}\n\n\nclass ProcessSpades(Process):\n    \"\"\"Process spades process template interface\n\n    This process is set with:\n\n        - ``input_type``: assembly\n        - ``output_type``: assembly\n        - ``ptype``: post_assembly\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"fasta\"\n\n        self.params = {\n            \"genomeSize\": {\n                \"default\": 1,\n                \"description\":\n                    \"Genome size estimate for the samples in Mb. It is used \"\n                    \"to assess whether an assembly is much larger or smaller \"\n                    \"than expected\",\n\n            },\n            \"spadesMinKmerCoverage\": {\n                \"default\": 2,\n                \"description\":\n                    \"Minimum contigs K-mer coverage. 
After assembly only keep\"\n                    \" contigs with reported k-mer coverage equal or above \"\n                    \"this value\"\n            },\n            \"spadesMinContigLen\": {\n                \"default\": 200,\n                \"description\":\n                    \"Filter contigs for length greater or equal than this \"\n                    \"value\"\n            },\n            \"spadesMaxContigs\": {\n                \"default\": 100,\n                \"description\":\n                    \"Maximum number of contigs per 1.5 Mb of expected \"\n                    \"genome size\"\n            }\n        }\n\n        self.directives = {\"process_spades\": {\n            \"container\": \"flowcraft/spades\",\n            \"version\": \"3.11.1-1\"\n        }}\n\n\nclass AssemblyMapping(Process):\n    \"\"\"Assembly mapping process template interface\n\n    This process is set with:\n\n        - ``input_type``: assembly\n        - ``output_type``: assembly\n        - ``ptype``: post_assembly\n\n    It contains one **secondary channel link end**:\n\n        - ``MAIN_fq`` (alias: ``_MAIN_assembly``): Receives the FastQ files\n        from the last process with ``fastq`` output type.\n\n    It contains two **status channels**:\n\n        - ``STATUS_am``: Status for the assembly_mapping process\n        - ``STATUS_amp``: Status for the process_assembly_mapping process\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"fasta\"\n\n        self.status_channels = [\"STATUS_assembly_mapping\",\n                                \"STATUS_process_am\"]\n\n        self.link_start.append(\"SIDE_BpCoverage\")\n        self.link_end.append({\"link\": \"__fastq\", \"alias\": \"_LAST_fastq\"})\n\n        self.params = {\n            \"minAssemblyCoverage\": {\n                \"default\": \"'auto'\",\n                \"description\":\n                    \"In auto, the default minimum coverage for each \"\n                    \"assembled contig is 1/3 of the assembly mean coverage or\"\n                    \" 10x, if the mean coverage is below 10x\"\n            },\n            \"AMaxContigs\": {\n                \"default\": 100,\n                \"description\":\n                    \"A warning is issued if the number of contigs is over\"\n                    \"this threshold.\"\n            },\n            \"genomeSize\": {\n                \"default\": 2.1,\n                \"description\":\n                    \"Genome size estimate for the samples. 
It is used to \"\n                    \"check the ratio of contig number per genome MB\"\n            }\n        }\n\n        self.directives = {\n            \"assembly_mapping\": {\n                \"cpus\": 4,\n                \"memory\": \"{ 5.GB * task.attempt }\",\n                \"container\": \"flowcraft/bowtie2_samtools\",\n                \"version\": \"1.0.0-1\"\n            },\n            \"process_assembly_mapping\": {\n                \"cpus\": 1,\n                \"memory\": \"{ 5.GB * task.attempt }\",\n                \"container\": \"flowcraft/bowtie2_samtools\",\n                \"version\": \"1.0.0-1\"\n            }\n        }\n\n\nclass Pilon(Process):\n    \"\"\"Pilon mapping process template interface\n\n    This process is set with:\n\n        - ``input_type``: assembly\n        - ``output_type``: assembly\n        - ``ptype``: post_assembly\n\n    It contains one **dependency process**:\n\n        - ``assembly_mapping``: Requires the BAM file generated by the\n        assembly mapping process\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"fasta\"\n\n        self.dependencies = [\"assembly_mapping\"]\n        self.status_channels = [\"STATUS_pilon\", \"STATUS_pilon_report\"]\n\n        self.link_end.append({\"link\": \"SIDE_BpCoverage\",\n                              \"alias\": \"SIDE_BpCoverage\"})\n\n        self.params = {\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. 
\"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\n            \"pilon\": {\n                \"cpus\": 4,\n                \"memory\": \"{ 7.GB * task.attempt }\",\n                \"container\": \"flowcraft/pilon\",\n                \"version\": \"1.22.0-1\"\n            },\n            \"pilon_report\": {\n                \"cpus\": 1,\n                \"memory\": \"{ 7.GB * task.attempt }\",\n                \"container\": \"flowcraft/pilon\",\n                \"version\": \"1.22.0-1\"\n            }\n        }\n\nclass Bandage(Process):\n    \"\"\"Visualize the assembly using Bandage\n\n    This process is set with:\n\n        - ``input_type``: assembly\n        - ``output_type``: none\n        - ``ptype``: post_assembly\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = None\n\n        self.link_end.append({\"link\": \"gfa1\", \"alias\": \"gfa1\"})\n\n        self.params = {\n            \"reference\": {\n                \"default\": \"null\",\n                \"description\": \"Align the assembly to this reference genome using BLAST\"\n            },\n        }\n\n        self.directives = {\n            \"bandage\": {\n                \"container\": \"flowcraft/bandage\",\n                \"version\": \"0.8.1\"\n            }\n        }\n\nclass Quast(Process):\n    \"\"\"Assess assembly quality using QUAST\n\n    This process is set with:\n\n        - ``input_type``: assembly\n        - ``output_type``: tsv\n        - ``ptype``: post_assembly\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"tsv\"\n\n        self.params = {\n            \"reference\": {\n                \"default\": \"null\",\n                \"description\": \"Compare the assembly to this reference genome\"\n            },\n            \"genomeSizeBp\": {\n                \"default\": \"null\",\n                \"description\": \"Expected genome size (bp)\"\n            },\n        }\n\n        self.directives = {\n            \"quast\": {\n                \"container\": \"quay.io/biocontainers/quast\",\n                \"version\": \"5.0.0--py27pl526ha92aebf_1\"\n            }\n        }\n"
  },
  {
    "path": "flowcraft/generator/components/distance_estimation.py",
    "content": "\ntry:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass MashDist(Process):\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"json\"\n\n        self.params = {\n            \"pValue\": {\n                \"default\": 0.05,\n                \"description\": \"P-value cutoff for the distance estimation \"\n                               \"between two sequences to be included in the \"\n                               \"output.\"\n            },\n            \"mash_distance\": {\n                \"default\": 0.1,\n                \"description\": \"Sets the maximum distance between two \"\n                               \"sequences to be included in the output.\"\n            },\n            \"shared_hashes\": {\n                \"default\": 0.8,\n                \"description\": \"Sets a minimum percentage of hashes shared \"\n                               \"between two sequences in order to include its \"\n                               \"result in the output.\"\n            },\n            \"refFile\": {\n                \"default\": \"'/ngstools/data/plasmid_db_reference.msh'\",\n                \"description\": \"Specifies the reference file to be provided \"\n                               \"to mash. It can either be a fasta or a .msh \"\n                               \"reference sketch generated by mash.\"\n            }\n        }\n\n        self.directives = {\n            \"runMashDist\": {\n                \"container\": \"flowcraft/mash-patlas\",\n                \"version\": \"1.6.0-1\",\n                \"cpus\": 1,\n                \"memory\": \"{ 4.GB * task.attempt }\"\n            },\n            \"mashDistOutputJson\": {\n                \"container\": \"flowcraft/mash-patlas\",\n                \"version\": \"1.6.0-1\",\n                \"cpus\": 1,\n                \"memory\": \"'4GB'\"\n            }\n        }\n\n        self.status_channels = [\n            \"runMashDist\",\n            \"mashDistOutputJson\"\n        ]\n\n        self.link_end.append({\n            \"link\": \"SIDE_mashSketchOutChannel\",\n            \"alias\": \"SIDE_mashSketchOutChannel\"\n        })\n\n\nclass MashScreen(Process):\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"json\"\n\n        self.params = {\n            \"noWinner\": {\n                \"default\": \"false\",\n                \"description\": \"A variable that enables the use of -w option\"\n                               \" for mash screen.\"\n            },\n            \"pValue\": {\n                \"default\": 0.05,\n                \"description\": \"P-value cutoff for the distance estimation \"\n                               \"between two sequences to be included in the \"\n                               \"output.\"\n            },\n            \"identity\": {\n                \"default\": 0.9,\n                \"description\": \"The percentage of identity between the reads \"\n                               \"input and the reference sequence\"\n            },\n            \"refFile\": {\n                \"default\": \"'/ngstools/data/plasmid_db_reference.msh'\",\n                \"description\": \"Specifies the reference file to be provided \"\n                               \"to mash. 
It can either be a fasta or a .msh \"\n                               \"reference sketch generated by mash.\"\n            }\n        }\n\n        self.directives = {\n            \"mashScreen\": {\n                \"container\": \"flowcraft/mash-patlas\",\n                \"version\": \"1.6.0-1\",\n                \"cpus\": 1,\n                \"memory\": \"{ 4.GB * task.attempt }\"\n            },\n            \"mashOutputJson\": {\n                \"container\": \"flowcraft/mash-patlas\",\n                \"version\": \"1.6.0-1\",\n                \"cpus\": 1,\n                \"memory\": \"'4GB'\"\n            }\n        }\n\n        self.status_channels = [\n            \"mashScreen\",\n            \"mashOutputJson\"\n        ]\n\n        self.compiler[\"patlas_consensus\"] = [\"mashScreenOutputChannel\"]\n\n        self.link_end.append({\n            \"link\": \"SIDE_mashSketchOutChannel\",\n            \"alias\": \"SIDE_mashSketchOutChannel\"\n        })\n\n\nclass MashSketchFasta(Process):\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"msh\"\n\n        self.ignore_type = True\n\n        self.params = {\n            \"kmerSize\": {\n                \"default\": 21,\n                \"description\": \"Set the kmer size for hashing. Default: 21.\"\n            },\n            \"sketchSize\": {\n                \"default\": 1000,\n                \"description\": \"Set the number of hashes per sketch. Default: \"\n                               \"1000\"\n            },\n        }\n\n        self.directives = {\n            \"mashSketchFasta\": {\n                \"container\": \"flowcraft/mash-patlas\",\n                \"version\": \"1.6.0-1\",\n                \"cpus\": 1,\n                \"memory\": \"{ 4.GB * task.attempt }\"\n            },\n        }\n\n        self.status_channels = [\n            \"mashSketchFasta\",\n        ]\n\n        self.link_start.extend([\"SIDE_mashSketchOutChannel\"])\n\n\nclass MashSketchFastq(MashSketchFasta):\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n\n        # add more params to dict\n        self.params.update({\n            \"minKmer\": {\n                \"default\": 1,\n                \"description\": \"Minimum copies of each k-mer required to pass \"\n                               \"noise filter for reads. Implies -r. Default: 1\"\n            },\n            \"genomeSize\": {\n                \"default\": \"false\",\n                \"description\": \"Genome size (raw bases or with K/M/G/T). If \"\n                               \"specified, will be used for p-value calculation\"\n                               \" instead of an estimated size from k-mer \"\n                               \"content. Default: false, meaning that it won't\"\n                               \"be used. 
If you want to use it pass a number to\"\n                               \" this parameter.\"\n            }\n        })\n\n        self.directives = {\n            \"mashSketchFastq\": self.directives[\"mashSketchFasta\"]\n        }\n\n        self.status_channels = [\n            \"mashSketchFastq\",\n        ]\n\n\nclass FastAni(Process):\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n\n        self.params = {\n            \"fragLen\": {\n                \"default\": 3000,\n                \"description\": \"Set size of fragment. Default: 3000.\"\n            }\n        }\n\n        self.directives = {\n            \"fastAniMatrix\": {\n                \"container\": \"flowcraft/fast_ani\",\n                \"version\": \"1.1.0-2\",\n                \"cpus\": 20,\n                \"memory\": \"{ 30.GB * task.attempt }\"\n            },\n        }\n\n        self.status_channels = [\n            \"fastAniMatrix\",\n        ]\n"
  },
  {
    "path": "flowcraft/generator/components/downloads.py",
    "content": "\ntry:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass ReadsDownload(Process):\n    \"\"\"Process template interface for reads downloading from SRA and NCBI\n\n    This process is set with:\n\n        - ``input_type``: accessions\n        - ``output_type`` fastq\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"accessions\"\n        self.output_type = \"fastq\"\n\n        self.params = {\n            \"asperaKey\": {\n                \"default\": \"null\",\n                \"description\":\n                    \"Downloads fastq accessions from ENA using Aspera Connect \"\n                    \"by providing the private-key file \"\n                    \"'asperaweb_id_dsa.openssh' normally found in \"\n                    \"~/.aspera/connect/etc/asperaweb_id_dsa.openssh \"\n            }\n        }\n\n        self.directives = {\"reads_download\": {\n            \"cpus\": 1,\n            \"memory\": \"'1GB'\",\n            \"container\": \"flowcraft/getseqena\",\n            \"version\": \"0.4.0-1\"\n        }}\n\n\nclass FasterqDump(Process):\n    \"\"\"Process template for fasterq-dump\n\n    This process is set with:\n\n        - ``input_type``: accessions\n        - ``output_type`` fastq\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"accessions\"\n        self.output_type = \"fastq\"\n\n        self.params = {\n            \"option_file\": {\n                \"default\": \"false\",\n                \"description\": \"Read more options and parameters from the file.\"\n                           \"Use to provide parameters to fasterq-dump\"\n            },\n            \"compress_fastq\": {\n                \"default\": \"true\",\n                \"description\": \"This option allow the users to define if they\"\n                               \"want to compress the downloaded fastq files, \"\n                               \"saving disk space. Default behavior is set\"\n                               \"to compress the fastq files. If the user wants\"\n                               \"to change this, set the variable to 'no'\"\n            }\n        }\n\n        self.directives = {\"fasterqDump\": {\n            \"cpus\": 1,\n            \"memory\": \"'1GB'\",\n            \"container\": \"flowcraft/sra-tools\",\n            \"version\": \"2.9.1-1\"\n        }}\n\n        self.status_channels = [\n            \"fasterqDump\"\n        ]\n"
  },
  {
    "path": "flowcraft/generator/components/mapping.py",
    "content": "try:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass Bowtie(Process):\n    \"\"\"bowtie2 to align short paired-end sequencing reads to long reference sequences\n\n        This process is set with:\n\n            - ``input_type``: fastq\n            - ``output_type``: bam\n            - ``ptype``: mapping\n\n        \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"bam\"\n\n        self.params = {\n            \"reference\": {\n                \"default\": \"null\",\n                \"description\": \"Specifies the reference genome to be provided \"\n                               \"to bowtie2-build.\"\n            },\n            \"index\": {\n                \"default\": \"null\",\n                \"description\": \"Specifies the reference indexes to be provided \"\n                               \"to bowtie2.\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. \"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\n            \"bowtie\": {\n                \"container\": \"flowcraft/bowtie2_samtools\",\n                \"version\": \"1.0.0-1\",\n                \"memory\": \"{5.Gb*task.attempt}\",\n                \"cpus\": 4\n            },\n            \"bowtie_build\": {\n                \"container\": \"flowcraft/bowtie2_samtools\",\n                \"version\": \"1.0.0-1\",\n                \"memory\": \"{5.Gb*task.attempt}\",\n                \"cpus\": 1\n            }\n        }\n\n        self.status_channels = [\n            \"bowtie\",\n            \"report_bowtie\"\n        ]\n\n\nclass RetrieveMapped(Process):\n    \"\"\"Samtools process to  to align short paired-end sequencing reads to\n    long reference sequences\n\n        This process is set with:\n\n            - ``input_type``: bam\n            - ``output_type``: fastq\n            - ``ptype``: mapping\n\n        \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"bam\"\n        self.output_type = \"fastq\"\n\n        self.params = {\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. 
\"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.dependencies = [\"bowtie\"]\n\n        self.directives = {\n            \"retrieve_mapped\": {\n                \"container\": \"flowcraft/bowtie2_samtools\",\n                \"version\": \"1.0.0-1\",\n                \"memory\": \"{5.Gb*task.attempt}\",\n                \"cpus\": 2\n            }\n        }\n\n        self.status_channels = [\n            \"retrieve_mapped\"\n        ]\n\n\nclass Bwa(Process):\n    \"\"\"Bwa to align short paired-end sequencing reads to long reference sequences\n\n        This process is set with:\n\n            - ``input_type``: fastq\n            - ``output_type``: bam\n            - ``ptype``: mapping\n\n        \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"bam\"\n\n        self.params = {\n            \"bwaIndex\": {\n                \"default\": \"'s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta'\",\n                \"description\": \"Specifies the reference indexes to be provided \"\n                               \"to bwa.\"\n            }\n        }\n\n        self.directives = {\n            \"bwa\": {\n                \"container\": \"flowcraft/bwa_samtools\",\n                \"version\": \"0.7.17-1\",\n                \"memory\": \"{5.Gb*task.attempt}\",\n                \"cpus\": 4\n            }\n        }\n\n        self.status_channels = [\n            \"bwa\",\n        ]\n\n\nclass MarkDuplicates(Process):\n    \"\"\"Identifies duplicate reads.\n\n        This process is set with:\n\n            - ``input_type``: bam\n            - ``output_type``: bam\n            - ``ptype``: mapping\n\n        \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"bam\"\n        self.output_type = \"bam\"\n\n        self.compiler[\"multiqc\"] = [\"markDupMultiQC\"]\n\n        self.directives = {\n            \"mark_duplicates\": {\n                \"container\": \"broadinstitute/gatk\",\n                \"memory\": \"{5.Gb*task.attempt}\",\n                \"cpus\": 4\n            }\n        }\n\n        self.status_channels = [\n            \"mark_duplicates\"\n        ]\n\n\nclass BaseRecalibrator(Process):\n    \"\"\"Detects systematic errors in base quality scores\n\n        This process is set with:\n\n            - ``input_type``: bam\n            - ``output_type``: bam\n            - ``ptype``: mapping\n\n        \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"bam\"\n        self.output_type = \"bam\"\n\n        self.params = {\n            \"reference\": {\n                \"default\": \"null\",\n                \"description\": \"Specifies the name of the FASTA reference genome and index files to be provided \"\n                               \"to BaseRecalibrator.\"\n            },\n            \"dbsnp\": {\n                \"default\": \"null\",\n                \"description\": \"Specifies the dbSNP VCF file to be provided \"\n                               \"to BaseRecalibrator.\"\n            },\n            \"dbsnpIdx\": {\n                \"default\": \"null\",\n                \"description\": \"Specifies the dbSNP VCF index file to be provided \"\n                               \"to BaseRecalibrator.\"\n            },\n            \"goldenIndel\": {\n                
\"default\": \"null\",\n                \"description\": \"Specifies the Gold standard INDELs VCF file to be provided \"\n                               \"to BaseRecalibrator.\"\n            },\n            \"goldenIndelIdx\": {\n                \"default\": \"null\",\n                \"description\": \"Specifies the Gold standard INDELs VCF index file to be provided \"\n                               \"to BaseRecalibrator.\"\n            }\n        }\n\n        self.directives = {\n            \"base_recalibrator\": {\n                \"container\": \"broadinstitute/gatk\",\n                \"memory\": \"{5.Gb*task.attempt}\",\n                \"cpus\": 4\n            },\n            \"apply_bqsr\": {\n                \"container\": \"broadinstitute/gatk\",\n                \"memory\": \"{5.Gb*task.attempt}\",\n                \"cpus\": 4\n            }\n        }\n\n        self.status_channels = [\n            \"base_recalibrator\",\n            \"apply_bqsr\"\n        ]"
  },
  {
    "path": "flowcraft/generator/components/metagenomics.py",
    "content": "\ntry:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass Concoct(Process):\n    \"\"\"\n    CONCOCT process template interface for the\n    taxonomic independent binning of metagenomic\n    assemblies.\n\n    This process is set with:\n        - ``input_type``: assembly\n        - ``output_type``: assembly\n        - ``ptype``: post_assembly\n\n        It contains one **secondary channel link end**:\n\n            - ``MAIN_fq`` (alias: ``_MAIN_assembly``): Receives the FastQ files\n            from the last process with ``fastq`` output type.\n    \"\"\"\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"fasta\"\n\n        self.link_end.append({\"link\": \"__fastq\", \"alias\": \"_LAST_fastq\"})\n\n        self.params = {\n            \"clusters\": {\n                \"default\": 400,\n                \"description\": \"Maximum number of clusters for VGMM. Default: 400\"\n            },\n            \"lengthThreshold\": {\n                \"default\": 1000,\n                \"description\": \"Contigs shorter than this value will not be included. Default: 1000.\"\n            },\n            \"readLength\": {\n                \"default\": 100,\n                \"description\": \"Specify read length for coverage.\"\n                               \"Default: 0.9\"\n            },\n            \"iterations\": {\n                \"default\": 500,\n                \"description\": \"Number of iterations for the VBGMM. Default: 500\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. 
\"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\n            \"concoct\": {\n                \"container\": \"flowcraft/concoct\",\n                \"version\": \"1.0.0-1\",\n                \"cpus\": 4,\n                \"memory\": \"{ 5.GB * task.attempt }\"\n            }\n        }\n\n        self.status_channels = [\n            \"concoct\",\n            \"report_concoct\"\n        ]\n\n\nclass Kraken(Process):\n    \"\"\"kraken process template interface\n\n            This process is set with:\n\n                - ``input_type``: fastq\n                - ``output_type``: txt\n                - ``ptype``: taxonomic classification\n    \"\"\"\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"txt\"\n\n        self.params = {\n            \"krakenDB\": {\n                \"default\": \"'minikraken_20171013_4GB'\",\n                \"description\": \"Specifies kraken database.\"\n            }\n        }\n\n        self.directives = {\n            \"kraken\": {\n                \"container\": \"flowcraft/kraken\",\n                \"version\": \"1.0-0.1\",\n                \"memory\": \"{5.Gb*task.attempt}\",\n                \"cpus\": 3\n            }\n        }\n\n        self.status_channels = [\n            \"kraken\"\n        ]\n\n\nclass Kraken2(Process):\n    \"\"\"kraken2 process template interface\n\n            This process is set with:\n\n                - ``input_type``: fastq\n                - ``output_type``: txt\n                - ``ptype``: taxonomic classification\n    \"\"\"\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = None\n\n        self.params = {\n            \"kraken2DB\": {\n                \"default\": \"'minikraken2_v1_8GB'\",\n                \"description\": \"Specifies kraken2 database. Requires full path if database not on \"\n                               \"KRAKEN2_DB_PATH.\"\n            }\n        }\n\n        self.directives = {\n            \"kraken2\": {\n                \"container\": \"flowcraft/kraken2\",\n                \"version\": \"2.0.7-1\",\n                \"memory\": \"{8.Gb*task.attempt}\",\n                \"cpus\": 4\n            }\n        }\n\n        self.status_channels = [\n            \"kraken2\"\n        ]\n\n\nclass Maxbin2(Process):\n    \"\"\"MaxBin2, a metagenomics binning software\n\n            This process is set with:\n\n                - ``input_type``: assembly\n                - ``output_type``: assembly\n                - ``ptype``: post_assembly\n\n            It contains one **secondary channel link end**:\n\n                - ``MAIN_fq`` (alias: ``_MAIN_assembly``): Receives the FastQ files\n                from the last process with ``fastq`` output type.\n\n            \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"fasta\"\n\n        self.link_end.append({\"link\": \"__fastq\", \"alias\": \"_LAST_fastq\"})\n\n        self.params = {\n            \"min_contig_lenght\": {\n                \"default\": 1000,\n                \"description\": \"minimum contig length. 
Default: 1000\"\n            },\n            \"max_iteration\": {\n                \"default\": 50,\n                \"description\": \"maximum Expectation-Maximization algorithm\"\n                               \"iteration number. Default: 50\"\n            },\n            \"prob_threshold\": {\n                \"default\": 0.9,\n                \"description\": \"probability threshold for EM final classification.\"\n                               \"Default: 0.9\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. \"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\n            \"maxbin2\": {\n                \"container\": \"flowcraft/maxbin2\",\n                \"version\": \"2.2.4-1\",\n                \"cpus\": 3,\n                \"memory\": \"{ 5.GB * task.attempt }\"\n            }\n        }\n\n        self.status_channels = [\n            \"maxbin2\",\n            \"report_maxbin2\"\n        ]\n\n\nclass Megahit(Process):\n    \"\"\"megahit process template interface\n\n        This process is set with:\n\n            - ``input_type``: fastq\n            - ``output_type``: assembly\n            - ``ptype``: assembly\n\n        It contains one **secondary channel link end**:\n\n            - ``SIDE_max_len`` (alias: ``SIDE_max_len``): Receives max read length\n        \"\"\"\n\n    def __init__(self, **kwargs):\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fasta\"\n\n        self.link_end.append({\"link\": \"SIDE_max_len\", \"alias\": \"SIDE_max_len\"})\n\n        self.dependencies = [\"integrity_coverage\"]\n\n        self.params = {\n            \"megahitKmers\": {\n                \"default\": \"'auto'\",\n                \"description\":\n                    \"If 'auto' the megahit k-mer lengths will be determined \"\n                    \"from the maximum read length of each assembly. If \"\n                    \"'default', megahit will use the default k-mer lengths. \"\n                    \"(default: $params.megahitKmers)\"\n            },\n            \"fastg\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Converts megahit intermediate contigs to fastg\"\n\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. 
\"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\"megahit\": {\n            \"cpus\": 4,\n            \"memory\": \"{ 5.GB * task.attempt }\",\n            \"container\": \"flowcraft/megahit\",\n            \"version\": \"1.1.3-0.1\",\n            \"scratch\": \"true\"\n        },\n            \"megahit_fastg\": {\n                \"container\": \"flowcraft/megahit\",\n                \"version\": \"1.1.3-0.1\",\n            }\n        }\n\n        self.status_channels = [\n            \"megahit\",\n            \"megahit_fastg\"\n        ]\n\n\nclass Metabat2(Process):\n    \"\"\"\n    MetaBat2 process template interface for the\n    taxonomic independent binning of metagenomic\n    assemblies.\n\n    This process is set with:\n        - ``input_type``: assembly\n        - ``output_type``: assembly\n        - ``ptype``: post_assembly\n\n    It contains one **dependency process**:\n\n        - ``assembly_mapping``: Requires the BAM file generated by the\n        assembly mapping process\n\n    \"\"\"\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"fasta\"\n\n        self.dependencies = [\"assembly_mapping\"]\n\n        self.params = {\n            \"maxPercentage\": {\n                \"default\": 95,\n                \"description\": \"Percentage of 'good' contigs considered for binning decided by connection. Default: 95.\"\n            },\n            \"minContig\": {\n                \"default\": 2500,\n                \"description\": \"Minimum size of a contig for binning (should be >=1500). Default: 2500.\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. \"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\n            \"metabat2\": {\n                \"container\": \"flowcraft/metabat\",\n                \"version\": \"2.13-1\",\n                \"cpus\": 4,\n                \"memory\": \"{ 5.GB * task.attempt }\"\n            }\n        }\n\n        self.status_channels = [\n            \"metabat2\",\n            \"report_metabat2\"\n        ]\n\n\nclass Metaspades(Process):\n    \"\"\"Metaspades process template interface\n\n        This process is set with:\n\n            - ``input_type``: fastq\n            - ``output_type``: assembly\n            - ``ptype``: assembly\n\n        It contains one **secondary channel link end**:\n\n            - ``SIDE_max_len`` (alias: ``SIDE_max_len``): Receives max read length\n        \"\"\"\n\n    def __init__(self, **kwargs):\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fasta\"\n\n        self.link_end.append({\"link\": \"SIDE_max_len\", \"alias\": \"SIDE_max_len\"})\n\n        self.dependencies = [\"integrity_coverage\"]\n\n        self.params = {\n            \"metaspadesKmers\": {\n                \"default\": \"'auto'\",\n                \"description\":\n                    \"If 'auto' the metaSPAdes k-mer lengths will be determined \"\n                    \"from the maximum read length of each assembly. 
If \"\n                    \"'default', metaSPAdes will use the default k-mer lengths. \"\n                    \"(default: $params.metaspadesKmers)\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. \"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\"metaspades\": {\n            \"cpus\": 4,\n            \"memory\": \"{ 5.GB * task.attempt }\",\n            \"container\": \"flowcraft/spades\",\n            \"version\": \"3.11.1-1\",\n            \"scratch\": \"true\"\n        }}\n\n\nclass Midas_species(Process):\n    \"\"\"Midas species process template interface\n\n            This process is set with:\n\n                - ``input_type``: fastq\n                - ``output_type``: txt\n                - ``ptype``: taxonomic classification (species)\n    \"\"\"\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"txt\"\n\n        self.params = {\n            \"midasDB\": {\n                \"default\": \"null\",\n                \"description\": \"Specifies Midas database.\"\n            }\n        }\n\n        self.directives = {\n            \"midas_species\": {\n                \"container\": \"flowcraft/midas\",\n                \"version\": \"1.3.2-0.1\",\n                \"memory\": \"{2.Gb*task.attempt}\",\n                \"cpus\": 3\n            }\n        }\n\n        self.status_channels = [\n            \"midas_species\"\n        ]\n\n\nclass RemoveHost(Process):\n    \"\"\"bowtie2 to remove host reads process template interface\n\n        This process is set with:\n\n            - ``input_type``: fastq\n            - ``output_type``: fastq\n            - ``ptype``: removal os host reads\n\n        \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fastq\"\n\n        self.params = {\n            \"refIndex\": {\n                \"default\": \"'/index_hg19/hg19'\",\n                \"description\": \"Specifies the reference indexes to be provided \"\n                               \"to bowtie2.\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. 
\"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\n            \"remove_host\": {\n                \"container\": \"flowcraft/remove_host\",\n                \"version\": \"2-0.1\",\n                \"memory\": \"{5.Gb*task.attempt}\",\n                \"cpus\": 3\n            }\n        }\n\n        self.status_channels = [\n            \"remove_host\",\n            \"report_remove_host\"\n        ]\n\n\nclass Metaprob(Process):\n    \"\"\"MetaProb to bin metagenomic reads interface\n\n            This process is set with:\n\n                - ``input_type``: fastq\n                - ``output_type``: csv\n                - ``ptype``: binning of reads\n\n            \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"csv\"\n\n        self.params = {\n            \"feature\": {\n                \"default\": 1,\n                \"description\": \"Feature used to compute. Default: 1\"\n            },\n            \"metaProbQMer\": {\n                \"default\": 5,\n                \"description\": \"Threshold of shared q-mer to create graph \"\n                               \"adiacences. Default: 5\"\n            }\n        }\n\n        self.directives = {\n            \"metaProb\": {\n                \"container\": \"flowcraft/metaprob\",\n                \"version\": \"2-1\",\n                \"cpus\": 1,\n                \"memory\": \"{ 30.GB * task.attempt }\"\n            }\n        }\n\n        self.status_channels = [\n            \"metaProb\"\n        ]\n\n\nclass SplitAssembly(Process):\n    \"\"\"Component to filter metagenomic assemblies by contig size\n    If the contig is larger than $param.size, it gets separated\n    from the original assembly to continue the processes downstream\n    of the pipeline.\n\n            This process is set with:\n\n                - ``input_type``: fasta\n                - ``output_type``: fasta\n                - ``ptype``: assembly filter\n\n            \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"fasta\"\n\n        self.params = {\n            \"size\": {\n                \"default\": \"null\",\n                \"description\": \"Minimum contig size\"\n            }\n        }\n\n        self.directives = {\n            \"split_assembly\": {\n                \"cpus\": 1,\n                \"memory\": \"{ 1.GB * task.attempt }\"\n            }\n        }\n\n        self.status_channels = [\n            \"split_assembly\"\n        ]\n"
  },
  {
    "path": "flowcraft/generator/components/mlst.py",
    "content": "try:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass Mlst(Process):\n    \"\"\"Mlst mapping process template interface\n\n    This process is set with:\n\n        - ``input_type``: assembly\n        - ``output_type``: None\n        - ``ptype``: post_assembly\n\n    It contains one **secondary channel link end**:\n\n        - ``MAIN_assembly`` (alias: ``MAIN_assembly``): Receives the last\n        assembly.\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"fasta\"\n\n        self.directives = {\"mlst\": {\n            \"container\": \"ummidock/mlst\",\n        }}\n\n        self.params = {\n            \"mlstSpecies\": {\n                \"default\": \"null\",\n                \"description\":\n                    \"Specify the expected species for MLST checking.\"\n            }\n        }\n\n\nclass Chewbbaca(Process):\n    \"\"\"Chewbbaca process template interface\n\n    This process is set with:\n\n        - ``input_type``: assembly\n        - ``output_type``: None\n        - ``ptype``: post_assembly\n\n    It contains one **secondary channel link end**:\n\n        - ``MAIN_assembly`` (alias: ``MAIN_assembly``): Receives the last\n        assembly.\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = None\n\n        self.ignore_type = True\n\n        self.link_start = None\n        self.link_end.append({\"link\": \"MAIN_assembly\",\n                              \"alias\": \"MAIN_assembly\"})\n\n        self.directives = {\n            \"chewbbaca\": {\n                \"cpus\": 4,\n                \"container\": \"mickaelsilva/chewbbaca_py3\",\n                \"version\": \"latest\",\n            },\n            \"chewbbaca_batch\": {\n                \"cpus\": 4,\n                \"container\": \"mickaelsilva/chewbbaca_py3\",\n                \"version\": \"latest\",\n            },\n            \"chewbbacaExtractMLST\": {\n                \"container\": \"mickaelsilva/chewbbaca_py3\",\n                \"version\": \"latest\"\n            }\n        }\n\n        self.params = {\n            \"chewbbacaQueue\": {\n                \"default\": \"null\",\n                \"description\":\n                    \"Specifiy a queue/partition for chewbbaca. This option\"\n                    \" is only used for grid schedulers.\"\n            },\n            \"chewbbacaTraining\": {\n                \"default\": \"null\",\n                \"description\":\n                    \"Specify the full path to the prodigal training file \"\n                    \"of the corresponding species.\"\n            },\n            \"schemaPath\": {\n                \"default\": \"null\",\n                \"description\":\n                    \"The path to the chewbbaca schema directory.\"\n            },\n            \"schemaSelectedLoci\": {\n                \"default\": \"null\",\n                \"description\":\n                    \"The path to the selection of loci in the schema \"\n                    \"directory to be used. 
If not specified, all loci in the\"\n                    \" schema will be used.\"\n            },\n            \"schemaCore\": {\n                \"default\": \"null\",\n                \"description\": \"\"\n            },\n            \"chewbbacaJson\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"If set to True, chewbbaca's allele call output will be \"\n                    \"set to JSON format.\"\n            },\n            \"chewbbacaToPhyloviz\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"If set to True, the ExtractCgMLST module of chewbbaca\"\n                    \" will be executed after the allele calling.\",\n            },\n            \"chewbbacaProfilePercentage\": {\n                \"default\": 0.95,\n                \"description\":\n                    \"Specifies the proportion of samples that must be \"\n                    \"present in a locus to save the profile.\"\n            },\n            \"chewbbacaBatch\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Specifies whether a chewbbaca run will be performed on the\"\n                    \" complete input batch (all at the same time) or one by \"\n                    \"one.\"\n            }\n        }\n\n\nclass Metamlst(Process):\n    \"\"\"MetaMlst mapping process template interface\n\n    This process is set with:\n\n        - ``input_type``: reads\n        - ``output_type``: None\n        - ``ptype``: pre_assembly\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = None\n\n        self.directives = {\"metamlst\": {\n            \"container\": \"flowcraft/metamlst\",\n            \"version\": \"1.1-1\",\n            \"memory\": \"{4.Gb*task.attempt}\"\n            }\n        }\n\n        self.params = {\n            \"metamlstDB\": {\n                \"default\": \"'/NGStools/metamlst/metamlstDB_2017.db'\",\n                \"description\":\n                    \"Specify the metamlstDB (full path) for MLST checking.\"\n            },\n            \"metamlstDB_index\": {\n                \"default\": \"'/NGStools/index/metamlstDB_2017'\",\n                \"description\":\n                    \"Specify the Bowtie2 metamlstDB index (full path) for MLST checking.\"\n            }\n        }\n\n\n"
  },
  {
    "path": "flowcraft/generator/components/patlas_mapping.py",
    "content": "try:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass MappingPatlas(Process):\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"json\"\n\n        self.params = {\n            \"trim5\": {\n                \"default\": 0,\n                \"description\": \"Sets trim5 option for bowtie. This will become\"\n                               \" legacy with QC integration, but it enables to\"\n                               \" trim 5' end of reads to be mapped with \"\n                               \"bowtie2.\"\n            },\n            \"cov_cutoff\": {\n                \"default\": 0.6,\n                \"description\": \"This variable sets a cutoff for the percentage\"\n                               \" of the query reference sequence that is \"\n                               \"covered by reads (in absolute lenght).\"\n            },\n            \"refIndex\": {\n                \"default\": \"'/ngstools/data/indexes/patlas_bowtie2_index'\",\n                \"description\": \"Specifies the reference indexes to be provided\"\n                               \" to bowtie2.\"\n            },\n            \"samtoolsIndex\": {\n                \"default\": \"'/ngstools/data/indexes/master_fasta_plasmid_db.fas.fai'\",\n                \"description\": \"Specifies the reference indexes to be provided\"\n                               \" to samtools.\"\n            },\n            \"lengthJson\": {\n                \"default\": \"'/ngstools/data/length_plasmid_db.json'\",\n                \"description\": \"A dictionary of all the lengths of reference \"\n                               \"sequences.\"\n            }\n        }\n\n        self.directives = {\n            \"mappingBowtie\": {\n                \"container\": \"flowcraft/mapping-patlas\",\n                \"version\": \"1.6.0-1\",\n                \"cpus\": 1,\n                \"memory\": \"{ 4.GB * task.attempt }\",\n                \"scratch\": \"true\"\n            },\n            \"jsonDumpingMapping\": {\n                \"container\": \"flowcraft/mapping-patlas\",\n                \"version\": \"1.6.0-1\",\n                \"cpus\": 1,\n                \"memory\": \"'4GB'\"\n            }\n        }\n\n        self.status_channels = [\n            \"mappingBowtie\",\n            \"jsonDumpingMapping\"\n        ]\n\n        self.compiler[\"patlas_consensus\"] = [\"mappingOutputChannel\"]\n"
  },
  {
    "path": "flowcraft/generator/components/phylogeny.py",
    "content": "try:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass Raxml(Process):\n    \"\"\"mafft to align sequences\n\n            This process is set with:\n\n                - ``input_type``: align\n                - ``output_type``: .tree\n                - ``ptype``: tree\n\n            \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"align\"\n        self.output_type = \".tree\"\n\n        self.params = {\n            \"substitutionModel\": {\n                \"default\": \"'GTRGAMMA'\",\n                \"description\": \"Substitution model. Option: GTRCAT, GTRCATI, ASC_GTRCAT, GTRGAMMA, ASC_GTRGAMMA etc \"\n            },\n            \"seedNumber\": {\n                \"default\": \"12345\",\n                \"description\": \"Specify an integer number (random seed) and turn on rapid bootstrapping\"\n            },\n            \"bootstrap\": {\n                \"default\": \"500\",\n                \"description\": \"Specify the number of alternative runs on distinct starting trees\"\n            },\n            \"simpleLabel\": {\n                \"default\": \"true\",\n                \"description\": \"Simplify the labels in the newick tree (for interactive report only)\"\n            }\n        }\n\n        self.directives = {\n            \"raxml\": {\n                \"container\": \"flowcraft/raxml\",\n                \"version\": \"8.2.11-2\",\n                \"cpus\": 4,\n                \"memory\": \"{ 4.GB * task.attempt }\"\n            },\n            \"report_raxml\": {\n                \"container\": \"flowcraft/raxml\",\n                \"version\": \"8.2.11-2\"\n            }\n        }\n\n        self.status_channels = [\n            \"raxml\",\n            \"report_raxml\"\n        ]\n\n\n"
  },
  {
    "path": "flowcraft/generator/components/reads_quality_control.py",
    "content": "\ntry:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass IntegrityCoverage(Process):\n    \"\"\"Process template interface for first integrity_coverage process\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: fastq\n        - ``ptype``: pre_assembly\n\n    It contains two **secondary channel link starts**:\n\n        - ``SIDE_phred``: Phred score of the FastQ files\n        - ``SIDE_max_len``: Maximum read length\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fastq\"\n\n        self.params = {\n            \"genomeSize\": {\n                \"default\": 1,\n                \"description\":\n                    \"Genome size estimate for the samples in Mb. It is used to \"\n                    \"estimate the coverage and other assembly parameters and\"\n                    \"checks\"\n            },\n            \"minCoverage\": {\n                \"default\": 0,\n                \"description\":\n                    \"Minimum coverage for a sample to proceed. By default it's set\"\n                    \"to 0 to allow any coverage\"\n            }\n        }\n\n        self.link_start.extend([\"SIDE_phred\", \"SIDE_max_len\"])\n\n\nclass CheckCoverage(Process):\n    \"\"\"Process template interface for additional integrity_coverage process\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: fastq\n        - ``ptype``: pre_assembly\n\n    It contains one **secondary channel link start**:\n\n        - ``SIDE_max_len``: Maximum read length\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fastq\"\n\n        self.params = {\n            \"genomeSize\": {\n                \"default\": 2.1,\n                \"description\":\n                    \"Genome size estimate for the samples. It is used to \"\n                    \"estimate the coverage and other assembly parameters and\"\n                    \"checks\"\n            },\n            \"minCoverage\": {\n                \"default\": 15,\n                \"description\":\n                    \"Minimum coverage for a sample to proceed. Can be set to\"\n                    \"0 to allow any coverage\"\n            }\n        }\n\n        self.link_start.extend([\"SIDE_max_len\"])\n\n\nclass TrueCoverage(Process):\n    \"\"\"TrueCoverage process template interface\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fastq\"\n\n        self.params = {\n            \"species\": {\n                \"default\": \"null\",\n                \"description\":\n                    \"Species name. Must be the complete species name with\"\n                    \"genus and species, e.g.: 'Yersinia enterocolitica'. 
\"\n            }\n        }\n\n        self.directives = {\n            \"true_coverage\": {\n                \"cpus\": 4,\n                \"memory\": \"'1GB'\",\n                \"container\": \"flowcraft/true_coverage\",\n                \"version\": \"3.2-1\"\n            }\n        }\n\n\nclass Fastqc(Process):\n    \"\"\"FastQC process template interface\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: fastq\n        - ``ptype``: pre_assembly\n\n    It contains two **status channels**:\n\n        - ``STATUS_fastqc``: Status for the fastqc process\n        - ``STATUS_report``: Status for the fastqc_report process\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fastq\"\n\n        self.status_channels = [\"STATUS_fastqc2\", \"STATUS_fastqc2_report\"]\n        \"\"\"\n        list: Setting status channels for FastQC execution and FastQC report\n        \"\"\"\n\n        self.params = {\n            \"adapters\": {\n                \"default\": \"'None'\",\n                \"description\":\n                    \"Path to adapters files, if any.\"\n            }\n        }\n\n        self.directives = {\"fastqc2\": {\n            \"cpus\": 2,\n            \"memory\": \"'4GB'\",\n            \"container\": \"flowcraft/fastqc\",\n            \"version\": \"0.11.7-1\"\n        }}\n\n\nclass Trimmomatic(Process):\n    \"\"\"Trimmomatic process template interface\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: fastq\n        - ``ptype``: pre_assembly\n\n    It contains one **secondary channel link end**:\n\n        - ``SIDE_phred`` (alias: ``SIDE_phred``): Receives FastQ phred score\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fastq\"\n\n        self.link_end.append({\"link\": \"SIDE_phred\", \"alias\": \"SIDE_phred\"})\n\n        self.dependencies = [\"integrity_coverage\"]\n\n        self.params = {\n            \"adapters\": {\n                \"default\": \"'None'\",\n                \"description\":\n                    \"Path to adapters files, if any.\"\n            },\n            \"trimSlidingWindow\": {\n                \"default\": \"'5:20'\",\n                \"description\":\n                    \"Perform sliding window trimming, cutting once the \"\n                    \"average quality within the window falls below a \"\n                    \"threshold\"\n            },\n            \"trimLeading\": {\n                \"default\": \"3\",\n                \"description\":\n                    \"Cut bases off the start of a read, if below a threshold \"\n                    \"quality\"\n            },\n            \"trimTrailing\": {\n                \"default\": \"3\",\n                \"description\":\n                    \"Cut bases of the end of a read, if below a \"\n                    \"threshold quality\"\n            },\n            \"trimMinLength\": {\n                \"default\": \"55\",\n                \"description\":\n                    \"Drop the read if it is below a specified length \"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. 
This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. \"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\"trimmomatic\": {\n            \"cpus\": 2,\n            \"memory\": \"{ 4.GB * task.attempt }\",\n            \"container\": \"flowcraft/trimmomatic\",\n            \"version\": \"0.36-1\"\n        }}\n\n\nclass FastqcTrimmomatic(Process):\n    \"\"\"Fastqc + Trimmomatic process template interface\n\n    This process executes FastQC only to inform the trim range for trimmomatic,\n    not for QC checks.\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: fastq\n        - ``ptype``: pre_assembly\n\n    It contains one **secondary channel link end**:\n\n        - ``SIDE_phred`` (alias: ``SIDE_phred``): Receives FastQ phred score\n\n    It contains three **status channels**:\n\n        - ``STATUS_fastqc``: Status for the fastqc process\n        - ``STATUS_report``: Status for the fastqc_report process\n        - ``STATUS_trim``: Status for the trimmomatic process\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fastq\"\n\n        self.link_end.append({\"link\": \"SIDE_phred\", \"alias\": \"SIDE_phred\"})\n\n        self.status_channels = [\"STATUS_fastqc\", \"STATUS_fastqc_report\",\n                                \"STATUS_trimmomatic\"]\n\n        self.dependencies = [\"integrity_coverage\"]\n\n        self.params = {\n            \"adapters\": {\n                \"default\": \"'None'\",\n                \"description\":\n                    \"Path to adapters files, if any.\"\n            },\n            \"trimSlidingWindow\": {\n                \"default\": \"'5:20'\",\n                \"description\":\n                    \"Perform sliding window trimming, cutting once the \"\n                    \"average quality within the window falls below a \"\n                    \"threshold.\"\n            },\n            \"trimLeading\": {\n                \"default\": \"3\",\n                \"description\":\n                    \"Cut bases off the start of a read, if below a threshold \"\n                    \"quality.\"\n            },\n            \"trimTrailing\": {\n                \"default\": \"3\",\n                \"description\":\n                    \"Cut bases of the end of a read, if below a \"\n                    \"threshold quality.\"\n            },\n            \"trimMinLength\": {\n                \"default\": \"55\",\n                \"description\":\n                    \"Drop the read if it is below a specified length.\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. 
\"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\n            \"fastqc\": {\n                \"cpus\": 2,\n                \"memory\": \"'4GB'\",\n                \"container\": \"flowcraft/fastqc\",\n                \"version\": \"0.11.7-1\"\n            },\n            \"trimmomatic\": {\n                \"cpus\": 2,\n                \"memory\": \"{ 4.GB * task.attempt }\",\n                \"container\": \"flowcraft/trimmomatic\",\n                \"version\": \"0.36-1\"\n            }\n        }\n\n\nclass FilterPoly(Process):\n    \"\"\"PrinSeq process to filter non-informative sequences from reads\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: fastq\n        - ``ptype``: pre_assembly\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fastq\"\n\n        self.params = {\n            \"adapter\": {\n                \"default\": \"'A 50%; T 50%; N 50%'\",\n                \"description\":\n                    \"Pattern to filter the reads. Please separate parameter\"\n                    \"values with a space and separate new parameter sets with\"\n                    \" semicolon (;). Parameters are defined by two values: \"\n                    \"the pattern (any combination of the letters ATCGN), and \"\n                    \"the number of repeats or percentage of occurence.\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. \"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\"filter_poly\": {\n            \"cpus\": 1,\n            \"memory\": \"{ 4.GB * task.attempt }\",\n            \"container\": \"flowcraft/prinseq\",\n            \"version\": \"0.20.4-1\"\n        }}\n\n        self.status_channels = [\n            \"filter_poly\"\n        ]\n\n\nclass DownsampleFastq(Process):\n    \"\"\"Downsamples FastQ file based on depth using seqtk\n\n    This process is set with:\n\n        - ``input_type``: fastq\n        - ``output_type``: fastq\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = \"fastq\"\n\n        self.params = {\n            \"genomeSize\": {\n                \"default\": 1,\n                \"description\":\n                    \"Genome size estimate for the samples in Mb. It is used to\"\n                    \" estimate the coverage\"\n            },\n            \"depth\": {\n                \"default\": 100,\n                \"description\":\n                    \"Maximum estimated depth coverage allowed. FastQ with \"\n                    \"higher estimated depth will be subsampled to this value.\"\n            },\n            \"seed\":{\n                \"default\": 100,\n                \"description\": \"The seed number for seqtk. 
By default it is 100 \"\n                               \"and should be equal for both pairs of \"\n                               \"reads.\"\n            },\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. \"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\"downsample_fastq\": {\n            \"cpus\": 1,\n            \"memory\": \"{ 4.GB * task.attempt }\",\n            \"container\": \"flowcraft/seqtk\",\n            \"version\": \"1.3.0-3\"\n        }}\n\n        self.status_channels = [\n            \"downsample_fastq\"\n        ]\n"
  },
  {
    "path": "flowcraft/generator/components/typing.py",
    "content": "try:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass SeqTyping(Process):\n    \"\"\"\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = None\n\n        self.link_start = None\n\n        self.directives = {\"seq_typing\": {\n            \"cpus\": 4,\n            \"memory\": \"'4GB'\",\n            \"container\": \"flowcraft/seq_typing\",\n            \"version\": \"2.0-1\"\n        }}\n\n        self.params = {\n            \"referenceFileO\": {\n                \"default\": \"null\",\n                \"description\":\n                    \"Fasta file containing reference sequences. If more\"\n                    \"than one file is passed via the 'referenceFileH parameter\"\n                    \", a reference sequence for each file will be determined. \"\n            },\n            \"referenceFileH\": {\n                \"default\": \"null\",\n                \"description\":\n                    \"Fasta file containing reference sequences. If more\"\n                    \"than one file is passed via the 'referenceFileO parameter\"\n                    \", a reference sequence for each file will be determined. \"\n            }\n        }\n\n\nclass PathoTyping(Process):\n    \"\"\"\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = None\n\n        self.ignore_type = True\n\n        self.params = {\n            \"species\": {\n                \"default\": \"null\",\n                \"description\":\n                    \"Species name. Must be the complete species name with\"\n                    \"genus and species, e.g.: 'Yersinia enterocolitica'. \"\n            }\n        }\n\n        self.link_start = None\n        self.link_end.append({\"link\": \"MAIN_raw\",\n                              \"alias\": \"SIDE_PathoType_raw\"})\n\n        self.directives = {\"patho_typing\": {\n            \"cpus\": 4,\n            \"memory\": \"'4GB'\",\n            \"container\": \"flowcraft/patho_typing\",\n            \"version\": \"0.3.0-1\"\n        }}\n\n\nclass Sistr(Process):\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = None\n\n        self.directives = {\"sistr\": {\n            \"cpus\": 4,\n            \"memory\": \"'4GB'\",\n            \"container\": \"ummidock/sistr_cmd\",\n            \"version\": \"1.0.2\"\n        }}\n\n\nclass Momps(Process):\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = None\n\n        self.link_end.append({\"link\": \"__fastq\", \"alias\": \"_LAST_fastq\"})\n\n        self.params = {\n            \"clearInput\": {\n                \"default\": \"false\",\n                \"description\":\n                    \"Permanently removes temporary input files. This option \"\n                    \"is only useful to remove temporary files in large \"\n                    \"workflows and prevents nextflow's resume functionality. 
\"\n                    \"Use with caution.\"\n            }\n        }\n\n        self.directives = {\n            \"momps\": {\n                \"cpus\": 3,\n                \"memory\": \"'4GB'\",\n                \"container\": \"flowcraft/momps\",\n                \"version\": \"0.1.1-1\"\n            }\n        }\n\n\nclass DengueTyping(Process):\n    \"\"\"\n\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fasta\"\n        self.output_type = \"fasta\"\n\n        self.link_end.append({\"link\": \"__fastq\", \"alias\": \"_LAST_fastq\"})\n\n        self.link_start.extend([\"_ref_seqTyping\"])\n\n        self.params = {\n            \"reference\": {\n                \"default\": \"ref/DENV_TYPING_DB_V2.fasta\",\n                \"description\":\n                    \"Typing database.\"\n            },\n            \"get_genome\": {\n                \"default\": \"true\",\n                \"description\":\n                    \"Retrieves the sequence of the closest reference.\"\n            }\n        }\n\n        self.directives = {\"dengue_typing_assembly\": {\n            \"cpus\": 4,\n            \"memory\": \"'1GB'\",\n            \"container\": \"flowcraft/seq_typing\",\n            \"version\": \"2.0-1\"\n        },\n            \"dengue_typing_reads\": {\n                \"cpus\": 4,\n                \"memory\": \"{ 5.GB * task.attempt }\",\n                \"container\": \"ummidock/seq_typing\",\n                \"version\": \"2.2-02\"\n            }\n        }\n\n        self.status_channels = [\n            \"dengue_typing_assembly\",\n            \"dengue_typing_reads\"\n        ]\n\n\nclass Seroba(Process):\n    \"\"\"\n    Serotyping of Streptococcus pneumoniae sequencing data\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"fastq\"\n        self.output_type = None\n\n        self.params = {\n            \"coverage\": {\n                \"default\": \"20\",\n                \"description\":\n                    \"Threshold for k-mer coverage of the reference sequence (default = 20)\"\n            }\n        }\n\n        self.directives = {\n            \"seroba\": {\n                \"cpus\": 3,\n                \"memory\": \"'4GB'\",\n                \"container\": \"sangerpathogens/seroba\",\n                \"version\": \"latest\"\n            }\n        }"
  },
  {
    "path": "flowcraft/generator/components/variant_calling.py",
    "content": "try:\n    from generator.process import Process\nexcept ImportError:\n    from flowcraft.generator.process import Process\n\n\nclass Haplotypecaller(Process):\n    \"\"\"Call germline SNPs and indels via local re-assembly of haplotypes\n\n        This process is set with:\n\n            - ``input_type``: bam\n            - ``output_type``: vcf\n            - ``ptype``: varaint calling\n\n        \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = \"bam\"\n\n        self.params = {\n            \"reference\": {\n                \"default\": \"null\",\n                \"description\": \"Specifies the reference genome to be provided \"\n                               \"to GATK HaplotypeCaller.\"\n            },\n            \"intervals\": {\n                \"default\": \"null\",\n                \"description\": \"Interval list file to specify the regions to call variants.\"\n            }\n        }\n\n        self.directives = {\n            \"haplotypecaller\": {\n                \"container\": \"broadinstitute/gatk\",\n                \"memory\": \"{2.Gb*task.attempt}\",\n                \"cpus\": 4,\n            },\n            \"merge_vcfs\": {\n                \"container\": \"broadinstitute/gatk\",\n                \"memory\": \"{5.Gb*task.attempt}\",\n                \"cpus\": 4,\n            }\n        }\n\n        self.status_channels = [\n            \"haplotypecaller\",\n            \"merge_vcfs\"\n        ]"
  },
  {
    "path": "flowcraft/generator/engine.py",
    "content": "import os\nimport sys\nimport json\nimport jinja2\nimport shutil\nimport logging\nimport requests\n\nfrom collections import defaultdict\nfrom os.path import dirname, join, abspath, split, splitext, exists, basename\n\n\nlogger = logging.getLogger(\"main.{}\".format(__name__))\n\ntry:\n    import generator.process as pc\n    import generator.error_handling as eh\n    from __init__ import __version__\n    from generator import header_skeleton as hs\n    from generator import footer_skeleton as fs\n    from generator.process_details import colored_print\n    from generator.pipeline_parser import guess_process\nexcept ImportError:\n    import flowcraft.generator.process as pc\n    import flowcraft.generator.error_handling as eh\n    from flowcraft import __version__\n    from flowcraft.generator import header_skeleton as hs\n    from flowcraft.generator import footer_skeleton as fs\n    from flowcraft.generator.process_details import colored_print\n    from flowcraft.generator.pipeline_parser import guess_process\n\n\nclass NextflowGenerator:\n\n    def __init__(self, process_connections, nextflow_file, process_map,\n                 pipeline_name=\"flowcraft\", ignore_dependencies=False,\n                 auto_dependency=True, merge_params=True, export_params=False):\n\n        self.processes = []\n\n        self.process_map = process_map\n        \"\"\"\n        dict: Maps the nextflow template name to the corresponding Process\n        class of the component.\n        \"\"\"\n\n        # Create the processes attribute with the first special init process.\n        # This process will handle the forks of the raw input channels and\n        # secondary inputs\n        self.processes = [pc.Init(template=\"init\")]\n        \"\"\"\n        list: Stores the process interfaces in the specified order\n        \"\"\"\n\n        self._fork_tree = defaultdict(list)\n        \"\"\"\n        dict: A dictionary with the fork tree of the pipeline, which consists\n        on the the paths of each lane. For instance, a single fork with two\n        sinks is represented as: {1: [2,3]}. Subsequent forks are then added\n        sequentially: {1:[2,3], 2:[3,4,5]}. This allows the path upstream\n        of a process in a given lane to be traversed until the start of the\n        pipeline.\n        \"\"\"\n\n        self.lanes = 0\n        \"\"\"\n        int: Stores the number of lanes in the pipelines\n        \"\"\"\n\n        self.export_parameters = export_params\n        \"\"\"\n        bool: Determines whether the build mode is only for the export of \n        parameters in JSON format. 
Setting to True will disabled some checks,\n        such as component dependency requirements\n        \"\"\"\n\n        # When the export_params option is used, disable the auto dependency\n        # feature automatically\n        auto_deps = auto_dependency if not self.export_parameters else False\n\n        # Builds the connections in the processes, which parses the\n        # process_connections dictionary into the self.processes attribute\n        # list.\n        self._build_connections(process_connections, ignore_dependencies,\n                                auto_deps)\n\n        self.nf_file = nextflow_file\n        \"\"\"\n        str: Path to file where the pipeline will be generated\n        \"\"\"\n\n        self.pipeline_name = pipeline_name\n        \"\"\"\n        str: Name of the pipeline, for customization and help purposes.\n        \"\"\"\n\n        self.template = \"\"\n        \"\"\"\n        str: String that will harbour the pipeline code\n        \"\"\"\n\n        self.secondary_channels = {}\n        \"\"\"\n        dict: Stores secondary channel links\n        \"\"\"\n\n        self.main_raw_inputs = {}\n        \"\"\"\n        list: Stores the main raw inputs from the user parameters into the\n        first process(es).\n        \"\"\"\n\n        self.merge_params = merge_params\n        \"\"\"\n        bool: Determines whether the params of the pipeline should be merged\n        (i.e., the same param name in multiple components is merged into one)\n        or if they should be unique and specific to each component.\n        \"\"\"\n\n        self.extra_inputs = {}\n        \"\"\"\n        \"\"\"\n\n        self.status_channels = []\n        \"\"\"\n        list: Stores the status channels from each process\n        \"\"\"\n\n        self.skip_class = [pc.Compiler]\n        \"\"\"\n        list: Stores the Process classes that should be skipped when iterating\n        over the :attr:`~NextflowGenerator.processes` list.\n        \"\"\"\n\n        self.resources = \"\"\n        \"\"\"\n        str: Stores the resource directives string for each nextflow process.\n        See :func:`NextflowGenerator._get_resources_string`.\n        \"\"\"\n\n        self.containers = \"\"\n        \"\"\"\n        str: Stores the container directives string for each nextflow process.\n        See :func:`NextflowGenerator._get_container_string`.\n        \"\"\"\n\n        self.params = \"\"\n        \"\"\"\n        str: Stores the params directives string for the nextflow pipeline.\n        See :func:`NextflowGenerator._get_params_string`\n        \"\"\"\n\n        self.config = \"\"\n        \"\"\"\n        str: Stores de configuration for the nextflow pipeline.\n        See :func:`NextflowGenerator._get_config_string`\n        \"\"\"\n\n        self.user_config = \"\"\n        \"\"\"\n        str: Stores the user configuration file placeholder. This is an\n        empty configuration file that is only added the first time to a\n        project directory. If the file already exists, it will not overwrite\n        it.\n        \"\"\"\n\n        self.compilers = {\n            \"patlas_consensus\": {\n                \"cls\": pc.PatlasConsensus,\n                \"template\": \"patlas_consensus\"\n            }\n        }\n        \"\"\"\n        dict: Maps the information about each available compiler process in\n        flowcraft. The key of each entry is the name/signature of the\n        compiler process. 
The value is a json/dict object that contains two\n        key:value pairs:\n            - ``cls``: The reference to the compiler class object.\n            - ``template``: The nextflow template file of the process.\n        \"\"\"\n\n\n    @staticmethod\n    def _parse_process_name(name_str):\n        \"\"\"Parses the process string and returns the process name and its\n        directives\n\n        Process strings may contain directive information with the following\n        syntax::\n\n            proc_name={'directive':'val'}\n\n        This method parses this string and returns the process name as a\n        string and the directives information as a dictionary.\n\n        Parameters\n        ----------\n        name_str : str\n            Raw string with process name and, potentially, directive\n            information\n\n        Returns\n        -------\n        str\n            Process name\n        dict or None\n            Process directives\n        \"\"\"\n\n        directives = None\n\n        fields = name_str.split(\"=\")\n        process_name = fields[0]\n\n        if len(fields) == 2:\n            _directives = fields[1].replace(\"'\", '\"')\n            try:\n                directives = json.loads(_directives)\n            except json.decoder.JSONDecodeError:\n                raise eh.ProcessError(\n                    \"Could not parse directives for process '{}'. The raw\"\n                    \" string is: {}\\n\"\n                    \"Possible causes include:\\n\"\n                    \"\\t1. Spaces inside directives\\n\"\n                    \"\\t2. Missing '=' symbol before directives\\n\"\n                    \"\\t3. Missing quotes (' or \\\") around directives\\n\"\n                    \"A valid example: process_name={{'cpus':'2'}}\".format(\n                        process_name, name_str))\n\n        return process_name, directives\n\n    def _build_connections(self, process_list, ignore_dependencies,\n                           auto_dependency):\n        \"\"\"Parses the process connections dictionaries into a process list\n\n        This method is called upon instantiation of the NextflowGenerator\n        class. 
Essentially, it sets the main input/output channel names of the\n        processes so that they can be linked correctly.\n\n        If a connection between two consecutive process is not possible due\n        to a mismatch in the input/output types, it exits with an error.\n\n        Returns\n        -------\n\n        \"\"\"\n\n        logger.debug(\"=============================\")\n        logger.debug(\"Building pipeline connections\")\n        logger.debug(\"=============================\")\n\n        logger.debug(\"Processing connections: {}\".format(process_list))\n\n        for p, con in enumerate(process_list):\n\n            logger.debug(\"Processing connection '{}': {}\".format(p, con))\n\n            # Get lanes\n            in_lane = con[\"input\"][\"lane\"]\n            out_lane = con[\"output\"][\"lane\"]\n            logger.debug(\"[{}] Input lane: {}\".format(p, in_lane))\n            logger.debug(\"[{}] Output lane: {}\".format(p, out_lane))\n\n            # Update the total number of lines of the pipeline\n            if out_lane > self.lanes:\n                self.lanes = out_lane\n\n            # Get process names and directives for the output process\n            p_in_name, p_out_name, out_directives = self._get_process_names(\n                con, p)\n\n            # Check if process is available or correctly named\n            if p_out_name not in self.process_map:\n                logger.error(colored_print(\n                    \"\\nThe process '{}' is not available.\"\n                        .format(p_out_name), \"red_bold\"))\n                guess_process(p_out_name, self.process_map)\n                sys.exit(1)\n\n            # Instance output process\n            out_process = self.process_map[p_out_name](template=p_out_name)\n\n            # Update directives, if provided\n            if out_directives:\n                out_process.update_attributes(out_directives)\n\n            # Set suffix strings for main input/output channels. Suffixes are\n            # based on the lane and the arbitrary and unique process id\n            # e.g.: 'process_1_1'\n            input_suf = \"{}_{}\".format(in_lane, p)\n            output_suf = \"{}_{}\".format(out_lane, p)\n            logger.debug(\"[{}] Setting main channels with input suffix '{}'\"\n                         \" and output suffix '{}'\".format(\n                            p, input_suf, output_suf))\n            out_process.set_main_channel_names(input_suf, output_suf, out_lane)\n\n            # Instance input process, if it exists. In case of init, the\n            # output process forks from the raw input user data\n            if p_in_name != \"__init__\":\n                # Create instance of input process\n                in_process = self.process_map[p_in_name](template=p_in_name)\n                # Test if two processes can be connected by input/output types\n                logger.debug(\"[{}] Testing connection between input and \"\n                             \"output processes\".format(p))\n                self._test_connection(in_process, out_process)\n                out_process.parent_lane = in_lane\n            else:\n                # When the input process is __init__, set the parent_lane\n                # to None. 
This will tell the engine that this process\n                # will receive the main input from the raw user input.\n                out_process.parent_lane = None\n            logger.debug(\"[{}] Parent lane: {}\".format(\n                p, out_process.parent_lane))\n\n            # If the current connection is a fork, add it to the fork tree\n            if in_lane != out_lane:\n                logger.debug(\"[{}] Connection is a fork. Adding lanes to \"\n                             \"fork list\".format(p))\n                self._fork_tree[in_lane].append(out_lane)\n                # Update main output fork of parent process\n                try:\n                    parent_process = [\n                        x for x in self.processes if x.lane == in_lane and\n                        x.template == p_in_name\n                    ][0]\n                    logger.debug(\n                        \"[{}] Updating main forks of parent fork '{}' with\"\n                        \" '{}'\".format(p, parent_process,\n                                       out_process.input_channel))\n                    parent_process.update_main_forks(out_process.input_channel)\n                except IndexError:\n                    pass\n            else:\n                # Get parent process, naive version\n                parent_process = self.processes[-1]\n\n                # Check if the last process' lane matches the lane of the\n                # current output process. If not, get the last process\n                # in the same lane\n                if parent_process.lane and parent_process.lane != out_lane:\n                    parent_process = [x for x in self.processes[::-1]\n                                      if x.lane == out_lane][0]\n\n                if parent_process.output_channel:\n                    logger.debug(\n                        \"[{}] Updating input channel of output process\"\n                        \" with '{}'\".format(\n                            p, parent_process.output_channel))\n                    out_process.input_channel = parent_process.output_channel\n\n            # Check for process dependencies\n            if out_process.dependencies and not ignore_dependencies:\n                logger.debug(\"[{}] Dependencies found for process '{}': \"\n                             \"{}\".format(p, p_out_name,\n                                         out_process.dependencies))\n                parent_lanes = self._get_fork_tree(out_lane)\n                for dep in out_process.dependencies:\n                    if not self._search_tree_backwards(dep, parent_lanes):\n                        if auto_dependency:\n                            self._add_dependency(\n                                out_process, dep, in_lane, out_lane, p)\n                        elif not self.export_parameters:\n                            logger.error(colored_print(\n                                \"\\nThe following dependency of the process\"\n                                \" '{}' is missing: {}\".format(p_out_name, dep),\n                                \"red_bold\"))\n                            sys.exit(1)\n\n            self.processes.append(out_process)\n\n        logger.debug(\"Completed connections: {}\".format(self.processes))\n        logger.debug(\"Fork tree: {}\".format(self._fork_tree))\n\n    def _get_process_names(self, con, pid):\n        \"\"\"Returns the input/output process names and output process directives\n\n        Parameters\n        ----------\n        con : dict\n    
        Dictionary with the connection information between two processes.\n\n        Returns\n        -------\n        input_name : str\n            Name of the input process\n        output_name : str\n            Name of the output process\n        output_directives : dict\n            Parsed directives from the output process\n        \"\"\"\n\n        try:\n            _p_in_name = con[\"input\"][\"process\"]\n            p_in_name, _ = self._parse_process_name(_p_in_name)\n            logger.debug(\"[{}] Input channel: {}\".format(pid, p_in_name))\n            _p_out_name = con[\"output\"][\"process\"]\n            p_out_name, out_directives = self._parse_process_name(\n                _p_out_name)\n            logger.debug(\"[{}] Output channel: {}\".format(pid, p_out_name))\n        # Exception is triggered when the process name/directives cannot\n        # be parsed.\n        except eh.ProcessError as ex:\n            logger.error(colored_print(ex.value, \"red_bold\"))\n            sys.exit(1)\n\n        return p_in_name, p_out_name, out_directives\n\n    def _add_dependency(self, p, template, inlane, outlane, pid):\n        \"\"\"Automatically adds a dependency of a process.\n\n        This method adds a template to the process list attribute as a\n        dependency. It will adapt the input lane, output lane and process\n        id of the process that depends on it.\n\n        Parameters\n        ----------\n        p : Process\n            Process class that contains the dependency.\n        template : str\n            Template name of the dependency.\n        inlane : int\n            Input lane.\n        outlane : int\n            Output lane.\n        pid : int\n            Process ID.\n        \"\"\"\n\n        dependency_proc = self.process_map[template](template=template)\n\n        if dependency_proc.input_type != p.input_type:\n            logger.error(\"Cannot automatically add dependency with different\"\n                         \" input type. 
Input type of process '{}' is '{}.\"\n                         \" Input type of dependency '{}' is '{}'\".format(\n                            p.template, p.input_type, template,\n                            dependency_proc.input_type))\n\n        input_suf = \"{}_{}_dep\".format(inlane, pid)\n        output_suf = \"{}_{}_dep\".format(outlane, pid)\n        dependency_proc.set_main_channel_names(input_suf, output_suf, outlane)\n\n        # To insert the dependency process before the current process, we'll\n        # need to move the input channel name of the later to the former, and\n        # set a new connection between the dependency and the process.\n        dependency_proc.input_channel = p.input_channel\n        p.input_channel = dependency_proc.output_channel\n\n        # If the current process was the first in the pipeline, change the\n        # lanes so that the dependency becomes the first process\n        if not p.parent_lane:\n            p.parent_lane = outlane\n            dependency_proc.parent_lane = None\n        else:\n            dependency_proc.parent_lane = inlane\n            p.parent_lane = outlane\n\n        self.processes.append(dependency_proc)\n\n    def _search_tree_backwards(self, template, parent_lanes):\n        \"\"\"Searches the process tree backwards in search of a provided process\n\n        The search takes into consideration the provided parent lanes and\n        searches only those\n\n        Parameters\n        ----------\n        template : str\n            Name of the process template attribute being searched\n        parent_lanes : list\n            List of integers with the parent lanes to be searched\n\n        Returns\n        -------\n        bool\n            Returns True when the template is found. Otherwise returns False.\n        \"\"\"\n\n        for p in self.processes[::-1]:\n\n            # Ignore process in different lanes\n            if p.lane not in parent_lanes:\n                continue\n\n            # template found\n            if p.template == template:\n                return True\n\n        return False\n\n    @staticmethod\n    def _test_connection(parent_process, child_process):\n        \"\"\"Tests if two processes can be connected by input/output type\n\n        Parameters\n        ----------\n        parent_process : flowcraft.Process.Process\n            Process that will be sending output.\n        child_process : flowcraft.Process.Process\n            Process that will receive output.\n\n        \"\"\"\n\n        # If any of the processes has an ignore type attribute set to True,\n        # don't perform the check\n        if parent_process.ignore_type or child_process.ignore_type:\n            return\n\n        if parent_process.output_type != child_process.input_type:\n            logger.error(\n                \"The output of the '{}' process ({}) cannot link with the \"\n                \"input of the '{}' process ({}). 
Please check the order of \"\n                \"the processes\".format(parent_process.template,\n                                       parent_process.output_type,\n                                       child_process.template,\n                                       child_process.input_type))\n            sys.exit(1)\n\n    def _build_header(self):\n        \"\"\"Adds the header template to the master template string\n        \"\"\"\n\n        logger.debug(\"===============\")\n        logger.debug(\"Building header\")\n        logger.debug(\"===============\")\n        self.template += hs.header\n\n    def _build_footer(self):\n        \"\"\"Adds the footer template to the master template string\"\"\"\n\n        logger.debug(\"===============\")\n        logger.debug(\"Building header\")\n        logger.debug(\"===============\")\n        self.template += fs.footer\n\n    def _update_raw_input(self, p, sink_channel=None, input_type=None):\n        \"\"\"Given a process, this method updates the\n        :attr:`~Process.main_raw_inputs` attribute with the corresponding\n        raw input channel of that process. The input channel and input type\n        can be overridden if the `input_channel` and `input_type` arguments\n        are provided.\n\n        Parameters\n        ----------\n        p : flowcraft.Process.Process\n            Process instance whose raw input will be modified\n        sink_channel: str\n            Sets the channel where the raw input will fork into. It overrides\n            the process's `input_channel` attribute.\n        input_type: str\n            Sets the type of the raw input. It overrides the process's\n            `input_type` attribute.\n        \"\"\"\n\n        process_input = input_type if input_type else p.input_type\n        process_channel = sink_channel if sink_channel else p.input_channel\n\n        logger.debug(\"[{}] Setting raw input channel \"\n                     \"with input type '{}'\".format(p.template, process_input))\n        # Get the dictionary with the raw forking information for the\n        # provided input\n        raw_in = p.get_user_channel(process_channel, process_input)\n        logger.debug(\"[{}] Fetched process raw user: {}\".format(p.template,\n                                                                raw_in))\n\n        if process_input in self.main_raw_inputs:\n            self.main_raw_inputs[process_input][\"raw_forks\"].append(\n                raw_in[\"input_channel\"])\n        else:\n            self.main_raw_inputs[process_input] = {\n                \"channel\": raw_in[\"channel\"],\n                \"channel_str\": \"{}\\n{} = {}\".format(\n                    raw_in[\"checks\"].format(raw_in[\"params\"]),\n                    raw_in[\"channel\"],\n                    raw_in[\"channel_str\"].format(raw_in[\"params\"])),\n                \"raw_forks\": [raw_in[\"input_channel\"]]\n            }\n        logger.debug(\"[{}] Updated main raw inputs: {}\".format(\n            p.template, self.main_raw_inputs))\n\n    def _update_extra_inputs(self, p):\n        \"\"\"Given a process, this method updates the\n        :attr:`~Process.extra_inputs` attribute with the corresponding extra\n        inputs of that process\n\n        Parameters\n        ----------\n        p : flowcraft.Process.Process\n        \"\"\"\n\n        if p.extra_input:\n            logger.debug(\"[{}] Found extra input: {}\".format(\n                p.template, p.extra_input))\n\n            if p.extra_input == \"default\":\n                
# Check if the default type is now present in the main raw\n                # inputs. If so, issue an error. The default param can only\n                # be used when not present in the main raw inputs\n                if p.input_type in self.main_raw_inputs:\n                    logger.error(colored_print(\n                        \"\\nThe default input param '{}' of the process '{}'\"\n                        \" is already specified as a main input parameter of\"\n                        \" the pipeline. Please choose a different extra_input\"\n                        \" name.\".format(p.input_type, p.template), \"red_bold\"))\n                    sys.exit(1)\n                param = p.input_type\n            else:\n                param = p.extra_input\n\n            dest_channel = \"EXTRA_{}_{}\".format(p.template, p.pid)\n\n            if param not in self.extra_inputs:\n                self.extra_inputs[param] = {\n                    \"input_type\": p.input_type,\n                    \"channels\": [dest_channel]\n                }\n            else:\n                if self.extra_inputs[param][\"input_type\"] != p.input_type:\n                    logger.error(colored_print(\n                        \"\\nThe extra_input parameter '{}' for process\"\n                        \" '{}' was already defined with a different \"\n                        \"input type '{}'. Please choose a different \"\n                        \"extra_input name.\".format(\n                            p.input_type, p.template,\n                            self.extra_inputs[param][\"input_type\"]),\n                        \"red_bold\"))\n                    sys.exit(1)\n                self.extra_inputs[param][\"channels\"].append(dest_channel)\n\n            logger.debug(\"[{}] Added extra channel '{}' linked to param: '{}' \"\n                         \"\".format(p.template, param,\n                                   self.extra_inputs[param]))\n            p.update_main_input(\n                \"{}.mix({})\".format(p.input_channel, dest_channel)\n            )\n\n    def _get_fork_tree(self, lane):\n        \"\"\"\n\n        Parameters\n        ----------\n        p\n\n        Returns\n        -------\n        \"\"\"\n\n        parent_lanes = [lane]\n\n        while True:\n            original_lane = lane\n            for fork_in, fork_out in self._fork_tree.items():\n                if lane in fork_out:\n                    lane = fork_in\n                    parent_lanes.append(fork_in)\n            if lane == original_lane:\n                break\n\n        return parent_lanes\n\n    def _set_implicit_link(self, p, link):\n        \"\"\"\n\n        Parameters\n        ----------\n        p\n        link\n\n        Returns\n        -------\n\n        \"\"\"\n\n        output_type = link[\"link\"].lstrip(\"_\")\n        parent_forks = self._get_fork_tree(p.lane)\n        fork_sink = \"{}_{}\".format(link[\"alias\"], p.pid)\n\n        for proc in self.processes[::-1]:\n            if proc.lane not in parent_forks:\n                continue\n            if proc.output_type == output_type:\n                proc.update_main_forks(fork_sink)\n                logger.debug(\"[{}] Found special implicit link '{}' with \"\n                             \"output type '{}'. 
Linked '{}' with process \"\n                             \"{}\".format(\n                                     p.template, link[\"link\"], output_type,\n                                     link[\"alias\"], proc))\n                return\n\n        self._update_raw_input(p, fork_sink, output_type)\n\n    def _update_secondary_channels(self, p):\n        \"\"\"Given a process, this method updates the\n        :attr:`~Process.secondary_channels` attribute with the corresponding\n        secondary inputs of that channel.\n\n        The rationale of the secondary channels is the following:\n\n            - Start storing any secondary emitting channels, by checking the\n              `link_start` list attribute of each process. If there are\n              channel names in the link start, it adds to the secondary\n              channels dictionary.\n            - Check for secondary receiving channels, by checking the\n              `link_end` list attribute. If the link name starts with a\n              `__` signature, it will created an implicit link with the last\n              process with an output type after the signature. Otherwise,\n              it will check is a corresponding link start already exists in\n              the at least one process upstream of the pipeline and if so,\n              it will update the ``secondary_channels`` attribute with the\n              new link.\n\n        Parameters\n        ----------\n        p : flowcraft.Process.Process\n        \"\"\"\n\n        # Check if the current process has a start of a secondary\n        # side channel\n        if p.link_start:\n            logger.debug(\"[{}] Found secondary link start: {}\".format(\n                p.template, p.link_start))\n            for l in p.link_start:\n                # If there are multiple link starts in the same lane, the\n                # last one is the only one saved.\n                if l in self.secondary_channels:\n                    self.secondary_channels[l][p.lane] = {\"p\": p, \"end\": []}\n                else:\n                    self.secondary_channels[l] = {p.lane: {\"p\": p, \"end\": []}}\n\n        # check if the current process receives a secondary side channel.\n        # If so, add to the links list of that side channel\n        if p.link_end:\n            logger.debug(\"[{}] Found secondary link end: {}\".format(\n                p.template, p.link_end))\n            for l in p.link_end:\n\n                # Get list of lanes from the parent forks.\n                parent_forks = self._get_fork_tree(p.lane)\n\n                # Parse special case where the secondary channel links with\n                # the main output of the specified type\n                if l[\"link\"].startswith(\"__\"):\n                    self._set_implicit_link(p, l)\n                    continue\n\n                # Skip if there is no match for the current link in the\n                # secondary channels\n                if l[\"link\"] not in self.secondary_channels:\n                    continue\n\n                for lane in parent_forks:\n                    if lane in self.secondary_channels[l[\"link\"]]:\n                        self.secondary_channels[\n                            l[\"link\"]][lane][\"end\"].append(\"{}\".format(\n                                \"{}_{}\".format(l[\"alias\"], p.pid)))\n\n        logger.debug(\"[{}] Secondary links updated: {}\".format(\n            p.template, self.secondary_channels))\n\n    def _set_channels(self):\n        \"\"\"Sets the main 
channels for the pipeline\n\n        This method will parse the :attr:`~Process.processes` attribute\n        and perform the following tasks for each process:\n\n            - Sets the input/output channels and main input forks and adds\n              them to the process's\n              :attr:`flowcraft.process.Process._context`\n              attribute (See\n              :func:`~NextflowGenerator.set_channels`).\n            - Automatically updates the main input channel of the first\n              process of each lane so that they fork from the user-provided\n              parameters (See\n              :func:`~NextflowGenerator._update_raw_input`).\n            - Checks for the presence of secondary channels and adds them to the\n              :attr:`~NextflowGenerator.secondary_channels` attribute.\n\n        Notes\n        -----\n        **On the secondary channel setup**: With this approach, there can only\n        be one secondary link start for each type of secondary link. For\n        instance, if there are two processes that start a secondary channel\n        for the ``SIDE_max_len`` channel, only the last one will be recorded,\n        and all receiving processes will get the channel from the latest\n        process. Secondary channels can only link if the source process is\n        upstream of the sink process in its \"forking\" path.\n        \"\"\"\n\n        logger.debug(\"=====================\")\n        logger.debug(\"Setting main channels\")\n        logger.debug(\"=====================\")\n\n        for i, p in enumerate(self.processes):\n\n            # Set main channels for the process\n            logger.debug(\"[{}] Setting main channels with pid: {}\".format(\n                p.template, i))\n            p.set_channels(pid=i)\n\n            # If there is no parent lane, set the raw input channel from user\n            logger.debug(\"{} {} {}\".format(p.parent_lane, p.input_type, p.template))\n            if not p.parent_lane and p.input_type:\n                self._update_raw_input(p)\n\n            self._update_extra_inputs(p)\n\n            self._update_secondary_channels(p)\n\n            logger.info(colored_print(\n                \"\\tChannels set for {} \\u2713\".format(p.template)))\n\n    def _set_init_process(self):\n        \"\"\"Sets the main raw inputs and secondary inputs on the init process\n\n        This method will fetch the :class:`flowcraft.process.Init` process\n        instance and sets the raw input (\n        :func:`flowcraft.process.Init.set_raw_inputs`) for\n        that process. 
This will handle the connection of the user parameters\n        with channels that are then consumed in the pipeline.\n        \"\"\"\n\n        logger.debug(\"========================\")\n        logger.debug(\"Setting secondary inputs\")\n        logger.debug(\"========================\")\n\n        # Get init process\n        init_process = self.processes[0]\n        logger.debug(\"Setting main raw inputs: \"\n                     \"{}\".format(self.main_raw_inputs))\n        init_process.set_raw_inputs(self.main_raw_inputs)\n        logger.debug(\"Setting extra inputs: {}\".format(self.extra_inputs))\n        init_process.set_extra_inputs(self.extra_inputs)\n\n    def _set_secondary_channels(self):\n        \"\"\"Sets the secondary channels for the pipeline\n\n        This will iterate over the\n        :py:attr:`NextflowGenerator.secondary_channels` dictionary that is\n        populated when executing\n        :func:`~NextflowGenerator._update_secondary_channels` method.\n        \"\"\"\n\n        logger.debug(\"==========================\")\n        logger.debug(\"Setting secondary channels\")\n        logger.debug(\"==========================\")\n\n        logger.debug(\"Setting secondary channels: {}\".format(\n            self.secondary_channels))\n\n        for source, lanes in self.secondary_channels.items():\n\n            for vals in lanes.values():\n\n                if not vals[\"end\"]:\n                    logger.debug(\"[{}] No secondary links to setup\".format(\n                        vals[\"p\"].template))\n                    continue\n\n                logger.debug(\"[{}] Setting secondary links for \"\n                             \"source {}: {}\".format(vals[\"p\"].template,\n                                                    source,\n                                                    vals[\"end\"]))\n\n                vals[\"p\"].set_secondary_channel(source, vals[\"end\"])\n\n    def _set_compiler_channels(self):\n        \"\"\"Wrapper method that calls functions related to compiler channels\n        \"\"\"\n\n        self._set_status_channels()\n        self._set_general_compilers()\n\n    def _set_general_compilers(self):\n        \"\"\"Adds compiler channels to the :attr:`processes` attribute.\n\n        This method will iterate over the pipeline's processes and check\n        if any process is feeding channels to a compiler process. 
If so, that\n        compiler process is added to the pipeline and those channels are\n        linked to the compiler via some operator.\n        \"\"\"\n\n        for c, c_info in self.compilers.items():\n\n            # Instantiate compiler class object and set empty channel list\n            compiler_cls = c_info[\"cls\"](template=c_info[\"template\"])\n            c_info[\"channels\"] = []\n\n            for p in self.processes:\n                if not any([isinstance(p, x) for x in self.skip_class]):\n                    # Check if process has channels to feed to a compiler\n                    if c in p.compiler:\n                        # Correct channel names according to the pid of the\n                        # process\n                        channels = [\"{}_{}\".format(i, p.pid) for i in\n                                    p.compiler[c]]\n                        c_info[\"channels\"].extend(channels)\n\n            # If one ore more channels were detected, establish connections\n            # and append compiler to the process list.\n            if c_info[\"channels\"]:\n                compiler_cls.set_compiler_channels(c_info[\"channels\"],\n                                                   operator=\"join\")\n                self.processes.append(compiler_cls)\n\n    def _set_status_channels(self):\n        \"\"\"Compiles all status channels for the status compiler process\n        \"\"\"\n\n        status_inst = pc.StatusCompiler(template=\"status_compiler\")\n        report_inst = pc.ReportCompiler(template=\"report_compiler\")\n\n        # Compile status channels from pipeline process\n        status_channels = []\n        for p in [p for p in self.processes]:\n            if not any([isinstance(p, x) for x in self.skip_class]):\n\n                status_channels.extend(p.status_strs)\n\n        if not status_channels:\n            logger.debug(\"No status channels found. Skipping status compiler\"\n                         \"process\")\n            return\n\n        logger.debug(\"Setting status channels: {}\".format(status_channels))\n\n        # Check for duplicate channels. Raise exception if found.\n        if len(status_channels) != len(set(status_channels)):\n            raise eh.ProcessError(\n                \"Duplicate status channels detected. Please ensure that \"\n                \"the 'status_channels' attributes of each process are \"\n                \"unique. 
Here are the status channels:\\n\\n{}\".format(\n                    \", \".join(status_channels)\n                ))\n\n        status_inst.set_compiler_channels(status_channels)\n\n        report_channels = [\"REPORT_{}\".format(x.lstrip(\"STATUS_\")) for x in\n                           status_channels]\n\n        report_inst.set_compiler_channels(report_channels)\n\n        self.processes.extend([status_inst, report_inst])\n\n    @staticmethod\n    def _get_resources_string(res_dict, pid):\n        \"\"\" Returns the nextflow resources string from a dictionary object\n\n        If the dictionary has at least on of the resource directives, these\n        will be compiled for each process in the dictionary and returned\n        as a string read for injection in the nextflow config file template.\n\n        This dictionary should be::\n\n            dict = {\"processA\": {\"cpus\": 1, \"memory\": \"4GB\"},\n                    \"processB\": {\"cpus\": 2}}\n\n        Parameters\n        ----------\n        res_dict : dict\n            Dictionary with the resources for processes.\n        pid : int\n            Unique identified of the process\n\n        Returns\n        -------\n        str\n            nextflow config string\n        \"\"\"\n\n        config_str = \"\"\n        ignore_directives = [\"container\", \"version\"]\n\n        for p, directives in res_dict.items():\n\n            for d, val in directives.items():\n\n                if d in ignore_directives:\n                    continue\n\n                config_str += '\\n\\t${}_{}.{} = {}'.format(p, pid, d, val)\n\n        return config_str\n\n    @staticmethod\n    def _get_container_string(cont_dict, pid):\n        \"\"\" Returns the nextflow containers string from a dictionary object\n\n        If the dictionary has at least on of the container directives, these\n        will be compiled for each process in the dictionary and returned\n        as a string read for injection in the nextflow config file template.\n\n        This dictionary should be::\n\n            dict = {\"processA\": {\"container\": \"asd\", \"version\": \"1.0.0\"},\n                    \"processB\": {\"container\": \"dsd\"}}\n\n        Parameters\n        ----------\n        cont_dict : dict\n            Dictionary with the containers for processes.\n        pid : int\n            Unique identified of the process\n\n        Returns\n        -------\n        str\n            nextflow config string\n        \"\"\"\n\n        config_str = \"\"\n\n        for p, directives in cont_dict.items():\n\n            container = \"\"\n\n            if \"container\" in directives:\n                container += directives[\"container\"]\n\n                if \"version\" in directives:\n                    container += \":{}\".format(directives[\"version\"])\n                else:\n                    container += \":latest\"\n\n            if container:\n                config_str += '\\n\\t${}_{}.container = \"{}\"'.format(p, pid, container)\n\n        return config_str\n\n    def _get_params_string(self):\n        \"\"\"Returns the nextflow params string from a dictionary object.\n\n        The params dict should be a set of key:value pairs with the\n        parameter name, and the default parameter value::\n\n            self.params = {\n                \"genomeSize\": 2.1,\n                \"minCoverage\": 15\n            }\n\n        The values are then added to the string as they are. 
For instance,\n        a ``2.1`` float will appear as ``param = 2.1`` and a\n        ``\"'teste'\" string will appear as ``param = 'teste'`` (Note the\n        string).\n\n        Returns\n        -------\n        str\n            Nextflow params configuration string\n        \"\"\"\n\n        params_str = \"\"\n\n        for p in self.processes:\n\n            logger.debug(\"[{}] Adding parameters: {}\\n\".format(\n                p.template, p.params)\n            )\n\n            # Add an header with the template name to structure the params\n            # configuration\n            if p.params and p.template != \"init\":\n\n                p.set_param_id(\"_{}\".format(p.pid))\n                params_str += \"\\n\\t/*\"\n                params_str += \"\\n\\tComponent '{}_{}'\\n\".format(p.template,\n                                                               p.pid)\n                params_str += \"\\t{}\\n\".format(\"-\" * (len(p.template) + len(p.pid) + 12))\n                params_str += \"\\t*/\\n\"\n\n            for param, val in p.params.items():\n\n                if p.template == \"init\":\n                    param_id = param\n                else:\n                    param_id = \"{}_{}\".format(param, p.pid)\n\n                params_str += \"\\t{} = {}\\n\".format(param_id, val[\"default\"])\n\n        return params_str\n\n    def _get_merged_params_string(self):\n        \"\"\"Returns the merged nextflow params string from a dictionary object.\n\n        The params dict should be a set of key:value pairs with the\n        parameter name, and the default parameter value::\n\n            self.params = {\n                \"genomeSize\": 2.1,\n                \"minCoverage\": 15\n            }\n\n        The values are then added to the string as they are. 
For instance,\n        a ``2.1`` float will appear as ``param = 2.1`` and a\n        ``\"'teste'\" string will appear as ``param = 'teste'`` (Note the\n        string).\n\n        Identical parameters in multiple processes will be merged into the same\n        param.\n\n        Returns\n        -------\n        str\n            Nextflow params configuration string\n        \"\"\"\n\n        params_temp = {}\n\n        for p in self.processes:\n\n            logger.debug(\"[{}] Adding parameters: {}\".format(p.template,\n                                                             p.params))\n            for param, val in p.params.items():\n\n                params_temp[param] = val[\"default\"]\n\n        config_str = \"\\n\\t\" + \"\\n\\t\".join([\n            \"{} = {}\".format(param, val) for param, val in params_temp.items()\n        ])\n\n        return config_str\n\n    def _get_params_help(self):\n\n        help_list = []\n\n        for p in self.processes:\n\n            # Skip init process\n            if p.template == \"init\":\n                for param, val in p.params.items():\n                    help_list.append(\"--{:25} {} (default: {})\".format(\n                        param, val[\"description\"],\n                        str(val[\"default\"]).replace('\"', \"'\")))\n                continue\n\n            # Add component header and a line break\n            if p.params:\n                help_list.extend(\n                    [\"\",\n                     \"Component '{}_{}'\".format(p.template.upper(), p.pid),\n                     \"-\" * (len(p.template) + len(p.pid) + 13)])\n\n            for param, val in p.params.items():\n                help_list.append(\"--{:<25} {} (default: {})\".format(\n                    param + \"_\" + p.pid, val[\"description\"],\n                    str(val[\"default\"]).replace('\"', \"'\")))\n\n        return help_list\n\n    def _get_merged_params_help(self):\n        \"\"\"\n\n        Returns\n        -------\n\n        \"\"\"\n\n        help_dict = {}\n        help_list = []\n\n        for p in self.processes:\n\n            for param, val in p.params.items():\n\n                if param in help_dict:\n                    help_dict[param][\"process\"].append(p.template)\n                else:\n                    tpl = [p.template] if p.template != \"init\" else []\n                    help_dict[param] = {\"process\": tpl,\n                                        \"description\": val[\"description\"]}\n\n        # Transform process list into final template string\n        for p, val in help_dict.items():\n            if not val[\"process\"]:\n                val[\"process\"] = \"\"\n            else:\n                val[\"process\"] = \"({})\".format(\";\".join(val[\"process\"]))\n            help_list.append(\"--{:<25} {} {}\".format(\n                p, val[\"description\"], val[\"process\"]))\n\n        return help_list\n\n    @staticmethod\n    def _render_config(template, context):\n\n        tpl_dir = join(dirname(abspath(__file__)), \"templates\")\n        tpl_path = join(tpl_dir, template)\n\n        path, filename = split(tpl_path)\n\n        return jinja2.Environment(\n            loader=jinja2.FileSystemLoader(path or \"./\")\n        ).get_template(filename).render(context)\n\n    def _set_configurations(self):\n        \"\"\"This method will iterate over all process in the pipeline and\n        populate the nextflow configuration files with the directives\n        of each process in the pipeline.\n        \"\"\"\n\n    
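    # For example (illustrative values only), a component whose directives include\n        # {\"cpus\": 2, \"container\": \"my/image\", \"version\": \"1.0.0\"} contributes a line like\n        # '$processA_1.cpus = 2' to the resources string and\n        # '$processA_1.container = \"my/image:1.0.0\"' to the containers string.\n\n    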
    logger.debug(\"======================\")\n        logger.debug(\"Setting configurations\")\n        logger.debug(\"======================\")\n\n        resources = \"\"\n        containers = \"\"\n        params = \"\"\n        config = \"\"\n\n        if self.merge_params:\n            params += self._get_merged_params_string()\n            help_list = self._get_merged_params_help()\n        else:\n            params += self._get_params_string()\n            help_list = self._get_params_help()\n\n        for p in self.processes:\n\n            # Skip processes with the directives attribute populated\n            if not p.directives:\n                continue\n\n            logger.debug(\"[{}] Adding directives: {}\".format(\n                p.template, p.directives))\n            resources += self._get_resources_string(p.directives, p.pid)\n            containers += self._get_container_string(p.directives, p.pid)\n\n        self.resources = self._render_config(\"resources.config\", {\n            \"process_info\": resources\n        })\n        self.containers = self._render_config(\"containers.config\", {\n            \"container_info\": containers\n        })\n        self.params = self._render_config(\"params.config\", {\n            \"params_info\": params\n        })\n        self.config = self._render_config(\"nextflow.config\", {\n            \"pipeline_name\": self.pipeline_name,\n            \"nf_file\": self.nf_file\n        })\n        self.help = self._render_config(\"Helper.groovy\", {\n            \"nf_file\": basename(self.nf_file),\n            \"help_list\": help_list,\n            \"version\": __version__,\n            \"pipeline_name\": \" \".join([x.upper() for x in self.pipeline_name])\n        })\n        self.user_config = self._render_config(\"user.config\", {})\n\n    def dag_to_file(self, dict_viz, output_file=\".treeDag.json\"):\n        \"\"\"Writes dag to output file\n\n        Parameters\n        ----------\n        dict_viz: dict\n            Tree like dictionary that is used to export tree data of processes\n            to html file and here for the dotfile .treeDag.json\n\n        \"\"\"\n\n        outfile_dag = open(os.path.join(dirname(self.nf_file), output_file)\n                           , \"w\")\n        outfile_dag.write(json.dumps(dict_viz))\n        outfile_dag.close()\n\n    def render_pipeline(self):\n        \"\"\"Write pipeline attributes to json\n\n        This function writes the pipeline and their attributes to a json file,\n        that is intended to be read by resources/pipeline_graph.html to render\n        a graphical output showing the DAG.\n\n        \"\"\"\n\n        dict_viz = {\n            \"name\": \"root\",\n            \"children\": []\n        }\n        last_of_us = {}\n\n        f_tree = self._fork_tree if self._fork_tree else {1: [1]}\n\n        for x, (k, v) in enumerate(f_tree.items()):\n            for p in self.processes[1:]:\n\n                if x == 0 and p.lane not in [k] + v:\n                    continue\n\n                if x > 0 and p.lane not in v:\n                    continue\n\n                if not p.parent_lane:\n                    lst = dict_viz[\"children\"]\n                else:\n                    lst = last_of_us[p.parent_lane]\n\n                tooltip = {\n                    \"name\": \"{}_{}\".format(p.template, p.pid),\n                    \"process\": {\n                        \"pid\": p.pid,\n                        \"input\": p.input_type,\n                        \"output\": 
p.output_type if p.output_type else \"None\",\n                        \"lane\": p.lane,\n                    },\n                    \"children\": []\n                }\n\n                dir_var = \"\"\n                for k2, v2 in p.directives.items():\n                    dir_var += k2\n                    for d in v2:\n                        try:\n                            # Remove quotes from string directives\n                            directive = v2[d].replace(\"'\", \"\").replace('\"', '') \\\n                                if isinstance(v2[d], str) else v2[d]\n                            dir_var += \"{}: {}\".format(d, directive)\n                        except KeyError:\n                            pass\n\n                if dir_var:\n                    tooltip[\"process\"][\"directives\"] = dir_var\n                else:\n                    tooltip[\"process\"][\"directives\"] = \"N/A\"\n\n                lst.append(tooltip)\n\n                last_of_us[p.lane] = lst[-1][\"children\"]\n\n        # write to file dict_viz\n        self.dag_to_file(dict_viz)\n\n        # Write tree forking information for dotfile\n        with open(os.path.join(dirname(self.nf_file),\n                               \".forkTree.json\"), \"w\") as fh:\n            fh.write(json.dumps(self._fork_tree))\n\n        # send with jinja to html resource\n        return self._render_config(\"pipeline_graph.html\", {\"data\": dict_viz})\n\n    def write_configs(self, project_root):\n        \"\"\"Wrapper method that writes all configuration files to the pipeline\n        directory\n        \"\"\"\n\n        # Write resources config\n        with open(join(project_root, \"resources.config\"), \"w\") as fh:\n            fh.write(self.resources)\n\n        # Write containers config\n        with open(join(project_root, \"containers.config\"), \"w\") as fh:\n            fh.write(self.containers)\n\n        # Write containers config\n        with open(join(project_root, \"params.config\"), \"w\") as fh:\n            fh.write(self.params)\n\n        # Write nextflow config\n        with open(join(project_root, \"nextflow.config\"), \"w\") as fh:\n            fh.write(self.config)\n\n        # Write user config if not present in the project directory\n        if not exists(join(project_root, \"user.config\")):\n            with open(join(project_root, \"user.config\"), \"w\") as fh:\n                fh.write(self.user_config)\n\n        lib_dir = join(project_root, \"lib\")\n        if not exists(lib_dir):\n            os.makedirs(lib_dir)\n        with open(join(lib_dir, \"Helper.groovy\"), \"w\") as fh:\n            fh.write(self.help)\n\n        # Generate the pipeline DAG\n        pipeline_to_json = self.render_pipeline()\n        with open(splitext(self.nf_file)[0] + \".html\", \"w\") as fh:\n            fh.write(pipeline_to_json)\n\n    def export_params(self):\n        \"\"\"Export pipeline params as a JSON to stdout\n\n        This run mode iterates over the pipeline processes and exports the\n        params dictionary of each component as a JSON to stdout.\n        \"\"\"\n\n        params_json = {}\n\n        # Skip first init process\n        for p in self.processes[1:]:\n            params_json[p.template] = p.params\n\n        # Flush params json to stdout\n        sys.stdout.write(json.dumps(params_json))\n\n    def export_directives(self):\n        \"\"\"Export pipeline directives as a JSON to stdout\n        \"\"\"\n\n        directives_json = {}\n\n        # Skip first init process\n      
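  # The exported JSON maps each component template to its directives dict,\n        # roughly of the shape {\"<template>\": {<nextflow directives>}, ...} (illustrative).\n      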
  for p in self.processes[1:]:\n            directives_json[p.template] = p.directives\n\n        # Flush params json to stdout\n        sys.stdout.write(json.dumps(directives_json))\n\n    def fetch_docker_tags(self):\n        \"\"\"\n        Export all dockerhub tags associated with each component given by\n        the -t flag.\n        \"\"\"\n\n        # dict to store the already parsed components (useful when forks are\n        # given to the pipeline string via -t flag\n        dict_of_parsed = {}\n\n        # fetches terminal width and subtracts 3 because we always add a\n        # new line character and we want a space at the beggining and at the end\n        # of each line\n        terminal_width = shutil.get_terminal_size().columns - 3\n\n        # first header\n        center_string = \" Selected container tags \"\n\n        # starts a list with the headers\n        tags_list = [\n            [\n                \"=\" * int(terminal_width / 4),\n                \"{0}{1}{0}\".format(\n                    \"=\" * int(((terminal_width/2 - len(center_string)) / 2)),\n                    center_string)\n                ,\n                \"{}\\n\".format(\"=\" * int(terminal_width / 4))\n            ],\n            [\"component\", \"container\", \"tags\"],\n            [\n                \"=\" * int(terminal_width / 4),\n                \"=\" * int(terminal_width / 2),\n                \"=\" * int(terminal_width / 4)\n            ]\n        ]\n\n        # Skip first init process and iterate through the others\n        for p in self.processes[1:]:\n            template = p.template\n            # if component has already been printed then skip and don't print\n            # again\n            if template in dict_of_parsed:\n                continue\n\n            # starts a list of  containers for the current process in\n            # dict_of_parsed, in which each containers will be added to this\n            # list once it gets parsed\n            dict_of_parsed[template] = {\n                \"container\": []\n            }\n\n            # fetch repo name from directives of each component.\n            for directives in p.directives.values():\n                try:\n                    repo = directives[\"container\"]\n                    default_version = directives[\"version\"]\n                except KeyError:\n                    # adds the default container if container key isn't present\n                    # this happens for instance in integrity_coverage\n                    repo = \"flowcraft/flowcraft_base\"\n                    default_version = \"1.0.0-1\"\n                # checks if repo_version already exists in list of the\n                # containers for the current component being queried\n                repo_version = repo + default_version\n                if repo_version not in dict_of_parsed[template][\"container\"]:\n                    # make the request to docker hub\n                    r = requests.get(\n                        \"https://hub.docker.com/v2/repositories/{}/tags/\"\n                        .format(repo)\n                    )\n                    # checks the status code of the request, if it is 200 then\n                    # parses docker hub entry, otherwise retrieve no tags but\n                    # alerts the user\n                    if r.status_code != 404:\n                        # parse response content to dict and fetch results key\n                        r_content = json.loads(r.content)[\"results\"]\n                        
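# Each entry of the DockerHub 'results' list is a tag object; only its\n                        # \"name\" field is used below (e.g. roughly {\"name\": \"1.0.0-1\", ...}).\n                        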
for version in r_content:\n                            printed_version = (version[\"name\"] + \"*\") \\\n                                if version[\"name\"] == default_version \\\n                                else version[\"name\"]\n                            tags_list.append([template, repo, printed_version])\n                    else:\n                        tags_list.append([template, repo, \"No DockerHub tags\"])\n\n                dict_of_parsed[template][\"container\"].append(repo_version)\n\n        # iterate through each entry in tags_list and print the list of tags\n        # for each component. Each entry (excluding the headers) contains\n        # 3 elements (component name, container and tag version)\n        for x, entry in enumerate(tags_list):\n            # adds different color to the header in the first list and\n            # if row is pair add one color and if is even add another (different\n            # background)\n            color = \"blue_bold\" if x < 3 else \\\n                (\"white\" if x % 2 != 0 else \"0;37;40m\")\n            # generates a small list with the terminal width for each column,\n            # this will be given to string formatting as the 3, 4 and 5 element\n            final_width = [\n                int(terminal_width/4),\n                int(terminal_width/2),\n                int(terminal_width/4)\n            ]\n            # writes the string to the stdout\n            sys.stdout.write(\n                colored_print(\"\\n {0: <{3}} {1: ^{4}} {2: >{5}}\".format(\n                    *entry, *final_width), color)\n            )\n        # assures that the entire line gets the same color\n        sys.stdout.write(\"\\n{0: >{1}}\\n\".format(\"(* = default)\",\n                                                terminal_width + 3))\n\n    def build(self):\n        \"\"\"Main pipeline builder\n\n        This method is responsible for building the\n        :py:attr:`NextflowGenerator.template` attribute that will contain\n        the nextflow code of the pipeline.\n\n        First it builds the header, then sets the main channels, the\n        secondary inputs, secondary channels and finally the\n        status channels. 
When the pipeline is built, it writes the code\n        to a nextflow file.\n        \"\"\"\n\n        logger.info(colored_print(\n            \"\\tSuccessfully connected {} process(es) with {} \"\n            \"fork(s) across {} lane(s) \\u2713\".format(\n                len(self.processes[1:]), len(self._fork_tree), self.lanes)))\n\n        # Generate regular nextflow header that sets up the shebang, imports\n        # and all possible initial channels\n        self._build_header()\n\n        self._set_channels()\n\n        self._set_init_process()\n\n        self._set_secondary_channels()\n\n        logger.info(colored_print(\n            \"\\tSuccessfully set {} secondary channel(s) \\u2713\".format(\n                len(self.secondary_channels))))\n\n        self._set_compiler_channels()\n\n        self._set_configurations()\n\n        logger.info(colored_print(\n            \"\\tFinished configurations \\u2713\"))\n\n        for p in self.processes:\n            self.template += \"\\n{}\".format(p.template_str)\n\n        self._build_footer()\n\n        project_root = dirname(self.nf_file)\n\n        # Write configs\n        self.write_configs(project_root)\n\n        # Write pipeline file\n        with open(self.nf_file, \"w\") as fh:\n            fh.write(self.template)\n\n        logger.info(colored_print(\n            \"\\tPipeline written into {} \\u2713\".format(self.nf_file)))\n"
  },
  {
    "path": "flowcraft/generator/error_handling.py",
    "content": "class ProcessError(Exception):\n    def __init__(self, value):\n        self.value = value\n\n    def __str__(self):\n        return repr(self.value)\n\n\nclass SanityError(Exception):\n    \"\"\"\n    Class to raise a custom error for sanity checks\n    \"\"\"\n    def __init__(self, value):\n        self.value = \"inSANITY ERROR: {}\".format(value)\n\n    # def __str__(self):\n    #     return repr(self.value)\n\n\nclass InspectionError(Exception):\n    def __init__(self, value):\n        self.value = \"Inspection ERROR: {}\".format(value)\n\n\nclass ReportError(Exception):\n    def __init__(self, value):\n        self.value = \"Reports ERROR: {}\".format(value)\n\n\nclass RecipeError(Exception):\n    def __init__(self, value):\n        self.value = \"Recipe ERROR: {}\".format(value)\n\n    # def __str__(self):\n    #     return repr(self.value)\n\nclass LogError(Exception):\n    def __init__(self, value):\n        self.value = \"Log ERROR: {}\".format(value)\n"
  },
  {
    "path": "flowcraft/generator/footer_skeleton.py",
    "content": "footer = \"\"\"\nworkflow.onComplete {\n  // Display complete message\n  log.info \"Completed at: \" + workflow.complete\n  log.info \"Duration    : \" + workflow.duration\n  log.info \"Success     : \" + workflow.success\n  log.info \"Exit status : \" + workflow.exitStatus\n}\n\nworkflow.onError {\n  // Display error message\n  log.info \"Workflow execution stopped with the following message:\"\n  log.info \"  \" + workflow.errorMessage\n}\n\"\"\""
  },
  {
    "path": "flowcraft/generator/header_skeleton.py",
    "content": "header = \"\"\"#!/usr/bin/env nextflow\n\nimport Helper\nimport CollectInitialMetadata\n\n// Pipeline version\nif (workflow.commitId){\n    version = \"0.1 $workflow.revision\"\n} else {\n    version = \"0.1 (local version)\"\n}\n\nparams.help = false\nif (params.help){\n    Help.print_help(params)\n    exit 0\n}\n\ndef infoMap = [:]\nif (params.containsKey(\"fastq\")){\n    infoMap.put(\"fastq\", file(params.fastq).size())\n}\nif (params.containsKey(\"fasta\")){\n    if (file(params.fasta) instanceof LinkedList){\n        infoMap.put(\"fasta\", file(params.fasta).size())\n    } else {\n        infoMap.put(\"fasta\", 1) \n    }\n}\nif (params.containsKey(\"accessions\")){\n    // checks if params.accessions is different from null\n    if (params.accessions) {\n        BufferedReader reader = new BufferedReader(new FileReader(params.accessions));\n        int lines = 0;\n        while (reader.readLine() != null) lines++;\n        reader.close();\n        infoMap.put(\"accessions\", lines)\n    }\n}\n\nHelp.start_info(infoMap, \"$workflow.start\", \"$workflow.profile\")\nCollectInitialMetadata.print_metadata(workflow)\n    \"\"\""
  },
  {
    "path": "flowcraft/generator/inspect.py",
    "content": "import re\nimport os\nimport sys\nimport uuid\nimport time\nimport curses\nimport signal\nimport locale\nimport socket\nimport logging\nimport hashlib\nimport requests\nimport json\n\nfrom pympler import asizeof\nfrom os.path import join, abspath\nfrom time import gmtime, strftime, sleep\nfrom collections import defaultdict, OrderedDict\n\ntry:\n    import generator.error_handling as eh\n    from generator.process_details import colored_print\n    from generator.utils import get_nextflow_filepath\nexcept ImportError:\n    import flowcraft.generator.error_handling as eh\n    from flowcraft.generator.process_details import colored_print\n    from flowcraft.generator.utils import get_nextflow_filepath\n\nlocale.setlocale(locale.LC_ALL, \"\")\ncode = locale.getpreferredencoding()\n\nlogger = logging.getLogger(\"main.{}\".format(__name__))\n\n\ndef signal_handler(screen):\n    \"\"\"This function is bound to the SIGINT signal (like ctrl+c) to graciously\n    exit the program and reset the curses options.\n    \"\"\"\n\n    if screen:\n        screen.clear()\n        screen.refresh()\n\n        curses.nocbreak()\n        screen.keypad(0)\n        curses.echo()\n        curses.endwin()\n\n    print(\"Exiting flowcraft inspection... Bye\")\n    sys.exit(0)\n\n\nclass NextflowInspector:\n\n    MAX_RETRIES = 1000\n    \"\"\"\n    int: Number of retries for parsing trace and log files. Only exit with non-0\n    error code after these retries.\n    \"\"\"\n\n    def __init__(self, trace_file, refresh_rate, pretty=False, ip_addr=None):\n\n        self.trace_file = trace_file\n        \"\"\"\n        str: Path to nextflow trace file.\n        \"\"\"\n\n        self.trace_sizestamp = None\n        \"\"\"\n        str: Stores the sizestamp of the last modification of the trace file.\n        This is used to parse the file only when it has changed.\n        \"\"\"\n\n        self.refresh_rate = refresh_rate\n        \"\"\"\n        float: Frequency (in seconds) that the curses screen will be refreshed.\n        \"\"\"\n\n        self.stored_ids = []\n        \"\"\"\n        list: Stores the task_ids that have already been parsed. It is used\n        to skip them when parsing the trace files multiple times.\n        \"\"\"\n\n        self.stored_log_ids = []\n        \"\"\"\n        list: Stores the time stamps of the log file lines that were already\n        parsed. It is used to skip parsing the log files multilpe times\n        \"\"\"\n\n        self.trace_info = defaultdict(list)\n        \"\"\"\n        dict: Main object that stores the status information for each process\n        name in the trace file.\n        \"\"\"\n\n        self.process_stats = {}\n        \"\"\"\n        dict: Contains some statistics for each process.\n        \"\"\"\n\n        self.processes = OrderedDict()\n        \"\"\"\n        dict: Dictionary of processes from the pipeline with the status of the\n        channel as the value. 
This information is retrieved from the\n        .nextflow.log file in the :func:`_parser_pipeline_processes` method\n        and updated in the :func:`_update_barrier_status` and\n        :func:`_update_process_stats` and :func:`_update_submission_status`.\n        \"\"\"\n\n        self.process_tags = {}\n        \"\"\"\n        dict: Dictionary of processes with summary information for each tag\n        it processes\n        \"\"\"\n\n        self.samples = []\n        \"\"\"\n        list: List of samples inferred from the pipeline.\n        \"\"\"\n\n        self.skip_processes = [\"status\", \"compile_status\", \"report\",\n                               \"compile_reports\", \"fullConsensus\",\n                               \"compile_status_buffer\"]\n        \"\"\"\n        list: List of special processes that should be skipped for inspection\n        purposes.\n        \"\"\"\n\n        self.log_file = \".nextflow.log\"\n        \"\"\"\n        str: Name of the nextflow log file.\n        \"\"\"\n\n        self.log_sizestamp = None\n        \"\"\"\n        str: Stores the sizestamp of the last modification of the nextflow\n        log file. This is used to parse the file only when it has changed.\n        \"\"\"\n\n        self.pipeline_tag = \"\"\n        \"\"\"\n        str: Tag of the pipeline, parsed from .nextflow.log\n        \"\"\"\n\n        self.log_retry = 0\n        \"\"\"\n        int: Each time the log file is not found, this counter is\n        increased. Only when it matches the :attr:`MAX_RETRIES` attribute\n        does it raises a FileNotFoundError.\n        \"\"\"\n\n        self.trace_retry = 0\n        \"\"\"\n        int: Each time the log file is not found, this counter is \n        increased. Only when it matches the :attr:`MAX_RETRIES` attribute\n        does it raises a FileNotFoundError.\n        \"\"\"\n\n        self.pipeline_name = \"\"\n        \"\"\"\n        str: Name of the nextflow pipeline file.\n        \"\"\"\n\n        self.time_start = None\n        \"\"\"\n        datetime.time object with the starting time of the pipeline.\n        \"\"\"\n\n        self.time_stop = None\n        \"\"\"\n        datetime.time object with the finish time of the pipeline. This\n        attribute is only set when the pipeline is not running.\n        \"\"\"\n\n        self.workdir = os.getcwd()\n        \"\"\"\n        str: Path to the pipeline work directory\n        \"\"\"\n\n        self.execution_command = None\n        \"\"\"\n        str: The command used to execute the pipeline\n        \"\"\"\n\n        self.nextflow_version = None\n        \"\"\"\n        str: Nextflow's version string, as retrieved from the log file.\n        \"\"\"\n\n        self.run_status = \"\"\n        \"\"\"\n        str: Status of the pipeline. Can be either 'running', 'aborted',\n        'error', 'complete'.\n        \"\"\"\n\n        self.abort_cause = None\n        \"\"\"\n        str or None: When :attr:`run_status` is \"aborted\", this attribute\n        will contain the reason provided in the nextflow log. 
When this\n        attribute is not None, it will also trigger the sending of the\n        final lines of the nextflow log to broadcast.\n        \"\"\"\n\n        if not ip_addr:\n            self.app_address = \"http://www.flowcraft.live:80/\"\n        else:\n            self.app_address = ip_addr\n            \"\"\"\n            str: Address of flowcraft web app\n            \"\"\"\n\n        self.broadcast_address = \"{}inspect/api/status\".format(\n            self.app_address)\n        \"\"\"\n        str: Address of the REST api where the information will be sent\n        \"\"\"\n\n        self._c = 0\n        \"\"\"\n        Counter of payloads sent, for debug purposes\n        \"\"\"\n\n        self.send = True\n        \"\"\"\n        boolean: This attribute will be set to False after sending a request\n        and set to True when there is a change in the inspection attributes.\n        \"\"\"\n\n        # Skip these process names (they are check with the startswith()\n        # method) when using the --pretty option\n        if pretty:\n            self._blacklist = [\n                \"report_coverage_\", \"fastqc2_report\", \"compile_fastqc_status2\",\n                \"fastqc_report\", \"trim_report\", \"compile_fastqc_status\",\n                \"report_corrupt_\", \"jsonDumpingMapping\", \"compile_mlst_\",\n                \"mashOutputJson_\", \"mashDistOutputJson_\", \"pilon_report_\",\n                \"compile_pilon_report\"\n            ]\n        else:\n            self._blacklist = []\n\n        # CURSES ATTRIBUTES\n        # Init curses screen\n        self.screen = None\n        self.top_line = 0\n        self.padding = 0\n        self.screen_lines = None\n        self.max_width = 0\n        self.content_lines = 0\n\n        # Checks if nextflow log and trace files are available\n        self._check_required_files()\n        # Gathers the complete list of processes from the nextflow log\n        self._get_pipeline_processes()\n        # Fetches the pipeline status from the nextflow log\n        self._update_pipeline_status()\n\n        # Bind SIGINT to singal_handler function. This makes a clean exit\n        # from the curses interface when exiting through ctrl+c.\n        signal.signal(signal.SIGINT, lambda *x: signal_handler(self.screen))\n\n    #################\n    # UTILITY METHODS\n    #################\n\n    def _check_required_files(self):\n        \"\"\"Checks whetner the trace and log files are available\n        \"\"\"\n\n        if not os.path.exists(self.trace_file):\n            raise eh.InspectionError(\"The provided trace file could not be \"\n                                     \"opened: {}\".format(self.trace_file))\n\n        if not os.path.exists(self.log_file):\n            raise eh.InspectionError(\"The .nextflow.log files could not be \"\n                                     \"opened. 
Are you sure you are in a \"\n                                     \"nextflow project directory?\")\n\n    @staticmethod\n    def _header_mapping(header):\n        \"\"\"Parses the trace file header and retrieves the positions of each\n        column key.\n\n        Parameters\n        ----------\n        header : str\n            The header line of nextflow's trace file\n\n        Returns\n        -------\n        dict\n            Mapping the column ID to its position (e.g.: {\"tag\":2})\n        \"\"\"\n\n        return dict(\n            (x.strip(), pos) for pos, x in enumerate(header.split(\"\\t\"))\n        )\n\n    @staticmethod\n    def _expand_path(hash_str):\n        \"\"\"Expands the hash string of a process (ae/1dasjdm) into a full\n        working directory\n\n        Parameters\n        ----------\n        hash_str : str\n            Nextflow process hash with the beggining of the work directory\n\n        Returns\n        -------\n        str\n            Path to working directory of the hash string\n        \"\"\"\n\n        try:\n            first_hash, second_hash = hash_str.split(\"/\")\n            first_hash_path = join(abspath(\"work\"), first_hash)\n\n            for l in os.listdir(first_hash_path):\n                if l.startswith(second_hash):\n                    return join(first_hash_path, l)\n        except FileNotFoundError:\n            return None\n\n    @staticmethod\n    def _hms(s):\n        \"\"\"Converts a hms string into seconds.\n\n        Parameters\n        ----------\n        s : str\n            The hms string can be something like '20s', '1m30s' or '300ms'.\n\n        Returns\n        -------\n        float\n            Time in seconds.\n\n        \"\"\"\n\n        if s == \"-\":\n            return 0\n\n        if s.endswith(\"ms\"):\n            return float(s.rstrip(\"ms\")) / 1000\n\n        fields = list(map(float, re.split(\"[dhms]\", s)[:-1]))\n        if len(fields) == 4:\n            return fields[0] * 24 * 3600 + fields[1] * 3600 + fields[2] * 60 +\\\n                fields[3]\n        if len(fields) == 3:\n            return fields[0] * 3600 + fields[1] * 60 + fields[2]\n        elif len(fields) == 2:\n            return fields[0] * 60 + fields[1]\n        else:\n            return fields[0]\n\n    @staticmethod\n    def _size_coverter(s):\n        \"\"\"Converts size string into megabytes\n\n        Parameters\n        ----------\n        s : str\n            The size string can be '30KB', '20MB' or '1GB'\n\n        Returns\n        -------\n        float\n            With the size in bytes\n\n        \"\"\"\n\n        if s.upper().endswith(\"KB\"):\n            return float(s.rstrip(\"KB\")) / 1024\n\n        elif s.upper().endswith(\" B\"):\n            return float(s.rstrip(\"B\")) / 1024 / 1024\n\n        elif s.upper().endswith(\"MB\"):\n            return float(s.rstrip(\"MB\"))\n\n        elif s.upper().endswith(\"GB\"):\n            return float(s.rstrip(\"GB\")) * 1024\n\n        elif s.upper().endswith(\"TB\"):\n            return float(s.rstrip(\"TB\")) * 1024 * 1024\n\n        else:\n            return float(s)\n\n    @staticmethod\n    def _size_compress(s):\n        \"\"\"Shortens a megabytes string.\n        \"\"\"\n\n        if s / 1024 > 1:\n            return \"{}GB\".format(round(s / 1024, 1))\n        else:\n            return \"{}MB\".format(s)\n\n    #########################\n    # AUXILIARY PARSE METHODS\n    #########################\n\n    def _get_pipeline_processes(self):\n        \"\"\"Parses the 
.nextflow.log file and retrieves the complete list\n        of processes\n\n        This method searches for specific signatures at the beginning of the\n        .nextflow.log file::\n\n             Apr-19 19:07:32.660 [main] DEBUG nextflow.processor\n             TaskProcessor - Creating operator > report_corrupt_1_1 --\n             maxForks: 4\n\n        When a line with the .*Creating operator.* signature is found, the\n        process name is retrieved and populates the :attr:`processes` attribute\n        \"\"\"\n\n        with open(self.log_file) as fh:\n\n            for line in fh:\n                if re.match(\".*Creating operator.*\", line):\n                    # Retrieves the process name from the string\n                    match = re.match(\".*Creating operator > (.*) --\", line)\n                    process = match.group(1)\n\n                    if any([process.startswith(x) for x in self._blacklist]):\n                        continue\n\n                    if process not in self.skip_processes:\n                        self.processes[match.group(1)] = {\n                            \"barrier\": \"W\",\n                            \"submitted\": set(),\n                            \"finished\": set(),\n                            \"failed\": set(),\n                            \"retry\": set(),\n                            \"cpus\": None,\n                            \"memory\": None\n                        }\n                        self.process_tags[process] = {}\n\n                # Retrieves the pipeline name from the string\n                if re.match(\".*Launching `.*` \\[.*\\] \", line):\n                    tag_match = re.match(\".*Launching `.*` \\[(.*)\\] \", line)\n                    self.pipeline_tag = tag_match.group(1) if tag_match else \\\n                        \"?\"\n                    name_match = re.match(\".*Launching `(.*)` \\[.*\\] \", line)\n                    self.pipeline_name = name_match.group(1) if name_match \\\n                        else \"?\"\n\n        self.content_lines = len(self.processes)\n\n    def _clear_inspect(self):\n        \"\"\"Clears inspect attributes when re-executing a pipeline\"\"\"\n\n        self.trace_info = defaultdict(list)\n        self.process_tags = {}\n        self.process_stats = {}\n        self.samples = []\n        self.stored_ids = []\n        self.stored_log_ids = []\n        self.time_start = None\n        self.time_stop = None\n        self.execution_command = None\n        self.nextflow_version = None\n        self.abort_cause = None\n        self._c = 0\n        # Clean up of tag running status\n        for p in self.processes.values():\n            p[\"barrier\"] = \"W\"\n            for i in [\"submitted\", \"finished\", \"failed\", \"retry\"]:\n                p[i] = set()\n\n    def _update_pipeline_status(self):\n        \"\"\"Parses the .nextflow.log file for signatures of pipeline status.\n        It sets the :attr:`status_info` attribute.\n        \"\"\"\n\n        with open(self.log_file) as fh:\n\n            try:\n                first_line = next(fh)\n            except:\n                raise eh.InspectionError(\"Could not read .nextflow.log file. 
Is file empty?\")\n            time_str = \" \".join(first_line.split()[:2])\n            self.time_start = time_str\n\n            if not self.execution_command:\n                try:\n                    self.execution_command = re.match(\n                        \".*nextflow run (.*)\", first_line).group(1)\n                except AttributeError:\n                    self.execution_command = \"Unknown\"\n\n            for line in fh:\n\n                if \"DEBUG nextflow.cli.CmdRun\" in line:\n                    if not self.nextflow_version:\n                        try:\n                            vline = next(fh)\n                            self.nextflow_version = re.match(\n                                \".*Version: (.*)\", vline).group(1)\n                        except AttributeError:\n                            self.nextflow_version = \"Unknown\"\n\n                if \"Session aborted\" in line:\n                    self.run_status = \"aborted\"\n                    # Get abort cause\n                    try:\n                        self.abort_cause = re.match(\n                            \".*Cause: (.*)\", line).group(1)\n                    except AttributeError:\n                        self.abort_cause = \"Unknown\"\n                    # Get time of pipeline stop\n                    time_str = \" \".join(line.split()[:2])\n                    self.time_stop = time_str\n                    self.send = True\n                    return\n                if \"Execution complete -- Goodbye\" in line:\n                    self.run_status = \"complete\"\n                    # Get time of pipeline stop\n                    time_str = \" \".join(line.split()[:2])\n                    self.time_stop = time_str\n                    self.send = True\n                    return\n\n        if self.run_status not in [\"running\", \"\"]:\n            self._clear_inspect()\n            # Take a break to allow nextflow to restart before refreshing\n            # pipeine processes\n            sleep(5)\n            self._get_pipeline_processes()\n\n        self.run_status = \"running\"\n\n    def _update_tag_status(self, process, vals):\n        \"\"\" Updates the 'submitted', 'finished', 'failed' and 'retry' status\n        of each process/tag combination.\n\n        Process/tag combinations provided to this method already appear on\n        the trace file, so their submission status is updated based on their\n        execution status from nextflow.\n\n        For instance, if a tag is successfully\n        complete, it is moved from the 'submitted' to the 'finished' list.\n        If not, it is moved to the 'failed' list.\n\n        Parameters\n        ----------\n        process : str\n            Name of the current process. 
Must be present in attr:`processes`\n        vals : list\n            List of tags for this process that have been gathered in the\n            trace file.\n        \"\"\"\n\n        good_status = [\"COMPLETED\", \"CACHED\"]\n\n        # Update status of each process\n        for v in list(vals)[::-1]:\n            p = self.processes[process]\n            tag = v[\"tag\"]\n\n            # If the process/tag is in the submitted list, move it to the\n            # complete or failed list\n            if tag in p[\"submitted\"]:\n                p[\"submitted\"].remove(tag)\n                if v[\"status\"] in good_status:\n                    p[\"finished\"].add(tag)\n                elif v[\"status\"] == \"FAILED\":\n                    if not v[\"work_dir\"]:\n                        v[\"work_dir\"] = \"\"\n                    self.process_tags[process][tag][\"log\"] = \\\n                        self._retrieve_log(join(v[\"work_dir\"], \".command.log\"))\n                    p[\"failed\"].add(tag)\n\n            # It the process/tag is in the retry list and it completed\n            # successfully, remove it from the retry and fail lists. Otherwise\n            # maintain it in the retry/failed lists\n            elif tag in p[\"retry\"]:\n                if v[\"status\"] in good_status:\n                    p[\"retry\"].remove(tag)\n                    p[\"failed\"].remove(tag)\n                    del self.process_tags[process][tag][\"log\"]\n                elif self.run_status == \"aborted\":\n                    p[\"retry\"].remove(tag)\n\n            elif v[\"status\"] in good_status:\n                p[\"finished\"].add(tag)\n\n            # Filter tags without a successfull status.\n            if v[\"status\"] not in good_status:\n                if v[\"tag\"] in list(p[\"submitted\"]) + list(p[\"finished\"]):\n                    vals.remove(v)\n\n        return vals\n\n    def _update_barrier_status(self):\n        \"\"\"Checks whether the channels to each process have been closed.\n        \"\"\"\n\n        with open(self.log_file) as fh:\n\n            for line in fh:\n\n                # Exit barrier update after session abort signal\n                if \"Session aborted\" in line:\n                    return\n\n                if \"<<< barrier arrive\" in line:\n                    # Retrieve process name from string\n                    process_m = re.match(\".*process: (.*)\\)\", line)\n                    if process_m:\n                        process = process_m.group(1)\n                        # Updates process channel to complete\n                        if process in self.processes:\n                            self.processes[process][\"barrier\"] = \"C\"\n\n    @staticmethod\n    def _retrieve_log(path):\n        \"\"\"Method used to retrieve the contents of a log file into a list.\n\n        Parameters\n        ----------\n        path\n\n        Returns\n        -------\n        list or None\n            Contents of the provided file, each line as a list entry\n        \"\"\"\n\n        if not os.path.exists(path):\n            return None\n\n        with open(path) as fh:\n            return fh.readlines()\n\n    def _update_trace_info(self, fields, hm):\n        \"\"\"Parses a trace line and updates the :attr:`status_info` attribute.\n\n        Parameters\n        ----------\n        fields : list\n            List of the tab-seperated elements of the trace line\n        hm : dict\n            Maps the column IDs to their position in the fields argument.\n      
      This dictionary object is retrieve from :func:`_header_mapping`.\n        \"\"\"\n\n        process = fields[hm[\"process\"]]\n\n        if process not in self.processes:\n            return\n\n        # Get information from a single line of trace file\n        info = dict((column, fields[pos]) for column, pos in hm.items())\n\n        # The headers that will be used to populate the process\n        process_tag_headers = [\"realtime\", \"rss\", \"rchar\", \"wchar\"]\n        for h in process_tag_headers:\n\n            # In the rare occasion the tag is parsed first in the trace\n            # file than the log file, add the new tag.\n            if info[\"tag\"] not in self.process_tags[process]:\n                # If the 'start' tag is present in the trace, use that\n                # information. If not, it will be parsed in the log file.\n                try:\n                    timestart = info[\"start\"].split()[1]\n                except KeyError:\n                    timestart = None\n                self.process_tags[process][info[\"tag\"]] = {\n                    \"workdir\": self._expand_path(info[\"hash\"]),\n                    \"start\": timestart\n                }\n\n            if h in info and info[\"tag\"] != \"-\":\n                if h != \"realtime\" and info[h] != \"-\":\n                    self.process_tags[process][info[\"tag\"]][h] = \\\n                        round(self._size_coverter(info[h]), 2)\n                else:\n                    self.process_tags[process][info[\"tag\"]][h] = info[h]\n\n        # Set allocated cpu and memory information to process\n        if \"cpus\" in info and not self.processes[process][\"cpus\"]:\n            self.processes[process][\"cpus\"] = info[\"cpus\"]\n        if \"memory\" in info and not self.processes[process][\"memory\"]:\n            try:\n                self.processes[process][\"memory\"] = self._size_coverter(\n                    info[\"memory\"])\n            except ValueError:\n                self.processes[process][\"memory\"] = None\n\n        if info[\"hash\"] in self.stored_ids:\n            return\n\n        # If the task hash code is provided, expand it to the work directory\n        # and add a new entry\n        if \"hash\" in info:\n            hs = info[\"hash\"]\n            info[\"work_dir\"] = self._expand_path(hs)\n\n        if \"tag\" in info:\n            tag = info[\"tag\"]\n            if tag != \"-\" and tag not in self.samples and \\\n                    tag.split()[0] not in self.samples:\n                self.samples.append(tag)\n\n        self.trace_info[process].append(info)\n        self.stored_ids.append(info[\"hash\"])\n\n    def _update_process_resources(self, process, vals):\n        \"\"\"Updates the resources info in :attr:`processes` dictionary.\n        \"\"\"\n\n        resources = [\"cpus\"]\n\n        for r in resources:\n            if not self.processes[process][r]:\n                try:\n                    self.processes[process][r] = vals[0][\"cpus\"]\n                # When the trace column is not present\n                except KeyError:\n                    pass\n\n    def _cpu_load_parser(self, cpus, cpu_per, t):\n        \"\"\"Parses the cpu load from the number of cpus and its usage\n        percentage and returnsde cpu/hour measure\n\n        Parameters\n        ----------\n        cpus : str\n            Number of cpus allocated.\n        cpu_per : str\n            Percentage of cpu load measured (e.g.: 200,5%).\n        t : str\n            The time 
string can be something like '20s', '1m30s' or '300ms'.\n        \"\"\"\n\n        try:\n            _cpus = float(cpus)\n            _cpu_per = float(cpu_per.replace(\",\", \".\").replace(\"%\", \"\"))\n            hours = self._hms(t) / 60 / 24\n\n            return ((_cpu_per / (100 * _cpus)) * _cpus) * hours\n\n        except ValueError:\n            return 0\n\n    def _assess_resource_warnings(self, process, vals):\n        \"\"\"Assess whether the cpu load or memory usage is above the allocation\n\n        Parameters\n        ----------\n        process : str\n            Process name\n        vals : vals\n            List of trace information for each tag of that process\n\n        Returns\n        -------\n        cpu_warnings : dict\n            Keys are tags and values are the excessive cpu load\n        mem_warnings : dict\n            Keys are tags and values are the excessive rss\n        \"\"\"\n\n        cpu_warnings = {}\n        mem_warnings = {}\n\n        for i in vals:\n            try:\n                expected_load = float(i[\"cpus\"]) * 100\n                cpu_load = float(i[\"%cpu\"].replace(\",\", \".\").replace(\"%\", \"\"))\n\n                if expected_load * 0.9 > cpu_load > expected_load * 1.10:\n                    cpu_warnings[i[\"tag\"]] = {\n                        \"expected\":  expected_load,\n                        \"value\": cpu_load\n                    }\n            except (ValueError, KeyError):\n                pass\n\n            try:\n                rss = self._size_coverter(i[\"rss\"])\n                mem_allocated = self._size_coverter(i[\"memory\"])\n\n                if rss > mem_allocated * 1.10:\n                    mem_warnings[i[\"tag\"]] = {\n                        \"expected\": mem_allocated,\n                        \"value\": rss\n                    }\n            except (ValueError, KeyError):\n                pass\n\n        return cpu_warnings, mem_warnings\n\n    def _update_process_stats(self):\n        \"\"\"Updates the process stats with the information from the processes\n\n        This method is called at the end of each static parsing of the nextflow\n        trace file. 
It re-populates the :attr:`process_stats` dictionary\n        with the new stat metrics.\n        \"\"\"\n\n        good_status = [\"COMPLETED\", \"CACHED\"]\n\n        for process, vals in self.trace_info.items():\n\n            # Update submission status of tags for each process\n            vals = self._update_tag_status(process, vals)\n\n            # Update process resources\n            self._update_process_resources(process, vals)\n\n            self.process_stats[process] = {}\n\n            inst = self.process_stats[process]\n\n            # Get number of completed samples\n            inst[\"completed\"] = \"{}\".format(\n                len([x for x in vals if x[\"status\"] in good_status]))\n\n            # Get average time\n            try:\n                time_array = [self._hms(x[\"realtime\"]) for x in vals]\n                mean_time = round(sum(time_array) / len(time_array), 1)\n                mean_time_str = strftime('%H:%M:%S', gmtime(mean_time))\n                inst[\"realtime\"] = mean_time_str\n            # When the realtime column is not present\n            except KeyError:\n                inst[\"realtime\"] = \"-\"\n\n            # Get cumulative cpu/hours\n            try:\n                cpu_hours = [self._cpu_load_parser(\n                    x[\"cpus\"], x[\"%cpu\"], x[\"realtime\"]) for x in vals]\n                inst[\"cpuhour\"] = round(sum(cpu_hours), 2)\n            # When the realtime, cpus or %cpus column are not present\n            except KeyError:\n                inst[\"cpuhour\"] = \"-\"\n\n            # Assess resource warnings\n            inst[\"cpu_warnings\"], inst[\"mem_warnings\"] = \\\n                self._assess_resource_warnings(process, vals)\n\n            # Get maximum memory\n            try:\n                rss_values = [self._size_coverter(x[\"rss\"]) for x in vals\n                              if x[\"rss\"] != \"-\"]\n                if rss_values:\n                    max_rss = round(max(rss_values))\n                    rss_str = self._size_compress(max_rss)\n                else:\n                    rss_str = \"-\"\n                inst[\"maxmem\"] = rss_str\n            except KeyError:\n                inst[\"maxmem\"] = \"-\"\n\n            # Get read size\n            try:\n                rchar_values = [self._size_coverter(x[\"rchar\"]) for x in vals\n                                if x[\"rchar\"] != \"-\"]\n                if rchar_values:\n                    avg_rchar = round(sum(rchar_values) / len(rchar_values))\n                    rchar_str = self._size_compress(avg_rchar)\n                else:\n                    rchar_str = \"-\"\n            except KeyError:\n                rchar_str = \"-\"\n            inst[\"avgread\"] = rchar_str\n\n            # Get write size\n            try:\n                wchar_values = [self._size_coverter(x[\"wchar\"]) for x in vals\n                                if x[\"wchar\"] != \"-\"]\n                if wchar_values:\n                    avg_wchar = round(sum(wchar_values) / len(wchar_values))\n                    wchar_str = self._size_compress(avg_wchar)\n                else:\n                    wchar_str = \"-\"\n            except KeyError:\n                wchar_str = \"-\"\n            inst[\"avgwrite\"] = wchar_str\n\n    #################\n    # PARSING METHODS\n    #################\n\n    def trace_parser(self):\n        \"\"\"Method that parses the trace file once and updates the\n        :attr:`status_info` attribute with the new entries.\n        
\"\"\"\n\n        # Check the timestamp of the tracefile. Only proceed with the parsing\n        # if it changed from the previous time.\n        size_stamp = os.path.getsize(self.trace_file)\n        self.trace_retry = 0\n        if size_stamp and size_stamp == self.trace_sizestamp:\n            return\n        else:\n            logger.debug(\"Updating trace size stamp to: {}\".format(size_stamp))\n            self.trace_sizestamp = size_stamp\n\n        with open(self.trace_file) as fh:\n\n            # Skip potential empty lines at the start of file\n            header = next(fh).strip()\n            while not header:\n                header = next(fh).strip()\n\n            # Get header mappings before parsing the file\n            hm = self._header_mapping(header)\n\n            for line in fh:\n\n                # Skip empty lines\n                if line.strip() == \"\":\n                    continue\n\n                fields = line.strip().split(\"\\t\")\n\n                # Skip if task ID was already processes\n                if fields[hm[\"task_id\"]] in self.stored_ids:\n                    continue\n\n                # Parse trace entry and update status_info attribute\n                self._update_trace_info(fields, hm)\n                self.send = True\n\n        self._update_process_stats()\n        self._update_barrier_status()\n\n    def log_parser(self):\n        \"\"\"Method that parses the nextflow log file once and updates the\n        submitted number of samples for each process\n        \"\"\"\n\n        # Check the timestamp of the log file. Only proceed with the parsing\n        # if it changed from the previous time.\n        size_stamp = os.path.getsize(self.log_file)\n        self.log_retry = 0\n        if size_stamp and size_stamp == self.log_sizestamp:\n            return\n        else:\n            logger.debug(\"Updating log size stamp to: {}\".format(size_stamp))\n            self.log_sizestamp = size_stamp\n\n        # Regular expression to catch four groups:\n        # 1. Start timestamp\n        # 2. Work directory hash\n        # 3. Process name\n        # 4. 
Tag name\n        r = \".* (.*) \\[.*\\].*\\[(.*)\\].*process > (.*) \\((.*)\\).*\"\n\n        with open(self.log_file) as fh:\n\n            for line in fh:\n                if \"Submitted process >\" in line or \\\n                        \"Re-submitted process >\" in line or \\\n                        \"Cached process >\" in line:\n                    m = re.match(r, line)\n                    if not m:\n                        continue\n\n                    time_start = m.group(1)\n                    workdir = m.group(2)\n                    process = m.group(3)\n                    tag = m.group(4)\n\n                    # Skip if this line has already been parsed\n                    if time_start + tag not in self.stored_log_ids:\n                        self.stored_log_ids.append(time_start + tag)\n                    else:\n                        continue\n\n                    # For first time processes\n                    if process not in self.processes:\n                        continue\n                    p = self.processes[process]\n\n                    # Skip is process/tag combination has finished or is retrying\n                    if tag in list(p[\"finished\"]) + list(p[\"retry\"]):\n                        continue\n\n                    # Update failed process/tags when they have been re-submitted\n                    if tag in list(p[\"failed\"]) and \\\n                            \"Re-submitted process >\" in line:\n                        p[\"retry\"].add(tag)\n                        self.send = True\n                        continue\n\n                    # Set process barrier to running. Check for barrier status\n                    # are performed at the end of the trace parsing in the\n                    # _update_barrier_status method.\n                    p[\"barrier\"] = \"R\"\n                    if tag not in p[\"submitted\"]:\n                        p[\"submitted\"].add(tag)\n                        # Update the process_tags attribute with the new tag.\n                        # Update only when the tag does not exist. This may rarely\n                        # occur when the tag is parsed first in the trace file\n                        if tag not in self.process_tags[process]:\n                            self.process_tags[process][tag] = {\n                                \"workdir\": self._expand_path(workdir),\n                                \"start\": time_start\n                            }\n                            self.send = True\n                        # When the tag is filled in the trace file parsing,\n                        # the timestamp may not be present in the trace. 
In\n                        # those cases, fill that information here.\n                        elif not self.process_tags[process][tag][\"start\"]:\n                            self.process_tags[process][tag][\"start\"] = time_start\n                            self.send = True\n\n        self._update_pipeline_status()\n\n    def update_inspection(self):\n        \"\"\"Wrapper method that calls the appropriate main updating methods of\n        the inspection.\n\n        It is meant to be used inside a loop (such as a while loop), so that\n        it can continuously update the class attributes from the trace and\n        log files. It already implements checks to parse these files only\n        when they change, and to ignore entries that have been previously\n        processed.\n        \"\"\"\n\n        try:\n            self.log_parser()\n        except (FileNotFoundError, StopIteration) as e:\n            logger.debug(\"ERROR: \" + str(sys.exc_info()[0]))\n            self.log_retry += 1\n            if self.log_retry == self.MAX_RETRIES:\n                raise e\n        try:\n            self.trace_parser()\n        except (FileNotFoundError, StopIteration) as e:\n            logger.debug(\"ERROR: \" + str(sys.exc_info()[0]))\n            self.trace_retry += 1\n            if self.trace_retry == self.MAX_RETRIES:\n                raise e\n\n    #################\n    # CURSES METHODS\n    #################\n\n    def display_overview(self):\n        \"\"\"Displays the default pipeline inspection overview\n        \"\"\"\n\n        stay_alive = True\n\n        self.screen = curses.initscr()\n\n        self.screen.keypad(True)\n        self.screen.nodelay(-1)\n        curses.cbreak()\n        curses.noecho()\n        curses.start_color()\n\n        self.screen_lines = self.screen.getmaxyx()[0]\n        # self.screen_width = self.screen.getmaxyx()[1]\n\n        try:\n            while stay_alive:\n\n                # Provide functionality to certain keybindings\n                self._curses_keybindings()\n                # Updates main inspector attributes\n                self.update_inspection()\n                # Display curses interface\n                self.flush_overview()\n\n                sleep(self.refresh_rate)\n        except FileNotFoundError:\n            sys.stderr.write(colored_print(\n                \"ERROR: nextflow log and/or trace files are no longer \"\n                \"reachable!\", \"red_bold\"))\n        except Exception as e:\n            sys.stderr.write(str(e))\n        finally:\n            curses.nocbreak()\n            self.screen.keypad(0)\n            curses.echo()\n            curses.endwin()\n\n    def _curses_keybindings(self):\n\n        c = self.screen.getch()\n        # Provide scroll up/down with keys or mouse wheel\n        if c == curses.KEY_UP:\n            self._updown(\"up\")\n        elif c == curses.KEY_DOWN:\n            self._updown(\"down\")\n        elif c == curses.KEY_LEFT:\n            self._rightleft(\"left\")\n        elif c == curses.KEY_RIGHT:\n            self._rightleft(\"right\")\n        # Trigger screen size update on resize\n        elif c == curses.KEY_RESIZE:\n            self.screen_lines = self.screen.getmaxyx()[0]\n        # Exit interface when pressing q\n        elif c == ord('q'):\n            raise Exception\n\n    def _updown(self, direction):\n        \"\"\"Provides curses scroll functionality.\n        \"\"\"\n\n        if direction == \"up\" and self.top_line != 0:\n            self.top_line -= 1\n        elif direction == 
\"down\" and \\\n                self.screen.getmaxyx()[0] + self.top_line\\\n                <= self.content_lines + 3:\n            self.top_line += 1\n\n    def _rightleft(self, direction):\n        \"\"\"Provides curses horizontal padding\"\"\"\n\n        if direction == \"left\" and self.padding != 0:\n            self.padding -= 1\n\n        if direction == \"right\" and \\\n                self.screen.getmaxyx()[1] + self.padding < self.max_width:\n            self.padding += 1\n\n    def flush_overview(self):\n        \"\"\"Displays the default overview of the pipeline execution from the\n        :attr:`status_info`, :attr:`processes` and :attr:`run_status`\n        attributes into stdout.\n        \"\"\"\n\n        colors = {\n            \"W\": 1,\n            \"R\": 2,\n            \"C\": 3\n        }\n\n        pc = {\n            \"running\": 3,\n            \"complete\": 3,\n            \"aborted\": 4,\n            \"error\": 4\n        }\n\n        curses.init_pair(1, curses.COLOR_WHITE, curses.COLOR_BLACK)\n        curses.init_pair(2, curses.COLOR_BLUE, curses.COLOR_BLACK)\n        curses.init_pair(3, curses.COLOR_GREEN, curses.COLOR_BLACK)\n        curses.init_pair(4, curses.COLOR_MAGENTA, curses.COLOR_BLACK)\n\n        # self.screen.erase()\n\n        height, width = self.screen.getmaxyx()\n        win = curses.newpad(height, 2000)\n\n        # Add static header\n        header = \"Pipeline [{}] inspection at {}. Status: \".format(\n            self.pipeline_tag, strftime(\"%Y-%m-%d %H:%M:%S\", gmtime()))\n\n        win.addstr(0, 0, header)\n        win.addstr(0, len(header), self.run_status,\n                   curses.color_pair(pc[self.run_status]))\n        submission_str = \"{0:23.23}  {1:23.23}  {2:23.23}  {3:23.23}\".format(\n            \"Running: {}\".format(\n                sum([len(x[\"submitted\"]) for x in self.processes.values()])\n            ),\n            \"Failed: {}\".format(\n                sum([len(x[\"failed\"]) for x in self.processes.values()])\n            ),\n            \"Retrying: {}\".format(\n                sum([len(x[\"retry\"]) for x in self.processes.values()])\n            ),\n            \"Completed: {}\".format(\n                sum([len(x[\"finished\"]) for x in self.processes.values()])\n            )\n        )\n\n        win.addstr(\n            1, 0, submission_str, curses.color_pair(1)\n        )\n\n        headers = [\"\", \"Process\", \"Running\", \"Complete\", \"Error\",\n                   \"Avg Time\", \"Max Mem\", \"Avg Read\", \"Avg Write\"]\n        header_str = \"{0: ^1} \" \\\n                     \"{1: ^25}  \" \\\n                     \"{2: ^7} \" \\\n                     \"{3: ^7} \" \\\n                     \"{4: ^7} \" \\\n                     \"{5: ^10} \" \\\n                     \"{6: ^10} \" \\\n                     \"{7: ^10} \" \\\n                     \"{8: ^10} \".format(*headers)\n        self.max_width = len(header_str)\n        win.addstr(3, 0, header_str, curses.A_UNDERLINE | curses.A_REVERSE)\n\n        # Get display size\n        top = self.top_line\n        bottom = self.screen_lines - 4 + self.top_line\n\n        # Fetch process information\n        for p, process in enumerate(\n                list(self.processes.keys())[top:bottom]):\n\n            if process not in self.process_stats:\n                vals = [\"-\"] * 8\n                txt_fmt = curses.A_NORMAL\n            else:\n                ref = self.process_stats[process]\n                vals = [ref[\"completed\"],\n                   
     len(self.processes[process][\"failed\"]),\n                        ref[\"realtime\"],\n                        ref[\"maxmem\"], ref[\"avgread\"],\n                        ref[\"avgwrite\"]]\n                txt_fmt = curses.A_BOLD\n\n            proc = self.processes[process]\n            if proc[\"retry\"]:\n                completed = \"{}({})\".format(len(proc[\"submitted\"]),\n                                            len(proc[\"retry\"]))\n            else:\n                completed = \"{}\".format(len(proc[\"submitted\"]))\n\n            win.addstr(\n                4 + p, 0, \"{0: ^1} \"\n                          \"{1:25.25}  \"\n                          \"{2: ^7} \"\n                          \"{3: ^7} \"\n                          \"{4: ^7} \"\n                          \"{5: ^10} \"\n                          \"{6: ^10} \"\n                          \"{7: ^10} \"\n                          \"{8: ^10} \".format(\n                                proc[\"barrier\"],\n                                process,\n                                completed,\n                                *vals),\n                curses.color_pair(colors[proc[\"barrier\"]]) | txt_fmt)\n\n        win.clrtoeol()\n        win.refresh(0, self.padding, 0, 0, height-1, width-1)\n\n    ###################\n    # BROADCAST METHODS\n    ###################\n\n    def _convert_process_dict(self):\n\n        d = {}\n\n        for k, v in self.processes.items():\n            d[k] = {\n                \"barrier\": v[\"barrier\"],\n                \"cpus\": v[\"cpus\"],\n                \"memory\": v[\"memory\"]\n            }\n            for i in [\"submitted\", \"finished\", \"failed\", \"retry\"]:\n                d[k][i] = list(v[i])\n\n        return d\n\n    def _prepare_table_data(self):\n\n        # Set data mappings\n        mappings = {\n            \"Barrier\": \"barrier\",\n            \"Process\": \"process\",\n            \"Running\": \"running\",\n            \"Complete\": \"complete\",\n            \"Error\": \"error\",\n            \"Avg Time\": \"avgTime\",\n            \"CPU/hour\": \"cpuhour\",\n            \"Max Mem\": \"maxMem\",\n            \"Avg Read\": \"avgRead\",\n            \"Avg Write\": \"avgWrite\"\n        }\n\n        # Set table data\n        data = []\n        table_headers = [\"avgTime\", \"cpuhour\", \"maxMem\", \"avgRead\", \"avgWrite\"]\n        for process in list(self.processes):\n\n            proc = self.processes[process]\n            # Add general data that is always available for all processes\n            current_data = {\n                \"process\": process,\n                \"barrier\": proc[\"barrier\"],\n                \"complete\": list(proc[\"finished\"]),\n                \"error\": list(proc[\"failed\"]),\n                \"running\": list(proc[\"submitted\"])\n            }\n\n            # Add stats data that is only available for processes that have\n            # finished once.\n            if process not in self.process_stats:\n                current_data = {\n                    **current_data,\n                    **dict((x, \"-\") for x in table_headers),\n                    **{\"cpuWarn\": {}, \"memWarn\": {}}\n                }\n\n            else:\n                ref = self.process_stats[process]\n                current_data = {\n                    **current_data,\n                    **{\"avgTime\": ref[\"realtime\"],\n                       \"cpuhour\": ref[\"cpuhour\"],\n                       \"maxMem\": ref[\"maxmem\"],\n          
             \"avgRead\": ref[\"avgread\"],\n                       \"avgWrite\": ref[\"avgwrite\"],\n                       \"cpuWarn\": ref[\"cpu_warnings\"],\n                       \"memWarn\": ref[\"mem_warnings\"]}\n                }\n\n            data.append(current_data)\n\n        return mappings, data\n\n    def _prepare_overview_data(self):\n\n        return [\n            {\n                \"header\": \"Pipeline name\",\n                \"value\": self.pipeline_name\n            },\n            {\n                \"header\": \"Pipeline tag\",\n                \"value\": self.pipeline_tag\n            },\n            {\n                \"header\": \"Number of processes\",\n                \"value\": len(self.processes)\n            }]\n\n    def _prepare_general_details(self):\n        return [\n            {\n                \"header\": \"Pipeline directory\",\n                \"value\": self.workdir\n            },\n            {\n                \"header\": \"Work directory\",\n                \"value\": join(self.workdir, \"work\")\n            },\n            {\n                \"header\": \"Nextflow command\",\n                \"value\": self.execution_command\n            },\n            {\n                \"header\": \"Nextflow version\",\n                \"value\": self.nextflow_version\n            }\n        ]\n\n    def _get_log_lines(self, n=300):\n        \"\"\"Returns a list with the last ``n`` lines of the nextflow log file\n\n        Parameters\n        ----------\n        n : int\n            Number of last lines from the log file\n\n        Returns\n        -------\n        list\n            List of strings with the nextflow log\n        \"\"\"\n\n        with open(self.log_file) as fh:\n            last_lines = fh.readlines()[-n:]\n\n        return last_lines\n\n    def _prepare_run_status_data(self):\n\n        if self.run_status == \"aborted\":\n            log_lines = self._get_log_lines()\n        else:\n            log_lines = None\n\n        return {\n            \"value\": self.run_status,\n            \"abortCause\": self.abort_cause,\n            \"logLines\": log_lines\n        }\n\n    def _send_status_info(self, run_id):\n\n        mappings, data = self._prepare_table_data()\n        overview_data = self._prepare_overview_data()\n        general_details = self._prepare_general_details()\n        status_data = self._prepare_run_status_data()\n\n        # Add current year to start and stop dates\n        time_start = \"{} {}\".format(time.strftime(\"%Y\"), self.time_start)\n        time_stop = \"{} {}\".format(time.strftime(\"%Y\"), self.time_stop) \\\n            if self.time_stop else \"-\"\n        # Get enconding for proper parsing of time\n        time_locale = locale.getlocale()[0]\n\n        status_json = {\n            \"generalOverview\": overview_data,\n            \"generalDetails\": general_details,\n            \"tableData\": data,\n            \"tableMappings\": mappings,\n            \"processInfo\": self._convert_process_dict(),\n            \"processTags\": self.process_tags,\n            \"runStatus\": status_data,\n            \"timeStart\": time_start,\n            \"timeStop\": time_stop,\n            \"timeLocale\": time_locale,\n            \"processes\": list(self.processes)\n        }\n\n        self._c += 1\n        logger.debug(\"Payload [{}] sent with size: {}\".format(\n            self._c,\n            asizeof.asizeof(json.dumps(status_json))\n        ))\n\n        try:\n            
requests.put(self.broadcast_address,\n                         json={\"run_id\": run_id, \"status_json\": status_json})\n        except requests.exceptions.ConnectionError:\n            logger.error(colored_print(\n                \"ERROR: Could not establish connection with server. The server\"\n                \" may be down or there is a problem with your internet \"\n                \"connection.\", \"red_bold\"))\n            sys.exit(1)\n\n    def _prepare_static_info(self):\n        \"\"\"Prepares the first batch of information, containing static\n        information such as the pipeline file, and configuration files\n\n        Returns\n        -------\n        dict\n            Dict with the static information for the first POST request\n        \"\"\"\n\n        pipeline_files = {}\n\n        with open(join(self.workdir, self.pipeline_name)) as fh:\n            pipeline_files[\"pipelineFile\"] = fh.readlines()\n\n        nf_config = join(self.workdir, \"nextflow.config\")\n        if os.path.exists(nf_config):\n            with open(nf_config) as fh:\n                pipeline_files[\"configFile\"] = fh.readlines()\n\n        # Check for specific flowcraft configurations files\n        configs = {\n            \"params.config\": \"paramsFile\",\n            \"resources.config\": \"resourcesFile\",\n            \"containers.config\": \"containersFile\",\n            \"user.config\": \"userFile\",\n        }\n        for config, key in configs.items():\n            cfile = join(self.workdir, config)\n            if os.path.exists(cfile):\n                with open(cfile) as fh:\n                    pipeline_files[key] = fh.readlines()\n\n        return pipeline_files\n\n    def _dag_file_to_dict(self):\n        \"\"\"Function that opens the dotfile named .treeDag.json in the current\n        working directory\n\n        Returns\n        -------\n        Returns a dictionary with the dag object to be used in the post\n        instance available through the method _establish_connection\n\n        \"\"\"\n        try:\n            dag_file = open(os.path.join(self.workdir, \".treeDag.json\"))\n            dag_json = json.load(dag_file)\n        except (FileNotFoundError, json.decoder.JSONDecodeError):\n            logger.warning(colored_print(\n                \"WARNING: dotfile named .treeDag.json not found or corrupted\",\n                \"red_bold\"))\n            dag_json = {}\n\n        return dag_json\n\n    def _establish_connection(self, run_id, dict_dag):\n\n        try:\n\n            static_info = self._prepare_static_info()\n\n            logger.debug(\"Sending initial data with run id: {}\".format(run_id))\n\n            payload = {\"run_id\": run_id, \"dag_json\": dict_dag,\n                       \"pipeline_files\": static_info}\n            logger.debug(\"Connection payload size: {}\".format(\n                asizeof.asizeof(payload)))\n\n            r = requests.post(self.broadcast_address,\n                              json=payload)\n\n            logger.debug(\"Response received: {}\".format(r.status_code))\n            if r.status_code != 201:\n                logger.error(colored_print(\n                    \"ERROR: There was a problem sending data to the server\"\n                    \"with reason: {}\".format(r.reason)))\n                sys.exit(1)\n        except requests.exceptions.ConnectionError:\n            logger.error(colored_print(\n                \"ERROR: Could not establish connection with server. 
The server\"\n                \" may be down or there is a problem with your internet \"\n                \"connection.\", \"red_bold\"))\n            sys.exit(1)\n\n    def _close_connection(self, run_id):\n\n        try:\n            r = requests.delete(self.broadcast_address,\n                                json={\"run_id\": run_id})\n            if r.status_code != 202:\n                logger.error(colored_print(\n                    \"ERROR: There was a problem sending data to the server\"\n                    \"with reason: {}\".format(r.reason)))\n        except requests.exceptions.ConnectionError:\n            logger.error(colored_print(\n                \"ERROR: Could not establish connection with server. The server\"\n                \" may be down or there is a problem with your internet \"\n                \"connection.\", \"red_bold\"))\n            sys.exit(1)\n\n    def _get_run_hash(self):\n        \"\"\"Gets the hash of the nextflow file\"\"\"\n\n        # Get name and path of the pipeline from the log file\n        pipeline_path = get_nextflow_filepath(self.log_file)\n\n        # Get hash from the entire pipeline file\n        pipeline_hash = hashlib.md5()\n        with open(pipeline_path, \"rb\") as fh:\n            for chunk in iter(lambda: fh.read(4096), b\"\"):\n                pipeline_hash.update(chunk)\n        # Get hash from the current working dir and hostname\n        workdir = self.workdir.encode(\"utf8\")\n        hostname = socket.gethostname().encode(\"utf8\")\n        hardware_addr = str(uuid.getnode()).encode(\"utf8\")\n        dir_hash = hashlib.md5(workdir + hostname + hardware_addr)\n\n        return pipeline_hash.hexdigest() + dir_hash.hexdigest()\n\n    def _print_msg(self, run_id):\n\n        inspect_address = \"{}inspect/{}\".format(self.app_address, run_id)\n        logger.info(colored_print(\n            \"Starting broadcast. You can see the pipeline progress on the \"\n            \"link below:\", \"green_bold\"))\n        logger.info(\"{}\".format(inspect_address))\n\n    def broadcast_status(self):\n\n        logger.info(colored_print(\"Preparing broadcast data...\", \"green_bold\"))\n\n        run_hash = self._get_run_hash()\n        dict_dag = self._dag_file_to_dict()\n        _broadcast_sent = False\n        logger.debug(\"Establishing connection...\")\n        self._establish_connection(run_hash, dict_dag)\n\n        stay_alive = True\n        try:\n            logger.debug(\"Starting inspection loop\")\n            while stay_alive:\n\n                if not _broadcast_sent:\n                    self._print_msg(run_hash)\n                    _broadcast_sent = True\n\n                self.update_inspection()\n                if self.send:\n                    logger.debug(\"Updating inspection\")\n                    self._send_status_info(run_hash)\n                    self.send = False\n\n                sleep(self.refresh_rate)\n\n        except FileNotFoundError:\n            logger.error(colored_print(\n                \"ERROR: nextflow log and/or trace files are no longer \"\n                \"reachable!\", \"red_bold\"))\n        except Exception:\n            logger.exception(\"ERROR: \" + str(sys.exc_info()[0]))\n        finally:\n            logger.info(\"Closing connection\")\n            self._close_connection(run_hash)\n"
  },
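# --- Editor's sketch (not part of flowcraft) -------------------------------
# The broadcast section above identifies a run by concatenating two MD5
# digests: one of the pipeline file (read in 4096-byte chunks) and one of the
# working directory + hostname + MAC address (see _get_run_hash). Below is a
# minimal, self-contained sketch of that scheme; the function name
# `make_run_hash` and the `pipeline_path` argument are illustrative only.
import hashlib
import os
import socket
import uuid


def make_run_hash(pipeline_path):
    # Hash the pipeline file in chunks to avoid loading it fully into memory
    pipeline_hash = hashlib.md5()
    with open(pipeline_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(4096), b""):
            pipeline_hash.update(chunk)
    # Hash the execution context (directory, host and hardware address)
    dir_hash = hashlib.md5(
        os.getcwd().encode("utf8")
        + socket.gethostname().encode("utf8")
        + str(uuid.getnode()).encode("utf8")
    )
    return pipeline_hash.hexdigest() + dir_hash.hexdigest()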
  {
    "path": "flowcraft/generator/pipeline_parser.py",
    "content": "import os\nimport logging\nimport re\nfrom difflib import SequenceMatcher\n\ntry:\n    from generator.error_handling import SanityError\n    from generator.process_details import colored_print\nexcept ImportError:\n    from flowcraft.generator.error_handling import SanityError\n    from flowcraft.generator.process_details import colored_print\n\nlogger = logging.getLogger(\"main.{}\".format(__name__))\n\n# Set the tokens used for the main syntax\n# Token signaling the start of a fork\nFORK_TOKEN = \"(\"\n# Token separating different lanes from a fork\nLANE_TOKEN = \"|\"\n# Token that closes a fork\nCLOSE_TOKEN = \")\"\n\n\ndef guess_process(query_str, process_map):\n    \"\"\"\n    Function to guess processes based on strings that are not available in\n    process_map. If the string has typos and is somewhat similar (50%) to any\n    process available in flowcraft it will print info to the terminal,\n    suggesting the most similar processes available in flowcraft.\n\n    Parameters\n    ----------\n    query_str: str\n        The string of the process with potential typos\n    process_map:\n        The dictionary that contains all the available processes\n\n    \"\"\"\n\n    save_list = []\n    # loops between the processes available in process_map\n    for process in process_map:\n        similarity = SequenceMatcher(None, process, query_str)\n        # checks if similarity between the process and the query string is\n        # higher than 50%\n        if similarity.ratio() > 0.5:\n            save_list.append(process)\n\n    # checks if any process is stored in save_list\n    if save_list:\n        logger.info(colored_print(\n            \"Maybe you meant:\\n\\t{}\".format(\"\\n\\t\".join(save_list)), \"white\"))\n\n    logger.info(colored_print(\"Hint: check the available processes by using \"\n                              \"the '-l' or '-L' flag.\", \"white\"))\n\n\ndef remove_inner_forks(text):\n    \"\"\"Recursively removes nested brackets\n\n    This function is used to remove nested brackets from fork strings using\n    regular expressions\n\n    Parameters\n    ----------\n    text: str\n        The string that contains brackets with inner forks to be removed\n\n    Returns\n    -------\n    text: str\n        the string with only the processes that are not in inner forks, thus\n        the processes that belong to a given fork.\n\n    \"\"\"\n\n    n = 1  # run at least once for one level of fork\n    # Then this loop assures that all brackets will get removed in a nested\n    # structure\n    while n:\n        # this removes non-nested brackets\n        text, n = re.subn(r'\\([^()]*\\)', '', text)\n\n    return text\n\n\ndef empty_tasks(p_string):\n    \"\"\"\n    Function to check if pipeline string is empty or has an empty string\n\n    Parameters\n    ----------\n    p_string: str\n         String with the definition of the pipeline, e.g.::\n             'processA processB processC(ProcessD | ProcessE)'\n\n    \"\"\"\n    if p_string.strip() == \"\":\n        raise SanityError(\"'-t' parameter received an empty string or \"\n                          \"an empty file.\")\n\n\ndef brackets_but_no_lanes(p_string):\n    \"\"\"\n    Function to check if a LANE_TOKEN is provided but no fork is initiated.\n    Parameters\n    ----------\n    p_string: str\n         String with the definition of the pipeline, e.g.::\n             'processA processB processC(ProcessD | ProcessE)'\n\n    \"\"\"\n\n    if \"|\" in p_string and \"(\" not in p_string:\n        raise 
SanityError(\"No fork initiation character '(' was \"\n                          \"provided but there is a fork lane separator \"\n                          \"character '|'\")\n\n\ndef brackets_insanity_check(p_string):\n    \"\"\"\n    This function performs a check for different number of '(' and ')'\n    characters, which indicates that some forks are poorly constructed.\n\n    Parameters\n    ----------\n    p_string: str\n         String with the definition of the pipeline, e.g.::\n             'processA processB processC(ProcessD | ProcessE)'\n\n    \"\"\"\n\n    if p_string.count(FORK_TOKEN) != p_string.count(CLOSE_TOKEN):\n        # get the number of each type of bracket and state the one that has a\n        # higher value\n        dict_values = {\n            FORK_TOKEN: p_string.count(FORK_TOKEN),\n            CLOSE_TOKEN: p_string.count(CLOSE_TOKEN)\n        }\n        max_bracket = max(dict_values, key=dict_values.get)\n\n        raise SanityError(\n            \"A different number of '(' and ')' was specified. There are \"\n            \"{} extra '{}'. The number of '(' and ')'should be equal.\".format(\n                str(abs(\n                    p_string.count(FORK_TOKEN) - p_string.count(CLOSE_TOKEN))),\n                max_bracket))\n\n\ndef lane_char_insanity_check(p_string):\n    \"\"\"\n    This function performs a sanity check for multiple '|' character\n    between two processes.\n\n    Parameters\n    ----------\n    p_string: str\n         String with the definition of the pipeline, e.g.::\n             'processA processB processC(ProcessD | ProcessE)'\n\n    \"\"\"\n\n    if LANE_TOKEN + LANE_TOKEN in p_string:\n        raise SanityError(\"Duplicated fork separator character '|'.\")\n\n\ndef final_char_insanity_check(p_string):\n    \"\"\"\n    This function checks if lane token is the last element of the pipeline\n    string.\n\n    Parameters\n    ----------\n    p_string: str\n         String with the definition of the pipeline, e.g.::\n             'processA processB processC(ProcessD | ProcessE)'\n\n    \"\"\"\n\n    # Check if last character of string is a LANE_TOKEN\n    if p_string.endswith(LANE_TOKEN):\n        raise SanityError(\"Fork separator character '|' cannot be the \"\n                          \"last element of pipeline string\")\n\n\ndef fork_procs_insanity_check(p_string):\n    \"\"\"\n    This function checks if the pipeline string contains a process between\n    the fork start token or end token and the separator (lane) token. 
Checks for\n    the absence of processes in one of the branches of the fork ['|)' and '(|']\n    and for the existence of a process before starting a fork (in an inner fork)\n    ['|('].\n\n    Parameters\n    ----------\n    p_string: str\n         String with the definition of the pipeline, e.g.::\n             'processA processB processC(ProcessD | ProcessE)'\n\n    \"\"\"\n\n    # Check for the absence of processes in one of the branches of the fork\n    # ['|)' and '(|'] and for the existence of a process before starting a fork\n    # (in an inner fork) ['|('].\n    if FORK_TOKEN + LANE_TOKEN in p_string or \\\n            LANE_TOKEN + CLOSE_TOKEN in p_string or \\\n            LANE_TOKEN + FORK_TOKEN in p_string:\n        raise SanityError(\"There must be a process between the fork \"\n                          \"start character '(' or end ')' and the separator of \"\n                          \"processes character '|'\")\n\n\ndef start_proc_insanity_check(p_string):\n    \"\"\"\n    This function checks if there is a starting process after the beginning of\n    each fork. It checks for duplicated start tokens ['(('].\n\n    Parameters\n    ----------\n    p_string: str\n         String with the definition of the pipeline, e.g.::\n             'processA processB processC(ProcessD | ProcessE)'\n\n    \"\"\"\n\n    if FORK_TOKEN + FORK_TOKEN in p_string:\n        raise SanityError(\"There must be a starting process after the \"\n                          \"fork before adding a new fork. E.g: proc1 ( proc2.1 \"\n                          \"(proc3.1 | proc3.2) | proc 2.2 )\")\n\n\ndef late_proc_insanity_check(p_string):\n    \"\"\"\n    This function checks if there are processes after the close token. It\n    searches for everything that isn't \"|\" or \")\" after a \")\" token.\n\n    Parameters\n    ----------\n    p_string: str\n         String with the definition of the pipeline, e.g.::\n             'processA processB processC(ProcessD | ProcessE)'\n\n    \"\"\"\n\n    if re.search('\\{}[^|)]'.format(CLOSE_TOKEN), p_string):\n        raise SanityError(\"After a fork it is not allowed to have any \"\n                          \"alphanumeric value.\")\n\n\ndef inner_fork_insanity_checks(pipeline_string):\n    \"\"\"\n    This function performs two sanity checks in the pipeline string. 
The first\n    check assures that each fork contains a lane token '|', while the second\n    check looks for duplicated processes within the same fork.\n\n    Parameters\n    ----------\n    pipeline_string: str\n         String with the definition of the pipeline, e.g.::\n             'processA processB processC(ProcessD | ProcessE)'\n\n    \"\"\"\n\n    # first let's get all forks into a list.\n    list_of_forks = []  # stores forks\n    left_indexes = []  # stores indexes of left brackets\n\n    # iterate through the string looking for '(' and ')'.\n    for pos, char in enumerate(pipeline_string):\n        if char == FORK_TOKEN:\n            # saves pos to left_indexes list\n            left_indexes.append(pos)\n        elif char == CLOSE_TOKEN and len(left_indexes) > 0:\n            # saves fork to list_of_forks\n            list_of_forks.append(pipeline_string[left_indexes[-1] + 1: pos])\n            # removes last bracket from left_indexes list\n            left_indexes = left_indexes[:-1]\n\n    # sort list in descending order of number of forks\n    list_of_forks.sort(key=lambda x: x.count(FORK_TOKEN), reverse=True)\n\n    # Now, we can iterate through list_of_forks and check for errors in each\n    # fork\n    for fork in list_of_forks:\n        # remove inner forks for these checks since each fork has its own entry\n        # in list_of_forks. Note that list_of_forks is sorted in descending\n        # order of nesting, which allows the inner fork strings to be removed\n        # sequentially\n        fork_simplified = fork\n        for subfork in list_of_forks:\n            # checks if subfork is contained in fork and if they are different,\n            # avoiding removing the fork itself\n            if subfork in fork_simplified and subfork != fork:\n                # removes inner forks. Note that string has no spaces\n                fork_simplified = fork_simplified.replace(\n                    \"({})\".format(subfork), \"\")\n\n        # Checks if there is no fork separator character '|' within each fork\n        if not len(fork_simplified.split(LANE_TOKEN)) > 1:\n            raise SanityError(\"One of the forks doesn't have '|' \"\n                              \"separator between the processes to fork. 
This is\"\n                              \" the prime suspect: '({})'\".format(fork))\n\n\ndef insanity_checks(pipeline_str):\n    \"\"\"Wrapper that performs all sanity checks on the pipeline string\n\n    Parameters\n    ----------\n    pipeline_str : str\n        String with the pipeline definition\n    \"\"\"\n\n    # Gets rid of all spaces in string\n    p_string = pipeline_str.replace(\" \", \"\").strip()\n\n    # some of the check functions use the pipeline_str as the user provided but\n    # the majority uses the parsed p_string.\n    checks = [\n        [p_string, [\n            empty_tasks,\n            brackets_but_no_lanes,\n            brackets_insanity_check,\n            lane_char_insanity_check,\n            final_char_insanity_check,\n            fork_procs_insanity_check,\n            start_proc_insanity_check,\n            late_proc_insanity_check\n        ]],\n        [pipeline_str, [\n            inner_fork_insanity_checks\n        ]]\n    ]\n\n    # executes sanity checks in pipeline string before parsing it.\n    for param, func_list in checks:\n        for func in func_list:\n            func(param)\n\n\ndef parse_pipeline(pipeline_str):\n    \"\"\"Parses a pipeline string into a list of dictionaries with the connections\n     between processes\n\n    Parameters\n    ----------\n    pipeline_str : str\n        String with the definition of the pipeline, e.g.::\n            'processA processB processC(ProcessD | ProcessE)'\n\n    Returns\n    -------\n    pipeline_links : list\n\n    \"\"\"\n\n    if os.path.exists(pipeline_str):\n        logger.debug(\"Found pipeline file: {}\".format(pipeline_str))\n        with open(pipeline_str) as fh:\n            pipeline_str = \"\".join([x.strip() for x in fh.readlines()])\n\n    logger.info(colored_print(\"Resulting pipeline string:\\n\"))\n    logger.info(colored_print(pipeline_str + \"\\n\"))\n\n    # Perform pipeline insanity checks\n    insanity_checks(pipeline_str)\n\n    logger.debug(\"Parsing pipeline string: {}\".format(pipeline_str))\n\n    pipeline_links = []\n    lane = 1\n\n    # Add unique identifiers to each process to allow a correct connection\n    # between forks with same processes\n    pipeline_str_modified, identifiers_to_tags = add_unique_identifiers(\n        pipeline_str)\n\n    # Get number of forks in the pipeline\n    nforks = pipeline_str_modified.count(FORK_TOKEN)\n    logger.debug(\"Found {} fork(s)\".format(nforks))\n\n    # If there are no forks, connect the pipeline as purely linear\n    if not nforks:\n        logger.debug(\"Detected linear pipeline string : {}\".format(\n            pipeline_str))\n        linear_pipeline = [\"__init__\"] + pipeline_str_modified.split()\n        pipeline_links.extend(linear_connection(linear_pipeline, lane))\n        # Removes unique identifiers used for correctly assign fork parents with\n        #  a possible same process name\n        pipeline_links = remove_unique_identifiers(identifiers_to_tags,\n                                                   pipeline_links)\n        return pipeline_links\n\n    for i in range(nforks):\n\n        logger.debug(\"Processing fork {} in lane {}\".format(i, lane))\n        # Split the pipeline at each fork start position. fields[-1] will\n        # hold the process after the fork. fields[-2] will hold the processes\n        # before the fork.\n        fields = pipeline_str_modified.split(FORK_TOKEN, i + 1)\n\n        # Get the processes before the fork. 
This may be empty when the\n        # fork is at the beginning of the pipeline.\n        previous_process = fields[-2].split(LANE_TOKEN)[-1].split()\n        logger.debug(\"Previous processes string: {}\".format(fields[-2]))\n        logger.debug(\"Previous processes list: {}\".format(previous_process))\n        # Get lanes after the fork\n        next_lanes = get_lanes(fields[-1])\n        logger.debug(\"Next lanes object: {}\".format(next_lanes))\n        # Get the immediate targets of the fork\n        fork_sink = [x[0] for x in next_lanes]\n        logger.debug(\"The fork sinks into the processes: {}\".format(fork_sink))\n\n        # The first fork is a special case, where the processes before AND\n        # after the fork (until the start of another fork) are added to\n        # the ``pipeline_links`` variable. Otherwise, only the processes\n        # after the fork will be added\n        if i == 0:\n            # If there are no previous process, the fork is at the beginning\n            # of the pipeline string. In this case, inject the special\n            # \"init\" process.\n            if not previous_process:\n                previous_process = [\"__init__\"]\n                lane = 0\n            else:\n                previous_process = [\"__init__\"] + previous_process\n\n            # Add the linear modules before the fork\n            pipeline_links.extend(\n                linear_connection(previous_process, lane))\n\n        fork_source = previous_process[-1]\n        logger.debug(\"Fork source is set to: {}\".format(fork_source))\n        fork_lane = get_source_lane(previous_process, pipeline_links)\n        logger.debug(\"Fork lane is set to: {}\".format(fork_lane))\n        # Add the forking modules\n        pipeline_links.extend(\n            fork_connection(fork_source, fork_sink, fork_lane, lane))\n        # Add the linear connections in the subsequent lanes\n        pipeline_links.extend(\n            linear_lane_connection(next_lanes, lane))\n\n        lane += len(fork_sink)\n\n    pipeline_links = remove_unique_identifiers(identifiers_to_tags,\n                                               pipeline_links)\n    return pipeline_links\n\n\ndef get_source_lane(fork_process, pipeline_list):\n    \"\"\"Returns the lane of the last process that matches fork_process\n\n    Parameters\n    ----------\n    fork_process : list\n        List of processes before the fork.\n    pipeline_list : list\n        List with the pipeline connection dictionaries.\n\n    Returns\n    -------\n    int\n        Lane of the last process that matches fork_process\n    \"\"\"\n\n    fork_source = fork_process[-1]\n    fork_sig = [x for x in fork_process if x != \"__init__\"]\n\n    for position, p in enumerate(pipeline_list[::-1]):\n\n        if p[\"output\"][\"process\"] == fork_source:\n\n            lane = p[\"output\"][\"lane\"]\n            logger.debug(\"Possible source match found in position {} in lane\"\n                         \" {}\".format(position, lane))\n            lane_sequence = [x[\"output\"][\"process\"] for x in pipeline_list\n                             if x[\"output\"][\"lane\"] == lane]\n            logger.debug(\"Testing lane sequence '{}' against fork signature\"\n                         \" '{}'\".format(lane_sequence, fork_sig))\n            if lane_sequence == fork_sig:\n                return p[\"output\"][\"lane\"]\n\n    return 0\n\n\ndef get_lanes(lanes_str):\n    \"\"\"From a raw pipeline string, get a list of lanes from the start\n    of the current 
fork.\n\n    When the pipeline is being parsed, it will be split at every fork\n    position. The string at the right of the fork position will be provided\n    to this function. It's job is to retrieve the lanes that result\n    from that fork, ignoring any nested forks.\n\n    Parameters\n    ----------\n    lanes_str : str\n        Pipeline string after a fork split\n\n    Returns\n    -------\n    lanes : list\n        List of lists, with the list of processes for each lane\n\n    \"\"\"\n\n    logger.debug(\"Parsing lanes from raw string: {}\".format(lanes_str))\n\n    # Temporarily stores the lanes string after removal of nested forks\n    parsed_lanes = \"\"\n    # Flag used to determined whether the cursor is inside or outside the\n    # right fork\n    infork = 0\n    for i in lanes_str:\n\n        # Nested fork started\n        if i == FORK_TOKEN:\n            infork += 1\n        # Nested fork stopped\n        if i == CLOSE_TOKEN:\n            infork -= 1\n\n        if infork < 0:\n            break\n\n        # Save only when in the right fork\n        if infork == 0:\n            # Ignore forking syntax tokens\n            if i not in [FORK_TOKEN, CLOSE_TOKEN]:\n                parsed_lanes += i\n\n    return [x.split() for x in parsed_lanes.split(LANE_TOKEN)]\n\n\ndef linear_connection(plist, lane):\n    \"\"\"Connects a linear list of processes into a list of dictionaries\n\n    Parameters\n    ----------\n    plist : list\n        List with process names. This list should contain at least two entries.\n    lane : int\n        Corresponding lane of the processes\n\n    Returns\n    -------\n    res : list\n        List of dictionaries with the links between processes\n    \"\"\"\n\n    logger.debug(\n        \"Establishing linear connection with processes: {}\".format(plist))\n\n    res = []\n    previous = None\n\n    for p in plist:\n        # Skip first process\n        if not previous:\n            previous = p\n            continue\n\n        res.append({\n            \"input\": {\n                \"process\": previous,\n                \"lane\": lane\n            },\n            \"output\": {\n                \"process\": p,\n                \"lane\": lane\n            }\n        })\n        previous = p\n\n    return res\n\n\ndef fork_connection(source, sink, source_lane, lane):\n    \"\"\"Makes the connection between a process and the first processes in the\n    lanes to which it forks.\n\n    The ``lane`` argument should correspond to the lane of the source process.\n    For each lane in ``sink``, the lane counter will increase.\n\n    Parameters\n    ----------\n    source : str\n        Name of the process that is forking\n    sink : list\n        List of the processes where the source will fork to. Each element\n        corresponds to the start of a lane.\n    source_lane : int\n        Lane of the forking process\n    lane : int\n        Lane of the source process\n\n    Returns\n    -------\n    res : list\n        List of dictionaries with the links between processes\n    \"\"\"\n\n    logger.debug(\"Establishing forking of source '{}' into processes\"\n                 \" '{}'. 
Source lane set to '{}' and lane set to '{}'\".format(\n                    source, sink, source_lane, lane))\n\n    res = []\n    # Increase the lane counter for the first lane\n    lane_counter = lane + 1\n\n    for p in sink:\n        res.append({\n            \"input\": {\n                \"process\": source,\n                \"lane\": source_lane\n            },\n            \"output\": {\n                \"process\": p,\n                \"lane\": lane_counter\n            }\n        })\n        lane_counter += 1\n\n    return res\n\n\ndef linear_lane_connection(lane_list, lane):\n    \"\"\"\n\n    Parameters\n    ----------\n    lane_list : list\n        Each element should correspond to a list of processes for a given lane\n    lane : int\n        Lane counter before the fork start\n\n    Returns\n    -------\n    res : list\n        List of dictionaries with the links between processes\n    \"\"\"\n\n    logger.debug(\n        \"Establishing linear connections for lanes: {}\".format(lane_list))\n\n    res = []\n    # Increase the lane counter for the first lane\n    lane += 1\n\n    for l in lane_list:\n        res.extend(linear_connection(l, lane))\n        lane += 1\n\n    return res\n\n\ndef add_unique_identifiers(pipeline_str):\n    \"\"\"Returns the pipeline string with unique identifiers and a dictionary with\n     references between the unique keys and the original values\n\n    Parameters\n    ----------\n    pipeline_str : str\n        Pipeline string\n\n    Returns\n    -------\n    str\n        Pipeline string with unique identifiers\n    dict\n        Match between process unique values and original names\n    \"\"\"\n\n    # Add space at beginning and end of pipeline to allow regex mapping of final\n    # process in linear pipelines\n    pipeline_str_modified = \" {} \".format(pipeline_str)\n\n    # Regex to get all process names. 
Catch all words without spaces and that\n    # are not fork tokens or pipes\n    reg_find_proc = r\"[^\\s{}{}{}]+\".format(LANE_TOKEN, FORK_TOKEN, CLOSE_TOKEN)\n    process_names = re.findall(reg_find_proc, pipeline_str_modified)\n\n    identifiers_to_tags = {}\n    \"\"\"\n    dict: Matches new process names (identifiers) with original process \n    names\n    \"\"\"\n\n    new_process_names = []\n    \"\"\"\n    list: New process names used to replace in the pipeline string\n    \"\"\"\n\n    # Assigns the new process names by appending a numeric id at the end of\n    # the process name\n    for index, val in enumerate(process_names):\n        if \"=\" in val:\n            parts = val.split(\"=\")\n            new_id = \"{}_{}={}\".format(parts[0], index, parts[1])\n        else:\n            new_id = \"{}_{}\".format(val, index)\n\n        # add new process with id\n        new_process_names.append(new_id)\n        # makes a match between new process name and original process name\n        identifiers_to_tags[new_id] = val\n\n    # Add space between forks, pipes and the process names for the replace\n    # regex to work\n    match_result = lambda match: \" {} \".format(match.group())\n\n    # force to add a space between each token so that regex modification can\n    # be applied\n    find = r'[{}{}{}]+'.format(FORK_TOKEN, LANE_TOKEN, CLOSE_TOKEN)\n    pipeline_str_modified = re.sub(find, match_result, pipeline_str_modified)\n\n    # Replace original process names by the unique identifiers\n    for index, val in enumerate(process_names):\n        # regex to replace process names with non assigned process ids\n        # escape characters are required to match to the dict keys\n        # (identifiers_to_tags), since python keys with escape characters\n        # must be escaped\n        find = r'{}[^_]'.format(val).replace(\"\\\\\", \"\\\\\\\\\")\n        pipeline_str_modified = re.sub(find, new_process_names[index] + \" \",\n                                       pipeline_str_modified, 1)\n\n    return pipeline_str_modified, identifiers_to_tags\n\n\ndef remove_unique_identifiers(identifiers_to_tags, pipeline_links):\n    \"\"\"Removes unique identifiers and add the original process names to the\n    already parsed pipelines\n\n    Parameters\n    ----------\n    identifiers_to_tags : dict\n        Match between unique process identifiers and process names\n    pipeline_links: list\n        Parsed pipeline list with unique identifiers\n\n    Returns\n    -------\n    list\n        Pipeline list with original identifiers\n    \"\"\"\n\n    # Replaces the unique identifiers by the original process names\n    for index, val in enumerate(pipeline_links):\n        if val[\"input\"][\"process\"] != \"__init__\":\n            val[\"input\"][\"process\"] = identifiers_to_tags[\n                val[\"input\"][\"process\"]]\n        if val[\"output\"][\"process\"] != \"__init__\":\n            val[\"output\"][\"process\"] = identifiers_to_tags[\n                val[\"output\"][\"process\"]]\n\n    return pipeline_links\n"
  },
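# --- Editor's sketch (not part of flowcraft) -------------------------------
# Illustrative use of the fork syntax handled by pipeline_parser.py above.
# parse_pipeline returns a list of connection dictionaries of the form
# {"input": {"process": ..., "lane": ...}, "output": {...}}. The component
# names below are examples only; parse_pipeline itself does not validate them
# against the available components (see the '-l'/'-L' listing flags).
from flowcraft.generator.pipeline_parser import parse_pipeline

links = parse_pipeline("integrity_coverage fastqc_trimmomatic (spades | skesa)")
for link in links:
    print("{} (lane {}) -> {} (lane {})".format(
        link["input"]["process"], link["input"]["lane"],
        link["output"]["process"], link["output"]["lane"]))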
  {
    "path": "flowcraft/generator/process.py",
    "content": "import os\nimport jinja2\nimport logging\n\nfrom os.path import dirname, join, abspath\n\ntry:\n    import generator.error_handling as eh\nexcept ImportError:\n    import flowcraft.generator.error_handling as eh\n\nlogger = logging.getLogger(\"main.{}\".format(__name__))\n\n\nclass Process:\n    \"\"\"Main interface for basic process functionality\n\n    The ``Process`` class is intended to be inherited by specific process\n    classes (e.g., :py:class:`IntegrityCoverage`) and provides the basic\n    functionality to build the channels and links between processes.\n\n    Child classes are expected to inherit the ``__init__`` execution, which\n    basically means that at least, the child must be defined as::\n\n        class ChildProcess(Process):\n            def__init__(self, **kwargs):\n                super().__init__(**kwargs)\n\n    This ensures that when the ``ChildProcess`` class is instantiated, it\n    automatically sets the attributes of the parent class.\n\n    This also means that child processes must be instantiated providing\n    information on the process type and jinja2 template with the nextflow code.\n\n    Parameters\n    ----------\n    template : str\n        Name of the jinja2 template with the nextflow code for that process.\n        Templates are stored in ``generator/templates``.\n    \"\"\"\n\n    RAW_MAPPING = {\n        \"fastq\": {\n            \"params\": \"fastq\",\n            \"description\": \"Path expression to paired-end fastq files.\"\n                           \" (default: $params.fastq)\",\n            \"default_value\": \"'fastq/*_{1,2}.*'\",\n            \"channel\": \"IN_fastq_raw\",\n            \"channel_str\":\n                \"Channel.fromFilePairs(params.{0})\"\n                \".ifEmpty {{ exit 1, \\\"No fastq files provided with pattern:\"\n                \"'${{params.{0}}}'\\\" }}\",\n            \"checks\":\n                \"if (params.{0} instanceof Boolean){{\"\n                \"exit 1, \\\"'{0}' must be a path pattern. Provide value:\"\n                \"'$params.{0}'\\\"}}\\n\"\n                \"if (!params.{0}){{ exit 1, \\\"'{0}' parameter \"\n                \"missing\\\"}}\"\n        },\n        \"fasta\": {\n            \"params\": \"fasta\",\n            \"description\": \"Path fasta files. (default: $params.fastq)\",\n            \"default_value\": \"'fasta/*.fasta'\",\n            \"channel\": \"IN_fasta_raw\",\n            \"channel_str\":\n                \"Channel.fromPath(params.{0}).\"\n                \"map{{ it -> file(it).exists() ? [it.toString()\"\n                \".tokenize('/').last()\"\n                \".tokenize('.')[0..-2].join('.'), it] : null }}\"\n                \".ifEmpty {{ exit 1, \\\"No fasta files provided with pattern:\"\n                \"'${{params.{0}}}'\\\" }}\",\n            \"checks\":\n                \"if (params.{0} instanceof Boolean){{\"\n                \"exit 1, \\\"'{0}' must be a path pattern. Provide value:\"\n                \"'$params.{0}'\\\"}}\\n\"\n                \"if (!params.{0}){{ exit 1, \\\"'{0}' parameter \"\n                \"missing\\\"}}\"\n        },\n        \"accessions\": {\n            \"params\": \"accessions\",\n            \"description\": \"Path file with accessions, one perline. 
(\"\n                           \"default: $params.fastq)\",\n            \"default_value\": \"null\",\n            \"channel\": \"IN_accessions_raw\",\n            \"channel_str\":\n                \"Channel.fromPath(params.{0})\"\n                \".ifEmpty {{ exit 1, \\\"No accessions file provided with path:\"\n                \"'${{params.{0}}}'\\\" }}\",\n            \"checks\":\n                \"if (!params.{0}){{ exit 1, \\\"'{0}' parameter \"\n                \"missing\\\" }}\\n\"\n        }\n    }\n    \"\"\"\n    dict: Contains the mapping between the :attr:`Process.input_type` attribute\n    and the corresponding nextflow parameter and main channel definition,\n    e.g.::\n\n        \"fastq\" : {\n            \"params\": \"fastq\",\n            \"channel: \"<channel>\n        }\n    \"\"\"\n\n    def __init__(self, template):\n\n        self.pid = None\n        \"\"\"\n        int: Process ID number that represents the order and position in the\n        generated pipeline\n        \"\"\"\n\n        self.template = template\n        \"\"\"\n        str: Template name for the current process. This string will be used\n        to fetch the file containing the corresponding jinja2 template\n        in the :py:func:`_set_template` method\n        \"\"\"\n\n        self._template_path = None\n        \"\"\"\n        str: Path to the file containing the jinja2 template file. It's\n        set in :py:func:`_set_template`.\n        \"\"\"\n        self._set_template(template)\n\n        self.input_type = None\n        \"\"\"\n        str: Type of expected input data. Used to verify the connection between\n        two processes is viable.\n        \"\"\"\n\n        self.output_type = None\n        \"\"\"\n        str: Type of output data. Used to verify the connection between\n        two processes is viable.\n        \"\"\"\n\n        self.ignore_type = False\n        \"\"\"\n        boolean: If True, this process will ignore the input/output type\n        requirements. This attribute is set to True for terminal singleton\n        forks in the pipeline.\n        \"\"\"\n\n        self.ignore_pid = False\n        \"\"\"\n        boolean: If True, this process will not make the pid advance. This\n        is used for terminal forks before the end of the pipeline.\n        \"\"\"\n\n        self.dependencies = []\n        \"\"\"\n        list: Contains the dependencies of the current process in the form\n        of the :py:attr:`Process.template` attribute (e.g., [``fastqc``])\n        \"\"\"\n\n        self.lane = None\n        self.parent_lane = None\n\n        self.input_channel = None\n        \"\"\"\n        str: Place holder of the main input channel for the current process.\n        This attribute can change dynamically depending on the forks and\n        secondary channels in the final pipeline.\n        \"\"\"\n\n        self.output_channel = None\n        \"\"\"\n        str: Place holder of the main output channel for the current process.\n        This attribute can change dynamically depending on the forks and\n        secondary channels in the final pipeline.\n        \"\"\"\n\n        self.input_user_channel = None\n        \"\"\"\n        dict: Stores a dictionary of two key:value pairs containing\n        the raw input channel for the process. This is automatically\n         determined by the :attr:`~Process.input_type` attribute, and will\n        fetch the information that is mapped in the :attr:`RAW_MAPPING`\n         variable. 
It will only be used by the first process(es) defined in\n         a pipeline.\n        \"\"\"\n\n        self.link_start = []\n        \"\"\"\n        list: List of strings with the starting points for secondary channels.\n        When building the pipeline, these strings will be matched with equal\n        strings in the :py:attr:`link_end` attribute of other Processes.\n        \"\"\"\n\n        self.link_end = []\n        \"\"\"\n        list: List of dictionaries containing a string with the ending point\n        for a secondary channel. Each dictionary should contain at least\n        two key/value pairs:\n        ``{\"link\": <link string>, \"alias\":<string for template>}``\n        \"\"\"\n\n        self.status_channels = [\"STATUS_{}\".format(template)]\n        \"\"\"\n        list: Names of the status channels produced by the process. By default,\n        it sets a single status channel. If more than one status channel\n        is required for the process, list each one in this attribute\n        (e.g., :py:attr:`FastQC.status_channels`)\n        \"\"\"\n        self.status_strs = []\n        \"\"\"\n        list: Names of the status channels for the current process. These\n        strings will be provided to the StatusCompiler process to collect and\n        compile status reports\n        \"\"\"\n\n        self.forks = []\n        \"\"\"\n        list: List of strings with the literal definition of the forks for\n        the current process, ready to be added to the template string.\n        \"\"\"\n\n        self.main_forks = []\n        \"\"\"\n        list: List of the channels onto which the main output should be\n        forked. They will be automatically added to the\n        :attr:`~Process.main_forks` attribute when setting the secondary\n        channels\n        \"\"\"\n\n        self.secondary_inputs = []\n        \"\"\"\n        list: List of dictionaries with secondary input channels from nextflow\n        parameters. Each dictionary should contain two key:value pairs\n        with the ``params`` key, containing the parameter name, and the\n        ``channel`` key, containing the nextflow channel definition::\n\n            {\n                \"params\": \"pathoSpecies\",\n                \"channel\": \"IN_pathoSpecies = Channel\n                                                .value(params.pathoSpecies)\"\n            }\n        \"\"\"\n        self.secondary_input_str = \"\"\n\n        self.extra_input = \"\"\n        \"\"\"\n        str: Name of the parameter that will be used to provide\n        extra input into the process. This extra input will be mixed with\n        the main input channel using nextflow's ``mix`` operator. Its\n        channel will be defined at the start of the pipeline, based on the\n        ``channel_str`` key of the :attr:`~Process.RAW_MAPPING` for the\n        corresponding input type.\n        \"\"\"\n\n        self.params = {}\n        \"\"\"\n        dict: Maps the parameter names to the corresponding default values.\n        \"\"\"\n\n        self.param_id = \"\"\n        \"\"\"\n        str: The parameter id suffix that will be added to each parameter. In\n        case it is empty, identical parameters in different\n        components will be merged.\n        \"\"\"\n\n        self._context = {}\n        \"\"\"\n        dict: Dictionary with the keyword placeholders for the string template\n        of the current process.\n        \"\"\"\n\n        self.directives = {\n            self.template: {}\n        }\n        \"\"\"\n        dict: Specifies the directives (cpus, memory, container) for each\n        nextflow process in the template. If specified, these directives\n        will be added to the nextflow configuration file. Otherwise,\n        the default values for cpus and memory will be used. If no container\n        is specified, the process will not run inside any container.\n\n        The current supported directives are:\n            - cpus\n            - memory\n            - container\n            - container tag/version\n\n        An example of directives for two processes is as follows::\n        \n            self.directives = {\n                \"processA\": {\"cpus\": 1, \"memory\": \"1GB\"},\n                \"processB\": {\"memory\": \"5GB\", \"container\": \"my/image\",\n                             \"version\": \"0.5.0\"}\n            }\n        \"\"\"\n\n        self.compiler = {}\n        \"\"\"\n        dict: Specifies channels from the current process that are received\n        by a compiler process. Each key in this dictionary should match\n        a compiler process key in\n        :attr:`~flowcraft.generator.engine.NextflowGenerator.compilers`.\n        The value should be a list of the channels that will be fed to the\n        compiler process::\n        \n            self.compiler[\"patlas_consensus\"] = [\"mashScreenOutputChannel\"]\n        \"\"\"\n\n    def _set_template(self, template):\n        \"\"\"Sets the path to the appropriate jinja template file\n\n        When a Process instance is initialized, this method will fetch\n        the location of the appropriate template file, based on the\n        ``template`` argument. It will raise an exception if the template\n        file is not found. Otherwise, it will set the\n        :py:attr:`Process._template_path` attribute.\n        \"\"\"\n\n        # Set template directory\n        tpl_dir = join(dirname(abspath(__file__)), \"templates\")\n\n        # Set template file path\n        tpl_path = join(tpl_dir, template + \".nf\")\n\n        if not os.path.exists(tpl_path):\n            raise eh.ProcessError(\n                \"Template {} does not exist\".format(tpl_path))\n\n        self._template_path = tpl_path\n\n    def set_main_channel_names(self, input_suffix, output_suffix, lane):\n        \"\"\"Sets the main channel names based on the provided input and\n        output channel suffixes. This is performed when connecting processes.\n\n        Parameters\n        ----------\n        input_suffix : str\n            Suffix added to the input channel. Should be based on the lane\n            and an arbitrary unique id\n        output_suffix : str\n            Suffix added to the output channel. 
Should be based on the lane\n            and an arbitrary unique id\n        lane : int\n            Sets the lane of the process.\n        \"\"\"\n\n        self.input_channel = \"{}_in_{}\".format(self.template, input_suffix)\n        self.output_channel = \"{}_out_{}\".format(self.template, output_suffix)\n        self.lane = lane\n\n    def set_param_id(self, param_id):\n        \"\"\"Sets the param_id for the process, which will be used to render\n        the template.\n\n        Parameters\n        ----------\n        param_id : str\n            The :attr:`param_id` attribute of the process.\n        \"\"\"\n\n        self._context = {**self._context, \"param_id\": param_id}\n\n    def get_user_channel(self, input_channel, input_type=None):\n        \"\"\"Returns the main raw channel for the process\n\n        Provided with at least a channel name, this method returns the raw\n        channel name and specification (the nextflow string definition)\n        for the process. By default, it will fork from the raw input of\n        the process' :attr:`~Process.input_type` attribute. However, this\n        behaviour can be overridden by providing the ``input_type`` argument.\n\n        If the specified or inferred input type exists in the\n        :attr:`~Process.RAW_MAPPING` dictionary, the channel info dictionary\n        will be retrieved along with the specified input channel. Otherwise,\n        it will return None.\n\n        An example of the returned dictionary is::\n\n             {\"input_channel\": \"myChannel\",\n             \"params\": \"fastq\",\n             \"channel\": \"IN_fastq_raw\",\n             \"channel_str\":\"IN_fastq_raw = Channel.fromFilePairs(params.fastq)\"\n            }\n\n        Returns\n        -------\n        dict or None\n            Dictionary with the complete raw channel info. 
None if no\n            channel is found.\n        \"\"\"\n\n        res = {\"input_channel\": input_channel}\n\n        itype = input_type if input_type else self.input_type\n\n        if itype in self.RAW_MAPPING:\n\n            channel_info = self.RAW_MAPPING[itype]\n\n            return {**res, **channel_info}\n\n    @staticmethod\n    def render(template, context):\n        \"\"\"Wrapper to the jinja2 render method from a template file\n\n        Parameters\n        ----------\n        template : str\n            Path to template file.\n        context : dict\n            Dictionary with kwargs context to populate the template\n        \"\"\"\n\n        path, filename = os.path.split(template)\n\n        return jinja2.Environment(\n            loader=jinja2.FileSystemLoader(path or './')\n        ).get_template(filename).render(context)\n\n    @property\n    def template_str(self):\n        \"\"\"Class property that returns a populated template string\n\n        This property allows the template of a particular process to be\n        dynamically generated and returned when doing ``Process.template_str``.\n\n        Returns\n        -------\n        x : str\n            String with the complete and populated process template\n\n        \"\"\"\n\n        if not self._context:\n            raise eh.ProcessError(\"Channels must be setup first using the \"\n                                  \"set_channels method\")\n\n        logger.debug(\"Setting context for template {}: {}\".format(\n            self.template, self._context\n        ))\n\n        x = self.render(self._template_path, self._context)\n        return x\n\n    def set_channels(self, **kwargs):\n        \"\"\" General purpose method that sets the main channels\n\n        This method will take a variable number of keyword arguments to\n        set the :py:attr:`Process._context` attribute with the information\n        on the main channels for the process. This is done by appending\n        the process ID (:py:attr:`Process.pid`) attribute to the input,\n        output and status channel prefix strings. In the output channel,\n        the process ID is incremented by 1 to allow the connection with the\n        channel in the next process.\n\n        The ``**kwargs`` system for setting the :py:attr:`Process._context`\n        attribute also provides additional flexibility. 
In this way,\n        individual processes can provide additional information not covered\n        in this method, without changing it.\n\n        Parameters\n        ----------\n        kwargs : dict\n            Dictionary with the keyword arguments for setting up the template\n            context\n        \"\"\"\n\n        if not self.pid:\n            self.pid = \"{}_{}\".format(self.lane, kwargs.get(\"pid\"))\n\n        for i in self.status_channels:\n            if i.startswith(\"STATUS_\"):\n                self.status_strs.append(\"{}_{}\".format(i, self.pid))\n            else:\n                self.status_strs.append(\"STATUS_{}_{}\".format(i, self.pid))\n\n        if self.main_forks:\n            logger.debug(\"Setting main fork channels: {}\".format(\n                self.main_forks))\n            operator = \"set\" if len(self.main_forks) == 1 else \"into\"\n            self.forks = [\"\\n{}.{}{{ {} }}\\n\".format(\n                self.output_channel, operator, \";\".join(self.main_forks))]\n\n        self._context = {**kwargs, **{\"input_channel\": self.input_channel,\n                                      \"output_channel\": self.output_channel,\n                                      \"template\": self.template,\n                                      \"forks\": \"\\n\".join(self.forks),\n                                      \"pid\": self.pid}}\n\n    def update_main_input(self, input_str):\n\n        self.input_channel = input_str\n        self._context[\"input_channel\"] = self.input_channel\n\n    def update_main_forks(self, sink):\n        \"\"\"Updates the forks attribute with the sink channel destination\n\n        Parameters\n        ----------\n        sink : str\n            Channel onto which the main input will be forked to\n\n        \"\"\"\n\n        if not self.main_forks:\n            self.main_forks = [self.output_channel]\n            self.output_channel = \"_{}\".format(self.output_channel)\n        self.main_forks.append(sink)\n\n        # fork_lst = self.forks + self.main_forks\n        operator = \"set\" if len(self.main_forks) == 1 else \"into\"\n        self.forks = [\"\\n{}.{}{{ {} }}\\n\".format(\n            self.output_channel, operator, \";\".join(self.main_forks))]\n\n        self._context = {**self._context,\n                         **{\"forks\": \"\".join(self.forks),\n                            \"output_channel\": self.output_channel}}\n\n    def set_secondary_channel(self, source, channel_list):\n        \"\"\" General purpose method for setting a secondary channel\n\n        This method allows a given source channel to be forked into one or\n        more channels and sets those forks in the :py:attr:`Process.forks`\n        attribute. Both the source and the channels in the ``channel_list``\n        argument must be the final channel strings,  which means that this\n        method should be called only after setting the main channels.\n\n        If the source is not a main channel, this will simply create a fork\n        or set for every channel in the ``channel_list`` argument list::\n\n            SOURCE_CHANNEL_1.into{SINK_1;SINK_2}\n\n        If the source is a main channel, this will apply some changes to\n        the output channel of the process, to avoid overlapping main output\n        channels.  For instance, forking the main output channel for process\n        2 would create a ``MAIN_2.into{...}``. 
The issue here is that the\n        ``MAIN_2`` channel is expected as the input of the next process, but\n        now is being used to create the fork. To solve this issue, the output\n        channel is modified into ``_MAIN_2``, and the fork is set to\n        the channels provided channels plus the ``MAIN_2`` channel::\n\n            _MAIN_2.into{MAIN_2;MAIN_5;...}\n\n        Parameters\n        ----------\n        source : str\n            String with the name of the source channel\n        channel_list : list\n            List of channels that will receive a fork of the secondary\n            channel\n        \"\"\"\n\n        logger.debug(\"Setting secondary channel for source '{}': {}\".format(\n            source, channel_list))\n\n        source = \"{}_{}\".format(source, self.pid)\n\n        # Removes possible duplicate channels, when the fork is terminal\n        channel_list = sorted(list(set(channel_list)))\n\n        # When there is only one channel to fork into, use the 'set' operator\n        # instead of 'into'\n        op = \"set\" if len(channel_list) == 1 else \"into\"\n        self.forks.append(\"\\n{}.{}{{ {} }}\\n\".format(\n            source, op, \";\".join(channel_list)))\n\n        logger.debug(\"Setting forks attribute to: {}\".format(self.forks))\n        self._context = {**self._context, **{\"forks\": \"\\n\".join(self.forks)}}\n\n    def update_attributes(self, attr_dict):\n        \"\"\"Updates the directives attribute from a dictionary object.\n\n        This will only update the directives for processes that have been\n        defined in the subclass.\n\n        Parameters\n        ----------\n        attr_dict : dict\n            Dictionary containing the attributes that will be used to update\n            the process attributes and/or directives.\n\n        \"\"\"\n\n        # Update directives\n        # Allowed attributes to write\n        valid_directives = [\"pid\", \"ignore_type\", \"ignore_pid\", \"extra_input\",\n                            \"group\", \"input_type\"]\n\n        for attribute, val in attr_dict.items():\n\n            # If the attribute has a valid directive key, update that\n            # directive\n            if attribute in valid_directives and hasattr(self, attribute):\n                setattr(self, attribute, val)\n\n            # The params attribute is special, in the sense that it provides\n            # information for the self.params attribute.\n            elif attribute == \"params\":\n                for name, value in val.items():\n                    if name in self.params:\n                        self.params[name][\"default\"] = value\n                    else:\n                        raise eh.ProcessError(\n                            \"The parameter name '{}' does not exist for \"\n                            \"component '{}'\".format(name, self.template))\n\n            else:\n                for p in self.directives:\n                    self.directives[p][attribute] = val\n\n\nclass Compiler(Process):\n    \"\"\"Extends the Process methods to status-type processes\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.ignore_type = True\n        self.link_start = None\n\n    def set_compiler_channels(self, channel_list, operator=\"mix\"):\n        \"\"\"General method for setting the input channels for the status process\n\n        Given a list of status channels that are gathered during the pipeline\n        construction, this method will automatically set the 
input channel\n        for the status process. This makes use of the ``mix`` channel operator\n        of nextflow for multiple channels::\n\n            STATUS_1.mix(STATUS_2,STATUS_3,...)\n\n        This will set the ``compile_channels`` key for the ``_context``\n        attribute of the process.\n\n        Parameters\n        ----------\n        channel_list : list\n            List of strings with the final name of the status channels\n        operator : str\n            Specifies the operator used to join the compiler channels.\n            Available options are 'mix' and 'join'.\n        \"\"\"\n\n        if not channel_list:\n            raise eh.ProcessError(\"At least one status channel must be \"\n                                  \"provided to include this process in the \"\n                                  \"pipeline\")\n\n        if len(channel_list) == 1:\n            logger.debug(\"Setting only one status channel: {}\".format(\n                channel_list[0]))\n            self._context = {\"compile_channels\": channel_list[0]}\n\n        else:\n\n            first_status = channel_list[0]\n\n            if operator == \"mix\":\n                lst = \",\".join(channel_list[1:])\n\n                s = \"{}.mix({})\".format(first_status, lst)\n\n            elif operator == \"join\":\n\n                s = first_status\n                for ch in channel_list[1:]:\n                    s += \".join({})\".format(ch)\n\n                s += \".map{ ot -> [ ot[0], ot[1..-1] ] }\"\n\n            logger.debug(\"Status channel string: {}\".format(s))\n\n            self._context = {\"compile_channels\": s}\n\n\nclass Init(Process):\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n        self.input_type = None\n        self.output_type = \"raw\"\n\n        self.status_channels = []\n\n    def set_raw_inputs(self, raw_input):\n        \"\"\"Sets the main input channels of the pipeline and their forks.\n\n        The ``raw_input`` dictionary input should contain one entry for each\n        input type (fastq, fasta, etc). The corresponding value should be a\n        dictionary/json with the following key:value pairs:\n\n        - ``channel``: Name of the raw input channel (e.g.: channel1)\n        - ``channel_str``: The nextflow definition of the channel and\n           eventual checks (e.g.: channel1 = Channel.fromPath(param))\n        - ``raw_forks``: A list of channels to which the channel will\n          be forked.\n\n        Each new type of input parameter is automatically added to the\n        :attr:`params` attribute, so that it is automatically collected\n        for the pipeline description and help.\n\n        Parameters\n        ----------\n        raw_input : dict\n            Contains an entry for each input type with the channel name,\n            channel string and forks.\n        \"\"\"\n\n        logger.debug(\"Setting raw inputs using raw input dict: {}\".format(\n            raw_input))\n\n        primary_inputs = []\n\n        for input_type, el in raw_input.items():\n\n            primary_inputs.append(el[\"channel_str\"])\n\n            # Update the process' parameters with the raw input\n            raw_channel = self.RAW_MAPPING[input_type]\n            self.params[input_type] = {\n                \"default\": raw_channel[\"default_value\"],\n                \"description\": raw_channel[\"description\"]\n            }\n\n            op = \"set\" if len(el[\"raw_forks\"]) == 1 else \"into\"\n\n            self.forks.append(\"\\n{}.{}{{ {} }}\\n\".format(\n                el[\"channel\"], op, \";\".join(el[\"raw_forks\"])\n            ))\n\n        logger.debug(\"Setting raw inputs: {}\".format(primary_inputs))\n        logger.debug(\"Setting forks attribute to: {}\".format(self.forks))\n        self._context = {**self._context,\n                         **{\"forks\": \"\\n\".join(self.forks),\n                            \"main_inputs\": \"\\n\".join(primary_inputs)}}\n\n    def set_secondary_inputs(self, channel_dict):\n        \"\"\" Adds secondary inputs to the start of the pipeline.\n\n        These channels are inserted into the pipeline file as they are\n        provided in the values of the argument.\n\n        Parameters\n        ----------\n        channel_dict : dict\n            Each entry should be <parameter>: <channel string>.\n        \"\"\"\n\n        logger.debug(\"Setting secondary inputs: {}\".format(channel_dict))\n\n        secondary_input_str = \"\\n\".join(list(channel_dict.values()))\n        self._context = {**self._context,\n                         **{\"secondary_inputs\": secondary_input_str}}\n\n    def set_extra_inputs(self, channel_dict):\n        \"\"\"Sets the initial definition of the extra input channels.\n\n        The ``channel_dict`` argument should contain the input type and\n        destination channel of each parameter (which is the key)::\n\n            channel_dict = {\n                \"param1\": {\n                    \"input_type\": \"fasta\",\n                    \"channels\": [\"abricate_2_3\", \"chewbbaca_3_4\"]\n                }\n            }\n\n        Parameters\n        ----------\n        channel_dict : dict\n            Dictionary with the extra_input parameter as key, and a dictionary\n            as a value with the input_type and destination channels\n        \"\"\"\n\n        extra_inputs = []\n\n        for param, info in channel_dict.items():\n\n            # Update the process' parameters with the raw input\n            raw_channel = self.RAW_MAPPING[info[\"input_type\"]]\n            self.params[param] = {\n               
 \"default\": raw_channel[\"default_value\"],\n                \"description\": raw_channel[\"description\"]\n            }\n\n            channel_name = \"IN_{}_extraInput\".format(param)\n            channel_str = self.RAW_MAPPING[info[\"input_type\"]][\"channel_str\"]\n            extra_inputs.append(\"{} = {}\".format(channel_name,\n                                                 channel_str.format(param)))\n\n            op = \"set\" if len(info[\"channels\"]) == 1 else \"into\"\n            extra_inputs.append(\"{}.{}{{ {} }}\".format(\n                channel_name, op, \";\".join(info[\"channels\"])))\n\n        self._context = {\n            **self._context,\n            **{\"extra_inputs\": \"\\n\".join(extra_inputs)}\n        }\n\n\nclass StatusCompiler(Compiler):\n    \"\"\"Status compiler process template interface\n\n    This special process receives the status channels from all processes\n    in the generated pipeline.\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n\nclass ReportCompiler(Compiler):\n    \"\"\"Reports compiler process template interface\n\n    This special process receives the report channels from all processes\n    in the generated pipeline.\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n\n\nclass PatlasConsensus(Compiler):\n    \"\"\"Patlas consensus compiler process template interface\n\n    This special process receives the channels associated with the\n    ``patlas_consensus`` key.\n    \"\"\"\n\n    def __init__(self, **kwargs):\n\n        super().__init__(**kwargs)\n"
  },
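The fork strings built by ``Process.set_secondary_channel`` above follow a simple rule: a single sink channel uses nextflow's ``set`` operator, while multiple sinks use ``into``. A minimal standalone sketch of that string construction (not part of the repository; the channel names below are hypothetical examples of the ``<template>_<lane>_<pid>`` naming used by the generator)::

    def fork_string(source, sinks):
        # Mirrors set_secondary_channel: duplicates removed, sinks sorted,
        # joined with ';' and wrapped in the set/into operator call.
        sinks = sorted(set(sinks))
        op = "set" if len(sinks) == 1 else "into"
        return "\n{}.{}{{ {} }}\n".format(source, op, ";".join(sinks))

    fork_string("SIDE_phred_1_2", ["fastqc_trimmomatic_1_3"])
    # -> '\nSIDE_phred_1_2.set{ fastqc_trimmomatic_1_3 }\n'
    fork_string("_spades_out_1_4", ["spades_out_1_4", "assembly_mapping_in_1_5"])
    # -> '\n_spades_out_1_4.into{ assembly_mapping_in_1_5;spades_out_1_4 }\n'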
  {
    "path": "flowcraft/generator/process_collector.py",
    "content": "import re\nimport pkgutil\n\ntry:\n    from generator import components\nexcept ImportError:\n    from flowcraft.generator import components\n\n\ndef convert_camel_case(name):\n    \"\"\"Convers a CamelCase string into a snake_case one\n\n    Parameters\n    ----------\n    name : str\n        An arbitrary string that may be CamelCase\n\n    Returns\n    -------\n    str\n        The input string converted into snake_case\n\n    \"\"\"\n    s1 = re.sub('(.)([A-Z][a-z]+)', r'\\1_\\2', name)\n    return re.sub('([a-z0-9])([A-Z])', r'\\1_\\2', s1).lower()\n\n\ndef collect_process_map():\n    \"\"\"Collects Process classes and return dict mapping templates to classes\n\n    This function crawls through the components module and retrieves all\n    classes that inherit from the Process class. Then, it converts the name\n    of the classes (which should be CamelCase) to snake_case, which is used\n    as the template name.\n\n    Returns\n    -------\n    dict\n        Dictionary mapping the template name (snake_case) to the corresponding\n        process class.\n    \"\"\"\n\n    process_map = {}\n\n    prefix = \"{}.\".format(components.__name__)\n    for importer, modname, _ in pkgutil.iter_modules(components.__path__,\n                                                     prefix):\n\n        _module = importer.find_module(modname).load_module(modname)\n\n        _component_classes = [\n            cls for cls in _module.__dict__.values() if\n            isinstance(cls, type) and cls.__name__ != \"Process\"\n        ]\n\n        for cls in _component_classes:\n            process_map[convert_camel_case(cls.__name__)] = cls\n\n    return process_map\n"
  },
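A standalone illustration (not part of the repository) of the CamelCase to snake_case conversion performed by ``convert_camel_case`` above, assuming component classes named ``IntegrityCoverage`` and ``FastQC``, consistent with the template names used elsewhere in this repository::

    import re

    def convert_camel_case(name):
        # Same two substitutions as in process_collector.convert_camel_case
        s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
        return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()

    assert convert_camel_case("IntegrityCoverage") == "integrity_coverage"
    assert convert_camel_case("FastQC") == "fast_qc"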
  {
    "path": "flowcraft/generator/process_details.py",
    "content": "import logging\nimport sys\n\nlogger = logging.getLogger(\"main.{}\".format(__name__))\n\nCOLORS = {\n    \"green_bold\": \"1;32m\",\n    \"red_bold\": \"1;31m\",\n    \"white\": \"0;38m\",\n    \"white_bold\": \"1;38m\",\n    \"white_underline\": \"4;38m\",\n    \"blue_bold\": \"1;36m\",\n    \"purple_bold\": \"1;34m\",\n    \"yellow_bold\": \"1;93m\"\n}\n\n\ndef colored_print(msg, color_label=\"white_bold\"):\n    \"\"\"\n    This function enables users to add a color to the print. It also enables\n    to pass end_char to print allowing to print several strings in the same line\n    in different prints.\n\n    Parameters\n    ----------\n    color_string: str\n        The color code to pass to the function, which enables color change as\n        well as background color change.\n    msg: str\n        The actual text to be printed\n    end_char: str\n        The character in which each print should finish. By default it will be\n        \"\\n\".\n\n    \"\"\"\n\n    if sys.stdout.encoding != \"UTF-8\":\n        msg = \"\".join([i if ord(i) < 128 else \"\" for i in msg])\n\n    # try except first looks for the color in COLORS dictionary, otherwise use\n    # color_label as the color.\n    try:\n        col = COLORS[color_label]\n    except KeyError:\n        col = color_label\n\n    return \"\\x1b[{}{}\\x1b[0m\".format(col, msg)\n\n\ndef procs_dict_parser(procs_dict):\n    \"\"\"\n    This function handles the dictionary of attributes of each Process class\n    to print to stdout lists of all the components or the components which the\n    user specifies in the -t flag.\n\n    Parameters\n    ----------\n    procs_dict: dict\n        A dictionary with the class attributes for all the components (or\n        components that are used by the -t flag), that allow to create\n        both the short_list and detailed_list. 
Dictionary example:\n        {\"abyss\": {'input_type': 'fastq', 'output_type': 'fasta',\n        'dependencies': [], 'directives': {'abyss': {'cpus': 4,\n        'memory': '{ 5.GB * task.attempt }', 'container': 'flowcraft/abyss',\n        'version': '2.1.1', 'scratch': 'true'}}}\n    \"\"\"\n\n    logger.info(colored_print(\n        \"\\n===== L I S T   O F   P R O C E S S E S =====\\n\", \"green_bold\"))\n\n    #Sort to print alphabetically ordered list of processes to ease reading\n    procs_dict_ordered = {k: procs_dict[k] for k in sorted(procs_dict)}\n\n    for template, dict_proc_info in procs_dict_ordered.items():\n        template_str = \"=> {}\".format(template)\n        logger.info(colored_print(template_str, \"blue_bold\"))\n\n        for info in dict_proc_info:\n            info_str = \"{}:\".format(info)\n\n            if isinstance(dict_proc_info[info], list):\n                if not dict_proc_info[info]:\n                    arg_msg = \"None\"\n                else:\n                    arg_msg = \", \".join(dict_proc_info[info])\n            elif info == \"directives\":\n                # this is used for the \"directives\", which is a dict\n                if not dict_proc_info[info]:\n                    # if dict is empty then add None to the message\n                    arg_msg = \"None\"\n                else:\n                    # otherwise fetch all template names within a component\n                    # and all the directives for each template to a list\n                    list_msg = [\"\\n      {}: {}\".format(\n                        templt,\n                        \" , \".join([\"{}: {}\".format(dr, val)\n                                    for dr, val in drs.items()]))\n                                for templt, drs in dict_proc_info[info].items()\n                    ]\n                    # write list to a str\n                    arg_msg = \"\".join(list_msg)\n            else:\n                arg_msg = dict_proc_info[info]\n\n            logger.info(\"   {} {}\".format(\n                colored_print(info_str, \"white_underline\"), arg_msg\n            ))\n\n\ndef proc_collector(process_map, args, pipeline_string):\n    \"\"\"\n    Function that collects all processes available and stores a dictionary of\n    the required arguments of each process class to be passed to\n    procs_dict_parser\n\n    Parameters\n    ----------\n    process_map: dict\n        The dictionary with the Processes currently available in flowcraft\n        and their corresponding classes as values\n    args: argparse.Namespace\n        The arguments passed through argparser that will be access to check the\n        type of list to be printed\n    pipeline_string: str\n        the pipeline string\n\n    \"\"\"\n\n    arguments_list = []\n\n    # prints a detailed list of the process class arguments\n    if args.detailed_list:\n        # list of attributes to be passed to proc_collector\n        arguments_list += [\n            \"input_type\",\n            \"output_type\",\n            \"description\",\n            \"dependencies\",\n            \"conflicts\",\n            \"directives\"\n        ]\n\n    # prints a short list with each process and the corresponding description\n    if args.short_list:\n        arguments_list += [\n            \"description\"\n        ]\n\n    if arguments_list:\n        # dict to store only the required entries\n        procs_dict = {}\n        # loops between all process_map Processes\n        for name, cls in process_map.items():\n\n         
   # instantiates each Process class\n            cls_inst = cls(template=name)\n\n            # checks if recipe is provided\n            if pipeline_string:\n                if name not in pipeline_string:\n                    continue\n\n            d = {arg_key: vars(cls_inst)[arg_key] for arg_key in\n                 vars(cls_inst) if arg_key in arguments_list}\n            procs_dict[name] = d\n\n        procs_dict_parser(procs_dict)\n\n        sys.exit(0)\n"
  },
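For reference, ``colored_print`` above only wraps the message in an ANSI SGR escape sequence. A minimal standalone sketch of the equivalent string construction, using the ``green_bold`` code from the ``COLORS`` dictionary (the message text is an arbitrary example)::

    # "1;32m" is COLORS["green_bold"]; "\x1b[0m" resets the terminal style
    colored = "\x1b[{}{}\x1b[0m".format("1;32m", "=> integrity_coverage")
    print(colored)  # renders in bold green on ANSI-capable terminals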
  {
    "path": "flowcraft/generator/recipe.py",
    "content": "try:\n    from generator.process_details import colored_print\n    import generator.error_handling as eh\n    from generator import recipes\nexcept ImportError:\n    from flowcraft.generator.process_details import colored_print\n    import flowcraft.generator.error_handling as eh\n    from flowcraft.generator import recipes\n\nfrom collections import OrderedDict\nimport sys\nimport json\nimport logging\nimport pkgutil\n\nlogger = logging.getLogger(\"main.{}\".format(__name__))\n\n\nclass InnuendoRecipe:\n\n    def __init__(self):\n        \"\"\"Class to build automatic pipelines based on the processes provided.\n\n        This class provides the methods to build the most eficient pipeline\n        based on the processes provided. It automatic creates the\n        flowcraft pipeline string based on the relationships between the\n        possible processes.\n\n        \"\"\"\n\n        self.count_forks = 0\n        \"\"\"\n        int : counts the total possible number of forks\n        \"\"\"\n\n        self.forks = []\n        \"\"\"\n        list : a list with all the possible forks\n        \"\"\"\n\n        self.pipeline_string = \"\"\n        \"\"\"\n        str : the generated pipeline string\n        \"\"\"\n\n        self.process_to_id = {}\n        \"\"\"\n        dict: key value between the process name and its identifier\n        \"\"\"\n\n        self.process_descriptions = {}\n\n    @staticmethod\n    def validate_pipeline(pipeline_string):\n        \"\"\"Validate pipeline string\n\n        Validates the pipeline string by searching for forbidden characters\n\n        Parameters\n        ----------\n        pipeline_string : str\n            STring with the processes provided\n\n        Returns\n        -------\n\n        \"\"\"\n        if \"(\" in pipeline_string or \")\" in pipeline_string or \"|\" in \\\n                pipeline_string:\n            logger.error(\n                colored_print(\"Please provide a valid task list!\", \"red_bold\")\n            )\n            return False\n\n        return True\n\n    def build_upstream(self, process_descriptions, task, all_tasks,\n                       task_pipeline,\n                       count_forks, total_tasks, forks):\n        \"\"\"Builds the upstream pipeline of the current process\n\n        Checks for the upstream processes to the current process and\n        adds them to the current pipeline fragment if they were provided in\n        the process list.\n\n        Parameters\n        ----------\n        process_descriptions : dict\n            Information of processes input, output and if is forkable\n        task : str\n            Current process\n        all_tasks : list\n            A list of all provided processes\n        task_pipeline : list\n            Current pipeline fragment\n        count_forks : int\n            Current number of forks\n        total_tasks : str\n            All space separated processes\n        forks : list\n            Current forks\n        Returns\n        -------\n        list : resulting pipeline fragment\n        \"\"\"\n        if task in process_descriptions:\n            if process_descriptions[task][1] is not None:\n                if len(process_descriptions[task][1].split(\"|\")) > 1:\n                    local_forks = process_descriptions[task][1].split(\"|\")\n\n                    # Produces a new pipeline fragment for each forkable\n                    #  process\n                    for local_fork in local_forks:\n                        if local_fork 
in total_tasks:\n                            count_forks += 1\n                            task_pipeline.insert(\n                                0,\n                                process_descriptions[task][1]\n                            )\n                            self.define_pipeline_string(\n                                process_descriptions,\n                                local_fork,\n                                False,\n                                True,\n                                count_forks,\n                                total_tasks,\n                                forks\n                            )\n\n                    return task_pipeline\n                else:\n                    # Adds the process to the pipeline fragment in case it is\n                    # provided in the task list\n                    if process_descriptions[task][1] in total_tasks:\n                        task_pipeline.insert(\n                            0,\n                            process_descriptions[task][1].split(\"|\")[0]\n                        )\n\n                        # Proceeds building upstream until the input for a\n                        # process is None\n                        self.build_upstream(\n                            process_descriptions,\n                            process_descriptions[task][1].split(\"|\")[0],\n                            all_tasks,\n                            task_pipeline,\n                            count_forks,\n                            total_tasks,\n                            forks\n                        )\n                    else:\n                        logger.error(\n                            colored_print(\"{} not in provided protocols as \"\n                                          \"input for {}\".format(\n                                process_descriptions[task][1], task), \"red_bold\"\n                            )\n                        )\n\n                        sys.exit()\n\n                    return task_pipeline\n            else:\n                return task_pipeline\n\n    def build_downstream(self, process_descriptions, task, all_tasks,\n                         task_pipeline,\n                         count_forks, total_tasks, forks):\n        \"\"\"Builds the downstream pipeline of the current process\n\n        Checks for the downstream processes to the current process and\n        adds them to the current pipeline fragment.\n\n        Parameters\n        ----------\n        process_descriptions : dict\n            Information of processes input, output and if is forkable\n        task : str\n            Current process\n        all_tasks : list\n            A list of all provided processes\n        task_pipeline : list\n            Current pipeline fragment\n        count_forks : int\n            Current number of forks\n        total_tasks : str\n            All space separated processes\n        forks : list\n            Current forks\n        Returns\n        -------\n        list : resulting pipeline fragment\n        \"\"\"\n\n        if task in process_descriptions:\n            if process_descriptions[task][2] is not None:\n                if len(process_descriptions[task][2].split(\"|\")) > 1:\n                    local_forks = process_descriptions[task][2].split(\"|\")\n\n                    # Adds the process to the pipeline fragment downstream\n                    # and defines a new pipeline fragment for each fork.\n                    # Those will only look for 
downstream processes\n                    for local_fork in local_forks:\n                        if local_fork in total_tasks:\n                            count_forks += 1\n                            task_pipeline.append(process_descriptions[task][2])\n                            self.define_pipeline_string(\n                                process_descriptions,\n                                local_fork,\n                                False,\n                                True,\n                                count_forks,\n                                total_tasks,\n                                forks\n                            )\n\n                    return task_pipeline\n                else:\n                    if process_descriptions[task][2] in total_tasks:\n                        task_pipeline.append(process_descriptions[task][2].split(\"|\")[0])\n\n                        # Proceeds building downstream until the output for a\n                        # process is None\n                        self.build_downstream(\n                            process_descriptions,\n                            process_descriptions[task][2].split(\"|\")[0],\n                            all_tasks,\n                            task_pipeline,\n                            count_forks,\n                            total_tasks,\n                            forks\n                        )\n\n                    return task_pipeline\n            else:\n                return task_pipeline\n\n    def define_pipeline_string(self, process_descriptions, tasks,\n                               check_upstream,\n                               check_downstream, count_forks, total_tasks,\n                               forks):\n        \"\"\"Builds the possible forks and connections between the provided\n        processes\n\n        This method loops through all the provided tasks and builds the\n        upstream and downstream pipeline if required. 
It then returns all\n        possible forks than need to be merged à posteriori`\n\n        Parameters\n        ----------\n        process_descriptions : dict\n            Information of processes input, output and if is forkable\n        tasks : str\n            Space separated processes\n        check_upstream : bool\n            If is to build the upstream pipeline of the current task\n        check_downstream : bool\n            If is to build the downstream pipeline of the current task\n        count_forks : int\n            Number of current forks\n        total_tasks : str\n            All space separated processes\n        forks : list\n            Current forks\n\n        Returns\n        -------\n        list : List with all the possible pipeline forks\n        \"\"\"\n\n        tasks_array = tasks.split()\n\n        for task_unsplit in tasks_array:\n            task = task_unsplit.split(\"=\")[0]\n\n            if task not in process_descriptions.keys():\n                logger.error(\n                    colored_print(\n                        \"{} not in the possible processes\".format(task),\n                        \"red_bold\"\n                    )\n                )\n\n                sys.exit()\n            else:\n                process_split = task_unsplit.split(\"=\")\n\n                if len(process_split) > 1:\n                    self.process_to_id[process_split[0]] = process_split[1]\n\n            # Only uses the process if it is not already in the possible forks\n            if not bool([x for x in forks if task in x]) and not bool([y for y in forks if process_descriptions[task][2] in y]):\n                task_pipeline = []\n\n                if task in process_descriptions:\n\n                    if check_upstream:\n                        task_pipeline = self.build_upstream(\n                            process_descriptions,\n                            task,\n                            tasks_array,\n                            task_pipeline,\n                            count_forks,\n                            total_tasks,\n                            forks\n                        )\n\n                    task_pipeline.append(task)\n\n                    if check_downstream:\n                        task_pipeline = self.build_downstream(\n                            process_descriptions,\n                            task,\n                            tasks_array,\n                            task_pipeline,\n                            count_forks,\n                            total_tasks,\n                            forks\n                        )\n\n                # Adds the pipeline fragment to the list of possible forks\n                forks.append(list(OrderedDict.fromkeys(task_pipeline)))\n\n            # Checks for task in fork. 
Case order of input processes is reversed\n            elif bool([y for y in forks if process_descriptions[task][2] in y]):\n                for fork in forks:\n                    if task not in fork:\n                        try:\n                            dependent_index = fork.index(process_descriptions[task][2])\n                            fork.insert(dependent_index, task)\n                        except ValueError:\n                            continue\n\n        for i in range(0, len(forks)):\n            for j in range(0, len(forks[i])):\n                try:\n                    if len(forks[i][j].split(\"|\")) > 1:\n                        forks[i][j] = forks[i][j].split(\"|\")\n                        tmp_fork = []\n                        for s in forks[i][j]:\n                            if s in total_tasks:\n                                tmp_fork.append(s)\n\n                        forks[i][j] = tmp_fork\n\n                except AttributeError as e:\n                    continue\n\n        return forks\n\n    def build_pipeline_string(self, forks):\n        \"\"\"Parses, filters and merge all possible pipeline forks into the\n        final pipeline string\n\n        This method checks for shared start and end sections between forks\n        and merges them according to the shared processes::\n\n            [[spades, ...], [skesa, ...], [...,[spades, skesa]]]\n                -> [..., [[spades, ...], [skesa, ...]]]\n\n        Then it defines the pipeline string by replacing the arrays levels\n        to the flowcraft fork format::\n\n            [..., [[spades, ...], [skesa, ...]]]\n                -> ( ... ( spades ... | skesa ... ) )\n\n        Parameters\n        ----------\n        forks : list\n            List with all the possible pipeline forks.\n\n        Returns\n        -------\n        str : String with the pipeline definition used as input for\n        parse_pipeline\n        \"\"\"\n\n        final_forks = []\n\n        for i in range(0, len(forks)):\n            needs_merge = [False, 0, 0, 0, 0, \"\"]\n            is_merged = False\n            for i2 in range(0, len(forks[i])):\n                for j in range(i, len(forks)):\n                    needs_merge[0] = False\n                    for j2 in range(0, len(forks[j])):\n                        try:\n                            j2_fork = forks[j][j2].split(\"|\")\n                        except AttributeError:\n                            j2_fork = forks[j][j2]\n\n                        # Gets the indexes of the forks matrix that need to\n                        # be merged\n                        if forks[i][i2] in j2_fork and (i2 == 0 or j2 == 0) and i != j:\n                            needs_merge[0] = True\n                            needs_merge[1] = i\n                            needs_merge[2] = i2\n                            needs_merge[3] = j\n                            needs_merge[4] = j2\n                            needs_merge[5] = forks[i][i2]\n\n                    if needs_merge[0]:\n                        index_merge_point = forks[needs_merge[3]][-1].index(needs_merge[5])\n\n                        # Merges the forks. 
If only one fork is possible,\n                        # that fork is neglected and it merges into a single\n                        # channel.\n                        if needs_merge[2] == 0:\n                            if len(forks[needs_merge[3]][-1]) < 2:\n                                forks[needs_merge[3]] = forks[needs_merge[3]][:-1] + forks[needs_merge[1]][::]\n                            else:\n                                forks[needs_merge[3]][-1][index_merge_point] = forks[needs_merge[1]]\n\n                        elif needs_merge[4] == 0:\n                            if len(forks[needs_merge[3]][-1]) < 2:\n                                forks[needs_merge[3]] = forks[needs_merge[3]][:-1] + forks[needs_merge[1]][::]\n                            else:\n                                forks[needs_merge[3]][-1][index_merge_point] = forks[needs_merge[1]]\n\n                        is_merged = True\n\n            # Adds forks that dont need merge to the final forks\n            if needs_merge[0] is not None and not is_merged:\n                if bool([nf for nf in forks[i] if \"|\" in nf]):\n                    continue\n                final_forks.append(forks[i])\n\n        if len(final_forks) == 1:\n            final_forks = str(final_forks[0])\n\n        # parses the string array to the flowcraft nomenclature\n        pipeline_string = \" \" + str(final_forks)\\\n            .replace(\"[[\", \"( \")\\\n            .replace(\"]]\", \" )\")\\\n            .replace(\"]\", \" |\")\\\n            .replace(\", [\", \" \")\\\n            .replace(\"'\", \"\")\\\n            .replace(\",\", \"\")\\\n            .replace(\"[\", \"\")\n\n        if pipeline_string[-1] == \"|\":\n            pipeline_string = pipeline_string[:-1]\n\n        to_search = \" {} \"\n        to_replace = \" {}={} \"\n\n        # Replace only names by names + process ids\n        for key, val in self.process_to_id.items():\n            # Case only one process in the pipeline\n            pipeline_string = pipeline_string\\\n                .replace(to_search.format(key),\n                         to_replace.format(key, val))\n\n        return pipeline_string\n\n    def run_auto_pipeline(self, tasks):\n        \"\"\"Main method to run the automatic pipeline creation\n\n        This method aggregates the functions required to build the pipeline\n        string that can be used as input for the workflow generator.\n\n        Parameters\n        ----------\n        tasks : str\n            A string with the space separated tasks to be included in the\n            pipeline\n\n        Returns\n        -------\n        str : String with the pipeline definition used as input for\n        parse_pipeline\n        \"\"\"\n\n        self.forks = self.define_pipeline_string(\n            self.process_descriptions,\n            tasks,\n            True,\n            True,\n            self.count_forks,\n            tasks,\n            self.forks\n        )\n\n        self.pipeline_string = self.build_pipeline_string(self.forks)\n\n        return self.pipeline_string\n\n    # def get_process_info(self):\n    #     return list(self.process_descriptions.keys())\n\n\nclass Innuendo(InnuendoRecipe):\n    \"\"\"\n    Recipe class for the INNUENDO Project. 
It has all the available in the\n    platform for quick use of the processes in the scope of the project.\n    \"\"\"\n\n    def __init__(self, *args, **kwargs):\n\n        super().__init__(*args, **kwargs)\n\n        # The description of the processes\n        # [forkable, input_process, output_process]\n        self.process_descriptions = {\n            \"reads_download\": [False, None,\"integrity_coverage|seq_typing|patho_typing\"],\n            \"patho_typing\": [True, None, None],\n            \"seq_typing\": [True, None, None],\n            \"integrity_coverage\": [True, None, \"fastqc_trimmomatic\"],\n            \"fastqc_trimmomatic\": [False, \"integrity_coverage\",\n                                   \"true_coverage\"],\n            \"true_coverage\": [False, \"fastqc_trimmomatic\",\n                              \"fastqc\"],\n            \"fastqc\": [False, \"true_coverage\", \"check_coverage\"],\n            \"check_coverage\": [False, \"fastqc\", \"spades|skesa\"],\n            \"spades\": [False, \"fastqc_trimmomatic\", \"process_spades\"],\n            \"skesa\": [False, \"fastqc_trimmomatic\", \"process_skesa\"],\n            \"process_spades\": [False, \"spades\", \"assembly_mapping\"],\n            \"process_skesa\": [False, \"skesa\", \"assembly_mapping\"],\n            \"assembly_mapping\": [False, \"process_spades\", \"pilon\"],\n            \"pilon\": [False, \"assembly_mapping\", \"mlst\"],\n            \"mlst\": [False, \"pilon\", \"abricate|prokka|chewbbaca|sistr\"],\n            \"sistr\": [True, \"mlst\", None],\n            \"abricate\": [True, \"mlst\", None],\n            #\"prokka\": [True, \"mlst\", None],\n            \"chewbbaca\": [True, \"mlst\", None]\n        }\n\n\ndef brew_innuendo(args):\n    \"\"\"Brews a given list of processes according to the recipe\n\n    Parameters\n    ----------\n    args : argparse.Namespace\n        The arguments passed through argparser that will be used to check the\n        the recipe, tasks and brew the process\n\n    Returns\n    -------\n    str\n        The final pipeline string, ready for the engine.\n    list\n        List of process strings.\n    \"\"\"\n\n    # Create recipe class instance\n    automatic_pipeline = Innuendo()\n\n    if not args.tasks:\n        input_processes = \" \".join(\n            automatic_pipeline.process_descriptions.keys())\n    else:\n        input_processes = args.tasks\n\n    # Validate the provided pipeline processes\n    validated = automatic_pipeline.validate_pipeline(input_processes)\n    if not validated:\n        sys.exit(1)\n    # Get the final pipeline string\n    pipeline_string = automatic_pipeline.run_auto_pipeline(input_processes)\n\n    return pipeline_string\n\n\nclass Recipe:\n\n    def __init__(self):\n\n        self.pipeline_str = None\n        \"\"\"\n        str: The raw pipeline string, with no attribute or directives, except\n        for number indicators for when there are duplicate components.\n        \n        e.g.: \"fastqc trimmomatic spades\"\n        e.g.: \"fastqc trimmomatic (spades#1 | spades#2)\n        \"\"\"\n\n        self.directives = {}\n        \"\"\"\n        dict: Dictionary with the parameters and directives for each component\n        in the pipeline_str attribute. Missing components will be left with\n        the default parameters and directives. 
\n        \"\"\"\n\n    def brew(self):\n\n        if not hasattr(self, \"name\"):\n            raise eh.RecipeError(\"Recipe class '{}' does not have a 'name' \"\n                                 \"attribute set\".format(self.__class__))\n\n        if not self.pipeline_str:\n            raise eh.RecipeError(\"Recipe with name '{}' does not have a \"\n                                 \"pipeline_str attribute set\".format(self.name))\n\n        for component, vals in self.directives.items():\n\n            params = vals.get(\"params\", None)\n            directives = vals.get(\"directives\", None)\n\n            # Check for component number symbol\n            if \"#\" in component:\n                _component = component.split(\"#\")[0]\n            else:\n                _component = component\n\n            component_str = self._get_component_str(_component, params,\n                                                    directives)\n\n            self.pipeline_str = self.pipeline_str.replace(component,\n                                                          component_str)\n\n        return self.pipeline_str\n\n    @staticmethod\n    def _get_component_str(component, params=None, directives=None):\n        \"\"\" Generates a component string based on the provided parameters and\n        directives\n\n        Parameters\n        ----------\n        component : str\n            Component name\n        params : dict\n            Dictionary with parameter information\n        directives : dict\n            Dictionary with directives information\n\n        Returns\n        -------\n        str\n            Component string with the parameters and directives, ready for\n            parsing by flowcraft engine\n        \"\"\"\n\n        final_directives = {}\n\n        if directives:\n            final_directives = directives\n\n        if params:\n            final_directives[\"params\"] = params\n\n        if final_directives:\n            return \"{}={}\".format(\n                component, json.dumps(final_directives, separators=(\",\", \":\")))\n        else:\n            return component\n\n\ndef brew_recipe(recipe_name):\n    \"\"\"Returns a pipeline string from a recipe name.\n\n    Parameters\n    ----------\n    recipe_name : str\n        Name of the recipe. 
Must match the name attribute in one of the classes\n        defined in :mod:`flowcraft.generator.recipes`\n\n    Returns\n    -------\n    str\n        Pipeline string ready for parsing and processing by flowcraft engine\n    \"\"\"\n\n    # This will iterate over all modules included in the recipes subpackage\n    # It will return the importer and the module name, along with the\n    # correct prefix\n    prefix = \"{}.\".format(recipes.__name__)\n    for importer, modname, _ in pkgutil.iter_modules(recipes.__path__, prefix):\n\n        # Import the current module\n        _module = importer.find_module(modname).load_module(modname)\n\n        # Fetch all available classes in module\n        _recipe_classes = [cls for cls in _module.__dict__.values() if\n                           isinstance(cls, type)]\n\n        # Iterate over each Recipe class, and check for a match with the\n        # provided recipe name.\n        for cls in _recipe_classes:\n            # Create instance of class to allow fetching the name attribute\n            recipe_cls = cls()\n            if getattr(recipe_cls, \"name\", None) == recipe_name:\n                return recipe_cls.brew()\n\n    logger.error(\n        colored_print(\"Recipe name '{}' does not exist.\".format(recipe_name))\n    )\n    sys.exit(1)\n\n\ndef list_recipes(full=False):\n    \"\"\"Method that iterates over all available recipes and prints their\n    information to the standard output\n\n    Parameters\n    ----------\n    full : bool\n        If True, it will print the pipeline string along with the recipe name\n    \"\"\"\n\n    logger.info(colored_print(\n        \"\\n===== L I S T   O F   R E C I P E S =====\\n\",\n        \"green_bold\"))\n\n    # This will iterate over all modules included in the recipes subpackage\n    # It will return the importer and the module name, along with the\n    # correct prefix\n    prefix = \"{}.\".format(recipes.__name__)\n    for importer, modname, _ in pkgutil.iter_modules(recipes.__path__, prefix):\n\n        # Import the current module\n        _module = importer.find_module(modname).load_module(modname)\n\n        # Fetch all available classes in module\n        _recipe_classes = [cls for cls in _module.__dict__.values() if\n                           isinstance(cls, type)]\n\n        # Iterate over each Recipe class and print its name (and, with the\n        # full option, its description and pipeline string).\n        for cls in _recipe_classes:\n\n            recipe_cls = cls()\n\n            if hasattr(recipe_cls, \"name\"):\n                logger.info(colored_print(\"=> {}\".format(recipe_cls.name), \"blue_bold\"))\n                if full:\n                    logger.info(colored_print(\"\\t {}\".format(recipe_cls.__doc__), \"purple_bold\"))\n                    logger.info(colored_print(\"Pipeline string: {}\\n\".format(recipe_cls.pipeline_str), \"yellow_bold\"))\n\n    sys.exit(0)\n"
  },
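``Recipe._get_component_str`` above serializes a component's parameters and directives as compact JSON appended to the component name, which is how ``brew`` rewrites ``pipeline_str``. A standalone sketch (not part of the repository) using the ``integrity_coverage`` parameters from the denim recipe defined below::

    import json

    # Parameters nested under a "params" key, as _get_component_str does
    directives = {"params": {"genomeSize": "0.012", "minCoverage": "15"}}
    component_str = "{}={}".format(
        "integrity_coverage", json.dumps(directives, separators=(",", ":")))
    # -> integrity_coverage={"params":{"genomeSize":"0.012","minCoverage":"15"}}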
  {
    "path": "flowcraft/generator/recipes/__init__.py",
    "content": ""
  },
  {
    "path": "flowcraft/generator/recipes/denim.py",
    "content": "try:\n    from generator.recipe import Recipe\nexcept ImportError:\n    from flowcraft.generator.recipe import Recipe\n\n\nclass Denim(Recipe):\n    \"\"\"\n    DEN-IM: Dengue Virus Identification from Metagenomic and Targeted Sequencing\n    Standalone version available at https://github.com/assemblerflow/DEN-IM\n    \"\"\"\n\n    def __init__(self):\n\n        self.name = \"denim\"\n\n        self.pipeline_str = \"integrity_coverage \" \\\n                            \"fastqc_trimmomatic \" \\\n                            \"filter_poly \" \\\n                            \"bowtie \" \\\n                            \"retrieve_mapped \" \\\n                            \"check_coverage \" \\\n                            \"viral_assembly \" \\\n                            \"assembly_mapping \" \\\n                            \"pilon \" \\\n                            \"split_assembly \" \\\n                            \"dengue_typing \" \\\n                            \"mafft \" \\\n                            \"raxml\"\n\n        # Recipe parameters and directives\n        self.directives = {\n            \"integrity_coverage\": {\n                \"params\": {\"genomeSize\": \"0.012\", \"minCoverage\": \"15\"}\n            },\n            \"check_coverage\": {\n                \"params\": {\"genomeSize\": \"0.012\", \"minCoverage\": \"15\"}\n            },\n            \"bowtie\": {\n                \"directives\": {\"container\": \"flowcraft/bowtie_dengue\",\n                               \"version\": \"2-1\"},\n                \"params\": {\n                    \"reference\": \"\\\"ref/1_GenotypesDENV_14-05-18.fasta\\\"\"}\n            },\n            \"assembly_mapping\": {\n                \"params\": {\"AMaxContigs\": \"1000\", \"genomeSize\": \"0.01\"}\n            },\n            \"split_assembly\": {\n                \"params\": {\"size\": \"10000\"}\n            }\n        }"
  },
  {
    "path": "flowcraft/generator/recipes/innuca.py",
    "content": "try:\n    from generator.recipe import Recipe\nexcept ImportError:\n    from flowcraft.generator.recipe import Recipe\n\n\nclass Innuca(Recipe):\n    \"\"\"\n    Bacterial genome assembly pipeline based on the SPAdes assembler and using\n    pre-assembly quality control and read trimming and post-assembly polishing\n    with Pilon\n    \"\"\"\n\n    def __init__(self):\n        super().__init__()\n\n        # Recipe name\n        self.name = \"innuca\"\n\n        # Recipe pipeline\n        self.pipeline_str = \"integrity_coverage \" \\\n                            \"fastqc_trimmomatic \" \\\n                            \"fastqc \" \\\n                            \"check_coverage \" \\\n                            \"true_coverage \" \\\n                            \"spades \" \\\n                            \"process_spades \" \\\n                            \"pilon \" \\\n                            \"mlst \"\n\n        # Recipe parameters and directives\n        self.directives = {\n            \"integrity_coverage\": {\n                \"directives\": {\"cpus\": \"1\", \"memory\": \"\\\"2GB\\\"\"},\n                \"params\": {\"genomeSize\": \"1\", \"minCoverage\": \"15\"}\n            }\n        }\n"
  },
  {
    "path": "flowcraft/generator/recipes/plasmids.py",
    "content": "try:\n    from generator.recipe import Recipe\nexcept ImportError:\n    from flowcraft.generator.recipe import Recipe\n\n\nclass Plasmids(Recipe):\n    \"\"\"\n    Plasmid detection pipeline using mapping, mash_screen and assembly with\n    SPAdes, with gene annotations with abricate. Outputs json files that\n    can be imported into pATLAS.\n    \"\"\"\n\n    def __init__(self):\n        super().__init__()\n\n        self.name = \"plasmids\"\n\n        self.pipeline_str = \"integrity_coverage \" \\\n                            \"fastqc_trimmomatic \" \\\n                            \"( spades pilon (mash_dist | abricate) |\" \\\n                            \"mash_screen | \" \\\n                            \"mapping_patlas)\"\n\n        # Recipe parameters and directives\n        self.directives = {\n            \"integrity_coverage\": {\n                \"params\": {\"genomeSize\": \"0\"}\n            }\n        }\n\n\nclass PlasmidsMapping(Recipe):\n    \"\"\"\n    Plasmid detection pipeline using mapping with bowtie2. Outputs json\n    files that can be imported into pATLAS.\n    \"\"\"\n\n    def __init__(self):\n        super().__init__()\n\n        self.name = \"plasmids_mapping\"\n\n        self.pipeline_str = \"integrity_coverage \" \\\n                            \"fastqc_trimmomatic \" \\\n                            \"mapping_patlas\"\n\n        # Recipe parameters and directives\n        self.directives = {\n            \"integrity_coverage\": {\n                \"params\": {\"genomeSize\": \"0\"}\n            }\n        }\n\n\nclass PlasmidsAssembly(Recipe):\n    \"\"\"\n    Plasmid detection pipeline using assembly with SPAdes and mash dist.\n    Outputs json files that can be imported into pATLAS.\n    \"\"\"\n\n    def __init__(self):\n        super().__init__()\n\n        self.name = \"plasmids_assembly\"\n\n        self.pipeline_str = \"integrity_coverage \" \\\n                            \"fastqc_trimmomatic \" \\\n                            \"spades \" \\\n                            \"pilon \" \\\n                            \"mash_dist\"\n\n        # Recipe parameters and directives\n        self.directives = {\n            \"integrity_coverage\": {\n                \"params\": {\"genomeSize\": \"0\"}\n            }\n        }\n\n\nclass PlasmidsMash(Recipe):\n    \"\"\"\n    Plasmid detection pipeline using mash screen. Outputs json files that can\n    be imported into pATLAS.\n    \"\"\"\n\n    def __init__(self):\n        super().__init__()\n\n        self.name = \"plasmids_mash\"\n\n        self.pipeline_str = \"integrity_coverage \" \\\n                            \"fastqc_trimmomatic \" \\\n                            \"mash_screen\"\n\n        # Recipe parameters and directives\n        self.directives = {\n            \"integrity_coverage\": {\n                \"params\": {\"genomeSize\": \"0\"}\n            }\n        }\n"
  },
  {
    "path": "flowcraft/generator/report.py",
    "content": "import os\nimport re\nimport sys\nimport json\nimport uuid\nimport signal\nimport socket\nimport hashlib\nimport logging\nimport requests\n\nfrom os.path import join, abspath\nfrom time import sleep\nfrom pympler.asizeof import asizeof\n\ntry:\n    import generator.error_handling as eh\n    from generator.process_details import colored_print\n    from generator.utils import get_nextflow_filepath\nexcept ImportError:\n    import flowcraft.generator.error_handling as eh\n    from flowcraft.generator.process_details import colored_print\n    from flowcraft.generator.utils import get_nextflow_filepath\n\nlogger = logging.getLogger(\"main.{}\".format(__name__))\n\n\ndef signal_handler():\n    \"\"\"This function is bound to the SIGINT signal (like ctrl+c) to graciously\n    exit the program and reset the curses options.\n    \"\"\"\n\n    print(\"Exiting flowcraft report brodcast... Bye\")\n    sys.exit(0)\n\n\nclass FlowcraftReport:\n\n    def __init__(self, report_file, trace_file=None, log_file=None,\n                 watch=False, ip_addr=None):\n\n        self.report_file = report_file\n        \"\"\"\n        str: Path to Report JSON file.\n        \"\"\"\n\n        if not ip_addr:\n            self.app_address = \"http://www.flowcraft.live:80/\"\n        else:\n            self.app_address = ip_addr\n            \"\"\"\n            str: Address of flowcraft web app\n            \"\"\"\n\n        self.broadcast_address = \"{}reports/broadcast/api/reports\".format(\n            self.app_address)\n\n        self.refresh_rate = 1\n\n        self.send = True\n        \"\"\"\n        boolean: This attribute is used when the report mode is used with the\n        --watch option. It will be set to False after sending a request, and \n        set to True when there is a change in the pipeline reports.\n        \"\"\"\n\n        self.watch = watch\n        \"\"\"\n        boolean: When False, the reports mode will try to open the provided\n        report JSON file and send it to the flowcraft service. When True, \n        it will try to open the nextflow trace file instead and continuously \n        compile the report JSON files from the `report` processes as they \n        are created. \n        \"\"\"\n\n        self.log_file = log_file\n        \"\"\"\n        str: Path to .nextflow.log file.\n        \"\"\"\n\n        self.log_sizestamp = None\n        \"\"\"\n        str: Stores the sizestamp of the last modification of the trace file.\n        This is used to parse the file only when it has changed.\n        \"\"\"\n\n        self.status_info = None\n        \"\"\"\n        str: Status of the pipeline execution. Used in the watch report mode\n        and varies between 'running', 'aborted', 'complete'.\n        \"\"\"\n\n        self.trace_file = trace_file\n        \"\"\"\n        str: Path to nextflow trace file.\n        \"\"\"\n\n        self.trace_sizestamp = None\n        \"\"\"\n        str: Stores the sizestamp of the last modification of the trace file.\n        This is used to parse the file only when it has changed.\n        \"\"\"\n\n        self.trace_retry = 0\n        \"\"\"\n        int: Each time the log file is not found, this counter is \n        increased. Only when it matches the :attr:`MAX_RETRIES` attribute\n        does it raises a FileNotFoundError.\n        \"\"\"\n\n        self.stored_ids = []\n        \"\"\"\n        list: Stores the task_ids that have already been parsed. 
It is used\n        to skip them when parsing the trace files multiple times.\n        \"\"\"\n\n        self.report_queue = []\n        \"\"\"\n        list: Stores the paths of the report JSON files that are on queue to\n        be sent to the flowcraft service. This list will be emptied when these\n        JSONs are sent.\n        \"\"\"\n\n        # Checks if report file is available\n        self._check_required_files()\n\n        signal.signal(signal.SIGINT, lambda *x: signal_handler())\n\n    def _check_required_files(self):\n\n        if not os.path.exists(self.report_file) and not self.watch:\n            raise eh.ReportError(\"The provided report JSON file could not be\"\n                                 \" opened: {}\".format(self.report_file))\n\n    @staticmethod\n    def _header_mapping(header):\n        \"\"\"Parses the trace file header and retrieves the positions of each\n        column key.\n\n        Parameters\n        ----------\n        header : str\n            The header line of nextflow's trace file\n\n        Returns\n        -------\n        dict\n            Mapping the column ID to its position (e.g.: {\"tag\":2})\n        \"\"\"\n\n        return dict(\n            (x.strip(), pos) for pos, x in enumerate(header.split(\"\\t\"))\n        )\n\n    @staticmethod\n    def _expand_path(hash_str):\n        \"\"\"Expands the hash string of a process (ae/1dasjdm) into a full\n        working directory\n\n        Parameters\n        ----------\n        hash_str : str\n            Nextflow process hash with the beggining of the work directory\n\n        Returns\n        -------\n        str\n            Path to working directory of the hash string\n        \"\"\"\n\n        try:\n            first_hash, second_hash = hash_str.split(\"/\")\n            first_hash_path = join(abspath(\"work\"), first_hash)\n\n            for l in os.listdir(first_hash_path):\n                if l.startswith(second_hash):\n                    return join(first_hash_path, l)\n        except FileNotFoundError:\n            return None\n\n    def _get_report_id(self):\n        \"\"\"Returns a hash of the reports JSON file\n        \"\"\"\n\n        if self.watch:\n\n            # Searches for the first occurence of the nextflow pipeline\n            # file name in the .nextflow.log file\n            pipeline_path = get_nextflow_filepath(self.log_file)\n\n            # Get hash from the entire pipeline file\n            pipeline_hash = hashlib.md5()\n            with open(pipeline_path, \"rb\") as fh:\n                for chunk in iter(lambda: fh.read(4096), b\"\"):\n                    pipeline_hash.update(chunk)\n            # Get hash from the current working dir and hostname\n            workdir = os.getcwd().encode(\"utf8\")\n            hostname = socket.gethostname().encode(\"utf8\")\n            hardware_addr = str(uuid.getnode()).encode(\"utf8\")\n            dir_hash = hashlib.md5(workdir + hostname + hardware_addr)\n\n            return pipeline_hash.hexdigest() + dir_hash.hexdigest()\n\n        else:\n            with open(self.report_file) as fh:\n                report_json = json.loads(fh.read())\n\n            metadata = report_json[\"data\"][\"results\"][0][\"nfMetadata\"]\n\n            try:\n                report_id = metadata[\"scriptId\"] + metadata[\"sessionId\"]\n            except KeyError:\n                raise eh.ReportError(\"Incomplete or corrupt report JSON file \"\n                                     \"missing the 'scriptId' and/or 'sessionId' \"\n              
                       \"metadata information\")\n\n            return report_id\n\n    def _update_pipeline_status(self):\n        \"\"\"\n        Parses the .nextflow.log file for signatures of pipeline status and sets\n        the :attr:`status_info` attribute.\n        \"\"\"\n\n        prev_status = self.status_info\n\n        with open(self.log_file) as fh:\n\n            for line in fh:\n\n                if \"Session aborted\" in line:\n                    self.status_info = \"aborted\"\n                    self.send = True if prev_status != self.status_info \\\n                        else self.send\n                    return\n\n                if \"Execution complete -- Goodbye\" in line:\n                    self.status_info = \"complete\"\n                    self.send = True if prev_status != self.status_info \\\n                        else self.send\n                    return\n\n            self.status_info = \"running\"\n            self.send = True if prev_status != self.status_info \\\n                else self.send\n\n    def update_trace_watch(self):\n        \"\"\"Parses the nextflow trace file and retrieves the path of report JSON\n        files that have not been sent to the service yet.\n        \"\"\"\n\n        # Check the size stamp of the tracefile. Only proceed with the parsing\n        # if it changed from the previous size.\n        size_stamp = os.path.getsize(self.trace_file)\n        self.trace_retry = 0\n        if size_stamp and size_stamp == self.trace_sizestamp:\n            return\n        else:\n            logger.debug(\"Updating trace size stamp to: {}\".format(size_stamp))\n            self.trace_sizestamp = size_stamp\n\n        with open(self.trace_file) as fh:\n\n            # Skip potential empty lines at the start of file\n            header = next(fh).strip()\n            while not header:\n                header = next(fh).strip()\n\n            # Get header mappings before parsing the file\n            hm = self._header_mapping(header)\n\n            for line in fh:\n                # Skip empty lines\n                if line.strip() == \"\":\n                    continue\n\n                fields = line.strip().split(\"\\t\")\n\n                # Skip if task ID was already processes\n                if fields[hm[\"task_id\"]] in self.stored_ids:\n                    continue\n\n                if fields[hm[\"process\"]] == \"report\":\n                    self.report_queue.append(\n                        self._expand_path(fields[hm[\"hash\"]])\n                    )\n                    self.send = True\n\n                # Add the processed trace line to the stored ids. It will be\n                # skipped in future parsers\n                self.stored_ids.append(fields[hm[\"task_id\"]])\n\n    def update_log_watch(self):\n        \"\"\"Parses nextflow log file and updates the run status\n        \"\"\"\n\n        # Check the size stamp of the tracefile. 
Only proceed with the parsing\n        # if it changed from the previous size.\n        size_stamp = os.path.getsize(self.log_file)\n        self.trace_retry = 0\n        if size_stamp and size_stamp == self.log_sizestamp:\n            return\n        else:\n            logger.debug(\"Updating log size stamp to: {}\".format(size_stamp))\n            self.log_sizestamp = size_stamp\n\n        self._update_pipeline_status()\n\n    def _send_live_report(self, report_id):\n        \"\"\"Sends a PUT request with the report JSON files currently in the\n        report_queue attribute.\n\n        Parameters\n        ----------\n        report_id : str\n            Hash of the report JSON as retrieved from :func:`~_get_report_hash`\n        \"\"\"\n\n        # Determines the maximum number of reports sent at the same time in\n        # the same payload\n        buffer_size = 100\n        logger.debug(\"Report buffer size set to: {}\".format(buffer_size))\n\n        for i in range(0, len(self.report_queue), buffer_size):\n\n            # Reset the report compilation batch\n            reports_compilation = []\n\n            # Iterate over report JSON batches determined by buffer_size\n            for report in self.report_queue[i: i + buffer_size]:\n                try:\n                    report_file = [x for x in os.listdir(report)\n                                   if x.endswith(\".json\")][0]\n                except IndexError:\n                    continue\n                with open(join(report, report_file)) as fh:\n                    reports_compilation.append(json.loads(fh.read()))\n\n            logger.debug(\"Payload sent with size: {}\".format(\n                asizeof(json.dumps(reports_compilation))\n            ))\n            logger.debug(\"status: {}\".format(self.status_info))\n\n            try:\n                requests.put(\n                    self.broadcast_address,\n                    json={\"run_id\": report_id,\n                          \"report_json\": reports_compilation,\n                          \"status\": self.status_info}\n                )\n            except requests.exceptions.ConnectionError:\n                logger.error(colored_print(\n                    \"ERROR: Could not establish connection with server. The server\"\n                    \" may be down or there is a problem with your internet \"\n                    \"connection.\", \"red_bold\"))\n                sys.exit(1)\n\n        # When there is no change in the report queue, but there is a change\n        # in the run status of the pipeline\n        if not self.report_queue:\n\n            logger.debug(\"status: {}\".format(self.status_info))\n\n            try:\n                requests.put(\n                    self.broadcast_address,\n                    json={\"run_id\": report_id,\n                          \"report_json\": [],\n                          \"status\": self.status_info}\n                )\n            except requests.exceptions.ConnectionError:\n                logger.error(colored_print(\n                    \"ERROR: Could not establish connection with server. 
The\"\n                    \" server may be down or there is a problem with your \"\n                    \"internet connection.\", \"red_bold\"))\n                sys.exit(1)\n\n        # Reset the report queue after sending the request\n        self.report_queue = []\n\n    def _init_live_reports(self, report_id):\n        \"\"\"Sends a POST request to initialize the live reports\n\n        Parameters\n        ----------\n        report_id : str\n            Hash of the report JSON as retrieved from :func:`~_get_report_hash`\n        \"\"\"\n\n        logger.debug(\"Sending initial POST request to {} to start report live\"\n                     \" update\".format(self.broadcast_address))\n\n        try:\n            with open(\".metadata.json\") as fh:\n                metadata = [json.load(fh)]\n        except:\n            metadata = []\n\n        start_json = {\n            \"data\": {\"results\": metadata}\n        }\n\n        try:\n            requests.post(\n                self.broadcast_address,\n                json={\"run_id\": report_id, \"report_json\": start_json,\n                      \"status\": self.status_info}\n            )\n        except requests.exceptions.ConnectionError:\n            logger.error(colored_print(\n                \"ERROR: Could not establish connection with server. The server\"\n                \" may be down or there is a problem with your internet \"\n                \"connection.\", \"red_bold\"))\n            sys.exit(1)\n\n    def _close_connection(self, report_id):\n        \"\"\"Sends a delete request for the report JSON hash\n\n        Parameters\n        ----------\n        report_id : str\n            Hash of the report JSON as retrieved from :func:`~_get_report_hash`\n        \"\"\"\n\n        logger.debug(\n            \"Closing connection and sending DELETE request to {}\".format(\n                self.broadcast_address))\n\n        try:\n            r = requests.delete(self.broadcast_address,\n                                json={\"run_id\": report_id})\n            if r.status_code != 202:\n                logger.error(colored_print(\n                    \"ERROR: There was a problem sending data to the server\"\n                    \"with reason: {}\".format(r.reason)))\n        except requests.exceptions.ConnectionError:\n            logger.error(colored_print(\n                \"ERROR: Could not establish connection with server. The server\"\n                \" may be down or there is a problem with your internet \"\n                \"connection.\", \"red_bold\"))\n            sys.exit(1)\n\n    def _send_report(self, report_id):\n\n        with open(self.report_file) as fh:\n            report_json = json.loads(fh.read())\n\n        logger.debug(\"Unique payload sent with size: {}\".format(\n            asizeof(json.dumps(report_json))\n        ))\n\n        try:\n            requests.post(\n                self.broadcast_address,\n                json={\"run_id\": report_id, \"report_json\": report_json}\n            )\n        except requests.exceptions.ConnectionError:\n            logger.error(colored_print(\n                \"ERROR: Could not establish connection with server. 
The server\"\n                \" may be down or there is a problem with your internet \"\n                \"connection.\", \"red_bold\"))\n            sys.exit(1)\n\n    def _print_msg(self, run_id):\n\n        report_address = \"{}reports/broadcast/{}\".format(self.app_address,\n                                                         run_id)\n        logger.info(colored_print(\n            \"The pipeline reports are available in the following link:\",\n            \"green_bold\"))\n        logger.info(\"{}\".format(report_address))\n\n    def broadcast_report(self):\n\n        logger.info(colored_print(\"Preparing to broacast reports...\",\n                                  \"green_bold\"))\n\n        report_hash = self._get_report_id()\n\n        # When in watch mode,\n        if self.watch:\n            logger.info(colored_print(\"\\tFetching pipeline run status\",\n                                      \"green_bold\"))\n            self._update_pipeline_status()\n            logger.info(colored_print(\n                \"\\tSending initial request to test service\", \"green_bold\"))\n            self._init_live_reports(report_hash)\n            logger.info(colored_print(\"\\tInitial parsing of trace file\",\n                                      \"green_bold\"))\n            self.update_trace_watch()\n\n            self._print_msg(report_hash)\n\n        logger.debug(\"Establishing connection...\")\n\n        stay_alive = True\n        _broadcast_sent = False\n        try:\n            while stay_alive:\n\n                # When not in watch mode, send the report JSON once\n                if not _broadcast_sent and not self.watch:\n                    self._send_report(report_hash)\n                    self._print_msg(report_hash)\n                    _broadcast_sent = True\n\n                # When in watch mode, continuously monitor the trace file for\n                # updates\n                if self.watch:\n                    self.update_trace_watch()\n                    self.update_log_watch()\n                    # When new report JSON files are available, send then\n                    # via a PUT request\n                    if self.send:\n                        self._send_live_report(report_hash)\n                        self.send = False\n\n                sleep(self.refresh_rate)\n\n        except FileNotFoundError as e:\n            print(e)\n            logger.error(colored_print(\n                \"ERROR: Report JSON file is not reachable!\", \"red_bold\"))\n        except Exception as e:\n            logger.exception(\"ERROR: \" + e)\n        finally:\n            logger.info(\"Closing connection\")\n            self._close_connection(report_hash)\n"
  },
  {
    "path": "flowcraft/generator/templates/Helper.groovy",
    "content": "class Help {\n\n    static def start_info(Map info, String time, String profile) {\n\n        println \"\"\n        println \"============================================================\"\n        println \"                {{ pipeline_name }}\"\n        println \"============================================================\"\n        println \"Built using flowcraft v{{ version }}\"\n        println \"\"\n        if (info.containsKey(\"fastq\")){\n        int nsamples = info.fastq / 2\n        println \" Input FastQ                 : $info.fastq\"\n        println \" Input samples               : $nsamples\"\n        }\n        if (info.containsKey(\"fasta\")){\n        println \" Input Fasta                 : $info.fasta\"\n        }\n        if (info.containsKey(\"accessions\")){\n        println \" Input accessions            : $info.accessions\"\n        }\n        println \" Reports are found in        : ./reports\"\n        println \" Results are found in        : ./results\"\n        println \" Profile                     : $profile\"\n        println \"\"\n        println \"Starting pipeline at $time\"\n        println \"\"\n\n    }\n\n    static void complete_info(nextflow.script.WorkflowMetadata wf) {\n\n        println \"\"\n        println \"Pipeline execution summary\"\n        println \"==========================\"\n        println \"Completed at                 : $wf.complete\"\n        println \"Duration                     : $wf.duration\"\n        println \"Success                      : $wf.success\"\n        println \"Work directory               : $wf.workDir\"\n        println \"Exit status                  : $wf.exitStatus\"\n        println \"\"\n\n    }\n\n    static def print_help(Map params) {\n\n        println \"\"\n        println \"============================================================\"\n        println \"                {{ pipeline_name }}\"\n        println \"============================================================\"\n        println \"Built using flowcraft v{{ version }}\"\n        println \"\"\n        println \"\"\n        println \"Usage: \"\n        println \"    nextflow run {{ nf_file }}\"\n        println \"\"\n        {% for line in help_list -%}\n        println \"       {{ line }}\"\n        {% endfor %}\n    }\n\n}\n\nclass CollectInitialMetadata {\n\n    public static void print_metadata(nextflow.script.WorkflowMetadata workflow){\n\n        def treeDag = new File(\"${workflow.projectDir}/.treeDag.json\").text\n        def forkTree = new File(\"${workflow.projectDir}/.forkTree.json\").text\n\n        def metadataJson = \"{'nfMetadata':{'scriptId':'${workflow.scriptId}',\\\n'scriptName':'${workflow.scriptName}',\\\n'profile':'${workflow.profile}',\\\n'container':'${workflow.container}',\\\n'containerEngine':'${workflow.containerEngine}',\\\n'commandLine':'${workflow.commandLine}',\\\n'runName':'${workflow.runName}',\\\n'sessionId':'${workflow.sessionId}',\\\n'projectDir':'${workflow.projectDir}',\\\n'launchDir':'${workflow.launchDir}',\\\n'startTime':'${workflow.start}',\\\n'dag':${treeDag},\\\n'forks':${forkTree}}}\"\n\n        def json = metadataJson.replaceAll(\"'\", '\"')\n\n        def jsonFile = new File(\".metadata.json\")\n        jsonFile.write json\n    }\n}"
  },
  {
    "path": "flowcraft/generator/templates/abricate.nf",
    "content": "if ( params.abricateDataDir{{ param_id }} ){\n    if ( !file(params.abricateDataDir{{ param_id }}).exists() ){\n        exit 1, \"'abricateDataDir{{ param_id }}' data directory was not found: '${params.abricateDatabases{{ param_id }}}'\"\n    }\n    dataDirOpt = \"--datadir ${params.abricateDataDir{{ param_id }}}\"\n} else {\n    dataDirOpt = \"\"\n}\n\nif ( !params.abricateMinId{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'abricateMinId{{ param_id }}' parameter must be a number. Provide value: '${params.abricateMinId{{ param_id }}}'\"\n}\n\nif ( !params.abricateMinCov{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'abricateMinCov{{ param_id }}' parameter must be a number. Provide value: '${params.abricateMinCov{{ param_id }}}'\"\n}\n\n\nprocess abricate_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { \"${sample_id} ${db}\" }\n    publishDir \"results/annotation/abricate_{{ pid }}/${sample_id}\"\n\n    input:\n    set sample_id, file(assembly) from {{ input_channel }}\n    each db from params.abricateDatabases{{ param_id }}\n    val min_id from Channel.value(params.abricateMinId{{ param_id }})\n    val min_cov from Channel.value(params.abricateMinCov{{ param_id }})\n\n    output:\n    file '*.tsv' into abricate_out_{{ pid }}\n    {% with task_name=\"abricate\", suffix=\"_$db\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        # Run abricate\n        abricate $dataDirOpt --minid $min_id --mincov $min_cov --db $db $assembly > ${sample_id}_abr_${db}.tsv\n        echo pass > .status\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n\n}\n\n\nprocess process_abricate_{{ pid }} {\n\n    tag \"process_abricate_{{ pid }}\"\n\n    // Send POST request to platform\n    {% with overwrite=\"false\" %}\n    {% include \"report_post.txt\" ignore missing %}\n    {% endwith %}\n\n    input:\n    file abricate_file from abricate_out_{{ pid }}.collect()\n\n    output:\n    {% with task_name=\"process_abricate\", sample_id=\"val('process_abricate')\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_abricate.py\"\n\n\n}\n\n\n\n"
  },
  {
    "path": "flowcraft/generator/templates/abyss.nf",
    "content": "process abyss_{{ pid }} {\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/assembly/abyss_{{ pid }}/', pattern: '*-scaffolds.fa'\n    publishDir 'results/assembly/abyss_{{ pid }}/', pattern: '*-scaffolds.gfa'\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val k from Channel.value(params.abyssKmer{{ param_id }})\n\n    output:\n    set sample_id, file('*-scaffolds.fa') into {{ output_channel }}\n    file \"*-scaffolds.gfa\" into gfa1_{{ pid }}\n    {% with task_name=\"abyss\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"abyss-pe name=${sample_id} graph=gfa k=${k} v=-v in=\\\"${fastq_pair[0]} ${fastq_pair[1]}\\\"\"\n}\n\n{{ forks }}\n"
  },
  {
    "path": "flowcraft/generator/templates/assembly_mapping.nf",
    "content": "if ( !params.minAssemblyCoverage{{ param_id }}.toString().isNumber() ){\n    if (params.minAssemblyCoverage{{ param_id }}.toString() != 'auto'){\n        exit 1, \"'minAssemblyCoverage{{ param_id }}' parameter must be a number or 'auto'. Provided value: ${params.minAssemblyCoverage{{ param_id }}}\"\n    }\n}\nif ( !params.AMaxContigs{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'AMaxContigs{{ param_id }}' parameter must be a number. Provide value: '${params.AMaxContigs{{ param_id }}}'\"\n}\n\nIN_assembly_mapping_opts_{{ pid }} = Channel.value([params.minAssemblyCoverage{{ param_id }},params.AMaxContigs{{ param_id }}])\nIN_genome_size_{{ pid }} = Channel.value(params.genomeSize{{ param_id }})\n\n\nprocess assembly_mapping_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(assembly), file(fastq) from {{ input_channel }}.join(_LAST_fastq_{{ pid }})\n\n    output:\n    set sample_id, file(assembly), 'coverages.tsv', 'coverage_per_bp.tsv', 'sorted.bam', 'sorted.bam.bai' into MAIN_am_out_{{ pid }}\n    set sample_id, file(\"coverage_per_bp.tsv\") optional true into SIDE_BpCoverage_{{ pid }}\n    {% with task_name=\"assembly_mapping\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        echo [DEBUG] BUILDING BOWTIE INDEX FOR ASSEMBLY: $assembly >> .command.log 2>&1\n        bowtie2-build --threads ${task.cpus} $assembly genome_index >> .command.log 2>&1\n        echo [DEBUG] MAPPING READS FROM $fastq >> .command.log 2>&1\n        bowtie2 -q --very-sensitive-local --threads ${task.cpus} -x genome_index -1 ${fastq[0]} -2 ${fastq[1]} -S mapping.sam >> .command.log 2>&1\n        echo [DEBUG] CONVERTING AND SORTING SAM TO BAM >> .command.log 2>&1\n        samtools sort -o sorted.bam -O bam -@ ${task.cpus} mapping.sam && rm *.sam  >> .command.log 2>&1\n        echo [DEBUG] CREATING BAM INDEX >> .command.log 2>&1\n        samtools index sorted.bam >> .command.log 2>&1\n        echo [DEBUG] ESTIMATING READ DEPTH >> .command.log 2>&1\n        parallel -j ${task.cpus} samtools depth -ar {} sorted.bam \\\\> {}.tab  ::: \\$(grep \">\" $assembly | cut -c 2- | tr \" \" \"_\")\n        # Insert 0 coverage count in empty files. See Issue #2\n        echo [DEBUG] REMOVING EMPTY FILES  >> .command.log 2>&1\n        find . 
-size 0 -print0 | xargs -0 -I{} sh -c 'echo -e 0\"\\t\"0\"\\t\"0 > \"{}\"'\n        echo [DEBUG] COMPILING COVERAGE REPORT  >> .command.log 2>&1\n        parallel -j ${task.cpus} echo -n {.} '\"\\t\"' '&&' cut -f3 {} '|' paste -sd+ '|' bc >> coverages.tsv  ::: *.tab\n        cat *.tab > coverage_per_bp.tsv\n        rm *.tab\n        if [ -f \"coverages.tsv\" ]\n        then\n            echo pass > .status\n        else\n            echo fail > .status\n        fi\n        echo -n \"\" > .report.json\n        echo -n \"\" > .versions\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n}\n\n\n/** PROCESS_ASSEMBLY_MAPPING -  MAIN\nProcesses the results from the assembly_mapping process and filters the\nassembly contigs based on coverage and length thresholds.\n*/\nprocess process_assembly_mapping_{{ pid }} {\n\n    // Send POST request to platform\n    {% with overwrite=\"false\" %}\n    {% include \"post.txt\" ignore missing %}\n    {% endwith %}\n\n    tag { sample_id }\n    // This process can only use a single CPU\n    cpus 1\n\n    input:\n    set sample_id, file(assembly), file(coverage), file(coverage_bp), file(bam_file), file(bam_index) from MAIN_am_out_{{ pid }}\n    val opts from IN_assembly_mapping_opts_{{ pid }}\n    val gsize from IN_genome_size_{{ pid }}\n\n    output:\n    set sample_id, '*_filt.fasta', 'filtered.bam', 'filtered.bam.bai' into {{ output_channel }}\n    {% with task_name=\"process_am\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_assembly_mapping.py\"\n\n}\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/bandage.nf",
    "content": "// True when a GFA secondary channel is connected to this component.\nhas_gfa1_{{pid}} = binding.hasVariable('gfa1_{{pid}}')\n\nprocess bandage_{{pid}} {\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir \"reports/assembly/bandage_{{pid}}/$sample_id\"\n\n    input:\n    set sample_id, file(fasta) from {{input_channel}}\n    file gfa1 from has_gfa1_{{pid}} ? gfa1_{{pid}} : Channel.value(\"NA\")\n    file reference from params.reference{{param_id}} ?\n        Channel.fromPath(params.reference{{param_id}}) :\n        Channel.value(\"NA\")\n\n    output:\n    file \"*.png\"\n    file \"*.svg\"\n    {% with task_name=\"bandage\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    // Use the GFA assembly when available and FASTA otherwise.\n    assembly = has_gfa1_{{pid}} ? gfa1 : fasta\n    command =\n        \"\"\"\n        time Bandage image $assembly ${assembly}.png >>.command.log 2>&1\n        time Bandage image $assembly ${assembly}.svg >>.command.log 2>&1\n        \"\"\"\n    if (params.reference{{param_id}})\n        command +=\n            \"\"\"\n            time Bandage image $assembly ${assembly}.ref.png --query $reference >>.command.log 2>&1\n            time Bandage image $assembly ${assembly}.ref.svg --query $reference >>.command.log 2>&1\n            \"\"\"\n    command\n}\n"
  },
  {
    "path": "flowcraft/generator/templates/base_recalibrator.nf",
    "content": "baseRecalibratorFasta_{{ pid }} = Channel.value(params.reference{{ param_id }}.split(\"/\").last())\nbaseRecalibratorRef_{{ pid }} = Channel.fromPath(\"${params.reference{{ param_id }}}.*\").collect().toList()\nbaseRecalibratorDbsnp_{{ pid }} = Channel.fromPath(\"${params.dbsnp{{ param_id }}}\")\nbaseRecalibratorDbsnpIdx_{{ pid }} = Channel.fromPath(\"${params.dbsnpIdx{{ param_id }}}\")\nbaseRecalibratorGoldenIndel_{{ pid }} = Channel.fromPath(\"${params.goldenIndel{{ param_id }}}\")\nbaseRecalibratorGoldenIndelIdx_{{ pid }} = Channel.fromPath(\"${params.goldenIndelIdx{{ param_id }}}\")\n\nprocess base_recalibrator_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set val(sample_id), file(bam), file(bai) from {{ input_channel }}\n    each file(reference) from baseRecalibratorRef_{{pid}}\n    val(fasta) from baseRecalibratorFasta_{{pid}}\n    each file(dbsnp) from baseRecalibratorDbsnp_{{pid}}\n    each file(dbsnp_idx) from baseRecalibratorDbsnpIdx_{{pid}}\n    each file(golden_indel) from baseRecalibratorGoldenIndel_{{pid}}\n    each file(golden_indel_idx) from baseRecalibratorGoldenIndelIdx_{{pid}}\n    \n    output:\n    set sample_id, file(\"${sample_id}_recal_data.table\"), file(bam), file(bai) into baserecalibrator_table\n    {% with task_name=\"base_recalibrator\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    # gunzip dbsnp & golden_indel if gzipped\n    [[ \"\\$(file --mime-type $dbsnp | cut -d' ' -f2)\" == \"application/x-gzip\" ]] && gzip -d --force $dbsnp\n    dbsnp=\\$(basename $dbsnp .gz)\n    [[ \"\\$(file --mime-type $dbsnp_idx | cut -d' ' -f2)\" == \"application/x-gzip\" ]] && gzip -d --force $dbsnp_idx\n    [[ \"\\$(file --mime-type $golden_indel | cut -d' ' -f2)\" == \"application/x-gzip\" ]] && gzip -d --force $golden_indel\n    golden_indel=\\$(basename $golden_indel .gz)\n    [[ \"\\$(file --mime-type $golden_indel_idx | cut -d' ' -f2)\" == \"application/x-gzip\" ]] && gzip -d --force $golden_indel_idx\n\n    gatk BaseRecalibrator \\\n      -I $bam \\\n      --known-sites \\$dbsnp \\\n      --known-sites \\$golden_indel \\\n      -O ${sample_id}_recal_data.table \\\n      -R ${fasta}.fasta\n    \"\"\"\n}\n\n\nprocess apply_bqsr_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    publishDir \"results/mapping/apply_bqsr_{{ pid }}\"\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(baserecalibrator_table), file(bam), file(bai) from baserecalibrator_table\n    \n    output:\n    set sample_id, file(\"${sample_id}_recalibrated.bam\"), file(\"${sample_id}_recalibrated.bai\") into {{ output_channel }}\n    {% with task_name=\"apply_bqsr\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    gatk ApplyBQSR \\\n      -I $bam \\\n      -bqsr $baserecalibrator_table \\\n      -O ${sample_id}_recalibrated.bam \\\n      --create-output-bam-index\n    \"\"\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/bcalm.nf",
    "content": "// Check parameter\nif ( !params.bcalmKmerSize{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'bcalmKmerSize{{ param_id }}' parameter must be a number. Provided value: '${params.bcalmKmes%rSize{{ param_id }}}'\"\n}\n\n// Clear\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess bcalm_{{ pid }} {\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir \"reports/assembly/quast_{{pid}}/$sample_id\"\n\n    input:\n    set sample_id, file(fastq) from {{input_channel}}\n    val KmerSize from Channel.value(params.bcalmKmerSize{{param_id}})\n    \noutput:\n    file \"*.unitig.fa\"\n    {% with task_name=\"bcalm\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n\tbcalm -in $fastq -out unitig -kmer-size $KmerSize\"\n\n  \tif [ \"$clear\" = \"true\" ];\n\tthen\n    \t    find . -type f  -print | egrep \"work/.*(h5)|(glue)\" | xargs -L 1 rm\n\tfi\n    }\n    \"\"\"\n}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/bowtie.nf",
    "content": "// Check for the presence of absence of both index and fasta reference\nif (params.index{{ param_id }} == null && params.reference{{ param_id }} == null){\n    exit 1, \"An index or a reference fasta file must be provided.\"\n} else if (params.index{{ param_id }} != null && params.reference{{ param_id }} != null){\n    exit 1, \"Provide only an index OR a reference fasta file.\"\n}\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nif (params.reference{{ param_id }}){\n\n    reference_in_{{ pid }} = Channel.fromPath(params.reference{{ param_id }})\n        .map{it -> file(it).exists() ? [it.toString().tokenize('/').last().tokenize('.')[0..-2].join('.') ,it] : null}\n\n    process bowtie_build_{{ pid }} {\n\n        // Send POST request to platform\n        {% include \"post.txt\" ignore missing %}\n\n        tag { build_id }\n        storeDir 'bowtie_index/'\n        maxForks 1\n\n        input:\n        set build_id, file(fasta) from reference_in_{{ pid }}\n\n        output:\n        val build_id into bowtieIndexId_{{ pid }}\n        file \"${build_id}*.bt2\" into bowtieIndex_{{ pid }}\n\n        script:\n        \"\"\"\n        # checking if reference file is empty. Moved here due to allow reference file to be inside the container.\n        if [ ! -f \"$fasta\" ]\n        then\n            echo \"Error: ${fasta} file not found.\"\n            exit 1\n        fi\n\n        bowtie2-build ${fasta} $build_id > ${build_id}_bowtie2_build.log\n        \"\"\"\n    }\n} else {\n    bowtieIndexId_{{ pid }} = Channel.value(params.index{{ param_id }}.split(\"/\").last())\n    bowtieIndex_{{ pid }} = Channel.fromPath(\"${params.index{{ param_id }}}*.bt2\").collect().toList()\n}\n\n\nprocess bowtie_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/mapping/bowtie_{{ pid }}/'\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    each index from bowtieIndexId_{{pid}}\n    each file(index_files) from bowtieIndex_{{ pid }}\n\n    output:\n    set sample_id , file(\"*.bam\") into {{ output_channel }}\n    set sample_id, file(\"*_bowtie2.log\") into into_json_{{ pid }}\n    {% with task_name=\"bowtie\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        bowtie2 -x $index -1 ${fastq_pair[0]} -2 ${fastq_pair[1]} -p $task.cpus 1> ${sample_id}.bam 2> ${sample_id}_bowtie2.log\n\n        if [ \"$clear\" = \"true\" ];\n        then\n            work_regex=\".*/work/.{2}/.{30}/.*\"\n            file_source1=\\$(readlink -f \\$(pwd)/${fastq_pair[0]})\n            file_source2=\\$(readlink -f \\$(pwd)/${fastq_pair[1]})\n            if [[ \"\\$file_source1\" =~ \\$work_regex ]]; then\n                rm \\$file_source1 \\$file_source2\n            fi\n        fi\n\n        echo pass > .status\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n}\n\n\nprocess report_bowtie_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(bowtie_log) from into_json_{{ pid }}\n\n    output:\n    {% with task_name=\"report_bowtie\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_mapping.py\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/bwa.nf",
    "content": "bwaIndexId_{{ pid }} = Channel.value(params.bwaIndex{{ param_id }}.split(\"/\").last())\nbwaIndex_{{ pid }} = Channel.fromPath(\"${params.bwaIndex{{ param_id }}}.*\").collect().toList()\n\nprocess bwa_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    publishDir \"results/mapping/bwa_{{ pid }}\"\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    each index from bwaIndexId_{{pid}}\n    each file(index_file) from bwaIndex_{{pid}}\n   \n    output:\n    set sample_id, file(\"${sample_id}.bam\"), file(\"${sample_id}.bam.bai\") into {{ output_channel }}\n    {% with task_name=\"bwa\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    bwa mem -M -R '@RG\\\\tID:${sample_id}\\\\tSM:${sample_id}\\\\tPL:Illumina' -t $task.cpus $index $fastq_pair > ${sample_id}.sam\n    samtools sort -o ${sample_id}.bam -O BAM ${sample_id}.sam\n    samtools index ${sample_id}.bam\n    \"\"\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/card_rgi.nf",
    "content": "IN_alignment_tool_{{ pid }} = Channel.value(params.alignmentTool{{ param_id }})\n\n\nprocess card_rgi_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir \"results/annotation/card_rgi/\", pattern: \"*.txt\"\n\n    input:\n    set sample_id, file(assembly) from {{ input_channel }}\n    val alignmetTool from IN_alignment_tool_{{ pid }}\n\n    output:\n    file(\"${sample_id}_card_rgi.txt\")\n    {% with task_name=\"card_rgi\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    # Place card_rgi source in a read/write location for shifter container\n    mkdir card_temp && cp -r /usr/local/lib/python3.5/dist-packages/app/ card_temp\n    export PYTHONPATH=\"\\$(pwd)/card_temp:\\$PATH\"\n\n    rgi main --input_sequence ${assembly} --output_file ${sample_id}_card_rgi --input_type contig --alignment_tool ${alignmetTool} --low_quality --include_loose -d wgs --clean\n    \"\"\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/check_coverage.nf",
    "content": "IN_genome_size_{{ pid }} = Channel.value(params.genomeSize{{ param_id }})\n    .map{it -> it.toString().isNumber() ? it : exit (1, \"The genomeSize parameter must be a number or a float. Provided value: '${params.genomeSize{{ param_id }}}'\")}\nIN_min_coverage_{{ pid }} = Channel.value(params.minCoverage{{ param_id }})\n    .map{it -> it.toString().isNumber() ? it : exit (1, \"The minCoverage parameter must be a number or a float. Provided value: '${params.minCoverage{{ param_id }}}'\")}\n\nprocess integrity_coverage2_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    cpus 1\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val gsize from IN_genome_size_{{ pid }}\n    val cov from IN_min_coverage_{{ pid }}\n    // Use -e option for skipping encoding guess\n    val opts from Channel.value('-e')\n\n    output:\n    set sample_id,\n        file(fastq_pair),\n        file('*_coverage'),\n        file('*_max_len') optional true into MAIN_integrity_{{ pid }}\n    file('*_report') into LOG_report_coverage_{{ pid }}\n    {% with task_name=\"check_coverage\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"integrity_coverage.py\"\n}\n\n{{ output_channel }} = Channel.create()\nSIDE_max_len_{{ pid }} = Channel.create()\n\nMAIN_integrity_{{ pid }}\n    .filter{ it[2].text != \"fail\" }\n    .separate({{ output_channel }}, SIDE_max_len_{{ pid }}){\n        a -> [ [a[0], a[1]], [a[0], a[3].text]]\n    }\n\n\nprocess report_coverage2_{{ pid }} {\n\n    // This process can only use a single CPU\n    cpus 1\n    publishDir 'reports/coverage_{{ pid }}/'\n\n    input:\n    file(report) from LOG_report_coverage_{{ pid }}.filter{ it.text != \"corrupt\" }.collect()\n\n    output:\n    file 'estimated_coverage_second.csv'\n\n    \"\"\"\n    echo Sample,Estimated coverage,Status >> estimated_coverage_second.csv\n    cat $report >> estimated_coverage_second.csv\n    \"\"\"\n}\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/chewbbaca.nf",
    "content": "if ( !params.schemaPath{{ param_id }} ){\n    exit 1, \"'schemaPath{{ param_id }}' parameter missing\"\n}\nif ( params.chewbbacaTraining{{ param_id }}){\n    if (!file(params.chewbbacaTraining{{ param_id }}).exists()) {\n        exit 1, \"'chewbbacaTraining{{ param_id }}' file was not found: '${params.chewbbacaTraining{{ param_id }}}'\"\n    }\n}\nif ( params.schemaSelectedLoci{{ param_id }}){\n    if (!file(params.schemaSelectedLoci{{ param_id }}).exists()) {\n        exit 1, \"'schemaSelectedLoci{{ param_id }}' file was not found: '${params.schemaSelectedLoci{{ param_id }}}'\"\n    }\n}\nif ( params.schemaCore{{ param_id }}){\n    if (!file(params.schemaCore{{ param_id }}).exists()) {\n        exit 1, \"'schemaCore{{ param_id }}' file was not found: '${params.schemaCore{{ param_id }}}'\"\n    }\n}\n\nIN_schema_{{ pid }} = Channel.fromPath(params.schemaPath{{ param_id }})\n\n\nif (params.chewbbacaJson{{ param_id }} == true){\n    jsonOpt = \"--json\"\n} else {\n    jsonOpt = \"\"\n}\n\nif (params.chewbbacaTraining{{ param_id }}){\n    training = \"--ptf ${params.chewbbacaTraining{{ param_id }}}\"\n} else {\n    training = \"\"\n}\n\n// If chewbbaca is executed in batch mode, wait for all assembly files\n// to be collected on the input channel, and only then execute chewbbaca\n// providing all samples simultaneously\nif (params.chewbbacaBatch{{ param_id }}) {\n    process chewbbaca_batch_{{ pid }} {\n\n        {% include \"post.txt\" ignore missing %}\n        maxForks 1\n        scratch false\n        if (params.chewbbacaQueue{{ param_id }} != null) {\n            queue \"${params.chewbbacaQueue{{ param_id}}}\"\n        }\n        publishDir \"results/chewbbaca_alleleCall_{{ pid }}/\", mode: \"copy\"\n\n        input:\n        file assembly from {{ input_channel }}.map{ it[1] }.collect()\n        each file(schema) from IN_schema_{{ pid }}\n\n        output:\n        file 'chew_results*'\n        file 'cgMLST.tsv' optional true into chewbbacaProfile_{{ pid }}\n        {% with task_name=\"chewbbaca\", sample_id=\"val('single')\" %}\n        {%- include \"compiler_channels.txt\" ignore missing -%}\n        {% endwith %}\n\n        script:\n        \"\"\"\n        {\n            set -x\n            if [ -d \"$schema/temp\" ];\n            then\n                rm -r $schema/temp\n            fi\n\n            if [ \"$params.schemaSelectedLoci{{ param_id }}\" = \"null\" ];\n            then\n                inputGenomes=$schema\n            else\n                inputGenomes=${params.schemaSelectedLoci{{ param_id }}}\n            fi\n\n            echo $assembly | tr \" \" \"\\n\" >> input_file.txt\n            chewBBACA.py AlleleCall -i input_file.txt -g \\$inputGenomes -o chew_results $jsonOpt --cpu $task.cpus $training\n            if [ \"$jsonOpt\" = \"--json\" ]; then\n                merge_json.py ${params.schemaCore{{ param_id }}} chew_results/*/results*\n            else\n                cp chew_results*/*/results_alleles.tsv cgMLST.tsv\n            fi\n        } || {\n            echo fail > .status\n        }\n        \"\"\"\n    }\n\n} else {\n    process chewbbaca_{{ pid }} {\n\n        // Send POST request to platform\n        {% include \"post.txt\" ignore missing %}\n\n        maxForks 1\n        tag { sample_id }\n        scratch true\n        if (params.chewbbacaQueue{{ param_id }} != null) {\n            queue \"${params.chewbbacaQueue{{ param_id }}}\"\n        }\n        publishDir \"results/chewbbaca_alleleCall_{{ pid }}/\", mode: \"copy\"\n\n        
input:\n        set sample_id, file(assembly) from {{ input_channel }}\n        each file(schema) from IN_schema_{{ pid }}\n\n        output:\n        file 'chew_results*'\n        file '*_cgMLST.tsv' optional true into chewbbacaProfile_{{ pid }}\n        {% with task_name=\"chewbbaca\" %}\n        {%- include \"compiler_channels.txt\" ignore missing -%}\n        {% endwith %}\n\n        script:\n        \"\"\"\n        {\n            set -x\n            if [ -d \"$schema/temp\" ];\n            then\n                rm -r $schema/temp\n            fi\n\n            if [ \"$params.schemaSelectedLoci{{ param_id }}\" = \"null\" ];\n            then\n                inputGenomes=$schema\n            else\n                inputGenomes=${params.schemaSelectedLoci{{ param_id }}}\n            fi\n\n            echo $assembly >> input_file.txt\n            chewBBACA.py AlleleCall -i input_file.txt -g \\$inputGenomes -o chew_results_${sample_id} $jsonOpt --cpu $task.cpus $training --fc\n            if [ \"$jsonOpt\" = \"--json\" ]; then\n                merge_json.py ${params.schemaCore{{ param_id }}} chew_results_*/*/results* ${sample_id}\n            else\n                mv chew_results_*/*/results_alleles.tsv ${sample_id}_cgMLST.tsv\n            fi\n        } || {\n            echo fail > .status\n        }\n        \"\"\"\n    }\n}\n\n\nprocess chewbbacaExtractMLST_{{ pid }} {\n\n    publishDir \"results/chewbbaca_{{ pid }}/\", mode: \"copy\", overwrite: true\n\n    input:\n    file profiles from chewbbacaProfile_{{ pid }}.collect()\n\n    output:\n    file \"results/cgMLST.tsv\"\n\n    \"\"\"\n    head -n1 ${profiles[0]} > chewbbaca_profiles.tsv\n    awk 'FNR == 2' $profiles >> chewbbaca_profiles.tsv\n    chewBBACA.py ExtractCgMLST -i chewbbaca_profiles.tsv -o results -p $params.chewbbacaProfilePercentage{{ param_id }}\n    \"\"\"\n\n}\n"
  },
  {
    "path": "flowcraft/generator/templates/compiler_channels.txt",
    "content": "set {{ sample_id|default(\"sample_id\") }}, val(\"{{ pid }}_{{ task_name }}{{ suffix }}\"), file(\".status\"), file(\".warning\"), file(\".fail\"), file(\".command.log\") into STATUS_{{task_name}}_{{ pid }}\nset {{ sample_id|default(\"sample_id\") }}, val(\"{{ task_name }}_{{ pid }}{{ suffix }}\"), val(\"{{ pid }}\"), file(\".report.json\"), file(\".versions\"), file(\".command.trace\") into REPORT_{{task_name}}_{{ pid }}\nfile \".versions\""
  },
  {
    "path": "flowcraft/generator/templates/concoct.nf",
    "content": "IN_max_clusters_{{ pid }} = Channel.value(params.clusters{{ param_id }})\nIN_length_threshold_{{ pid }} = Channel.value(params.lengthThreshold{{ param_id }})\nIN_read_length_{{ pid }} = Channel.value(params.readLength{{ param_id }})\nIN_iterations_{{ pid }} = Channel.value(params.iterations{{ param_id }})\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess concoct_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir \"results/assembly/binning/concoct_{{ pid }}/${sample_id}/\"\n\n    input:\n    set sample_id, file(assembly), file(fastq) from {{ input_channel }}.join(_LAST_fastq_{{ pid }})\n    val maxClusters from IN_max_clusters_{{ pid }}\n    val read_length from IN_read_length_{{ pid }}\n    val length_threshold from IN_length_threshold_{{ pid }}\n    val iterations from IN_iterations_{{ pid }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id, file(assembly), file('concoct_output/*.fa') into binCh_{{ pid }}\n    set sample_id, file(\"concoct_output/clustering_merged.csv\"), file(assembly) into intoReport_{{ pid }}\n    file(\"concoct_output/*.csv\")\n    file(\"concoct_output/*.txt\")\n    {% with task_name=\"concoct\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        # cut up the contigs into chunks of 10Kb to mitigate assembly errors and give more weight to larger contigs\n        cut_up_fasta.py -c 10000 -o 0 -b ${sample_id}_bedfile -m ${assembly} > ${sample_id}_split_contigs.fasta\n\n        # map reads to cut up assembly\n        echo [DEBUG] BUILDING BOWTIE INDEX FOR ASSEMBLY: $assembly >> .command.log 2>&1\n        bowtie2-build ${sample_id}_split_contigs.fasta ${sample_id}_split_contigs_index >> .command.log 2>&1\n        echo [DEBUG] MAPPING READS FROM $fastq >> .command.log 2>&1\n        bowtie2 --threads ${task.cpus} -x ${sample_id}_split_contigs_index -1 ${fastq[0]} -2 ${fastq[1]} -S mapping.sam >> .command.log 2>&1\n        echo [DEBUG] CONVERTING AND SORTING SAM TO BAM >> .command.log 2>&1\n        samtools sort -o sorted.bam -O bam -@ ${task.cpus} mapping.sam && rm *.sam  >> .command.log 2>&1\n        echo [DEBUG] CREATING BAM INDEX >> .command.log 2>&1\n        samtools index sorted.bam >> .command.log 2>&1\n\n        # create coverage table for concoct\n        concoct_coverage_table.py ${sample_id}_bedfile sorted.bam > ${sample_id}_coverage_file.tab\n\n        # run CONCOCT\n        concoct --coverage_file ${sample_id}_coverage_file.tab --composition_file ${sample_id}_split_contigs.fasta \\\n        -b concoct_output/ -c ${maxClusters} -l ${length_threshold} -r ${read_length } -i ${iterations} -t ${task.cpus}\n\n        # Merge subcontig clustering into original contig clustering\n        merge_cutup_clustering.py concoct_output/clustering_*.csv > concoct_output/clustering_merged.csv\n\n        # Extract bins as individual FASTA\n        extract_fasta_bins.py --output_path concoct_output/ ${assembly} concoct_output/clustering_merged.csv\n\n        echo pass > .status\n\n        if [ \"$clear\" = \"true\" ];\n        then\n            work_regex=\".*/work/.{2}/.{30}/.*\"\n            file_source1=\\$(readlink -f \\$(pwd)/${fastq[0]})\n            file_source2=\\$(readlink -f \\$(pwd)/${fastq[1]})\n            assembly_file=\\$(readlink -f \\$(pwd)/${assembly})\n            if [[ 
\"\\$file_source1\" =~ \\$work_regex ]]; then\n                rm \\$file_source1 \\$file_source2 \\$assembly_file\n            fi\n        fi\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n}\n\nprocess report_concoct_{{ pid }}{\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(cluster), file(contigs) from intoReport_{{ pid }}\n\n    output:\n    {% with task_name=\"report_concoct\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_concoct.py\"\n\n}\n\n// emits one bin per channel\n{{ output_channel }} = Channel.create()\nbinCh_{{ pid }}.map{ it -> [it[2].toString().tokenize('/').last().tokenize('.')[0..-2].join('.'), it[2]]}\n    .transpose()\n    .map{it -> [it[1].toString().tokenize('/').last().tokenize('.')[0..-2].join('.'),it[1]]}\n    .into({{ output_channel }})\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/containers.config",
    "content": "process {\n{{ container_info }}\n\n}"
  },
  {
    "path": "flowcraft/generator/templates/dengue_typing.nf",
    "content": "// Check for the presence of absence of fasta reference\nif (params.reference{{ param_id }} == null) {\n    exit 1, \"Dengue_typing: A reference fasta file must be provided.\"\n}\n\ngetRef_{{ pid }} = params.get_genome{{ param_id }} ? \"true\" : \"false\"\ncheckpointReferenceGenome_{{ pid }} = Channel.value(getRef_{{ pid }})\ncheckpointReferenceGenome_{{ pid }}.into{ reference_reads_{{ pid }} ; reference_assembly_{{ pid }} }\n\nreference_{{ pid }} = Channel.fromPath(params.reference{{ param_id }})\n\nclass VerifyCompletnessTyping {\n\n    public static boolean contigs(String filename, int threshold){\n        BufferedReader reader = new BufferedReader(new FileReader(filename));\n        boolean result = processContigs(reader, threshold);\n        reader.close()\n\n        return result;\n    }\n\n    private static boolean processContigs(BufferedReader reader, int threshold){\n        String line;\n        int lineThreshold = 0;\n        List splittedLine\n\n        while ((line = reader.readLine()) != null) {\n            if (line.startsWith('>')) {\n                lineThreshold = 0\n            } else {\n                lineThreshold += line.length()\n                if(lineThreshold >= threshold) {\n                    return true;\n                }\n             }\n        }\n\n        return false;\n    }\n}\n\n\ntype_reads_{{ pid }} = Channel.create()\ntype_assembly_{{ pid }} = Channel.create()\n{{ input_channel }}.choice(type_assembly_{{ pid }}, type_reads_{{ pid }}){a -> a[1].toString() == \"null\" ? false : VerifyCompletnessTyping.contigs(a[1].toString(), 10000) == true ? 0 : 1}\n\nprocess dengue_typing_assembly_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir \"results/dengue_typing/${sample_id}/\"\n\n\n    input:\n    set sample_id, file(assembly), file(reference) from type_assembly_{{ pid }}\n    val get_reference from reference_assembly_{{ pid }}\n    each file(reference) from Channel.fromPath(\"${params.reference{{ param_id }}}\")\n\n    output:\n    file \"seq_typing*\"\n    set sample_id, file(assembly) into out_typing_assembly_{{ pid }}\n    file(\"*.fa\") optional true into _ref_seqTyping_assembly_{{ pid }}\n    {% with task_name=\"dengue_typing_assembly\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"dengue_typing_assembly.py\"\n\n}\n\n\nprocess dengue_typing_reads_{{ pid }} {\n\n// Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir \"results/dengue_typing/${sample_id}/\"\n\n    errorStrategy { task.exitStatus == 120 ? 
'ignore' : 'retry' }\n\n    input:\n    set sample_id, file(assembly), file(fastq_pair) from type_reads_{{ pid }}.join(_LAST_fastq_{{ pid }})\n    val get_reference from reference_reads_{{ pid }}\n    each file(reference) from Channel.fromPath(\"${params.reference{{ param_id }}}\")\n\n    output:\n    file \"seq_typing*\"\n    set sample_id, file(\"*consensus.fasta\") into out_typing_reads_{{ pid }}\n    file(\"*.fa\") optional true into _ref_seqTyping_reads_{{ pid }}\n    {% with task_name=\"dengue_typing_reads\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"dengue_typing_reads.py\"\n\n}\n\nout_typing_assembly_{{ pid }}.mix(out_typing_reads_{{ pid }}).set{ {{ output_channel }} }\n\n_ref_seqTyping_assembly_{{ pid }}.mix(_ref_seqTyping_reads_{{ pid }}).set{ _ref_seqTyping_{{ pid }} }\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/diamond.nf",
    "content": "// check if any of the parameters it defined before executing the process.\nif (!params.pathToDb{{ param_id }} && !params.fastaToDb{{ param_id }})\n    exit 1, \"'You must specify either a pathToDb or fastaToDb parameter.'\"\n// checks if both are defined and if so raises an error.\nelse if (params.pathToDb{{ param_id }} && params.fastaToDb{{ param_id }})\n    exit 1, \"'Both pathToDb and fastaToDb were given, choose just one.'\"\n\n// list of blasts allowed for diamond\nallowedBlasts = [\"blastp\", \"blastx\"]\n// checks if blast type os defined\nif (!allowedBlasts.contains(params.blastType{{ param_id }}))\n    exit 1, \"Provide a valid blast type: blastx or blastp\"\n\nprocess diamond_{{ pid }}  {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir \"results/annotation/diamond_{{ pid }}/${sample_id}\"\n\n    input:\n    set sample_id, file(assembly) from {{ input_channel }}\n    file pathToDb from params.pathToDb{{ param_id }} ?\n        Channel.fromPath(params.pathToDb{{ param_id }}) : Channel.value(\"NA\")\n    file fastaToDb from params.fastaToDb{{ param_id }} ?\n        Channel.fromPath(params.fastaToDb{{ param_id }}) : Channel.value(\"NA\")\n    val blast from params.blastType{{ param_id }}\n\n    output:\n    file \"*.txt\" into diamondOutputs\n    output:\n    {% with task_name=\"diamond\"%}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    // Use database when available or otherwise use Fasta file\n    if (params.pathToDb{{ param_id }})\n        \"\"\"\n        diamond ${blast} -d ${pathToDb} -q ${assembly} \\\n        -o ${pathToDb}.txt -e 1E-20 -p ${task.cpus} \\\n        -f 6 qseqid sseqid pident length mismatch gapopen qstart qend slen sstart send evalue bitscore\n        \"\"\"\n    else if (params.fastaToDb{{ param_id }})\n        \"\"\"\n        diamond makedb --in ${fastaToDb} -d ${fastaToDb}\n        diamond ${blast} -d ${fastaToDb}.dmnd -q ${assembly} \\\n        -o ${fastaToDb}.txt -e 1E-20 -p ${task.cpus} \\\n        -f 6 qseqid sseqid pident length mismatch gapopen qstart qend slen sstart send evalue bitscore\n        \"\"\"\n\n}"
  },
  {
    "path": "flowcraft/generator/templates/downsample_fastq.nf",
    "content": "\nIN_genome_size_{{ pid }} = Channel.value(params.genomeSize{{ param_id }})\n    .map{it -> it.toString().isNumber() ? it : exit(1, \"The genomeSize parameter must be a number or a float. Provided value: '${params.genomeSize{{ param_id }}}'\")}\n\nIN_depth_{{ pid }} = Channel.value(params.depth{{ param_id }})\n    .map{it -> it.toString().isNumber() ? it : exit(1, \"The depth parameter must be a number or a float. Provided value: '${params.depth{{ param_id }}}'\")}\n\nIN_seed_{{ pid }} = Channel.value(params.seed{{ param_id }})\n    .map{it -> it.toString().isNumber() ? it : exit(1, \"The seed parameter must be a number or a float. Provided value: '${params.seed{{ param_id }}}'\")}\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess downsample_fastq_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { \"${sample_id}\" }\n    publishDir \"results/downsample_fastq_{{ pid }}/\", pattern: \"_ss.*\"\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val gsize from IN_genome_size_{{ pid }}\n    val depth from IN_depth_{{ pid }}\n    val seed from IN_seed_{{ pid }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id, file('*_ss.*') into {{ output_channel }}\n    {% with task_name=\"downsample_fastq\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"downsample_fastq.py\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/fast_ani.nf",
    "content": "IN_fragLen_{{ pid }} = Channel.value(params.fragLen{{ param_id }})\n\n// runs fast ani for multiple comparisons (many to many mode)\nprocess fastAniMatrix_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n     publishDir 'results/fast_ani/fast_ani_{{ pid }}/',\n\n    input:\n    set sample_id, file(fasta) from {{ input_channel }}\n    val fragLenValue from IN_fragLen_{{ pid }}\n\n    output:\n    set sample_id, fasta, file(\"*.out\")\n    {% with task_name=\"fastAniMatrix\", sample_id=\"sample_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    mkdir fasta_store\n    fasta_spliter.py ${fasta}\n    fastANI --ql files_fastani.txt --rl files_fastani.txt \\\n    -t ${task.cpus} --fragLen ${fragLenValue} \\\n    -o ${sample_id.take(sample_id.lastIndexOf(\".\"))}_fastani.out\n    \"\"\"\n\n}\n"
  },
  {
    "path": "flowcraft/generator/templates/fasterq_dump.nf",
    "content": "// check if option file is provided or not\noptionFile = (params.option_file{{ param_id }} == false) ? \"\" :\n    \"--option-file ${params.option_file{{ param_id }}}\"\n\n// process to run fasterq-dump from sra-tools\nprocess fasterqDump_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { accession_id }\n    publishDir \"reads/${accession_id}/\", pattern: \"*.fastq*\"\n    maxRetries 1\n\n    input:\n    val accession_id from {{ input_channel }}.splitText(){ it.trim() }.filter{ it.trim() != \"\" }\n\n    output:\n    set accession_id, file(\"*.fastq*\") optional true into {{ output_channel }}\n    {% with task_name=\"fasterqDump\", sample_id=\"accession_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        echo \"Downloading the following accession: ${accession_id}\"\n        fasterq-dump ${accession_id} -e ${task.cpus} -p ${optionFile}\n        if [ ${params.compress_fastq{{ param_id }}} = true ]\n        then\n            echo \"Compressing FastQ files...\"\n            if [ -f ${accession_id}_1.fastq ]\n            then\n                pigz -p ${task.cpus} ${accession_id}_1.fastq ${accession_id}_2.fastq\n            elif [ -f ${accession_id}_3.fastq ]\n            then\n                echo \"No paired end reads were found to compress.\"\n                pigz -p ${task.cpus} ${accession_id}_3.fastq\n            else\n                echo \"FastQ files weren't compressed. Check if FastQ files were downloaded.\"\n            fi\n        else\n            echo \"FastQ files won't be compressed because compress_fastq options was set to: '${params.compress_fastq{{ param_id }}}.'\"\n        fi\n    } || {\n        # If exit code other than 0\n        if [ \\$? -eq 0 ]\n        then\n            echo \"pass\" > .status\n        else\n            echo \"fail\" > .status\n            echo \"Could not download accession $accession_id\" > .fail\n        fi\n    }\n    \"\"\"\n}\n"
  },
  {
    "path": "flowcraft/generator/templates/fastqc.nf",
    "content": "IN_adapters_{{ pid }} = Channel.value(params.adapters{{ param_id }})\n\nprocess fastqc2_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir \"reports/fastqc_{{ pid }}/\", pattern: \"*.html\"\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val ad from IN_adapters_{{ pid }}\n\n    output:\n    set sample_id, file(fastq_pair), file('pair_1*'), file('pair_2*') into MAIN_fastqc_out_{{ pid }}\n    file \"*html\"\n    {% with task_name=\"fastqc2\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"fastqc.py\"\n}\n\n\nprocess fastqc2_report_{{ pid }} {\n\n    // Send POST request to platform\n    {% with overwrite=\"false\" %}\n    {% include \"post.txt\" ignore missing %}\n    {% endwith %}\n\n    tag { sample_id }\n    // This process can only use a single CPU\n    cpus 1\n    publishDir 'reports/fastqc_{{ pid }}/run_2/', pattern: '*summary.txt', mode: 'copy'\n\n    input:\n    set sample_id, file(fastq_pair), file(result_p1), file(result_p2) from MAIN_fastqc_out_{{ pid }}\n    val opts from Channel.value(\"\")\n\n    output:\n    set sample_id, file(fastq_pair), '.status' into MAIN_fastqc_report_{{ pid }}\n    file \"*_status_report\" into LOG_fastqc_report_{{ pid }}\n    file \"${sample_id}_*_summary.txt\" optional true\n    {% with task_name=\"fastqc2_report\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"fastqc_report.py\"\n\n}\n\n\nprocess compile_fastqc_status2_{{ pid }} {\n\n    publishDir 'reports/fastqc_{{ pid }}/', mode: 'copy'\n\n    input:\n    file rep from LOG_fastqc_report_{{ pid }}.collect()\n\n    output:\n    file 'FastQC_2run_report.csv'\n\n    \"\"\"\n    echo Sample, Failed? >> FastQC_2run_report.csv\n    cat $rep >> FastQC_2run_report.csv\n    \"\"\"\n\n}\n\n{{ output_channel }} = Channel.create()\n\nMAIN_fastqc_report_{{ pid }}\n        .filter{ it[2].text == \"pass\" }\n        .map{ [it[0], it[1]] }\n        .into({{ output_channel }})\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/fastqc_trimmomatic.nf",
    "content": "// Check sliding window parameter\nif ( params.trimSlidingWindow{{ param_id }}.toString().split(\":\").size() != 2 ){\n    exit 1, \"'trimSlidingWindow{{ param_id }}' parameter must contain two values separated by a ':'. Provided value: '${params.trimSlidingWindow{{ param_id }}}'\"\n}\nif ( !params.trimLeading{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'trimLeading{{ param_id }}' parameter must be a number. Provide value: '${params.trimLeading{{ param_id }}}'\"\n}\nif ( !params.trimTrailing{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'trimTrailing{{ param_id }}' parameter must be a number. Provide value: '${params.trimTrailing{{ param_id }}}'\"\n}\nif ( !params.trimMinLength{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'trimMinLength{{ param_id }}' parameter must be a number. Provide value: '${params.trimMinLength{{ param_id }}}'\"\n}\n\nIN_trimmomatic_opts_{{ pid }} = Channel.value([params.trimSlidingWindow{{ param_id }},params.trimLeading{{ param_id }},params.trimTrailing{{ param_id }},params.trimMinLength{{ param_id }}])\nIN_adapters_{{ pid }} = Channel.value(params.adapters{{ param_id }})\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess fastqc_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir \"reports/fastqc_{{ pid }}/\", pattern: \"*.html\"\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val ad from Channel.value('None')\n\n    output:\n    set sample_id, file(fastq_pair), file('pair_1*'), file('pair_2*') into MAIN_fastqc_out_{{ pid }}\n    file \"*html\"\n    {% with task_name=\"fastqc\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"fastqc.py\"\n}\n\n/** FASTQC_REPORT - MAIN\nThis process will parse the result files from a FastQC analyses and output\nthe optimal_trim information for Trimmomatic\n*/\nprocess fastqc_report_{{ pid }} {\n\n    // Send POST request to platform\n    {% with overwrite=\"false\" %}\n    {% include \"post.txt\" ignore missing %}\n    {% endwith %}\n\n    tag { sample_id }\n    // This process can only use a single CPU\n    cpus 1\n    publishDir 'reports/fastqc_{{ pid }}/run_1/', pattern: '*summary.txt', mode: 'copy'\n\n    input:\n    set sample_id, file(fastq_pair), file(result_p1), file(result_p2) from MAIN_fastqc_out_{{ pid }}\n    val opts from Channel.value(\"--ignore-tests\")\n\n    output:\n    set sample_id, file(fastq_pair), 'optimal_trim', \".status\" into _MAIN_fastqc_trim_{{ pid }}\n    file '*_trim_report' into LOG_trim_{{ pid }}\n    file \"*_status_report\" into LOG_fastqc_report_{{ pid }}\n    file \"${sample_id}_*_summary.txt\" optional true\n    {% with task_name=\"fastqc_report\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"fastqc_report.py\"\n\n}\n\nMAIN_fastqc_trim_{{ pid }} = Channel.create()\n_MAIN_fastqc_trim_{{ pid }}\n        .filter{ it[3].text == \"pass\" }\n        .map{ [it[0], it[1], file(it[2]).text] }\n        .into(MAIN_fastqc_trim_{{ pid }})\n\n\n/** TRIM_REPORT - PLUG-IN\nThis will collect the optimal trim points assessed by the fastqc_report\nprocess and write the results of all samples in a single csv file\n*/\nprocess trim_report_{{ pid }} {\n\n    publishDir 'reports/fastqc_{{ pid }}/', mode: 'copy'\n\n    input:\n    file trim from 
LOG_trim_{{ pid }}.collect()\n\n    output:\n    file \"FastQC_trim_report.csv\"\n\n    \"\"\"\n    echo Sample,Trim begin, Trim end >> FastQC_trim_report.csv\n    cat $trim >> FastQC_trim_report.csv\n    \"\"\"\n}\n\n\nprocess compile_fastqc_status_{{ pid }} {\n\n    publishDir 'reports/fastqc_{{ pid }}/', mode: 'copy'\n\n    input:\n    file rep from LOG_fastqc_report_{{ pid }}.collect()\n\n    output:\n    file 'FastQC_1run_report.csv'\n\n    \"\"\"\n    echo Sample, Failed? >> FastQC_1run_report.csv\n    cat $rep >> FastQC_1run_report.csv\n    \"\"\"\n\n}\n\n\n/** TRIMMOMATIC - MAIN\nThis process will execute trimmomatic. Currently, the main channel requires\ninformation on the trim_range and phred score.\n*/\nprocess trimmomatic_{{ pid }} {\n\n    // Send POST request to platform\n    {% with overwrite=\"false\" %}\n    {% include \"post.txt\" ignore missing %}\n    {% endwith %}\n\n    tag { sample_id }\n    publishDir \"results/trimmomatic_{{ pid }}\", pattern: \"*.gz\"\n\n    input:\n    set sample_id, file(fastq_pair), trim_range, phred from MAIN_fastqc_trim_{{ pid }}.join(SIDE_phred_{{ pid }})\n    val opts from IN_trimmomatic_opts_{{ pid }}\n    val ad from IN_adapters_{{ pid }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id, \"${sample_id}_*trim.fastq.gz\" into {{ output_channel }}\n    file 'trimmomatic_report.csv'\n    {% with task_name=\"trimmomatic\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"trimmomatic.py\"\n\n}\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/filter_poly.nf",
    "content": "IN_adapter_{{ pid }} = Channel.value(params.adapter{{ param_id }})\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess filter_poly_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    echo true\n\n    errorStrategy { task.exitStatus == 120 ? 'ignore' : 'retry' }\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val adapter from IN_adapter_{{ pid }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id , file(\"${sample_id}_filtered_{1,2}.fastq.gz\") into {{ output_channel }}\n    {% with task_name=\"filter_poly\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    gunzip -c ${fastq_pair[0]} >  ${sample_id}_1.fq\n    gunzip -c ${fastq_pair[1]} >  ${sample_id}_2.fq\n\n    for seqfile in *.fq;\n    do if [ ! -s \\$seqfile  ]\n    then\n        echo \\$seqfile is empty && exit 120\n    fi\n    done\n\n    prinseq-lite.pl --fastq ${sample_id}_1.fq  --fastq2 ${sample_id}_2.fq  --custom_params \"${adapter}\" -out_format 3 -out_good ${sample_id}_filtered\n\n    gzip ${sample_id}_filtered_*.fastq\n\n    #rm *.fq *.fastq\n\n    if [ \"$clear\" = \"true\" ];\n    then\n        work_regex=\".*/work/.{2}/.{30}/.*\"\n        file_source1=\\$(readlink -f \\$(pwd)/${fastq_pair[0]})\n        file_source2=\\$(readlink -f \\$(pwd)/${fastq_pair[1]})\n        if [[ \"\\$file_source1\" =~ \\$work_regex ]]; then\n            rm \\$file_source1 \\$file_source2\n        fi\n    fi\n\n    \"\"\"\n}\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/haplotypecaller.nf",
    "content": "haplotypecallerIndexId_{{ pid }} = Channel.value(params.reference{{ param_id }}.split(\"/\").last())\nhaplotypecallerRef_{{ pid }} = Channel.fromPath(\"${params.reference{{ param_id }}}.*\").collect().toList()\ninterval_{{ pid }} = Channel.fromPath(params.intervals{{ param_id }})\n           .ifEmpty { exit 1, \"Interval list file for HaplotypeCaller not found: ${params.intervals}\" }\n           .splitText()\n           .map { it -> it.trim() }\n\nprocess haplotypecaller_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag \"$interval\"\n\n    input:\n    set sample_id, file(bam), file(bai) from {{ input_channel }}\n    each interval from interval_{{pid}}\n    each file(ref_files) from haplotypecallerRef_{{pid}}\n    each index from haplotypecallerIndexId_{{pid}}\n   \n    output:\n    file(\"*.vcf\") into haplotypecallerGvcf\n    file(\"*.vcf.idx\") into gvcfIndex\n    val(sample_id) into sampleId\n\n    {% with task_name=\"haplotypecaller\", suffix=\"_${interval}\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    gatk HaplotypeCaller \\\n      --java-options -Xmx${task.memory.toMega()}M \\\n      -R ${index}.fasta \\\n      -O ${sample_id}.vcf \\\n      -I $bam \\\n      -L $interval\n    \"\"\"\n}\n\nprocess merge_vcfs_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    publishDir \"results/variant_calling/merge_vcfs_{{ pid }}\"\n\n    tag { sample_id }\n\n    input:\n    file('*.vcf') from haplotypecallerGvcf.collect()\n    file('*.vcf.idx') from gvcfIndex.collect()\n    val(sample_id) from sampleId.first()\n\n    output:\n    set file(\"${sample_id}.vcf.gz\"), file(\"${sample_id}.vcf.gz.tbi\") into {{ output_channel }}\n    {% with task_name=\"merge_vcfs\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    ## make list of input variant files\n    for vcf in \\$(ls *vcf); do\n      echo \\$vcf >> input_variant_files.list\n    done\n\n    gatk MergeVcfs \\\n      --INPUT= input_variant_files.list \\\n      --OUTPUT= ${sample_id}.vcf.gz\n    \"\"\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/init.nf",
    "content": "\n// Placeholder for main input channels\n{{ main_inputs }}\n\n// Placeholder for secondary input channels\n{{ secondary_inputs }}\n\n// Placeholder for extra input channels\n{{ extra_inputs }}\n\n// Placeholder to fork the raw input channel\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/integrity_coverage.nf",
    "content": "IN_genome_size_{{ pid }} = Channel.value(params.genomeSize{{ param_id }})\n    .map{it -> it.toString().isNumber() ? it : exit(1, \"The genomeSize parameter must be a number or a float. Provided value: '${params.genomeSize_{{ param_id }}}'\")}\n\nIN_min_coverage_{{ pid }} = Channel.value(params.minCoverage{{ param_id }})\n    .map{it -> it.toString().isNumber() ? it : exit(1, \"The minCoverage parameter must be a number or a float. Provided value: '${params.minCoverage_{{ param_id }}}'\")}\n\nprocess integrity_coverage_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    // This process can only use a single CPU\n    cpus 1\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val gsize from IN_genome_size_{{ pid }}\n    val cov from IN_min_coverage_{{ pid }}\n    // This channel is for the custom options of the integrity_coverage.py\n    // script. See the script's documentation for more information.\n    val opts from Channel.value('')\n\n    output:\n    set sample_id,\n        file(fastq_pair),\n        file('*_encoding'),\n        file('*_phred'),\n        file('*_coverage'),\n        file('*_max_len') into MAIN_integrity_{{ pid }}\n    file('*_report') optional true into LOG_report_coverage1_{{ pid }}\n    {% with task_name=\"integrity_coverage\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"integrity_coverage.py\"\n\n}\n\n// TRIAGE OF CORRUPTED SAMPLES\nLOG_corrupted_{{ pid }} = Channel.create()\nMAIN_PreCoverageCheck_{{ pid }} = Channel.create()\n// Corrupted samples have the 2nd value with 'corrupt'\nMAIN_integrity_{{ pid }}.choice(LOG_corrupted_{{ pid }}, MAIN_PreCoverageCheck_{{ pid }}) {\n    a -> a[2].text == \"corrupt\" ? 
0 : 1\n}\n\n// TRIAGE OF LOW COVERAGE SAMPLES\n{{ output_channel }} = Channel.create()\nSIDE_phred_{{ pid }} = Channel.create()\nSIDE_max_len_{{ pid }} = Channel.create()\n\nMAIN_PreCoverageCheck_{{ pid }}\n// Low coverage samples have the 4th value of the Channel with 'fail'\n    .filter{ it[4].text != \"fail\" }\n// For the channel to proceed with FastQ in 'sample_good' and the\n// Phred scores for each sample in 'SIDE_phred'\n    .separate({{ output_channel }}, SIDE_phred_{{ pid }}, SIDE_max_len_{{ pid }}){\n        a -> [ [a[0], a[1]], [a[0], a[3].text], [a[0], a[5].text]  ]\n    }\n\n/** REPORT_COVERAGE - PLUG-IN\nThis process will report the expected coverage for each non-corrupted sample\nand write the results to 'reports/coverage/estimated_coverage_initial.csv'\n*/\nprocess report_coverage_{{ pid }} {\n\n    // This process can only use a single CPU\n    cpus 1\n    publishDir 'reports/coverage_{{ pid }}/'\n\n    input:\n    file(report) from LOG_report_coverage1_{{ pid }}.filter{ it.text != \"corrupt\" }.collect()\n\n    output:\n    file 'estimated_coverage_initial.csv'\n\n    \"\"\"\n    echo Sample,Estimated coverage,Status >> estimated_coverage_initial.csv\n    cat $report >> estimated_coverage_initial.csv\n    \"\"\"\n}\n\n/** REPORT_CORRUPT - PLUG-IN\nThis process will report the corrupted samples and write the results to\n'reports/corrupted/corrupted_samples.txt'\n*/\nprocess report_corrupt_{{ pid }} {\n\n    // This process can only use a single CPU\n    cpus 1\n    publishDir 'reports/corrupted_{{ pid }}/'\n\n    input:\n    val sample_id from LOG_corrupted_{{ pid }}.collect{it[0]}\n\n    output:\n    file 'corrupted_samples.txt'\n\n    \"\"\"\n    echo ${sample_id.join(\",\")} | tr \",\" \"\\n\" >> corrupted_samples.txt\n    \"\"\"\n\n}\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/kraken.nf",
    "content": "IN_kraken_DB_{{ pid }} = Channel.value(params.krakenDB{{ param_id }})\n\n\n//Process to run Kraken\nprocess kraken_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir \"results/taxonomy/kraken/\", pattern: \"*.txt\"\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val krakenDB from IN_kraken_DB_{{ pid }}\n\n    output:\n    file(\"${sample_id}_kraken_report.txt\")\n    {% with task_name=\"kraken\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    kraken --preload --fastq-input --db ${krakenDB} --threads $task.cpus --output ${sample_id}_kraken.txt --paired --gzip-compressed ${fastq_pair[0]} ${fastq_pair[1]}\n\n    kraken-report --db ${krakenDB} ${sample_id}_kraken.txt > ${sample_id}_kraken_report.txt\n    \"\"\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/kraken2.nf",
    "content": "IN_kraken2_DB_{{ pid }} = Channel.value(params.kraken2DB{{ param_id }})\n\n\n//Process to run Kraken2\nprocess kraken2_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir \"results/taxonomy/kraken2/\", pattern: \"*.txt\"\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val krakenDB from IN_kraken2_DB_{{ pid }}\n\n    output:\n    file(\"${sample_id}_kraken_report.txt\")\n    {% with task_name=\"kraken2\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    kraken2 --memory-mapping --threads $task.cpus --report ${sample_id}_kraken_report.txt --db ${krakenDB} --paired \\\n    --gzip-compressed ${fastq_pair[0]} ${fastq_pair[1]}\n    \"\"\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/mafft.nf",
    "content": "// True when a dengue_typing secondary channel is connected\nhas_ref_{{pid}} = binding.hasVariable('_ref_seqTyping_{{ pid }}')\n\nif ( has_ref_{{pid}} ){\n    {{ input_channel }}.map{ it[1] }.collect().mix(_ref_seqTyping_{{pid}}.unique{it.name}).set{mafft_input}\n} else {\n    {{ input_channel }}.map{ it[1] }.collect().set{mafft_input}\n}\n\n//{{ input_channel }}.map{ it[1] }.mix(_ref_seqTyping_{{ pid }}.unique()).set{mafft_input}\n\nprocess mafft_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { 'mafft' }\n\n    publishDir \"results/alignment/mafft_{{ pid }}/\"\n\n    input:\n    file(assembly) from mafft_input.collect()\n\n    output:\n    file (\"*.align\") into {{ output_channel }}\n    {% with task_name=\"mafft\", sample_id=\"val('single')\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    cat ${assembly} > all_assemblies.fasta\n\n    mafft --adjustdirection --thread $task.cpus --auto all_assemblies.fasta > ${workflow.scriptName}.align\n    \"\"\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/mapping_patlas.nf",
    "content": "// checks if cutoff value is higher than 0\nif (Float.parseFloat(params.cov_cutoff{{ param_id }}.toString()) == 0) {\n    exit 1, \"Cutoff value of 0 will output every plasmid in the database with coverage 0. Provide a value higher than 0.\"\n}\n\nIN_index_files_{{ pid }} = Channel.value(params.refIndex{{ param_id }})\nIN_samtools_indexes_{{ pid }} = Channel.value(params.samtoolsIndex{{ param_id }})\nIN_length_json_{{ pid }} = Channel.value(params.lengthJson{{ param_id }})\nIN_cov_cutoff_{{ pid }} = Channel.value(params.cov_cutoff{{ param_id }})\n\n\n// process that runs bowtie2 and samtools\nprocess mappingBowtie_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(reads) from {{ input_channel }}\n    val bowtie2Index from IN_index_files_{{ pid }}\n    val samtoolsIdx from IN_samtools_indexes_{{ pid }}\n\n    output:\n    set sample_id, file(\"samtoolsDepthOutput*.txt\") into samtoolsResults\n    {% with task_name=\"mappingBowtie\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n\n    //if (params.singleEnd == true) {\n    //    readsString = \"-U ${reads}\"\n    //}\n    //else {\n    readsString = \"-1 ${reads[0]} -2 ${reads[1]}\"\n    //}\n\n    \"\"\"\n    bowtie2 -x ${bowtie2Index} ${readsString} -p ${task.cpus} -a -5 ${params.trim5{{ param_id }}} | \\\n    samtools view -b -t ${samtoolsIdx} -@ ${task.cpus} - | \\\n    samtools sort -@ ${task.cpus} -o samtoolsSorted_${sample_id}.bam\n    samtools index samtoolsSorted_${sample_id}.bam\n    samtools depth samtoolsSorted_${sample_id}.bam > \\\n    samtoolsDepthOutput_${sample_id}.txt\n    rm samtoolsSorted_${sample_id}.bam*\n    \"\"\"\n}\n\n/**\n* These dumping process parses the depth file for each sample and filters it\n* depending on the cutoff set by the user.\n*/\nprocess jsonDumpingMapping_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir 'results/mapping/mapping_json_{{ pid }}/'\n\n    input:\n    set sample_id, file(depthFile) from samtoolsResults\n    val lengthJson from IN_length_json_{{ pid }}\n    val cov_cutoff from IN_cov_cutoff_{{ pid }}\n\n    output:\n    set sample_id, file(\"samtoolsDepthOutput*.txt_mapping.json\") optional true into mappingOutputChannel_{{ pid }}\n    {% with task_name=\"jsonDumpingMapping\", sample_id=\"sample_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"mapping2json.py\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/mark_duplicates.nf",
    "content": "process mark_duplicates_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    input:\n    set sample_id, file(bam), file(bai) from {{ input_channel }}\n   \n    output:\n    set val(sample_id), file(\"${sample_id}_mark_dup.bam\"), file(\"${sample_id}_mark_dup.bai\") into {{ output_channel }}\n    set file(\"metrics.txt\") into markDupMultiQC_{{pid}}\n    {% with task_name=\"mark_duplicates\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    gatk MarkDuplicates \\\n      -I $bam \\\n      -M metrics.txt \\\n      -O ${sample_id}_mark_dup.bam \\\n      --CREATE_INDEX\n    \"\"\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/mash_dist.nf",
    "content": "IN_shared_hashes_{{ pid }} = Channel.value(params.shared_hashes{{ param_id }})\n\nIN_mash_dist_input = Channel.create()\n// If the side channel with the sketch exists, join the corresponding .msh file\n// with the appropriate sample_id\nif (binding.hasVariable(\"SIDE_mashSketchOutChannel_{{ pid }}\")){\n    {{ input_channel }}\n        .join(SIDE_mashSketchOutChannel_{{ pid }})\n        .into(IN_mash_dist_input)\n// Otherwise, always use the .msh file provided in the docker image\n} else {\n    {{ input_channel }}\n        .map{ it -> [it[0], it[1], params.refFile{{ param_id }}] }\n        .into(IN_mash_dist_input)\n}\n\n// runs mash dist\nprocess runMashDist_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir 'results/mashdist/mashdist_txt_{{ pid }}/'\n\n    input:\n    set sample_id, file(fasta), refFile from IN_mash_dist_input\n\n    output:\n    set sample_id, file(fasta), file(\"*_mashdist.txt\") into mashDistOutChannel_{{ pid }}\n    {% with task_name=\"runMashDist\", sample_id=\"sample_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    mash dist -i -p ${task.cpus} -v ${params.pValue{{ param_id }}} \\\n    -d ${params.mash_distance{{ param_id }}} ${refFile} ${fasta} > ${fasta}_mashdist.txt\n    \"\"\"\n\n}\n\n// parses mash dist output to a json file that can be imported into pATLAS\nprocess mashDistOutputJson_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir 'results/mashdist/mashdist_json_{{ pid }}/'\n\n    input:\n    set sample_id, fasta, file(mashtxt) from mashDistOutChannel_{{ pid }}\n    val shared_hashes from IN_shared_hashes_{{ pid }}\n\n    output:\n    set sample_id, file(\"*.json\") optional true into {{ output_channel }}\n    {% with task_name=\"mashDistOutputJson\", sample_id=\"sample_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"mashdist2json.py\"\n\n}\n\n{{ forks }}\n"
  },
  {
    "path": "flowcraft/generator/templates/mash_screen.nf",
    "content": "if (binding.hasVariable(\"SIDE_mashSketchOutChannel_{{ pid }}\")){\n    IN_reference_file_{{ pid }} = SIDE_mashSketchOutChannel_{{ pid }}\n} else {\n    IN_reference_file_{{ pid }} = Channel.value(params.refFile{{ param_id }})\n}\n\n// check if noWinner is provided or not\nwinnerVar = (params.noWinner{{ param_id }} == false) ? \"-w\" : \"\"\n\n// process to run mashScreen and sort the output into\n// sortedMashScreenResults_{sampleId}.txt\nprocess mashScreen_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(reads) from {{ input_channel }}\n    val refFile from IN_reference_file_{{ pid }}\n\n    output:\n    set sample_id, file(\"sortedMashScreenResults*.txt\") into mashScreenResults_{{ pid }}\n    {% with task_name=\"mashScreen\", sample_id=\"sample_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    mash screen -i ${params.identity{{ param_id }}} -v ${params.pValue{{ param_id }}} -p \\\n    ${task.cpus} ${winnerVar} ${refFile} ${reads} > mashScreenResults_${sample_id}.txt\n    sort -gr mashScreenResults_${sample_id}.txt > sortedMashScreenResults_${sample_id}.txt\n    \"\"\"\n}\n\n// process to parse the output to json format\nprocess mashOutputJson_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir 'results/mashscreen/mashscreen_json_{{ pid }}', mode: 'copy'\n\n    input:\n    set sample_id, file(mashtxt) from mashScreenResults_{{ pid }}\n\n    output:\n    set sample_id, file(\"sortedMashScreenResults*.json\") optional true into mashScreenOutputChannel_{{ pid }}\n    {% with task_name=\"mashOutputJson\", sample_id=\"sample_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"mashscreen2json.py\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/mash_sketch_fasta.nf",
    "content": "IN_kmerSize_{{ pid }} = Channel.value(params.kmerSize{{ param_id }})\nIN_sketchSize_{{ pid }} = Channel.value(params.sketchSize{{ param_id }})\n\n// runs mash sketch\nprocess mashSketchFasta_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(fasta) from {{ input_channel }}\n    val kmer from IN_kmerSize_{{ pid }}\n    val sketch from IN_sketchSize_{{ pid }}\n\n    output:\n    set sample_id, file(fasta) into  {{ output_channel }}\n    set sample_id, file(\"*.msh\") into SIDE_mashSketchOutChannel_{{ pid }}\n    {% with task_name=\"mashSketchFasta\", sample_id=\"sample_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    mash sketch -i -k ${kmer} -s ${sketch} ${fasta}\n    \"\"\"\n\n}\n\n{{ forks }}\n"
  },
  {
    "path": "flowcraft/generator/templates/mash_sketch_fastq.nf",
    "content": "IN_kmerSize_{{ pid }} = Channel.value(params.kmerSize{{ param_id }})\nIN_sketchSize_{{ pid }} = Channel.value(params.sketchSize{{ param_id }})\n//IN_genomeSize_{{ pid }} = Channel.value(params.genomeSize{{ param_id }})\nIN_minKmer_{{ pid }} = Channel.value(params.minKmer{{ param_id }})\n\n\n// checks if genomeSize was provided\ngenomeSize = (params.genomeSize{{ param_id }} == false) ? \"\" : \"-g ${params.genomeSize{{ param_id }}}\"\n\n// runs mash sketch\nprocess mashSketchFastq_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(fastq) from {{ input_channel }}\n    val kmer from IN_kmerSize_{{ pid }}\n    val sketch from IN_sketchSize_{{ pid }}\n    val minKmer from IN_minKmer_{{ pid }}\n\n    output:\n    set sample_id, file(fastq) into  {{ output_channel }}\n    file(\"*.msh\") into SIDE_mashSketchOutChannel_{{ pid }}\n    {% with task_name=\"mashSketchFastq\", sample_id=\"sample_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    mash sketch -r -k ${kmer} -s ${sketch} -m ${minKmer} ${genomeSize} ${fastq}\n    \"\"\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/maxbin2.nf",
    "content": "IN_min_contig_lenght_{{ pid }} = Channel.value(params.min_contig_lenght{{ param_id }})\nIN_max_iteration_{{ pid }} = Channel.value(params.max_iteration{{ param_id }})\nIN_prob_threshold_{{ pid }} = Channel.value(params.prob_threshold{{ param_id }})\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess maxbin2_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir \"results/assembly/binning/maxbin2_{{ pid }}/${sample_id}/\"\n\n    input:\n    set sample_id, file(assembly), file(fastq) from {{ input_channel }}.join(_LAST_fastq_{{ pid }})\n    val minContigLenght from IN_min_contig_lenght_{{ pid }}\n    val maxIterations from IN_max_iteration_{{ pid }}\n    val probThreshold from IN_prob_threshold_{{ pid }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id, file(assembly), file ('*_maxbin.*.fasta'), file ('bin_status.txt') into binCh_{{ pid }}\n    file '*_maxbin.{abundance,log,summary}'\n    set sample_id, file(\"*_maxbin.summary\") into intoReport_{{ pid }}\n\n    {% with task_name=\"maxbin2\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        run_MaxBin.pl -contig ${assembly} -out ${sample_id}_maxbin -reads ${fastq[0]} -reads2 ${fastq[1]} \\\n        -thread $task.cpus -min_contig_length ${minContigLenght} -max_iteration ${maxIterations} \\\n        -prob_threshold ${probThreshold}\n\n        echo pass > .status\n\n        #in case maxbin fails to bin sequences for a sample:\n        if ls *_maxbin.*.fasta 1> /dev/null 2>&1; then echo \"true\" > bin_status.txt; else echo \"false\" \\\n        > false_maxbin.0.fasta; echo \"false\" > bin_status.txt; fi\n\n\n        if [ \"$clear\" = \"true\" ];\n        then\n            work_regex=\".*/work/.{2}/.{30}/.*\"\n            file_source1=\\$(readlink -f \\$(pwd)/${fastq[0]})\n            file_source2=\\$(readlink -f \\$(pwd)/${fastq[1]})\n            assembly_file=\\$(readlink -f \\$(pwd)/${assembly})\n            if [[ \"\\$file_source1\" =~ \\$work_regex ]]; then\n                rm \\$file_source1 \\$file_source2 \\$assembly_file\n            fi\n        fi\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n}\n\nprocess report_maxbin2_{{ pid }}{\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(tsv) from  intoReport_{{ pid }}\n\n    output:\n    {% with task_name=\"report_maxbin2\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_tsv.py\"\n\n}\n\n// If maxbin fails to obtain bins for a sample, the workflow continues with the original assembly\n{{ output_channel }} = Channel.create()\n\nOUT_binned = Channel.create()\nOUT_unbinned = Channel.create()\n\nfailedBinning = Channel.create()\nsuccessfulBinning = Channel.create()\n\nbinCh_{{ pid }}.choice(failedBinning, successfulBinning){ it -> it[3].text == \"false\\n\" ? 
0 : 1 }\n\nfailedBinning.map{ it -> [it[0], it[1]] }.into(OUT_unbinned)\n\nsuccessfulBinning.map{ it -> [it[2].toString().tokenize('/').last().tokenize('.')[0..-2].join('.'), it[2]]}\n    .transpose()\n    .map{it -> [it[1].toString().tokenize('/').last().tokenize('.')[0..-2].join('.'),it[1]]}\n    .into(OUT_binned)\n\nOUT_binned.mix(OUT_unbinned).set{ {{ output_channel }} }\n\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/megahit.nf",
    "content": "if ( params.megahitKmers{{ param_id }}.toString().split(\" \").size() <= 1 ){\n    if (params.megahitKmers{{ param_id }}.toString() != 'auto'){\n        exit 1, \"'megahitKmers{{ param_id }}' parameter must be a sequence of space separated numbers or 'auto'. Provided value: ${params.megahitKmers{{ param_id }}}\"\n    }\n}\nIN_megahit_kmers_{{ pid }} = Channel.value(params.megahitKmers{{ param_id }})\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess megahit_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/assembly/megahit_{{ pid }}/', pattern: '*_megahit*.fasta', mode: 'copy'\n\n    input:\n    set sample_id, file(fastq_pair), max_len from {{ input_channel }}.join(SIDE_max_len_{{ pid }})\n    val kmers from IN_megahit_kmers_{{ pid }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id, file('*megahit*.fasta') into {{ output_channel }}\n    set sample_id, file('megahit/intermediate_contigs/k*.contigs.fa') into IN_fastg{{ pid }}\n    {% with task_name=\"megahit\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"megahit.py\"\n\n}\n\nfastg = params.fastg{{ param_id }} ? \"true\" : \"false\"\nprocess megahit_fastg_{{ pid }}{\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir \"results/assembly/megahit_{{ pid }}/$sample_id\", pattern: \"*.fastg\"\n\n    input:\n    set sample_id, file(kmer_files) from IN_fastg{{ pid }}\n    val run_fastg from fastg\n\n    output:\n    file \"*.fastg\" optional true\n    {% with task_name=\"megahit_fastg\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    if [ ${run_fastg} == \"true\" ]\n    then\n        for kmer_file in ${kmer_files};\n        do\n            echo \\$kmer_file\n            k=\\$(echo \\$kmer_file | cut -d '.' -f 1);\n            echo \\$k\n            megahit_toolkit contig2fastg \\$k \\$kmer_file > \\$kmer_file'.fastg';\n        done\n    fi\n    \"\"\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/metabat2.nf",
    "content": "IN_contig_percentage_{{ pid }} = Channel.value(params.maxPercentage{{ param_id }})\nIN_length_threshold_{{ pid }} = Channel.value(params.minContig{{ param_id }})\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess metabat2_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    //publishDir \"results/assembly/binning/metabat2_{{ pid }}/${sample_id}/\"\n\n    input:\n    set sample_id, file(assembly), file(bam_file), file(bam_index) from {{ input_channel }}\n    val contig_percentage from IN_contig_percentage_{{ pid }}\n    val length_threshold from IN_length_threshold_{{ pid }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id, file(assembly), file('*metabat-bins*/*.fa'), file ('bin_status.txt') into binCh_{{ pid }}\n    set sample_id, file('*metabat-bins*/*.fa') into intoReport_{{ pid }}\n    {% with task_name=\"metabat2\"%}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        # prevent indexing errors\n        samtools sort ${bam_file} sorted\n        samtools index sorted.bam\n\n        # run METABAT2\n        runMetaBat.sh -m ${length_threshold} --unbinned --maxP ${contig_percentage} ${assembly} sorted.bam\n\n        # In case no sequences are binned\n        if [ -z \"\\$(ls -A *metabat-bins*/)\" ]; then\n            echo \"false\" > false_bin.fa\n            mv false_bin.fa *metabat-bins*/\n            echo \"false\" > bin_status.txt;\n        else\n            echo \"true\" > bin_status.txt\n        fi\n\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n}\n\nprocess report_metabat2_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(bins) from intoReport_{{ pid }}\n\n    output:\n    {% with task_name=\"report_metabat2\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_metabat.py\"\n}\n\n// If maxbin fails to obtain bins for a sample, the workflow continues with the original assembly\n{{ output_channel }} = Channel.create()\n\nOUT_binned = Channel.create()\nOUT_unbinned = Channel.create()\n\nfailedBinning = Channel.create()\nsuccessfulBinning = Channel.create()\n\nbinCh_{{ pid }}.choice(failedBinning, successfulBinning){ it -> it[3].text == \"false\\n\" ? 0 : 1 }\n\nfailedBinning.map{ it -> [it[0], it[1]] }.into(OUT_unbinned)\n\nsuccessfulBinning.map{ it -> [it[2].toString().tokenize('/').last().tokenize('.')[0..-2].join('.'), it[2]]}\n    .transpose()\n    .map{it -> [it[1].toString().tokenize('/').last().tokenize('.')[0..-2].join('.'),it[1]]}\n    .into(OUT_binned)\n\nOUT_binned.mix(OUT_unbinned).set{ {{ output_channel }} }\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/metamlst.nf",
    "content": "IN_metamlstDB_{{ pid }} = Channel.value(params.metamlstDB{{ param_id }})\nIN_metamlstDB_index_{{ pid }} = Channel.value(params.metamlstDB_index{{ param_id }})\n\n\nprocess metamlst_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir \"results/annotation/metamlst_{{ pid }}/${sample_id}\", saveAs: { it.split(\"/\").last() }\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val metamlstDB from IN_metamlstDB_{{ pid }}\n    val metamlstDB_index from IN_metamlstDB_index_{{ pid }}\n\n    output:\n    file 'out/merged/*.txt' optional true\n    {% with task_name=\"metamlst\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    bowtie2 --very-sensitive-local -a --no-unal -x ${metamlstDB_index} -1 ${fastq_pair[0]} -2 ${fastq_pair[1]} | samtools view -bS - > ${sample_id}.bam\n\n    metamlst.py -d ${metamlstDB} ${sample_id}.bam\n\n    metamlst-merge.py -d ${metamlstDB} out/\n    \"\"\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/metaprob.nf",
    "content": "IN_feature_{{ pid }} = Channel.value(params.feature{{ param_id }})\nIN_metaProbQMer_{{ pid }} = Channel.value(params.metaProbQMer{{ param_id }})\n\n// runs metaProb\nprocess metaProb_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir 'results/metaprob/'\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val feature from IN_feature_{{ pid }}\n    val metaProbQMer from IN_metaProbQMer_{{ pid }}\n\n    output:\n    set sample_id, file(\"*clusters.csv\") into metaProbOutChannel_{{ pid }}\n    {% with task_name=\"metaProb\", sample_id=\"sample_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    gunzip -c ${fastq_pair[0]} > ${sample_id}_read1.fastq\n    gunzip -c ${fastq_pair[1]} > ${sample_id}_read2.fastq\n\n    MetaProb -pi ${sample_id}_read1.fastq ${sample_id}_read2.fastq -feature ${feature} -m ${pmetaProbQMer}\n    \"\"\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/metaspades.nf",
    "content": "if ( params.metaspadesKmers{{ param_id }}.toString().split(\" \").size() <= 1 ){\n    if (params.metaspadesKmers{{ param_id }}.toString() != 'auto'){\n        exit 1, \"'metaspadesKmers{{ param_id }}' parameter must be a sequence of space separated numbers or 'auto'. Provided value: ${params.metaspadesKmers{{ param_id }}}\"\n    }\n}\nIN_metaspades_kmers_{{pid}} = Channel.value(params.metaspadesKmers{{ param_id }})\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess metaspades_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/assembly/metaspades_{{ pid }}/', pattern: '*_metaspades*.fasta', mode: 'copy'\n\n    input:\n    set sample_id, file(fastq_pair), max_len from {{ input_channel }}.join(SIDE_max_len_{{ pid }})\n    val kmers from IN_metaspades_kmers_{{pid}}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id, file('*_metaspades*.fasta') into {{ output_channel }}\n    {% with task_name=\"metaspades\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"metaspades.py\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/midas_species.nf",
    "content": "if (params.midasDB{{ param_id }} == null){\n    exit 1, \"The path to the midas database must be provided with the 'midasDB{{ param_id }}' option.\"\n}\n\nIN_midas_DB_{{ pid }} = Channel.value(params.midasDB{{ param_id }})\n\nprocess midas_species_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir \"results/taxonomy/midas/\", pattern: \"*.txt\"\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val midasDB from IN_midas_DB_{{ pid }}\n\n    output:\n    file(\"${sample_id}_midas_species_profile.txt\")\n    {% with task_name=\"midas_species\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    run_midas.py species midas/ -d ${midasDB} -t $task.cpus -1 ${fastq_pair[0]} -2 ${fastq_pair[1]} --remove_temp\n\n    mv midas/species/species_profile.txt ./${sample_id}_midas_species_profile.txt\n    \"\"\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/mlst.nf",
    "content": "\nprocess mlst_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    // This process can only use a single CPU\n    cpus 1\n\n    input:\n    set sample_id, file(assembly) from {{ input_channel }}\n\n    output:\n    file '*.mlst.txt' into LOG_mlst_{{ pid }}\n    set sample_id, file(assembly), file(\".status\") into MAIN_mlst_out_{{ pid }}\n    {% with task_name=\"mlst\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        expectedSpecies=${params.mlstSpecies{{ param_id }}}\n        mlst $assembly >> ${sample_id}.mlst.txt\n        mlstSpecies=\\$(cat *.mlst.txt | cut -f2)\n        json_str=\"{'expectedSpecies':\\'\\$expectedSpecies\\',\\\n            'species':'\\$mlstSpecies',\\\n            'st':'\\$(cat *.mlst.txt | cut -f3)',\\\n            'tableRow':[{'sample':'${sample_id}','data':[\\\n                {'header':'MLST species','value':'\\$mlstSpecies','table':'typing'},\\\n                {'header':'MLST ST','value':'\\$(cat *.mlst.txt | cut -f3)','table':'typing'}]}]}\"\n        echo \\$json_str > .report.json\n\n        if [ ! \\$mlstSpecies = \\$expectedSpecies ];\n        then\n            printf fail > .status\n        else\n            printf pass > .status\n        fi\n\n    } || {\n        printf fail > .status\n    }\n    \"\"\"\n}\n\nprocess compile_mlst_{{ pid }} {\n\n    publishDir \"results/annotation/mlst_{{ pid }}/\"\n\n    input:\n    file res from LOG_mlst_{{ pid }}.collect()\n\n    output:\n    file \"mlst_report.tsv\"\n\n    script:\n    \"\"\"\n    cat $res >> mlst_report.tsv\n    \"\"\"\n}\n\n{{ output_channel }} = Channel.create()\nMAIN_mlst_out_{{ pid }}\n    .filter{ it[2].text != \"fail\" }\n    .map{ [it[0], it[1]] }\n    .set{ {{output_channel}} }\n\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/momps.nf",
    "content": "\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess momps_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(assembly), file(fastq) from {{ input_channel }}.join(_LAST_fastq_{{ pid }})\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    file(\"*_st.tsv\") into momps_st_{{ pid }}\n    file(\"*_profile.tsv\") into momps_profile_{{ pid }}\n    {% with task_name=\"momps\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        # Stage in momps source files. This cannot be a symlink because the files\n        # need to be writable.\n        cp -r /NGStools/mompS/* .\n        momps.pl -r ${fastq[0]} -f ${fastq[1]} -a $assembly -o res -p $sample_id -t ${task.cpus}\n        # Get the ST for the sample\n        if [ -f \"res/${sample_id}.MLST_res.txt\" ]\n        then\n            st=\\$(grep -oP \"ST = \\\\K\\\\w+\" res/*.MLST_res.txt)\n            # If the ST cannot be determined, set string to ND\n            if [ -z \\$st ]\n            then\n                st=\"ND\"\n            fi\n            echo $sample_id\\t\\${st}> ${sample_id}_st.tsv\n            # Add ST information to report JSON\n            json_str=\"{'tableRow':[{'sample':'${sample_id}','data':[{'header':'mompS','value':'\\$st','table':'typing'}]}]}\"\n            echo \\$json_str > .report.json\n            # Get the profile for the sample\n            echo $sample_id\\t\\$(awk \"NR == 7\" res/*.MLST_res.txt) > ${sample_id}_profile.tsv\n            rm -r res\n\n            # Remove temporary input files when the clearInput option is used\n            if [ \"$clear\" = \"true\" ];\n            then\n                work_regex=\".*/work/.{2}/.{30}/.*\"\n                file_source1=\\$(readlink -f \\$(pwd)/${fastq[0]})\n                file_source2=\\$(readlink -f \\$(pwd)/${fastq[1]})\n                if [[ \"\\$file_source1\" =~ \\$work_regex ]]; then\n                    rm \\$file_source1 \\$file_source2\n                fi\n            fi\n        else\n            echo fail > .status\n            rm -r res\n        fi\n    } || {\n        echo fail > .status\n        # Remove results directory\n        rm -r res\n    }\n    \"\"\"\n\n}\n\n\nprocess momps_report_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n    publishDir \"results/typing/momps_{{ pid }}/\", pattern: \"*.tsv\"\n\n    input:\n    file(st_file) from momps_st_{{ pid }}.collect()\n    file(profile_file) from momps_profile_{{ pid }}.collect()\n\n    output:\n    file \"*.tsv\"\n\n\n    script:\n    \"\"\"\n    cat $st_file >> momps_st.tsv\n    cat $profile_file >> momps_profile.tsv\n    \"\"\"\n\n}\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/nextflow.config",
    "content": "manifest {\n    name = \"{{ pipeline_name }}\"\n    mainScript = \"{{ nf_file }}\"\n}\n\nparams {\n    platformHTTP = null\n    reportHTTP = null\n\n    // Settings this option to true, will trigger the removal of temporary\n    // data (usually fastq reads) at particular checkpoint processes that\n    // modify that data. These checkpoint processes include 'trimmomatic',\n    // 'spades' and 'skesa'.\n    // WARNING: This will remove temporary fastq files that are not necessary\n    // for the completion of the pipeline but, consequently, will disable\n    // the resume functionality of the pipeline. However, it is often necessary\n    // for very large pipelines and whenever disk space is critical.\n    // More precisely, these checkpoint components will check whether the\n    // putative temporary files are inside the nextflow work directory by\n    // matching the regex: \".*/work/.{2}/.{30}/.*\"\n    // If it is a match, then the file is assumed to be a temporary one and\n    // will be removed.\n    clearAtCheckpoint = false\n}\n\nenv {\n    PYTHONPATH = \"$baseDir/templates:\\$PYTHONPATH\"\n    PATH = \"$baseDir/templates:\\$PATH\"\n}\n\nprocess {\n    cpus = 1\n    memory = \"1GB\"\n\n    errorStrategy = { task.attempt <= 7 ? \"retry\" : \"ignore\" }\n    maxRetries = 7\n    container = \"flowcraft/flowcraft_base:1.0.0-1\"\n}\n\ndocker {\n    // Added default docker option to avoid docker permission errors. See issue\n    // #142\n    runOptions = \"-u \\$(id -u):\\$(id -g)\"\n}\n\n\nexecutor {\n  $local {\n      cpus = 4\n  }\n}\n\nsingularity {\n    cacheDir = \"$HOME/.singularity_cache\"\n    autoMounts = true\n}\n\ntrace {\n    enabled = true\n    file = \"pipeline_stats.txt\"\n    fields = \"task_id,\\\n              hash,\\\n              process,\\\n              tag,\\\n              status,\\\n              exit,\\\n              start,\\\n              container,\\\n              cpus,\\\n              time,\\\n              disk,\\\n              memory,\\\n              duration,\\\n              realtime,\\\n              queue,\\\n              %cpu,\\\n              %mem,\\\n              rss,\\\n              vmem,\\\n              rchar,\\\n              wchar\"\n}\n\n//                             PROFILE OPTIONS                               //\n///////////////////////////////////////////////////////////////////////////////\n\nprofiles {\n\n    oneida {\n\n        process.executor = \"slurm\"\n        docker.enabled = true\n\n        process{\n\n            // MEMORY USAGE PER PROCESS //\n            // general memory usage\n            memory = \"4GB\"\n\n        }\n\n    }\n\n    // INCD PROFILE\n    incd {\n\n        process.executor = \"slurm\"\n        singularity.enabled = true\n\n        singularity {\n            cacheDir = \"/mnt/singularity_cache\"\n            autoMounts = true\n        }\n\n        // Error and retry strategies\n        process.errorStrategy = \"retry\"\n        maxRetries = 3\n\n        process.$chewbbaca.queue = \"chewBBACA\"\n\n        process {\n\n            // MEMORY USAGE PER PROCESS //\n            // general memory usage\n            memory = \"4GB\"\n\n        }\n\n    }\n\n    // SLURM PROFILE\n    slurm {\n\n        // Change executor for SLURM\n        process.executor = \"slurm\"\n        // Change container engine for Shifter\n        shifter.enabled = true\n\n        process {\n\n            clusterOptions = \"--qos=oneida\"\n\n            errorStrategy = \"retry\"\n            maxRetries = 5\n\n        
    // MEMORY USAGE PER PROCESS //\n            // general memory usage\n            memory = \"4GB\"\n\n        }\n\n    }\n\n    // SLURM PROFILE\n    slurmOneida {\n\n        // Change executor for SLURM\n        process.executor = \"slurm\"\n        // Change container engine for Shifter\n        shifter.enabled = true\n\n        process {\n\n            clusterOptions = \"--qos=oneida\"\n\n            // MEMORY USAGE PER PROCESS //\n            // general memory usage\n            memory = \"4GB\"\n\n            // Set QOS for chewbbaca in order to run a single job\n            $chewbbaca.clusterOptions = \"--qos=chewbbaca\"\n        }\n    }\n}\n\nincludeConfig \"profiles.config\"\nincludeConfig \"resources.config\"\nincludeConfig \"containers.config\"\nincludeConfig \"params.config\"\nincludeConfig \"user.config\"\n"
  },
  {
    "path": "flowcraft/generator/templates/params.config",
    "content": "params {\n\n{{ params_info }}\n\n}"
  },
  {
    "path": "flowcraft/generator/templates/patho_typing.nf",
    "content": "if ( !params.species{{ param_id }}){ exit 1, \"'species' parameter missing\" }\nif ( params.species{{ param_id }}.toString().split(\" \").size() != 2 ){\n    exit 1, \"'species' parameter must contain two values (e.g.: 'escherichia coli'). Provided value: ${params.species{{ param_id }}}\"\n}\n\nIN_pathoSpecies_{{ pid }} = Channel.value(params.species{{ param_id }})\n\nprocess patho_typing_{{ pid }} {\n\n    validExitStatus 0, 2\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    errorStrategy \"ignore\"\n    publishDir \"results/pathotyping/${sample_id}/\"\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val species from IN_pathoSpecies_{{ pid }}\n\n    output:\n    file \"patho_typing*\" optional true\n    {% with task_name=\"patho_typing\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        # Prevents read-only issues\n        mkdir rematch_temp\n        cp -r /NGStools/ReMatCh rematch_temp\n        export PATH=\"\\$(pwd)/rematch_temp/ReMatCh:\\$PATH\"\n\n        patho_typing.py -f \\$(pwd)/${fastq_pair[0]} \\$(pwd)/${fastq_pair[1]} -o \\$(pwd) -j $task.cpus --trueCoverage --species $species\n\n        # Add information to dotfiles\n        version_str=\"[{'program':'patho_typing.py','version':'0.4'}]\"\n        echo \\$version_str > .versions\n\n        rm -r rematch_temp\n        echo pass > .status\n\n        if [ -s patho_typing.report.txt ];\n        then\n            json_str=\"{'tableRow':[{'sample':'${sample_id}','data':[{'header':'pathotyping','value':'\\$(cat patho_typing.report.txt)','table':'typing'}]}]}\"\n            echo \\$json_str > .report.json\n            echo pass > .status\n        else\n            json_str=\"{'tableRow':[{'sample':'${sample_id}','data':[{'header':'pathotyping','value':'NA','table':'typing'}]}]}\"\n            echo \\$json_str > .report.json\n            echo fail > .status\n        fi\n    } || {\n        echo fail > .status\n        json_str=\"{'tableRow':[{'sample':'${sample_id}','data':[{'header':'pathotyping','value':'NA','table':'typing'}]}]}\"\n        echo \\$json_str > .report.json\n    }\n    \"\"\"\n\n}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/patlas_consensus.nf",
    "content": "\n/**\n* A process that creates a consensus from all the outputted json files\n*/\nprocess fullConsensus {\n\n    tag { sample_id }\n\n    publishDir 'results/consensus_{{ pid }}/'\n\n    input:\n    set sample_id, file(infile_list) from {{ compile_channels }}\n\n    output:\n    file \"consensus_*.json\"\n\n    script:\n    template \"pATLAS_consensus_json.py\"\n\n}"
  },
  {
    "path": "flowcraft/generator/templates/pilon.nf",
    "content": "\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess pilon_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    echo false\n    publishDir 'results/assembly/pilon_{{ pid }}/', mode: 'copy', pattern: \"*.fasta\"\n\n    input:\n    set sample_id, file(assembly), file(bam_file), file(bam_index) from {{ input_channel }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id, '*_polished.fasta' into {{ output_channel }}, pilon_report_{{ pid }}\n    {% with task_name=\"pilon\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        pilon_mem=${String.valueOf(task.memory).substring(0, String.valueOf(task.memory).length() - 1).replaceAll(\"\\\\s\", \"\")}\n        java -jar -Xms256m -Xmx\\${pilon_mem} /NGStools/pilon-1.22.jar --genome $assembly --frags $bam_file --output ${assembly.name.replaceFirst(~/\\.[^\\.]+$/, '')}_polished --changes --threads $task.cpus >> .command.log 2>&1\n        echo pass > .status\n\n        if [ \"$clear\" = \"true\" ];\n        then\n            work_regex=\".*/work/.{2}/.{30}/.*\"\n            assembly_file=\\$(readlink -f \\$(pwd)/${assembly})\n            bam_file=\\$(readlink -f \\$(pwd)/${bam_file})\n            if [[ \"\\$assembly_file\" =~ \\$work_regex ]]; then\n                rm \\$assembly_file \\$bam_file\n            fi\n        fi\n\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n\n}\n\nprocess pilon_report_{{ pid }} {\n\n    {% with overwrite=\"false\" %}\n    {% include \"report_post.txt\" ignore missing %}\n    {% endwith %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(assembly), file(coverage_bp) from pilon_report_{{ pid }}.join(SIDE_BpCoverage_{{ pid }})\n\n    output:\n    file \"*_assembly_report.csv\" into pilon_report_out_{{ pid }}\n    {% with task_name=\"pilon_report\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"assembly_report.py\"\n\n}\n\n\nprocess compile_pilon_report_{{ pid }} {\n\n    publishDir \"reports/assembly/pilon_{{ pid }}/\", mode: 'copy'\n\n    input:\n    file(report) from pilon_report_out_{{ pid }}.collect()\n\n    output:\n    file \"pilon_assembly_report.csv\"\n\n    \"\"\"\n    echo Sample,Number of contigs,Average contig size,N50,Total assembly length,GC content,Missing data > pilon_assembly_report.csv\n    cat $report >> pilon_assembly_report.csv\n    \"\"\"\n}\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/pipeline_graph.html",
    "content": "<!DOCTYPE html>\n<title>FlowCraft DAG tool</title>\n<meta charset=\"utf-8\"/>\n<style>\n    .node circle {\n        stroke: steelblue;\n        stroke-width: 3px;\n    }\n\n    .node text {\n        font: 14px sans-serif;\n        font-weight: bold;\n    }\n\n    .link {\n        fill: none;\n        stroke: #acacac;\n        stroke-width: 2px;\n    }\n\n    div.tooltip {\n        position: absolute;\n        text-align: center;\n        padding: 10px 15px 10px 15px;\n        font: 14px sans-serif;\n        background: lightsteelblue;\n        box-shadow: 1px 2px 8px #626262;\n        border-radius: 8px;\n        pointer-events: none;\n    }\n</style>\n<body>\n</body>\n<script src=\"https://d3js.org/d3.v4.min.js\"></script>\n<script>\n    // fetchs data using jinja\n    const inputData = {{ data }}\n\n    /**\n     * This function creates a tooltip with the node/process information\n     * on mouse over in the respective node\n     *\n     * @param {Object} d - stores information of the node data (containing\n     * name, input, output, etc) and parent info for this node\n     */\n    const mouseover = (d) => {\n        div.transition()\n            .duration(200)\n            .style(\"opacity\", .9)\n        div.html(`<b>pid:</b> ${d.data.process.pid},<br>\n            <b>lane:</b> ${d.data.process.lane},<br>\n            <b>input:</b> ${d.data.process.input},<br>\n            <b>output:</b> ${d.data.process.output},<br>\n            <b>directives:</b><br>\n            ${d.data.process.directives}\n            `)\n            .style(\"left\", (d3.event.pageX) + \"px\")\n            .style(\"left\", (d3.event.pageX) + \"px\")\n            .style(\"top\", (d3.event.pageY - 28) + \"px\")\n            .style(\"text-align\", \"left\")\n    }\n\n    /**\n     * Function that hides the tooltip\n     * @param {Object} d - stores information of the node data (containing\n     * name, input, output, etc) and parent info for this node\n     */\n    const mouseout = (d) => {\n        div.transition()\n            .duration(500)\n            .style(\"opacity\", 0)\n    }\n\n    /**\n     * Function that collapses nodes and all their childrens\n     * @param {Object} d - stores information of the node data (containing\n     * name, input, output, etc) and parent info for this node\n     */\n    // const collapse = (d) => {\n    //     if(d.children) {\n    //         d._children = d.children\n    //         d._children.forEach(collapse)\n    //         d.children = null\n    //     }\n        // }\n\n    // Set the dimensions and margins of the diagram\n    const margin = {top: 20, right: 20, bottom: 20, left: 20},\n        width = 1870,\n        height = 860\n\n    const div = d3.select(\"body\").append(\"div\")\n        .attr(\"class\", \"tooltip\")\n        .style(\"opacity\", 0)\n\n    let i = 0,\n        duration = 750\n\n    let root\n    // Assigns parent, children, height, depth\n    root = d3.hierarchy(inputData, (d) => { return d.children })\n    root.x0 = height / 2\n    root.y0 = 0\n\n    // declares a tree layout and assigns the size\n    const treemap = d3.tree().size([height, width])\n\n    // Assigns the x and y position for the nodes\n    const treeData = treemap(root)\n\n    // append the svg object to the body of the page\n    // appends a 'group' element to 'svg'\n    // moves the 'group' element to the top left margin\n    const svg = d3.select(\"body\")\n        .append(\"svg\")\n        .attr(\"width\", width + margin.right + margin.left)\n        
.attr(\"height\", height + margin.top + margin.bottom)\n        .call(d3.zoom().on(\"zoom\", function () {\n            svg.attr(\"transform\", d3.event.transform)\n        }))\n        .on(\"dblclick.zoom\", null)\n        .append(\"g\")\n        .attr(\"transform\", \"translate(\"\n            + margin.left + \",\" + margin.top + \")\"\n        )\n\n    /**\n     * Function that updates the graph on load and on node clicks\n     *\n     * @param {Object} source - Stores the full tree information, including\n     * the root node, which will be deleted by filter on nodes and links.\n     */\n    const update = (source) => {\n\n        // Creates a curved (diagonal) path from parent to the child nodes\n        /**\n         * Creates a curved (diagonal) path from parent to the child nodes\n         *\n         * @param {Object} s\n         * @param {Object} d\n         * @returns {string}\n         */\n        const diagonal = (s, d) => {\n            path = `M ${s.y} ${s.x}\n            C ${(s.y + d.y) / 2} ${s.x},\n              ${(s.y + d.y) / 2} ${d.x},\n              ${d.y} ${d.x}`\n            return path\n        }\n\n        /**\n         * Function that toggles childrens on click\n         *\n         * @param {Object} d - stores information of the node data (containing\n         * name, input, output, etc) and parent info for this node\n         */\n        const click = (d) => {\n            if (d.children) {\n                d._children = d.children\n                d.children = null\n            } else {\n                d.children = d._children\n                d._children = null\n            }\n            update(d)\n        }\n\n        // Compute the new tree layout.\n        let nodes = treeData.descendants(),\n            links = treeData.descendants().slice(1)\n\n        // hide root node\n        nodes = nodes.filter( (d) => {\n            return d.depth\n        })\n\n        // hide links to root\n        links = links.filter( (d) => {\n            return d.depth !== 1\n        })\n\n        // ****************** Nodes section ***************************\n\n        // Update the nodes...\n        const node = svg.selectAll('g.node')\n            .data(nodes, (d) => { return d.id || (d.id = ++i) })\n\n        // Enter any new modes at the parent's previous position.\n        const nodeEnter = node.enter().append('g')\n            .attr('class', 'node')\n            .attr(\"transform\", (d) => {\n                return \"translate(\" + source.y0 + \",\" + source.x0 + \")\"\n            })\n            .on('click', click)\n            .on(\"mouseover\", mouseover)\n            .on(\"mouseout\", mouseout)\n\n        // Add Circle for the nodes\n        nodeEnter.append('circle')\n            .attr('class', 'node')\n            .attr('r', 1e-6)\n        // .style(\"fill\", (d) => {\n        //     return d._children ? 
\"lightsteelblue\" : \"#fff\"\n        // })\n\n        // Add labels for the nodes\n        nodeEnter.append('text')\n            .attr(\"y\", \"-20\")\n            .attr(\"text-anchor\", \"middle\")\n            .text( (d) => { return d.data.name } )\n\n                // gets labels variable\n        const labels = d3.selectAll(\"text\")\n        // returns the label with max width value\n        const maxTextWidth = d3.max(labels.nodes(),\n            n => n.getComputedTextLength())\n\n        // Normalize for fixed-depth, according to max_width\n        nodes.forEach( (d) => { d.y = d.depth * maxTextWidth} )\n\n        // UPDATE\n        const nodeUpdate = nodeEnter.merge(node)\n\n        // Transition to the proper position for the node\n        nodeUpdate.transition()\n            .duration(duration)\n            .attr(\"transform\", (d) => {\n                return \"translate(\" + d.y + \",\" + d.x + \")\"\n            })\n\n        // Update the node attributes and style\n        nodeUpdate.select('circle.node')\n            .attr('r', 10)\n            .style(\"fill\", (d) => {\n                return d._children ? \"#ffad6b\" : \"lightsteelblue\"\n            })\n            .attr('cursor', 'pointer')\n\n\n        // Remove any exiting nodes\n        const nodeExit = node.exit().transition()\n            .duration(duration)\n            .attr(\"transform\", (d) => {\n                return \"translate(\" + source.y + \",\" + source.x + \")\"\n            })\n            .remove()\n\n        // On exit reduce the node circles size to 0\n        nodeExit.select('circle')\n            .attr('r', 1e-6)\n\n        // On exit reduce the opacity of text labels\n        nodeExit.select('text')\n            .style('fill-opacity', 1e-6)\n\n        // ****************** links section ***************************\n\n        // Update the links...\n        const link = svg.selectAll('path.link')\n            .data(links, (d) => { return d.id })\n\n        // Enter any new links at the parent's previous position.\n        const linkEnter = link.enter().insert('path', \"g\")\n            .attr(\"class\", \"link\")\n            .attr('d', (d) => {\n                const o = {x: source.x0, y: source.y0}\n                return diagonal(o, o)\n            })\n\n        // merge links\n        const linkUpdate = linkEnter.merge(link)\n\n        // Transition back to the parent element position\n        linkUpdate.transition()\n            .duration(duration)\n            .attr('d', function(d){ return diagonal(d, d.parent) })\n\n        // Remove any existing links\n        const linkExit = link.exit().transition()\n            .duration(duration)\n            .attr('d', (d) => {\n                const o = {x: source.x, y: source.y}\n                return diagonal(o, o)\n            })\n            .remove()\n\n        // Store the old positions for transition.\n        nodes.forEach( (d) => {\n            d.x0 = d.x\n            d.y0 = d.y\n        })\n\n    }\n    // Collapse after the second level\n    // root.children.forEach(collapse);\n\n    update(root)\n\n</script>\n"
  },
  {
    "path": "flowcraft/generator/templates/post.txt",
    "content": "    if ( params.platformHTTP != null ) {\n        beforeScript \"PATH=${workflow.projectDir}/bin:\\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId {{ pid }} $params.platformHTTP\"\n        afterScript \"final_POST.sh $params.projectId $params.pipelineId {{ pid }} $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId {{ pid }} $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId {{ template }}_{{ pid }} \\\"$params.platformSpecies\\\" {{ overwrite|default(\"true\") }}\"\n    } else {\n        beforeScript \"PATH=${workflow.projectDir}/bin:\\$PATH; set_dotfiles.sh\"\n        }"
  },
  {
    "path": "flowcraft/generator/templates/process_skesa.nf",
    "content": "if ( !params.skesaMinKmerCoverage{{ param_id }}.toString().isNumber() ){ \n    exit 1, \"'skesaMinKmerCoverage{{ param_id }}' parameter must be a number. Provided value: ${params.skesaMinKmerCoverage{{ param_id }}}\"\n}\nif ( !params.skesaMinContigLen{{ param_id }}.toString().isNumber() ){ \n    exit 1, \"'skesaMinContigLen{{ param_id }}' parameter must be a number. Provided value: ${params.skesaMinContigLen{{ param_id }}}\"\n}\nif ( !params.skesaMaxContigs{{ param_id }}.toString().isNumber() ){ \n    exit 1, \"'skesaMaxContigs{{ param_id }}' parameter must be a number. Provided value: ${params.skesaMaxContigs{{ param_id }}}\"\n}\n\nIN_process_skesa_opts_{{ pid }} = Channel.value([params.skesaMinContigLen{{ param_id }},params.skesaMinKmerCoverage{{ param_id }},params.skesaMaxContigs{{ param_id }}])\nIN_genome_size_{{ pid }} = Channel.value(params.genomeSize{{ param_id }})\n\nprocess process_skesa_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    // This process can only use a single CPU\n    cpus 1\n    publishDir \"reports/assembly/skesa_filter_{{ pid }}\", pattern: '*.report.csv', mode: 'copy'\n\n    input:\n    set sample_id, file(assembly) from {{ input_channel }}\n    val opts from IN_process_skesa_opts_{{ pid }}\n    val gsize from IN_genome_size_{{ pid }}\n    val assembler from Channel.value(\"skesa\")\n\n    output:\n    set sample_id, file('*.fasta') into {{ output_channel }}\n    file '*.report.csv' optional true\n    {% with task_name=\"process_skesa\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_assembly.py\"\n\n}\n\n{{ forks }}\n"
  },
  {
    "path": "flowcraft/generator/templates/process_spades.nf",
    "content": "if ( !params.spadesMinKmerCoverage{{ param_id }}.toString().isNumber()){\n    exit 1, \"'spadesMinKmerCoverage' parameter must be a number. Provided value: ${params.spadesMinKmerCoverage{{ param_id }}}\"\n}\nif ( !params.spadesMinContigLen{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'spadesMinContigLen' parameter must be a number. Provided value: ${params.spadesMinContigLen{{ param_id }}}\"\n}\nif ( !params.spadesMaxContigs{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'spadesMaxContigs' parameter must be a number. Provided value: ${params.spadesMaxContigs{{ param_id }}}\"\n}\n\nIN_process_spades_opts_{{ pid }} = Channel.value([params.spadesMinContigLen{{ param_id }}, params.spadesMinKmerCoverage{{ param_id }}, params.spadesMaxContigs{{ param_id }}])\nIN_genome_size_{{ pid }} = Channel.value(params.genomeSize{{ param_id }})\n\nprocess process_spades_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    // This process can only use a single CPU\n    cpus 1\n    publishDir \"reports/assembly/spades_filter_{{ pid }}\", pattern: '*.report.csv', mode: 'copy'\n\n    input:\n    set sample_id, file(assembly) from {{ input_channel }}\n    val opts from IN_process_spades_opts_{{ pid }}\n    val gsize from IN_genome_size_{{ pid }}\n    val assembler from Channel.value(\"spades\")\n\n    output:\n    set sample_id, file('*.fasta') into {{ output_channel }}\n    file '*.report.csv' optional true\n    {% with task_name=\"process_spades\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_assembly.py\"\n\n}\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/progressive_mauve.nf",
    "content": "process progressive_mauve_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { 'progressive_mauve' }\n\n    publishDir \"results/alignment/progressive_mauve_{{ pid }}/\", pattern: '*.align*', mode: 'copy'\n\n    input:\n    file(assembly) from {{ input_channel }}.map{ it[1] }.collect()\n\n    output:\n    file (\"*.align\") into {{ output_channel }}\n    {% with task_name=\"progressive_mauve\", sample_id=\"val('single')\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    progressiveMauve --output=${workflow.scriptName}.align --collinear ${assembly}\n    \"\"\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/prokka.nf",
    "content": "\nIN_centre_{{ pid }} = Channel.value(params.centre{{ param_id }})\n\nIN_kingdom_{{ pid }} = Channel.value(params.kingdom{{ param_id }})\n\n// check if genus is provided or not\ngenusVar = (params.genus{{ param_id }} == false) ? \"\" : \"--usegenus --genus ${params.genus{{param_id}}} \"\n\nprocess prokka_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir \"results/annotation/prokka_{{ pid }}/${sample_id}\"\n\n    input:\n    set sample_id, file(assembly) from {{ input_channel }}\n    val centre from IN_centre_{{ pid }}\n    val kingdom from IN_kingdom_{{ pid }}\n\n    output:\n    file \"${sample_id}/*\"\n    {% with task_name=\"prokka\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        prokka --outdir $sample_id --cpus $task.cpus --centre ${centre} \\\n        --compliant --kingdom ${kingdom} ${genusVar} --increment 10 $assembly\n        echo pass > .status\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n\n}\n\n\n"
  },
  {
    "path": "flowcraft/generator/templates/quast.nf",
    "content": "if (params.reference{{param_id}} == null && params.genomeSizeBp{{param_id}} == null)\n    exit 1, \"Specify at least one of reference or genomeSizeBp\"\nif (params.reference{{param_id}} != null && params.genomeSizeBp{{param_id}} != null)\n    exit 1, \"Specify only one of reference or genomeSizeBp\"\n\nif (params.reference{{param_id}} != null) {\n    process quast_{{pid}} {\n        {% include \"post.txt\" ignore missing %}\n\n        tag { sample_id }\n        publishDir \"results/assembly/quast_{{pid}}/$sample_id\", pattern: \"*.tsv\"\n        publishDir \"reports/assembly/quast_{{pid}}/$sample_id\"\n\n        input:\n        set sample_id, file(assembly) from {{input_channel}}\n        file reference from Channel.fromPath(params.reference{{param_id}})\n\n        output:\n        file \"*\"\n        {% with task_name=\"quast\" %}\n        {%- include \"compiler_channels.txt\" ignore missing -%}\n        {% endwith %}\n\n        script:\n        \"/usr/bin/time -v quast -o . -r $reference -s $assembly -l $sample_id -t $task.cpus >> .command.log 2>&1\"\n    }\n} else if (params.genomeSizeBp{{param_id}} != null) {\n    process quast_{{pid}} {\n        {% include \"post.txt\" ignore missing %}\n\n        tag { sample_id }\n        publishDir \"results/assembly/quast_{{pid}}/$sample_id\", pattern: \"*.tsv\"\n        publishDir \"reports/assembly/quast_{{pid}}/$sample_id\"\n\n        input:\n        set sample_id, file(assembly) from {{input_channel}}\n        val genomeSizeBp from Channel.value(params.genomeSizeBp{{param_id}})\n\n        output:\n        file \"*\"\n        {% with task_name=\"quast\" %}\n        {%- include \"compiler_channels.txt\" ignore missing -%}\n        {% endwith %}\n\n        script:\n        \"/usr/bin/time -v quast -o . --est-ref-size=$genomeSizeBp -s $assembly -l $sample_id -t $task.cpus >> .command.log 2>&1\"\n    }\n}\n"
  },
  {
    "path": "flowcraft/generator/templates/raxml.nf",
    "content": "IN_substitution_model_{{ pid }} = Channel.value(params.substitutionModel{{ param_id }})\nIN_seed_number_{{ pid }} = Channel.value(params.seedNumber{{ param_id }})\nIN_bootstrap_number_{{ pid }} = Channel.value(params.bootstrap{{ param_id }})\nIN_simple_label_{{ pid}} = Channel.value(params.simpleLabel{{ param_id }})\n\nprocess raxml_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { 'raxml' }\n\n    publishDir \"results/phylogeny/raxml_{{ pid }}/\"\n\n    input:\n    file(alignment) from {{ input_channel }}\n    val substitution_model from IN_substitution_model_{{ pid }}\n    val seednumber from IN_seed_number_{{ pid }}\n    val bootstrapnumber from IN_bootstrap_number_{{ pid }}\n\n    output:\n    file (\"RAxML_*\") into {{ output_channel }}\n    file (\"RAxML_bipartitions.*.nf\") into into_json_{{ pid }}\n    {% with task_name=\"raxml\", sample_id=\"val('single')\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    raxmlHPC -s ${alignment} -p 12345 -m ${substitution_model} -T $task.cpus -n $workflow.scriptName -f a -x ${seednumber} -N ${bootstrapnumber}\n\n    # Add information to dotfiles\n    version_str=\"[{'program':'raxmlHPC','version':'8.2.11'}]\"\n    echo \\$version_str > .versions\n    \"\"\"\n\n}\n\nprocess report_raxml_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { 'raxml' }\n\n    input:\n    file(newick) from into_json_{{ pid }}\n    val label from IN_simple_label_{{ pid}}\n\n    output:\n    {% with task_name=\"report_raxml\", sample_id=\"val('single')\"  %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_newick.py\"\n\n}\n\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/reads_download.nf",
    "content": "if (params.asperaKey{{ param_id }}){\n    if (file(params.asperaKey{{ param_id }}).exists()){\n        IN_asperaKey_{{ pid }} = Channel.fromPath(params.asperaKey{{ param_id }})\n    } else {\n        IN_asperaKey_{{ pid }} = Channel.value(\"\")\n    }\n} else {\n    IN_asperaKey_{{ pid }} = Channel.value(\"\")\n}\n\nprocess reads_download_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { accession_id }\n    publishDir \"reads\", pattern: \"${accession_id}/*fq.gz\"\n    maxRetries 1\n\n    input:\n    set val(accession_id), val(name) from reads_download_in_1_0.splitText(){ it.trim() }.unique().filter{ it != \"\" }.map{ it.split().length > 1 ? [\"accession\": it.split()[0], \"name\": it.split()[1]] : [it.split()[0], null] }\n    each file(aspera_key) from IN_asperaKey_{{ pid }}\n\n    output:\n    set val({ \"$name\" != \"null\" ? \"$name\" : \"$accession_id\" }), file(\"${accession_id}/*fq.gz\") optional true into {{ output_channel }}\n    {% with task_name=\"reads_download\", sample_id=\"accession_id\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        # getSeqENA requires accession numbers to be provided as a text file\n        echo \"${accession_id}\" >> accession_file.txt\n        # Set default status value. It will be overwritten if anything goes wrong\n        echo \"pass\" > \".status\"\n\n        if [ -f $aspera_key ]; then\n            asperaOpt=\"-a $aspera_key\"\n        else\n            asperaOpt=\"\"\n        fi\n\n        getSeqENA.py -l accession_file.txt \\$asperaOpt -o ./ --SRAopt --downloadCramBam\n\n        # If a name has been provided along with the accession, rename the\n        # fastq files.\n        if [ $name != null ];\n        then\n            echo renaming pattern '${accession_id}' to '${name}' && cd ${accession_id} && rename \"s/${accession_id}/${name}/\" *.gz\n        fi\n    } || {\n        # If exit code other than 0\n        if [ \\$? -eq 0 ]\n        then\n            echo \"pass\" > .status\n        else\n            echo \"fail\" > .status\n            echo \"Could not download accession $accession_id\" > .fail\n        fi\n    }\n    version_str=\"{'version':[{'program':'getSeqENA.py','version':'1.3'}]}\"\n    echo \\$version_str > .versions\n    \"\"\"\n\n}\n\n{{ forks }}\n"
  },
  {
    "path": "flowcraft/generator/templates/remove_host.nf",
    "content": "IN_index_files_{{ pid }} = Channel.value(params.refIndex{{ param_id }})\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess remove_host_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/mapping/remove_host_{{ pid }}/', pattern: '*_bowtie2.log', mode: 'copy'\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val bowtie2Index from IN_index_files_{{ pid }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id , file(\"${sample_id}*.headersRenamed_*.fq.gz\") into {{ output_channel }}\n    set sample_id, file(\"*_bowtie2.log\") into into_json_{{ pid }}\n    {% with task_name=\"remove_host\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        bowtie2 -x ${bowtie2Index} -1 ${fastq_pair[0]} -2 ${fastq_pair[1]} -p $task.cpus 1> ${sample_id}.bam 2> ${sample_id}_bowtie2.log\n\n        samtools view -buh -f 12 -o ${sample_id}_samtools.bam -@ $task.cpus ${sample_id}.bam\n\n        rm ${sample_id}.bam\n\n        samtools fastq -1 ${sample_id}_unmapped_1.fq -2 ${sample_id}_unmapped_2.fq ${sample_id}_samtools.bam\n\n        rm ${sample_id}_samtools.bam\n\n        renamePE_samtoolsFASTQ.py -1 ${sample_id}_unmapped_1.fq -2 ${sample_id}_unmapped_2.fq\n\n        gzip *.headersRenamed_*.fq\n        rm *.fq\n\n        if [ \"$clear\" = \"true\" ];\n        then\n            work_regex=\".*/work/.{2}/.{30}/.*\"\n            file_source1=\\$(readlink -f \\$(pwd)/${fastq_pair[0]})\n            file_source2=\\$(readlink -f \\$(pwd)/${fastq_pair[1]})\n            if [[ \"\\$file_source1\" =~ \\$work_regex ]]; then\n                rm \\$file_source1 \\$file_source2\n            fi\n        fi\n\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n}\n\n\n\nprocess report_remove_host_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(bowtie_log) from into_json_{{ pid }}\n\n    output:\n    {% with task_name=\"report_remove_host\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_mapping.py\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/report_compiler.nf",
    "content": "\n/** Reports\nCompiles the reports from every process\n*/\nprocess report {\n\n    tag { sample_id }\n\n    input:\n    set sample_id,\n            task_name,\n            pid,\n            report_json,\n            version_json,\n            trace from {{ compile_channels }}\n\n    output:\n    file \"*\" optional true into master_report\n\n    \"\"\"\n    prepare_reports.py $report_json $version_json $trace $sample_id $task_name 1 $pid $workflow.scriptId $workflow.runName\n    \"\"\"\n\n}\n\nFile forkTree = new File(\"${workflow.projectDir}/.forkTree.json\")\nFile treeDag = new File(\"${workflow.projectDir}/.treeDag.json\")\nFile js = new File(\"${workflow.projectDir}/resources/main.js.zip\")\n\n\nforks_channel = forkTree.exists() ?  Channel.fromPath(\"${workflow.projectDir}/.forkTree.json\") : Channel.value(null)\ndag_channel = forkTree.exists() ?  Channel.fromPath(\"${workflow.projectDir}/.treeDag.json\") : Channel.value(null)\njs_channel = forkTree.exists() ?  Channel.fromPath(\"${workflow.projectDir}/resources/main.js.zip\") : Channel.value(null)\n\nprocess compile_reports {\n\n    publishDir \"pipeline_report/\", mode: \"copy\"\n\n    if ( params.reportHTTP != null ){\n        beforeScript \"PATH=${workflow.projectDir}/bin:\\$PATH; export PATH;\"\n        afterScript \"metadata_POST.sh $params.projectId $params.pipelineId 0 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId 0 \\\"$params.platformSpecies\\\"\"\n    }\n\n   input:\n   file report from master_report.collect()\n   file forks from forks_channel\n   file dag from dag_channel\n   file js from js_channel\n\n    output:\n    file \"pipeline_report.json\"\n    file \"pipeline_report.html\"\n    file \"src/main.js\"\n\n    script:\n    template \"compile_reports.py\"\n}\n\n\n"
  },
  {
    "path": "flowcraft/generator/templates/report_post.txt",
    "content": "    if ( params.platformHTTP != null ) {\n        beforeScript \"PATH=${workflow.projectDir}/bin:\\$PATH; export PATH; set_dotfiles.sh\"\n        afterScript \"report_POST.sh $params.projectId $params.pipelineId {{ pid }} $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId {{ template }}_{{ pid }} \\\"$params.platformSpecies\\\" {{ overwrite|default(\"true\") }}\"\n    } else {\n        beforeScript \"PATH=${workflow.projectDir}/bin:\\$PATH; export PATH; set_dotfiles.sh\"\n        }"
  },
  {
    "path": "flowcraft/generator/templates/resources.config",
    "content": "process {\n{{ process_info }}\n\n}"
  },
  {
    "path": "flowcraft/generator/templates/retrieve_mapped.nf",
    "content": "process retrieve_mapped_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/mapping/retrieve_mapped_{{ pid }}/'\n\n    input:\n    set sample_id, file(bam) from {{ input_channel }}\n\n    output:\n    set sample_id , file(\"*.headersRenamed_*.fq.gz\") into {{ output_channel }}\n    {% with task_name=\"retrieve_mapped\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    samtools view -buh -F 12 -o ${sample_id}_samtools.bam -@ $task.cpus ${bam}\n\n    rm ${bam}\n\n    samtools fastq -1 ${sample_id}_mapped_1.fq -2 ${sample_id}_mapped_2.fq ${sample_id}_samtools.bam\n\n    rm ${sample_id}_samtools.bam\n\n    renamePE_samtoolsFASTQ.py -1 ${sample_id}_mapped_1.fq -2 ${sample_id}_mapped_2.fq\n\n    gzip *.headersRenamed_*.fq\n\n    rm *.fq\n    \"\"\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/seq_typing.nf",
    "content": "file(params.referenceFileO{{ param_id }}) ? params.referenceFileO{{ param_id }} : exit(1, \"'referenceFileO{{ param_id }}' parameter missing\")\nIN_refO_{{ pid }} = Channel.fromPath(params.referenceFileO{{ param_id }})\n    .map{ it -> it.exists() ? it : exit(1, \"referenceFileO file was not found: '${params.referenceFileO{{ param_id }}}'\")}\n\nfile(params.referenceFileH{{ param_id }}) ? params.referenceFileH{{ param_id }} : exit(1, \"'referenceFileH{{ param_id }}' parameter missing\")\nIN_refH_{{ pid }} = Channel.fromPath(params.referenceFileH{{ param_id }})\n    .map{ it -> it.exists() ? it : exit(1, \"referenceFileH file was not found: '${params.referenceFileH{{ param_id }}}'\")}\n\nprocess seq_typing_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    errorStrategy \"ignore\"\n    publishDir \"results/seqtyping/${sample_id}/\"\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    each file(refO) from IN_refO_{{ pid }}\n    each file(refH) from IN_refH_{{ pid }}\n\n    output:\n    file \"seq_typing*\"\n    {% with task_name=\"seq_typing\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        # Prevents read-only issues\n        mkdir rematch_temp\n        cp -r /NGStools/ReMatCh rematch_temp\n        export PATH=\"\\$(pwd)/rematch_temp/ReMatCh:\\$PATH\"\n\n        seq_typing.py -f ${fastq_pair[0]} ${fastq_pair[1]} -r \\$(pwd)/$refO \\$(pwd)/$refH -o ./ -j $task.cpus --extraSeq 0 --mapRefTogether --minGeneCoverage 60\n\n        # Add information to dotfiles\n        json_str=\"{'tableRow':[{'sample':'${sample_id}','data':[{'header':'seqtyping','value':'\\$(cat seq_typing.report.txt)','table':'typing'}]}]}\"\n        echo \\$json_str > .report.json\n        version_str=\"[{'program':'seq_typing.py','version':'0.1'}]\"\n        echo \\$version_str > .versions\n\n        rm -r rematch_temp\n\n        if [ -s seq_typing.report.txt ];\n        then\n            echo pass > .status\n        else\n            echo fail > .status\n        fi\n    } || {\n        echo fail > .status\n        json_str=\"{'tableRow':[{'sample':'${sample_id}','data':[{'header':'seqtyping','value':'NA','table':'typing'}]}]}\"\n        echo \\$json_str > .report.json\n    }\n    \"\"\"\n\n}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/seroba.nf",
    "content": "Coverage_{{ pid }} = Channel.value(params.coverage{{ param_id }})\n\nprocess seroba_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(fastq) from {{ input_channel }}\n    val coverage from Coverage_{{ pid }}\n\n    output:\n    file(\"pred.tsv\") into LOG_seroba_{{ pid }}\n    {% with task_name=\"seroba\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        # create a directory in /tmp to store the results\n        mkdir /tmp/results\n        #rename input files for seroba (avoid match error)\n        mv ${fastq[0]} ${sample_id}_1.fq.gz\n        mv ${fastq[1]} ${sample_id}_2.fq.gz\n        # run seroba typing module\n        seroba runSerotyping --coverage ${coverage} /seroba/database/ ${sample_id}_1.fq.gz ${sample_id}_2.fq.gz /tmp/results/${sample_id}\n\n        # Get the ST for the sample\n        if [ -f \"/tmp/results/${sample_id}/pred.tsv\" ];\n        then\n            cp /tmp/results/${sample_id}/pred.tsv .\n            sed -i -- 's|/tmp/results/||g' pred.tsv\n            # Add ST information to report JSON\n            json_str=\"{'tableRow':[{'sample':'${sample_id}','data':[{'header':'serotype','value':'\\$(cat pred.tsv | cut -f2)','table':'typing'}]}]}\"\n            echo \\$json_str > .report.json\n        else\n            echo fail > .status\n            rm -r /tmp/results/\n        fi\n    } || {\n        echo fail > .status\n        # Remove results directory\n        rm -r /tmp/results/\n    }\n    \"\"\"\n\n}\n\nprocess compile_seroba_{{ pid }} {\n\n    publishDir \"results/typing/seroba_{{ pid }}/\"\n\n    input:\n    file res from LOG_seroba_{{ pid }}.collect()\n\n    output:\n    file \"seroba_report.tsv\"\n\n    script:\n    \"\"\"\n    cat $res >> seroba_report.tsv\n    \"\"\"\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/sistr.nf",
    "content": "\nprocess sistr_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/typing/sistr_{{ pid }}', pattern: \".tab\", mode: \"copy\"\n\n    input:\n    set sample_id, file(assembly) from {{ input_channel }}\n\n    output:\n    {% with task_name=\"sistr\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"\"\"\n    {\n        sistr --qc -vv -t $task.cpus -f tab -o ${sample_id}_sistr.tab ${assembly}\n        json_str=\"{'tableRow':[{'sample':'${sample_id}','data':[{'header':'sistr','value':'\\$(awk \\\"FNR == 2\\\" *.tab | cut -f14)','table':'typing'}]}]}\"\n        echo \\$json_str > .report.json\n        sistr_version=\\$(sistr --version | cut -d\" \" -f2)\n        version_str=\"[{'program':'sistr','version':'\\$sistr_version'}]\"\n        echo \\$version_str > .versions\n\n        if [ -s ${sample_id}_sistr.tab ];\n        then\n            echo pass > .status\n        else\n            echo fail > .status\n        fi\n\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n}\n\n{{ forks }}\n"
  },
  {
    "path": "flowcraft/generator/templates/skesa.nf",
    "content": "\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess skesa_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/assembly/skesa_{{ pid }}', pattern: '*skesa*.fasta', mode: 'copy'\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id, file('*.fasta') into {{ output_channel }}\n    {% with task_name=\"skesa\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"skesa.py\"\n\n}\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/spades.nf",
    "content": "if ( !params.spadesMinCoverage{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'spadesMinCoverage{{ param_id }}' parameter must be a number. Provided value: '${params.spadesMinCoverage{{ param_id }}}'\"\n}\nif ( !params.spadesMinKmerCoverage{{ param_id }}.toString().isNumber()){\n    exit 1, \"'spadesMinKmerCoverage{{ param_id }}' parameter must be a number. Provided value: '${params.spadesMinKmerCoverage{{ param_id }}}'\"\n}\n\nIN_spades_opts_{{ pid }} = Channel.value(\n    [params.spadesMinCoverage{{ param_id }},\n     params.spadesMinKmerCoverage{{ param_id }}\n     ])\n\nif ( params.spadesKmers{{ param_id }}.toString().split(\" \").size() <= 1 ){\n    if (params.spadesKmers{{ param_id }}.toString() != 'auto'){\n        exit 1, \"'spadesKmers{{ param_id }}' parameter must be a sequence of space separated numbers or 'auto'. Provided value: ${params.spadesKmers{{ param_id }}}\"\n    }\n}\nIN_spades_kmers_{{pid}} = Channel.value(params.spadesKmers{{ param_id }})\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ndisable_rr = params.disableRR{{ param_id }} ? \"true\" : \"false\"\n\ncheckpointClear_{{ pid }} = Channel.value(clear)\ndisableRR_{{ pid }} = Channel.value(disable_rr)\n\nprocess spades_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/assembly/spades_{{ pid }}/', pattern: '*_spades*.fasta', mode: 'copy'\n    publishDir \"results/assembly/spades_{{ pid }}/$sample_id\", pattern: \"*.gfa\", mode: \"copy\"\n    publishDir \"results/assembly/spades_{{ pid }}/$sample_id\", pattern: \"*.fastg\", mode: \"copy\"\n\n    input:\n    set sample_id, file(fastq_pair), max_len from {{ input_channel }}.join(SIDE_max_len_{{ pid }})\n    val opts from IN_spades_opts_{{ pid }}\n    val kmers from IN_spades_kmers_{{ pid }}\n    val clear from checkpointClear_{{ pid }}\n    val disable_rr from disableRR_{{ pid }}\n\n    output:\n    set sample_id, file('*_spades*.fasta') into {{ output_channel }}\n    file \"*.fastg\" optional true\n    file \"*.gfa\" into gfa1_{{ pid }}\n    {% with task_name=\"spades\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"spades.py\"\n\n}\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/split_assembly.nf",
    "content": "// Check for the presence of absence of the minimum contig size parameter\nif (params.size{{ param_id }} == null){\n    exit 1, \"A minimum contig size must be provided.\"\n}\n\nIN_min_contig_size_{{ pid }} = Channel.value(params.size{{ param_id }})\n\nprocess split_assembly_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    publishDir \"results/assembly/split_assembly_{{ pid }}/${sample_id}/\"\n\n    input:\n    set sample_id, file(assembly) from {{ input_channel }}\n    val min_contig_size from IN_min_contig_size_{{ pid }}\n\n    output:\n    file('*.fasta') into splitCh_{{ pid }}\n    {% with task_name=\"split_assembly\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"split_fasta.py\"\n\n\n}\n\n{{ output_channel }} = Channel.create()\n\nsplitCh_{{ pid }}.flatMap().map{ it -> [it.toString().tokenize('/').last().tokenize('.')[0..-2].join('.'), it]}.into( {{ output_channel }} )\n\n{{ forks }}"
  },
  {
    "path": "flowcraft/generator/templates/status_compiler.nf",
    "content": "\n/** STATUS\nReports the status of a sample in any given process.\n*/\nprocess status {\n\n    tag { sample_id }\n    publishDir \"pipeline_status/$task_name\"\n\n    input:\n    set sample_id, task_name, status, warning, fail, file(log) from {{ compile_channels }}\n\n    output:\n    file '*.status' into master_status\n    file '*.warning' into master_warning\n    file '*.fail' into master_fail\n    file '*.log'\n\n    \"\"\"\n    echo $sample_id, $task_name, \\$(cat $status) > ${sample_id}_${task_name}.status\n    echo $sample_id, $task_name, \\$(cat $warning) > ${sample_id}_${task_name}.warning\n    echo $sample_id, $task_name, \\$(cat $fail) > ${sample_id}_${task_name}.fail\n    echo \"\\$(cat .command.log)\" > ${sample_id}_${task_name}.log\n    \"\"\"\n}\n\nprocess compile_status_buffer {\n\n    input:\n    file status from master_status.buffer( size: 5000, remainder: true)\n    file warning from master_warning.buffer( size: 5000, remainder: true)\n    file fail from master_fail.buffer( size: 5000, remainder: true)\n\n    output:\n    file 'master_status_*.csv' into compile_status_buffer\n    file 'master_warning_*.csv' into compile_warning_buffer\n    file 'master_fail_*.csv' into compile_fail_buffer\n\n    \"\"\"\n    cat $status >> master_status_${task.index}.csv\n    cat $warning >> master_warning_${task.index}.csv\n    cat $fail >> master_fail_${task.index}.csv\n    \"\"\"\n}\n\nprocess compile_status {\n\n    publishDir 'reports/status'\n\n    input:\n    file status from compile_status_buffer.collect()\n    file warning from compile_warning_buffer.collect()\n    file fail from compile_fail_buffer.collect()\n\n    output:\n    file \"*.csv\"\n\n    \"\"\"\n    cat $status >> master_status.csv\n    cat $warning >> master_warning.csv\n    cat $fail >> master_fail.csv\n    \"\"\"\n\n}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/trace_compiler.nf",
    "content": "\n\nprocess compile_traces {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    input:\n    set sample_id, vals from {{ input_channel }}\n\n   script:\n   template \"pipeline_status.py\"\n\n}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/trimmomatic.nf",
    "content": "// Check sliding window parameter\nif ( params.trimSlidingWindow{{ param_id }}.toString().split(\":\").size() != 2 ){\n    exit 1, \"'trimSlidingWindow{{ param_id}}' parameter must contain two values separated by a ':'. Provided value: '${params.trimSlidingWindow{{ param_id}}}'\"\n}\nif ( !params.trimLeading{{ param_id}}.toString().isNumber() ){\n    exit 1, \"'trimLeading{{ param_id}}' parameter must be a number. Provide value: '${params.trimLeading_{{pid}}}'\"\n}\nif ( !params.trimTrailing{{ param_id}}.toString().isNumber() ){\n    exit 1, \"'trimTrailing{{ param_id}}' parameter must be a number. Provide value: '${params.trimTrailing{{ param_id}}}'\"\n}\nif ( !params.trimMinLength{{ param_id}}.toString().isNumber() ){\n    exit 1, \"'trimMinLength{{ param_id}}' parameter must be a number. Provide value: '${params.trimMinLength{{ param_id}}}'\"\n}\n\nIN_trimmomatic_opts_{{ pid }} = Channel.value([params.trimSlidingWindow{{ param_id}},params.trimLeading{{ param_id}},params.trimTrailing{{ param_id}},params.trimMinLength{{ param_id}}])\nIN_adapters_{{ pid }} = Channel.value(params.adapters{{ param_id}})\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClear_{{ pid }} = Channel.value(clear)\n\nprocess trimmomatic_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    publishDir \"results/trimmomatic_{{ pid }}\", pattern: \"*.gz\"\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(fastq_pair), phred from {{ input_channel }}.join(SIDE_phred_{{ pid }})\n    val trim_range from Channel.value(\"None\")\n    val opts from IN_trimmomatic_opts_{{ pid }}\n    val ad from IN_adapters_{{ pid }}\n    val clear from checkpointClear_{{ pid }}\n\n    output:\n    set sample_id, \"${sample_id}_*trim.fastq.gz\" into {{ output_channel }}\n    file 'trimmomatic_report.csv'\n    {% with task_name=\"trimmomatic\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"trimmomatic.py\"\n\n}\n\n{{ forks }}\n\n"
  },
  {
    "path": "flowcraft/generator/templates/true_coverage.nf",
    "content": "if ( !params.species{{ param_id }}){\n    exit 1, \"'species{{ param_id }}' parameter missing\"\n}\nif ( params.species{{ param_id }}.toString().split(\" \").size() != 2 ){\n    exit 1, \"'species{{ param_id }}' parameter must contain two values (e.g.: 'escherichia coli').Provided value: '${params.species{{ param_id }}}'\"\n}\n\nIN_pathoSpecies_{{ pid }} = Channel.value(params.species{{ param_id }})\n\nprocess true_coverage_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(fastq_pair) from {{ input_channel }}\n    val species from IN_pathoSpecies_{{ pid }}\n\n    output:\n    set sample_id, file(fastq_pair) into {{ output_channel }}\n    {% with task_name=\"true_coverage\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    \"\"\"\n    {\n        trueCoverage_rematch.py -f $fastq_pair --species $species \\\n        -i /NGStools/true_coverage/data --json\n        if ls failing* 1> /dev/null 2>&1;\n        then\n            parse_true_coverage.py sample_*.json failing*.json\n        else\n            parse_true_coverage.py sample_*.json\n        fi\n        echo pass > .status\n    } || {\n        echo fail > .status\n    }\n    \"\"\"\n\n}\n\n{{ forks }}\n"
  },
  {
    "path": "flowcraft/generator/templates/unicycler.nf",
    "content": "process unicycler_{{pid}} {\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/assembly/unicycler_{{pid}}/', pattern: 'assembly.fasta'\n    publishDir 'results/assembly/unicycler_{{pid}}/', pattern: 'assembly.gfa'\n\n    input:\n    set sample_id, file(fastq_pair) from {{input_channel}}\n\n    output:\n    set sample_id, file('assembly.fasta') into {{output_channel}}\n    file \"assembly.gfa\" into gfa1_{{pid}}\n    {% with task_name=\"unicycler\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    \"unicycler -t $task.cpus -o . --no_correct --no_pilon -1 ${fastq_pair[0]} -2 ${fastq_pair[1]}\"\n}\n\n{{forks}}\n"
  },
  {
    "path": "flowcraft/generator/templates/user.config",
    "content": "// User configuration file that is not overwritten by flowcraft\n// Use this file to provide persistent configurations in the same pipeline\n// directory\n\n"
  },
  {
    "path": "flowcraft/generator/templates/viral_assembly.nf",
    "content": "//MAIN INPUT - FASTQ FILES\nspades_in = Channel.create()\nmegahit_in = Channel.create()\n{{ input_channel }}.into{ spades_in; megahit_in }\n\n//EXPECTED GENOME SIZE\nif ( !params.minimumContigSize{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'minimumContigSize{{ param_id }}' parameter must be a number. Provided value: '${params.minimumContigSize{{ param_id }}}'\"\n}\n\n//SPADES OPTIONS\nif ( !params.spadesMinCoverage{{ param_id }}.toString().isNumber() ){\n    exit 1, \"'spadesMinCoverage{{ param_id }}' parameter must be a number. Provided value: '${params.spadesMinCoverage{{ param_id }}}'\"\n}\nif ( !params.spadesMinKmerCoverage{{ param_id }}.toString().isNumber()){\n    exit 1, \"'spadesMinKmerCoverage{{ param_id }}' parameter must be a number. Provided value: '${params.spadesMinKmerCoverage{{ param_id }}}'\"\n}\n\nif ( params.spadesKmers{{ param_id }}.toString().split(\" \").size() <= 1 ){\n    if (params.spadesKmers{{ param_id }}.toString() != 'auto'){\n        exit 1, \"'spadesKmers{{ param_id }}' parameter must be a sequence of space separated numbers or 'auto'. Provided value: ${params.spadesKmers{{ param_id }}}\"\n    }\n}\n\nclear = params.clearInput{{ param_id }} ? \"true\" : \"false\"\ncheckpointClearSpades_{{ pid }} = Channel.value(clear)\ncheckpointClearMegahit_{{ pid }} = Channel.value(clear)\n\n//MEGAHIT OPTIONS\nif ( params.megahitKmers{{ param_id }}.toString().split(\" \").size() <= 1 ){\n    if (params.megahitKmers{{ param_id }}.toString() != 'auto'){\n        exit 1, \"'megahitKmers{{ param_id }}' parameter must be a sequence of space separated numbers or 'auto'. Provided value: ${params.megahitKmers{{ param_id }}}\"\n    }\n}\n\n//SPADES INPUT CHANNELS\nIN_spades_opts_{{ pid }} = Channel.value([params.spadesMinCoverage{{ param_id }},params.spadesMinKmerCoverage{{ param_id }}])\nIN_spades_kmers_{{ pid }} = Channel.value(params.spadesKmers{{ param_id }})\n\n//MEGAGIT INPUT CHANNELS\nIN_megahit_kmers_{{ pid }} = Channel.value(params.megahitKmers{{ param_id }})\n\nSIDE_max_len_spades = Channel.create()\nSIDE_max_len_megahit = Channel.create()\nSIDE_max_len_{{ pid }}.into{SIDE_max_len_spades ; SIDE_max_len_megahit}\n\ndisableRR_{{ pid }} = \"false\"\n\nprocess va_spades_{{ pid }} {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    validExitStatus 0,1\n\n    tag { sample_id }\n    publishDir 'results/assembly/spades_{{ pid }}/', pattern: '*_spades*.fasta', mode: 'copy'\n\n    input:\n    set sample_id, file(fastq_pair), max_len from spades_in.join(SIDE_max_len_spades)\n    val opts from IN_spades_opts_{{ pid }}\n    val kmers from IN_spades_kmers_{{ pid }}\n    val clear from checkpointClearSpades_{{ pid }}\n    val disable_rr from disableRR_{{ pid }}\n\n    output:\n    set sample_id, file({task.exitStatus == 1 ? 
\".exitcode\" : '*_spades*.fasta'}) into assembly_spades\n    {% with task_name=\"va_spades\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"spades.py\"\n\n}\n\nclass VerifyCompletness {\n\n    public static boolean contigs(String filename, int threshold){\n        BufferedReader reader = new BufferedReader(new FileReader(filename));\n        boolean result = processContigs(reader, threshold);\n        reader.close()\n\n        return result;\n    }\n\n    private static boolean processContigs(BufferedReader reader, int threshold){\n        String line;\n        int lineThreshold = 0;\n        List splittedLine\n\n        while ((line = reader.readLine()) != null) {\n            if (line.startsWith('>')) {\n                splittedLine = line.split('_')\n                lineThreshold = splittedLine[3].toInteger()\n                if(lineThreshold >= threshold) {\n                    return true;\n                }\n             }\n        }\n\n        return false;\n    }\n}\n\nmegahit = Channel.create()\ngood_assembly = Channel.create()\nassembly_spades.choice(good_assembly, megahit){a -> a[1].toString() == \"null\" ? false : VerifyCompletness.contigs(a[1].toString(), params.minimumContigSize{{ param_id }}.toInteger()) == true ? 0 : 1}\n\n\nprocess va_megahit_{{ pid }}  {\n\n    // Send POST request to platform\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n    publishDir 'results/assembly/megahit_{{ pid }}/', pattern: '*_megahit*.fasta', mode: 'copy'\n\n    input:\n    set sample_id, file(fastq_pair), max_len from megahit_in.join(megahit).map{ ot -> [ot[0], ot[1]] }.join(SIDE_max_len_megahit)\n    val kmers from IN_megahit_kmers_{{ pid }}\n    val clear from checkpointClearSpades_{{ pid }}\n\n    output:\n    set sample_id, file('*megahit*.fasta') into megahit_assembly\n    {% with task_name=\"va_megahit\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"megahit.py\"\n\n}\n\n\ngood_assembly.mix(megahit_assembly).into{ to_report_{{ pid }} ; {{ output_channel }} }\norf_size = Channel.value(params.minimumContigSize{{ param_id }})\n\n\nprocess report_viral_assembly_{{ pid }} {\n\n    {% include \"post.txt\" ignore missing %}\n\n    tag { sample_id }\n\n    input:\n    set sample_id, file(assembly) from to_report_{{ pid }}\n    val min_size from orf_size\n\n    output:\n    {% with task_name=\"report_viral_assembly\" %}\n    {%- include \"compiler_channels.txt\" ignore missing -%}\n    {% endwith %}\n\n    script:\n    template \"process_viral_assembly.py\"\n\n}\n\n\n{{ forks }}"
  },
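The routing decision in `viral_assembly.nf` hinges on the Groovy `VerifyCompletness` class, which scans SPAdes-style headers (e.g. `>NODE_1_length_5000_cov_12.5`) and accepts the assembly when any contig reaches `minimumContigSize`. A hedged Python sketch of the same check, useful for exercising the logic outside Nextflow:

```python
# Sketch of the completeness check performed by the Groovy VerifyCompletness
# class above, assuming SPAdes-style headers such as
# ">NODE_1_length_5000_cov_12.5": the assembly passes when at least one
# contig reaches the minimum length threshold.
def has_complete_contig(fasta_path, threshold):
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                # Field 3 of the underscore-split header is the contig length.
                length = int(line.split("_")[3])
                if length >= threshold:
                    return True
    return False

# Example: route the sample to megahit when no contig reaches 10 kb.
# print(0 if has_complete_contig("sample_spades.fasta", 10000) else 1)
```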
  {
    "path": "flowcraft/generator/utils.py",
    "content": "import re\n\ntry:\n    import generator.error_handling as eh\nexcept ImportError:\n    import flowcraft.generator.error_handling as eh\n\n\ndef get_nextflow_filepath(log_file):\n    \"\"\"Gets the nextflow file path from the nextflow log file. It searches for\n    the nextflow run command throughout the file.\n\n    Parameters\n    ----------\n    log_file : str\n        Path for the .nextflow.log file\n\n    Returns\n    -------\n    str\n        Path for the nextflow file\n    \"\"\"\n\n    with open(log_file) as fh:\n        # Searches for the first occurence of the nextflow pipeline\n        # file name in the .nextflow.log file\n        while 1:\n            line = fh.readline()\n            if not line:\n                # file is empty\n                raise eh.LogError(\"Nextflow command path could not be found - Is \"\n                                 \".nextflow.log empty?\")\n            try:\n                # Regex supports absolute paths and relative paths\n                pipeline_path = re.match(\".*\\s(.*.nf).*\", line) \\\n                    .group(1)\n                return pipeline_path\n            except AttributeError:\n                continue\n"
  },
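A small usage sketch for `get_nextflow_filepath`; the example log lines below only approximate the format of a real `.nextflow.log`, and the import assumes the `flowcraft` package is installed:

```python
# Usage sketch for get_nextflow_filepath(); the log content is illustrative.
from flowcraft.generator.utils import get_nextflow_filepath

with open("example.nextflow.log", "w") as fh:
    fh.write("DEBUG nextflow.cli.Launcher - Setting http proxy\n")
    fh.write("DEBUG nextflow.cli.CmdRun - nextflow run my_pipeline.nf -profile docker\n")

# The first line carries no *.nf token, so the function keeps reading until
# it finds the run command and returns the pipeline path.
print(get_nextflow_filepath("example.nextflow.log"))  # -> my_pipeline.nf
```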
  {
    "path": "flowcraft/lib/CheckParams.groovy",
    "content": "class Params {\n\n    static void check(Map params) {\n\n        // Checks genomeSize for type\n        try {\n            params.genomeSize as Double\n        } catch (e) {\n            print_error(\"The genomeSize option must be a number\")\n        }\n\n        // Checks minCoverage for type\n        try {\n            params.minCoverage as Double\n        } catch (e) {\n            print_error(\"the minCoverage option must be a number\")\n        }\n\n        // Check if fastqc adapters file exists\n        if (!params.adapters.equalsIgnoreCase(\"none\")) {\n            File f = new File(params.adapters)\n            if (!f.exists()) {\n                print_error(\"The provided adapters file does \" +\n                            \"not exist ($params.adapters)\")\n            }\n        }\n\n        // Check for trimmomatic parameters\n        try {\n            params.trimLeading as Double\n            params.trimTrailing as Double\n            params.trimMinLength as Double\n        } catch (e) {\n            print_error(\"The trimLeading ($params.trimLeading), \" +\n                        \"trimTrailing ($params.trimTrailing) and \" +\n                        \"trimMinLength ($params.trimMinLength) \" +\n                        \"options must be numbers\")\n        }\n\n        // Check for Spades parameters\n        [\n            \"spadesMincoverage\": params.spadesMinCoverage,\n            \"spadesMinKmerCoverage\": params.spadesMinKmerCoverage,\n            \"spadesMinContigLen\": params.spadesMinContigLen,\n            \"spadesMaxContigs\": params.spadesMaxContigs\n        ].each { k, v ->\n            try {\n                v as Integer\n            } catch (e) {\n                print_error(\"The spades parameter $k ($v) must be an integer\")\n            }\n         }\n\n    }\n\n    static def print_error(String msg) {\n\n        println \"\\nERROR: $msg\"\n        System.exit(1)\n\n    }\n\n}"
  },
  {
    "path": "flowcraft/profiles.config",
    "content": "// Compilation of commonly used profile combinations of executor and container\n// engine\nprofiles {\n\n    standard {\n        singularity.enabled = true\n    }\n\n    docker {\n        docker.enabled = true\n    }\n\n    // SLURM executor\n    slurm_sing {\n        singularity.enabled = true\n        process.executor = \"slurm\"\n    }\n\n    slurm_docker {\n        docker.enabled = true\n        process.executor = \"slurm\"\n    }\n\n    slurm_shifter {\n        shifter.enabled = true\n        process.executor = \"slurm\"\n    }\n    \n    // SGE executor\n    sge_sing {\n        singularity.enabled = true\n        process.executor = \"sge\"\n    }\n\n    sge_docker {\n        docker.enabled = true\n        process.executor = \"sge\"\n    }\n\n    sge_shifter {\n        shifter.enabled = true\n        process.executor = \"sge\"\n    }\n\n    // LSF executor\n    lsf_sing {\n        singularity.enabled = true\n        process.executor = \"lsf\"\n    }\n\n    lsf_docker {\n        docker.enabled = true\n        process.executor = \"lsf\"\n    }\n\n    lsf_shifter {\n        shifter.enabled = true\n        process.executor = \"lsf\"\n    }\n\n    // PBS executor\n    pbs_sing {\n        singularity.enabled = true\n        process.executor = \"pbs\"\n    }\n\n    pbs_docker {\n        docker.enabled = true\n        process.executor = \"pbs\"\n    }\n\n    pbs_shifter {\n        shifter.enabled = true\n        process.executor = \"pbs\"\n    }\n    \n    // NQSII executor\n    nqsii_sing {\n        singularity.enabled = true\n        process.executor = \"nqsii\"\n    }\n\n    nqsii_docker {\n        docker.enabled = true\n        process.executor = \"nqsii\"\n    }\n\n    nqsii_shifter {\n        shifter.enabled = true\n        process.executor = \"nqsii\"\n    }\n    \n    // HTCondor executor\n    condor_sing {\n        singularity.enabled = true\n        process.executor = \"condor\"\n    }\n\n    condor_docker {\n        docker.enabled = true\n        process.executor = \"condor\"\n    }\n\n    condor_shifter {\n        shifter.enabled = true\n        process.executor = \"condor\"\n    }\n\n}"
  },
  {
    "path": "flowcraft/templates/README.md",
    "content": "# Templates\n\nA bunch of templates for processing HTS data. Particularly\nuseful for using with nextflow pipelines.\n\n## Quick reference\n\n* process_assembly_mapping.py - Processes the coverage report and checks\nassembly filters from the `assembly_mapping` process. [[changelog](https://github.com/ODiogoSilva/templates/wiki/process_assembly_mapping-changelog), [API](http://assemblerflow.readthedocs.io/en/doc_galore/assemblerflow.templates.process_assembly_mapping.html)]\n\n* mapping2json.py - exports results from a samtool depth file to a json\nfile that contains a `key:value` such as `accession number:coverage` .\n\n* mashdist2json.py - exports results from `mash dist` to a json file\nthat contains a `key:value` such as `accession number:distance` .\n\n* mashscreen2json.py - exports results from `mash screen` to a json\nfile that contains a `key:[values]` such as `accession number:[copy number, identity]` .\n\n## How to use as a submodule\n\n### Add templates to your project\n\n```\ngit submodule add https://github.com/ODiogoSilva/templates.git templates\n```\n\n### Update templates on your project\n\n```\ngit submodule foreach git pull origin master\n```"
  },
  {
    "path": "flowcraft/templates/__init__.py",
    "content": "\"\"\"\nPlaceholder for template generation docs\n\"\"\""
  },
  {
    "path": "flowcraft/templates/assembly_report.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to provide a summary report for a given assembly\nin Fasta format.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``assembly`` : Path to assembly file in Fasta format.\n    - e.g.: ``'assembly.fasta'``\n\nGenerated output\n----------------\n\n- ``${sample_id}_assembly_report.csv`` : CSV with summary information of the \\\n    assembly.\n    - e.g.: ``'SampleA_assembly_report.csv'``\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.1\"\n__build__ = \"16012018\"\n__template__ = \"assembly_report-nf\"\n\nimport os\nimport re\nimport json\nimport traceback\nimport subprocess\n\nfrom collections import OrderedDict\nfrom subprocess import PIPE\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\ndef __get_version_pilon():\n\n    pilon_path = \"/NGStools/pilon-1.22.jar\"\n\n    try:\n\n        cli = [\"java\", \"-jar\", pilon_path, \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout, _ = p.communicate()\n\n        version = stdout.split()[2].decode(\"utf8\")\n\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return {\n        \"program\": \"Pilon\",\n        \"version\": version,\n    }\n\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    ASSEMBLY_FILE = '$assembly'\n    COVERAGE_BP_FILE = '$coverage_bp'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"ASSEMBLY_FILE: {}\".format(ASSEMBLY_FILE))\n    logger.debug(\"COVERAGE_BP_FILE: {}\".format(COVERAGE_BP_FILE))\n\n\nclass Assembly:\n    \"\"\"Class that parses and filters an assembly file in Fasta format.\n\n    This class parses an assembly file, collects a number\n    of summary statistics and metadata from the contigs and reports.\n\n    Parameters\n    ----------\n    assembly_file : str\n        Path to assembly file.\n    sample_id : str\n        Name of the sample for the current assembly.\n    \"\"\"\n\n    def __init__(self, assembly_file, sample_id):\n\n        self.summary_info = OrderedDict([\n            (\"ncontigs\", 0),\n            (\"avg_contig_size\", []),\n            (\"n50\", 0),\n            (\"total_len\", 0),\n            (\"avg_gc\", []),\n            (\"missing_data\", 0)\n        ])\n        \"\"\"\n        OrderedDict: Initialize summary information dictionary. 
Contains keys:\n\n            - ``ncontigs``: Number of contigs\n            - ``avg_contig_size``: Average size of contigs\n            - ``n50``: N50 metric\n            - ``total_len``: Total assembly length\n            - ``avg_gc``: Average GC proportion\n            - ``missing_data``: Count of missing data characters\n        \"\"\"\n\n        self.contigs = OrderedDict()\n        \"\"\"\n        OrderedDict: Object that maps the contig headers to the corresponding\n        sequence\n        \"\"\"\n\n        self.contig_coverage = OrderedDict()\n        \"\"\"\n        OrderedDict: Object that maps the contig headers to the corresponding\n        list of per-base coverage\n        \"\"\"\n\n        self.sample = sample_id\n        \"\"\"\n        str: Sample id\n        \"\"\"\n\n        self.contig_boundaries = {}\n        \"\"\"\n        dict: Maps the boundaries of each contig in the genome\n        \"\"\"\n\n        self._parse_assembly(assembly_file)\n\n    def _parse_assembly(self, assembly_file):\n        \"\"\"Parse an assembly file in fasta format.\n\n        This is a Fasta parsing method that populates the\n        :py:attr:`Assembly.contigs` attribute with data for each contig in the\n         assembly.\n\n        Parameters\n        ----------\n        assembly_file : str\n            Path to the assembly fasta file.\n\n        \"\"\"\n\n        with open(assembly_file) as fh:\n\n            header = None\n            logger.debug(\"Starting iteration of assembly file: {}\".format(\n                assembly_file))\n\n            for line in fh:\n\n                # Skip empty lines\n                if not line.strip():\n                    continue\n\n                if line.startswith(\">\"):\n                    # Add contig header to contig dictionary\n                    header = line[1:].strip()\n                    self.contigs[header] = []\n\n                else:\n                    # Add sequence string for the current contig\n                    self.contigs[header].append(line.strip())\n\n            # After populating the contigs dictionary, convert the values\n            # list into a string sequence\n            self.contigs = OrderedDict(\n                (header, \"\".join(seq)) for header, seq in self.contigs.items())\n\n    @staticmethod\n    def _get_contig_id(contig_str):\n        \"\"\"Tries to retrieve contig id. 
Returns the original string if it\n        is unable to retrieve the id.\n\n        Parameters\n        ----------\n        contig_str : str\n            Full contig string (fasta header)\n\n        Returns\n        -------\n        str\n            Contig id\n        \"\"\"\n\n        contig_id = contig_str\n\n        try:\n            contig_id = re.search(\".*NODE_([0-9]*)_.*\", contig_str).group(1)\n        except AttributeError:\n            pass\n\n        try:\n            contig_id = re.search(\".*Contig_([0-9]*)_.*\", contig_str).group(1)\n        except AttributeError:\n            pass\n\n        return contig_id\n\n    def get_summary_stats(self, output_csv=None):\n        \"\"\"Generates a CSV report with summary statistics about the assembly\n\n        The calculated statistics are:\n\n            - Number of contigs\n            - Average contig size\n            - N50\n            - Total assembly length\n            - Average GC content\n            - Amount of missing data\n\n        Parameters\n        ----------\n        output_csv: str\n            Name of the output CSV file.\n        \"\"\"\n\n        contig_size_list = []\n\n        self.summary_info[\"ncontigs\"] = len(self.contigs)\n\n        for contig_id, sequence in self.contigs.items():\n\n            logger.debug(\"Processing contig: {}\".format(contig_id))\n\n            # Get contig sequence size\n            contig_len = len(sequence)\n\n            # Add size for average contig size\n            contig_size_list.append(contig_len)\n\n            # Add to total assembly length\n            self.summary_info[\"total_len\"] += contig_len\n\n            # Add to average gc\n            self.summary_info[\"avg_gc\"].append(\n                sum(map(sequence.count, [\"G\", \"C\"])) / contig_len\n            )\n\n            # Add to missing data\n            self.summary_info[\"missing_data\"] += sequence.count(\"N\")\n\n        # Get average contig size\n        logger.debug(\"Getting average contig size\")\n        self.summary_info[\"avg_contig_size\"] = \\\n            sum(contig_size_list) / len(contig_size_list)\n\n        # Get average gc content\n        logger.debug(\"Getting average GC content\")\n        self.summary_info[\"avg_gc\"] = \\\n            sum(self.summary_info[\"avg_gc\"]) / len(self.summary_info[\"avg_gc\"])\n\n        # Get N50\n        logger.debug(\"Getting N50\")\n        cum_size = 0\n        for l in sorted(contig_size_list, reverse=True):\n            cum_size += l\n            if cum_size >= self.summary_info[\"total_len\"] / 2:\n                self.summary_info[\"n50\"] = l\n                break\n\n        if output_csv:\n            logger.debug(\"Writing report to csv\")\n            # Write summary info to CSV\n            with open(output_csv, \"w\") as fh:\n                summary_line = \"{}, {}\\\\n\".format(\n                    self.sample, \",\".join(\n                        [str(x) for x in self.summary_info.values()]))\n                fh.write(summary_line)\n\n    def _get_window_labels(self, window):\n        \"\"\"Returns the mapping between sliding window points and their contigs,\n        and the x-axis position of contig\n\n        Parameters\n        ----------\n        window : int\n            Size of the window.\n\n        Returns\n        -------\n        xbars : list\n            The x-axis position of the ending for each contig.\n        labels : list\n            The x-axis labels for each data point in the sliding window\n\n        \"\"\"\n\n       
 # Get summary stats, if they have not yet been triggered\n        if not self.summary_info:\n            self.get_summary_stats()\n\n        # Get contig boundary positon\n        c = 0\n        xbars = []\n        for contig, seq in self.contigs.items():\n            contig_id = self._get_contig_id(contig)\n            self.contig_boundaries[contig_id] = [c, c + len(seq)]\n            c += len(seq)\n            xbars.append((contig_id, c, contig))\n\n        return xbars\n\n    @staticmethod\n    def _gc_prop(s, length):\n        \"\"\"Get proportion of GC from a string\n\n        Parameters\n        ----------\n        s : str\n            Arbitrary string\n\n        Returns\n        -------\n        x : float\n            GC proportion.\n        \"\"\"\n\n        gc = sum(map(s.count, [\"c\", \"g\"]))\n\n        return gc / length\n\n    def get_gc_sliding(self, window=2000):\n        \"\"\"Calculates a sliding window of the GC content for the assembly\n\n\n        Returns\n        -------\n        gc_res : list\n            List of GC proportion floats for each data point in the sliding\n            window\n        \"\"\"\n\n        gc_res = []\n\n        # Get complete sequence to calculate sliding window values\n        complete_seq = \"\".join(self.contigs.values()).lower()\n\n        for i in range(0, len(complete_seq), window):\n\n            seq_window = complete_seq[i:i + window]\n\n            # Get GC proportion\n            gc_res.append(round(self._gc_prop(seq_window, len(seq_window)), 2))\n\n        return gc_res\n\n    def _get_coverage_from_file(self, coverage_file):\n        \"\"\"\n\n        Parameters\n        ----------\n        coverage_file\n\n        Returns\n        -------\n\n        \"\"\"\n\n        with open(coverage_file) as fh:\n\n            for line in fh:\n\n                fields = line.strip().split()\n\n                # Get header\n                header = fields[0]\n                coverage = int(fields[2])\n\n                if header not in self.contig_coverage:\n                    self.contig_coverage[header] = [coverage]\n                else:\n                    self.contig_coverage[header].append(coverage)\n\n    def get_coverage_sliding(self, coverage_file, window=2000):\n        \"\"\"\n\n        Parameters\n        ----------\n        coverage_file : str\n            Path to file containing the coverage info at the per-base level\n            (as generated by samtools depth)\n        window : int\n            Size of sliding window\n\n        Returns\n        -------\n\n        \"\"\"\n\n        if not self.contig_coverage:\n            self._get_coverage_from_file(coverage_file)\n\n        # Stores the coverage results\n        cov_res = []\n\n        # Make flat list of coverage values across genome\n        complete_cov = [x for y in self.contig_coverage.values() for x in y]\n\n        for i in range(0, len(complete_cov), window):\n            # Get coverage values for current window\n            cov_window = complete_cov[i:i + window]\n            # Get mean coverage\n            cov_res.append(int(sum(cov_window) / len(cov_window)))\n\n        return cov_res\n\n\n@MainWrapper\ndef main(sample_id, assembly_file, coverage_bp_file=None):\n    \"\"\"Main executor of the assembly_report template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    assembly_file : str\n        Path to assembly file in Fasta format.\n\n    \"\"\"\n\n    logger.info(\"Starting assembly report\")\n    
assembly_obj = Assembly(assembly_file, sample_id)\n\n    logger.info(\"Retrieving summary statistics for assembly\")\n    assembly_obj.get_summary_stats(\"{}_assembly_report.csv\".format(sample_id))\n\n    size_dist = [len(x) for x in assembly_obj.contigs.values()]\n    json_dic = {\n        \"tableRow\": [{\n            \"sample\": sample_id,\n            \"data\": [\n                {\"header\": \"Contigs\",\n                 \"value\": assembly_obj.summary_info[\"ncontigs\"],\n                 \"table\": \"assembly\",\n                 \"columnBar\": True},\n                {\"header\": \"Assembled BP\",\n                 \"value\": assembly_obj.summary_info[\"total_len\"],\n                 \"table\": \"assembly\",\n                 \"columnBar\": True},\n            ]\n        }],\n        \"plotData\": [{\n            \"sample\": sample_id,\n            \"data\": {\n                \"size_dist\": size_dist\n            }\n        }]\n    }\n\n    if coverage_bp_file:\n        try:\n            window = 2000\n            gc_sliding_data = assembly_obj.get_gc_sliding(window=window)\n            cov_sliding_data = \\\n                assembly_obj.get_coverage_sliding(coverage_bp_file,\n                                                  window=window)\n\n            # Get total basepairs based on the individual coverage of each\n            # contig bpx\n            total_bp = sum(\n                [sum(x) for x in assembly_obj.contig_coverage.values()]\n            )\n\n            # Add data to json report\n            json_dic[\"plotData\"][0][\"data\"][\"genomeSliding\"] = {\n                \"gcData\": gc_sliding_data,\n                \"covData\": cov_sliding_data,\n                \"window\": window,\n                \"xbars\": assembly_obj._get_window_labels(window),\n                \"assemblyFile\": os.path.basename(assembly_file)\n            }\n            json_dic[\"plotData\"][0][\"data\"][\"sparkline\"] = total_bp\n\n        except:\n            logger.error(\"Unexpected error creating sliding window data:\\\\n\"\n                         \"{}\".format(traceback.format_exc()))\n\n    # Write json report\n    with open(\".report.json\", \"w\") as json_report:\n\n        json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n    with open(\".status\", \"w\") as status_fh:\n        status_fh.write(\"pass\")\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, ASSEMBLY_FILE, COVERAGE_BP_FILE)\n\n"
  },
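The N50 reported by `Assembly.get_summary_stats` is the contig length at which the cumulative sum of descending contig lengths first reaches half of the total assembly length. A tiny self-contained illustration of that rule:

```python
# Toy illustration of the N50 rule used by Assembly.get_summary_stats():
# sort contig lengths in descending order and take the length at which the
# cumulative sum first reaches half of the total assembly length.
def n50(contig_lengths):
    total = sum(contig_lengths)
    cumulative = 0
    for length in sorted(contig_lengths, reverse=True):
        cumulative += length
        if cumulative >= total / 2:
            return length

# Example: total = 135, half = 67.5; 50 + 30 = 80 >= 67.5, so N50 is 30.
print(n50([50, 30, 25, 20, 10]))  # -> 30
```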
  {
    "path": "flowcraft/templates/compile_reports.py",
    "content": "#!/usr/bin/python3\nimport os\nimport sys\nimport json\nimport zipfile\nimport logging\n\nREPORTS = \"${report}\".split()\nFORKS = \"${forks}\"\nDAG = \"${dag}\"\nMAIN_JS = \"${js}\"\n\n\nhtml_template = \"\"\"\n<!DOCTYPE html>\n<html>\n<head>\n  <meta charset=\"utf-8\">\n  <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n  <link href=\"https://fonts.googleapis.com/icon?family=Material+Icons\" rel=\"stylesheet\">\n  <title>FlowCraft App</title>\n</head>\n<body style=\"background-color: #f2f2f2\">\n    <div id=\"app\"><!-- React --></div>\n</body>\n<script> const _fileReportData = {} </script>\n<script src=\"./src/main.js\"></script>\n</html>\n\"\"\"\n\n\ndef main(reports, forks, dag, main_js):\n\n    metadata = {\n        \"nfMetadata\": {\n            \"scriptId\": \"${workflow.scriptId}\",\n            \"scriptName\": \"${workflow.scriptName}\",\n            \"profile\": \"${workflow.profile}\",\n            \"container\": \"${workflow.container}\",\n            \"containerEngine\": \"${workflow.containerEngine}\",\n            \"commandLine\": \"${workflow.commandLine}\",\n            \"runName\": \"${workflow.runName}\",\n            \"sessionId\": \"${workflow.sessionId}\",\n            \"projectDir\": \"${workflow.projectDir}\",\n            \"launchDir\": \"${workflow.launchDir}\",\n            \"startTime\": \"${workflow.start}\"\n        }\n    }\n\n    # Add nextflow metadata\n    storage = []\n\n    # Add forks dictionary\n    try:\n        with open(forks) as fh:\n            forks = json.load(fh)\n            metadata[\"nfMetadata\"][\"forks\"] = forks\n    except json.JSONDecodeError:\n        logging.warning(\"Could not parse versions JSON: {}\".format(\n            dag))\n\n    # Add tree DAG in JSON format\n    try:\n        with open(dag) as fh:\n            dag = json.load(fh)\n            metadata[\"nfMetadata\"][\"dag\"] = dag\n    except json.JSONDecodeError:\n        logging.warning(\"Could not parse versions JSON: {}\".format(\n            dag))\n\n    storage.append(metadata)\n    # Write metadata information to dotfile. This dotfile is then sent to the\n    # ReportHTTP, when available in the afterScript process directive.\n    with open(\".metadata.json\", \"w\") as fh:\n        fh.write(json.dumps(metadata, separators=(\",\", \":\")))\n\n    for r in reports:\n        with open(r) as fh:\n            rjson = json.load(fh)\n            storage.append(rjson)\n            print(\"{}: {}\".format(rjson[\"processName\"],\n                                  sys.getsizeof(json.dumps(rjson))))\n\n    with open(\"pipeline_report.html\", \"w\") as html_fh:\n        html_fh.write(html_template.format(\n            json.dumps({\"data\": {\"results\": storage}}, separators=(\",\", \":\"))))\n\n    with zipfile.ZipFile(main_js) as zf:\n        os.mkdir(\"src\")\n        zf.extractall(\"./src\")\n\n    with open(\"pipeline_report.json\", \"w\") as rep_fh:\n        rep_fh.write(json.dumps({\"data\": {\"results\": storage}},\n                                separators=(\",\", \":\")))\n\n\nif __name__ == \"__main__\":\n    main(REPORTS, FORKS, DAG, MAIN_JS)\n"
  },
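`compile_reports.py` injects the aggregated results into the HTML page by formatting the single `{}` placeholder inside the `<script>` tag, so the bundled `main.js` can pick the data up from `_fileReportData`. A minimal sketch of that injection with a toy payload (the payload keys below are illustrative only):

```python
# Minimal sketch of how compile_reports.py injects the aggregated report into
# the HTML page: the single "{}" placeholder inside the <script> tag receives
# the JSON-serialized results.
import json

page_template = """<!DOCTYPE html>
<html>
<body><div id="app"></div></body>
<script> const _fileReportData = {} </script>
<script src="./src/main.js"></script>
</html>
"""

# Toy payload; real entries are the per-process .report.json contents.
storage = [{"processName": "fastqc_1_1", "status": "pass"}]
html = page_template.format(
    json.dumps({"data": {"results": storage}}, separators=(",", ":")))
print(html)
```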
  {
    "path": "flowcraft/templates/dengue_typing_assembly.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module intends to type DENV genome assembly with seqTyping (BLAST mode)\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``fasta`` : A fasta file path.\n    - e.g.: ``'SampleA.fasta'``\n- ``fastq_pair`` : Pair of FastQ file paths.\n    - e.g.: ``'SampleA_1.fastq.gz SampleA_2.fastq.gz'``\n\nGenerated output\n----------------\n\n-  The sample fasta file path or, if a complete ORF isn't obtained, a consesus sequence\n-  The closest reference fasta file path\n\"\"\"\n\n__version__ = \"0.0.2\"\n__build__ = \"01022019\"\n__template__ = \"dengue_typing-nf\"\n\nimport json\nimport os\nimport sys\nimport subprocess\nfrom subprocess import PIPE\nfrom itertools import groupby\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    ASSEMBLY = '$assembly'\n    REFERENCE = '$reference'\n    RESULT = '$get_reference'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"ASSEMBLY: {}\".format(ASSEMBLY))\n    logger.debug(\"REFERENCE: {}\".format(REFERENCE))\n    logger.debug(\"RESULT: {}\".format(RESULT))\n\n\ndef __get_version_seq_typing():\n    \"\"\"\n    Gets Seq_typing software version\n    Returns\n    -------\n    version : str\n        Seqtyping version\"\"\"\n\n    try:\n        cli = [\"seq_typing.py\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout = p.communicate()[0]\n\n        version = stdout.splitlines()[0].split()[-1].decode(\"utf8\")\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return version\n\ndef replace_char(text):\n    \"\"\"\n    Cleans the string from problematic chars\n\n    Parameters\n    ----------\n    text : str\n        String to clean\"\"\"\n\n    for ch in ['/', '`', '*', '{', '}', '[', ']', '(', ')', '#', '+', '-', '.', '!', '\\$', ':', '|']:\n        text = text.replace(ch, \"_\")\n    return text\n\n\ndef getSequence(ref, fasta):\n    \"\"\"\n     Gets the fasta sequence from the Database with the header \"ref\"\n\n     Parameters\n     ----------\n     ref : str\n         Reference whose sequence needs to be fetched\n     fasta: str\n        Path to the multifasta\"\"\"\n\n    fasta_header = \"\"\n\n    fh_fasta = open(fasta, \"r\")\n    entry = (x[1] for x in groupby(fh_fasta, lambda line: line[0] == \">\"))\n\n    for header in entry:\n        headerStr = header.__next__()[1:].strip()\n\n        seq = \"\".join(s.strip() for s in entry.__next__())\n\n        if ref == headerStr.replace('>',''):\n            filename = os.path.join(os.getcwd(), ref.replace('/','_').split('|')[0])\n            fasta_header = replace_char(headerStr)\n\n            with open(filename + '.fa', \"w\") as output_file:\n                output_file.write(\">\" + fasta_header + \"\\\\n\" + seq.upper() + \"\\\\n\")\n\n    fh_fasta.close()\n    return fasta_header\n\n\ndef get_reference_header(file):\n    \"\"\"\n    Gets the header for the closest reference from the seqtyping report\n\n    Parameters\n    ----------\n    file: str\n     Path to the seqtyping report\"\"\"\n\n    with open(file, \"r\") as typing_report:\n        lines = 
typing_report.readlines()\n    return lines[1].split('\\\\t')[3]\n\n\ndef getType(file):\n    \"\"\"\n    Gets the typing result from the seqtyping report\n\n    Parameters\n    ----------\n    file: str\n     Path to the seqtyping report\"\"\"\n\n    with open(file, \"r\") as result:\n        return result.readline().strip()\n\n\ndef getScore(file):\n    \"\"\"\n    Method to write QC warnings based on the mapping statistics\n    (sequence covered and identity)\n\n    Parameters\n    ----------\n    file: str\n     Path to the seqtyping report\"\"\"\n\n    with open(file, \"r\") as typing_report:\n        lines = typing_report.readlines()\n\n        sequence_covered = float(lines[1].split(\"\\\\t\")[4])\n        sequence_identity = float(lines[1].split(\"\\\\t\")[6])\n\n        if sequence_covered < 70:\n            logger.fail(\"Sequence coverage below 70% on the best hit.\")\n            with open(\".fails\", \"w\") as fails:\n                fails.write(\"Sequence coverage below 70% on the best hit.\")\n\n        elif 90 > sequence_covered < 70:\n            logger.warning(\"Sequence coverage lower than 90% on the best hit.\")\n            with open(\".warnings\", \"w\") as fails:\n                fails.write(\"Sequence coverage below 70% on the best hit.\")\n\n        return sequence_identity, sequence_covered\n\n@MainWrapper\ndef main(sample_id, assembly, reference, result):\n    \"\"\"Main executor of the dengue_typing template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    assembly : str\n        Assembly file.\n    fastq_pair: list\n        FastQ files\n    result: str\n        String stating is the reference genome is to be recovered\"\"\"\n\n    json_report = {}\n\n    st_version = __get_version_seq_typing()\n\n    cli = [\"seq_typing.py\",\n           \"assembly\",\n           \"-b\", os.path.join(os.getcwd(), reference),\n           \"-j\", \"${task.cpus}\",\n           \"-f\", assembly,\n           \"-t\", \"nucl\"]\n\n    logger.info(\"Runnig seq_typing subprocess with command: {}\".format(cli))\n\n    p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n    stdout, stderr = p.communicate()\n\n    try:\n        stderr = stderr.decode(\"utf8\")\n        stdout = stdout.decode(\"utf8\")\n    except (UnicodeDecodeError, AttributeError):\n        stderr = str(stderr)\n        stdout = str(stdout)\n\n    logger.info(\"Finished seq_typing index subprocess with STDOUT:\\\\n\"\n                \"======================================\\\\n{}\".format(\n        stdout))\n    logger.info(\"Fished seq_typing index subprocesswith STDERR:\\\\n\"\n                \"======================================\\\\n{}\".format(\n        stderr))\n    logger.info(\"Finished seq_typing index with return code: {}\".format(\n        p.returncode))\n\n    if p.returncode == 0:\n\n        typing_result = getType(\"seq_typing.report.txt\")\n\n        logger.info(\"Type found: {}\".format(typing_result))\n\n        # write appropriate QC dot files based on blast statistics\n        identity = 0\n        coverage = 0\n\n        if typing_result != \"NT\":\n            # write appropriate QC dot files based on blast statistics\n            identity, coverage = getScore(\"seq_typing.report_types.tab\")\n\n            best_reference = get_reference_header(\"seq_typing.report_types.tab\")\n\n            reference_name = getSequence(best_reference, os.path.join(os.getcwd(), reference))\n\n        else:\n            logger.info(\"No typing information was 
obtained.\")\n\n        if result == \"true\":\n\n            json_report = {'tableRow': [{\n                'sample': sample_id,\n                'data': [\n                    {'header': 'seqtyping',\n                     'value': typing_result,\n                     'table': 'typing'},\n                    {'header': 'Identity',\n                     'value': round(identity,2),\n                     'table': 'typing'},\n                    {'header': 'Coverage',\n                     'value': round(coverage,2),\n                     'table': 'typing'},\n                    {'header': 'Reference',\n                     'value': reference_name.replace(\"gb_\", \"gb:\").split(\"_\")[0],\n                     'table': 'typing'}\n                ]}],\n                'metadata': [\n                    {'sample': sample_id,\n                     'treeData': typing_result,\n                     'column': 'typing'},\n                    {'sample': reference_name,\n                     'treeData': typing_result,\n                     'column': 'typing'}]}\n\n        else:\n            json_report = {'tableRow': [{\n                'sample': sample_id,\n                'data': [\n                    {'header': 'seqtyping',\n                     'value': typing_result,\n                     'table': 'typing'},\n                    {'header': 'Identity',\n                     'value': round(identity),\n                     'table': 'typing'},\n                    {'header': 'Coverage',\n                     'value': round(coverage),\n                     'table': 'typing'},\n                    {'header': 'Reference',\n                     'value': reference_name.replace(\"gb_\", \"gb:\").split(\"_\")[1],\n                     'table': 'typing'}\n                ]}],\n                'metadata': [\n                    {'sample': sample_id,\n                     'treeData': typing_result,\n                     'column': 'typing'}]}\n\n    else:\n        logger.error(\"Failed to run seq_typing for Dengue Virus.\")\n        with open(\".status\", \"w\") as status:\n            status.write(\"fail\")\n        sys.exit(1)\n\n    # Add information to dotfiles\n    with open(\".report.json\", \"w\") as report, \\\n            open(\".status\", \"w\") as status, \\\n            open(\".version\", \"w\") as version:\n        report.write(json.dumps(json_report, separators=(\",\", \":\")))\n        status.write(\"pass\")\n        version.write(st_version)\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, ASSEMBLY, REFERENCE, RESULT)\n"
  },
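`getScore` reads the best hit from seq_typing's `report_types.tab` and writes the `.fails`/`.warnings` dot-files from the "sequence covered" and identity columns. A compact sketch of that threshold logic, with the intermediate 70-90% band written as an explicit chained comparison (the column positions follow the template above; the example line is fabricated for illustration):

```python
# Sketch of the QC thresholds applied to the seq_typing best hit
# (assumed report layout, as in the template above: field 4 = % sequence
# covered, field 6 = identity).
def coverage_qc(report_line):
    fields = report_line.rstrip("\n").split("\t")
    covered = float(fields[4])
    identity = float(fields[6])
    if covered < 70:
        status = "fail"      # best hit covers less than 70% of the reference
    elif 70 <= covered < 90:
        status = "warning"   # intermediate band: usable but flagged
    else:
        status = "pass"
    return identity, covered, status

print(coverage_qc("type\tx\ty\tref\t85.0\tz\t97.3"))  # -> (97.3, 85.0, 'warning')
```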
  {
    "path": "flowcraft/templates/dengue_typing_reads.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module intends to type DENV genome assembly with seqTyping\n(mapping mode)\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``fasta`` : A fasta file path.\n    - e.g.: ``'SampleA.fasta'``\n- ``fastq_pair`` : Pair of FastQ file paths.\n    - e.g.: ``'SampleA_1.fastq.gz SampleA_2.fastq.gz'``\n\nGenerated output\n----------------\n\n-  The sample fasta file path or, if a complete ORF isn't obtained, a consesus sequence\n-  The closest reference fasta file path\n\"\"\"\n\n__version__ = \"0.0.2\"\n__build__ = \"01022019\"\n__template__ = \"dengue_typing-nf\"\n\nimport glob\nimport json\nimport os\nimport sys\nimport subprocess\nfrom subprocess import PIPE\nfrom itertools import groupby\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    ASSEMBLY = '$assembly'\n    FASTQ_PAIR = '$fastq_pair'.split()\n    REFERENCE = '$reference'\n    RESULT = '$get_reference'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"ASSEMBLY: {}\".format(ASSEMBLY))\n    logger.debug(\"FASTQ_PAIR: {}\".format(FASTQ_PAIR))\n    logger.debug(\"REFERENCE: {}\".format(REFERENCE))\n    logger.debug(\"RESULT: {}\".format(RESULT))\n\n\ndef __get_version_seq_typing():\n    \"\"\"\n    Gets Seq_typing software version\n    Returns\n    -------\n    version : str\n        Seqtyping version\"\"\"\n\n    try:\n        cli = [\"seq_typing.py\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout = p.communicate()[0]\n\n        version = stdout.splitlines()[0].split()[-1].decode(\"utf8\")\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return version\n\n\ndef replace_char(text):\n    \"\"\"\n    Cleans the string from problematic chars\n\n    Parameters\n    ----------\n    text : str\n        String to clean\"\"\"\n\n    for ch in ['/', '`', '*', '{', '}', '[', ']', '(', ')', '#', '+', '-', '.', '!', '\\$', ':', '|']:\n        text = text.replace(ch, \"_\")\n    return text\n\n\ndef getSequence(ref, fasta):\n    \"\"\"\n     Gets the fasta sequence from the Database with the header \"ref\"\n\n     Parameters\n     ----------\n     ref : str\n         Reference whose sequence needs to be fetched\n     fasta: str\n        Path to the multifasta\"\"\"\n\n    fasta_header = \"\"\n\n    fh_fasta = open(fasta, \"r\")\n    entry = (x[1] for x in groupby(fh_fasta, lambda line: line[0] == \">\"))\n\n    for header in entry:\n        headerStr = header.__next__()[1:].strip()\n\n        seq = \"\".join(s.strip() for s in entry.__next__())\n\n        if ref == headerStr.replace('>',''):\n            filename = os.path.join(os.getcwd(), ref.replace('/','_').split('|')[0])\n            fasta_header = replace_char(headerStr)\n\n            with open(filename + '.fa', \"w\") as output_file:\n                output_file.write(\">\" + fasta_header + \"\\\\n\" + seq.upper() + \"\\\\n\")\n    fh_fasta.close()\n\n    return fasta_header\n\n\ndef get_reference_header(file):\n    \"\"\"\n    Gets the header for the closest reference from the seqtyping report\n\n    Parameters\n    ----------\n    
file: str\n     Path to the seqtyping report\"\"\"\n\n    with open(file, \"r\") as typing_report:\n        lines = typing_report.readlines()\n    return lines[1].split('\\\\t')[3]\n\n\ndef getType(file):\n    \"\"\"\n    Gets the typing result from the seqtyping report\n\n    Parameters\n    ----------\n    file: str\n     Path to the seqtyping report\"\"\"\n\n    with open(file, \"r\") as result:\n        return result.readline().strip()\n\n\ndef getConsesusSequence(best_reference, consensus, sample_id):\n    \"\"\"\n    Gets the consensus sequence for the sample based\n    on the closest reference\n\n    Parameters\n    ----------\n    best_reference: str\n        Closest reference whose consensus is to be retrieved\n    consensus: str\n        Path to the consensus file produced by rematch\n    sample_id: str\n        sample id\"\"\"\n\n    gb_ID = best_reference.split('|')[0].replace(\":\", \"_\")\n    fh_consensus = open(consensus, \"r\")\n\n    entry = (x[1] for x in groupby(fh_consensus, lambda line: line[0] == \">\"))\n\n    for header in entry:\n\n        headerStr = header.__next__()[1:].strip()\n        seq = \"\".join(s.strip() for s in entry.__next__())\n\n        if gb_ID in headerStr:\n            with open(sample_id + '_consensus.fasta', \"w\") as output_file:\n                output_file.write(\">\" + sample_id + \"_consensus_\" +\n                                  replace_char(best_reference.split(\"_\")[0]) + \"\\\\n\" + seq.upper() + \"\\\\n\")\n\n    fh_consensus.close()\n\n\ndef getScore(file):\n    \"\"\"\n    Method to write QC warnings based on the mapping statistics\n    (sequence covered and identity)\n\n    Parameters\n    ----------\n    file: str\n     Path to the seqtyping report\"\"\"\n\n    identity = 0\n    coverage = 0\n\n    with open(file, \"r\") as typing_report:\n        lines = typing_report.readlines()\n\n        sequence_covered = float(lines[1].split(\"\\\\t\")[4])\n        sequence_identity = float(lines[1].split(\"\\\\t\")[6])\n\n        if sequence_covered < 70:\n            logger.fail(\"Sequence coverage below 70% on the best hit.\")\n            with open(\".fails\", \"w\") as fails:\n                fails.write(\"Sequence coverage below 70% on the best hit.\")\n\n        elif 90 > sequence_covered < 70:\n            logger.warning(\"Sequence coverage lower than 90% on the best hit.\")\n            with open(\".warnings\", \"w\") as fails:\n                fails.write(\"Sequence coverage below 70% on the best hit.\")\n\n        return sequence_identity, sequence_covered\n\n\n@MainWrapper\ndef main(sample_id, assembly, fastq_pair, reference, result):\n    \"\"\"Main executor of the dengue_typing template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    assembly : str\n        Assembly file.\n    fastq_pair: list\n        FastQ files\n    reference: str\n        Reference multi-fasta to be mapped against\n    result: str\n        String stating is the reference genome is to be recovered\"\"\"\n\n    json_report = {}\n\n    st_version = __get_version_seq_typing()\n\n    cli = [\"seq_typing.py\",\n           \"reads\",\n           \"-r\", reference,\n           \"-j\", \"${task.cpus}\",\n           \"--debug\",\n           '--bowtieAlgo=\"--very-fast\"',\n           \"--doNotRemoveConsensus\",\n           \"-f\", fastq_pair[0], fastq_pair[1]]\n\n    logger.info(\"Runnig seq_typing subprocess with command: {}\".format(cli))\n\n    p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n    stdout, 
stderr = p.communicate()\n\n    try:\n        stderr = stderr.decode(\"utf8\")\n        stdout = stdout.decode(\"utf8\")\n    except (UnicodeDecodeError, AttributeError):\n        stderr = str(stderr)\n        stdout = str(stdout)\n\n    logger.info(\"Finished seq_typing index subprocess with STDOUT:\\\\n\"\n                \"======================================\\\\n{}\".format(\n        stdout))\n    logger.info(\"Fished seq_typing index subprocesswith STDERR:\\\\n\"\n                \"======================================\\\\n{}\".format(\n        stderr))\n    logger.info(\"Finished seq_typing index with return code: {}\".format(\n        p.returncode))\n\n    if p.returncode == 0:\n\n        typing_result = getType(\"seq_typing.report.txt\")\n\n        logger.info(\"Type found: {}\".format(typing_result))\n\n        best_reference = get_reference_header(\"seq_typing.report_types.tab\")\n\n        if typing_result != \"NT\":\n            logger.info(\"Getting consensus sequenceq\")\n            getConsesusSequence(best_reference,\n                                glob.glob(\"rematch/*/sample.noMatter.fasta\")[0],\n                                sample_id)\n\n            # check confidence and emmit appropriate warnings\n            identity, coverage = getScore(\"seq_typing.report_types.tab\")\n\n            reference_name = getSequence(best_reference, os.path.join(os.getcwd(), reference))\n\n        else:\n            logger.error(\"Failed to obtain a close reference sequence in read mode. No consensus sequence is obtained.\")\n            with open(\".status\", \"w\") as status:\n                status.write(\"fail\")\n            sys.exit(120)\n\n        if result == \"true\":\n\n            json_report = {'tableRow': [{\n                'sample': sample_id,\n                'data': [\n                    {'header': 'seqtyping',\n                     'value': typing_result,\n                     'table': 'typing'},\n                    {'header': 'Identity',\n                     'value': round(identity, 2),\n                     'table': 'typing'},\n                    {'header': 'Coverage',\n                     'value': round(coverage, 2),\n                     'table': 'typing'},\n                    {'header': 'Reference',\n                     'value': reference_name.replace(\"gb_\", \"gb:\").split(\"_\")[0],\n                     'table': 'typing'}\n                ]}],\n                'metadata': [\n                    {'sample': sample_id,\n                     'treeData': typing_result,\n                     'column': 'typing'},\n                    {'sample': reference_name,\n                     'treeData': typing_result,\n                     'column': 'typing'}]}\n\n        else:\n\n            json_report = {'tableRow': [{\n                'sample': sample_id,\n                'data': [\n                    {'header': 'seqtyping',\n                     'value': typing_result,\n                     'table': 'typing'},\n                    {'header': 'Identity',\n                     'value': round(identity, 2),\n                     'table': 'typing'},\n                    {'header': 'Coverage',\n                     'value': round(coverage, 2),\n                     'table': 'typing'},\n                    {'header': 'Reference',\n                     'value': reference_name.replace(\"gb_\", \"gb:\").split(\"_\")[1],\n                     'table': 'typing'}\n                ]}],\n                'metadata': [\n                    {'sample': sample_id,\n             
        'treeData': typing_result,\n                     'column': 'typing'}]}\n\n    else:\n        logger.error(\"Failed to run seq_typing for Dengue Virus.\")\n        with open(\".status\", \"w\") as status:\n            status.write(\"fail\")\n        sys.exit(1)\n\n    # Add information to dotfiles\n    with open(\".report.json\", \"w\") as report, \\\n            open(\".status\", \"w\") as status, \\\n            open(\".version\", \"w\") as version:\n        report.write(json.dumps(json_report, separators=(\",\", \":\")))\n        status.write(\"pass\")\n        version.write(st_version)\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, ASSEMBLY, FASTQ_PAIR, REFERENCE, RESULT)\n"
  },
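Both dengue typing templates extract single records from a multi-FASTA with the `itertools.groupby` idiom used in `getSequence` and `getConsesusSequence`: lines are grouped into alternating header and sequence blocks keyed on whether they start with `>`. A self-contained sketch of that parsing pattern:

```python
# Compact sketch of the itertools.groupby FASTA-parsing idiom used by the
# dengue typing templates: alternating header/sequence groups keyed on ">".
from itertools import groupby
import io

def iter_fasta(handle):
    groups = (block for _, block in groupby(handle, lambda l: l.startswith(">")))
    for header_block in groups:
        header = next(header_block)[1:].strip()
        sequence = "".join(line.strip() for line in next(groups))
        yield header, sequence

# Example with an in-memory record (header is illustrative only).
fasta = io.StringIO(">gb:AB123456|serotype_1\nACGT\nACGT\n")
print(list(iter_fasta(fasta)))  # -> [('gb:AB123456|serotype_1', 'ACGTACGT')]
```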
  {
    "path": "flowcraft/templates/downsample_fastq.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to sub-sample FastQ files to a certain coverage, based\non the expected genome size.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``fastq_pair`` : Pair of FastQ file paths.\n    - e.g.: ``'SampleA_1.fastq.gz SampleA_2.fastq.gz'``\n- ``gsize`` : *Expected genome size*\n    - e.g.: ``'2.5'``\n- ``depth`` : Maximum depth threshold above which the subsampling will be\n    performed.\n    - e.g.: ``100``\n- ``clear`` : If 'true', remove the input fastq files at the end of the\n    component run, IF THE FILES ARE IN THE WORK DIRECTORY\n\nGenerated output\n----------------\n\n- ``*_ss.fq.gz`` : Subsample fastq reads\n    - e.g.: ``sampleA_ss.fq.gz``\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.0\"\n__build__ = \"30072018\"\n__template__ = \"sample_fastq-nf\"\n\nimport os\nimport re\nimport json\nimport subprocess\n\nfrom os.path import basename\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    FASTQ_PAIR = '$fastq_pair'.split()\n    GSIZE = float('$gsize'.strip())\n    DEPTH = float('$depth'.strip())\n    CLEAR = '$clear'\n    SEED = '$seed'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"FASTQ_PAIR: {}\".format(FASTQ_PAIR))\n    logger.debug(\"GENOME_SIZE: {}\".format(GSIZE))\n    logger.debug(\"DEPTH: {}\".format(DEPTH))\n    logger.debug(\"CLEAR: {}\".format(CLEAR))\n    logger.debug(\"SEED: {}\".format(SEED))\n\n\ndef __get_version_spades():\n\n    try:\n\n        cli = [\"seqtk\"]\n        p = subprocess.Popen(cli, stdout=subprocess.PIPE,\n                             stderr=subprocess.PIPE)\n        _, stderr = p.communicate()\n\n        _version = stderr.splitlines()[2]\n        try:\n            version = re.match(\n                \"Version: (.*)\", _version.decode(\"utf8\")).group(1)\n        except AttributeError:\n            version = \"undefined\"\n\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return {\n        \"program\": \"seqtk\",\n        \"version\": version,\n    }\n\n\n@MainWrapper\ndef main(sample_id, fastq_pair, genome_size, depth, clear, seed):\n\n    genome_size = genome_size\n    target_depth = depth\n    p1 = fastq_pair[0]\n    p2 = fastq_pair[1]\n    bn1 = \".\".join(basename(p1).split('.')[:-2])\n    bn2 = \".\".join(basename(p2).split('.')[:-2])\n\n    R1_fqchk = subprocess.Popen(['seqtk', 'fqchk', p1], stdout=subprocess.PIPE)\n    R1_stdout, R1_stderr = R1_fqchk.communicate()\n    B_P1 = int(R1_stdout.splitlines()[2].split()[1])\n    logger.debug(\"Bases p1: {}\".format(B_P1))\n\n    R2_fqchk = subprocess.Popen(['seqtk', 'fqchk', p2], stdout=subprocess.PIPE)\n    R2_stdout, R2_stderr = R2_fqchk.communicate()\n    B_P2= int(R2_stdout.splitlines()[2].split()[1])\n    logger.debug(\"Bases p2: {}\".format(B_P2))\n\n    estimated_coverage = (B_P1 + B_P2) / (genome_size * 1E6)\n    logger.debug(\"Estimated coverage: {}\".format(estimated_coverage))\n    ratio = target_depth/estimated_coverage\n    logger.debug(\"Estimated ration: {}\".format(ratio))\n\n    # if seed param is specified 
then use it, otherwise use the default -s100\n    if seed:\n        # through flowcraft everything should pass through here\n        parsed_seed = \"-s{}\".format(str(seed))\n        logger.info(\"Using seed parameter: {}.\".format(parsed_seed))\n    else:\n        logger.debug(\"Seed parameter not specified. Using default value -s100.\")\n        parsed_seed = \"-s100\"\n\n    if ratio < 1:\n        # print (\"Writing R1.fq.gz\")\n        ps = subprocess.Popen(('seqtk', 'sample', parsed_seed, p1, str(ratio)),\n                              stdout=subprocess.PIPE)\n        with open('{}_ss.fq.gz'.format(bn1), 'w') as outfile:\n            subprocess.Popen(('gzip', '--fast', '-c'),\n                             stdin=ps.stdout, stdout=outfile )\n        ps.wait()\n\n        # print (\"Writing R2.fq.gz\")\n        ps = subprocess.Popen(('seqtk', 'sample', parsed_seed, p2, str(ratio)),\n                              stdout=subprocess.PIPE)\n        with open('{}_ss.fq.gz'.format(bn2), 'w') as outfile:\n            subprocess.Popen(('gzip', '--fast', '-c'),\n                             stdin=ps.stdout, stdout=outfile)\n        ps.wait()\n\n        if clear == \"true\":\n            # Get real path of the symlink\n            for fq in [p1, p2]:\n                rp = os.path.realpath(fq)\n                print(\"removing temporary fastq file path: {}\".format(rp))\n                # remove only when the file is in the work directory\n                if re.match(\".*/work/.{2}/.{30}/.*\", rp):\n                    os.remove(rp)\n\n    else:\n        os.symlink(p1, \"{}._ss.fq.gz\".format(bn1))\n        os.symlink(p2, \"{}._ss.fq.gz\".format(bn2))\n\n    # Record the original estimated coverage\n    with open(\".report.json\", \"w\") as fh:\n        json_dic = {\n            \"tableRow\": [\n                {\n                    \"sample\": sample_id,\n                    \"data\": [{\n                        \"header\": \"Coverage\",\n                        \"value\": round(estimated_coverage, 1),\n                        \"table\": \"qc\"\n                    }]\n                 }\n            ]\n        }\n        fh.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n\nif __name__ == \"__main__\":\n        main(SAMPLE_ID, FASTQ_PAIR, GSIZE, DEPTH, CLEAR, SEED)\n"
  },
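The subsampling decision in `downsample_fastq.py` follows directly from the base counts reported by `seqtk fqchk`: estimated coverage is total bases divided by the expected genome size, and the ratio passed to `seqtk sample` is the target depth divided by that coverage. A worked example of the arithmetic (the numbers are illustrative):

```python
# Worked example of the subsampling arithmetic used above: with ~600 Mbp
# sequenced over an expected 2.5 Mbp genome, the estimated coverage is 240x,
# so reaching a 100x target keeps roughly 42% of the reads (the value passed
# to `seqtk sample`).
genome_size_mb = 2.5          # expected genome size in Mbp ('gsize' parameter)
target_depth = 100            # maximum depth threshold ('depth' parameter)
bases_pair1 = 300_000_000     # bases counted by 'seqtk fqchk' on read 1
bases_pair2 = 300_000_000     # bases counted by 'seqtk fqchk' on read 2

estimated_coverage = (bases_pair1 + bases_pair2) / (genome_size_mb * 1e6)
ratio = target_depth / estimated_coverage

print(estimated_coverage)     # 240.0
print(round(ratio, 3))        # 0.417 -> subsample; a ratio >= 1 means no subsampling
```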
  {
    "path": "flowcraft/templates/fasta_spliter.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to split all fastas in a multifasta file into different\nfasta files.\n\nCode documentation\n------------------\n\n\"\"\"\n\nimport os\nimport sys\n\n\ndef main():\n\n    cwd = os.getcwd()\n    # a var to check if out_handle is started and if so it enables to control\n    # how it should be closed\n    out_handle = False\n    # opens the input file of the process\n    input_file = open(sys.argv[1])\n    # a file with the list of all paths to fasta files that will be used by\n    # fastANI\n    list_files = open(\"files_fastani.txt\", \"w\")\n    # iterates by each entry in the fasta file\n    for line in input_file:\n        if line.startswith(\">\"):\n            if out_handle:\n                out_handle.close()\n            # writes the output to fasta store folder inside cwd, respective\n            # workdir\n            path_to_file = os.path.join(cwd, \"fasta_store\",\n                                        \"_\".join(line.split(\"_\")[0:3])\n                                        .replace(\">\", \"\") + \".fas\")\n            # writes to list of files\n            list_files.write(path_to_file + \"\\n\")\n            out_handle = open(path_to_file, \"w\")\n            out_handle.write(line)\n        else:\n            out_handle.write(line)\n\n    out_handle.close()\n    input_file.close()\n    list_files.close()\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
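The output file name in `fasta_spliter.py` is derived from the first three underscore-separated fields of each FASTA header. A tiny sketch of that mapping (the example header is an assumption about the input naming):

```python
# Tiny illustration of the filename rule in fasta_spliter.py: the first three
# underscore-separated fields of the header become the output file name.
# The example header is only illustrative.
import os

def split_target(header_line, base_dir="fasta_store"):
    stem = "_".join(header_line.split("_")[0:3]).replace(">", "")
    return os.path.join(base_dir, stem + ".fas")

print(split_target(">NODE_1_length_5000_cov_12.5"))
# -> fasta_store/NODE_1_length.fas
```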
  {
    "path": "flowcraft/templates/fastqc.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to run FastQC on paired-end FastQ files.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``fastq_pair`` : *Pair of FastQ file paths*\n    - e.g.: ``'SampleA_1.fastq.gz SampleA_2.fastq.gz'``\n\nGenerated output\n----------------\n\nThe generated output are output files that contain an object, usually a string.\n\n- ``pair_{1,2}_data`` : File containing FastQC report at the nucleotide level\\\n    for each pair\n    - e.g.: ``'pair_1_data'`` and ``'pair_2_data'``\n- ``pair_{1,2}_summary``: File containing FastQC report for each category and\\\n    for each pair\n    - e.g.: ``'pair_1_summary'`` and ``'pair_2_summary'``\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.1\"\n__build__ = \"28032018\"\n__template__ = \"fastqc-nf\"\n\nimport os\nimport subprocess\n\nfrom subprocess import PIPE\nfrom os.path import exists, join\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\ndef __get_version_fastqc():\n\n    try:\n\n        cli = [\"fastqc\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout, _ = p.communicate()\n\n        version = stdout.strip().split()[1][1:].decode(\"utf8\")\n\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return {\n        \"program\": \"FastQC\",\n        \"version\": version,\n    }\n\n\nif __file__.endswith(\".command.sh\"):\n    FASTQ_PAIR = '$fastq_pair'.split()\n    ADAPTER_FILE = '$ad'\n    CPUS = '$task.cpus'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"FASTQ_PAIR: {}\".format(FASTQ_PAIR))\n    logger.debug(\"ADAPTER_FILE: {}\".format(ADAPTER_FILE))\n    logger.debug(\"CPUS: {}\".format(CPUS))\n\n\ndef convert_adatpers(adapter_fasta):\n    \"\"\"Generates an adapter file for FastQC from a fasta file.\n\n    The provided adapters file is assumed to be a simple fasta file with the\n    adapter's name as header and the corresponding sequence::\n\n        >TruSeq_Universal_Adapter\n        AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT\n        >TruSeq_Adapter_Index 1\n        GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG\n\n    Parameters\n    ----------\n    adapter_fasta : str\n        Path to Fasta file with adapter sequences.\n\n    Returns\n    -------\n    adapter_out : str or None\n        The path to the reformatted adapter file. 
Returns ``None`` if the\n        adapters file does not exist or the path is incorrect.\n    \"\"\"\n\n    adapter_out = \"fastqc_adapters.tab\"\n    logger.debug(\"Setting output adapters file to: {}\".format(adapter_out))\n\n    try:\n\n        with open(adapter_fasta) as fh, \\\n                open(adapter_out, \"w\") as adap_fh:\n\n            for line in fh:\n                if line.startswith(\">\"):\n\n                    head = line[1:].strip()\n                    # Get the next line with the sequence string\n                    sequence = next(fh).strip()\n\n                    adap_fh.write(\"{}\\\\t{}\\\\n\".format(head, sequence))\n\n        logger.info(\"Converted adapters file\")\n\n        return adapter_out\n\n    # If an invalid adapters file is provided, return None.\n    except FileNotFoundError:\n        logger.warning(\"Could not find the provided adapters file: {}\".format(\n            adapter_fasta))\n        return\n\n\n@MainWrapper\ndef main(fastq_pair, adapter_file, cpus):\n    \"\"\" Main executor of the fastqc template.\n\n    Parameters\n    ----------\n    fastq_pair : list\n        Two element list containing the paired FastQ files.\n    adapter_file : str\n        Path to adapters file.\n    cpus : int or str\n        Number of CPUs that will be used by FastQC.\n\n    \"\"\"\n\n    logger.info(\"Starting fastqc\")\n\n    # If an adapter file was provided, convert it to FastQC format\n    if os.path.exists(adapter_file):\n        logger.info(\"Adapters file provided: {}\".format(adapter_file))\n        adapters = convert_adatpers(adapter_file)\n    else:\n        logger.info(\"Adapters file '{}' not provided or does not \"\n                    \"exist\".format(adapter_file))\n        adapters = None\n\n    # Setting command line for FastQC\n    cli = [\n        \"fastqc\",\n        \"--extract\",\n        \"--nogroup\",\n        \"--format\",\n        \"fastq\",\n        \"--threads\",\n        str(cpus)\n    ]\n\n    # Add adapters file to command line, if it exists\n    if adapters:\n        cli += [\"--adapters\", \"{}\".format(adapters)]\n\n    # Add FastQ files at the end of command line\n    cli += fastq_pair\n\n    logger.debug(\"Running fastqc subprocess with command: {}\".format(cli))\n\n    p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE, shell=False)\n    stdout, stderr = p.communicate()\n\n    # Attempt to decode STDERR output from bytes. 
If unsuccessful, coerce to\n    # string\n    try:\n        stderr = stderr.decode(\"utf8\")\n    except (UnicodeDecodeError, AttributeError):\n        stderr = str(stderr)\n\n    logger.info(\"Finished fastqc subprocess with STDOUT:\\\\n\"\n                \"======================================\\\\n{}\".format(stdout))\n    logger.info(\"Finished fastqc subprocess with STDERR:\\\\n\"\n                \"======================================\\\\n{}\".format(stderr))\n    logger.info(\"Finished fastqc with return code: {}\".format(\n        p.returncode))\n\n    logger.info(\"Checking if FastQC output was correctly generated\")\n    # Check if the FastQC output was correctly generated.\n    with open(\".status\", \"w\") as status_fh:\n        for fastq in fastq_pair:\n            fpath = join(fastq.rsplit(\".\", 2)[0] + \"_fastqc\",\n                         \"fastqc_data.txt\")\n            logger.debug(\"Checking path: {}\".format(fpath))\n            # If the FastQC output does not exist, pass the STDERR to\n            # the output status channel and exit\n            if not exists(fpath):\n                logger.warning(\"Path does not exist: {}\".format(fpath))\n                status_fh.write(\"fail\")\n                return\n\n            logger.debug(\"Found path: {}\".format(fpath))\n\n        # If the output directories exist, write 'pass' to the output status\n        # channel\n            status_fh.write(\"pass\")\n\n    logger.info(\"Retrieving relevant FastQC output files\")\n\n    # Both FastQC runs have been correctly executed. Get the relevant FastQC\n    # output files for the output channel\n    for i, fastq in enumerate(fastq_pair):\n        # Get results for each pair\n        fastqc_dir = fastq.rsplit(\".\", 2)[0] + \"_fastqc\"\n\n        summary_file = join(fastqc_dir, \"summary.txt\")\n        logger.debug(\"Retrieving summary file: {}\".format(summary_file))\n        fastqc_data_file = join(fastqc_dir, \"fastqc_data.txt\")\n        logger.debug(\"Retrieving data file: {}\".format(fastqc_data_file))\n\n        # Rename output files to a file name that is easier to handle in the\n        # output channel\n        os.rename(fastqc_data_file, \"pair_{}_data\".format(i + 1))\n        os.rename(summary_file, \"pair_{}_summary\".format(i + 1))\n\n\nif __name__ == \"__main__\":\n\n    main(FASTQ_PAIR, ADAPTER_FILE, CPUS)\n"
  },
  {
    "path": "flowcraft/templates/fastqc_report.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended parse the results of FastQC for paired end FastQ \\\nsamples. It parses two reports:\n\n    - Categorical report\n    - Nucleotide level report.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample identification string\n    - e.g.: ``'SampleA'``\n\n- ``result_p1`` : Path to both FastQC result files for pair 1\n    - e.g.: ``'SampleA_1_data SampleA_1_summary'``\n\n- ``result_p2`` : Path to both FastQC result files for pair 2\n    - e.g.: ``'SampleA_2_data SampleA_2_summary'``\n\n- ``opts`` : *Specify additional arguments for executing fastqc_report. \\\n    The arguments should be a string of command line arguments,\\\n    The accepted arguments are:*\n    - ``'--ignore-tests'`` : Ignores test results from FastQC categorical\\\n    summary. This is used in the first run of FastQC.\n\nGenerated output\n----------------\n\nThe generated output are output files that contain an object, usually a string.\n\n- ``fastqc_health`` : Stores the health check for the current sample. If it\n    passes all checks, it contains only the string 'pass'. Otherwise, contains\n    the summary categories and their respective results\n    - e.g.: ``'pass'``\n- ``optimal_trim`` : Stores a tuple with the optimal trimming positions for 5'\n    and 3' ends of the reads.\n    - e.g.: ``'15 151'``\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.2\"\n__build__ = \"12052018\"\n__template__ = \"fastqc_report-nf\"\n\nimport os\nimport json\n\nfrom collections import OrderedDict\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    RESULT_P1 = '$result_p1'.split()\n    RESULT_P2 = '$result_p2'.split()\n    SAMPLE_ID = '$sample_id'\n    OPTS = '$opts'.split()\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"RESULT_P1: {}\".format(RESULT_P1))\n    logger.debug(\"RESULT_P2: {}\".format(RESULT_P2))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"OPTS: {}\".format(OPTS))\n\n\ndef _get_quality_stats(d, start_str, field_start=1, field_end=2):\n    \"\"\"\n\n    Parameters\n    ----------\n    d\n\n    Returns\n    -------\n\n    \"\"\"\n\n    min_parsed = False\n    parse = False\n    report = []\n    start_str = start_str\n    end_str = \">>END_MODULE\"\n\n    with open(d) as fh:\n\n        for line in fh:\n\n            if line.startswith(start_str):\n                next(fh)\n                parse = True\n                status = line.strip().split()[-1]\n\n            # Exit parser when end string is found\n            elif parse and line.startswith(end_str):\n                return report, status\n\n            elif parse:\n\n                fields = line.strip().split()\n\n                # This is triggered when the first value of a line series is\n                # not 1. 
If the starting point of the series is a number\n                # different from 1, fill the report with 0 until that point\n                if not min_parsed:\n                    if fields[0] != \"1\":\n                        try:\n                            blank_points = int(fields[0]) - 1\n                            report.extend([0] * blank_points)\n                        except ValueError:\n                            pass\n                    min_parsed = True\n\n                report.append(\";\".join([\n                    str(round(float(x), 2)) for x in\n                    fields[field_start: field_end]\n                ]))\n\n\ndef write_json_report(sample_id, data1, data2):\n    \"\"\"Writes the report\n\n    Parameters\n    ----------\n    data1\n    data2\n\n    Returns\n    -------\n\n    \"\"\"\n\n    parser_map = {\n        \"base_sequence_quality\": \">>Per base sequence quality\",\n        \"sequence_quality\": \">>Per sequence quality scores\",\n        \"base_gc_content\": \">>Per sequence GC content\",\n        \"base_n_content\": \">>Per base N content\",\n        \"sequence_length_dist\": \">>Sequence Length Distribution\",\n        \"per_base_sequence_content\": \">>Per base sequence content\"\n    }\n\n    json_dic = {\n        \"plotData\": [{\n            \"sample\": sample_id,\n            \"data\": {\n                \"base_sequence_quality\": {\"status\": None, \"data\": []},\n                \"sequence_quality\": {\"status\": None, \"data\": []},\n                \"base_gc_content\": {\"status\": None, \"data\": []},\n                \"base_n_content\": {\"status\": None, \"data\": []},\n                \"sequence_length_dist\": {\"status\": None, \"data\": []},\n                \"per_base_sequence_content\": {\"status\": None, \"data\": []}\n            }\n        }]\n    }\n\n    for cat, start_str in parser_map.items():\n\n        if cat == \"per_base_sequence_content\":\n            fs = 1\n            fe = 5\n        else:\n            fs = 1\n            fe = 2\n\n        report1, status1 = _get_quality_stats(data1, start_str,\n                                              field_start=fs, field_end=fe)\n        report2, status2 = _get_quality_stats(data2, start_str,\n                                              field_start=fs, field_end=fe)\n\n        status = None\n        for i in [\"fail\", \"warn\", \"pass\"]:\n            if i in [status1, status2]:\n                status = i\n\n        json_dic[\"plotData\"][0][\"data\"][cat][\"data\"] = [report1, report2]\n        json_dic[\"plotData\"][0][\"data\"][cat][\"status\"] = status\n\n    return json_dic\n\n\ndef get_trim_index(biased_list):\n    \"\"\"Returns the trim index from a ``bool`` list\n\n    Provided with a list of ``bool`` elements (``[False, False, True, True]``),\n    this function will assess the index of the list that minimizes the number\n    of True elements (biased positions) at the extremities. To do so,\n    it will iterate over the boolean list and find an index position where\n    there are two consecutive ``False`` elements after a ``True`` element. This\n    will be considered as an optimal trim position. 
For example, in the\n    following list::\n\n        [True, True, False, True, True, False, False, False, False, ...]\n\n    The optimal trim index will be the 4th position, since it is the first\n    occurrence of a ``True`` element with two False elements after it.\n\n    If the provided ``bool`` list has no ``True`` elements, then the 0 index is\n    returned.\n\n    Parameters\n    ----------\n    biased_list: list\n        List of ``bool`` elements, where ``True`` means a biased site.\n\n    Returns\n    -------\n        x : index position of the biased list for the optimal trim.\n\n    \"\"\"\n\n    # Return index 0 if there are no biased positions\n    if set(biased_list) == {False}:\n        return 0\n\n    if set(biased_list[:5]) == {False}:\n        return 0\n\n    # Iterate over the biased_list array. Keep the iteration going until\n    # we find a biased position with the two following positions unbiased\n    # (e.g.: True, False, False).\n    # When this condition is verified, return the last biased position\n    # index for subsequent trimming.\n    for i, val in enumerate(biased_list):\n        if val and set(biased_list[i+1:i+3]) == {False}:\n            return i + 1\n\n    # If the previous iteration could not find and index to trim, it means\n    # that the whole list is basically biased. Return the length of the\n    # biased_list\n    return len(biased_list)\n\n\ndef trim_range(data_file):\n    \"\"\"Assess the optimal trim range for a given FastQC data file.\n\n    This function will parse a single FastQC data file, namely the\n    *'Per base sequence content'* category. It will retrieve the A/T and G/C\n    content for each nucleotide position in the reads, and check whether the\n    G/C and A/T proportions are between 80% and 120%. If they are, that\n    nucleotide position is marked as biased for future removal.\n\n    Parameters\n    ----------\n    data_file: str\n        Path to FastQC data file.\n\n    Returns\n    -------\n    trim_nt: list\n        List containing the range with the best trimming positions for the\n        corresponding FastQ file. The first element is the 5' end trim index\n        and the second element is the 3' end trim index.\n    \"\"\"\n\n    logger.debug(\"Starting trim range assessment\")\n\n    # Target string for nucleotide bias assessment\n    target_nuc_bias = \">>Per base sequence content\"\n    logger.debug(\"Target string to start nucleotide bias assessment set to \"\n                 \"{}\".format(target_nuc_bias))\n    # This flag will become True when gathering base proportion data\n    # from file.\n    gather = False\n\n    # This variable will store a boolean array on the biased/unbiased\n    # positions. 
Biased position will be True, while unbiased positions\n    # will be False\n    biased = []\n\n    with open(data_file) as fh:\n\n        for line in fh:\n            # Start assessment of nucleotide bias\n            if line.startswith(target_nuc_bias):\n                # Skip comment line\n                logger.debug(\"Found target string at line: {}\".format(line))\n                next(fh)\n                gather = True\n            # Stop assessment when reaching end of target module\n            elif line.startswith(\">>END_MODULE\") and gather:\n                logger.debug(\"Stopping parsing at line: {}\".format(line))\n                break\n            elif gather:\n                # Get proportions of each nucleotide\n                g, a, t, c = [float(x) for x in line.strip().split()[1:]]\n                # Get 'GC' and 'AT content\n                gc = (g + 0.1) / (c + 0.1)\n                at = (a + 0.1) / (t + 0.1)\n                # Assess bias\n                if 0.8 <= gc <= 1.2 and 0.8 <= at <= 1.2:\n                    biased.append(False)\n                else:\n                    biased.append(True)\n\n    logger.debug(\"Finished bias assessment with result: {}\".format(biased))\n\n    # Split biased list in half to get the 5' and 3' ends\n    biased_5end, biased_3end = biased[:int(len(biased)/2)],\\\n        biased[int(len(biased)/2):][::-1]\n\n    logger.debug(\"Getting optimal trim range from biased list\")\n    trim_nt = [0, 0]\n    # Assess number of nucleotides to clip at 5' end\n    trim_nt[0] = get_trim_index(biased_5end)\n    logger.debug(\"Optimal trim range at 5' end set to: {}\".format(trim_nt[0]))\n    # Assess number of nucleotides to clip at 3' end\n    trim_nt[1] = len(biased) - get_trim_index(biased_3end)\n    logger.debug(\"Optimal trim range at 3' end set to: {}\".format(trim_nt[1]))\n\n    return trim_nt\n\n\ndef get_sample_trim(p1_data, p2_data):\n    \"\"\"Get the optimal read trim range from data files of paired FastQ reads.\n\n    Given the FastQC data report files for paired-end FastQ reads, this\n    function will assess the optimal trim range for the 3' and 5' ends of\n    the paired-end reads. This assessment will be based on the *'Per sequence\n    GC content'*.\n\n    Parameters\n    ----------\n    p1_data: str\n        Path to FastQC data report file from pair 1\n    p2_data: str\n        Path to FastQC data report file from pair 2\n\n    Returns\n    -------\n    optimal_5trim: int\n        Optimal trim index for the 5' end of the reads\n    optima_3trim: int\n        Optimal trim index for the 3' end of the reads\n\n    See Also\n    --------\n    trim_range\n\n    \"\"\"\n\n    sample_ranges = [trim_range(x) for x in [p1_data, p2_data]]\n\n    # Get the optimal trim position for 5' end\n    optimal_5trim = max([x[0] for x in sample_ranges])\n    # Get optimal trim position for 3' end\n    optimal_3trim = min([x[1] for x in sample_ranges])\n\n    return optimal_5trim, optimal_3trim\n\n\ndef get_summary(summary_file):\n    \"\"\"Parses a FastQC summary report file and returns it as a dictionary.\n\n    This function parses a typical FastQC summary report file, retrieving\n    only the information on the first two columns. For instance, a line could\n    be::\n\n        'PASS\tBasic Statistics\tSH10762A_1.fastq.gz'\n\n    This parser will build a dictionary with the string in the second column\n    as a key and the QC result as the value. 
In this case, the returned\n    ``dict`` would be something like::\n\n        {\"Basic Statistics\": \"PASS\"}\n\n    Parameters\n    ----------\n    summary_file: str\n        Path to FastQC summary report.\n\n    Returns\n    -------\n    summary_info: :py:data:`OrderedDict`\n        Returns the information of the FastQC summary report as an ordered\n        dictionary, with the categories as strings and the QC result as values.\n\n    \"\"\"\n\n    summary_info = OrderedDict()\n    logger.debug(\"Retrieving summary information from file: {}\".format(\n        summary_file))\n\n    with open(summary_file) as fh:\n        for line in fh:\n            # Skip empty lines\n            if not line.strip():\n                continue\n            # Populate summary info\n            fields = [x.strip() for x in line.split(\"\\t\")]\n            summary_info[fields[1]] = fields[0]\n\n    logger.debug(\"Retrieved summary information from file: {}\".format(\n        summary_info))\n\n    return summary_info\n\n\ndef check_summary_health(summary_file, **kwargs):\n    \"\"\"Checks the health of a sample from the FastQC summary file.\n\n    Parses the FastQC summary file and tests whether the sample is good\n    or not. There are four categories that cannot fail, and two that\n    must pass in order for the sample pass this check. If the sample fails\n    the quality checks, a list with the failing categories is also returned.\n\n    Categories that cannot fail::\n\n        fail_sensitive = [\n            \"Per base sequence quality\",\n            \"Overrepresented sequences\",\n            \"Sequence Length Distribution\",\n            \"Per sequence GC content\"\n        ]\n\n    Categories that must pass::\n\n        must_pass = [\n            \"Per base N content\",\n            \"Adapter Content\"\n        ]\n\n    Parameters\n    ----------\n    summary_file: str\n        Path to FastQC summary file.\n\n    Returns\n    -------\n    x : bool\n        Returns ``True`` if the sample passes all tests. ``False`` if not.\n    summary_info : list\n        A list with the FastQC categories that failed the tests. Is empty\n        if the sample passes all tests.\n    \"\"\"\n\n    # Store the summary categories that cannot fail. If they fail, do not\n    # proceed with this sample\n    fail_sensitive = kwargs.get(\"fail_sensitive\", [\n        \"Per base sequence quality\",\n        \"Overrepresented sequences\",\n        \"Sequence Length Distribution\",\n        \"Per sequence GC content\"\n    ])\n    logger.debug(\"Fail sensitive categories: {}\".format(fail_sensitive))\n\n    # Store summary categories that must pass. 
If they do not, do not proceed\n    # with that sample\n    must_pass = kwargs.get(\"must_pass\", [\n        \"Per base N content\",\n        \"Adapter Content\"\n    ])\n    logger.debug(\"Must pass categories: {}\".format(must_pass))\n\n    warning_fail_sensitive = kwargs.get(\"warning_fail_sensitive\", [\n        \"Per base sequence quality\",\n        \"Overrepresented sequences\",\n\n    ])\n\n    warning_must_pass = kwargs.get(\"warning_must_pass\", [\n        \"Per base sequence content\"\n    ])\n\n    # Get summary dictionary\n    summary_info = get_summary(summary_file)\n\n    # This flag will change to False if one of the tests fails\n    health = True\n    # List of failing categories\n    failed = []\n    # List of warning categories\n    warning = []\n\n    for cat, test in summary_info.items():\n\n        logger.debug(\"Assessing category {} with result {}\".format(cat, test))\n\n        # FAILURES\n        # Check for fail sensitive\n        if cat in fail_sensitive and test == \"FAIL\":\n            health = False\n            failed.append(\"{}:{}\".format(cat, test))\n            logger.error(\"Category {} failed a fail sensitive \"\n                         \"category\".format(cat))\n\n        # Check for must pass\n        if cat in must_pass and test != \"PASS\":\n            health = False\n            failed.append(\"{}:{}\".format(cat, test))\n            logger.error(\"Category {} failed a must pass category\".format(\n                cat))\n\n        # WARNINGS\n        # Check for fail sensitive\n        if cat in warning_fail_sensitive and test == \"FAIL\":\n            warning.append(\"Failed category: {}\".format(cat))\n            logger.warning(\"Category {} flagged at a fail sensitive \"\n                           \"category\".format(cat))\n\n        if cat in warning_must_pass and test != \"PASS\":\n            warning.append(\"Did not pass category: {}\".format(cat))\n            logger.warning(\"Category {} flagged at a must pass \"\n                           \"category\".format(cat))\n\n    # Passed all tests\n    return health, failed, warning\n\n\n@MainWrapper\ndef main(sample_id, result_p1, result_p2, opts):\n    \"\"\"Main executor of the fastqc_report template.\n\n    If the \"--ignore-tests\" option is present in the ``opts`` argument,\n    the health check of the sample will be bypassed, and it will pass the\n    check. This option is used in the first run of FastQC. In the second\n    run (after filtering with trimmomatic) this option is not provided and\n    the samples are submitted to a health check before proceeding in the\n    pipeline.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    result_p1 : list\n        Two element list containing the path to the FastQC report files to\n        the first FastQ pair.\n        The first must be the nucleotide level report and the second the\n        categorical report.\n    result_p2: list\n        Two element list containing the path to the FastQC report files to\n        the second FastQ pair.\n        The first must be the nucleotide level report and the second the\n        categorical report.\n    opts : list\n        List of arbitrary options. 
See `Expected input`_.\n\n    \"\"\"\n\n    logger.info(\"Starting fastqc report\")\n    json_dic = {}\n\n    with open(\"{}_trim_report\".format(sample_id), \"w\") as trep_fh, \\\n            open(\"optimal_trim\", \"w\") as trim_fh, \\\n            open(\"{}_status_report\".format(sample_id), \"w\") as rep_fh, \\\n            open(\".status\", \"w\") as status_fh, \\\n            open(\".warning\", \"w\") as warn_fh, \\\n            open(\".fail\", \"w\") as fail_fh, \\\n            open(\".report.json\", \"w\") as report_fh:\n\n        # Perform health check according to the FastQC summary report for\n        # each pair. If both pairs pass the check, send the 'pass' information\n        # to the 'fastqc_health' channel. If at least one fails, send the\n        # summary report.\n        if \"--ignore-tests\" not in opts:\n\n            # Get reports for each category in json format\n            json_dic = write_json_report(sample_id, result_p1[0],\n                                         result_p2[0])\n\n            logger.info(\"Performing FastQ health check\")\n            for p, fastqc_summary in enumerate([result_p1[1], result_p2[1]]):\n\n                logger.debug(\"Checking files: {}\".format(fastqc_summary))\n                # Get the boolean health variable and a list of failed\n                # categories, if any\n                health, f_cat, warnings = check_summary_health(fastqc_summary)\n                logger.debug(\"Health checked: {}\".format(health))\n                logger.debug(\"Failed categories: {}\".format(f_cat))\n\n                # Write any warnings\n                if warnings:\n                    json_dic[\"warnings\"] = [{\n                        \"sample\": sample_id,\n                        \"table\": \"qc\",\n                        \"value\": []\n                    }]\n                    for w in warnings:\n                        warn_fh.write(\"{}\\\\n\".format(w))\n                        json_dic[\"warnings\"][0][\"value\"].append(w)\n\n                # Rename category summary file to the channel that will publish\n                # The results\n                output_file = \"{}_{}_summary.txt\".format(sample_id, p)\n                os.rename(fastqc_summary, output_file)\n                logger.debug(\"Setting summary file name to {}\".format(\n                    output_file))\n\n                # If one of the health flags returns False, send the summary\n                # report through the status channel\n                if not health:\n                    fail_msg = \"Sample failed quality control checks:\" \\\n                               \" {}\".format(\",\".join(f_cat))\n                    logger.warning(fail_msg)\n                    fail_fh.write(fail_msg)\n                    json_dic[\"fail\"] = [{\n                        \"sample\": sample_id,\n                        \"table\": \"qc\",\n                        \"value\": [fail_msg]\n                    }]\n                    report_fh.write(\n                        json.dumps(json_dic, separators=(\",\", \":\")))\n                    status_fh.write(\"fail\")\n                    trim_fh.write(\"fail\")\n                    rep_fh.write(\"{}, {}\\\\n\".format(sample_id, \",\".join(f_cat)))\n                    trep_fh.write(\"{},fail,fail\\\\n\".format(sample_id))\n\n                    return\n\n            logger.info(\"Sample passed quality control checks\")\n\n        status_fh.write(\"pass\")\n        rep_fh.write(\"{}, pass\\\\n\".format(sample_id))\n\n        
logger.info(\"Assessing optimal trim range for sample\")\n        # Get optimal trimming range for sample, based on the per base sequence\n        # content\n        optimal_trim = get_sample_trim(result_p1[0], result_p2[0])\n        logger.info(\"Optimal trim range set to: {}\".format(optimal_trim))\n        trim_fh.write(\"{}\".format(\" \".join([str(x) for x in optimal_trim])))\n\n        trep_fh.write(\"{},{},{}\\\\n\".format(sample_id, optimal_trim[0],\n                                           optimal_trim[1]))\n\n        # The json dict report is only populated when the FastQC quality\n        # checks are performed, that is, when the --ignore-tests option\n        # is not provide\n        if json_dic:\n            report_fh.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, RESULT_P1, RESULT_P2, OPTS)\n"
  },
  {
    "path": "flowcraft/templates/flowcraft_utils/__init__.py",
    "content": ""
  },
  {
    "path": "flowcraft/templates/flowcraft_utils/flowcraft_base.py",
    "content": "\"\"\"\n\n\"\"\"\n\nimport os\nimport sys\nimport json\nimport logging\nimport traceback\n\nfrom time import gmtime, strftime\n\n\ndef get_logger(filepath, level=logging.DEBUG):\n    # create logger\n    logger = logging.getLogger(os.path.basename(filepath))\n    logger.setLevel(level)\n    # create console handler and set level to debug\n    ch = logging.StreamHandler()\n    ch.setLevel(level)\n    # create formatter\n    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')\n    # add formatter to ch\n    ch.setFormatter(formatter)\n    # add ch to logger\n    logger.addHandler(ch)\n\n    return logger\n\n\ndef log_error():\n    \"\"\"Nextflow specific function that logs an error upon unexpected failing\n    \"\"\"\n\n    with open(\".status\", \"w\") as status_fh:\n        status_fh.write(\"error\")\n\n\nclass MainWrapper:\n\n    def __init__(self, f):\n\n        self.f = f\n        self.context = self.f.__globals__\n        self.logger = self.context.get(\"logger\", None)\n\n    def __call__(self, *args, **kwargs):\n\n        self.logger.debug(\"Starting template at {}\".format(\n            strftime(\"%Y-%m-%d %H:%M:%S\", gmtime())))\n        self.logger.debug(\"Working directory: {}\".format(os.getcwd()))\n\n        try:\n            self.build_versions()\n            self.f(*args, **kwargs)\n        except SystemExit as e:\n            sys.exit(e)\n        except:\n            if self.logger:\n                self.logger.error(\"Module exited unexpectedly with error:\"\n                                  \"\\\\n{}\".format(traceback.format_exc()))\n            log_error()\n\n        self.logger.debug(\"Finished template at {}\".format(\n            strftime(\"%Y-%m-%d %H:%M:%S\", gmtime())))\n\n    def build_versions(self):\n        \"\"\"Writes versions JSON for a template file\n\n        This method creates the JSON file ``.versions`` based on the metadata\n        and specific functions that are present in a given template script.\n\n        It starts by fetching the template metadata, which can be specified\n        via the ``__version__``, ``__template__`` and ``__build__``\n        attributes. 
If all of these attributes exist, it starts to populate\n        a JSON/dict array (note that the absence of any one of them will\n        prevent the version from being written).\n\n        Then, it will search the\n        template scope for functions that start with the substring\n        ``__get_version`` (for example, ``def __get_version_fastqc()``).\n        These functions should gather the version of\n        an arbitrary program and return a JSON/dict object with the following\n        information::\n\n            {\n                \"program\": <program_name>,\n                \"version\": <version>,\n                \"build\": <build>\n            }\n\n        This JSON/dict object is then written to the ``.versions`` file.\n        \"\"\"\n\n        version_storage = []\n\n        template_version = self.context.get(\"__version__\", None)\n        template_program = self.context.get(\"__template__\", None)\n        template_build = self.context.get(\"__build__\", None)\n\n        if template_version and template_program and template_build:\n            if self.logger:\n                self.logger.debug(\"Adding template version: {}; {}; \"\n                                  \"{}\".format(template_program,\n                                              template_version,\n                                              template_build))\n            version_storage.append({\n                \"program\": template_program,\n                \"version\": template_version,\n                \"build\": template_build\n            })\n\n        for var, obj in self.context.items():\n            if var.startswith(\"__get_version\"):\n                ver = obj()\n                version_storage.append(ver)\n                if self.logger:\n                    self.logger.debug(\"Found additional software version: \"\n                                      \"{}\".format(ver))\n\n        with open(\".versions\", \"w\") as fh:\n            fh.write(json.dumps(version_storage, separators=(\",\", \":\")))\n\n"
  },
  {
    "path": "flowcraft/templates/integrity_coverage.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module receives paired FastQ files, a genome size estimate and a minimum\ncoverage threshold and has three purposes while iterating over the FastQ files:\n\n    - Checks the integrity of FastQ files (corrupted files).\n    - Guesses the encoding of FastQ files (this can be turned off in the \\\n    ``opts`` argument).\n    - Estimates the coverage for each sample.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : *Sample Identification string*\n    - e.g.: ``'SampleA'``\n\n- ``fastq_pair`` : *Pair of FastQ file paths*\n    - e.g.: ``'SampleA_1.fastq.gz SampleA_2.fastq.gz'``\n\n- ``gsize`` : *Expected genome size*\n    - e.g.: ``'2.5'``\n\n- ``cov`` : *Minimum coverage threshold*\n    - e.g.: ``'15'``\n\n- ``opts`` : *Specify additional arguments for executing integrity_coverage. \\\n    The arguments should be a string of command line arguments, such as \\\n    '-e'. The accepted arguments are:*\n    - ``'-e'`` : Skip encoding guess.\n\nGenerated output\n----------------\n\nThe generated output are output files that contain an object, usually a string.\n(Values within ``${}`` are substituted by the corresponding variable.)\n\n- ``${sample_id}_encoding`` : Stores the encoding for the sample FastQ. If no \\\n    encoding could be guessed, write 'None' to file.\n    - e.g.: ``'Illumina-1.8'`` or ``'None'``\n\n- ``${sample_id}_phred`` : Stores the phred value for the sample FastQ. If no \\\n    phred could be guessed, write 'None' to file.\n    - ``'33'`` or ``'None'``\n\n- ``${sample_id}_coverage`` : Stores the expected coverage of the samples, \\\n    based on a given genome size.\n    - ``'112'`` or ``'fail'``\n\n- ``${sample_id}_report`` : Stores the report on the expected coverage \\\n    estimation. This string written in this file will appear in the \\\n    coverage report.\n    - ``'${sample_id}, 112, PASS'``\n\n- ``${sample_id}_max_len`` : Stores the maximum read length for the current \\\n    sample.\n    - ``'152'``\n\nNotes\n-----\n\nIn case of a corrupted sample, all expected output files should have\n``'corrupt'`` written.\n\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.1\"\n__build__ = \"03082018\"\n__template__ = \"integrity_coverage-nf\"\n\nimport os\nimport bz2\nimport gzip\nimport json\nimport zipfile\n\nfrom itertools import chain\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n# Set constants when running from Nextflow\nif __file__.endswith(\".command.sh\"):\n    # CONSTANTS\n    FASTQ_PAIR = '$fastq_pair'.split()\n    SAMPLE_ID = '$sample_id'\n    GSIZE = float('$gsize')\n    MINIMUM_COVERAGE = float('$cov')\n    OPTS = '$opts'.split()\n\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"FASTQ_PAIR: {}\".format(FASTQ_PAIR))\n    logger.debug(\"GSIZE: {}\".format(GSIZE))\n    logger.debug(\"MINIMUM_COVERAGE: {}\".format(MINIMUM_COVERAGE))\n    logger.debug(\"OPTS: {}\".format(OPTS))\n\nRANGES = {\n    'Sanger': [33, (33, 73)],\n    'Illumina-1.8': [33, (33, 74)],\n    'Solexa': [64, (59, 104)],\n    'Illumina-1.3': [64, (64, 104)],\n    'Illumina-1.5': [64, (66, 105)]\n}\n\"\"\"\ndict: Dictionary containing the encoding values for several fastq formats. 
The\nkey contains the format and the value contains a list with the corresponding\nphred score and a list with the range of encodings.\n\"\"\"\n\nCOPEN = {\n    \"gz\": gzip.open,\n    \"bz2\": bz2.open,\n    \"zip\": zipfile.ZipFile\n}\n\nMAGIC_DICT = {\n    b\"\\\\x1f\\\\x8b\\\\x08\": \"gz\",\n    b\"\\\\x42\\\\x5a\\\\x68\": \"bz2\",\n    b\"\\\\x50\\\\x4b\\\\x03\\\\x04\": \"zip\"\n}\n\"\"\"\ndict: Dictionary containing the binary signatures for three compression formats\n(gzip, bzip2 and zip).\n\"\"\"\n\n\ndef guess_file_compression(file_path, magic_dict=None):\n    \"\"\"Guesses the compression of an input file.\n\n    This function guesses the compression of a given file by checking for\n    a binary signature at the beginning of the file. These signatures are\n    stored in the :py:data:`MAGIC_DICT` dictionary. The supported compression\n    formats are gzip, bzip2 and zip. If none of the signatures in this\n    dictionary are found at the beginning of the file, it returns ``None``.\n\n    Parameters\n    ----------\n    file_path : str\n        Path to input file.\n    magic_dict : dict, optional\n        Dictionary containing the signatures of the compression types. The\n        key should be the binary signature and the value should be the\n        compression format. If left ``None``, it falls back to\n        :py:data:`MAGIC_DICT`.\n\n    Returns\n    -------\n    file_type : str or None\n        If a compression type is detected, returns a string with the format.\n        If not, returns ``None``.\n    \"\"\"\n\n    if not magic_dict:\n        magic_dict = MAGIC_DICT\n\n    max_len = max(len(x) for x in magic_dict)\n\n    with open(file_path, \"rb\") as f:\n        file_start = f.read(max_len)\n\n    logger.debug(\"Binary signature start: {}\".format(file_start))\n\n    for magic, file_type in magic_dict.items():\n        if file_start.startswith(magic):\n            return file_type\n\n    return None\n\n\ndef get_qual_range(qual_str):\n    \"\"\" Get range of the Unicode encode range for a given string of characters.\n\n    The encoding is determined from the result of the :py:func:`ord` built-in.\n\n    Parameters\n    ----------\n    qual_str : str\n        Arbitrary string.\n\n    Returns\n    -------\n    x : tuple\n        (Minimum Unicode code, Maximum Unicode code).\n    \"\"\"\n\n    vals = [ord(c) for c in qual_str]\n\n    return min(vals), max(vals)\n\n\ndef get_encodings_in_range(rmin, rmax):\n    \"\"\" Returns the valid encodings for a given encoding range.\n\n    The encoding ranges are stored in the :py:data:`RANGES` dictionary, with\n    the encoding name as a string and a list as a value containing the\n    phred score and a tuple with the encoding range. 
For a given encoding\n    range provided via the two first arguments, this function will return\n    all possible encodings and phred scores.\n\n    Parameters\n    ----------\n    rmin : int\n        Minimum Unicode code in range.\n    rmax : int\n        Maximum Unicode code in range.\n\n    Returns\n    -------\n    valid_encodings : list\n        List of all possible encodings for the provided range.\n    valid_phred : list\n        List of all possible phred scores.\n\n    \"\"\"\n\n    valid_encodings = []\n    valid_phred = []\n\n    for encoding, (phred, (emin, emax)) in RANGES.items():\n        if rmin >= emin and rmax <= emax:\n            valid_encodings.append(encoding)\n            valid_phred.append(phred)\n\n    return valid_encodings, valid_phred\n\n\n@MainWrapper\ndef main(sample_id, fastq_pair, gsize, minimum_coverage, opts):\n    \"\"\" Main executor of the integrity_coverage template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    fastq_pair : list\n        Two element list containing the paired FastQ files.\n    gsize : float or int\n        Estimate of genome size in Mb.\n    minimum_coverage : float or int\n        Minimum coverage required for a sample to pass the coverage check\n    opts : list\n        List of arbitrary options. See `Expected input`_.\n\n    \"\"\"\n\n    logger.info(\"Starting integrity coverage main\")\n\n    # Check for runtime options\n    if \"-e\" in opts:\n        skip_encoding = True\n    else:\n        skip_encoding = False\n\n    # Information for encoding guess\n    gmin, gmax = 99, 0\n    encoding = []\n    phred = None\n\n    # Information for coverage estimation\n    chars = 0\n    nreads = 0\n\n    # Information on maximum read length\n    max_read_length = 0\n\n    # Get compression of each FastQ pair file\n    file_objects = []\n    for fastq in fastq_pair:\n\n        logger.info(\"Processing file {}\".format(fastq))\n\n        logger.info(\"[{}] Guessing file compression\".format(fastq))\n        ftype = guess_file_compression(fastq)\n\n        # This can guess the compression of gz, bz2 and zip. If it cannot\n        # find the compression type, it tries to open a regular file\n        if ftype:\n            logger.info(\"[{}] Found file compression: {}\".format(\n                fastq, ftype))\n            file_objects.append(COPEN[ftype](fastq, \"rt\"))\n        else:\n            logger.info(\"[{}] File compression not found. 
Assuming an \"\n                        \"uncompressed file\".format(fastq))\n            file_objects.append(open(fastq))\n\n    logger.info(\"Starting FastQ file parsing\")\n\n    # The '*_encoding' file stores a string with the encoding ('Sanger')\n    # If no encoding is guessed, 'None' should be stored\n    # The '*_phred' file stores a string with the phred score ('33')\n    # If no phred is guessed, 'None' should be stored\n    # The '*_coverage' file stores the estimated coverage ('88')\n    # The '*_report' file stores a csv report of the file\n    # The '*_max_len' file stores a string with the maximum contig len ('155')\n    with open(\"{}_encoding\".format(sample_id), \"w\") as enc_fh, \\\n            open(\"{}_phred\".format(sample_id), \"w\") as phred_fh, \\\n            open(\"{}_coverage\".format(sample_id), \"w\") as cov_fh, \\\n            open(\"{}_report\".format(sample_id), \"w\") as cov_rep, \\\n            open(\"{}_max_len\".format(sample_id), \"w\") as len_fh, \\\n            open(\".report.json\", \"w\") as json_report, \\\n            open(\".status\", \"w\") as status_fh, \\\n            open(\".fail\", \"w\") as fail_fh:\n\n        try:\n            # Iterate over both pair files sequentially using itertools.chain\n            for i, line in enumerate(chain(*file_objects)):\n\n                # Parse only every 4th line of the file for the encoding\n                # e.g.: AAAA/EEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEEE (...)\n                if (i + 1) % 4 == 0 and not skip_encoding:\n                    # It is important to strip() the line so that any newline\n                    # character is removed and not accounted for in the\n                    # encoding guess\n                    lmin, lmax = get_qual_range(line.strip())\n\n                    # Guess new encoding if the range expands the previously\n                    # set boundaries of gmin and gmax\n                    if lmin < gmin or lmax > gmax:\n                        gmin, gmax = min(lmin, gmin), max(lmax, gmax)\n                        encoding, phred = get_encodings_in_range(gmin, gmax)\n                        logger.debug(\n                            \"Updating estimates at line {} with range {} to\"\n                            \" '{}' (encoding) and '{}' (phred)\".format(\n                                i, [lmin, lmax], encoding, phred))\n\n                # Parse only every 2nd line of the file for the coverage\n                # e.g.: GGATAATCTACCTTGACGATTTGTACTGGCGTTGGTTTCTTA (...)\n                if (i + 3) % 4 == 0:\n                    read_len = len(line.strip())\n                    chars += read_len\n                    nreads += 1\n\n                    # Evaluate maximum read length for sample\n                    if read_len > max_read_length:\n                        logger.debug(\"Updating maximum read length at line \"\n                                     \"{} to {}\".format(i, read_len))\n                        max_read_length = read_len\n\n            # End of FastQ parsing\n            logger.info(\"Finished FastQ file parsing\")\n\n            # The minimum expected coverage for a sample to pass\n            exp_coverage = round(chars / (gsize * 1e6), 2)\n\n            # Set json report\n            if \"-e\" not in opts:\n\n                json_dic = {\n                    \"tableRow\": [{\n                        \"sample\": sample_id,\n                        \"data\": [\n                            {\"header\": \"Raw BP\",\n                             
\"value\": chars,\n                             \"table\": \"qc\",\n                             \"columnBar\": True},\n                            {\"header\": \"Reads\",\n                             \"value\": nreads,\n                             \"table\": \"qc\",\n                             \"columnBar\": True},\n                            {\"header\": \"Coverage\",\n                             \"value\": exp_coverage,\n                             \"table\": \"qc\",\n                             \"columnBar\": True,\n                             \"failThreshold\": minimum_coverage\n                             }\n                        ]\n                    }],\n                    \"plotData\": [{\n                        \"sample\": sample_id,\n                        \"data\": {\n                            \"sparkline\": chars\n                        }\n                    }],\n                }\n            else:\n                json_dic = {\n                    \"tableRow\": [{\n                        \"sample\": sample_id,\n                        \"data\": [\n                            {\"header\": \"Coverage\",\n                             \"value\": exp_coverage,\n                             \"table\": \"qc\",\n                             \"columnBar\": True,\n                             \"failThreshold\": minimum_coverage\n                             }\n                        ],\n                    }],\n                }\n\n            # Get encoding\n            if len(encoding) > 0:\n                encoding = set(encoding)\n                phred = set(phred)\n                # Get encoding and phred as strings\n                # e.g. enc: Sanger, Illumina-1.8\n                # e.g. phred: 64\n                enc = \"{}\".format(\",\".join([x for x in encoding]))\n                phred = \"{}\".format(\",\".join(str(x) for x in phred))\n                logger.info(\"Encoding set to {}\".format(enc))\n                logger.info(\"Phred set to {}\".format(enc))\n\n                enc_fh.write(enc)\n                phred_fh.write(phred)\n            # Encoding not found\n            else:\n                if not skip_encoding:\n                    encoding_msg = \"Could not guess encoding and phred from \" \\\n                                   \"FastQ\"\n                    logger.warning(encoding_msg)\n                    json_dic[\"warnings\"] = [{\n                        \"sample\": sample_id,\n                        \"table\": \"qc\",\n                        \"value\": [encoding_msg]\n                    }]\n                    enc_fh.write(\"None\")\n                    phred_fh.write(\"None\")\n\n            # Estimate coverage\n            logger.info(\"Estimating coverage based on a genome size of \"\n                        \"{}\".format(gsize))\n            logger.info(\"Expected coverage is {}\".format(exp_coverage))\n\n            if exp_coverage >= minimum_coverage:\n                cov_rep.write(\"{},{},{}\\\\n\".format(\n                    sample_id, str(exp_coverage), \"PASS\"))\n                cov_fh.write(str(exp_coverage))\n                status_fh.write(\"pass\")\n            # Estimated coverage does not pass minimum threshold\n            else:\n                fail_msg = \"Sample with low coverage ({}), below the {} \" \\\n                           \"threshold\".format(exp_coverage, minimum_coverage)\n                logger.error(fail_msg)\n                fail_fh.write(fail_msg)\n                cov_fh.write(\"fail\")\n    
            status_fh.write(\"fail\")\n                cov_rep.write(\"{},{},{}\\\\n\".format(\n                    sample_id, str(exp_coverage), \"FAIL\"))\n                json_dic[\"fail\"] = [{\n                    \"sample\": sample_id,\n                    \"table\": \"qc\",\n                    \"value\": [fail_msg]\n                }]\n\n            json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n            # Maximum read length\n            len_fh.write(\"{}\".format(max_read_length))\n\n        # This exception is raised when the input FastQ files are corrupted\n        except EOFError:\n            logger.error(\"The FastQ files could not be correctly \"\n                         \"parsed. They may be corrupt\")\n            for fh in [enc_fh, phred_fh, cov_fh, cov_rep, len_fh]:\n                fh.write(\"corrupt\")\n                status_fh.write(\"fail\")\n                fail_fh.write(\"Could not read/parse FastQ. \"\n                              \"Possibly corrupt file\")\n\n\nif __name__ == \"__main__\":\n\n    main(SAMPLE_ID, FASTQ_PAIR, GSIZE, MINIMUM_COVERAGE, OPTS)\n"
  },
  {
    "path": "flowcraft/templates/mapping2json.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to generate a json output for mapping results that\ncan be imported in pATLAS.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``depth_file`` : String with the name of the mash screen output file.\n    - e.g.: ``'samtoolsDepthOutput_sampleA.txt'``\n- ``json_dict`` : the file that contains the dictionary with keys and values for\n        accessions and their respective lengths.\n    - e.g.: ``'reads_sample_result_length.json'``\n- ``cutoff`` : The cutoff used to trim the unwanted matches for the minimum\n        coverage results from mapping. This value may range between 0 and 1.\n    - e.g.: ``0.6``\n\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.1.0\"\n__build__ = \"04072018\"\n__template__ = \"mapping2json-nf\"\n\nimport os\nimport json\nimport sys\nfrom pympler.asizeof import asizeof\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    DEPTH_TXT = '$depthFile'\n    JSON_LENGTH = '$lengthJson'\n    CUTOFF = '$cov_cutoff'\n    SAMPLE_ID = '$sample_id'\nelse:\n    DEPTH_TXT = sys.argv[1]\n    JSON_LENGTH = sys.argv[2]\n    CUTOFF = sys.argv[3]\n    SAMPLE_ID = sys.argv[4]\n\nlogger.debug(\"List of arguments given: {}\".format([\n    DEPTH_TXT,\n    JSON_LENGTH,\n    CUTOFF,\n    SAMPLE_ID\n]))\n\n# check if all variables are assigned\nif DEPTH_TXT and JSON_LENGTH and SAMPLE_ID and CUTOFF:\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"DEPTH_TXT: {}\".format(DEPTH_TXT))\n    logger.debug(\"JSON_LENGHT: {}\".format(JSON_LENGTH))\n    logger.debug(\"CUTOFF: {}\".format(CUTOFF))\nelse:\n    logger.error(\"Args should be given to this template, either from sys.argv\"\n                 \" or through nextflow variables\")\n\n\ndef depth_file_reader(depth_file):\n    \"\"\"\n    Function that parse samtools depth file and creates 3 dictionaries that\n    will be useful to make the outputs of this script, both the tabular file\n    and the json file that may be imported by pATLAS\n\n    Parameters\n    ----------\n    depth_file: textIO\n        the path to depth file for each sample\n\n    Returns\n    -------\n    depth_dic_coverage: dict\n            dictionary with the coverage per position for each plasmid\n    \"\"\"\n\n    # dict to store the mean coverage for each reference\n    depth_dic_coverage = {}\n\n    for line in depth_file:\n        tab_split = line.split()  # split by any white space\n        reference = \"_\".join(tab_split[0].strip().split(\"_\")[0:3])  # store\n        # only the gi for the reference\n        position = tab_split[1]\n        num_reads_align = float(tab_split[2].rstrip())\n\n        if reference not in depth_dic_coverage:\n            depth_dic_coverage[reference] = {}\n\n        depth_dic_coverage[reference][position] = num_reads_align\n\n    logger.info(\"Finished parsing depth file.\")\n    depth_file.close()\n\n    logger.debug(\"Size of dict_cov: {} kb\".format(\n        asizeof(depth_dic_coverage)/1024))\n\n    return depth_dic_coverage\n\n\ndef generate_jsons(depth_dic_coverage, plasmid_length, cutoff):\n    \"\"\"\n\n    Parameters\n    ----------\n    depth_dic_coverage: dict\n         dictionary with the coverage per position for each plasmid\n\n    Returns\n    -------\n    
percentage_bases_covered: dict\n    dict_cov:  dict\n\n    \"\"\"\n\n    # initializes the dictionary with the mean coverage results per plasmid\n    percentage_bases_covered = {}\n    # dict to store coverage results for a given interval of points\n    dict_cov = {}\n\n    for ref in depth_dic_coverage:\n        # calculates the percentage value per each reference\n        perc_value_per_ref = float(len(depth_dic_coverage[ref])) / \\\n            float(plasmid_length[ref])\n        # checks if percentage value is higher or equal to the cutoff defined\n        if perc_value_per_ref >= cutoff:\n            percentage_bases_covered[ref] = round(perc_value_per_ref, 2)\n\n            # starts parser to get the array with the coverage for all the\n            # positions\n            # first, sets the interval for the reference being parsed\n            interval = round(int(plasmid_length[ref]) * 0.01,\n                             ndigits=0)\n\n            # if the sequence is smaller than 100 bp, which shouldn't happen\n            # anyway\n            if interval < 1:\n                interval = 1\n\n            # starts dict cov for the reference\n            dict_cov[ref] = {\n                \"length\": int(plasmid_length[ref]),\n                \"interval\": int(interval),\n                \"values\": []\n            }\n\n            # array to store the values of coverage for each interval\n            array_of_cov = []\n            # the counter that is used to output the values per interval\n            reset_counter = 0\n            # loop to generate dict_cov\n            logger.info(\"Generating plot data for plasmid: {}\".format(ref))\n            for i in range(int(plasmid_length[ref])):\n                # checks if key for a given position is in dict and if so\n                # adds it to array of cov, otherwise it will add a 0\n                try:\n                    array_of_cov.append(int(depth_dic_coverage[ref][str(i)]))\n                except KeyError:\n                    array_of_cov.append(0)\n\n                # if the counter equals the interval then output to dict_cov\n                if reset_counter == interval:\n                    dict_cov[ref][\"values\"].append(\n                        int(sum(array_of_cov)/len(array_of_cov))\n                    )\n                    # reset counter\n                    reset_counter = 0\n                else:\n                    # if counter is less than interval then sums 1\n                    reset_counter += 1\n\n    logger.info(\"Successfully generated dicts necessary for output json file \"\n                \"and .report.json depth file.\")\n    logger.debug(\"Size of percentage_bases_covered: {} kb\".format(\n        asizeof(percentage_bases_covered)/1024))\n    logger.debug(\"Size of dict_cov: {} kb\".format(asizeof(dict_cov)/1024))\n    return percentage_bases_covered, dict_cov\n\n\n@MainWrapper\ndef main(depth_file, json_dict, cutoff, sample_id):\n    \"\"\"\n    Function that handles the inputs required to parse depth files from bowtie\n    and dumps a dict to a json file that can be imported into pATLAS.\n\n    Parameters\n    ----------\n    depth_file: str\n         the path to depth file for each sample\n    json_dict: str\n        the file that contains the dictionary with keys and values for\n        accessions\n        and their respective lengths\n    cutoff: str\n        the cutoff used to trim the unwanted matches for the minimum coverage\n        results from mapping. 
This value may range between 0 and 1.\n    sample_id: str\n        the id of the sample being parsed\n\n    \"\"\"\n\n    # check for the appropriate value for the cutoff value for coverage results\n    logger.debug(\"Cutoff value: {}. Type: {}\".format(cutoff, type(cutoff)))\n    try:\n        cutoff_val = float(cutoff)\n        if cutoff_val < 0.4:\n            logger.warning(\"This cutoff value will generate a high volume of \"\n                           \"plot data. Therefore '.report.json' can be too big\")\n    except ValueError:\n        logger.error(\"Cutoff value should be a string such as: '0.6'. \"\n                     \"The outputted value: {}. Make sure to provide an \"\n                     \"appropriate value for --cov_cutoff\".format(cutoff))\n        sys.exit(1)\n\n    # loads dict from file, this file is provided in docker image\n\n    plasmid_length = json.load(open(json_dict))\n    if plasmid_length:\n        logger.info(\"Loaded dictionary of plasmid lengths\")\n    else:\n        logger.error(\"Something went wrong and plasmid lengths dictionary\"\n                     \"could not be loaded. Check if process received this\"\n                     \"param successfully.\")\n        sys.exit(1)\n\n    # read depth file\n    depth_file_in = open(depth_file)\n\n    # first reads the depth file and generates dictionaries to handle the input\n    # to a simpler format\n    logger.info(\"Reading depth file and creating dictionary to dump.\")\n    depth_dic_coverage = depth_file_reader(depth_file_in)\n    percentage_bases_covered, dict_cov = generate_jsons(depth_dic_coverage,\n                                                        plasmid_length,\n                                                        cutoff_val)\n\n    if percentage_bases_covered and dict_cov:\n        logger.info(\"percentage_bases_covered length: {}\".format(\n            str(len(percentage_bases_covered))))\n        logger.info(\"dict_cov length: {}\".format(str(len(dict_cov))))\n    else:\n        logger.error(\"Both dicts that dump to JSON file or .report.json are \"\n                     \"empty.\")\n\n    # then dump do file\n    logger.info(\"Dumping to {}\".format(\"{}_mapping.json\".format(depth_file)))\n    with open(\"{}_mapping.json\".format(depth_file), \"w\") as output_json:\n        output_json.write(json.dumps(percentage_bases_covered))\n\n    json_dic = {\n        \"tableRow\": [{\n            \"sample\": sample_id,\n            \"data\": [{\n                \"header\": \"Mapping\",\n                \"table\": \"plasmids\",\n                \"patlas_mapping\": percentage_bases_covered,\n                \"value\": len(percentage_bases_covered)\n            }]\n        }],\n        \"sample\": sample_id,\n        \"patlas_mapping\": percentage_bases_covered,\n        \"plotData\": [{\n            \"sample\": sample_id,\n            \"data\": {\n                \"patlasMappingSliding\": dict_cov\n            },\n        }]\n    }\n\n    logger.debug(\"Size of dict_cov: {} kb\".format(asizeof(json_dic)/1024))\n    logger.info(\"Writing to .report.json\")\n    with open(\".report.json\", \"w\") as json_report:\n        json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n\nif __name__ == \"__main__\":\n    main(DEPTH_TXT, JSON_LENGTH, CUTOFF, SAMPLE_ID)\n"
  },
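The windowed-coverage logic in mapping2json's generate_jsons can be summarised with a small standalone sketch: each reference is split into windows of roughly 1% of its length and the mean depth per window is kept for plotting. The helper name and toy data below are illustrative only, and integer positions are used instead of the string keys produced by depth_file_reader.

# Standalone sketch (illustrative only) of the ~1% windowed mean depth used by
# generate_jsons; positions absent from the depth dict count as zero coverage.
def windowed_mean_depth(depth_by_pos, ref_length):
    interval = max(1, round(ref_length * 0.01))
    values, window = [], []
    for pos in range(ref_length):
        window.append(depth_by_pos.get(pos, 0))
        if len(window) == interval:
            values.append(sum(window) // len(window))
            window = []
    if window:
        # flush the last, possibly shorter, window
        values.append(sum(window) // len(window))
    return {"length": ref_length, "interval": interval, "values": values}


if __name__ == "__main__":
    toy_depth = {0: 10, 1: 12, 2: 9, 150: 30}  # position -> coverage depth
    print(windowed_mean_depth(toy_depth, 300))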
  {
    "path": "flowcraft/templates/mashdist2json.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to generate a json output for mash dist results that\ncan be imported in pATLAS.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``mash_output`` : String with the name of the mash screen output file.\n    - e.g.: ``'fastaFileA_mashdist.txt'``\n\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.4.0\"\n__build__ = \"04092018\"\n__template__ = \"mashsdist2json-nf\"\n\nimport json\nimport os\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    MASH_TXT = '$mashtxt'\n    HASH_CUTOFF = '$shared_hashes'\n    SAMPLE_ID = '$sample_id'\n    ASSEMBLY_IN = '$fasta'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"MASH_TXT: {}\".format(MASH_TXT))\n    logger.debug(\"HASH_CUTOFF: {}\".format(HASH_CUTOFF))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"ASSEMBLY_IN: {}\".format(ASSEMBLY_IN))\n\n\ndef send_to_output(master_dict, mash_output, sample_id, assembly_file):\n    \"\"\"Send dictionary to output json file\n    This function sends master_dict dictionary to a json file if master_dict is\n    populated with entries, otherwise it won't create the file\n\n    Parameters\n    ----------\n    master_dict: dict\n        dictionary that stores all entries for a specific query sequence\n        in multi-fasta given to mash dist as input against patlas database\n    last_seq: str\n        string that stores the last sequence that was parsed before writing to\n        file and therefore after the change of query sequence between different\n        rows on the input file\n    mash_output: str\n        the name/path of input file to main function, i.e., the name/path of\n        the mash dist output txt file.\n    sample_id: str\n        The name of the sample being parse to .report.json file\n\n    Returns\n    -------\n\n    \"\"\"\n\n    plot_dict = {}\n\n    # create a new file only if master_dict is populated\n    if master_dict:\n        out_file = open(\"{}.json\".format(\n            \"\".join(mash_output.split(\".\")[0])), \"w\")\n        out_file.write(json.dumps(master_dict))\n        out_file.close()\n\n        # iterate through master_dict in order to make contigs the keys\n        for k,v in master_dict.items():\n            if not v[2] in plot_dict:\n                plot_dict[v[2]] = [k]\n            else:\n                plot_dict[v[2]].append(k)\n\n        number_hits = len(master_dict)\n    else:\n        number_hits = 0\n\n    json_dic = {\n        \"tableRow\": [{\n            \"sample\": sample_id,\n            \"data\": [{\n                \"header\": \"Mash Dist\",\n                \"table\": \"plasmids\",\n                \"patlas_mashdist\": master_dict,\n                \"value\": number_hits\n            }]\n        }],\n        \"plotData\": [{\n            \"sample\": sample_id,\n            \"data\": {\n                \"patlasMashDistXrange\": plot_dict\n            },\n            \"assemblyFile\": assembly_file\n        }]\n    }\n\n    with open(\".report.json\", \"w\") as json_report:\n        json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n\n@MainWrapper\ndef main(mash_output, hash_cutoff, sample_id, assembly_file):\n    \"\"\"\n    Main function that allows to 
dump a mash dist txt file to a json file\n\n    Parameters\n    ----------\n    mash_output: str\n        A string with the input file.\n    hash_cutoff: str\n        the minimum fraction of shared hashes between the query and a plasmid\n        in the database that is required for the plasmid to be reported in\n        the results outputs\n    sample_id: str\n        The name of the sample.\n    assembly_file: str\n        The path to the assembly file, written to the plotData entry of the\n        report.\n    \"\"\"\n\n    input_f = open(mash_output, \"r\")\n\n    master_dict = {}\n\n    for line in input_f:\n\n        tab_split = line.split(\"\\t\")\n        current_seq = tab_split[1].strip()\n        ref_accession = \"_\".join(tab_split[0].strip().split(\"_\")[0:3])\n        mash_dist = tab_split[2].strip()\n        hashes_list = tab_split[-1].strip().split(\"/\")\n\n        # computes the fraction of shared hashes between the sample and the\n        # reference\n        perc_hashes = float(hashes_list[0]) / float(hashes_list[1])\n\n        # if ref_accession already in dict, i.e., if the same accession number\n        # matches more than one contig.\n        if ref_accession in master_dict.keys():\n            current_seq += \", {}\".format(master_dict[ref_accession][-1])\n\n        # ensures that only the matches above the shared hashes cutoff are\n        # reported to the json file\n        if perc_hashes > float(hash_cutoff):\n\n            master_dict[ref_accession] = [\n                round(1 - float(mash_dist), 2),\n                round(perc_hashes, 2),\n                current_seq\n            ]\n\n    # writes the collected matches to the output files\n    send_to_output(master_dict, mash_output, sample_id, assembly_file)\n\n\nif __name__ == \"__main__\":\n\n    main(MASH_TXT, HASH_CUTOFF, SAMPLE_ID, ASSEMBLY_IN)\n"
  },
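Each line of the mash dist output is reduced above to a pATLAS accession, a similarity score (1 - distance) and the fraction of shared hashes. A minimal illustration of that per-line parsing, using a fabricated line and an arbitrary cutoff of 0.8:

# Illustrative parse of a single mash dist line, mirroring the fields the template
# reads (reference, query, distance, p-value, shared hashes). The line is made up.
line = "NC_019153_1_cul\tcontig_7\t0.04\t1e-30\t880/1000"

ref, query, dist, _pvalue, hashes = line.split("\t")
accession = "_".join(ref.split("_")[:3])          # e.g. 'NC_019153_1'
shared, total = (int(x) for x in hashes.split("/"))
perc_hashes = shared / total                      # fraction of shared hashes

if perc_hashes > 0.8:                             # hash_cutoff analogue
    entry = [round(1 - float(dist), 2), round(perc_hashes, 2), query]
    print(accession, entry)                       # NC_019153_1 [0.96, 0.88, 'contig_7']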
  {
    "path": "flowcraft/templates/mashscreen2json.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to generate a json output for mash screen results that\ncan be imported in pATLAS.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``mash_output`` : String with the name of the mash screen output file.\n    - e.g.: ``'sortedMashScreenResults_SampleA.txt'``\n\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.1.0\"\n__build__ = \"04072018\"\n__template__ = \"mashscreen2json-nf\"\n\nfrom statistics import median\nimport os\nimport json\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    MASH_TXT = '$mashtxt'\n    SAMPLE_ID = '$sample_id'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"MASH_TXT: {}\".format(MASH_TXT))\n    logger.debug(\"SAMPLE_ID: {}\".format(MASH_TXT))\n\n@MainWrapper\ndef main(mash_output, sample_id):\n    '''\n    converts top results from mash screen txt output to json format\n\n    Parameters\n    ----------\n    mash_output: str\n        this is a string that stores the path to this file, i.e, the name of\n        the file\n    sample_id: str\n        sample name\n\n    '''\n    logger.info(\"Reading file : {}\".format(mash_output))\n    read_mash_output = open(mash_output)\n\n    dic = {}\n    median_list = []\n    filtered_dic = {}\n\n    logger.info(\"Generating dictionary and list to pre-process the final json\")\n    for line in read_mash_output:\n        tab_split = line.split(\"\\t\")\n        identity = tab_split[0]\n        # shared_hashes = tab_split[1]\n        median_multiplicity = tab_split[2]\n        # p_value = tab_split[3]\n        query_id = tab_split[4]\n        # query-comment should not exist here and it is irrelevant\n\n        # here identity is what in fact interests to report to json but\n        # median_multiplicity also is important since it gives an rough\n        # estimation of the coverage depth for each plasmid.\n        # Plasmids should have higher coverage depth due to their increased\n        # copy number in relation to the chromosome.\n        dic[query_id] = [identity, median_multiplicity]\n        median_list.append(float(median_multiplicity))\n\n    output_json = open(\" \".join(mash_output.split(\".\")[:-1]) + \".json\", \"w\")\n\n    # median cutoff is twice the median of all median_multiplicity values\n    # reported by mash screen. 
In the case of plasmids, since the database\n    # has around 9k entries and a read set should only match a small subset of\n    # them, this is a reasonable heuristic.\n    if len(median_list) > 0:\n        # this statement ensures that median_list has at least one entry\n        median_cutoff = median(median_list)\n        logger.info(\"Generating final json to dump to a file\")\n        for k, v in dic.items():\n            # estimated copy number\n            copy_number = int(float(v[1]) / median_cutoff)\n            # keep only plasmids whose coverage depth is above the median cutoff\n            if float(v[1]) > median_cutoff:\n                filtered_dic[\"_\".join(k.split(\"_\")[0:3])] = [\n                    round(float(v[0]), 2),\n                    copy_number\n                ]\n        logger.info(\n            \"Exported dictionary has {} entries\".format(len(filtered_dic)))\n    else:\n        # if no entries were found, log an error\n        logger.error(\"No matches were found using mash screen for the queried \"\n                     \"reads\")\n\n    output_json.write(json.dumps(filtered_dic))\n    output_json.close()\n\n    json_dic = {\n        \"tableRow\": [{\n            \"sample\": sample_id,\n            \"data\": [{\n                \"header\": \"Mash Screen\",\n                \"table\": \"plasmids\",\n                \"patlas_mashscreen\": filtered_dic,\n                \"value\": len(filtered_dic)\n            }]\n        }],\n    }\n\n    with open(\".report.json\", \"w\") as json_report:\n        json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n\nif __name__ == \"__main__\":\n\n    main(MASH_TXT, SAMPLE_ID)\n"
  },
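The filtering step in mashscreen2json keeps only hits whose median multiplicity exceeds the median across all hits, and derives a rough copy-number estimate from that ratio. A toy illustration with invented accessions and values:

# Toy illustration of the median-multiplicity filter: keep only hits above the
# median across all hits, and report an estimated copy number relative to it.
from statistics import median

hits = {
    "NC_000001_1": ("0.999", "40"),   # (identity, median_multiplicity)
    "NC_000002_1": ("0.950", "10"),
    "NC_000003_1": ("0.900", "5"),
}

cutoff = median(float(m) for _, m in hits.values())   # 10
filtered = {}
for acc, (identity, multiplicity) in hits.items():
    if float(multiplicity) > cutoff:
        filtered[acc] = [round(float(identity), 2), int(float(multiplicity) / cutoff)]

print(filtered)   # {'NC_000001_1': [1.0, 4]}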
  {
    "path": "flowcraft/templates/megahit.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended execute megahit on paired-end FastQ files.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``fastq_pair`` : Pair of FastQ file paths.\n    - e.g.: ``'SampleA_1.fastq.gz SampleA_2.fastq.gz'``\n- ``kmers`` : Setting for megahit kmers. Can be either ``'auto'``, \\\n    ``'default'`` or a user provided list. All must be odd, in the range 15-255, increment <= 28\n    - e.g.: ``'auto'`` or ``'default'`` or ``'55 77 99 113 127'``\n- ``clear`` : If 'true', remove the input fastq files at the end of the\n    component run, IF THE FILES ARE IN THE WORK DIRECTORY\n\nGenerated output\n----------------\n\n- ``contigs.fa`` : Main output of megahit with the assembly\n    - e.g.: ``contigs.fa``\n- ``megahit_status`` :  Stores the status of the megahit run. If it was \\\n    successfully executed, it stores ``'pass'``. Otherwise, it stores the\\\n    ``STDERR`` message.\n    - e.g.: ``'pass'``\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.1\"\n__build__ = \"26042018\"\n__template__ = \"megahit-nf\"\n\nimport os\nimport re\nimport subprocess\n\nfrom subprocess import PIPE\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\ndef is_odd(k_mer):\n    for i in k_mer:\n        if i % 2 != 0:\n            return True\n    return False\n\n\ndef __get_version_megahit():\n\n    try:\n\n        cli = [\"megahit\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout, _ = p.communicate()\n\n        version = stdout.strip().split()[-1][1:].decode(\"utf8\")\n\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return {\n        \"program\": \"megahit\",\n        \"version\": version,\n    }\n\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    FASTQ_PAIR = '$fastq_pair'.split()\n    MAX_LEN = int('$max_len'.strip())\n    KMERS = '$kmers'.strip()\n    MEM = '$task.memory'\n    CLEAR = '$clear'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"FASTQ_PAIR: {}\".format(FASTQ_PAIR))\n    logger.debug(\"MAX_LEN: {}\".format(MAX_LEN))\n    logger.debug(\"KMERS: {}\".format(KMERS))\n    logger.debug(\"CLEAR: {}\".format(CLEAR))\n\n\ndef set_kmers(kmer_opt, max_read_len):\n    \"\"\"Returns a kmer list based on the provided kmer option and max read len.\n\n    Parameters\n    ----------\n    kmer_opt : str\n        The k-mer option. 
Can be either ``'auto'``, ``'default'`` or a\n        sequence of space separated integers, ``'23, 45, 67'``.\n    max_read_len : int\n        The maximum read length of the current sample.\n\n    Returns\n    -------\n    kmers : list\n        List of k-mer values that will be provided to megahit.\n\n    \"\"\"\n\n    logger.debug(\"Kmer option set to: {}\".format(kmer_opt))\n\n    # Check if kmer option is set to auto\n    if kmer_opt == \"auto\":\n\n        if max_read_len >= 175:\n            kmers = [55, 77, 99, 113, 127]\n        else:\n            kmers = [21, 33, 55, 67, 77]\n\n        logger.debug(\"Kmer range automatically selected based on max read\"\n                     \"length of {}: {}\".format(max_read_len, kmers))\n\n    # Check if manual k-mers were specified\n    elif len(kmer_opt.split()) > 1:\n\n        kmers = kmer_opt.split()\n        if kmers[0]<15 or kmers[-1]>255 or is_odd(kmers):\n            kmers = []\n            logger.debug(\"Kmer out of range or with even numbers\"\n                         \"(will be automatically determined by megahit\")\n        else:\n            logger.debug(\"Kmer range manually set to: {}\".format(kmers))\n\n    else:\n\n        kmers = []\n        logger.debug(\"Kmer range set to empty (will be automatically \"\n                     \"determined by megahit\")\n\n    return kmers\n\n\ndef fix_contig_names(asseembly_path):\n    \"\"\"Removes whitespace from the assembly contig names\n\n    Parameters\n    ----------\n    asseembly_path : path to assembly file\n\n    Returns\n    -------\n    str:\n        Path to new assembly file with fixed contig names\n    \"\"\"\n\n    fixed_assembly = \"fixed_assembly.fa\"\n\n    with open(asseembly_path) as in_hf, open(fixed_assembly, \"w\") as ou_fh:\n\n        for line in in_hf:\n\n            if line.startswith(\">\"):\n                fixed_line = line.replace(\" \", \"_\")\n                ou_fh.write(fixed_line)\n            else:\n                ou_fh.write(line)\n\n    return fixed_assembly\n\n\ndef clean_up(fastq):\n    \"\"\"\n    Cleans the temporary fastq files. If they are symlinks, the link\n    source is removed\n\n    Parameters\n    ----------\n    fastq : list\n        List of fastq files.\n    \"\"\"\n\n    for fq in fastq:\n        # Get real path of fastq files, following symlinks\n        rp = os.path.realpath(fq)\n        logger.debug(\"Removing temporary fastq file path: {}\".format(rp))\n        if re.match(\".*/work/.{2}/.{30}/.*\", rp):\n            os.remove(rp)\n\n\n@MainWrapper\ndef main(sample_id, fastq_pair, max_len, kmer, mem, clear):\n    \"\"\"Main executor of the megahit template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    fastq_pair : list\n        Two element list containing the paired FastQ files.\n    max_len : int\n        Maximum read length. 
This value is determined in\n        :py:class:`templates.integrity_coverage`\n    kmer : str\n        Can be either ``'auto'``, ``'default'`` or a\n        sequence of space separated integers, ``'23, 45, 67'``.\n\n    \"\"\"\n\n    logger.info(\"Starting megahit\")\n\n    logger.info(\"Setting megahit kmers\")\n    kmers = set_kmers(kmer, max_len)\n    logger.info(\"megahit kmers set to: {}\".format(kmers))\n\n    mem_bytes = int(mem.replace(\" GB\", \"\")) * 1073741824\n\n    cli = [\n        \"megahit\",\n        \"--num-cpu-threads\",\n        \"$task.cpus\",\n        \"--memory\",\n        str(mem_bytes),\n        \"-o\",\n        \"megahit\"\n    ]\n\n    # Add kmers, if any were specified\n    if kmers:\n        cli += [\n            \"--k-list\",\n            \",\".join([str(x) for x in kmers])\n        ]\n\n    # Add FastQ files\n    cli += [\n        \"-1\",\n        fastq_pair[0],\n        \"-2\",\n        fastq_pair[1]\n    ]\n\n    logger.debug(\"Running megahit subprocess with command: {}\".format(cli))\n\n    p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n    stdout, stderr = p.communicate()\n\n    # Attempt to decode STDERR output from bytes. If unsuccessful, coerce to\n    # string\n    try:\n        stderr = stderr.decode(\"utf8\")\n        stdout = stdout.decode(\"utf8\")\n    except (UnicodeDecodeError, AttributeError):\n        stderr = str(stderr)\n        stdout = str(stdout)\n\n    logger.info(\"Finished megahit subprocess with STDOUT:\\\\n\"\n                \"======================================\\\\n{}\".format(stdout))\n    logger.info(\"Fished megahit subprocesswith STDERR:\\\\n\"\n                \"======================================\\\\n{}\".format(stderr))\n    logger.info(\"Finished megahit with return code: {}\".format(\n        p.returncode))\n\n    with open(\".status\", \"w\") as fh:\n        if p.returncode != 0:\n            fh.write(\"error\")\n            return\n        else:\n            fh.write(\"pass\")\n\n    assembly_path = \"megahit/final.contigs.fa\"\n    fixed_assembly = fix_contig_names(assembly_path)\n\n    # Change the default final.contigs.fa assembly name to a more informative\n    #  one\n    if \"_trim.\" in fastq_pair[0]:\n        sample_id += \"_trim\"\n    # Get megahit version for output name\n    info = __get_version_megahit()\n\n    assembly_file = \"{}_megahit{}.fasta\".format(\n        sample_id, info[\"version\"].replace(\".\", \"\"))\n    os.rename(fixed_assembly, assembly_file)\n    logger.info(\"Setting main assembly file to: {}\".format(assembly_file))\n\n    # Remove input fastq files when clear option is specified.\n    # Only remove temporary input when the expected output exists.\n    if clear == \"true\" and os.path.exists(assembly_file):\n        clean_up(fastq_pair)\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, FASTQ_PAIR, MAX_LEN, KMERS, MEM, CLEAR)\n"
  },
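The manual-kmer branch of set_kmers above is meant to reject k-mer lists that fall outside the 15-255 range or contain even values, but as written it compares the string tokens from split() directly against integers, and the is_odd helper returns True for lists containing odd (i.e. valid) k-mers. A small standalone sketch of the validation as it appears to be intended (hypothetical helper, not the template function):

# Hedged sketch of the intended manual k-mer validation: all values must be odd
# integers within the 15-255 range, otherwise fall back to megahit's defaults.
def validate_kmers(kmer_opt):
    try:
        kmers = [int(k) for k in kmer_opt.split()]
    except ValueError:
        return []                                   # non-numeric input: let megahit decide
    if kmers and all(15 <= k <= 255 and k % 2 == 1 for k in kmers):
        return kmers
    return []                                       # out of range or even values


print(validate_kmers("55 77 99 113 127"))   # [55, 77, 99, 113, 127]
print(validate_kmers("54 77"))              # [] (54 is even)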
  {
    "path": "flowcraft/templates/metaspades.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended execute metaSpades on paired-end FastQ files.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``fastq_pair`` : Pair of FastQ file paths.\n    - e.g.: ``'SampleA_1.fastq.gz SampleA_2.fastq.gz'``\n- ``kmers`` : Setting for Spades kmers. Can be either ``'auto'``, \\\n    ``'default'`` or a user provided list.\n    - e.g.: ``'auto'`` or ``'default'`` or ``'55 77 99 113 127'``\n\nGenerated output\n----------------\n\n- ``contigs.fasta`` : Main output of spades with the assembly\n    - e.g.: ``contigs.fasta``\n- ``spades_status`` :  Stores the status of the spades run. If it was \\\n    successfully executed, it stores ``'pass'``. Otherwise, it stores the\\\n    ``STDERR`` message.\n    - e.g.: ``'pass'``\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.1\"\n__build__ = \"16012018\"\n__template__ = \"metaspades-nf\"\n\nimport os\nimport re\nimport subprocess\n\nfrom subprocess import PIPE\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\ndef __get_version_spades():\n\n    try:\n\n        cli = [\"metaspades.py\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout, _ = p.communicate()\n\n        version = stdout.strip().split()[-1][1:].decode(\"utf8\")\n\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return {\n        \"program\": \"metaSPAdes\",\n        \"version\": version,\n    }\n\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    FASTQ_PAIR = '$fastq_pair'.split()\n    MAX_LEN = int('$max_len'.strip())\n    KMERS = '$kmers'.strip()\n    CLEAR = '$clear'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"FASTQ_PAIR: {}\".format(FASTQ_PAIR))\n    logger.debug(\"MAX_LEN: {}\".format(MAX_LEN))\n    logger.debug(\"KMERS: {}\".format(KMERS))\n    logger.debug(\"CLEAR: {}\".format(CLEAR))\n\n\ndef clean_up(fastq):\n    \"\"\"\n    Cleans the temporary fastq files. If they are symlinks, the link\n    source is removed\n\n    Parameters\n    ----------\n    fastq : list\n        List of fastq files.\n    \"\"\"\n\n    for fq in fastq:\n        # Get real path of fastq files, following symlinks\n        rp = os.path.realpath(fq)\n        logger.debug(\"Removing temporary fastq file path: {}\".format(rp))\n        if re.match(\".*/work/.{2}/.{30}/.*\", rp):\n            os.remove(rp)\n\n\ndef set_kmers(kmer_opt, max_read_len):\n    \"\"\"Returns a kmer list based on the provided kmer option and max read len.\n\n    Parameters\n    ----------\n    kmer_opt : str\n        The k-mer option. 
Can be either ``'auto'``, ``'default'`` or a\n        sequence of space separated integers, ``'23, 45, 67'``.\n    max_read_len : int\n        The maximum read length of the current sample.\n\n    Returns\n    -------\n    kmers : list\n        List of k-mer values that will be provided to Spades.\n\n    \"\"\"\n\n    logger.debug(\"Kmer option set to: {}\".format(kmer_opt))\n\n    # Check if kmer option is set to auto\n    if kmer_opt == \"auto\":\n\n        if max_read_len >= 175:\n            kmers = [55, 77, 99, 113, 127]\n        else:\n            kmers = [21, 33, 55, 67, 77]\n\n        logger.debug(\"Kmer range automatically selected based on max read\"\n                     \"length of {}: {}\".format(max_read_len, kmers))\n\n    # Check if manual kmers were specified\n    elif len(kmer_opt.split()) > 1:\n\n        kmers = kmer_opt.split()\n        logger.debug(\"Kmer range manually set to: {}\".format(kmers))\n\n    else:\n\n        kmers = []\n        logger.debug(\"Kmer range set to empty (will be automatically \"\n                     \"determined by SPAdes\")\n\n    return kmers\n\n\n@MainWrapper\ndef main(sample_id, fastq_pair, max_len, kmer, clear):\n    \"\"\"Main executor of the spades template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    fastq_pair : list\n        Two element list containing the paired FastQ files.\n    max_len : int\n        Maximum read length. This value is determined in\n        :py:class:`templates.integrity_coverage`\n    kmer : str\n        Can be either ``'auto'``, ``'default'`` or a\n        sequence of space separated integers, ``'23, 45, 67'``.\n\n    \"\"\"\n\n    logger.info(\"Starting spades\")\n\n    logger.info(\"Setting SPAdes kmers\")\n    kmers = set_kmers(kmer, max_len)\n    logger.info(\"SPAdes kmers set to: {}\".format(kmers))\n\n    cli = [\n        \"metaspades.py\",\n        \"--only-assembler\",\n        \"--threads\",\n        \"$task.cpus\",\n        \"-o\",\n        \".\"\n    ]\n\n    # Add kmers, if any were specified\n    if kmers:\n        cli += [\"-k {}\".format(\",\".join([str(x) for x in kmers]))]\n\n    # Add FastQ files\n    cli += [\n        \"-1\",\n        fastq_pair[0],\n        \"-2\",\n        fastq_pair[1]\n    ]\n\n    logger.debug(\"Running metaSPAdes subprocess with command: {}\".format(cli))\n\n    p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n    stdout, stderr = p.communicate()\n\n    # Attempt to decode STDERR output from bytes. 
If unsuccessful, coerce to\n    # string\n    try:\n        stderr = stderr.decode(\"utf8\")\n        stdout = stdout.decode(\"utf8\")\n    except (UnicodeDecodeError, AttributeError):\n        stderr = str(stderr)\n        stdout = str(stdout)\n\n    logger.info(\"Finished metaSPAdes subprocess with STDOUT:\\\\n\"\n                \"======================================\\\\n{}\".format(stdout))\n    logger.info(\"Fished metaSPAdes subprocesswith STDERR:\\\\n\"\n                \"======================================\\\\n{}\".format(stderr))\n    logger.info(\"Finished metaSPAdes with return code: {}\".format(\n        p.returncode))\n\n    with open(\".status\", \"w\") as fh:\n        if p.returncode != 0:\n            fh.write(\"error\")\n            return\n        else:\n            fh.write(\"pass\")\n\n    # Change the default contigs.fasta assembly name to a more informative one\n    if \"_trim.\" in fastq_pair[0]:\n        sample_id += \"_trim\"\n\n    assembly_file = \"{}_metaspades.fasta\".format(\n        sample_id)\n    os.rename(\"contigs.fasta\", assembly_file)\n    logger.info(\"Setting main assembly file to: {}\".format(assembly_file))\n\n    # Remove input fastq files when clear option is specified.\n    # Only remove temporary input when the expected output exists.\n    if clear == \"true\" and os.path.exists(assembly_file):\n        clean_up(fastq_pair)\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, FASTQ_PAIR, MAX_LEN, KMERS, CLEAR)\n"
  },
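clean_up in both assembly templates only deletes an input fastq when its resolved path matches the Nextflow work-directory layout, so files that resolve outside a work directory are left alone. A compact illustration of that guard with invented paths:

# Demonstrates the work-directory guard used by clean_up: a file is only removed
# when its real path looks like .../work/<2 chars>/<30 chars>/... (Nextflow layout).
import re

WORK_RE = re.compile(r".*/work/.{2}/.{30}/.*")

paths = [
    "/data/pipeline/work/ab/123456789012345678901234567890/sample_1.fastq.gz",
    "/home/user/raw_reads/sample_1.fastq.gz",
]

for p in paths:
    action = "remove" if WORK_RE.match(p) else "keep"
    print(action, p)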
  {
    "path": "flowcraft/templates/pATLAS_consensus_json.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to generate a json output from the consensus results from\nall the approaches available through options (mapping, assembly, mash screen)\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``mapping_json`` : String with the name of the json file with mapping results.\n    - e.g.: ``'mapping_SampleA.json'``\n- ``dist_json`` : String with the name of the json file with mash dist results.\n    - e.g.: ``'mash_dist_SampleA.json'``\n- ``screen_json`` : String with the name of the json file with mash screen results.\n    - e.g.: ``'mash_screen_sampleA.json'``\n\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"0.1.0\"\n__build__ = \"24022018\"\n__template__ = \"pATLAS_consensus_json-nf\"\n\nimport os\nimport json\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    LIST_OF_FILES = '$infile_list'.split()\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"LIST_OF_FILES: {}\".format(LIST_OF_FILES))\n\n\n@MainWrapper\ndef main(list_of_jsons):\n    \"\"\"\n\n    Parameters\n    ----------\n    list_of_jsons: list\n        A list of files provided by fullConsensus process provided by nextflow\n\n    \"\"\"\n\n    # first lets gather a collection of the input and their corresponding dicts\n    file_correspondence = {}\n\n    for infile in list_of_jsons:\n        file_dict = json.load(open(infile))\n        file_correspondence[infile] = file_dict\n\n    json_dict = {}\n    for accession in list(file_correspondence.values())[0]:\n        if all([True if accession in f_dict else False\n                for f_dict in file_correspondence.values()]):\n            accession_dict = {}\n            for infile in file_correspondence.keys():\n                accession_dict[infile] = file_correspondence[infile][accession]\n\n            json_dict[accession] = accession_dict\n\n    out_file = open(\"consensus_{}.json\".format(\n        list_of_jsons[0].split(\".\")[0].split(\"_\")[-1]), \"w\")\n\n    out_file.write(json.dumps(json_dict))\n    out_file.close()\n\n    json_dic = {\n        \"patlas_mashscreen\": json_dict\n        # TODO add information for report webapp\n    }\n\n    with open(\".report.json\", \"w\") as json_report:\n        json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n\nif __name__ == \"__main__\":\n    main(LIST_OF_FILES)"
  },
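The consensus step above keeps only the accessions present in every input JSON and records, per accession, the value reported by each file. A minimal sketch of that intersection with invented file names and values:

# Minimal sketch of the consensus step: keep only accessions present in every
# input dictionary, and record each file's value for those accessions.
mapping = {"NC_1": 0.95, "NC_2": 0.80}
dist = {"NC_1": [0.96, 0.9, "contig_1"], "NC_3": [0.9, 0.85, "contig_2"]}
inputs = {"mapping_sampleA.json": mapping, "mash_dist_sampleA.json": dist}

consensus = {}
for accession in next(iter(inputs.values())):
    if all(accession in d for d in inputs.values()):
        consensus[accession] = {name: d[accession] for name, d in inputs.items()}

print(consensus)   # only NC_1 appears in both inputs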
  {
    "path": "flowcraft/templates/pipeline_status.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to collect pipeline run statistics (such as\ntime, cpu, RAM for each tasks) into a report JSON\n\nExpected input\n--------------\n\n- ``trace_file`` : *Trace file generated by nextflow*\n\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.0\"\n__build__ = \"16012018\"\n__template__ = \"pipeline_status-nf\"\n\n\nimport os\nimport json\nimport traceback\n\nfrom os.path import join\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\nLOG_STATS = \".pipeline_status.json\"\n\nif __file__.endswith(\".command.sh\"):\n    fastq_id = 'sample_id'\n    TRACE_FILE = 'pipeline_stats.txt'\n    WORKDIR = '${workflow.projectDir}'\n\n\ndef get_json_info(fields, header):\n    \"\"\"\n\n    Parameters\n    ----------\n    fields\n\n    Returns\n    -------\n\n    \"\"\"\n\n    json_dic = dict((x, y) for x, y in zip(header, fields))\n\n    return json_dic\n\n\ndef get_previous_stats(stats_path):\n    \"\"\"\n\n    Parameters\n    ----------\n    workdir\n\n    Returns\n    -------\n\n    \"\"\"\n\n    logger.debug(\"Path to pipeline status data set to: {}\".format(stats_path))\n    if os.path.exists(stats_path):\n        logger.debug(\"Existing pipeline status data found. Loading JSON.\")\n        with open(stats_path) as fh:\n            stats_json = json.load(fh)\n\n    else:\n        logger.debug(\"No pipeline status data found.\")\n        stats_json = {}\n\n    return stats_json\n\n\n@MainWrapper\ndef main(sample_id, trace_file, workdir):\n    \"\"\"\n    Parses a nextflow trace file, searches for processes with a specific tag\n    and sends a JSON report with the relevant information\n\n    The expected fields for the trace file are::\n\n        0. task_id\n        1. process\n        2. tag\n        3. status\n        4. exit code\n        5. start timestamp\n        6. container\n        7. cpus\n        8. duration\n        9. realtime\n        10. queue\n        11. cpu percentage\n        12. memory percentage\n        13. real memory size of the process\n        14. virtual memory size of the process\n\n    Parameters\n    ----------\n    trace_file : str\n        Path to the nextflow trace file\n    \"\"\"\n\n    # Determine the path of the stored JSON for the sample_id\n    stats_suffix = \".stats.json\"\n    stats_path = join(workdir, sample_id + stats_suffix)\n    trace_path = join(workdir, trace_file)\n\n    logger.info(\"Starting pipeline status routine\")\n\n    logger.debug(\"Checking for previous pipeline status data\")\n    stats_array = get_previous_stats(stats_path)\n    logger.info(\"Stats JSON object set to : {}\".format(stats_array))\n\n    # Search for this substring in the tags field. 
Only lines with this\n    # tag will be processed for the reports\n    tag = \" getStats\"\n    logger.debug(\"Tag variable set to: {}\".format(tag))\n\n    logger.info(\"Starting parsing of trace file: {}\".format(trace_path))\n    with open(trace_path) as fh:\n\n        header = next(fh).strip().split()\n        logger.debug(\"Header set to: {}\".format(header))\n\n        for line in fh:\n            fields = line.strip().split(\"\\t\")\n            # Check if tag substring is in the tag field of the nextflow trace\n            if tag in fields[2] and fields[3] == \"COMPLETED\":\n                logger.debug(\n                    \"Parsing trace line with COMPLETED status: {}\".format(\n                        line))\n                current_json = get_json_info(fields, header)\n\n                stats_array[fields[0]] = current_json\n            else:\n                logger.debug(\n                    \"Ignoring trace line without COMPLETED status\"\n                    \" or stats specific tag: {}\".format(\n                        line))\n\n    with open(join(stats_path), \"w\") as fh, open(\".report.json\", \"w\") as rfh:\n        fh.write(json.dumps(stats_array, separators=(\",\", \":\")))\n        rfh.write(json.dumps(stats_array, separators=(\",\", \":\")))\n\n\nif __name__ == \"__main__\":\n\n    main(fastq_id, TRACE_FILE, WORKDIR)\n"
  },
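The trace parsing above pairs each tab-separated field with its header column and keeps only COMPLETED tasks whose tag carries the stats marker. A toy version with fabricated, heavily truncated trace lines:

# Toy version of the trace parsing: zip header columns with the fields of each
# line, keeping only COMPLETED tasks tagged for stats collection.
header = ["task_id", "process", "tag", "status", "exit"]
trace_lines = [
    "1\tintegrity_coverage\tsampleA getStats\tCOMPLETED\t0",
    "2\tfastqc\tsampleA\tCOMPLETED\t0",
    "3\tspades\tsampleA getStats\tFAILED\t1",
]

stats = {}
for line in trace_lines:
    fields = line.split("\t")
    if " getStats" in fields[2] and fields[3] == "COMPLETED":
        stats[fields[0]] = dict(zip(header, fields))

print(stats)   # only task_id '1' is kept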
  {
    "path": "flowcraft/templates/process_abricate.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended parse the results of the Abricate for one or more\nsamples.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``abricate_files`` : Path to abricate output file.\n    - e.g.: ``'abr_resfinder.tsv'``\n\nGenerated output\n----------------\n\nNone\n\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.1\"\n__build__ = \"26032018\"\n__template__ = \"process_abricate-nf\"\n\nimport re\nimport os\nimport json\nimport operator\nimport subprocess\n\nfrom subprocess import PIPE\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\ndef __get_version_abricate():\n\n    try:\n\n        # Get abricate version\n        cli = [\"abricate\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout, _ = p.communicate()\n\n        version = stdout.strip().split()[-1].decode(\"utf8\")\n\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    try:\n\n        # Get abricate database versions\n        cli = [\"abricate\", \"--list\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        dbout, _ = p.communicate()\n\n        databases = [[u.decode(\"utf8\") for u in i.strip().split()]\n                     for i in dbout.splitlines()][1:]\n\n    except Exception as e:\n        logger.debug(e)\n        databases = \"undefined\"\n\n    return {\n        \"program\": \"abricate\",\n        \"version\": version,\n        \"databases\": databases\n    }\n\n\nif __file__.endswith(\".command.sh\"):\n    ABRICATE_FILES = '$abricate_file'.split()\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"ABRICATE_FILE: {}\".format(ABRICATE_FILES))\n\n\nclass Abricate:\n    \"\"\"Main parser for Abricate output files.\n\n    This class parses one or more output files from Abricate, usually from\n    different databases. In addition to the parsing methods, it also provides\n    a flexible method to filter and re-format the content of the abricate\n    files.\n\n    Parameters\n    ----------\n    fls : list\n       List of paths to Abricate output files.\n    \"\"\"\n\n    def __init__(self, fls):\n\n        self.storage = {}\n        \"\"\"\n        dic: Main storage of Abricate's file content. 
Each entry corresponds\n        to a single line and contains the keys::\n\n            - ``log_file``: Name of the summary log file containing abricate\n              results\n            - ``infile``: Input file of Abricate.\n            - ``reference``: Reference of the query sequence.\n            - ``seq_range``: Range of the query sequence in the database\n             sequence.\n            - ``gene``: AMR gene name.\n            - ``accession``: The genomic source of the sequence.\n            - ``database``: The database the sequence came from.\n            - ``coverage``: Proportion of gene covered.\n            - ``identity``: Proportion of exact nucleotide matches.\n        \"\"\"\n\n        self._key = 0\n        \"\"\"\n        int: Arbitrary key for unique entries in the storage attribute\n        \"\"\"\n\n        self.parse_files(fls)\n\n    def parse_files(self, fls):\n        \"\"\"Public method for parsing abricate output files.\n\n        This method is called at at class instantiation for the provided\n        output files. Additional abricate output files can be added using\n        this method after the class instantiation.\n\n        Parameters\n        ----------\n        fls : list\n            List of paths to Abricate files\n\n        \"\"\"\n\n        for f in fls:\n            # Make sure paths exists\n            if os.path.exists(f):\n                self._parser(f)\n            else:\n                logger.warning(\"File {} does not exist\".format(f))\n\n    def _parser(self, fl):\n        \"\"\"Parser for a single abricate output file.\n\n        This parser will scan a single Abricate output file and populate\n        the :py:attr:`Abricate.storage` attribute.\n\n        Parameters\n        ----------\n        fl : str\n            Path to abricate output file\n\n        Notes\n        -----\n        This method will populate the :py:attr:`Abricate.storage` attribute\n        with all compliant lines in the abricate output file. 
Entries are\n        inserted using an arbitrary key that is set by the\n        :py:attr:`Abricate._key` attribute.\n\n        \"\"\"\n\n        with open(fl) as fh:\n\n            for line in fh:\n                # Skip header and comment lines\n                if line.startswith(\"#\") or line.strip() == \"\":\n                    continue\n\n                fields = line.strip().split(\"\\t\")\n\n                try:\n                    coverage = float(fields[8])\n                except ValueError:\n                    coverage = None\n                try:\n                    identity = float(fields[9])\n                except ValueError:\n                    identity = None\n\n                try:\n                    accession = fields[11]\n                except IndexError:\n                    accession = None\n\n                self.storage[self._key] = {\n                    \"log_file\": os.path.basename(fl),\n                    \"infile\": fields[0],\n                    \"reference\": fields[1],\n                    \"seq_range\": (int(fields[2]), int(fields[3])),\n                    \"gene\": fields[4],\n                    \"accession\": accession,\n                    \"database\": fields[10],\n                    \"coverage\": coverage,\n                    \"identity\": identity\n                }\n\n                self._key += 1\n\n    @staticmethod\n    def _test_truth(x, op, y):\n        \"\"\" Test the truth of a comparison between x and y using an operator.\n\n        If you want to compare '100 > 200', this method can be called as\n        self._test_truth(100, \">\", 200).\n\n        Parameters\n        ----------\n        x : int\n            Arbitrary value to compare in the left.\n        op : str\n            Comparison operator.\n        y : int\n            Arbitrary value to compare in the right.\n\n        Returns\n        -------\n        x : bool\n            The 'truthness' of the test.\n        \"\"\"\n\n        ops = {\n            \">\": operator.gt,\n            \"<\": operator.lt,\n            \">=\": operator.ge,\n            \"<=\": operator.le,\n            \"==\": operator.eq,\n            \"!=\": operator.ne\n        }\n\n        return ops[op](x, y)\n\n    def iter_filter(self, filters, databases=None, fields=None,\n                    filter_behavior=\"and\"):\n        \"\"\"General purpose filter iterator.\n\n        This general filter iterator allows the filtering of entries based\n        on one or more custom filters. These filters must contain\n        an entry of the `storage` attribute, a comparison operator, and the\n        test value. For example, to filter out entries with coverage below 80::\n\n            my_filter = [\"coverage\", \">=\", 80]\n\n        Filters should always be provide as a list of lists::\n\n            iter_filter([[\"coverage\", \">=\", 80]])\n            # or\n            my_filters = [[\"coverage\", \">=\", 80],\n                          [\"identity\", \">=\", 50]]\n\n            iter_filter(my_filters)\n\n        As a convenience, a list of the desired databases can be directly\n        specified using the `database` argument, which will only report\n        entries for the specified databases::\n\n            iter_filter(my_filters, databases=[\"plasmidfinder\"])\n\n        By default, this method will yield the complete entry record. 
However,\n        the returned filters can be specified using the `fields` option::\n\n            iter_filter(my_filters, fields=[\"reference\", \"coverage\"])\n\n        Parameters\n        ----------\n        filters : list\n            List of lists with the custom filter. Each list should have three\n            elements. (1) the key from the entry to be compared; (2) the\n            comparison operator; (3) the test value. Example:\n                ``[[\"identity\", \">\", 80]]``.\n        databases : list\n            List of databases that should be reported.\n        fields : list\n            List of fields from each individual entry that are yielded.\n        filter_behavior : str\n            options: ``'and'`` ``'or'``\n            Sets the behaviour of the filters, if multiple filters have been\n            provided. By default it is set to ``'and'``, which means that an\n            entry has to pass all filters. It can be set to ``'or'``, in which\n            case one one of the filters has to pass.\n\n        yields\n        ------\n        dic : dict\n            Dictionary object containing a :py:attr:`Abricate.storage` entry\n            that passed the filters.\n\n        \"\"\"\n\n        if filter_behavior not in [\"and\", \"or\"]:\n            raise ValueError(\"Filter behavior must be either 'and' or 'or'\")\n\n        for dic in self.storage.values():\n\n            # This attribute will determine whether an entry will be yielded\n            # or not\n            _pass = False\n\n            # Stores the flags with the test results for each filter\n            # The results will be either True or False\n            flag = []\n\n            # Filter for databases\n            if databases:\n                # Skip entry if not in specified database\n                if dic[\"database\"] not in databases:\n                    continue\n\n            # Apply filters\n            for f in filters:\n                # Get value of current filter\n                val = dic[f[0]]\n                if not self._test_truth(val, f[1], f[2]):\n                    flag.append(False)\n                else:\n                    flag.append(True)\n\n            # Test whether the entry will pass based on the test results\n            # and the filter behaviour\n            if filter_behavior == \"and\":\n                if all(flag):\n                    _pass = True\n            elif filter_behavior == \"or\":\n                if any(flag):\n                    _pass = True\n\n            if _pass:\n                if fields:\n                    yield dict((x, y) for x, y in dic.items() if x in fields)\n                else:\n                    yield dic\n\n    def get_filter(self, *args, **kwargs):\n        \"\"\" Wrapper of the iter_filter method that returns a list with results\n\n        It should be called exactly as in the `iter_filter`\n\n        Returns\n        -------\n        _ : list\n            List of dictionary entries that passed the filters in the\n            `iter_filter` method.\n\n        See Also\n        --------\n        iter_filter\n        \"\"\"\n\n        return list(self.iter_filter(*args, **kwargs))\n\n\nclass AbricateReport(Abricate):\n    \"\"\"Report generator for single Abricate output files\n\n    This class is intended to parse an Abricate output file from a single\n    sample and database and generates a JSON report for the report webpage.\n\n    Parameters\n    ----------\n    fls : list\n       List of paths to Abricate output files.\n   
 database : (optional) str\n        Name of the database for the current report. If not provided, it will\n        be inferred based on the first entry of the Abricate file.\n    \"\"\"\n\n    def __init__(self, *args, **kwargs):\n        super().__init__(*args, **kwargs)\n\n    @staticmethod\n    def _get_contig_id(contig_str):\n        \"\"\"Tries to retrieve contig id. Returns the original string if it\n        is unable to retrieve the id.\n\n        Parameters\n        ----------\n        contig_str : str\n            Full contig string (fasta header)\n\n        Returns\n        -------\n        str\n            Contig id\n        \"\"\"\n\n        contig_id = contig_str\n\n        try:\n            contig_id = re.search(\".*NODE_([0-9]*)_.*\", contig_str).group(1)\n        except AttributeError:\n            pass\n\n        try:\n            contig_id = re.search(\".*Contig_([0-9]*)_.*\", contig_str).group(1)\n        except AttributeError:\n            pass\n\n        return contig_id\n\n    def get_plot_data(self):\n        \"\"\" Generates the JSON report to plot the gene boxes\n\n        Following the convention of the reports platform, this method returns\n        a list of JSON/dict objects with the information about each entry in\n        the abricate file. The information contained in this JSON is::\n\n            {contig_id: <str>,\n             seqRange: [<int>, <int>],\n             gene: <str>,\n             accession: <str>,\n             coverage: <float>,\n             identity: <float>\n             }\n\n        Note that the `seqRange` entry contains the position in the\n        corresponding contig, not the absolute position in the whole assembly.\n\n        Returns\n        -------\n        json_dic : list\n            List of JSON/dict objects with the report data.\n        \"\"\"\n\n        json_dic = {\"plotData\": []}\n        sample_dic = {}\n        sample_assembly_map = {}\n\n        for entry in self.storage.values():\n\n            sample_id = re.match(\"(.*)_abr\", entry[\"log_file\"]).groups()[0]\n            if sample_id not in sample_dic:\n                sample_dic[sample_id] = {}\n\n            # Get contig ID using the same regex as in `assembly_report.py`\n            # template\n            contig_id = self._get_contig_id(entry[\"reference\"])\n            # Get database\n            database = entry[\"database\"]\n            if database not in sample_dic[sample_id]:\n                sample_dic[sample_id][database] = []\n\n            # Update the sample-assembly correspondence dict\n            if sample_id not in sample_assembly_map:\n                sample_assembly_map[sample_id] = entry[\"infile\"]\n\n            sample_dic[sample_id][database].append(\n                {\"contig\": contig_id,\n                 \"seqRange\": entry[\"seq_range\"],\n                 \"gene\": entry[\"gene\"].replace(\"'\", \"\"),\n                 \"accession\": entry[\"accession\"],\n                 \"coverage\": entry[\"coverage\"],\n                 \"identity\": entry[\"identity\"],\n                 },\n            )\n\n        for sample, data in sample_dic.items():\n            json_dic[\"plotData\"].append(\n                {\n                    \"sample\": sample,\n                    \"data\": {\"abricateXrange\": data},\n                    \"assemblyFile\": sample_assembly_map[sample]\n                }\n            )\n\n        return json_dic\n\n    def get_table_data(self):\n        \"\"\"\n\n        Returns\n        -------\n\n        
\"\"\"\n\n        gene_storage = {}\n        json_dic = {\"tableRow\": []}\n        logger.info(\"Generating JSON table data\")\n\n        # Collect the gene lists for each database\n        for key, entry in self.storage.items():\n\n            # Retrieve and initiate new sample entry, if not present already\n            logger.debug(\"Retrieving sample if from: {}\".format(\n                entry[\"infile\"]))\n            sample_id = re.match(\"(.*)_abr\", entry[\"log_file\"]).groups()[0]\n            database = entry[\"database\"]\n\n            if sample_id not in gene_storage:\n                gene_storage[sample_id] = {}\n\n            if database not in gene_storage[sample_id]:\n                gene_storage[sample_id][database] = []\n\n            gene_storage[sample_id][database].append(\n                entry[\"gene\"].replace(\"'\", \"\").replace('\"', '')\n            )\n\n        # For each database, create the JSON report\n        for sample, table_data in gene_storage.items():\n\n            json_dic[\"tableRow\"].append({\n                \"sample\": sample,\n                \"data\": []\n            })\n\n            for db, gene_list in table_data.items():\n\n                ind_json = {\n                    \"table\": \"abricate\",\n                    \"header\": db,\n                    \"value\": len(gene_list),\n                    \"geneList\": gene_list\n                }\n                json_dic[\"tableRow\"][-1][\"data\"].append(ind_json)\n\n        return json_dic\n\n    def write_report_data(self):\n        \"\"\"Writes the JSON report to a json file\n        \"\"\"\n\n        json_plot = self.get_plot_data()\n        json_table = self.get_table_data()\n\n        json_dic = {**json_plot, **json_table}\n\n        with open(\".report.json\", \"w\") as json_report:\n            json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n\nif __name__ == '__main__':\n\n    @MainWrapper\n    def main(abr_file):\n\n        abr = AbricateReport(fls=abr_file)\n        abr.write_report_data()\n\n    main(ABRICATE_FILES)\n"
  },
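The Abricate parser above exposes its filtering through iter_filter/get_filter, taking [key, operator, value] triplets as described in its docstrings. A usage sketch, assuming the template and its flowcraft_utils helpers are importable as a module and that the abricate TSV file name and database label below (both hypothetical) exist:

# Usage sketch for the Abricate parser defined above; file name and database
# label are hypothetical examples.
from process_abricate import Abricate   # assumes the template is on the path

abr = Abricate(["sampleA_abr_resfinder.tsv"])

# Entries with >= 80% coverage AND >= 90% identity, restricted to one database
strong_hits = abr.get_filter(
    [["coverage", ">=", 80], ["identity", ">=", 90]],
    databases=["resfinder"],
    fields=["gene", "coverage", "identity"],
)

for hit in strong_hits:
    print(hit["gene"], hit["coverage"], hit["identity"])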
  {
    "path": "flowcraft/templates/process_assembly.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to process the output of assemblies from a single\nsample from programs such as Spades or Skesa.\nThe main input is an assembly file produced by an assembler, which will then be\nfiltered according to user-specified parameters.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id``: Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``assembly``: Fasta file with the assembly.\n    - e.g.: ``'contigs.fasta'``\n- ``opts``: List of options for processing spades assembly.\n    1. Minimum contig length.\n        - e.g.: ``'150'``\n    2. Minimum k-mer coverage.\n        - e.g.: ``'2'``\n    3. Maximum number of contigs per 1.5Mb.\n        - e.g.: ``'100'``\n- ``assembler``: The name of the assembler\n    - e.g.: ``spades``\n\nGenerated output\n----------------\n\n(Values within ``${}`` are substituted by the corresponding variable.)\n\n- ``'${sample_id}.assembly.fasta'`` : Fasta file with the filtered assembly.\n    - e.g.: ``'Sample1.assembly.fasta'``\n- ``${sample_id}.report.fasta`` : CSV file with the results of the filters for\\\n    each contig.\n    - e.g.: ``'Sample1.report.csv'``\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.1\"\n__build__ = \"11042018\"\n__template__ = \"process_assembly-nf\"\n\nimport os\nimport json\nimport operator\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    ASSEMBLY_FILE = '$assembly'\n    GSIZE = float('$gsize')\n    OPTS = [x.strip() for x in '$opts'.strip(\"[]\").split(\",\")]\n    ASSEMBLER = '$assembler'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"GSIZE: {}\".format(GSIZE))\n    logger.debug(\"OPTS: {}\".format(OPTS))\n    logger.debug(\"ASSEMBLER: {}\".format(ASSEMBLER))\n\n\nclass Assembly:\n    \"\"\"Class that parses and filters a Fasta assembly file\n\n    This class parses an assembly fasta file, collects a number\n    of summary statistics and metadata from the contigs, filters\n    contigs based on user-defined metrics and writes filtered assemblies\n    and reports.\n\n    Parameters\n    ----------\n    assembly_file : str\n        Path to assembly file.\n    min_contig_len : int\n        Minimum contig length when applying the initial assembly filter.\n    min_kmer_cov : int\n        Minimum k-mer coverage when applying the initial assembly.\n        filter.\n    sample_id : str\n        Name of the sample for the current assembly.\n    \"\"\"\n\n    def __init__(self, assembly_file, min_contig_len, min_kmer_cov,\n                 sample_id):\n\n        self.contigs = {}\n        \"\"\"\n        dict: Dictionary storing data for each contig.\n        \"\"\"\n\n        self.filtered_ids = []\n        \"\"\"\n        list: List of filtered contig_ids.\n        \"\"\"\n\n        self.min_gc = 0.05\n        \"\"\"\n        float: Sets the minimum GC content on a contig.\n        \"\"\"\n\n        self.sample = sample_id\n        \"\"\"\n        str: The name of the sample for the assembly.\n        \"\"\"\n\n        self.report = {}\n        \"\"\"\n        dict: Will contain the filtering results for each contig.\n        \"\"\"\n\n        self.filters = [\n            
[\"length\", \">=\", min_contig_len],\n            [\"kmer_cov\", \">=\", min_kmer_cov]\n        ]\n        \"\"\"\n        list: Setting initial filters to check when parsing the assembly file.\n        This can be later changed using the 'filter_contigs' method.\n        \"\"\"\n\n        # Parse assembly and populate self.contigs\n        self._parse_assembly(assembly_file)\n\n        # Perform first contig filtering using min_contig_len, min_kmer_cov,\n        # and gc content\n        self.filter_contigs(*self.filters)\n\n    @staticmethod\n    def _parse_coverage(header_str):\n        \"\"\"Attempts to retrieve the coverage value from the header string.\n\n        It splits the header by \"_\" and then screens the list backwards in\n        search of the first float value. This will be interpreted as the\n        coverage value. If it cannot find a float value, it returns None.\n        This search methodology is based on the strings of assemblers\n        like spades and skesa that put the mean kmer coverage for each\n        contig in its corresponding fasta header.\n\n        Parameters\n        ----------\n        header_str : str\n            String\n\n        Returns\n        -------\n        float or None\n            The coverage value for the contig. None if it cannot find the\n            value in the provide string.\n        \"\"\"\n\n        cov = None\n        for i in header_str.split(\"_\")[::-1]:\n            try:\n                cov = float(i)\n                break\n            except ValueError:\n                continue\n\n        return cov\n\n    def _parse_assembly(self, assembly_file):\n        \"\"\"Parse an assembly fasta file.\n\n        This is a Fasta parsing method that populates the\n        :py:attr:`~Assembly.contigs` attribute with data for each contig in the\n        assembly.\n\n        The insertion of data on the self.contigs is done by the\n        :py:meth:`Assembly._populate_contigs` method, which also calculates\n        GC content and proportions.\n\n        Parameters\n        ----------\n        assembly_file : str\n            Path to the assembly fasta file.\n\n        \"\"\"\n\n        # Temporary storage of sequence data\n        seq_temp = []\n        # Id counter for contig that will serve as key in self.contigs\n        contig_id = 0\n        # Initialize kmer coverage and header\n        cov, header = None, None\n\n        with open(assembly_file) as fh:\n\n            logger.debug(\"Starting iteration of assembly file: {}\".format(\n                assembly_file))\n            for line in fh:\n                # Skip empty lines\n                if not line.strip():\n                    continue\n                else:\n                    # Remove whitespace surrounding line for further processing\n                    line = line.strip()\n\n                if line.startswith(\">\"):\n                    # If a sequence has already been populated, save the\n                    # previous contig information\n                    if seq_temp:\n                        # Use join() to convert string list into the full\n                        # contig string. 
This is generally much more efficient\n                        # than successively concatenating strings.\n                        seq = \"\".join(seq_temp)\n\n                        logger.debug(\"Populating contig with contig_id '{}', \"\n                                     \"header '{}' and cov '{}'\".format(\n                                        contig_id, header, cov))\n                        self._populate_contigs(contig_id, header, cov, seq)\n\n                        # Reset temporary sequence storage\n                        seq_temp = []\n                        contig_id += 1\n\n                    header = line[1:]\n                    cov = self._parse_coverage(line)\n\n                else:\n                    seq_temp.append(line)\n\n            # Populate last contig entry\n            logger.debug(\"Populating contig with contig_id '{}', \"\n                         \"header '{}' and cov '{}'\".format(\n                            contig_id, header, cov))\n            seq = \"\".join(seq_temp)\n            self._populate_contigs(contig_id, header, cov, seq)\n\n    def _populate_contigs(self, contig_id, header, cov, sequence):\n        \"\"\" Inserts data from a single contig into\\\n         :py:attr:`~Assembly.contigs`.\n\n        By providing a contig id, the original header, the coverage that\n        is parsed from the header and the sequence, this method will\n        populate the :py:attr:`~Assembly.contigs` attribute.\n\n        Parameters\n        ----------\n        contig_id : int\n            Arbitrary unique contig identifier.\n        header : str\n            Original header of the current contig.\n        cov : float\n            The contig coverage, parsed from the fasta header\n        sequence : str\n            The complete sequence of the contig.\n\n        \"\"\"\n\n        # Get AT/GC/N counts and proportions.\n        # Note that self._get_gc_content returns a dictionary with the\n        # information on the GC/AT/N counts and proportions. 
This makes it\n        # much easier to add to the contigs attribute using the ** notation.\n        gc_kwargs = self._get_gc_content(sequence, len(sequence))\n        logger.debug(\"Populate GC content with: {}\".format(gc_kwargs))\n\n        self.contigs[contig_id] = {\n            \"header\": header,\n            \"sequence\": sequence,\n            \"length\": len(sequence),\n            \"kmer_cov\": cov,\n            **gc_kwargs\n        }\n\n    @staticmethod\n    def _get_gc_content(sequence, length):\n        \"\"\"Get GC content and proportions.\n\n        Parameters\n        ----------\n        sequence : str\n            The complete sequence of the contig.\n        length : int\n            The length of the contig sequence.\n\n        Returns\n        -------\n        x : dict\n            Dictionary with the at/gc/n counts and proportions.\n\n        \"\"\"\n\n        # Get AT/GC/N counts\n        at = sum(map(sequence.count, [\"A\", \"T\"]))\n        gc = sum(map(sequence.count, [\"G\", \"C\"]))\n        n = length - (at + gc)\n\n        # Get AT/GC/N proportions\n        at_prop = at / length\n        gc_prop = gc / length\n        n_prop = n / length\n\n        return {\"at\": at, \"gc\": gc, \"n\": n,\n                \"at_prop\": at_prop, \"gc_prop\": gc_prop, \"n_prop\": n_prop}\n\n    @staticmethod\n    def _test_truth(x, op, y):\n        \"\"\" Test the truth of a comparison between x and y using an \\\n        ``operator``.\n\n        If you want to compare '100 > 200', this method can be called as::\n\n            self._test_truth(100, \">\", 200).\n\n        Parameters\n        ----------\n        x : int\n            Arbitrary value to compare on the left\n        op : str\n            Comparison operator\n        y : int\n            Arbitrary value to compare on the right\n\n        Returns\n        -------\n        x : bool\n            The truth value of the test\n        \"\"\"\n\n        ops = {\n            \">\": operator.gt,\n            \"<\": operator.lt,\n            \">=\": operator.ge,\n            \"<=\": operator.le,\n        }\n\n        return ops[op](x, y)\n\n    def filter_contigs(self, *comparisons):\n        \"\"\"Filters the contigs of the assembly according to user provided\\\n        comparisons.\n\n        Each comparison must be a list of three elements with the\n        :py:attr:`~Assembly.contigs` key, operator and test value. 
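The operator must be one of ``>``, ``<``, ``>=`` or ``<=``, as accepted by :py:meth:`Assembly._test_truth`. 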
For\n        example, to filter contigs with a minimum length of 250, a comparison\n        would be::\n\n            self.filter_contigs([\"length\", \">=\", 250])\n\n        The filtered contig ids will be stored in the\n        :py:attr:`~Assembly.filtered_ids` list.\n\n        The result of the test for all contigs will be stored in the\n        :py:attr:`~Assembly.report` dictionary.\n\n        Parameters\n        ----------\n        comparisons : list\n            List with contig key, operator and value to test.\n\n        \"\"\"\n\n        # Reset list of filtered ids\n        self.filtered_ids = []\n        self.report = {}\n\n        gc_filters = [\n            [\"gc_prop\", \">=\", self.min_gc],\n            [\"gc_prop\", \"<=\", 1 - self.min_gc]\n        ]\n\n        self.filters = list(comparisons) + gc_filters\n\n        logger.debug(\"Filtering contigs using filters: {}\".format(\n            self.filters))\n\n        for contig_id, contig in self.contigs.items():\n            for key, op, value in list(comparisons) + gc_filters:\n                if not self._test_truth(contig[key], op, value):\n                    self.filtered_ids.append(contig_id)\n                    self.report[contig_id] = \"{}/{}/{}\".format(key,\n                                                               contig[key],\n                                                               value)\n                    break\n                else:\n                    self.report[contig_id] = \"pass\"\n\n    def get_assembly_length(self):\n        \"\"\"Returns the length of the assembly, without the filtered contigs.\n\n        Returns\n        -------\n        x : int\n            Total length of the assembly.\n\n        \"\"\"\n\n        return sum(\n            [vals[\"length\"] for contig_id, vals in self.contigs.items()\n             if contig_id not in self.filtered_ids])\n\n    def write_assembly(self, output_file, filtered=True):\n        \"\"\"Writes the assembly to a new file.\n\n        The ``filtered`` option controls whether the new assembly will be\n        filtered or not.\n\n        Parameters\n        ----------\n        output_file : str\n            Name of the output assembly file.\n        filtered : bool\n            If ``True``, does not include filtered ids.\n        \"\"\"\n\n        logger.debug(\"Writing the filtered assembly into: {}\".format(\n            output_file))\n        with open(output_file, \"w\") as fh:\n\n            for contig_id, contig in self.contigs.items():\n                if contig_id not in self.filtered_ids and filtered:\n                    fh.write(\">{}_{}\\\\n{}\\\\n\".format(self.sample,\n                                                     contig[\"header\"],\n                                                     contig[\"sequence\"]))\n\n    def write_report(self, output_file):\n        \"\"\"Writes a report with the test results for the current assembly\n\n        Parameters\n        ----------\n        output_file : str\n            Name of the output assembly file.\n\n        \"\"\"\n\n        logger.debug(\"Writing the assembly report into: {}\".format(\n            output_file))\n        with open(output_file, \"w\") as fh:\n\n            for contig_id, vals in self.report.items():\n                fh.write(\"{}, {}\\\\n\".format(contig_id, vals))\n\n\n@MainWrapper\ndef main(sample_id, assembly_file, gsize, opts, assembler):\n    \"\"\"Main executor of the process_spades template.\n\n    Parameters\n    ----------\n    sample_id : str\n        
Sample Identification string.\n    assembly_file : str\n        Path to the assembly file generated by Spades.\n    gsize : int\n        Estimate of genome size.\n    opts : list\n        List of options for processing spades assembly.\n    assembler : str\n        Name of the assembler, for logging purposes\n\n    \"\"\"\n\n    logger.info(\"Starting assembly file processing\")\n    warnings = []\n    fails = \"\"\n\n    min_contig_len, min_kmer_cov, max_contigs = [int(x) for x in opts]\n    logger.debug(\"Setting minimum conting length to: {}\".format(\n        min_contig_len))\n    logger.debug(\"Setting minimum kmer coverage: {}\".format(min_kmer_cov))\n\n    # Parse the spades assembly file and perform the first filtering.\n    logger.info(\"Starting assembly parsing\")\n    assembly_obj = Assembly(assembly_file, min_contig_len, min_kmer_cov,\n                               sample_id)\n\n    with open(\".warnings\", \"w\") as warn_fh:\n        t_80 = gsize * 1000000 * 0.8\n        t_150 = gsize * 1000000 * 1.5\n        # Check if assembly size of the first assembly is lower than 80% of the\n        # estimated genome size. If True, redo the filtering without the\n        # k-mer coverage filter\n        assembly_len = assembly_obj.get_assembly_length()\n        logger.debug(\"Checking assembly length: {}\".format(assembly_len))\n\n        if assembly_len < t_80:\n\n            logger.warning(\"Assembly size ({}) smaller than the minimum \"\n                           \"threshold of 80% of expected genome size. \"\n                           \"Applying contig filters without the k-mer \"\n                           \"coverage filter\".format(assembly_len))\n            assembly_obj.filter_contigs(*[\n                [\"length\", \">=\", min_contig_len]\n            ])\n\n            assembly_len = assembly_obj.get_assembly_length()\n            logger.debug(\"Checking updated assembly length: \"\n                         \"{}\".format(assembly_len))\n            if assembly_len < t_80:\n\n                warn_msg = \"Assembly size smaller than the minimum\" \\\n                           \" threshold of 80% of expected genome size: {}\".format(\n                                assembly_len)\n                logger.warning(warn_msg)\n                warn_fh.write(warn_msg)\n                fails = warn_msg\n\n        if assembly_len > t_150:\n\n            warn_msg = \"Assembly size ({}) larger than the maximum\" \\\n                       \" threshold of 150% of expected genome size.\".format(\n                            assembly_len)\n            logger.warning(warn_msg)\n            warn_fh.write(warn_msg)\n            fails = warn_msg\n\n        logger.debug(\"Checking number of contigs: {}\".format(\n            len(assembly_obj.contigs)))\n        contig_threshold = (max_contigs * gsize) / 1.5\n        if len(assembly_obj.contigs) > contig_threshold:\n\n            warn_msg = \"The number of contigs ({}) exceeds the threshold of \" \\\n                       \"{} contigs per 1.5Mb ({})\".format(\n                            len(assembly_obj.contigs),\n                            max_contigs,\n                            round(contig_threshold, 1))\n\n            logger.warning(warn_msg)\n            warn_fh.write(warn_msg)\n            warnings.append(warn_msg)\n\n    # Write filtered assembly\n    logger.debug(\"Renaming old assembly file to: {}\".format(\n        \"{}.old\".format(assembly_file)))\n    assembly_obj.write_assembly(\"{}_proc.fasta\".format(\n        
os.path.splitext(assembly_file)[0]))\n    # Write report\n    output_report = \"{}.report.csv\".format(sample_id)\n    assembly_obj.write_report(output_report)\n    # Write json report\n    with open(\".report.json\", \"w\") as json_report:\n        json_dic = {\n            \"tableRow\": [{\n                \"sample\": sample_id,\n                \"data\": [\n                    {\"header\": \"Contigs ({})\".format(assembler),\n                     \"value\": len(assembly_obj.contigs),\n                     \"table\": \"assembly\",\n                     \"columnBar\": True},\n                    {\"header\": \"Assembled BP ({})\".format(assembler),\n                     \"value\": assembly_len,\n                     \"table\": \"assembly\",\n                     \"columnBar\": True}\n                ]\n            }],\n        }\n\n        if warnings:\n            json_dic[\"warnings\"] = [{\n                \"sample\": sample_id,\n                \"table\": \"assembly\",\n                \"value\": warnings\n            }]\n\n        if fails:\n            json_dic[\"fail\"] = [{\n                \"sample\": sample_id,\n                \"table\": \"assembly\",\n                \"value\": [fails]\n            }]\n\n        json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n    with open(\".status\", \"w\") as status_fh:\n        status_fh.write(\"pass\")\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, ASSEMBLY_FILE, GSIZE, OPTS, ASSEMBLER)\n"
  },
  {
    "path": "flowcraft/templates/process_assembly_mapping.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to process the coverage report from the\n:py:class:`assembly_mapping` process.\n\nTODO: Better purpose\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``assembly`` : Fasta assembly file.\n    - e.g.: ``'SH10761A.assembly.fasta'``\n- ``coverage`` : TSV file with the average coverage for each assembled contig.\n    - e.g.: ``'coverage.tsv'``\n- ``coverage_bp`` : TSV file with the coverage for each assembled bp.\n    - e.g.: ``'coverage.tsv'``\n- ``bam_file`` : BAM file with the alignment of reads to the genome.\n    - e.g.: ``'sorted.bam'``\n- ``opts`` : List of options for processing assembly mapping output.\n    1. Minimum coverage for assembled contigs. Can be``auto``.\n        - e.g.: ``'auto'`` or ``'10'``\n    2. Maximum number of contigs.\n        - e.g.: '100'\n- ``gsize``: Expected genome size.\n    - e.g.: ``'2.5'``\n\nGenerated output\n----------------\n- ``${sample_id}_filtered.assembly.fasta`` : Filtered assembly file in Fasta \\\n    format.\n    - e.g.: ``'SampleA_filtered.assembly.fasta'``\n- ``filtered.bam`` : BAM file with the same filtering as the assembly file.\n    - e.g.: ``filtered.bam``\n\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.1\"\n__build__ = \"09022018\"\n__template__ = \"process_assembly_mapping-nf\"\n\nimport os\nimport json\nimport shutil\nimport subprocess\n\nfrom subprocess import PIPE\nfrom collections import OrderedDict\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\ndef __get_version_samtools():\n\n    try:\n        cli = [\"samtools\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout = p.communicate()[0]\n\n        version = stdout.splitlines()[0].split()[1].decode(\"utf8\")\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return {\n        \"program\": \"Samtools\",\n        \"version\": version\n    }\n\n\ndef __get_version_bowtie2():\n\n    try:\n        cli = [\"bowtie2\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout = p.communicate()[0]\n\n        version = stdout.splitlines()[0].split()[-1].decode(\"utf8\")\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return {\n        \"program\": \"Bowtie2\",\n        \"version\": version\n    }\n\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    ASSEMBLY_FILE = '$assembly'\n    COVERAGE_FILE = '$coverage'\n    COVERAGE_BP_FILE = '$coverage_bp'\n    BAM_FILE = '$bam_file'\n    OPTS = [x.strip() for x in '$opts'.strip(\"[]\").split(\",\")]\n    GSIZE = float('$gsize')\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"ASSEMBLY_FILE: {}\".format(ASSEMBLY_FILE))\n    logger.debug(\"COVERAGE_FILE: {}\".format(COVERAGE_FILE))\n    logger.debug(\"COVERAGE_BP_FILE: {}\".format(COVERAGE_BP_FILE))\n    logger.debug(\"BAM_FILE: {}\".format(BAM_FILE))\n    logger.debug(\"MIN_ASSEMBLY_COVERAGE: {}\".format(OPTS))\n    logger.debug(\"GSIZE: {}\".format(GSIZE))\n\n\ndef parse_coverage_table(coverage_file):\n    \"\"\"Parses a file with coverage information 
into objects.\n\n    This function parses a TSV file containing coverage results for\n    all contigs in a given assembly and will build an ``OrderedDict``\n    with the information about their coverage and length.  The length\n    information is actually gathered from the contig header using a\n    regular expression that assumes the usual header produced by Spades::\n\n        contig_len = int(re.search(\"length_(.+?)_\", line).group(1))\n\n    Parameters\n    ----------\n    coverage_file : str\n        Path to TSV file containing the coverage results.\n\n    Returns\n    -------\n    coverage_dict : OrderedDict\n        Contains the coverage and length information for each contig.\n    total_size : int\n        Total size of the assembly in base pairs.\n    total_cov : int\n        Sum of coverage values across all contigs.\n    \"\"\"\n\n    # Stores the correspondence between a contig and the corresponding coverage\n    # e.g.: {\"contig_1\": {\"cov\": 424} }\n    coverage_dict = OrderedDict()\n    # Stores the total coverage\n    total_cov = 0\n\n    with open(coverage_file) as fh:\n        for line in fh:\n            # Get contig and coverage\n            contig, cov = line.strip().split()\n            coverage_dict[contig] = {\"cov\": int(cov)}\n            # Add total coverage\n            total_cov += int(cov)\n            logger.debug(\"Processing contig '{}' with coverage '{}'\"\n                         \"\".format(contig, cov))\n\n    return coverage_dict, total_cov\n\n\ndef filter_assembly(assembly_file, minimum_coverage, coverage_info,\n                    output_file):\n    \"\"\"Generates a filtered assembly file.\n\n    This function generates a filtered assembly file based on an original\n    assembly and a minimum coverage threshold.\n\n    Parameters\n    ----------\n    assembly_file : str\n        Path to original assembly file.\n    minimum_coverage : int or float\n        Minimum coverage required for a contig to pass the filter.\n    coverage_info : OrderedDict or dict\n        Dictionary containing the coverage information for each contig.\n    output_file : str\n        Path where the filtered assembly file will be generated.\n\n    \"\"\"\n\n    # This flag will determine whether sequence data should be written or\n    # ignored because the current contig did not pass the minimum\n    # coverage threshold\n    write_flag = False\n\n    with open(assembly_file) as fh, open(output_file, \"w\") as out_fh:\n\n        for line in fh:\n            if line.startswith(\">\"):\n                # Reset write_flag\n                write_flag = False\n                # Get header of contig\n                header = line.strip()[1:]\n                # Check coverage for current contig\n                contig_cov = coverage_info[header][\"cov\"]\n                # If the contig coverage is above the threshold, write to\n                # output filtered assembly\n                if contig_cov >= minimum_coverage:\n                    write_flag = True\n                    out_fh.write(line)\n\n            elif write_flag:\n                out_fh.write(line)\n\n\ndef filter_bam(coverage_info, bam_file, min_coverage, output_bam):\n    \"\"\"Uses Samtools to filter a BAM file according to minimum coverage\n\n    Provided with a minimum coverage value, this function will use Samtools\n    to filter a BAM file. 
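Only the contigs whose coverage is at or above ``min_coverage`` are passed as regions to ``samtools view``, so alignments to the discarded contigs are excluded from the output BAM. 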
This is performed to apply the same filter to\n    the BAM file as the one applied to the assembly file in\n    :py:func:`filter_assembly`.\n\n    Parameters\n    ----------\n    coverage_info : OrderedDict or dict\n        Dictionary containing the coverage information for each contig.\n    bam_file : str\n        Path to the BAM file.\n    min_coverage : int\n        Minimum coverage required for a contig to pass the filter.\n    output_bam : str\n        Path to the generated filtered BAM file.\n    \"\"\"\n\n    # Get list of contigs that will be kept\n    contig_list = [x for x, vals in coverage_info.items()\n                   if vals[\"cov\"] >= min_coverage]\n\n    cli = [\n        \"samtools\",\n        \"view\",\n        \"-bh\",\n        \"-F\",\n        \"4\",\n        \"-o\",\n        output_bam,\n        \"-@\",\n        \"1\",\n        bam_file,\n    ]\n\n    cli += contig_list\n\n    logger.debug(\"Runnig samtools view subprocess with command: {}\".format(\n        cli))\n\n    p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n    stdout, stderr = p.communicate()\n\n    # Attempt to decode STDERR output from bytes. If unsuccessful, coerce to\n    # string\n    try:\n        stderr = stderr.decode(\"utf8\")\n        stdout = stdout.decode(\"utf8\")\n    except (UnicodeDecodeError, AttributeError):\n        stderr = str(stderr)\n        stdout = str(stdout)\n\n    logger.info(\"Finished samtools view subprocess with STDOUT:\\\\n\"\n                \"======================================\\\\n{}\".format(stdout))\n    logger.info(\"Fished samtools view subprocesswith STDERR:\\\\n\"\n                \"======================================\\\\n{}\".format(stderr))\n    logger.info(\"Finished samtools view with return code: {}\".format(\n        p.returncode))\n\n    if not p.returncode:\n        # Create index\n        cli = [\n            \"samtools\",\n            \"index\",\n            output_bam\n        ]\n\n        logger.debug(\"Runnig samtools index subprocess with command: \"\n                     \"{}\".format(cli))\n\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout, stderr = p.communicate()\n\n        try:\n            stderr = stderr.decode(\"utf8\")\n            stdout = stdout.decode(\"utf8\")\n        except (UnicodeDecodeError, AttributeError):\n            stderr = str(stderr)\n            stdout = str(stdout)\n\n        logger.info(\"Finished samtools index subprocess with STDOUT:\\\\n\"\n                    \"======================================\\\\n{}\".format(\n            stdout))\n        logger.info(\"Fished samtools index subprocesswith STDERR:\\\\n\"\n                    \"======================================\\\\n{}\".format(\n            stderr))\n        logger.info(\"Finished samtools index with return code: {}\".format(\n            p.returncode))\n\n\ndef check_filtered_assembly(coverage_info, coverage_bp, minimum_coverage,\n                            genome_size, contig_size, max_contigs,\n                            sample_id):\n    \"\"\"Checks whether a filtered assembly passes a size threshold\n\n    Given a minimum coverage threshold, this function evaluates whether an\n    assembly will pass the minimum threshold of ``genome_size * 1e6 * 0.8``,\n    which means 80% of the expected genome size or the maximum threshold\n    of ``genome_size * 1e6 * 1.5``, which means 150% of the expected genome\n    size. 
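Both thresholds assume that ``genome_size`` is provided in Mb, hence the multiplication by 1e6. 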
It will issue a warning if any of these thresholds is crossed.\n    If the assembly size falls below 80% of the expected genome size, it\n    returns False.\n\n    Parameters\n    ----------\n    coverage_info : OrderedDict or dict\n        Dictionary containing the coverage information for each contig.\n    coverage_bp : dict\n        Dictionary containing the per base coverage information for each\n        contig. Used to determine the total number of base pairs in the\n        final assembly.\n    minimum_coverage : int\n        Minimum coverage required for a contig to pass the filter.\n    genome_size : int\n        Expected genome size.\n    contig_size : dict\n        Dictionary with the length of each contig. Contig headers as keys and\n        the corresponding length as values.\n    max_contigs : int\n        Maximum threshold for contig number. A warning is issued if this\n        threshold is crossed.\n    sample_id : str\n        Id or name of the current sample.\n\n    Returns\n    -------\n    x : bool\n        True if the filtered assembly size is higher than 80% of the\n        expected genome size.\n\n    \"\"\"\n\n    # Get size of assembly after filtering contigs below minimum_coverage\n    assembly_len = sum([v for k, v in contig_size.items()\n                        if coverage_info[k][\"cov\"] >= minimum_coverage])\n    logger.debug(\"Assembly length after filtering for minimum coverage of\"\n                 \" {}: {}\".format(minimum_coverage, assembly_len))\n    # Get number of contigs after filtering\n    ncontigs = len([x for x in coverage_info.values()\n                    if x[\"cov\"] >= minimum_coverage])\n    logger.debug(\"Number of contigs: {}\".format(ncontigs))\n    # Get number of bp after filtering\n    filtered_contigs = [k for k, v in coverage_info.items()\n                        if v[\"cov\"] >= minimum_coverage]\n    logger.debug(\"Filtered contigs for minimum coverage of \"\n                 \"{}: {}\".format(minimum_coverage, filtered_contigs))\n    total_assembled_bp = sum([sum(coverage_bp[x]) for x in filtered_contigs\n                              if x in coverage_bp])\n    logger.debug(\"Total number of assembled base pairs:\"\n                 \"{}\".format(total_assembled_bp))\n\n    warnings = []\n    fails = []\n    health = True\n\n    with open(\".warnings\", \"w\") as warn_fh, \\\n            open(\".report.json\", \"w\") as json_report:\n\n        logger.debug(\"Checking assembly size after filtering: {}\".format(\n            assembly_len))\n\n        # If the filtered assembly size is above the 150% genome size\n        # threshold, issue a warning\n        if assembly_len > genome_size * 1e6 * 1.5:\n            warn_msg = \"Assembly size ({}) larger than the maximum\" \\\n                       \" threshold of 150% of expected genome size.\".format(\n                            assembly_len)\n            logger.warning(warn_msg)\n            warn_fh.write(warn_msg)\n            fails.append(\"Large_genome_size_({})\".format(assembly_len))\n\n        # If the number of contigs in the filtered assembly crosses the\n        # max_contigs threshold, issue a warning\n        logger.debug(\"Checking number of contigs: {}\".format(\n                len(coverage_info)))\n        contig_threshold = max_contigs * genome_size / 1.5\n        if ncontigs > contig_threshold:\n            warn_msg = \"The number of contigs ({}) exceeds the threshold of \" \\\n                       \"{} contigs per 1.5Mb ({})\".format(\n                            ncontigs, max_contigs, 
round(contig_threshold, 1))\n            logger.warning(warn_msg)\n            warn_fh.write(warn_msg)\n            warnings.append(warn_msg)\n\n        # If the filtered assembly size falls below the 80% genome size\n        # threshold, fail this check and return False\n        if assembly_len < genome_size * 1e6 * 0.8:\n            warn_msg = \"Assembly size smaller than the minimum\" \\\n                       \" threshold of 80% of expected genome size: {}\".format(\n                            assembly_len)\n            logger.warning(warn_msg)\n            warn_fh.write(warn_msg)\n            fails.append(\"Small_genome_size_({})\".format(assembly_len))\n            assembly_len = sum([v for v in contig_size.values()])\n            total_assembled_bp = sum(\n                [sum(coverage_bp[x]) for x in coverage_info if x in\n                 coverage_bp])\n            logger.debug(\"Assembly length without coverage filtering: \"\n                         \"{}\".format(assembly_len))\n            logger.debug(\"Total number of assembled base pairs without\"\n                         \" filtering: {}\".format(total_assembled_bp))\n\n            health = False\n\n        json_dic = {\n            \"plotData\": [{\n                \"sample\": sample_id,\n                \"data\": {\n                    \"sparkline\": total_assembled_bp\n                }\n            }]\n        }\n\n        if warnings:\n            json_dic[\"warnings\"] = [{\n                \"sample\": sample_id,\n                \"table\": \"assembly\",\n                \"value\": warnings\n            }]\n        if fails:\n            json_dic[\"fail\"] = [{\n                \"sample\": sample_id,\n                \"table\": \"assembly\",\n                \"value\": [fails]\n            }]\n\n        json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n    return health\n\n\ndef get_coverage_from_file(coverage_file):\n    \"\"\"\n\n    Parameters\n    ----------\n    coverage_file\n\n    Returns\n    -------\n\n    \"\"\"\n\n    contig_coverage = {}\n\n    with open(coverage_file) as fh:\n        for line in fh:\n\n            fields = line.strip().split()\n\n            # Get header\n            header = fields[0]\n            coverage = int(fields[2])\n\n            if header not in contig_coverage:\n                contig_coverage[header] = [coverage]\n            else:\n                contig_coverage[header].append(coverage)\n\n    return contig_coverage\n\n\ndef evaluate_min_coverage(coverage_opt, assembly_coverage, assembly_size):\n    \"\"\" Evaluates the minimum coverage threshold from the value provided in\n    the coverage_opt.\n\n    Parameters\n    ----------\n    coverage_opt : str or int or float\n        If set to \"auto\" it will try to automatically determine the coverage\n        to 1/3 of the assembly size, to a minimum value of 10. If it set\n        to a int or float, the specified value will be used.\n    assembly_coverage : int or float\n        The average assembly coverage for a genome assembly. This value\n        is retrieved by the `:py:func:parse_coverage_table` function.\n    assembly_size : int\n        The size of the genome assembly. 
This value is retrieved by the\n        `py:func:get_assembly_size` function.\n\n    Returns\n    -------\n    x: int\n        Minimum coverage threshold.\n\n    \"\"\"\n\n    if coverage_opt == \"auto\":\n        # Get the 1/3 value of the current assembly coverage\n        min_coverage = (assembly_coverage / assembly_size) * .3\n        logger.info(\"Minimum assembly coverage automatically set to: \"\n                    \"{}\".format(min_coverage))\n        # If the 1/3 coverage is lower than 10, change it to the minimum of\n        # 10\n        if min_coverage < 10:\n            logger.info(\"Minimum assembly coverage cannot be set to lower\"\n                        \" that 10. Setting to 10\")\n            min_coverage = 10\n    else:\n        min_coverage = int(coverage_opt)\n        logger.info(\"Minimum assembly coverage manually set to: {}\".format(\n            min_coverage))\n\n    return min_coverage\n\n\ndef get_assembly_size(assembly_file):\n    \"\"\"Returns the number of nucleotides and the size per contig for the\n    provided assembly file path\n\n    Parameters\n    ----------\n    assembly_file : str\n        Path to assembly file.\n\n    Returns\n    -------\n    assembly_size : int\n        Size of the assembly in nucleotides\n    contig_size : dict\n        Length of each contig (contig name as key and length as value)\n\n    \"\"\"\n\n    assembly_size = 0\n    contig_size = {}\n    header = \"\"\n\n    with open(assembly_file) as fh:\n        for line in fh:\n\n            # Skip empty lines\n            if line.strip() == \"\":\n                continue\n\n            if line.startswith(\">\"):\n                header = line.strip()[1:]\n                contig_size[header] = 0\n\n            else:\n                line_len = len(line.strip())\n                assembly_size += line_len\n                contig_size[header] += line_len\n\n    return assembly_size, contig_size\n\n\n@MainWrapper\ndef main(sample_id, assembly_file, coverage_file, coverage_bp_file, bam_file,\n         opts, gsize):\n    \"\"\"Main executor of the process_assembly_mapping template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    assembly_file : str\n        Path to assembly file in Fasta format.\n    coverage_file : str\n        Path to TSV file with coverage information for each contig.\n    coverage_bp_file : str\n        Path to TSV file with coverage information for each base.\n    bam_file : str\n        Path to BAM file.\n    opts : list\n        List of options for processing assembly mapping.\n    gsize : int\n        Expected genome size\n\n    \"\"\"\n\n    min_assembly_coverage, max_contigs = opts\n\n    logger.info(\"Starting assembly mapping processing\")\n\n    # Get coverage info, total size and total coverage from the assembly\n    logger.info(\"Parsing coverage table\")\n    coverage_info, a_cov = parse_coverage_table(coverage_file)\n    a_size, contig_size = get_assembly_size(assembly_file)\n    logger.info(\"Assembly processed with a total size of '{}' and coverage\"\n                \" of '{}'\".format(a_size, a_cov))\n    # Get number of assembled bp after filters\n    logger.info(\"Parsing coverage per bp table\")\n    coverage_bp_data = get_coverage_from_file(coverage_bp_file)\n\n    # Assess the minimum assembly coverage\n    min_coverage = evaluate_min_coverage(min_assembly_coverage, a_cov, a_size)\n\n    # Check if filtering the assembly using the provided min_coverage will\n    # reduce the final bp number to 
less than 80% of the estimated genome\n    # size.\n    # If the check below passes with True, then the filtered assembly\n    # is above the 80% genome size threshold.\n    filtered_assembly = \"{}_filt.fasta\".format(\n        os.path.splitext(assembly_file)[0])\n    filtered_bam = \"filtered.bam\"\n    logger.info(\"Checking filtered assembly\")\n    if check_filtered_assembly(coverage_info, coverage_bp_data, min_coverage,\n                               gsize, contig_size, int(max_contigs),\n                               sample_id):\n        # Filter assembly contigs based on the minimum coverage.\n        logger.info(\"Filtered assembly passed minimum size threshold\")\n        logger.info(\"Writting filtered assembly\")\n        filter_assembly(assembly_file, min_coverage, coverage_info,\n                        filtered_assembly)\n        logger.info(\"Filtering BAM file according to saved contigs\")\n        filter_bam(coverage_info, bam_file, min_coverage, filtered_bam)\n    # Could not filter the assembly as it would drop below acceptable\n    # length levels. Copy the original assembly to the output assembly file\n    # for compliance with the output channel\n    else:\n        shutil.copy(assembly_file, filtered_assembly)\n        shutil.copy(bam_file, filtered_bam)\n        shutil.copy(bam_file + \".bai\", filtered_bam + \".bai\")\n\n    with open(\".status\", \"w\") as status_fh:\n        status_fh.write(\"pass\")\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, ASSEMBLY_FILE, COVERAGE_FILE, COVERAGE_BP_FILE,\n         BAM_FILE, OPTS, GSIZE)\n"
  },
  {
    "path": "flowcraft/templates/process_concoct.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\nThis module is intended to process the output of concoct\n to generate a report in json format.\n\nExpected input\n--------------\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n- ``sample_id`` : Sample Identification string.\n- ``cluster``: concoct cluster output.\n\n\"\"\"\n\nimport json\nimport csv\nimport os\nfrom itertools import groupby\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\n\n__version__ = \"1.0.0\"\n__build__ = \"22.05.2019\"\n__template__ = \"concoct-nf\"\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    CLUSTER = '$cluster'\n    CONTIGS = '$contigs'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"CLUSTER: {}\".format(CLUSTER))\n    logger.debug(\"CONTIGS: {}\".format(CONTIGS))\n\n\ndef parse_assembly(file):\n    \"\"\"\n    Simple fasta parser.\n    :param file: assembly file in fasta format\n    :return: dictionary containing the contigs in the assembly\n    \"\"\"\n\n    all_seqs = {}\n\n    with open(file, \"r\") as handle:\n        entry = (x[1] for x in groupby(handle, lambda line: line[0] == \">\"))\n        for header in entry:\n            contig_header = header.__next__()[1:].strip()\n            contig_seq = \"\".join(s.strip() for s in entry.__next__())\n            all_seqs[contig_header] = contig_seq\n\n    return all_seqs\n\n\ndef parse_cluster_csv(file):\n    \"\"\"\n    Simple csv parser for clustering file of concoct\n    :param file: clustering csv file\n    :return: dictionary containing the cluster id and the contigs in the cluster\n    \"\"\"\n\n    clusters = {}\n\n    reader = csv.reader(open(file), delimiter=',')\n    next(reader)  # skip header\n    for row in reader:\n        if row[1] in clusters:\n            clusters[row[1]].append(row[0])\n        else:\n            clusters[row[1]] = [row[0]]\n\n    return clusters\n\n\ndef get_GC(sequence):\n\n    return round(sum(1 for nucl in sequence if nucl in ['G', 'C'])/len(sequence)*100, 2)\n\n\ndef merge_data(contigs, clusters):\n    \"\"\"\n    Obtain genome size, cg content and number of contigs for concoct bins\n\n    :param contigs: dict with the sequences for the binned contigs\n    :param clusters: dict with the cluster and respective sequence headers\n    :return: dict with the statistics for each bin (cluster)\n    \"\"\"\n\n    binning = {}\n\n    for cluster_id in clusters.keys():\n        complete_sequence = ''\n        n_sequences = 0\n        for sequence in clusters[cluster_id]:\n            complete_sequence += contigs[sequence]\n            n_sequences += 1\n\n        binning[int(cluster_id)] = {\"Bin name\": cluster_id,\n                                    \"Contig number\": n_sequences,\n                                    \"Genome size\": len(complete_sequence),\n                                    \"GC content\": get_GC(complete_sequence)}\n\n    return binning\n\n\n\n@MainWrapper\ndef main(sample_id, cluster_file, contig_file):\n\n    seqs = parse_assembly(contig_file)\n\n    clusters = parse_cluster_csv(cluster_file)\n\n    bin_stats = merge_data(seqs, clusters)\n\n    report_list = [[\"Bin name\", \"Contig number\", \"Genome size\", \"GC content %\"]]\n\n    for key, value in sorted(bin_stats.items(), key=lambda x: x[0]):\n        print(\"{} : 
{}\".format(key, value))\n        report_list.append([value[\"Bin name\"],\n                            str(value[\"Contig number\"]),\n                            str(value[\"Genome size\"]),\n                            str(value[\"GC content\"])])\n\n    # this tsvData is a single object since it only has one element\n    # this data type expects full tables in tsv format\n    report_json = {\n        \"tsvData\": [{\n            \"sample\": sample_id,\n            \"data\": {}\n        }]\n    }\n\n    # web-app excepts a list with all the values in the table.\n    #  To expand this to other processes other than MaxBin2, this line needs to be reworked\n    report_json[\"tsvData\"][0][\"data\"][\"MaxBin2\"] = report_list\n\n    with open(\".report.json\", \"w\") as k:\n        k.write(json.dumps(report_json))\n\n\nif __name__ == \"__main__\":\n    main(SAMPLE_ID, CLUSTER, CONTIGS)\n"
  },
  {
    "path": "flowcraft/templates/process_mapping.py",
    "content": "#!/usr/bin/env python3\n\nimport re\nimport os\nimport json\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\n\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to process the output of mapping proces from a single\nsample from the program Bowtie for the report component.\nThe main input is an log file produced by the mapper.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id``: Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``bowtie_log``: Log file from the mapper.\n    - e.g.: ``'bowtie.log'``\n\nGenerated output\n----------------\n- ``.report.jason``: Data structure for the report\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.1\"\n__build__ = \"10.09.2018\"\n__template__ = \"remove_host-nf\"\n\nlogger = get_logger(__file__)\n\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    BOWTIE_LOG = '$bowtie_log'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"BOWTIE_LOG: {}\".format(BOWTIE_LOG))\n\n\n\nclass Bowtie:\n    \"\"\"\n    Class to parse and store the info in the bowtie log file.\n\n    \"\"\"\n\n    def __init__(self, sample_id, bowtie_log):\n\n        self.sample = sample_id\n        \"\"\"\n        str: The name of the sample for the assembly.\n        \"\"\"\n\n        self.n_reads = 0\n\n        self.align_0x = 0\n\n        self.align_1x = 0\n\n        self.align_mt1x = 0\n\n        self.overall_rate = 0.0\n\n        # Parse assembly and populate self.n_reads, self.align_0x, self.align_1x, self.align_mt1x and self.overall_rate\n        self.parse_log(bowtie_log)\n\n\n    def set_n_reads(self, n_reads):\n        self.n_reads = int(n_reads)\n\n\n    def set_align_0x(self,align_0x):\n        self.align_0x = align_0x\n\n\n    def set_align_1x(self,align_1x):\n        self.align_1x = align_1x\n\n\n    def set_align_mt1x(self,align_mt1x):\n        self.align_mt1x = align_mt1x\n\n\n    def set_overall_rate(self,overall_rate):\n        self.overall_rate = overall_rate\n\n\n    def parse_log(self, bowtie_log):\n        \"\"\"Parse a bowtie log file.\n\n        This is a bowtie log parsing method that populates the\n        :py:attr:`self.n_reads, self.align_0x, self.align_1x, self.align_mt1x and self.overall_rate` attributes with\n        data from the log file.\n\n        Disclamer: THIS METHOD IS HORRIBLE BECAUSE THE BOWTIE LOG IS HORRIBLE.\n\n        The insertion of data on the attribytes is done by the\n        :py:meth:`set_attribute method.\n\n        Parameters\n        ----------\n        bowtie_log : str\n            Path to the boetie log file.\n\n       \"\"\"\n\n        print(\"is here!\")\n\n        # Regexes - thanks to https://github.com/ewels/MultiQC/blob/master/multiqc/modules/bowtie2/bowtie2.py\n        regexes = {\n            'unpaired': {\n                'unpaired_aligned_none': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned 0 times\",\n                'unpaired_aligned_one': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned exactly 1 time\",\n                'unpaired_aligned_multi': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned >1 times\"\n            },\n            'paired': {\n                'paired_aligned_none': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned concordantly 0 times\",\n                'paired_aligned_one': r\"(\\\\d+) 
\\\\([\\\\d\\\\.]+%\\\\) aligned concordantly exactly 1 time\",\n                'paired_aligned_multi': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned concordantly >1 times\",\n                'paired_aligned_discord_one': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned discordantly 1 time\",\n                'paired_aligned_discord_multi': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned discordantly >1 times\",\n                'paired_aligned_mate_one': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned exactly 1 time\",\n                'paired_aligned_mate_multi': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned >1 times\",\n                'paired_aligned_mate_none': r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) aligned 0 times\"\n            }\n        }\n\n        #Missing parser for unpaired (not implemented in flowcraft yet)\n\n        with open(bowtie_log, \"r\") as f:\n            #Go through log file line by line\n            for l in f:\n\n                print(l)\n\n                #total reads\n                total = re.search(r\"(\\\\d+) reads; of these:\", l)\n                print(total)\n                if total:\n                    print(total)\n                    self.set_n_reads(total.group(1))\n\n\n                # Paired end reads aka the pain\n                paired = re.search(r\"(\\\\d+) \\\\([\\\\d\\\\.]+%\\\\) were paired; of these:\", l)\n                if paired:\n                    paired_total = int(paired.group(1))\n\n                    paired_numbers = {}\n\n                    # Do nested loop whilst we have this level of indentation\n                    l = f.readline()\n                    while l.startswith('    '):\n                        for k, r in regexes['paired'].items():\n                            match = re.search(r, l)\n                            if match:\n                                paired_numbers[k] = int(match.group(1))\n                        l = f.readline()\n\n\n                    align_zero_times = paired_numbers['paired_aligned_none'] + paired_numbers['paired_aligned_mate_none']\n                    if align_zero_times:\n                        self.set_align_0x(align_zero_times)\n\n                    align_one_time = paired_numbers['paired_aligned_one'] + paired_numbers['paired_aligned_mate_one']\n                    if align_one_time:\n                        self.set_align_1x(align_one_time)\n\n                    align_more_than_one_time = paired_numbers['paired_aligned_multi'] + paired_numbers['paired_aligned_mate_multi']\n                    if align_more_than_one_time:\n                        self.set_align_mt1x(align_more_than_one_time)\n\n\n                # Overall alignment rate\n                overall = re.search(r\"([\\\\d\\\\.]+)% overall alignment rate\", l)\n                if overall:\n                    self.overall_rate = float(overall.group(1))\n\n\n@MainWrapper\ndef main(sample_id, bowite_log):\n    \"\"\"Main executor of the process_mapping template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    boetie_log: str\n        Path to the log file generated by bowtie.\n\n    \"\"\"\n\n    logger.info(\"Starting mapping file processing\")\n    warnings = []\n    fails = \"\"\n\n    bowtie_info = Bowtie(sample_id, bowite_log)\n\n    print(bowtie_info.overall_rate)\n\n\n    with open(\".report.json\", \"w\") as json_report:\n        json_dic = {\n            \"tableRow\": [{\n                \"sample\": sample_id,\n                \"data\": [\n                    
{\"header\": \"Reads\",\n                     \"value\": int(bowtie_info.n_reads),\n                     \"table\": \"mapping\",\n                     \"columnBar\": False},\n                    {\"header\": \"Unmapped\",\n                     \"value\": int(bowtie_info.align_0x),\n                     \"table\": \"mapping\",\n                     \"columnBar\": False},\n                    {\"header\": \"Mapped 1x\",\n                     \"value\": int(bowtie_info.align_1x),\n                     \"table\": \"mapping\",\n                     \"columnBar\": False},\n                    {\"header\": \"Mapped >1x\",\n                     \"value\": int(bowtie_info.align_mt1x),\n                     \"table\": \"mapping\",\n                     \"columnBar\": False},\n                    {\"header\": \"Overall alignment rate (%)\",\n                     \"value\": float(bowtie_info.overall_rate),\n                     \"table\": \"mapping\",\n                     \"columnBar\": False}\n                ]\n            }],\n        }\n\n        if warnings:\n            json_dic[\"warnings\"] = [{\n                \"sample\": sample_id,\n                \"table\": \"mapping\",\n                \"value\": warnings\n            }]\n\n        if fails:\n            json_dic[\"fail\"] = [{\n                \"sample\": sample_id,\n                \"table\": \"mapping\",\n                \"value\": [fails]\n            }]\n\n        json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n    with open(\".status\", \"w\") as status_fh:\n        status_fh.write(\"pass\")\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, BOWTIE_LOG)"
  },
  {
    "path": "flowcraft/templates/process_metabat.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\nThis module is intended to process the output of metaBAT\n to generate a report in json format.\n\nExpected input\n--------------\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n- ``sample_id`` : Sample Identification string.\n- ``cluster``: concoct cluster output.\n\n\"\"\"\n\nimport json\nimport csv\nimport os\nfrom itertools import groupby\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\n\n__version__ = \"1.0.0\"\n__build__ = \"22.05.2019\"\n__template__ = \"concoct-nf\"\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    BINS = '$bins'.split()\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"BINS: {}\".format(BINS))\n\n\ndef parse_assembly(file):\n    \"\"\"\n    Simple fasta parser.\n    :param file: assembly file in fasta format\n    :return: dictionary containing the contigs in the assembly\n    \"\"\"\n\n    all_seqs = {}\n\n    with open(file, \"r\") as handle:\n        entry = (x[1] for x in groupby(handle, lambda line: line[0] == \">\"))\n        for header in entry:\n            contig_header = header.__next__()[1:].strip()\n            contig_seq = \"\".join(s.strip() for s in entry.__next__())\n            all_seqs[contig_header] = contig_seq\n\n    return all_seqs\n\n\ndef get_cg(sequence):\n\n    return round(sum(1 for nucl in sequence if nucl in ['G', 'C'])/len(sequence)*100, 2)\n\n\ndef get_bin_stats(bin_file):\n    n_contigs = 0\n    all_seq = \"\"\n\n    with open(bin_file, \"r\") as handle:\n        entry = (x[1] for x in groupby(handle, lambda line: line[0] == \">\"))\n        for header in entry:\n            n_contigs += 1\n            all_seq += \"\".join(s.strip() for s in entry.__next__())\n\n    return str(n_contigs), str(len(all_seq)), str(get_cg(all_seq))\n\n@MainWrapper\ndef main(sample_id, bins):\n\n    report_list = [[\"Bin name\", \"Contig number\", \"Genome size\", \"GC content %\"]]\n\n    if len(bins) == 1 and \"false_bin.fa\" not in bins:\n        ncontigs, gsize, gc = get_bin_stats(bins)\n        report_list.append([bins.split(\".\")[1], ncontigs, gsize, gc])\n    else:\n        for file in bins:\n            ncontigs, gsize, gc = get_bin_stats(file)\n            report_list.append([file.split(\".\")[1], ncontigs, gsize, gc])\n\n    # this tsvData is a single object since it only has one element\n    # this data type expects full tables in tsv format\n    report_json = {\n        \"tsvData\": [{\n            \"sample\": sample_id,\n            \"data\": {}\n        }]\n    }\n\n    # web-app excepts a list with all the values in the table.\n    #  To expand this to other processes other than MaxBin2, this line needs to be reworked\n    report_json[\"tsvData\"][0][\"data\"][\"MaxBin2\"] = report_list\n\n    with open(\".report.json\", \"w\") as k:\n        k.write(json.dumps(report_json))\n\n\nif __name__ == \"__main__\":\n    main(SAMPLE_ID, BINS)\n"
  },
  {
    "path": "flowcraft/templates/process_newick.py",
    "content": "#!/usr/bin/env python3\n\nimport os\nimport json\nimport dendropy\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\n\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to process the newick generated by\n a proces to generate a report. The newick tree will be \n rooted (midpoint). \n \n \nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``newick``: phylogenetic tree in newick format.\n\nGenerated output\n----------------\n- ``.report.jason``: Data structure for the report\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.2\"\n__build__ = \"28.12.2018\"\n__template__ = \"raxml-nf\"\n\nlogger = get_logger(__file__)\n\n\nif __file__.endswith(\".command.sh\"):\n    NEWICK = '$newick'\n    LABELS = '$label'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"NEWICK: {}\".format(NEWICK))\n    logger.debug(\"LABELS: {}\".format(LABELS))\n\n\n\n@MainWrapper\ndef main(newick, labels):\n    \"\"\"Main executor of the process_newick template.\n\n    Parameters\n    ----------\n    newick : str\n        path to the newick file.\n\n    \"\"\"\n\n    logger.info(\"Starting newick file processing\")\n\n    #load tree and midpoint root\n    tree = dendropy.Tree.get(file=open(newick, 'r'), schema=\"newick\")\n    tree.reroot_at_midpoint()\n\n    to_write_trees = tree.as_string(\"newick\").strip().replace(\"[&R] \", '').replace(' ', '_').replace(\"'\", \"\")\n\n    #add labels to replace taxon names in phylocanvas\n    labels_dict = {}\n\n    if labels == 'true':\n\n        original_labels = tree.update_taxon_namespace()\n\n        for item in original_labels:\n\n            original_name = str(item).strip().replace(\"[&R] \", '').replace(' ', '_').replace(\"'\", \"\")\n\n            # if it's a reference sequence\n            if '|' in original_name:\n                new_name = original_name.split('|')[0]\n            else:\n                # in case it's a reversed complement sequence or a genebank reference\n                new_name = original_name.replace(\"_R_\", \"\").replace(\"gb_\", \"gb:\").split('_')[0]\n\n            labels_dict[original_name] = new_name\n\n    # write report in json format\n    with open(\".report.json\", \"w\") as json_report:\n        json_dic = {\n            \"treeData\": [{\n                \"trees\": [\n                    to_write_trees\n                ],\n                \"labels\":[\n                    labels_dict\n                ]\n            }],\n        }\n\n        json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n    with open(\".status\", \"w\") as status_fh:\n        status_fh.write(\"pass\")\n\n\nif __name__ == '__main__':\n    main(NEWICK, LABELS)\n\n"
  },
  {
    "path": "flowcraft/templates/process_tsv.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\nThis module is intended to process the output in tsv\n to generate a report in json format.\n\nExpected input\n--------------\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n- ``sample_id`` : Sample Identification string.\n- ``tsv``: tsv output.\n\n\"\"\"\n\nimport json\nimport csv\nimport os\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\n__version__ = \"1.0.1\"\n__build__ = \"05.10.2018\"\n__template__ = \"maxbin2-nf\"\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    FILE = '$tsv'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"FILE: {}\".format(FILE))\n\n@MainWrapper\ndef main(sample_id, tsv_file):\n\n    # this tsvData could be a single object since it only has one element\n    # this data type expects full tables in tsv format\n    report_json = {\n        \"tsvData\": [{\n            \"sample\": sample_id,\n            \"data\": {}\n        }]\n    }\n\n    # web-app excepts a list with all the values in the table.\n    #  To expand this to other processes other than MaxBin2, this line needs to be reworked\n    report_json[\"tsvData\"][0][\"data\"][\"MaxBin2\"] = list(csv.reader(open(tsv_file), delimiter='\\t'))\n\n    with open(\".report.json\", \"w\") as k:\n        k.write(json.dumps(report_json))\n\n\nif __name__ == \"__main__\":\n    main(SAMPLE_ID, FILE)\n"
  },
  {
    "path": "flowcraft/templates/process_viral_assembly.py",
    "content": "#!/usr/bin/env python3\n\nimport os\nimport json\nimport operator\nfrom itertools import groupby\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\n\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended to process the output of assembly process from a single\nsample from the program Spades or Megahit for the report component.\nThe main input is an fasta file produced by the assembler.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id``: Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``assembly``: fasta file from the assembler.\n    - e.g.: ``'spades.fasta'``\n-  ``orfSize``: minimum contig size to be considered a complete ORF\n\nGenerated output\n----------------\n- ``.report.jason``: Data structure for the report\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.1\"\n__build__ = \"11.09.2018\"\n__template__ = \"viral_assembly-nf\"\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    ASSEMBLY = '$assembly'\n    MINSIZE = '$min_size'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"ASSEMBLY: {}\".format(ASSEMBLY))\n    logger.debug(\"MINSIZE: {}\".format(MINSIZE))\n\n\nclass Assembly:\n    \"\"\"Class that parses and filters a Fasta assembly file\n\n    This class parses an assembly fasta file, collects a number\n    of summary statistics and metadata from the contigs, filters\n    contigs based on user-defined metrics and writes filtered assemblies\n    and reports.\n\n    Parameters\n    ----------\n    assembly_file : str\n        Path to assembly file.\n    min_contig_len : int\n        Minimum contig length when applying the initial assembly filter.\n    min_kmer_cov : int\n        Minimum k-mer coverage when applying the initial assembly.\n        filter.\n    sample_id : str\n        Name of the sample for the current assembly.\n    \"\"\"\n\n    def __init__(self, assembly_file, min_contig_len, min_kmer_cov,\n                 sample_id, min_size):\n\n        self.contigs = {}\n        \"\"\"\n        dict: Dictionary storing data for each contig.\n        \"\"\"\n\n        self.filtered_ids = []\n        \"\"\"\n        list: List of filtered contig_ids.\n        \"\"\"\n\n        self.min_gc = 0.05\n        \"\"\"\n        float: Sets the minimum GC content on a contig.\n        \"\"\"\n\n        self.sample = sample_id\n        \"\"\"\n        str: The name of the sample for the assembly.\n        \"\"\"\n\n        self.nORFs = 0\n        \"\"\"\n        int: number of complete ORFs in the assembly.\n        \"\"\"\n\n        self.report = {}\n        \"\"\"\n        dict: Will contain the filtering results for each contig.\n        \"\"\"\n\n        self.filters = [\n            [\"length\", \">=\", min_contig_len],\n            [\"kmer_cov\", \">=\", min_kmer_cov]\n        ]\n        \"\"\"\n        list: Setting initial filters to check when parsing the assembly file.\n        This can be later changed using the 'filter_contigs' method.\n        \"\"\"\n\n        # Parse assembly and populate self.contigs\n        self._parse_assembly(assembly_file)\n\n        #Gets the number of ORFs\n        self.getORFs(assembly_file, min_size)\n\n    def getORFs(self, assembly, min_size):\n\n        f_open = open(assembly, \"rU\")\n\n        
entry = (x[1] for x in groupby(f_open, lambda line: line[0] == \">\"))\n\n        ORF = 0\n\n        for header in entry:\n            seq = \"\".join(s.strip() for s in entry.__next__())\n            if len(seq) >= int(min_size):\n                ORF += 1\n\n        self.nORFs = ORF\n\n\n    @staticmethod\n    def _parse_coverage(header_str):\n        \"\"\"Attempts to retrieve the coverage value from the header string.\n\n        It splits the header by \"_\" and then screens the list backwards in\n        search of the first float value. This will be interpreted as the\n        coverage value. If it cannot find a float value, it returns None.\n        This search methodology is based on the strings of assemblers\n        like spades and skesa that put the mean kmer coverage for each\n        contig in its corresponding fasta header.\n\n        Parameters\n        ----------\n        header_str : str\n            String\n\n        Returns\n        -------\n        float or None\n            The coverage value for the contig. None if it cannot find the\n            value in the provide string.\n        \"\"\"\n\n        cov = None\n        for i in header_str.split(\"_\")[::-1]:\n            try:\n                cov = float(i)\n                break\n            except ValueError:\n                continue\n\n        return cov\n\n    def _parse_assembly(self, assembly_file):\n        \"\"\"Parse an assembly fasta file.\n\n        This is a Fasta parsing method that populates the\n        :py:attr:`~Assembly.contigs` attribute with data for each contig in the\n        assembly.\n\n        The insertion of data on the self.contigs is done by the\n        :py:meth:`Assembly._populate_contigs` method, which also calculates\n        GC content and proportions.\n\n        Parameters\n        ----------\n        assembly_file : str\n            Path to the assembly fasta file.\n\n        \"\"\"\n\n        # Temporary storage of sequence data\n        seq_temp = []\n        # Id counter for contig that will serve as key in self.contigs\n        contig_id = 0\n        # Initialize kmer coverage and header\n        cov, header = None, None\n\n        with open(assembly_file) as fh:\n\n            logger.debug(\"Starting iteration of assembly file: {}\".format(\n                assembly_file))\n            for line in fh:\n                # Skip empty lines\n                if not line.strip():\n                    continue\n                else:\n                    # Remove whitespace surrounding line for further processing\n                    line = line.strip()\n\n                if line.startswith(\">\"):\n                    # If a sequence has already been populated, save the\n                    # previous contig information\n                    if seq_temp:\n                        # Use join() to convert string list into the full\n                        # contig string. 
This is generally much more efficient\n                        # than successively concatenating strings.\n                        seq = \"\".join(seq_temp)\n\n                        logger.debug(\"Populating contig with contig_id '{}', \"\n                                     \"header '{}' and cov '{}'\".format(\n                                        contig_id, header, cov))\n                        self._populate_contigs(contig_id, header, cov, seq)\n\n                        # Reset temporary sequence storage\n                        seq_temp = []\n                        contig_id += 1\n\n                    header = line[1:]\n                    cov = self._parse_coverage(line)\n\n                else:\n                    seq_temp.append(line)\n\n            # Populate last contig entry\n            logger.debug(\"Populating contig with contig_id '{}', \"\n                         \"header '{}' and cov '{}'\".format(\n                            contig_id, header, cov))\n            seq = \"\".join(seq_temp)\n            self._populate_contigs(contig_id, header, cov, seq)\n\n    def _populate_contigs(self, contig_id, header, cov, sequence):\n        \"\"\" Inserts data from a single contig into\\\n         :py:attr:`~Assembly.contigs`.\n\n        By providing a contig id, the original header, the coverage that\n        is parsed from the header and the sequence, this method will\n        populate the :py:attr:`~Assembly.contigs` attribute.\n\n        Parameters\n        ----------\n        contig_id : int\n            Arbitrary unique contig identifier.\n        header : str\n            Original header of the current contig.\n        cov : float\n            The contig coverage, parsed from the fasta header\n        sequence : str\n            The complete sequence of the contig.\n\n        \"\"\"\n\n        # Get AT/GC/N counts and proportions.\n        # Note that self._get_gc_content returns a dictionary with the\n        # information on the GC/AT/N counts and proportions. 
This makes it\n        # much easier to add to the contigs attribute using the ** notation.\n        gc_kwargs = self._get_gc_content(sequence, len(sequence))\n        logger.debug(\"Populate GC content with: {}\".format(gc_kwargs))\n\n        self.contigs[contig_id] = {\n            \"header\": header,\n            \"sequence\": sequence,\n            \"length\": len(sequence),\n            \"kmer_cov\": cov,\n            **gc_kwargs\n        }\n\n    @staticmethod\n    def _get_gc_content(sequence, length):\n        \"\"\"Get GC content and proportions.\n\n        Parameters\n        ----------\n        sequence : str\n            The complete sequence of the contig.\n        length : int\n            The length of the sequence contig.\n\n        Returns\n        -------\n        x : dict\n            Dictionary with the at/gc/n counts and proportions\n\n        \"\"\"\n\n        # Get AT/GC/N counts\n        at = sum(map(sequence.count, [\"A\", \"T\"]))\n        gc = sum(map(sequence.count, [\"G\", \"C\"]))\n        n = length - (at + gc)\n\n        # Get AT/GC/N proportions\n        at_prop = at / length\n        gc_prop = gc / length\n        n_prop = n / length\n\n        return {\"at\": at, \"gc\": gc, \"n\": n,\n                \"at_prop\": at_prop, \"gc_prop\": gc_prop, \"n_prop\": n_prop}\n\n    @staticmethod\n    def _test_truth(x, op, y):\n        \"\"\" Test the truth of a comparisong between x and y using an \\\n        ``operator``.\n\n        If you want to compare '100 > 200', this method can be called as::\n\n            self._test_truth(100, \">\", 200).\n\n        Parameters\n        ----------\n        x : int\n            Arbitrary value to compare in the left\n        op : str\n            Comparison operator\n        y : int\n            Arbitrary value to compare in the rigth\n\n        Returns\n        -------\n        x : bool\n            The 'truthness' of the test\n        \"\"\"\n\n        ops = {\n            \">\": operator.gt,\n            \"<\": operator.lt,\n            \">=\": operator.ge,\n            \"<=\": operator.le,\n        }\n\n        return ops[op](x, y)\n\n    def filter_contigs(self, *comparisons):\n        \"\"\"Filters the contigs of the assembly according to user provided\\\n        comparisons.\n\n        The comparisons must be a list of three elements with the\n        :py:attr:`~Assembly.contigs` key, operator and test value. 
For\n        example, to filter contigs with a minimum length of 250, a comparison\n        would be::\n\n            self.filter_contigs([\"length\", \">=\", 250])\n\n        The filtered contig ids will be stored in the\n        :py:attr:`~Assembly.filtered_ids` list.\n\n        The result of the test for all contigs will be stored in the\n        :py:attr:`~Assembly.report` dictionary.\n\n        Parameters\n        ----------\n        comparisons : list\n            List with contig key, operator and value to test.\n\n        \"\"\"\n\n        # Reset list of filtered ids\n        self.filtered_ids = []\n        self.report = {}\n\n        gc_filters = [\n            [\"gc_prop\", \">=\", self.min_gc],\n            [\"gc_prop\", \"<=\", 1 - self.min_gc]\n        ]\n\n        self.filters = list(comparisons) + gc_filters\n\n        logger.debug(\"Filtering contigs using filters: {}\".format(\n            self.filters))\n\n        for contig_id, contig in self.contigs.items():\n            for key, op, value in list(comparisons) + gc_filters:\n                if not self._test_truth(contig[key], op, value):\n                    self.filtered_ids.append(contig_id)\n                    self.report[contig_id] = \"{}/{}/{}\".format(key,\n                                                               contig[key],\n                                                               value)\n                    break\n                else:\n                    self.report[contig_id] = \"pass\"\n\n    def get_assembly_length(self):\n        \"\"\"Returns the length of the assembly, without the filtered contigs.\n\n        Returns\n        -------\n        x : int\n            Total length of the assembly.\n\n        \"\"\"\n\n        return sum(\n            [vals[\"length\"] for contig_id, vals in self.contigs.items()\n             if contig_id not in self.filtered_ids])\n\n    def write_assembly(self, output_file, filtered=True):\n        \"\"\"Writes the assembly to a new file.\n\n        The ``filtered`` option controls whether the new assembly will be\n        filtered or not.\n\n        Parameters\n        ----------\n        output_file : str\n            Name of the output assembly file.\n        filtered : bool\n            If ``True``, does not include filtered ids.\n        \"\"\"\n\n        logger.debug(\"Writing the filtered assembly into: {}\".format(\n            output_file))\n        with open(output_file, \"w\") as fh:\n\n            for contig_id, contig in self.contigs.items():\n                if contig_id not in self.filtered_ids and filtered:\n                    fh.write(\">{}_{}\\\\n{}\\\\n\".format(self.sample,\n                                                     contig[\"header\"],\n                                                     contig[\"sequence\"]))\n\n    def write_report(self, output_file):\n        \"\"\"Writes a report with the test results for the current assembly\n\n        Parameters\n        ----------\n        output_file : str\n            Name of the output assembly file.\n\n        \"\"\"\n\n        logger.debug(\"Writing the assembly report into: {}\".format(\n            output_file))\n        with open(output_file, \"w\") as fh:\n\n            for contig_id, vals in self.report.items():\n                fh.write(\"{}, {}\\\\n\".format(contig_id, vals))\n\n\n\n@MainWrapper\ndef main(sample_id, assembly_file, minsize):\n    \"\"\"Main executor of the process_mapping template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample 
Identification string.\n    assembly_file : str\n        Path to the fasta file generated by the assembler.\n    minsize : str\n        Minimum contig size to be considered a complete ORF.\n\n    \"\"\"\n\n    logger.info(\"Starting assembly file processing\")\n    warnings = []\n    fails = \"\"\n\n    # Parse the spades assembly file and perform the first filtering.\n    logger.info(\"Starting assembly parsing\")\n    assembly_obj = Assembly(assembly_file, 0, 0,\n                            sample_id, minsize)\n\n    if 'spades' in assembly_file:\n        assembler = \"SPAdes\"\n    else:\n        assembler = \"MEGAHIT\"\n\n    with open(\".warnings\", \"w\") as warn_fh:\n\n        t_80 = int(minsize) * 0.8\n        t_150 = int(minsize) * 1.5\n        # Check if assembly size of the first assembly is lower than 80% of the\n        # estimated genome size - the DENV ORF has a minimum of 10k nt. If True,\n        # redo the filtering without the k-mer coverage filter\n        assembly_len = assembly_obj.get_assembly_length()\n        logger.debug(\"Checking assembly length: {}\".format(assembly_len))\n\n        if assembly_obj.nORFs < 1:\n            warn_msg = \"No complete ORFs found.\"\n            warn_fh.write(warn_msg)\n            fails = warn_msg\n\n        if assembly_len < t_80:\n\n            logger.warning(\"Assembly size ({}) smaller than the minimum \"\n                           \"threshold of 80% of expected genome size. \"\n                           \"Applying contig filters without the k-mer \"\n                           \"coverage filter\".format(assembly_len))\n\n            assembly_len = assembly_obj.get_assembly_length()\n            logger.debug(\"Checking updated assembly length: \"\n                         \"{}\".format(assembly_len))\n            if assembly_len < t_80:\n\n                warn_msg = \"Assembly size smaller than the minimum\" \\\n                           \" threshold of 80% of expected genome size: {}\".format(\n                                assembly_len)\n                logger.warning(warn_msg)\n                warn_fh.write(warn_msg)\n                fails = warn_msg\n\n        if assembly_len > t_150:\n\n            warn_msg = \"Assembly size ({}) larger than the maximum\" \\\n                       \" threshold of 150% of expected genome size.\".format(\n                            assembly_len)\n            logger.warning(warn_msg)\n            warn_fh.write(warn_msg)\n            fails = warn_msg\n\n\n    # Write json report\n    with open(\".report.json\", \"w\") as json_report:\n        json_dic = {\n            \"tableRow\": [{\n                \"sample\": sample_id,\n                \"data\": [\n                    {\"header\": \"Contigs ({})\".format(assembler),\n                     \"value\": len(assembly_obj.contigs),\n                     \"table\": \"assembly\",\n                     \"columnBar\": True},\n                    {\"header\": \"Assembled BP ({})\".format(assembler),\n                     \"value\": assembly_len,\n                     \"table\": \"assembly\",\n                     \"columnBar\": True},\n                    {\"header\": \"ORFs\",\n                     \"value\": assembly_obj.nORFs,\n                     \"table\": \"assembly\",\n                     \"columnBar\": False}\n                ]\n            }],\n        }\n\n        if warnings:\n            json_dic[\"warnings\"] = [{\n                \"sample\": sample_id,\n                \"table\": \"assembly\",\n                \"value\": warnings\n            }]\n\n      
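  # If a failure message was recorded above, flag the sample as failed in the report\n      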
  if fails:\n            json_dic[\"fail\"] = [{\n                \"sample\": sample_id,\n                \"table\": \"assembly\",\n                \"value\": [fails]\n            }]\n\n        json_report.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n    with open(\".status\", \"w\") as status_fh:\n        status_fh.write(\"pass\")\n\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, ASSEMBLY, MINSIZE)\n\n"
  },
  {
    "path": "flowcraft/templates/skesa.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended execute Skesa on paired-end FastQ files.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``fastq_pair`` : Pair of FastQ file paths.\n    - e.g.: ``'SampleA_1.fastq.gz SampleA_2.fastq.gz'``\n- ``clear`` : If 'true', remove the input fastq files at the end of the\n    component run, IF THE FILES ARE IN THE WORK DIRECTORY\n\nGenerated output\n----------------\n\n- ``${sample_id}_*.assembly.fasta`` : Main output of skesawith the assembly\n    - e.g.: ``sample_1_skesa.fasta``\n- ``clear`` : If 'true', remove the input fastq files at the end of the\n    component run, IF THE FILES ARE IN THE WORK DIRECTORY\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.2\"\n__build__ = \"29062018\"\n__template__ = \"skesa-nf\"\n\nimport os\nimport re\nimport subprocess\n\nfrom subprocess import PIPE\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\ndef __get_version_skesa():\n\n    try:\n\n        cli = [\"skesa\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        _, err = p.communicate()\n\n        try:\n            version = re.search(\"v((\\\\..*))-\", err.decode(\"utf8\")).group(1)\n        except AttributeError:\n            version = \"undefined\"\n\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return {\n        \"program\": \"skesa\",\n        \"version\": version,\n    }\n\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    FASTQ_PAIR = '$fastq_pair'.split()\n    CLEAR = '$clear'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"FASTQ_PAIR: {}\".format(FASTQ_PAIR))\n    logger.debug(\"CLEAR: {}\".format(CLEAR))\n\n\ndef clean_up(fastq):\n    \"\"\"\n    Cleans the temporary fastq files. If they are symlinks, the link\n    source is removed\n\n    Parameters\n    ----------\n    fastq : list\n        List of fastq files.\n    \"\"\"\n\n    for fq in fastq:\n        # Get real path of fastq files, following symlinks\n        rp = os.path.realpath(fq)\n        logger.debug(\"Removing temporary fastq file path: {}\".format(rp))\n        if re.match(\".*/work/.{2}/.{30}/.*\", rp):\n            os.remove(rp)\n\n\n@MainWrapper\ndef main(sample_id, fastq_pair, clear):\n    \"\"\"Main executor of the skesa template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    fastq_pair : list\n        Two element list containing the paired FastQ files.\n    clear : str\n        Can be either 'true' or 'false'. 
If 'true', the input fastq files will\n        be removed at the end of the run, IF they are in the working directory\n    \"\"\"\n\n    logger.info(\"Starting skesa\")\n\n    # Determine output file\n    if \"_trim.\" in fastq_pair[0]:\n        sample_id += \"_trim\"\n    version = __get_version_skesa()[\"version\"]\n    output_file = \"{}_skesa{}.fasta\".format(sample_id, version.replace(\".\", \"\"))\n\n    cli = [\n        \"skesa\",\n        \"--fastq\",\n        \"{},{}\".format(fastq_pair[0], fastq_pair[1]),\n        \"--gz\",\n        \"--use_paired_ends\",\n        \"--cores\",\n        \"${task.cpus}\"\n    ]\n\n    logger.debug(\"Running Skesa subprocess with command: {}\".format(cli))\n\n    with open(output_file, \"w\") as fh:\n        p = subprocess.Popen(cli, stdout=fh, stderr=PIPE)\n        stdout, stderr = p.communicate()\n\n    # Attempt to decode STDERR output from bytes. If unsuccessful, coerce to\n    # string\n    try:\n        stderr = stderr.decode(\"utf8\")\n        stdout = stdout.decode(\"utf8\")\n    except (UnicodeDecodeError, AttributeError):\n        stderr = str(stderr)\n        stdout = str(stdout)\n\n    logger.info(\"Finished Skesa subprocess with STDOUT:\\\\n\"\n                \"======================================\\\\n{}\".format(stdout))\n    logger.info(\"Finished Skesa subprocess with STDERR:\\\\n\"\n                \"======================================\\\\n{}\".format(stderr))\n    logger.info(\"Finished Skesa with return code: {}\".format(\n        p.returncode))\n\n    # Remove input fastq files when clear option is specified.\n    # Only remove temporary input when the expected output exists.\n    if clear == \"true\" and os.path.exists(output_file):\n        clean_up(fastq_pair)\n\n    with open(\".status\", \"w\") as fh:\n        if p.returncode != 0:\n            fh.write(\"error\")\n            raise SystemExit(p.returncode)\n        else:\n            fh.write(\"pass\")\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, FASTQ_PAIR, CLEAR)\n"
  },
  {
    "path": "flowcraft/templates/spades.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended execute Spades on paired-end FastQ files.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``fastq_pair`` : Pair of FastQ file paths.\n    - e.g.: ``'SampleA_1.fastq.gz SampleA_2.fastq.gz'``\n- ``kmers`` : Setting for Spades kmers. Can be either ``'auto'``, \\\n    ``'default'`` or a user provided list.\n    - e.g.: ``'auto'`` or ``'default'`` or ``'55 77 99 113 127'``\n- ``opts`` : List of options for spades execution.\n    1. The minimum number of reads to consider an edge in the de Bruijn \\\n    graph during the assembly.\n        - e.g.: ``'5'``\n    2. Minimum contigs k-mer coverage.\n        - e.g.: ``['2' '2']``\n- ``clear`` : If 'true', remove the input fastq files at the end of the\n    component run, IF THE FILES ARE IN THE WORK DIRECTORY\n\nGenerated output\n----------------\n\n- ``contigs.fasta`` : Main output of spades with the assembly\n    - e.g.: ``contigs.fasta``\n- ``spades_status`` :  Stores the status of the spades run. If it was \\\n    successfully executed, it stores ``'pass'``. Otherwise, it stores the\\\n    ``STDERR`` message.\n    - e.g.: ``'pass'``\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.2\"\n__build__ = \"29062018\"\n__template__ = \"spades-nf\"\n\nimport os\nimport sys\nimport re\nimport subprocess\n\nfrom subprocess import PIPE\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\ndef __get_version_spades():\n\n    try:\n\n        cli = [\"spades.py\", \"--version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout, _ = p.communicate()\n\n        version = stdout.strip().split()[-1][1:].decode(\"utf8\")\n\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return {\n        \"program\": \"SPAdes\",\n        \"version\": version,\n    }\n\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    FASTQ_PAIR = '$fastq_pair'.split()\n    MAX_LEN = int('$max_len'.strip())\n    KMERS = '$kmers'.strip()\n    CLEAR = '$clear'\n    DISABLE_RR = '$disable_rr'\n    OPTS = [x.strip() for x in '$opts'.strip(\"[]\").split(\",\")]\n    CLEAR = '$clear'\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"FASTQ_PAIR: {}\".format(FASTQ_PAIR))\n    logger.debug(\"MAX_LEN: {}\".format(MAX_LEN))\n    logger.debug(\"KMERS: {}\".format(KMERS))\n    logger.debug(\"OPTS: {}\".format(OPTS))\n    logger.debug(\"CLEAR: {}\".format(CLEAR))\n    logger.debug(\"DISABLE_RR: {}\".format(DISABLE_RR))\n\n\ndef set_kmers(kmer_opt, max_read_len):\n    \"\"\"Returns a kmer list based on the provided kmer option and max read len.\n\n    Parameters\n    ----------\n    kmer_opt : str\n        The k-mer option. 
Can be either ``'auto'``, ``'default'`` or a\n        sequence of space separated integers, ``'23, 45, 67'``.\n    max_read_len : int\n        The maximum read length of the current sample.\n\n    Returns\n    -------\n    kmers : list\n        List of k-mer values that will be provided to Spades.\n\n    \"\"\"\n\n    logger.debug(\"Kmer option set to: {}\".format(kmer_opt))\n\n    # Check if kmer option is set to auto\n    if kmer_opt == \"auto\":\n\n        if max_read_len >= 175:\n            kmers = [55, 77, 99, 113, 127]\n        else:\n            kmers = [21, 33, 55, 67, 77]\n\n        logger.debug(\"Kmer range automatically selected based on max read\"\n                     \"length of {}: {}\".format(max_read_len, kmers))\n\n    # Check if manual kmers were specified\n    elif len(kmer_opt.split()) > 1:\n\n        kmers = kmer_opt.split()\n        logger.debug(\"Kmer range manually set to: {}\".format(kmers))\n\n    else:\n\n        kmers = []\n        logger.debug(\"Kmer range set to empty (will be automatically \"\n                     \"determined by SPAdes\")\n\n    return kmers\n\n\ndef clean_up(fastq):\n    \"\"\"\n    Cleans the temporary fastq files. If they are symlinks, the link\n    source is removed\n\n    Parameters\n    ----------\n    fastq : list\n        List of fastq files.\n    \"\"\"\n\n    for fq in fastq:\n        # Get real path of fastq files, following symlinks\n        rp = os.path.realpath(fq)\n        logger.debug(\"Removing temporary fastq file path: {}\".format(rp))\n        if re.match(\".*/work/.{2}/.{30}/.*\", rp):\n            os.remove(rp)\n\n\n@MainWrapper\ndef main(sample_id, fastq_pair, max_len, kmer, opts, clear, disable_rr):\n    \"\"\"Main executor of the spades template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    fastq_pair : list\n        Two element list containing the paired FastQ files.\n    max_len : int\n        Maximum read length. This value is determined in\n        :py:class:`templates.integrity_coverage`\n    kmer : str\n        Can be either ``'auto'``, ``'default'`` or a\n        sequence of space separated integers, ``'23, 45, 67'``.\n    opts : List of options for spades execution. See above.\n    clear : str\n        Can be either 'true' or 'false'. If 'true', the input fastq files will\n        be removed at the end of the run, IF they are in the working directory\n    disable_rr : str\n        Can either be 'true' or 'false'. 
If 'true', disables repeat resolution \n        stage of assembling\n    \"\"\"\n\n    logger.info(\"Starting spades\")\n\n    min_coverage, min_kmer_coverage = opts\n\n    logger.info(\"Setting SPAdes kmers\")\n    kmers = set_kmers(kmer, max_len)\n    logger.info(\"SPAdes kmers set to: {}\".format(kmers))\n\n    cli = [\n        \"spades.py\",\n        \"--careful\",\n        \"--only-assembler\",\n        \"--threads\",\n        \"$task.cpus\",\n        \"--cov-cutoff\",\n        min_coverage,\n        \"-o\",\n        \".\"\n    ]\n\n    # Add kmers, if any were specified\n    if kmers:\n        cli += [\"-k {}\".format(\",\".join([str(x) for x in kmers]))]\n\n    # Add FastQ files\n    cli += [\n        \"-1\",\n        fastq_pair[0],\n        \"-2\",\n        fastq_pair[1]\n    ]\n\n    # Disable RR?\n    if disable_rr == 'true':\n        cli += ['--disable-rr']\n\n    logger.debug(\"Running SPAdes subprocess with command: {}\".format(cli))\n\n    p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n    stdout, stderr = p.communicate()\n\n    # Attempt to decode STDERR output from bytes. If unsuccessful, coerce to\n    # string\n    try:\n        stderr = stderr.decode(\"utf8\")\n        stdout = stdout.decode(\"utf8\")\n    except (UnicodeDecodeError, AttributeError):\n        stderr = str(stderr)\n        stdout = str(stdout)\n\n    logger.info(\"Finished SPAdes subprocess with STDOUT:\\\\n\"\n                \"======================================\\\\n{}\".format(stdout))\n    logger.info(\"Fished SPAdes subprocess with STDERR:\\\\n\"\n                \"======================================\\\\n{}\".format(stderr))\n    logger.info(\"Finished SPAdes with return code: {}\".format(\n        p.returncode))\n\n    with open(\".status\", \"w\") as fh:\n        if p.returncode != 0:\n            fh.write(\"error\")\n            sys.exit(p.returncode)\n        else:\n            fh.write(\"pass\")\n\n    # Change the default contigs.fasta assembly name to a more informative one\n    if \"_trim.\" in fastq_pair[0]:\n        sample_id += \"_trim\"\n    # Get spades version for output name\n    info = __get_version_spades()\n\n    assembly_file = \"{}_spades{}.fasta\".format(\n        sample_id, info[\"version\"].replace(\".\", \"\"))\n    os.rename(\"contigs.fasta\", assembly_file)\n    logger.info(\"Setting main assembly file to: {}\".format(assembly_file))\n\n    # Remove input fastq files when clear option is specified.\n    # Only remove temporary input when the expected output exists.\n    if clear == \"true\" and os.path.exists(assembly_file):\n        clean_up(fastq_pair)\n\n\nif __name__ == '__main__':\n    main(SAMPLE_ID, FASTQ_PAIR, MAX_LEN, KMERS, OPTS, CLEAR, DISABLE_RR)\n"
  },
  {
    "path": "flowcraft/templates/split_fasta.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module intends to split a multifasta file into seperate fasta files.\n\nIf no sequence is larger than min_contig_size, returns the original assembly.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Sample Identification string.\n    - e.g.: ``'SampleA'``\n- ``fasta`` : A fasta file path.\n    - e.g.: ``'SampleA.fasta'``\n- ``min_contig_size`` : A minimum contig length\n    - e.g.: ``'1000'``\n\nGenerated output\n----------------\n\n-  A fasta file per contig (given the minimum contig size\n\"\"\"\n\n__version__ = \"0.0.3\"\n__build__ = \"19122018\"\n__template__ = \"split_assembly-nf\"\n\nimport os\nfrom itertools import groupby\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    ASSEMBLY = '$assembly'\n    MIN_SIZE = int('$min_contig_size'.strip())\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"ASSEMBLY: {}\".format(ASSEMBLY))\n    logger.debug(\"MIN_SIZE: {}\".format(MIN_SIZE))\n\n@MainWrapper\ndef main(sample_id, assembly, min_size):\n    \"\"\"Main executor of the split_fasta template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    assembly : list\n        Assembly file.\n    min_size : int\n        Minimum contig size.\"\"\"\n\n    logger.info(\"Starting script\")\n\n    f_open = open(assembly, \"rU\")\n\n    success = 0\n\n    entry = (x[1] for x in groupby(f_open, lambda line: line[0] == \">\"))\n\n    for header in entry:\n\n        header_str = header.__next__()[1:].strip()\n        seq = \"\".join(s.strip() for s in entry.__next__())\n        if len(seq) >= min_size:\n            with open(sample_id + '_' + header_str.replace(\" \", \"_\").replace(\"=\", \"_\") + '.fasta', \"w\") as output_file:\n                output_file.write(\n                    \">\" + sample_id + \"_\" + header_str.replace(\" \", \"_\").replace(\"=\", \"_\") + \"\\\\n\" + seq + \"\\\\n\")\n                success += 1\n\n    if success < 1:\n        with open(sample_id + \".fasta\", \"w\") as logfile:\n\n            for x in f_open.readlines():\n                logfile.write(x)\n\n    f_open.close()\n\n\n    logger.info(\"{} sequences sucessfully splitted.\".format(success))\n\n\nif __name__ == '__main__':\n    main(SAMPLE_ID, ASSEMBLY, MIN_SIZE)"
  },
  {
    "path": "flowcraft/templates/trimmomatic.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended execute trimmomatic on paired-end FastQ files.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``sample_id`` : Pair of FastQ file paths.\n    - e.g.: ``'SampleA'``\n- ``fastq_pair`` : Pair of FastQ file paths.\n    - e.g.: ``'SampleA_1.fastq.gz SampleA_2.fastq.gz'``\n- ``trim_range`` : Crop range detected using FastQC.\n    - e.g.: ``'15 151'``\n- ``opts`` : List of options for trimmomatic\n    - e.g.: ``'[\"5:20\", \"3\", \"3\", \"55\"]'``\n    - e.g.: ``'[trim_sliding_window, trim_leading, trim_trailing, trim_min_length]'``\n- ``phred`` : List of guessed phred values for each sample\n    - e.g.: ``'[SampleA: 33, SampleB: 33]'``\n- ``clear`` : If 'true', remove the input fastq files at the end of the\n    component run, IF THE FILES ARE IN THE WORK DIRECTORY\n\nGenerated output\n----------------\n\nThe generated output are output files that contain an object, usually a string.\n(Values within ``${}`` are substituted by the corresponding variable.)\n\n- ``${sample_id}_*P*``: Pair of paired FastQ files generated by Trimmomatic\n    - e.g.: ``'SampleA_1_P.fastq.gz SampleA_2_P.fastq.gz'``\n- ``trimmomatic_status``: Stores the status of the trimmomatic run. If it was\\\n    successfully executed, it stores 'pass'. Otherwise, it stores the \\\n    ``STDERR`` message.\n    - e.g.: ``'pass'``\n\nCode documentation\n------------------\n\n\"\"\"\n\n# TODO: More control over read trimming\n# TODO: Add option to remove adapters\n# TODO: What to do when there is encoding failure\n\n__version__ = \"1.0.3\"\n__build__ = \"29062018\"\n__template__ = \"trimmomatic-nf\"\n\nimport os\nimport re\nimport json\nimport fileinput\nimport subprocess\nimport tempfile\n\nfrom subprocess import PIPE\nfrom collections import OrderedDict\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\ndef __get_version_trimmomatic():\n\n    try:\n\n        cli = [\"java\", \"-jar\", TRIM_PATH, \"-version\"]\n        p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n        stdout, _ = p.communicate()\n\n        version = stdout.strip().decode(\"utf8\")\n\n    except Exception as e:\n        logger.debug(e)\n        version = \"undefined\"\n\n    return {\n        \"program\": \"Trimmomatic\",\n        \"version\": version,\n    }\n\n\nif __file__.endswith(\".command.sh\"):\n    SAMPLE_ID = '$sample_id'\n    FASTQ_PAIR = '$fastq_pair'.split()\n    TRIM_RANGE = '$trim_range'.split()\n    TRIM_OPTS = [x.strip() for x in '$opts'.strip(\"[]\").split(\",\")]\n    PHRED = '$phred'\n    ADAPTERS_FILE = '$ad'\n    CLEAR = '$clear'\n\n    logger.debug(\"Running {} with parameters:\".format(\n        os.path.basename(__file__)))\n    logger.debug(\"SAMPLE_ID: {}\".format(SAMPLE_ID))\n    logger.debug(\"FASTQ_PAIR: {}\".format(FASTQ_PAIR))\n    logger.debug(\"TRIM_RANGE: {}\".format(TRIM_RANGE))\n    logger.debug(\"TRIM_OPTS: {}\".format(TRIM_OPTS))\n    logger.debug(\"PHRED: {}\".format(PHRED))\n    logger.debug(\"ADAPTERS_FILE: {}\".format(ADAPTERS_FILE))\n    logger.debug(\"CLEAR: {}\".format(CLEAR))\n\nTRIM_PATH = \"/NGStools/Trimmomatic-0.36/trimmomatic.jar\"\nADAPTERS_PATH = \"/NGStools/Trimmomatic-0.36/adapters\"\n\n\ndef parse_log(log_file):\n    \"\"\"Retrieves some statistics from a single Trimmomatic log file.\n\n    This function parses Trimmomatic's log file and stores some trimming\n    
statistics in an :py:class:`OrderedDict` object. This object contains\n    the following keys:\n\n        - ``clean_len``: Total length after trimming.\n        - ``total_trim``: Total trimmed base pairs.\n        - ``total_trim_perc``: Total trimmed base pairs in percentage.\n        - ``5trim``: Total base pairs trimmed at 5' end.\n        - ``3trim``: Total base pairs trimmed at 3' end.\n\n    Parameters\n    ----------\n    log_file : str\n        Path to trimmomatic log file.\n\n    Returns\n    -------\n    x : :py:class:`OrderedDict`\n        Object storing the trimming statistics.\n\n    \"\"\"\n\n    template = OrderedDict([\n        # Total length after trimming\n        (\"clean_len\", 0),\n        # Total trimmed base pairs\n        (\"total_trim\", 0),\n        # Total trimmed base pairs in percentage\n        (\"total_trim_perc\", 0),\n        # Total trimmed at 5' end\n        (\"5trim\", 0),\n        # Total trimmed at 3' end\n        (\"3trim\", 0),\n        # Bad reads (completely trimmed)\n        (\"bad_reads\", 0)\n    ])\n\n    with open(log_file) as fh:\n\n        for line in fh:\n            # This will split the log fields into:\n            # 0. read length after trimming\n            # 1. amount trimmed from the start\n            # 2. last surviving base\n            # 3. amount trimmed from the end\n            fields = [int(x) for x in line.strip().split()[-4:]]\n\n            if not fields[0]:\n                template[\"bad_reads\"] += 1\n\n            template[\"5trim\"] += fields[1]\n            template[\"3trim\"] += fields[3]\n            template[\"total_trim\"] += fields[1] + fields[3]\n            template[\"clean_len\"] += fields[0]\n\n        total_len = template[\"clean_len\"] + template[\"total_trim\"]\n\n        if total_len:\n            template[\"total_trim_perc\"] = round(\n                (template[\"total_trim\"] / total_len) * 100, 2)\n        else:\n            template[\"total_trim_perc\"] = 0\n\n    return template\n\n\ndef write_report(storage_dic, output_file, sample_id):\n    \"\"\" Writes a report from multiple samples.\n\n    Parameters\n    ----------\n    storage_dic : dict or :py:class:`OrderedDict`\n        Storage containing the trimming statistics. 
See :py:func:`parse_log`\n        for its generation.\n    output_file : str\n        Path where the output file will be generated.\n    \"\"\"\n\n    with open(output_file, \"w\") as fh, open(\".report.json\", \"w\") as json_rep:\n\n        # Write header\n        fh.write(\"Sample,Total length,Total trimmed,%,5end Trim,3end Trim,\"\n                 \"bad_reads\\\\n\")\n\n        # Write contents\n        for sample, vals in storage_dic.items():\n            fh.write(\"{},{}\\\\n\".format(\n                sample, \",\".join([str(x) for x in vals.values()])))\n\n            json_dic = {\n                \"tableRow\": [{\n                    \"sample\": sample_id,\n                    \"data\": [\n                        {\"header\": \"Trimmed (%)\",\n                         \"value\": vals[\"total_trim_perc\"],\n                         \"table\": \"qc\",\n                         \"columnBar\": True},\n                    ]\n                }],\n                \"plotData\": [{\n                    \"sample\": sample_id,\n                    \"data\": {\n                        \"sparkline\": vals[\"clean_len\"]\n                    }\n                }],\n                \"badReads\": vals[\"bad_reads\"]\n            }\n            json_rep.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n\ndef trimmomatic_log(log_file, sample_id):\n\n    log_storage = OrderedDict()\n\n    log_storage[sample_id] = parse_log(log_file)\n\n    #remove temp dir where log file is stored\n    tempdir = os.path.dirname(log_file)\n\n    os.remove(log_file)\n\n    os.rmdir(tempdir)\n\n    write_report(log_storage, \"trimmomatic_report.csv\", sample_id)\n\n\ndef clean_up(fastq_pairs, clear):\n    \"\"\"Cleans the working directory of unwanted temporary files\"\"\"\n\n    # Find unpaired fastq files\n    unpaired_fastq = [f for f in os.listdir(\".\")\n                      if f.endswith(\"_U.fastq.gz\")]\n\n    # Remove unpaired fastq files, if any\n    for fpath in unpaired_fastq:\n        os.remove(fpath)\n\n    # Expected output to assess whether it is safe to remove temporary input\n    expected_out = [f for f in os.listdir(\".\") if f.endswith(\"_trim.fastq.gz\")]\n\n    if clear == \"true\" and len(expected_out) == 2:\n        for fq in fastq_pairs:\n            # Get real path of fastq files, following symlinks\n            rp = os.path.realpath(fq)\n            logger.debug(\"Removing temporary fastq file path: {}\".format(rp))\n            if re.match(\".*/work/.{2}/.{30}/.*\", rp):\n                os.remove(rp)\n\n\ndef merge_default_adapters():\n    \"\"\"Merges the default adapters file in the trimmomatic adapters directory\n\n    Returns\n    -------\n    str\n        Path with the merged adapters file.\n    \"\"\"\n\n    default_adapters = [os.path.join(ADAPTERS_PATH, x) for x in\n                        os.listdir(ADAPTERS_PATH)]\n    filepath = os.path.join(os.getcwd(), \"default_adapters.fasta\")\n\n    with open(filepath, \"w\") as fh, \\\n            fileinput.input(default_adapters) as in_fh:\n        for line in in_fh:\n            fh.write(\"{}{}\".format(line, \"\\\\n\"))\n\n    return filepath\n\n\n@MainWrapper\ndef main(sample_id, fastq_pair, trim_range, trim_opts, phred, adapters_file,\n         clear):\n    \"\"\" Main executor of the trimmomatic template.\n\n    Parameters\n    ----------\n    sample_id : str\n        Sample Identification string.\n    fastq_pair : list\n        Two element list containing the paired FastQ files.\n    trim_range : list\n        Two element 
list containing the trimming range.\n    trim_opts : list\n        Four element list containing several trimmomatic options:\n        [*SLIDINGWINDOW*; *LEADING*; *TRAILING*; *MINLEN*]\n    phred : int\n        Guessed phred score for the sample. The phred score is a generated\n        output from :py:class:`templates.integrity_coverage`.\n    adapters_file : str\n        Path to adapters file. If not provided, or the path is not available,\n        the default adapters bundled with Trimmomatic will be used\n    clear : str\n        Can be either 'true' or 'false'. If 'true', the input fastq files will\n        be removed at the end of the run, IF they are in the working directory\n    \"\"\"\n\n    logger.info(\"Starting trimmomatic\")\n\n    # Create base CLI\n    cli = [\n        \"java\",\n        \"-Xmx{}\".format(\"$task.memory\"[:-1].lower().replace(\" \", \"\")),\n        \"-jar\",\n        TRIM_PATH.strip(),\n        \"PE\",\n        \"-threads\",\n        \"$task.cpus\"\n    ]\n\n    # If the phred encoding was detected, provide it\n    try:\n        # Check if the provided PHRED can be converted to int\n        phred = int(phred)\n        phred_flag = \"-phred{}\".format(str(phred))\n        cli += [phred_flag]\n    # Could not detect phred encoding. Do not add explicit encoding to\n    # trimmomatic and let it guess\n    except ValueError:\n        pass\n\n    # Add input samples to CLI\n    cli += fastq_pair\n\n    # Add output file names\n    output_names = []\n    for i in range(len(fastq_pair)):\n        output_names.append(\"{}_{}_trim.fastq.gz\".format(\n            SAMPLE_ID, str(i + 1)))\n        output_names.append(\"{}_{}_U.fastq.gz\".format(\n            SAMPLE_ID, str(i + 1)))\n    cli += output_names\n\n    if trim_range != [\"None\"]:\n        cli += [\n            \"CROP:{}\".format(trim_range[1]),\n            \"HEADCROP:{}\".format(trim_range[0]),\n        ]\n\n    if os.path.exists(adapters_file):\n        logger.debug(\"Using the provided adapters file '{}'\".format(\n            adapters_file))\n    else:\n        logger.debug(\"Adapters file '{}' not provided or does not exist. Using\"\n                     \" default adapters\".format(adapters_file))\n        adapters_file = merge_default_adapters()\n\n    cli += [\n        \"ILLUMINACLIP:{}:3:30:10:6:true\".format(adapters_file)\n    ]\n\n    # Create log file in temporary dir to avoid issues when running on a docker container in macOS\n    logfile = os.path.join(tempfile.mkdtemp(prefix='tmp'), \"{}_trimlog.txt\".format(sample_id))\n\n    # Add trimmomatic options\n    cli += [\n        \"SLIDINGWINDOW:{}\".format(trim_opts[0]),\n        \"LEADING:{}\".format(trim_opts[1]),\n        \"TRAILING:{}\".format(trim_opts[2]),\n        \"MINLEN:{}\".format(trim_opts[3]),\n        \"TOPHRED33\",\n        \"-trimlog\",\n        logfile\n    ]\n\n    logger.debug(\"Running trimmomatic subprocess with command: {}\".format(cli))\n\n    p = subprocess.Popen(cli, stdout=PIPE, stderr=PIPE)\n    stdout, stderr = p.communicate()\n\n    # Attempt to decode STDERR output from bytes. 
If unsuccessful, coerce to\n    # string\n    try:\n        stderr = stderr.decode(\"utf8\")\n        stdout = stdout.decode(\"utf8\")\n    except (UnicodeDecodeError, AttributeError):\n        stderr = str(stderr)\n        stdout = str(stdout)\n\n    logger.info(\"Finished trimmomatic subprocess with STDOUT:\\\\n\"\n                \"======================================\\\\n{}\".format(stdout))\n    logger.info(\"Finished trimmomatic subprocess with STDERR:\\\\n\"\n                \"======================================\\\\n{}\".format(stderr))\n    logger.info(\"Finished trimmomatic with return code: {}\".format(\n        p.returncode))\n\n    trimmomatic_log(logfile, sample_id)\n\n    if p.returncode == 0 and os.path.exists(\"{}_1_trim.fastq.gz\".format(\n            SAMPLE_ID)):\n        clean_up(fastq_pair, clear)\n\n    # Check if trimmomatic ran successfully. If not, write the error message\n    # to the status channel and exit.\n    with open(\".status\", \"w\") as status_fh:\n        if p.returncode != 0:\n            status_fh.write(\"fail\")\n            return\n        else:\n            status_fh.write(\"pass\")\n\n\nif __name__ == '__main__':\n\n    main(SAMPLE_ID, FASTQ_PAIR, TRIM_RANGE, TRIM_OPTS, PHRED, ADAPTERS_FILE,\n         CLEAR)\n"
  },
  {
    "path": "flowcraft/templates/trimmomatic_report.py",
    "content": "#!/usr/bin/env python3\n\n\"\"\"\nPurpose\n-------\n\nThis module is intended parse the results of the Trimmomatic log for a set\nof one or more samples.\n\nExpected input\n--------------\n\nThe following variables are expected whether using NextFlow or the\n:py:func:`main` executor.\n\n- ``log_files``: Trimmomatic log files.\n    - e.g.: ``'Sample1_trimlog.txt Sample2_trimlog.txt'``\n\n\nGenerated output\n----------------\n- ``trimmomatic_report.csv`` : Summary report of the trimmomatic logs for\\\n    all samples\n\nCode documentation\n------------------\n\n\"\"\"\n\n__version__ = \"1.0.0\"\n__build__ = \"16012018\"\n__template__ = \"trimmomatic_report-nf\"\n\nimport os\nimport json\n\nfrom collections import OrderedDict\n\nfrom flowcraft_utils.flowcraft_base import get_logger, MainWrapper\n\nlogger = get_logger(__file__)\n\n\nif __file__.endswith(\".command.sh\"):\n    LOG_FILES = '$log_files'.split()\n\n\ndef parse_log(log_file):\n    \"\"\"Retrieves some statistics from a single Trimmomatic log file.\n\n    This function parses Trimmomatic's log file and stores some trimming\n    statistics in an :py:class:`OrderedDict` object. This object contains\n    the following keys:\n\n        - ``clean_len``: Total length after trimming.\n        - ``total_trim``: Total trimmed base pairs.\n        - ``total_trim_perc``: Total trimmed base pairs in percentage.\n        - ``5trim``: Total base pairs trimmed at 5' end.\n        - ``3trim``: Total base pairs trimmed at 3' end.\n\n    Parameters\n    ----------\n    log_file : str\n        Path to trimmomatic log file.\n\n    Returns\n    -------\n    x : :py:class:`OrderedDict`\n        Object storing the trimming statistics.\n\n    \"\"\"\n\n    template = OrderedDict([\n        # Total length after trimming\n        (\"clean_len\", 0),\n        # Total trimmed base pairs\n        (\"total_trim\", 0),\n        # Total trimmed base pairs in percentage\n        (\"total_trim_perc\", 0),\n        # Total trimmed at 5' end\n        (\"5trim\", 0),\n        # Total trimmed at 3' end\n        (\"3trim\", 0),\n        # Bad reads (completely trimmed)\n        (\"bad_reads\", 0)\n    ])\n\n    with open(log_file) as fh:\n\n        for line in fh:\n            # This will split the log fields into:\n            # 0. read length after trimming\n            # 1. amount trimmed from the start\n            # 2. last surviving base\n            # 3. amount trimmed from the end\n            fields = [int(x) for x in line.strip().split()[-4:]]\n\n            if not fields[0]:\n                template[\"bad_reads\"] += 1\n\n            template[\"5trim\"] += fields[1]\n            template[\"3trim\"] += fields[3]\n            template[\"total_trim\"] += fields[1] + fields[3]\n            template[\"clean_len\"] += fields[0]\n\n        total_len = template[\"clean_len\"] + template[\"total_trim\"]\n\n        if total_len:\n            template[\"total_trim_perc\"] = round(\n                (template[\"total_trim\"] / total_len) * 100, 2)\n        else:\n            template[\"total_trim_perc\"] = 0\n\n    return template\n\n\ndef write_report(storage_dic, output_file, sample_id):\n    \"\"\" Writes a report from multiple samples.\n\n    Parameters\n    ----------\n    storage_dic : dict or :py:class:`OrderedDict`\n        Storage containing the trimming statistics. 
See :py:func:`parse_log`\n    for its generation.\n    output_file : str\n        Path where the output file will be generated.\n    sample_id : str\n        Id or name of the current sample.\n    \"\"\"\n\n    with open(output_file, \"w\") as fh, open(\".report.json\", \"w\") as json_rep:\n\n        # Write header\n        fh.write(\"Sample,Total length,Total trimmed,%,5end Trim,3end Trim,\"\n                 \"bad_reads\\\\n\")\n\n        # Write contents\n        for sample, vals in storage_dic.items():\n            fh.write(\"{},{}\\\\n\".format(\n                sample, \",\".join([str(x) for x in vals.values()])))\n\n            json_dic = {\n                \"tableRow\": [{\n                    \"sample\": sample_id,\n                    \"data\": [\n                        {\"header\": \"trimmed\",\n                         \"value\": vals[\"total_trim_perc\"],\n                         \"table\": \"qc\",\n                         \"columnBar\": True},\n                    ]\n                }],\n                \"plotData\": [{\n                    \"sample\": sample_id,\n                    \"data\": {\n                        \"sparkline\": vals[\"clean_len\"]\n                    }\n                }],\n                \"badReads\": vals[\"bad_reads\"]\n            }\n            json_rep.write(json.dumps(json_dic, separators=(\",\", \":\")))\n\n\n@MainWrapper\ndef main(log_files):\n    \"\"\" Main executor of the trimmomatic_report template.\n\n    Parameters\n    ----------\n    log_files : list\n        List of paths to the trimmomatic log files.\n    \"\"\"\n\n    log_storage = OrderedDict()\n\n    for log in log_files:\n\n        # Strip the _trimlog.txt suffix to recover the sample id (rstrip would\n        # remove individual characters, not the suffix)\n        log_id = log.replace(\"_trimlog.txt\", \"\")\n\n        # Populate storage of current sample\n        log_storage[log_id] = parse_log(log)\n\n        # Remove temporary trim log file\n        os.remove(log)\n\n    write_report(log_storage, \"trimmomatic_report.csv\", log_id)\n\n\nif __name__ == '__main__':\n\n    main(LOG_FILES)\n"
  },
  {
    "path": "flowcraft/tests/__init__.py",
    "content": ""
  },
  {
    "path": "flowcraft/tests/broadcast_tests/empty_log.txt",
    "content": ""
  },
  {
    "path": "flowcraft/tests/broadcast_tests/log_with_command.txt",
    "content": "Log with command\nnextflow run file.nf -profile docker"
  },
  {
    "path": "flowcraft/tests/broadcast_tests/log_with_command_regex.txt",
    "content": "Log with command - different chars in path\n/usr/local/bin/nextflow run /mnt/innuendo_storage/users/bgoncalves/jobs/2-3/test.nf -profile incd -resume"
  },
  {
    "path": "flowcraft/tests/broadcast_tests/log_without_command.txt",
    "content": "Test for log file without command"
  },
  {
    "path": "flowcraft/tests/data_pipelines.py",
    "content": "pipelines = [\n    [\"A\", [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"A\", \"lane\": 1}}]],\n    [\"A B\", [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n              \"output\": {\"process\": \"A\", \"lane\": 1}},\n             {\"input\": {\"process\": \"A\", \"lane\": 1},\n              \"output\": {\"process\": \"B\", \"lane\": 1}}]],\n    [\"A B (C | D)\", [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n                      \"output\": {\"process\": \"A\", \"lane\": 1}},\n                     {\"input\": {\"process\": \"A\", \"lane\": 1},\n                      \"output\": {\"process\": \"B\", \"lane\": 1}},\n                     {\"input\": {\"process\": \"B\", \"lane\": 1},\n                      \"output\": {\"process\": \"C\", \"lane\": 2}},\n                     {\"input\": {\"process\": \"B\", \"lane\": 1},\n                      \"output\": {\"process\": \"D\", \"lane\": 3}}]],\n    [\"A B (C | D E F)\", [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n                          \"output\": {\"process\": \"A\", \"lane\": 1}},\n                         {\"input\": {\"process\": \"A\", \"lane\": 1},\n                          \"output\": {\"process\": \"B\", \"lane\": 1}},\n                         {\"input\": {\"process\": \"B\", \"lane\": 1},\n                          \"output\": {\"process\": \"C\", \"lane\": 2}},\n                         {\"input\": {\"process\": \"B\", \"lane\": 1},\n                          \"output\": {\"process\": \"D\", \"lane\": 3}},\n                         {\"input\": {\"process\": \"D\", \"lane\": 3},\n                          \"output\": {\"process\": \"E\", \"lane\": 3}},\n                         {\"input\": {\"process\": \"E\", \"lane\": 3},\n                          \"output\": {\"process\": \"F\", \"lane\": 3}}]],\n    [\"(A | B | C)\", [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n                      \"output\": {\"process\": \"A\", \"lane\": 1}},\n                     {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n                      \"output\": {\"process\": \"B\", \"lane\": 2}},\n                     {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n                      \"output\": {\"process\": \"C\", \"lane\": 3}}]],\n    [\"(A | B | C E (F | G))\", [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n                                \"output\": {\"process\": \"A\", \"lane\": 1}},\n                               {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n                                \"output\": {\"process\": \"B\", \"lane\": 2}},\n                               {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n                                \"output\": {\"process\": \"C\", \"lane\": 3}},\n                               {\"input\": {\"process\": \"C\", \"lane\": 3},\n                                \"output\": {\"process\": \"E\", \"lane\": 3}},\n                               {\"input\": {\"process\": \"E\", \"lane\": 3},\n                                \"output\": {\"process\": \"F\", \"lane\": 4}},\n                               {\"input\": {\"process\": \"E\", \"lane\": 3},\n                                \"output\": {\"process\": \"G\", \"lane\": 5}}]],\n    [\"(A (Z | X)| B | C E (F | G))\",\n                    [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n                      \"output\": {\"process\": \"A\", \"lane\": 1}},\n                     {\"input\": {\"process\": 
\"__init__\", \"lane\": 0},\n                      \"output\": {\"process\": \"B\", \"lane\": 2}},\n                     {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n                      \"output\": {\"process\": \"C\", \"lane\": 3}},\n                     {\"input\": {\"process\": \"C\", \"lane\": 3},\n                      \"output\": {\"process\": \"E\", \"lane\": 3}},\n                     {\"input\": {\"process\": \"A\", \"lane\": 1},\n                      \"output\": {\"process\": \"Z\", \"lane\": 4}},\n                     {\"input\": {\"process\": \"A\", \"lane\": 1},\n                      \"output\": {\"process\": \"X\", \"lane\": 5}},\n                     {\"input\": {\"process\": \"E\", \"lane\": 3},\n                      \"output\": {\"process\": \"F\", \"lane\": 6}},\n                     {\"input\": {\"process\": \"E\", \"lane\": 3},\n                      \"output\": {\"process\": \"G\", \"lane\": 7}}]],\n    [\"(A (Z | X)| B(Y|H A) | C E (F | G))\",\n     [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n       \"output\": {\"process\": \"A\", \"lane\": 1}},\n      {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n       \"output\": {\"process\": \"B\", \"lane\": 2}},\n      {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n       \"output\": {\"process\": \"C\", \"lane\": 3}},\n      {\"input\": {\"process\": \"C\", \"lane\": 3},\n       \"output\": {\"process\": \"E\", \"lane\": 3}},\n      {\"input\": {\"process\": \"A\", \"lane\": 1},\n       \"output\": {\"process\": \"Z\", \"lane\": 4}},\n      {\"input\": {\"process\": \"A\", \"lane\": 1},\n       \"output\": {\"process\": \"X\", \"lane\": 5}},\n      {\"input\": {\"process\": \"B\", \"lane\": 2},\n       \"output\": {\"process\": \"Y\", \"lane\": 6}},\n      {\"input\": {\"process\": \"B\", \"lane\": 2},\n       \"output\": {\"process\": \"H\", \"lane\": 7}},\n      {\"input\": {\"process\": \"H\", \"lane\": 7},\n       \"output\": {\"process\": \"A\", \"lane\": 7}},\n      {\"input\": {\"process\": \"E\", \"lane\": 3},\n       \"output\": {\"process\": \"F\", \"lane\": 8}},\n      {\"input\": {\"process\": \"E\", \"lane\": 3},\n       \"output\": {\"process\": \"G\", \"lane\": 9}}]]\n]\n"
  },
  {
    "path": "flowcraft/tests/pipeline_tests/pipe1.txt",
    "content": "A"
  },
  {
    "path": "flowcraft/tests/pipeline_tests/pipe2.txt",
    "content": "A B"
  },
  {
    "path": "flowcraft/tests/pipeline_tests/pipe3.txt",
    "content": "A B (\n    C |\n    D)"
  },
  {
    "path": "flowcraft/tests/pipeline_tests/pipe4.txt",
    "content": "A B (\n    C |\n    D E F)"
  },
  {
    "path": "flowcraft/tests/pipeline_tests/pipe5.txt",
    "content": "(A | B | C)"
  },
  {
    "path": "flowcraft/tests/pipeline_tests/pipe6.txt",
    "content": "(A | B | C E\n    (F |\n     G))"
  },
  {
    "path": "flowcraft/tests/pipeline_tests/pipe7.txt",
    "content": "(A\n    (Z |\n    X)|\nB | C E\n    (F |\n    G))"
  },
  {
    "path": "flowcraft/tests/pipeline_tests/pipe8.txt",
    "content": "(A (\n    Z |\n    X)|\nB(\n    Y|\n    H A) |\nC E (\n    F |\n    G))"
  },
  {
    "path": "flowcraft/tests/test_assemblerflow.py",
    "content": "import os\nimport sys\nimport shutil\nimport pytest\n\nimport flowcraft.flowcraft as af\n\n\n@pytest.fixture\ndef tmp():\n\n    os.mkdir(\"temp\")\n    yield \"temp\"\n    shutil.rmtree(\"temp\")\n\n\ndef test_check():\n\n    sys.argv.append(1)\n    args = af.get_args([\"build\", \"-t 'A B C'\", \"-c\", \"-o teste.nf\"])\n\n    with pytest.raises(SystemExit):\n        af.build(args)\n\n\ndef test_check_invalid():\n\n    sys.argv.append(1)\n    args = af.get_args([\"build\", \"-t\",  \"'A B C()'\", \"-c\", \"-o teste.nf\"])\n\n    with pytest.raises(SystemExit):\n        af.build(args)\n\n\ndef test_build_file(tmp):\n\n    p = os.path.join(os.path.abspath(tmp), \"teste.nf\")\n    sys.argv.append(1)\n\n    args = af.get_args([\"build\", \"-t\", \"integrity_coverage fastqc\", \"-o\",\n                        \"{}\".format(p)])\n    af.build(args)\n\n\ndef test_build_file_2(tmp):\n\n    sys.argv.append(1)\n    p = os.path.join(os.path.abspath(tmp), \"teste.nf\")\n\n    args = af.get_args([\"build\", \"-t integrity_coverage fastqc\", \"-o\",\n                        \"{}\".format(p), \"--pipeline-only\"])\n    af.build(args)\n\n    assert sorted(os.listdir(tmp)) == [\".forkTree.json\", \".treeDag.json\",\n                                       \"containers.config\",\n                                       \"lib\", \"nextflow.config\", \"params.config\",\n                                       \"resources.config\", \"teste.html\",\n                                       \"teste.nf\", \"user.config\"]\n\n\ndef test_build_recipe(tmp):\n\n    sys.argv.append(1)\n    p = os.path.join(os.path.abspath(tmp), \"teste.nf\")\n\n    args = af.get_args([\"build\", \"-r\", \"innuca\", \"-o\",\n                        \"{}\".format(p), \"--pipeline-only\"])\n    af.build(args)\n\n\ndef test_build_recipe_innuendo(tmp):\n\n    sys.argv.append(1)\n    p = os.path.join(os.path.abspath(tmp), \"teste.nf\")\n\n    args = af.get_args([\"build\", \"-r\", \"innuendo\", \"-o\",\n                        \"{}\".format(p), \"--pipeline-only\"])\n    af.build(args)\n"
  },
  {
    "path": "flowcraft/tests/test_broadcast.py",
    "content": "import pytest\nimport os\n\nimport flowcraft.generator.utils as utils\nfrom flowcraft.generator.error_handling import LogError\n\n\ndef test_empty_log():\n    with pytest.raises(LogError):\n        utils.get_nextflow_filepath(\n            os.path.join(os.getcwd(), \"flowcraft/tests/broadcast_tests/empty_log.txt\"))\n\n\ndef test_no_path_in_log():\n    with pytest.raises(LogError):\n        utils.get_nextflow_filepath(\n            os.path.join(os.getcwd(), \"flowcraft/tests/broadcast_tests/log_without_command.txt\"))\n\n\ndef test_path_in_log():\n    filepath = utils.get_nextflow_filepath(\n        os.path.join(os.getcwd(), \"flowcraft/tests/broadcast_tests/log_with_command.txt\"))\n\n    assert filepath != \"\"\n\n\ndef test_regex_in_log():\n    filepath = utils.get_nextflow_filepath(\n        os.path.join(os.getcwd(), \"flowcraft/tests/broadcast_tests/log_with_command_regex.txt\"))\n\n    assert filepath != \"\"\n"
  },
  {
    "path": "flowcraft/tests/test_engine.py",
    "content": "import os\nimport shutil\nimport pytest\n\nimport flowcraft.generator.engine as eg\nimport flowcraft.generator.process as pc\nimport flowcraft.generator.error_handling as eh\n\nfrom flowcraft.generator.process_collector import collect_process_map\n\nprocess_map = collect_process_map()\n\n\n@pytest.fixture\ndef single_con():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"fastqc\", \"lane\": 1}}\n           ]\n\n    return eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\n@pytest.fixture\ndef single_status():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"skesa\", \"lane\": 1}}]\n\n    return eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\n@pytest.fixture\ndef single_con_fasta():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"abricate\", \"lane\": 1}}]\n\n    return eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\n@pytest.fixture\ndef single_con_multi_raw():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"assembly_mapping\", \"lane\": 1}},\n           {\"input\": {\"process\": \"assembly_mapping\", \"lane\": 1},\n            \"output\": {\"process\": \"pilon\", \"lane\": 1}}]\n\n    return eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\n@pytest.fixture\ndef implicit_link():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"fastqc\", \"lane\": 1}},\n           {\"input\": {\"process\": \"fastqc\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}},\n           {\"input\": {\"process\": \"spades\", \"lane\": 1},\n            \"output\": {\"process\": \"assembly_mapping\", \"lane\": 1}}]\n\n    return eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\n@pytest.fixture\ndef implicit_link_2():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}},\n           {\"input\": {\"process\": \"spades\", \"lane\": 1},\n            \"output\": {\"process\": \"assembly_mapping\", \"lane\": 1}}]\n\n    return eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\n@pytest.fixture\ndef single_fork():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 2}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"skesa\", \"lane\": 3}},\n           {'input': {'process': 'spades', 'lane': 2},\n            'output': {'process': 'abricate', 'lane': 2}},\n           {'input': {'process': 'skesa', 'lane': 3},\n            'output': {'process': 'abricate', 'lane': 3}}]\n\n    return eg.NextflowGenerator(con, 
\"teste.nf\", process_map)\n\n\n@pytest.fixture\ndef raw_forks():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"fastqc\", \"lane\": 1}},\n           {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"patho_typing\", \"lane\": 2}},\n           {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"seq_typing\", \"lane\": 3}}]\n\n    return eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\n@pytest.fixture\ndef multi_forks():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"seq_typing\", \"lane\": 2}},\n           {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 3}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 3},\n            \"output\": {\"process\": \"check_coverage\", \"lane\": 3}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 4}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"skesa\", \"lane\": 5}},\n           {\"input\": {\"process\": \"check_coverage\", \"lane\": 3},\n            \"output\": {\"process\": \"spades\", \"lane\": 6}},\n           {\"input\": {\"process\": \"check_coverage\", \"lane\": 3},\n            \"output\": {\"process\": \"skesa\", \"lane\": 7}}]\n\n    os.mkdir(\".temp\")\n    yield eg.NextflowGenerator(con, os.path.join(\".temp\", \"teste.nf\"),\n                               process_map)\n    shutil.rmtree(\".temp\")\n\n\ndef test_simple_init():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"{}\", \"lane\": 1}}]\n\n    for p in process_map:\n\n        con[0][\"output\"][\"process\"] = p\n        nf = eg.NextflowGenerator(con, \"teste/teste.nf\", process_map,\n                                  ignore_dependencies=True)\n\n        assert [len(nf.processes), nf.processes[1].template] == \\\n            [2, p]\n\n\ndef test_invalid_process():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"invalid\", \"lane\": 1}}]\n\n    with pytest.raises(SystemExit):\n        eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\ndef test_connections_single_process_channels(single_con):\n\n    template = \"integrity_coverage\"\n\n    p = single_con.processes[1]\n\n    assert [p.input_channel, p.output_channel] == \\\n        [\"{}_in_1_0\".format(template), \"{}_out_1_0\".format(template)]\n\n\ndef test_connections_invalid():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}},\n           {\"input\": {\"process\": \"spades\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}}\n           ]\n\n    with pytest.raises(SystemExit):\n        eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\ndef test_connections_ignore_type():\n\n    con = [{\"input\": {\"process\": 
\"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"skesa\", \"lane\": 1}},\n           {\"input\": {\"process\": \"skesa\", \"lane\": 1},\n            \"output\": {\"process\": \"patho_typing\", \"lane\": 1}}\n           ]\n\n    eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\ndef test_build_header(single_con):\n\n    single_con._build_header()\n\n    assert single_con.template != \"\"\n\n\ndef test_connections_nofork(single_con):\n\n    assert single_con._fork_tree == {}\n\n\ndef test_connections_singlefork(single_fork):\n\n    assert single_fork._fork_tree == {1: [2, 3]}\n\n\ndef test_connections_rawfork(raw_forks):\n\n    assert raw_forks._fork_tree == {0: [1, 2, 3]}\n\n\ndef test_connections_multiforks(multi_forks):\n\n    assert multi_forks._fork_tree == {0: [1, 2, 3], 1: [4, 5], 3: [6, 7]}\n\n\ndef test_connections_no_fork_channel_update(single_con):\n\n    p = single_con.processes[1]\n\n    assert p.forks == []\n\n\ndef test_connections_fork_channel_update(single_fork):\n\n    p = single_fork.processes[1]\n\n    assert p.forks != []\n\n\ndef test_connections_channel_update(single_con):\n\n    p1 = single_con.processes[1]\n    p2 = single_con.processes[2]\n\n    assert p1.output_channel == p2.input_channel\n\n\ndef test_connections_channel_update_wfork(single_fork):\n\n    p1 = single_fork.processes[1]\n    p2 = single_fork.processes[2]\n    p3 = single_fork.processes[3]\n\n    assert [p1.main_forks[1], p1.main_forks[2]] == \\\n           [p2.input_channel, p3.input_channel]\n\n\ndef test_connections_channel_update_wfork_2(single_fork):\n\n    p1 = single_fork.processes[3]\n    p2 = single_fork.processes[5]\n\n    assert p1.output_channel == p2.input_channel\n\n\ndef test_connections_channel_update_wfork_3(single_fork):\n\n    p1 = single_fork.processes[2]\n    p2 = single_fork.processes[4]\n\n    assert p1.output_channel == p2.input_channel\n\n\n\ndef test_set_channels_single_con_raw_fastq(single_con):\n\n    single_con._set_channels()\n\n    assert [list(single_con.main_raw_inputs.keys())[0],\n            len(single_con.main_raw_inputs),\n            list(single_con.main_raw_inputs.values())[0][\"raw_forks\"]] == \\\n           [\"fastq\", 1, [\"integrity_coverage_in_1_0\"]]\n\n\ndef test_set_channels_single_con_raw_fasta(single_con_fasta):\n\n    single_con_fasta._set_channels()\n\n    assert [list(single_con_fasta.main_raw_inputs.keys())[0],\n            len(single_con_fasta.main_raw_inputs),\n            list(single_con_fasta.main_raw_inputs.values())[0][\n                \"raw_forks\"]] == \\\n           [\"fasta\", 1, [\"abricate_in_1_0\"]]\n\n\ndef test_set_channels_multi_raw_input(single_con_multi_raw):\n\n    single_con_multi_raw._set_channels()\n\n    print(single_con_multi_raw.main_raw_inputs)\n\n    assert [list(single_con_multi_raw.main_raw_inputs.keys()),\n            len(single_con_multi_raw.main_raw_inputs)] == \\\n           [[\"fasta\", \"fastq\"], 2]\n\n\ndef test_set_channels_secondary_channels_nolink(single_con):\n\n    single_con._set_channels()\n\n    assert single_con.secondary_channels[\"SIDE_phred\"][1][\"end\"] == []\n\n\ndef test_set_channels_secondary_chanels_link(multi_forks):\n\n    multi_forks._set_channels()\n\n    assert [multi_forks.secondary_channels[\"SIDE_phred\"][1][\"end\"],\n            multi_forks.secondary_channels[\"SIDE_max_len\"][1][\"end\"],\n            multi_forks.secondary_channels[\"SIDE_max_len\"][3][\"end\"]] == \\\n           [[], [\"SIDE_max_len_4_5\"], [\"SIDE_max_len_6_7\"]]\n\n\ndef 
test_set_secondary_inputs_raw_forks(raw_forks):\n\n    raw_forks._set_channels()\n    raw_forks._set_init_process()\n\n    p = raw_forks\n\n    assert p.main_raw_inputs[\"fastq\"][\"raw_forks\"] == \\\n           [\"integrity_coverage_in_0_0\",\n            \"patho_typing_in_0_2\",\n            \"seq_typing_in_0_3\"]\n\n\ndef test_set_secondary_inputs_multi_raw(single_con_multi_raw):\n\n    single_con_multi_raw._set_channels()\n    single_con_multi_raw._set_init_process()\n\n    p = single_con_multi_raw\n\n    assert sorted(list(p.main_raw_inputs.keys())) == [\"fasta\", \"fastq\"]\n\n\ndef test_set_secondary_channels(multi_forks):\n\n    multi_forks._set_channels()\n    multi_forks._set_secondary_channels()\n\n    p = multi_forks.processes[1]\n\n    print(multi_forks.main_raw_inputs)\n\n    assert [p._context[\"output_channel\"], p._context[\"forks\"]] == \\\n        [\"_integrity_coverage_out_1_0\",\n         \"\\n_integrity_coverage_out_1_0.into{ integrity_coverage_out_1_0;\"\n         \"spades_in_1_4;skesa_in_1_5 }\\n\\n\\nSIDE_max_len_1_1.set{\"\n         \" SIDE_max_len_4_5 }\\n\"]\n\n\ndef test_set_secondary_channels_2(multi_forks):\n\n    multi_forks._set_channels()\n    multi_forks._set_secondary_channels()\n\n    p = multi_forks.processes[4]\n\n    assert [p._context[\"output_channel\"], p.main_forks] == \\\n           [\"_check_coverage_out_3_3\",\n            [\"check_coverage_out_3_3\", \"spades_in_3_6\", \"skesa_in_3_7\"]]\n\n\ndef test_set_implicit_link(implicit_link):\n\n    implicit_link._set_channels()\n    implicit_link._set_secondary_channels()\n\n    p = implicit_link.processes[2]\n\n    assert p.main_forks == [\"fastqc_out_1_1\", \"_LAST_fastq_4\"]\n\n\ndef test_set_implicit_link_2(implicit_link_2):\n\n    implicit_link_2._set_channels()\n    implicit_link_2._set_secondary_channels()\n\n    p = implicit_link_2.processes[1]\n\n    assert p.main_forks == [\"integrity_coverage_out_1_0\", \"_LAST_fastq_1_3\"]\n\n\ndef test_set_status_channels_multi(single_con):\n\n    single_con._set_channels()\n    single_con._set_status_channels()\n\n    p = [x for x in single_con.processes[::-1]\n         if isinstance(x, pc.StatusCompiler)][0]\n\n    assert p._context[\"compile_channels\"] == \\\n        \"STATUS_integrity_coverage_1_1.mix(STATUS_fastqc2_1_2,\" \\\n        \"STATUS_fastqc2_report_1_2)\"\n\n\ndef test_set_status_channels_single(single_status):\n\n    single_status._set_channels()\n    single_status._set_status_channels()\n\n    p = [x for x in single_status.processes[::-1]\n         if isinstance(x, pc.StatusCompiler)][0]\n\n    assert p._context[\"compile_channels\"] == \"STATUS_skesa_1_1\"\n\n\ndef test_set_compiler_channels(single_status):\n\n    single_status.lane = 1\n    single_status._set_channels()\n    single_status._set_compiler_channels()\n\n    p = [x for x in single_status.processes[::-1]\n         if isinstance(x, pc.StatusCompiler)][0]\n\n    assert p._context[\"compile_channels\"] == \"STATUS_skesa_1_1\"\n\n\ndef test_set_status_channels_no_status(single_status):\n\n    single_status.processes[1].status_channels = []\n\n    single_status._set_channels()\n    single_status._set_status_channels()\n\n    with pytest.raises(IndexError):\n        p = [x for x in single_status.processes[::-1]\n             if isinstance(x, pc.StatusCompiler)][0]\n\n\ndef test_set_status_channels_duplicate_status(single_status):\n\n    single_status.processes[1].status_channels = [\"A\", \"A\"]\n\n    single_status._set_channels()\n\n    with pytest.raises(eh.ProcessError):\n     
   single_status._set_status_channels()\n\n\ndef test_build(multi_forks):\n\n    multi_forks.build()\n\n    assert multi_forks.template != \"\"\n\n\ndef test_resources_string(single_con):\n\n    res_dict = {\"procA\": {\"cpus\": 1, \"memory\": \"'4GB'\", \"container\": \"img\",\n                          \"version\": \"1\"}}\n\n    res = single_con._get_resources_string(res_dict, 1)\n\n    assert res == '\\n\\t$procA_1.cpus = 1\\n\\t$procA_1.memory = \\'4GB\\''\n\n\ndef test_resources_string_2(single_con):\n\n    res_dict = {\"procA\": {\"cpus\": 1, \"container\": \"img\",\n                          \"version\": \"1\"}}\n\n    res = single_con._get_resources_string(res_dict, 1)\n\n    assert res == '\\n\\t$procA_1.cpus = 1'\n\n\ndef test_resources_string_3(single_con):\n\n    res_dict = {\"procA\": {\"cpus\": 1, \"memory\": \"'4GB'\", \"container\": \"img\",\n                          \"version\": \"1\"},\n                \"procB\": {\"memory\": \"{ 4.GB * task.attempt }\"}}\n\n    res = single_con._get_resources_string(res_dict, 1)\n\n    assert res == '\\n\\t$procA_1.cpus = 1\\n\\t$procA_1.memory = \\'4GB\\'' \\\n                  '\\n\\t$procB_1.memory = { 4.GB * task.attempt }'\n\n\ndef test_container_string(single_con):\n\n    res_dict = {\"procA\": {\"cpus\": 1, \"memory\": \"4GB\", \"container\": \"img\",\n                          \"version\": \"1\"}}\n\n    res = single_con._get_container_string(res_dict, 2)\n\n    assert res == '\\n\\t$procA_2.container = \"img:1\"'\n\n\ndef test_container_string_2(single_con):\n\n    res_dict = {\"procA\": {\"cpus\": 1, \"memory\": \"4GB\", \"container\": \"img\",\n                          \"version\": \"1\"},\n                \"procB\": {\"container\": \"img\"}}\n\n    res = single_con._get_container_string(res_dict, 2)\n\n    assert res == '\\n\\t$procA_2.container = \"img:1\"\\n\\t' \\\n                  '$procB_2.container = \"img:latest\"'\n\n\ndef test_extra_inputs_1():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"fastqc={'extra_input':'teste'}\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    assert nf.processes[2].extra_input == \"teste\"\n\n\ndef test_extra_inputs_2():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}},\n           {\"input\": {\"process\": \"spades\", \"lane\": 1},\n            \"output\": {\"process\": \"abricate={'extra_input':'teste'}\", \"lane\": 1}}\n           ]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    assert nf.processes[3].extra_input == \"teste\"\n\n\ndef test_extra_inputs_3():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"fastqc={'extra_input':'teste'}\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n    nf._set_channels()\n\n    assert [list(nf.extra_inputs.keys())[0],\n            nf.extra_inputs[\"teste\"][\"input_type\"],\n            
nf.extra_inputs[\"teste\"][\"channels\"]] == \\\n           [\"teste\", \"fastq\", [\"EXTRA_fastqc_1_2\"]]\n\n\ndef test_extra_inputs_default():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}},\n           {\"input\": {\"process\": \"spades\", \"lane\": 1},\n            \"output\": {\"process\": \"abricate={'extra_input':'default'}\", \"lane\": 1}}\n           ]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n    nf._set_channels()\n\n    assert [list(nf.extra_inputs.keys())[0],\n            nf.extra_inputs[\"fasta\"][\"input_type\"],\n            nf.extra_inputs[\"fasta\"][\"channels\"]] == \\\n           [\"fasta\", \"fasta\", [\"EXTRA_abricate_1_3\"]]\n\n\ndef test_extra_inputs_invalid():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"fastqc={'extra_input':'default'}\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    with pytest.raises(SystemExit):\n        nf._set_channels()\n\n\ndef test_extra_inputs_invalid_2():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"spades={'extra_input':'teste'}\", \"lane\": 1}},\n           {\"input\": {\"process\": \"spades\", \"lane\": 1},\n            \"output\": {\"process\": \"abricate={'extra_input':'teste'}\", \"lane\": 1}}\n           ]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    with pytest.raises(SystemExit):\n        nf._set_channels()\n\n\ndef test_run_time_directives():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"fastqc={'cpus':'3'}\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    assert nf.processes[2].directives[\"fastqc2\"][\"cpus\"] == \"3\"\n\n\ndef test_run_time_directives_full():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"fastqc={'cpus':'3','memory':'4GB',\"\n                                  \"'container':'img','version':'1'}\",\n                       \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    assert [nf.processes[2].directives[\"fastqc2\"][\"cpus\"],\n            nf.processes[2].directives[\"fastqc2\"][\"memory\"],\n            nf.processes[2].directives[\"fastqc2\"][\"container\"],\n            nf.processes[2].directives[\"fastqc2\"][\"version\"]] == \\\n           [\"3\", \"4GB\", \"img\", \"1\"]\n\n\ndef test_run_time_directives_invalid():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"integrity_coverage\", 
\"lane\": 1}},\n           {\"input\": {\"process\": \"integrity_coverage\", \"lane\": 1},\n            \"output\": {\"process\": \"fastqc={'cpus'\", \"lane\": 1}}]\n\n    with pytest.raises(SystemExit):\n        eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n\ndef test_not_automatic_dependency():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}}]\n\n    with pytest.raises(SystemExit):\n        eg.NextflowGenerator(con, \"teste.nf\", process_map,\n                             auto_dependency=False)\n\n\ndef test_automatic_dependency():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    assert nf.processes[1].template == \"integrity_coverage\"\n\n\ndef test_automatic_dependency_2():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    assert nf.processes[1].output_channel == nf.processes[2].input_channel\n\n\ndef test_automatic_dependency_3():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    assert [nf.processes[1].parent_lane, nf.processes[2].parent_lane] == \\\n           [None, 1]\n\n\ndef test_automatic_dependency_wfork():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}},\n           {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 2}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    assert nf.processes[1].template == \"integrity_coverage\"\n\n\ndef test_automatic_dependency_wfork_2():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}},\n           {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"integrity_coverage\", \"lane\": 2}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n    nf._set_channels()\n\n    assert len(nf.main_raw_inputs[\"fastq\"][\"raw_forks\"]) == 2\n\n\ndef test_automatic_dependency_wfork_3():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"reads_download\", \"lane\": 1}},\n           {\"input\": {\"process\": \"reads_download\", \"lane\": 1},\n            \"output\": {\"process\": \"skesa\", \"lane\": 2}},\n           {\"input\": {\"process\": \"reads_download\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 3}}\n           ]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n    nf._set_channels()\n\n    assert nf.processes[3].parent_lane == 1\n\n\ndef test_automatic_dependency_wfork_4():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"reads_download\", \"lane\": 1}},\n           {\"input\": {\"process\": \"reads_download\", \"lane\": 1},\n            \"output\": {\"process\": \"skesa\", \"lane\": 2}},\n           {\"input\": {\"process\": \"reads_download\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", 
\"lane\": 3}}\n           ]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n    nf._set_channels()\n\n    assert nf.processes[4].parent_lane == 3\n\n\ndef test_automatic_dependency_multi():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"trimmomatic\", \"lane\": 1}},\n           {\"input\": {\"process\": \"trimmomatic\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    assert len([x for x in nf.processes\n                if x.template == \"integrity_coverage\"]) == 1\n\n\ndef test_automatic_dependency_non_raw():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"spades\", \"lane\": 1}},\n           {\"input\": {\"process\": \"spades\", \"lane\": 1},\n            \"output\": {\"process\": \"pilon\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    assert nf.processes[2].parent_lane == 1\n\n\ndef test_patlas_compiler_channels():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"mash_screen\", \"lane\": 1}},\n           {\"input\": {\"process\": \"__init__\", \"lane\": 0},\n            \"output\": {\"process\": \"mapping_patlas\", \"lane\": 2}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    nf._set_channels()\n    nf._set_compiler_channels()\n\n    assert len(nf.compilers[\"patlas_consensus\"][\"channels\"]) == 2\n\n\ndef test_patlas_compiler_channels_2():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"mash_screen\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    nf._set_channels()\n    nf._set_compiler_channels()\n\n    assert len(nf.compilers[\"patlas_consensus\"][\"channels\"]) == 1\n\n\ndef test_patlas_compiler_channels_empty():\n\n    con = [{\"input\": {\"process\": \"__init__\", \"lane\": 1},\n            \"output\": {\"process\": \"trimmomatic\", \"lane\": 1}}]\n\n    nf = eg.NextflowGenerator(con, \"teste.nf\", process_map)\n\n    nf._set_channels()\n    nf._set_compiler_channels()\n\n    assert len(nf.compilers[\"patlas_consensus\"][\"channels\"]) == 0\n"
  },
  {
    "path": "flowcraft/tests/test_pipeline_parser.py",
    "content": "import os\nimport json\n\nimport flowcraft.generator.pipeline_parser as ps\nfrom flowcraft.tests.data_pipelines import pipelines as pipes\n\n\ndef test_get_lanes():\n\n    raw_string = [\n        \"A | B)\",\n        \"A | B C D | E F)\",\n        \"A Z | B C (D | E) | G H)\",\n        \"A | B (C | D) | E (E | F I ))\"\n    ]\n\n    expected = [\n        [[\"A\"], [\"B\"]],\n        [[\"A\"], [\"B\", \"C\", \"D\"], [\"E\", \"F\"]],\n        [[\"A\", \"Z\"], [\"B\", \"C\"], [\"G\", \"H\"]],\n        [[\"A\"], [\"B\"], [\"E\"]]\n    ]\n\n    for p, exp in zip(raw_string, expected):\n        res = ps.get_lanes(p)\n        assert exp == res\n\n\ndef test_linear_connection():\n\n    p = [\"A\", \"B\", \"C\"]\n    lane = 1\n\n    res = ps.linear_connection(p, lane)\n\n    assert res == [{\n        \"input\": {\n            \"process\": \"A\",\n            \"lane\": lane\n        },\n        \"output\": {\n            \"process\": \"B\",\n            \"lane\": lane\n        }},\n        {\"input\": {\n            \"process\": \"B\",\n            \"lane\": lane\n        },\n        \"output\": {\n            \"process\": \"C\",\n            \"lane\": lane\n        }\n    }]\n\n\ndef test_two_fork_connection():\n\n    source_lane = 1\n\n    res = ps.fork_connection(\n        source=\"A\",\n        sink=[\"B\", \"C\"],\n        source_lane=source_lane,\n        lane=source_lane\n    )\n\n    assert res == [{\n        \"input\": {\n            \"process\": \"A\",\n            \"lane\": source_lane\n        },\n        \"output\": {\n            \"process\": \"B\",\n            \"lane\": source_lane + 1\n        }}, {\n        \"input\": {\n            \"process\": \"A\",\n            \"lane\": source_lane,\n        },\n        \"output\": {\n            \"process\": \"C\",\n            \"lane\": source_lane + 2\n        }\n    }]\n\n\ndef test_two_fork_connection_mismatch_lane():\n\n    source_lane = 1\n    lane = 3\n\n    res = ps.fork_connection(\n        source=\"A\",\n        sink=[\"B\", \"C\"],\n        source_lane=source_lane,\n        lane=lane\n    )\n\n    assert res == [{\n        \"input\": {\n            \"process\": \"A\",\n            \"lane\": source_lane\n        },\n        \"output\": {\n            \"process\": \"B\",\n            \"lane\": lane + 1\n        }}, {\n        \"input\": {\n            \"process\": \"A\",\n            \"lane\": source_lane,\n        },\n        \"output\": {\n            \"process\": \"C\",\n            \"lane\": lane + 2\n        }\n    }]\n\n\ndef test_multi_fork_connection():\n\n    source_lane = 1\n\n    res = ps.fork_connection(\n        source=\"A\",\n        sink=[\"B\", \"C\", \"D\"],\n        source_lane=source_lane,\n        lane=source_lane\n    )\n\n    assert res == [{\n        \"input\": {\n            \"process\": \"A\",\n            \"lane\": source_lane\n        },\n        \"output\": {\n            \"process\": \"B\",\n            \"lane\": source_lane + 1\n        }}, {\n        \"input\": {\n            \"process\": \"A\",\n            \"lane\": source_lane,\n        },\n        \"output\": {\n            \"process\": \"C\",\n            \"lane\": source_lane + 2\n        }}, {\n        \"input\": {\n            \"process\": \"A\",\n            \"lane\": source_lane,\n        },\n        \"output\": {\n            \"process\": \"D\",\n            \"lane\": source_lane + 3\n        }\n    }]\n\n\ndef test_linear_lane_connection():\n\n    res = ps.linear_lane_connection([[\"A\", \"B\", \"C\"]], lane=1)\n\n    assert 
res == [{\n        \"input\": {\n            \"process\": \"A\",\n            \"lane\": 2\n        },\n        \"output\": {\n            \"process\": \"B\",\n            \"lane\": 2\n        }},\n        {\"input\": {\n            \"process\": \"B\",\n            \"lane\": 2\n        },\n        \"output\": {\n            \"process\": \"C\",\n            \"lane\": 2\n        }\n    }]\n\n\ndef test_linear_multi_lane_connection():\n\n    res = ps.linear_lane_connection([[\"A\", \"B\"], [\"C\", \"D\"]], lane=1)\n\n    assert res == [{\n        \"input\": {\n            \"process\": \"A\",\n            \"lane\": 2\n        },\n        \"output\": {\n            \"process\": \"B\",\n            \"lane\": 2\n        }},\n        {\"input\": {\n            \"process\": \"C\",\n            \"lane\": 3\n        },\n        \"output\": {\n            \"process\": \"D\",\n            \"lane\": 3\n        }\n    }]\n\n\ndef test_get_source_lane():\n\n    pipeline_list = [{'input': {'process': '__init__', 'lane': 1},\n                      'output': {'process': 'integrity_coverage', 'lane': 1}},\n                     {'input': {'process': 'integrity_coverage', 'lane': 1},\n                      'output': {'process': 'fastqc_trimmomatic', 'lane': 1}},\n                     {'input': {'process': 'fastqc_trimmomatic', 'lane': 1},\n                      'output': {'process': 'spades', 'lane': 2}},\n                     {'input': {'process': 'fastqc_trimmomatic', 'lane': 1},\n                      'output': {'process': 'skesa', 'lane': 3}}]\n\n    res = ps.get_source_lane([\"integrity_coverage\", \"fastqc_trimmomatic\"],\n                             pipeline_list)\n\n    assert res == 1\n\n\ndef test_get_source_lane_2():\n\n    pipeline_list = [{'input': {'process': '__init__', 'lane': 1},\n                      'output': {'process': 'integrity_coverage', 'lane': 1}},\n                     {'input': {'process': 'integrity_coverage', 'lane': 1},\n                      'output': {'process': 'fastqc_trimmomatic', 'lane': 1}},\n                     {'input': {'process': 'fastqc_trimmomatic', 'lane': 1},\n                      'output': {'process': 'spades', 'lane': 2}},\n                     {'input': {'process': 'fastqc_trimmomatic', 'lane': 1},\n                      'output': {'process': 'skesa', 'lane': 3}},\n                     {'input': {'process': 'spades', 'lane': 2},\n                      'output': {'process': 'pilon', 'lane': 2}},\n                     {'input': {'process': 'skesa', 'lane': 3},\n                      'output': {'process': 'pilon', 'lane': 3}},\n                     ]\n\n    res = ps.get_source_lane([\"spades\", \"pilon\"], pipeline_list)\n\n    assert res == 2\n\n\ndef test_parse_pipeline():\n\n    for p, expected in pipes:\n        res = ps.parse_pipeline(p)\n        assert res == expected\n\n\ndef test_parse_pipeline_file():\n\n    for i in range(1, 9):\n\n        p_path = os.path.join(\"flowcraft\", \"tests\", \"pipeline_tests\",\n                              \"pipe{}.txt\".format(i))\n        expected = pipes[i - 1][1]\n        print(p_path)\n        res = ps.parse_pipeline(p_path)\n        print(res)\n        assert res == expected\n\n\ndef test_unique_id_len():\n\n    pip_list = [\n        \"A B C\",\n        \"A (B C (D | E)| B C (D | E))\",\n        \"A (B C (D | E)| C (D | E))\",\n        \"A (B C (D | E)| B (D | E))\",\n    ]\n\n    res_list = [\n        \"A_0 B_1 C_2\",\n        \"A_0 (B_1 C_2 (D_3 | E_4)| B_5 C_6 (D_7 | E_8))\",\n        \"A_0 (B_1 C_2 (D_3 | E_4)| 
C_5 (D_6 | E_7))\",\n        \"A_0 (B_1 C_2 (D_3 | E_4)| B_5 (D_6 | E_7))\",\n    ]\n\n    for x, pip_str in enumerate(pip_list):\n        res_str, res_ids = ps.add_unique_identifiers(pip_str)\n        assert res_str.replace(\" \", \"\") == res_list[x].replace(\" \", \"\")\n\ndef test_remove_id():\n\n    pip_list = [\n        \"A B C\",\n        \"A (B C (D | E)| B C (D | E))\",\n    ]\n\n    pipeline_mod_links = [\n        [{'input': {'process': '__init__', 'lane': 1},\n          'output': {'process': 'A_0', 'lane': 1}},\n         {'input': {'process': 'A_0', 'lane': 1},\n          'output': {'process': 'B_1', 'lane': 1}},\n         {'input': {'process': 'B_1', 'lane': 1},\n          'output': {'process': 'C_2', 'lane': 1}}],\n        [{'input': {'process': '__init__', 'lane': 1},\n          'output': {'process': 'A_0', 'lane': 1}},\n         {'input': {'process': 'A_0', 'lane': 1},\n          'output': {'process': 'B_1', 'lane': 2}},\n         {'input': {'process': 'A_0', 'lane': 1},\n          'output': {'process': 'B_5', 'lane': 3}},\n         {'input': {'process': 'B_1', 'lane': 2},\n          'output': {'process': 'C_2', 'lane': 2}},\n         {'input': {'process': 'B_5', 'lane': 3},\n          'output': {'process': 'C_6', 'lane': 3}},\n         {'input': {'process': 'C_2', 'lane': 2},\n          'output': {'process': 'D_3', 'lane': 4}},\n         {'input': {'process': 'C_2', 'lane': 2},\n          'output': {'process': 'E_4', 'lane': 5}},\n         {'input': {'process': 'C_6', 'lane': 3},\n          'output': {'process': 'D_7', 'lane': 6}},\n         {'input': {'process': 'C_6', 'lane': 3},\n          'output': {'process': 'E_8', 'lane': 7}}]\n    ]\n\n    pipeline_exp_links = [\n        [{'input': {'process': '__init__', 'lane': 1},\n          'output': {'process': 'A', 'lane': 1}},\n         {'input': {'process': 'A', 'lane': 1},\n          'output': {'process': 'B', 'lane': 1}},\n         {'input': {'process': 'B', 'lane': 1},\n          'output': {'process': 'C', 'lane': 1}}],\n        [{'input': {'process': '__init__', 'lane': 1},\n          'output': {'process': 'A', 'lane': 1}},\n         {'input': {'process': 'A', 'lane': 1},\n          'output': {'process': 'B', 'lane': 2}},\n         {'input': {'process': 'A', 'lane': 1},\n          'output': {'process': 'B', 'lane': 3}},\n         {'input': {'process': 'B', 'lane': 2},\n          'output': {'process': 'C', 'lane': 2}},\n         {'input': {'process': 'B', 'lane': 3},\n          'output': {'process': 'C', 'lane': 3}},\n         {'input': {'process': 'C', 'lane': 2},\n          'output': {'process': 'D', 'lane': 4}},\n         {'input': {'process': 'C', 'lane': 2},\n          'output': {'process': 'E', 'lane': 5}},\n         {'input': {'process': 'C', 'lane': 3},\n          'output': {'process': 'D', 'lane': 6}},\n         {'input': {'process': 'C', 'lane': 3},\n          'output': {'process': 'E', 'lane': 7}}]\n    ]\n\n    for x, pip_str in enumerate(pip_list):\n        res_str, res_ids = ps.add_unique_identifiers(pip_str)\n        res = ps.remove_unique_identifiers(res_ids, pipeline_mod_links[x])\n        assert json.dumps(res) == json.dumps(pipeline_exp_links[x])"
  },
  {
    "path": "flowcraft/tests/test_process_details.py",
    "content": "import pytest\n\nimport flowcraft.generator.process_details as pd\nimport flowcraft.flowcraft as af\n\nfrom flowcraft.generator.process_collector import collect_process_map\nfrom flowcraft.generator.process_details import COLORS\n\nprocess_map = collect_process_map()\n\n\ndef test_color_print():\n\n    for c in COLORS:\n        pd.colored_print(\"teste_msg\", c)\n\n    assert 1\n\n\ndef test_long_list():\n\n    arguments = af.get_args([\"build\", \"-L\"])\n\n    pipeline_string = \"fastqc trimmomatic\"\n\n    with pytest.raises(SystemExit):\n        pd.proc_collector(process_map, arguments, pipeline_string)\n\n\ndef test_short_list():\n\n    arguments = af.get_args([\"build\", \"-l\"])\n\n    pipeline_string = \"fastqc trimmomatic\"\n\n    with pytest.raises(SystemExit):\n        pd.proc_collector(process_map, arguments, pipeline_string)\n"
  },
  {
    "path": "flowcraft/tests/test_processes.py",
    "content": "import os\nimport pytest\n\nimport flowcraft.generator.process as pc\nimport flowcraft.generator.error_handling as eh\n\nfrom flowcraft.generator.components import assembly\nfrom flowcraft.generator.components import assembly_processing as ap\nfrom flowcraft.generator.components import reads_quality_control as readsqc\n\nfrom flowcraft.generator.process_collector import collect_process_map\n\nprocess_map = collect_process_map()\n\n\n@pytest.fixture\ndef mock_process():\n\n    return pc.Process(template=\"integrity_coverage\")\n\n\n@pytest.fixture\ndef process_wchannels():\n\n    p = pc.Process(template=\"integrity_coverage\")\n\n    p.input_channel = \"in_channel\"\n    p.output_channel = \"out_channel\"\n\n    return p\n\n\n@pytest.fixture\ndef mock_status():\n\n    return pc.StatusCompiler(template=\"status_compiler\")\n\n@pytest.fixture\ndef mock_patlas_compiler():\n\n    return pc.StatusCompiler(template=\"patlas_consensus\")\n\n\n@pytest.fixture\ndef mock_init():\n\n    return pc.Init(template=\"init\")\n\n\ndef test_process_init():\n\n    for template, proc in process_map.items():\n\n        p = proc(template=template)\n\n        assert p.template == template\n\n\ndef test_set_correct_template(mock_process):\n\n    mock_process._set_template(\"fastqc\")\n\n    assert os.path.exists(mock_process._template_path)\n\n\ndef test_set_wrong_template(mock_process):\n\n    with pytest.raises(eh.ProcessError):\n        mock_process._set_template(\"wrong_template\")\n\n\ndef test_template_render_empty(mock_process):\n\n    with pytest.raises(eh.ProcessError):\n        mock_process.template_str\n\n\ndef test_template_render(process_wchannels):\n\n    process_wchannels.set_channels(pid=1)\n    t = process_wchannels.template_str\n\n    assert 1\n\n\ndef test_main_channel_setup(mock_process):\n\n    mock_process.set_main_channel_names(\"input_suf\", \"output_suf\", 1)\n\n    assert [mock_process.input_channel.endswith(\"input_suf\"),\n            mock_process.output_channel.endswith(\"output_suf\"),\n            mock_process.lane] == [True, True, 1]\n\n\ndef test_main_raw_channel_self(mock_process):\n    \"\"\"Tests the retrieval of the raw input channel when the input type is\n    inferred from the class\"\"\"\n\n    mock_process.input_type = \"fastq\"\n    res = mock_process.get_user_channel(\"myChannel\")\n\n    assert res == {\"input_channel\": \"myChannel\",\n                   **mock_process.RAW_MAPPING[\"fastq\"]}\n\n\ndef test_main_raw_channel_fastq(mock_process):\n\n    res = mock_process.get_user_channel(\"myChannel\", \"fastq\")\n\n    assert res == {\"input_channel\": \"myChannel\",\n                   **mock_process.RAW_MAPPING[\"fastq\"]}\n\n\ndef test_main_raw_channel_fasta(mock_process):\n\n    res = mock_process.get_user_channel(\"myChannel\", \"fasta\")\n\n    assert res == {\"input_channel\": \"myChannel\",\n                   **mock_process.RAW_MAPPING[\"fasta\"]}\n\n\ndef test_main_raw_channel_invalid(mock_process):\n\n    res = mock_process.get_user_channel(\"myChannel\", \"invalid\")\n\n    assert res is None\n\n\ndef test_channels_setup(process_wchannels):\n\n    process_wchannels.lane = 1\n    process_wchannels.set_channels(pid=1)\n\n    expected = {\"input_channel\": \"in_channel\",\n                \"output_channel\": \"out_channel\",\n                \"template\": process_wchannels.template,\n                \"pid\": \"1_1\",\n                \"forks\": \"\"}\n\n    assert process_wchannels._context == expected\n\n\ndef 
test_channels_setup_withforks(process_wchannels):\n\n    process_wchannels.forks = [\"A\", \"B\"]\n\n    process_wchannels.lane = 3\n    process_wchannels.set_channels(pid=1)\n\n    expected = {\"input_channel\": \"in_channel\",\n                \"output_channel\": \"out_channel\",\n                \"template\": process_wchannels.template,\n                \"pid\": \"3_1\",\n                \"forks\": \"A\\nB\"}\n\n    assert process_wchannels._context == expected\n\n\ndef test_setup_one_raw_fork(process_wchannels):\n\n    process_wchannels.main_forks = [\"A\"]\n    process_wchannels.lane = 1\n    process_wchannels.set_channels(pid=1)\n\n    expected = {\"input_channel\": \"in_channel\",\n                \"output_channel\": \"out_channel\",\n                \"template\": process_wchannels.template,\n                \"pid\": \"1_1\",\n                \"forks\": \"\\nout_channel.set{ A }\\n\"}\n\n    assert process_wchannels._context == expected\n\n\ndef test_setup_multiple_raw_forks(process_wchannels):\n\n    process_wchannels.main_forks = [\"A\", \"B\"]\n    process_wchannels.lane = 3\n    process_wchannels.set_channels(pid=1)\n\n    expected = {\"input_channel\": \"in_channel\",\n                \"output_channel\": \"out_channel\",\n                \"template\": process_wchannels.template,\n                \"pid\": \"3_1\",\n                \"forks\": \"\\nout_channel.into{ A;B }\\n\"}\n\n    assert process_wchannels._context == expected\n\n\ndef test_channels_setup_status(process_wchannels):\n\n    process_wchannels.status_channels = [\"A\", \"B\"]\n\n    process_wchannels.lane = 3\n    process_wchannels.set_channels(pid=1)\n\n    assert process_wchannels.status_strs == [\"STATUS_A_3_1\", \"STATUS_B_3_1\"]\n\n\ndef test_update_main_fork_noprevious(process_wchannels):\n    \"\"\"Updates the forks attributes when there are no previous main forks\"\"\"\n\n    process_wchannels.set_channels(pid=1)\n    process_wchannels.update_main_forks(\"A\")\n\n    assert [process_wchannels.output_channel,\n            process_wchannels.main_forks,\n            process_wchannels.forks] == \\\n           [\"_out_channel\",\n            [\"out_channel\", \"A\"],\n            [\"\\n_out_channel.into{ out_channel;A }\\n\"]]\n\n\ndef test_secondary_channels_multisink(process_wchannels):\n\n    process_wchannels.lane = 2\n    process_wchannels.set_channels(pid=1)\n    process_wchannels.set_secondary_channel(\"A\", [\"B\", \"C\"])\n\n    assert process_wchannels.forks == [\"\\nA_2_1.into{ B;C }\\n\"]\n\n\ndef test_secondary_channels_singlesink(process_wchannels):\n\n    process_wchannels.lane = 2\n    process_wchannels.set_channels(pid=1)\n    process_wchannels.set_secondary_channel(\"A\", [\"B\"])\n\n    assert process_wchannels.forks == [\"\\nA_2_1.set{ B }\\n\"]\n\n\ndef test_secondary_channels_duplicatesink(process_wchannels):\n\n    process_wchannels.lane = 1\n    process_wchannels.set_channels(pid=1)\n    process_wchannels.set_secondary_channel(\"A\", [\"B\", \"B\"])\n\n    assert process_wchannels.forks == [\"\\nA_1_1.set{ B }\\n\"]\n\n\ndef test_status_init(mock_status):\n\n    assert mock_status.template == \"status_compiler\"\n\n\ndef test_status_channel_setup_empty(mock_status):\n\n    with pytest.raises(eh.ProcessError):\n        mock_status.set_compiler_channels([])\n\n\ndef test_status_channel_single(mock_status):\n\n    mock_status.set_compiler_channels([\"A\"])\n\n    assert mock_status._context == {\"compile_channels\": \"A\"}\n\n\ndef test_status_channel_two(mock_status):\n\n    
mock_status.set_compiler_channels([\"A\", \"B\"])\n\n    assert mock_status._context == {\"compile_channels\": \"A.mix(B)\"}\n\n\ndef test_status_channel_multiple(mock_status):\n\n    mock_status.set_compiler_channels([\"A\", \"B\", \"C\"])\n\n    assert mock_status._context == {\"compile_channels\": \"A.mix(B,C)\"}\n\n\ndef test_init_process(mock_init):\n\n    assert mock_init.template == \"init\"\n\n\ndef test_init_raw_inputs_single(mock_init):\n\n    mock_init.set_raw_inputs({\"fasta\": {\"channel\": \"rawChannel\",\n                                    \"raw_forks\": [\"A\"],\n                                    \"channel_str\": \"rawChannel.Channel\"}})\n\n    assert [mock_init.forks, mock_init._context[\"main_inputs\"]] == \\\n        [[\"\\nrawChannel.set{ A }\\n\"], \"rawChannel.Channel\"]\n\n\ndef test_init_raw_inputs_multi_forks(mock_init):\n\n    mock_init.set_raw_inputs({\"fastq\": {\"channel\": \"rawChannel\",\n                                    \"raw_forks\": [\"A\", \"B\"],\n                                    \"channel_str\": \"rawChannel.Channel\"}})\n\n    assert [mock_init.forks, mock_init._context[\"main_inputs\"]] == \\\n        [[\"\\nrawChannel.into{ A;B }\\n\"], \"rawChannel.Channel\"]\n\n\ndef test_init_multi_raw_inputs(mock_init):\n\n    mock_init.set_raw_inputs({\"fastq\": {\"channel\": \"rawChannel\",\n                                    \"raw_forks\": [\"A\", \"B\"],\n                                    \"channel_str\": \"rawChannel.Channel\"},\n                              \"fasta\": {\"channel\": \"otherChannel\",\n                                    \"raw_forks\": [\"C\"],\n                                    \"channel_str\": \"otherChannel.Channel\"}})\n\n    assert [mock_init.forks, mock_init._context[\"main_inputs\"]] == \\\n        [[\"\\nrawChannel.into{ A;B }\\n\",\n          \"\\notherChannel.set{ C }\\n\"],\n         \"rawChannel.Channel\\notherChannel.Channel\"]\n\n\ndef test_init_secondary_inputs(mock_init):\n\n    mock_init.set_secondary_inputs(\n        {\"genomeSize\": \"IN_genome_size = Channel.value(params.genomeSize)\"})\n\n    assert mock_init._context[\"secondary_inputs\"] == \\\n        \"IN_genome_size = Channel.value(params.genomeSize)\"\n\n\ndef test_init_multi_secondary_inputs(mock_init):\n\n    mock_init.set_secondary_inputs(\n        {\"genomeSize\": \"IN_genome_size = Channel.value(params.genomeSize)\",\n         \"other\": \"Other\"})\n\n    assert mock_init._context[\"secondary_inputs\"] == \\\n        \"IN_genome_size = Channel.value(params.genomeSize)\\nOther\"\n\n\ndef test_directive_update():\n\n    p = assembly.Spades(template=\"spades\")\n\n    p.update_attributes({\"version\": \"3.9.0\"})\n\n    assert p.directives[\"spades\"][\"version\"] == \"3.9.0\"\n\n\ndef test_directive_update2():\n\n    p = readsqc.Fastqc(template=\"fastqc\")\n\n    p.update_attributes({\"cpus\": \"3\", \"memory\": \"4GB\"})\n\n    assert [p.directives[\"fastqc2\"][\"cpus\"],\n            p.directives[\"fastqc2\"][\"memory\"]] ==\\\n           [\"3\", \"4GB\"]\n\n\ndef test_directive_update3():\n\n    p = ap.Pilon(template=\"pilon\")\n\n    p.update_attributes({\"cpus\": \"3\", \"memory\": \"4GB\",\n                         \"container\": \"another\", \"version\": \"1.0\"})\n\n    assert [p.directives[\"pilon\"][\"cpus\"],\n            p.directives[\"pilon\"][\"memory\"],\n            p.directives[\"pilon\"][\"container\"],\n            p.directives[\"pilon\"][\"version\"]] == \\\n           [\"3\", \"4GB\", \"another\", \"1.0\"]\n\n\ndef 
test_directive_update4():\n\n    p = readsqc.Trimmomatic(template=\"trimmomatic\")\n\n    p.update_attributes({\"cpus\": \"3\", \"memory\": \"{4.GB*task.attempt}\",\n                         \"container\": \"another\", \"version\": \"1.0\"})\n\n    assert [p.directives[\"trimmomatic\"][\"cpus\"],\n            p.directives[\"trimmomatic\"][\"memory\"],\n            p.directives[\"trimmomatic\"][\"container\"],\n            p.directives[\"trimmomatic\"][\"version\"]] == \\\n           [\"3\", \"{4.GB*task.attempt}\", \"another\", \"1.0\"]\n\n\ndef test_join_compiler(mock_patlas_compiler):\n\n    mock_patlas_compiler.set_compiler_channels([\"A\", \"B\"], operator=\"join\")\n\n    assert mock_patlas_compiler._context == \\\n        {\"compile_channels\": \"A.join(B).map{ ot -> [ ot[0], ot[1..-1] ] }\"}\n\n\ndef test_join_compiler_one_channel(mock_patlas_compiler):\n\n    mock_patlas_compiler.set_compiler_channels([\"A\"], operator=\"join\")\n\n    assert mock_patlas_compiler._context == \\\n        {\"compile_channels\": \"A\"}\n"
  },
  {
    "path": "flowcraft/tests/test_recipes.py",
    "content": "import pytest\nimport pkgutil\n\nfrom argparse import Namespace\n\nfrom flowcraft.generator import error_handling as eh\nfrom flowcraft.generator import recipes\nfrom flowcraft.generator import recipe\n\n\ndef test_empty_recipe():\n\n    r = recipe.Recipe()\n\n    with pytest.raises(eh.RecipeError):\n        r.brew()\n\n\ndef test_empty_pipeline_str():\n\n    r = recipe.Recipe()\n\n    r.name = \"teste\"\n\n    with pytest.raises(eh.RecipeError):\n        r.brew()\n\n\ndef test_basic_recipe():\n\n    r = recipe.Recipe()\n\n    r.name = \"teste\"\n    r.pipeline_str = \"teste\"\n\n    r.brew()\n\n    assert r.pipeline_str == \"teste\"\n\n\ndef test_recipe_wdirectives():\n\n    r = recipe.Recipe()\n\n    r.name = \"teste\"\n    r.pipeline_str = \"componentA\"\n    r.directives = {\n        \"componentA\": {\n            \"params\": {\n                \"paramA\": \"val\"\n            },\n            \"directives\": {\n                \"dirA\": \"val\"\n            }\n        }\n    }\n\n    r.brew()\n\n    assert '\"params\":{\"paramA\":\"val\"}' in r.pipeline_str and \\\n        '\"dirA\":\"val\"' in r.pipeline_str\n\n\ndef test_recipe_partial_directives():\n\n    r = recipe.Recipe()\n\n    r.name = \"teste\"\n    r.pipeline_str = \"componentA\"\n    r.directives = {\n        \"componentA\": {\n            \"params\": {\n                \"paramA\": \"val\"\n            },\n        }\n    }\n\n    r.brew()\n\n    assert '\"params\":{\"paramA\":\"val\"}' in r.pipeline_str\n\n\ndef test_recipe_partial_directives2():\n\n    r = recipe.Recipe()\n\n    r.name = \"teste\"\n    r.pipeline_str = \"componentA\"\n    r.directives = {\n        \"componentA\": {\n            \"directives\": {\n                \"dirA\": \"val\"\n            }\n        }\n    }\n\n    r.brew()\n\n    assert '\"dirA\":\"val\"' in r.pipeline_str\n\n\ndef test_component_str():\n\n    r = recipe.Recipe()\n\n    r.name = \"teste\"\n    r.pipeline_str = \"componentA\"\n    directives = {\n        \"dirA\": \"val\"\n    }\n\n    res = r._get_component_str(\"componentA\", directives=directives)\n\n    assert '\"dirA\":\"val\"' in res\n\n\ndef test_component_str2():\n\n    r = recipe.Recipe()\n\n    r.name = \"teste\"\n    r.pipeline_str = \"componentA\"\n    directives = {\n        \"paramA\": \"val\"\n    }\n\n    res = r._get_component_str(\"componentA\", params=directives)\n    print(res)\n\n    assert '\"params\":{\"paramA\":\"val\"}' in res\n\n\ndef test_component_str3():\n\n    r = recipe.Recipe()\n\n    r.name = \"teste\"\n    r.pipeline_str = \"componentA\"\n    params = {\n        \"paramA\": \"val\"\n    }\n    directives = {\n        \"dirA\": \"val\"\n    }\n\n    res = r._get_component_str(\"componentA\", params=params,\n                               directives=directives)\n\n    assert '\"params\":{\"paramA\":\"val\"}' in res and \\\n           '\"dirA\":\"val\"' in res\n\n\ndef test_brew_recipe():\n\n    res = recipe.brew_recipe(\"innuca\")\n\n    assert res != \"\"\n\n\ndef test_bad_recipe_name():\n\n    with pytest.raises(SystemExit):\n        res = recipe.brew_recipe(\"bad_name\")\n\n\ndef test_all_recipes():\n\n    prefix = \"{}.\".format(recipes.__name__)\n    for importer, modname, _ in pkgutil.iter_modules(recipes.__path__, prefix):\n\n        _module = importer.find_module(modname).load_module(modname)\n\n        _recipe_classes = [cls for cls in _module.__dict__.values() if\n                           isinstance(cls, type)]\n\n        for cls in _recipe_classes:\n            cls()\n\n\ndef 
test_innuendo_recipe():\n\n    args = Namespace(tasks=None)\n\n    recipe.brew_innuendo(args)\n\n\ndef test_innuendo_partial_recipe():\n\n    args = Namespace(tasks=\"integrity_coverage\")\n\n    recipe.brew_innuendo(args)\n\n\ndef test_list_recipes():\n\n    with pytest.raises(SystemExit):\n        recipe.list_recipes()\n\ndef test_list_recipes_full():\n\n    with pytest.raises(SystemExit):\n        recipe.list_recipes(True)"
  },
  {
    "path": "flowcraft/tests/test_sanity.py",
    "content": "import pytest\nfrom contextlib import contextmanager\n\ntry:\n    import generator.pipeline_parser as ps\n    from generator.error_handling import SanityError\nexcept ImportError:\n    import flowcraft.generator.pipeline_parser as ps\n    from flowcraft.generator.error_handling import SanityError\n\n\n@contextmanager\ndef not_raises(exception, msg):\n    try:\n        yield\n    except exception:\n        raise pytest.fail(msg)\n\ndef test_empty_tasks():\n    pipeline_strs = [\n        \"   \",\n        \"\"\n    ]\n\n    for p in pipeline_strs:\n        with pytest.raises(SanityError):\n            ps.empty_tasks(p)\n\n\ndef test_no_brackets_fail():\n\n    pipeline_strs = [\n        \"A B C | D\",\n    ]\n\n    for p in pipeline_strs:\n        with pytest.raises(SanityError):\n            ps.brackets_but_no_lanes(p)\n\n\ndef test_number_of_forks_fail():\n\n    pipeline_strs = [\n        \"A B (( C | D)\",\n        \"A B ( C | D\"\n    ]\n\n    for p in pipeline_strs:\n        with pytest.raises(SanityError):\n            ps.brackets_insanity_check(p)\n\n\ndef test_lane_char_fail():\n\n    pipeline_strs = [\n        \"A B (D || E)\"\n    ]\n\n    for p in pipeline_strs:\n        with pytest.raises(SanityError):\n            ps.lane_char_insanity_check(p)\n\n\ndef test_final_char_fail():\n\n    pipeline_strs = [\n        \"|\",\n        \"A B |\"\n    ]\n\n    for p in pipeline_strs:\n        with pytest.raises(SanityError):\n            ps.final_char_insanity_check(p)\n\n\ndef test_fork_no_proc_fail():\n\n    pipeline_strs = [\n        \"A B (|E)\",\n        \"A B (E|)\"\n    ]\n\n    for p in pipeline_strs:\n        with pytest.raises(SanityError):\n            ps.fork_procs_insanity_check(p)\n\n\ndef test_double_fork_fail():\n\n    pipeline_strs = [\n        \"A B (( C | D ) E )\"\n    ]\n\n    for p in pipeline_strs:\n        with pytest.raises(SanityError):\n            ps.start_proc_insanity_check(p)\n\n\ndef test_close_token_ending_fail():\n\n    pipeline_strs = [\n        \"A B ( C | D ) E\"\n    ]\n\n    for p in pipeline_strs:\n        with pytest.raises(SanityError):\n            ps.late_proc_insanity_check(p)\n\n\ndef test_inner_forks_fail():\n\n    pipeline_strs = [\n        \"A B ( A D )\",\n    ]\n\n    for p in pipeline_strs:\n        with pytest.raises(SanityError):\n            ps.inner_fork_insanity_checks(p)\n\n\ndef test_string_pass_all():\n\n    # all these functions listed here don't accept strings with spaces\n    pipeline_strs = [\n        \"A B\",\n        \"(A|B)\",\n        \"A B (C|D)\",\n        \"A B (D|E(F|G))\",\n        \"A B (C|B)\",\n        \"F T(S(P(P|M)|M(P|M(P| M)))|Sp)\"\n    ]\n\n    for p in pipeline_strs:\n        with not_raises(SanityError, \"pipeline: {}\".format(p)):\n            ps.brackets_insanity_check(p)\n            ps.lane_char_insanity_check(p)\n            ps.brackets_but_no_lanes(p)\n            ps.fork_procs_insanity_check(p)\n            ps.start_proc_insanity_check(p)\n            ps.late_proc_insanity_check(p)\n\n\ndef test_string_spaces_pass_all():\n\n    # this test accepts strings with spaces\n    pipeline_strs = [\n        \"A B\",\n        \"(A | B)\",\n        \"A B ( C | D)\",\n        \"A B (D | E (F | G))\",\n        \"A B ( C | B)\",\n        # spaces are important for this check\n        \"F T (S(P(P| M) |M(P|M(P| M)))|Sp)\"\n    ]\n\n    for p in pipeline_strs:\n        with not_raises(SanityError, \"pipeline: {}\".format(p)):\n            ps.inner_fork_insanity_checks(p)\n\n\ndef 
test_string_pass_all_wrapper():\n\n    pipeline_strs = [\n        \"A B\",\n        \"(A | B)\",\n        \"A B ( C | D)\",\n        \"A B (D | E (F | G))\",\n        \"A B ( C | B)\"\n    ]\n\n    for p in pipeline_strs:\n        with not_raises(SanityError, \"pipeline: {}\".format(p)):\n            ps.insanity_checks(p)"
  },
  {
    "path": "requirements.txt",
    "content": "numpydoc"
  },
  {
    "path": "setup.py",
    "content": "import flowcraft\n\nfrom setuptools import setup\n\nVERSION = flowcraft.__version__\n\nwith open(\"README.md\") as fh:\n    README = fh.read()\n\nsetup(\n    name=\"flowcraft\",\n    version=\"{}\".format(VERSION),\n    packages=[\"flowcraft\",\n              \"flowcraft.templates\",\n              \"flowcraft.templates.flowcraft_utils\",\n              \"flowcraft.generator\",\n              \"flowcraft.generator.components\",\n              \"flowcraft.generator.recipes\"],\n    package_dir={\"flowcraft\": \"flowcraft\"},\n    package_data={\"flowcraft\": [\"nextflow.config\",\n                                \"profiles.config\",\n                                \"bin/*\",\n                                \"lib/*\",\n                                \"resources/*\",\n                                \"generator/templates/*\"]},\n    data_files=[(\"\", [\"LICENSE\"])],\n    install_requires=[\n        \"pympler\",\n        \"python-dateutil\",\n        \"argparse\",\n        \"jinja2\",\n        \"requests\"\n    ],\n    description=\"A Nextflow pipeline assembler for genomics. Pick your \"\n                \"modules. Assemble them. Run the pipeline.\",\n    long_description=README,\n    long_description_content_type=\"text/markdown\",\n    url=\"https://github.com/assemblerflow/flowcraft\",\n    author=\"Diogo N Silva\",\n    author_email=\"o.diogosilva@gmail.com\",\n    license=\"GPL3\",\n    entry_points={\n        \"console_scripts\": [\n            \"flowcraft = flowcraft.flowcraft:main\"\n        ]\n    }\n)\n"
  }
]