[
  {
    "path": "CONTRIBUTING.md",
    "content": "# Contribution guidelines for the Clinical Data Sources repository\n\n## Project Philosophy\n\nthis repository of open and easy to access clinical data sources is an open source community-based project. We want to make it easy for interested researchers to access clinical datasets as easily as possible.\n\n## Issues\n\nWe use GitHub Issues for a wide range of tasks including bug reports, feature requests, planning, and general forum. If you have any comments or quesstions feel free to create a github issue.\n\n## Pull Requests\n\nWe operate on a pull request model, all changes/additions should be submitted via [pull request](https://help.github.com/articles/using-pull-requests/ \"GitHub · Using pull requests\"). \n\nTo keep the commit history clean, we may ask you to [rebase your pull request](https://github.com/edx/edx-platform/wiki/How-to-Rebase-a-Pull-Request \"How to Rebase a Pull Request\") or squash certain commits. If you don't know what this means, we'll help you through the process.\n\n## Peer Review\n\nAll pull requests undergo peer review in which project participants review the changes. Reviewers may suggest modifications or even note their disapproval. Reviewers can note their approval by commenting along the lines of \":+1:\", \"looks good to me\", or \"approved\". As a reviewer, it's helpful to note the type of review you performed: have you used the data, only accessed it, looked it over, or are you just supporting the concept?\n\nBefore a repository maintainer merges a pull request, there must be at least one affirmative review. If there is any unaddressed criticism or disapproval, a repository maintainer will determine how to proceed and may wait for additional feedback.\n\n**Happy contributing!**\n"
  },
  {
    "path": "LICENSE",
    "content": "CC0 1.0 Universal\n\nStatement of Purpose\n\nThe laws of most jurisdictions throughout the world automatically confer\nexclusive Copyright and Related Rights (defined below) upon the creator and\nsubsequent owner(s) (each and all, an \"owner\") of an original work of\nauthorship and/or a database (each, a \"Work\").\n\nCertain owners wish to permanently relinquish those rights to a Work for the\npurpose of contributing to a commons of creative, cultural and scientific\nworks (\"Commons\") that the public can reliably and without fear of later\nclaims of infringement build upon, modify, incorporate in other works, reuse\nand redistribute as freely as possible in any form whatsoever and for any\npurposes, including without limitation commercial purposes. These owners may\ncontribute to the Commons to promote the ideal of a free culture and the\nfurther production of creative, cultural and scientific works, or to gain\nreputation or greater distribution for their Work in part through the use and\nefforts of others.\n\nFor these and/or other purposes and motivations, and without any expectation\nof additional consideration or compensation, the person associating CC0 with a\nWork (the \"Affirmer\"), to the extent that he or she is an owner of Copyright\nand Related Rights in the Work, voluntarily elects to apply CC0 to the Work\nand publicly distribute the Work under its terms, with knowledge of his or her\nCopyright and Related Rights in the Work and the meaning and intended legal\neffect of CC0 on those rights.\n\n1. Copyright and Related Rights. A Work made available under CC0 may be\nprotected by copyright and related or neighboring rights (\"Copyright and\nRelated Rights\"). Copyright and Related Rights include, but are not limited\nto, the following:\n\n  i. the right to reproduce, adapt, distribute, perform, display, communicate,\n  and translate a Work;\n\n  ii. moral rights retained by the original author(s) and/or performer(s);\n\n  iii. 
publicity and privacy rights pertaining to a person's image or likeness\n  depicted in a Work;\n\n  iv. rights protecting against unfair competition in regards to a Work,\n  subject to the limitations in paragraph 4(a), below;\n\n  v. rights protecting the extraction, dissemination, use and reuse of data in\n  a Work;\n\n  vi. database rights (such as those arising under Directive 96/9/EC of the\n  European Parliament and of the Council of 11 March 1996 on the legal\n  protection of databases, and under any national implementation thereof,\n  including any amended or successor version of such directive); and\n\n  vii. other similar, equivalent or corresponding rights throughout the world\n  based on applicable law or treaty, and any national implementations thereof.\n\n2. Waiver. To the greatest extent permitted by, but not in contravention of,\napplicable law, Affirmer hereby overtly, fully, permanently, irrevocably and\nunconditionally waives, abandons, and surrenders all of Affirmer's Copyright\nand Related Rights and associated claims and causes of action, whether now\nknown or unknown (including existing as well as future claims and causes of\naction), in the Work (i) in all territories worldwide, (ii) for the maximum\nduration provided by applicable law or treaty (including future time\nextensions), (iii) in any current or future medium and for any number of\ncopies, and (iv) for any purpose whatsoever, including without limitation\ncommercial, advertising or promotional purposes (the \"Waiver\"). Affirmer makes\nthe Waiver for the benefit of each member of the public at large and to the\ndetriment of Affirmer's heirs and successors, fully intending that such Waiver\nshall not be subject to revocation, rescission, cancellation, termination, or\nany other legal or equitable action to disrupt the quiet enjoyment of the Work\nby the public as contemplated by Affirmer's express Statement of Purpose.\n\n3. Public License Fallback. 
Should any part of the Waiver for any reason be\njudged legally invalid or ineffective under applicable law, then the Waiver\nshall be preserved to the maximum extent permitted taking into account\nAffirmer's express Statement of Purpose. In addition, to the extent the Waiver\nis so judged Affirmer hereby grants to each affected person a royalty-free,\nnon transferable, non sublicensable, non exclusive, irrevocable and\nunconditional license to exercise Affirmer's Copyright and Related Rights in\nthe Work (i) in all territories worldwide, (ii) for the maximum duration\nprovided by applicable law or treaty (including future time extensions), (iii)\nin any current or future medium and for any number of copies, and (iv) for any\npurpose whatsoever, including without limitation commercial, advertising or\npromotional purposes (the \"License\"). The License shall be deemed effective as\nof the date CC0 was applied by Affirmer to the Work. Should any part of the\nLicense for any reason be judged legally invalid or ineffective under\napplicable law, such partial invalidity or ineffectiveness shall not\ninvalidate the remainder of the License, and in such case Affirmer hereby\naffirms that he or she will not (i) exercise any of his or her remaining\nCopyright and Related Rights in the Work or (ii) assert any associated claims\nand causes of action with respect to the Work, in either case contrary to\nAffirmer's express Statement of Purpose.\n\n4. Limitations and Disclaimers.\n\n  a. No trademark or patent rights held by Affirmer are waived, abandoned,\n  surrendered, licensed or otherwise affected by this document.\n\n  b. 
Affirmer offers the Work as-is and makes no representations or warranties\n  of any kind concerning the Work, express, implied, statutory or otherwise,\n  including without limitation warranties of title, merchantability, fitness\n  for a particular purpose, non infringement, or the absence of latent or\n  other defects, accuracy, or the present or absence of errors, whether or not\n  discoverable, all to the greatest extent permissible under applicable law.\n\n  c. Affirmer disclaims responsibility for clearing rights of other persons\n  that may apply to the Work or any use thereof, including without limitation\n  any person's Copyright and Related Rights in the Work. Further, Affirmer\n  disclaims responsibility for obtaining any necessary consents, permissions\n  or other rights required for any use of the Work.\n\n  d. Affirmer understands and acknowledges that Creative Commons is not a\n  party to this document and has no duty or obligation with respect to this\n  CC0 or use of the Work.\n\nFor more information, please see\n<http://creativecommons.org/publicdomain/zero/1.0/>\n"
  },
  {
    "path": "README.md",
    "content": "# Clinical Data Sources\nWe are assembling a repository of clinical data sources (Electronic Health Record, Clinical trials, Imaging etc.) that are either public or have low friction application processes. If you have any comments, corrections, or know of any additional sources, please add it as a pull request.\n\nTypes of data: \n\n* Patient Demographics: P\n* Lab tests: L\n* Textual Patient Notes: T\n* Imaging: I\n* Other: O\n\n\n| Name        | Description           | Data Types | Patient Count | Ease of Access  | Publication/Citation Guidelines |\n| ----------- |:-------------------:| ------------ | ------------- | --------------- | ----------- |\n| [MIMIC](https://mimic.physionet.org/)       | MIMIC is an openly available dataset developed by the MIT Lab for Computational Physiology, comprising deidentified health data associated with ~40,000 critical care patients. It includes demographics, vital signs, laboratory tests, medications, and more.| P, L, T | ~45,000 | Simple Application | [Link](http://dx.doi.org/10.1038/sdata.2016.35)|\n| [Physionet 2012](https://physionet.org/challenge/2012/) | Tthe focus of the PhysioNet/CinC Challenge 2012 is to develop methods for patient-specific prediction of in-hospital mortality. Participants will use information collected during the first two days of an ICU stay to predict which patients survive their hospitalizations, and which patients do not. | P, L | 12,000 ICU stays | Public | [Link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965265/)|\n| [i2b2](https://www.i2b2.org/NLP/DataSets/Main.php) | In order to enhance the ability of natural language processing (NLP) tools to prise increasingly fine grained information from clinical records, i2b2 has previously provided sets of fully deidentified notes from the Research Patient Data Repository at Partners HealthCare for a series of NLP Challenges organized by Dr. Ozlem Uzuner.  
We are pleased to now make those notes available to the community for general research purposes. At this time we are releasing the notes (~1,500) from the first four i2b2 Challenges as i2b2 NLP Research Data Sets. A similar set of notes from the most recent i2b2 Challenge will be released on the one year anniversary of that Challenge. | T | ~1,500 | Simple Application (DUA) | [Link (depends on data usage)](https://www.i2b2.org/NLP/DataSets/Main.php) |\n| [PRO-ACT ALS](https://nctu.partners.org/ProACT/) | PRO-ACT provides users with easy access to: 1) Over 10,700 fully de-identified clinical patient records, 2) Placebo and treatment-arm data from 23 Phase II/III clinical trials, 3) Demographic, lab, medical and family history, and other data elements, 4) More than 10 million longitudinally collected data points | P, L | 10,700 | Simple Application | [Link](https://nctu.partners.org/ProACT/Document/DisplayLatest/6) |\n| [Cancer Imaging Archive](http://www.cancerimagingarchive.net/) | TCIA is a service which de-identifies and hosts a large archive of medical images of cancer accessible for public download. The data are organized as “Collections”, typically patients related by a common disease (e.g. lung cancer), image modality (MRI, CT, etc) or research focus. DICOM is the primary file format used by TCIA for image storage. Supporting data related to the images such as patient outcomes, treatment details, genomics, pathology, and expert analyses are also provided when available. | I, some studies have others | Most studies < 100 | Many Public | [Usage](https://wiki.cancerimagingarchive.net/display/Public/Data+Usage+Policies+and+Restrictions) [Publication](https://www.ncbi.nlm.nih.gov/pubmed/23884657) |\n| [National Biomedical Imaging Archive](https://imaging.nci.nih.gov/ncia/login.jsf) | Welcome to the National Biomedical Imaging Archive (NBIA). 
NBIA is a searchable repository of in vivo images that provides the biomedical research community, industry, and academia with access to image archives to be used in the development and validation of analytical software tools that support lesion detection and classification, accelerated diagnostic imaging decisions, and quantitative imaging assessment of drug response. | I | Most studies < 100 | Many Public | NA |\n| [Open Access Series of Imaging Studies (OASIS)](http://www.oasis-brains.org/) | The Open Access Series of Imaging Studies (OASIS) is a project aimed at making MRI data sets of the brain freely available to the scientific community. By compiling and freely distributing MRI data sets, we hope to facilitate future discoveries in basic and clinical neuroscience. OASIS is made available by the Washington University Alzheimer’s Disease Research Center, Dr. Randy Buckner at the Howard Hughes Medical Institute (HHMI) at Harvard University, the Neuroinformatics Research Group (NRG) at Washington University School of Medicine, and the Biomedical Informatics Research Network (BIRN). 
| I | 416 (study 1), 150 (study 2) | Public | [Study 1](https://www.ncbi.nlm.nih.gov/pubmed/17714011) [Study 2](https://www.ncbi.nlm.nih.gov/pubmed/19929323) |\n\n\n<p xmlns:dct=\"http://purl.org/dc/terms/\" xmlns:vcard=\"http://www.w3.org/2001/vcard-rdf/3.0#\">\n  <a rel=\"license\"\n     href=\"http://creativecommons.org/publicdomain/zero/1.0/\">\n    <img src=\"http://i.creativecommons.org/p/zero/1.0/88x31.png\" style=\"border-style: none;\" alt=\"CC0\" />\n  </a>\n  <br />\n  To the extent possible under law,\n  <a rel=\"dct:publisher\"\n     href=\"https://github.com/EpistasisLab/ClinicalDataSources\">\n    <span property=\"dct:title\">Brett Beaulieu-Jones</span></a>\n  has waived all copyright and related or neighboring rights to\n  <span property=\"dct:title\">Open or Easy Access Clinical Data Sources for Biomedical Research</span>.\nThis work is published from:\n<span property=\"vcard:Country\" datatype=\"dct:ISO3166\"\n      content=\"US\" about=\"https://github.com/EpistasisLab/ClinicalDataSources\">\n  United States</span>.\n</p>\n"
  }
]