Repository: saketkc/pysradb
Branch: develop
Commit: 801eb8fa1f1a
Files: 94
Total size: 16.8 MB
Directory structure:
gitextract_rf093ld_/
├── .coveragerc
├── .editorconfig
├── .gitattributes
├── .github/
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   ├── ISSUE_TEMPLATE.md
│   ├── dependabot.yml
│   └── workflows/
│       ├── codeql-analysis.yml
│       ├── publish.yml
│       ├── pull_request.yml
│       └── push.yml
├── .gitignore
├── AUTHORS.md
├── CITATION.cff
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── HISTORY.md
├── LICENSE
├── MANIFEST.in
├── Makefile
├── README.md
├── docs/
│   ├── Makefile
│   ├── _static/
│   │   ├── copy-button.js
│   │   └── custom.css
│   ├── authors.md
│   ├── case_studies.md
│   ├── cmdline.md
│   ├── commands.rst
│   ├── conf.py
│   ├── contributing.md
│   ├── history.md
│   ├── index.rst
│   ├── installation.md
│   ├── make.bat
│   ├── modules.rst
│   ├── notebooks.rst
│   ├── pysradb.rst
│   ├── python-api-usage.md
│   └── quickstart.md
├── notebooks/
│   ├── 01.Python-API_demo.ipynb
│   ├── 02.Commandline_download.ipynb
│   ├── 03.ParallelDownload.ipynb
│   ├── 04.SRA_to_fastq_conda.ipynb
│   ├── 05.Downloading_subsets_of_a_project.ipynb
│   ├── 06.Multiple_SRPs.ipynb
│   ├── 07.Query_Search.ipynb
│   ├── 08.PMC_DOI_Identifiers.ipynb
│   ├── 09.Metadata_enrichment.ipynb
│   ├── 11.Parse_Bioscience_Search.ipynb
│   └── README.md
├── pyproject.toml
├── pysradb/
│   ├── __init__.py
│   ├── __main__.py
│   ├── cli.py
│   ├── download.py
│   ├── exceptions.py
│   ├── filter_attrs.py
│   ├── geoweb.py
│   ├── metadata_enrichment.py
│   ├── ontology_reference.json
│   ├── search.py
│   ├── sraweb.py
│   ├── taxid2name.py
│   └── utils.py
├── requirements.txt
├── setup.cfg
└── tests/
    ├── conftest.py
    ├── data/
    │   └── test_search/
    │       ├── ena_search_test1.txt
    │       ├── ena_test_verbosity_0.csv
    │       ├── ena_test_verbosity_0.json
    │       ├── ena_test_verbosity_1.csv
    │       ├── ena_test_verbosity_1.json
    │       ├── ena_test_verbosity_2.csv
    │       ├── ena_test_verbosity_2.json
    │       ├── ena_test_verbosity_3.csv
    │       ├── ena_test_verbosity_3.json
    │       ├── geo_search_test1.txt
    │       ├── sra_search_test1.txt
    │       ├── sra_test.xml
    │       ├── sra_test_2_verbosity_0.csv
    │       ├── sra_test_2_verbosity_1.csv
    │       ├── sra_test_2_verbosity_2.csv
    │       ├── sra_test_2_verbosity_3.csv
    │       ├── sra_test_ERS3331676.xml
    │       ├── sra_test_verbosity_0.csv
    │       ├── sra_test_verbosity_1.csv
    │       ├── sra_test_verbosity_2.csv
    │       ├── sra_test_verbosity_3.csv
    │       └── sra_uids.txt
    ├── test_geoweb.py
    ├── test_search.py
    ├── test_sraweb.py
    └── test_utils.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .coveragerc
================================================
[run]
omit =
    pysradb/filter_attrs.py
    pysradb/geodb.py
    pysradb/sradb.py
    pysradb/taxid2name.py
    pysradb/utils.py
================================================
FILE: .editorconfig
================================================
# http://editorconfig.org
root = true
[*]
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
insert_final_newline = true
charset = utf-8
end_of_line = lf
[*.bat]
indent_style = tab
end_of_line = crlf
[LICENSE]
insert_final_newline = false
[Makefile]
indent_style = tab
================================================
FILE: .gitattributes
================================================
*.rst linguist-documentation
*.html linguist-documentation
*.ipynb linguist-language=python
================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms
github: [saketkc]
================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: "[BUG]"
labels: bug
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
`pysradb <command> SRPxxx`
**Desktop (please complete the following information):**
- OS: [e.g. Ubuntu 20.04]
- Python version [e.g. 3.8]
**Additional context**
Add any other context about the problem here.
================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for this project
title: "[ENH]"
labels: enhancement
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
================================================
FILE: .github/ISSUE_TEMPLATE.md
================================================
* pysradb version:
* Python version:
* Operating System:
### Description
Describe what you were trying to get done.
Tell us what happened, what went wrong, and what you expected to happen.
### What I Did
```
Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.
```
================================================
FILE: .github/dependabot.yml
================================================
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://help.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
  - package-ecosystem: "pip" # See documentation for possible values
    directory: "/" # Location of package manifests
    schedule:
      interval: "daily"
================================================
FILE: .github/workflows/codeql-analysis.yml
================================================
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"
on:
  push:
    branches: [ master ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ master ]
  schedule:
    - cron: '35 5 * * 1'
jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write
    strategy:
      fail-fast: false
      matrix:
        language: [ 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python' ]
        # Learn more:
        # https://docs.github.com/en/free-pro-team@latest/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#changing-the-languages-that-are-analyzed
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2
      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v1
        with:
          languages: ${{ matrix.language }}
          # If you wish to specify custom queries, you can do so here or in a config file.
          # By default, queries listed here will override any specified in a config file.
          # Prefix the list here with "+" to use these queries and those in the config file.
          # queries: ./path/to/local/query, your-org/your-repo/queries@main
      # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
      # If this step fails, then you should remove it and run the build manually (see below)
      - name: Autobuild
        uses: github/codeql-action/autobuild@v1
      # ℹ️ Command-line programs to run using the OS shell.
      # 📚 https://git.io/JvXDl
      # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
      # and modify them (or add more) to build your code if your project
      # uses a compiled language
      #- run: |
      #   make bootstrap
      #   make release
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v1
================================================
FILE: .github/workflows/publish.yml
================================================
name: publish
on:
  release:
    types: [created]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v1
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install setuptools wheel twine
      - name: Build and publish
        env:
          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
        run: |
          python setup.py sdist bdist_wheel
          twine upload dist/*
================================================
FILE: .github/workflows/pull_request.yml
================================================
name: pull_request
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.9, '3.10', '3.11', '3.12', '3.13']
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -U pip
          pip install -r requirements.txt
      - name: Install Ollama
        run: |
          curl -fsSL https://ollama.com/install.sh | sh
          # Start Ollama service in background
          ollama serve &
          # Wait for Ollama to be ready
          sleep 5
          # Pull required models for testing
          ollama pull phi3
          ollama pull meditron
          # Verify installation
          ollama list
      - name: Lint with flake8
        run: |
          pip install -U pytest coverage pytest-cov codecov black flake8
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
          black --check .
      - name: Test with pytest
        continue-on-error: true
        run: |
          pip install --editable ".[enrichment]"
          pip install pytest
          pytest
          make coverage
          codecov
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.11
        uses: actions/setup-python@v1
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install ".[enrichment]"
          pip install sphinx myst-parser sphinxcontrib-gtagjs ipython numpydoc sphinx-tabs furo nbsphinx sphinx-panels
      - name: Install Pandoc
        run: |
          sudo apt-get update
          sudo apt-get install -y pandoc
      - name: Build documentation
        run: |
          make docs
================================================
FILE: .github/workflows/push.yml
================================================
name: push
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.9, '3.10', '3.11', '3.12', '3.13']
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -U pip
          pip install -r requirements.txt
      - name: Lint with flake8
        run: |
          pip install -U pytest coverage pytest-cov codecov black flake8
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
          black --check .
      - name: Test with pytest
        continue-on-error: true
        run: |
          pip install --editable ".[enrichment]"
          pip install pytest
          pytest
          make coverage
          codecov
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.11
        uses: actions/setup-python@v1
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install ".[enrichment]"
          pip install sphinx myst-parser sphinxcontrib-gtagjs ipython numpydoc sphinx-tabs furo nbsphinx sphinx-panels
      - name: Install Pandoc
        run: |
          sudo apt-get update
          sudo apt-get install -y pandoc
      - name: Build documentation
        run: |
          make docs
      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: github.ref == 'refs/heads/develop'
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs/_build/html/
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# dotenv
.env
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
*.sqlite
*.sqlite.gz
geoweb_downloads/
================================================
FILE: AUTHORS.md
================================================
# Credits
## Contributors
- [Boshen Yan](https://github.com/bscrow)
- [Maarten van der Sande](https://github.com/Maarten-vd-Sande)
- [Dibya Gautam](https://github.com/dibyaaaaax)
- [Marius van den Beek](https://github.com/mvdbeek)
- [Devang Thakkar](https://github.com/DevangThakkar)
## Maintainer
- Saket Choudhary \<<saketkc@gmail.com>\>
================================================
FILE: CITATION.cff
================================================
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Choudhary"
    given-names: "Saket"
    orcid: "https://orcid.org/0000-0001-5202-7633"
title: "pysradb"
version: 2.4.1
doi: 10.12688/f1000research.18676.1
date-released: 2025-09-28
url: "https://github.com/saketkc/pysradb"
================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at saketkc@gmail.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
================================================
FILE: CONTRIBUTING.md
================================================
# Contributing
Contributions are welcome, and they are greatly appreciated! Every
little bit helps, and credit will always be given.
You can contribute in many ways:
## Types of Contributions
### Report Bugs
Report bugs at <https://github.com/saketkc/pysradb/issues>.
If you are reporting a bug, please include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in
troubleshooting.
- Detailed steps to reproduce the bug.
### Fix Bugs
Look through the GitHub issues for bugs. Anything tagged with "bug"
and "help wanted" is open to whoever wants to implement it.
### Implement Features
Look through the GitHub issues for features. Anything tagged with
"enhancement" and "help wanted" is open to whoever wants to
implement it.
### Write Documentation
pysradb could always use more documentation, whether as part of the
official pysradb docs, in docstrings, or even on the web in blog posts,
articles, and such.
### Submit Feedback
The best way to send feedback is to file an issue at
<https://github.com/saketkc/pysradb/issues>.
If you are proposing a feature:
- Explain in detail how it would work.
- Keep the scope as narrow as possible, to make it easier to
implement.
- Remember that this is a volunteer-driven project, and that
contributions are welcome :)
## Get Started!
Ready to contribute? Here's how to set up `pysradb` for
local development.
1. Fork the `pysradb` repo on GitHub.
2. Clone your fork locally:
``` shell
$ git clone git@github.com:your_name_here/pysradb.git
```
3. Install your local copy into a virtualenv. Assuming you have
virtualenvwrapper installed, this is how you set up your fork for
local development (if `python --version` reports a version below 3.0,
run `$ mkvirtualenv pysradb --python=py3` instead):
``` shell
$ mkvirtualenv pysradb
$ cd pysradb/
$ pip install --editable .
```
4. Create a branch for local development:
``` shell
$ git checkout -b name-of-your-bugfix-or-feature
```
Now you can make your changes locally.
5. When you're done making changes, check that your changes pass
flake8 and the tests, including testing other Python versions with
tox:
``` shell
$ flake8 pysradb tests
$ pytest
$ tox
```
To get flake8 and tox, just pip install them into your virtualenv.
6. Commit your changes and push your branch to GitHub:
``` shell
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
```
7. Submit a pull request through the GitHub website.
## Pull Request Guidelines
Before you submit a pull request, check that it meets these guidelines:
1. The pull request should include tests.
2. If the pull request adds functionality, the docs should be updated.
Put your new functionality into a function with a docstring, and add
the feature to the list in README.md.
3. The pull request should work for Python 3.9 through 3.13, the
versions tested in CI. Make sure that the tests pass for all
supported Python versions.
## Tips
To run a subset of tests:
``` shell
$ pytest tests/test_sraweb.py
```
## Deploying
A reminder for the maintainers on how to deploy. Make sure all your
changes are committed (including an entry in HISTORY.md). Then run:
``` shell
$ bumpversion patch # possible: major / minor / patch
$ git push
$ git push --tags
```
CI will then deploy to PyPI if tests pass.
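As a rough picture of what the `major / minor / patch` choice means, the sketch below increments one part of a `MAJOR.MINOR.PATCH` string and resets the lower parts. It is an illustration only, not bumpversion's implementation; `bump_version` is a hypothetical helper.

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Return `version` with `part` incremented and lower parts reset to 0."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump_version("2.5.1", "patch"))  # 2.5.2
print(bump_version("2.5.1", "minor"))  # 2.6.0
print(bump_version("2.5.1", "major"))  # 3.0.0
```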
================================================
FILE: HISTORY.md
================================================
# History
# 3.0.0 (Unreleased) - BREAKING CHANGES
## Removal of legacy SQLite support
**All local SQLite database support has been removed.** This is a major breaking change and was long overdue.
- **Removed**: `SRAdb`, `GEOdb`, and `BASEdb` classes
- **Removed**: `download_sradb_file()` and `download_geodb_file()` functions
- **Removed**: Files: `sradb.py`, `geodb.py`, `basedb.py`
- **Why**: Legacy local SQLite databases are outdated and rarely used. SRAweb (API-based) provides better, real-time data with no maintenance overhead.
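For codebases that still reference the removed names, a short scan can list the places that need porting to `SRAweb`. The sketch below is an illustration that uses only the standard library; `find_removed_names` and the sample snippet are hypothetical and not part of pysradb.

```python
import ast

# Names listed as removed in the 3.0.0 notes above.
REMOVED_NAMES = {
    "SRAdb",
    "GEOdb",
    "BASEdb",
    "download_sradb_file",
    "download_geodb_file",
}


def find_removed_names(source: str) -> list:
    """Return sorted (line number, name) pairs for removed names used in `source`."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id in REMOVED_NAMES:
            hits.add((node.lineno, node.id))
        elif isinstance(node, ast.Attribute) and node.attr in REMOVED_NAMES:
            hits.add((node.lineno, node.attr))
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name in REMOVED_NAMES:
                    hits.add((node.lineno, alias.name))
    return sorted(hits)


# Hypothetical legacy snippet, used only to exercise the scan.
legacy = "from pysradb import SRAdb\ndb = SRAdb('SRAmetadb.sqlite')\n"
print(find_removed_names(legacy))  # [(1, 'SRAdb'), (2, 'SRAdb')]
```

Parsing with `ast` rather than searching plain text avoids false hits in comments and string literals.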
# 2.5.1 (2025-10-29)
- Add prjna support in doi-to-identifiers [#249](https://github.com/saketkc/pysradb/pull/249)
# 2.5.0 (2025-10-19)
- Add pmid/doi-to-gse/srp conversion [#246](https://github.com/saketkc/pysradb/pull/246).
# 2.4.1 (2025-09-27)
- Add gse-to-pmid conversion [#241](https://github.com/saketkc/pysradb/pull/244).
# 2.4.0 (2025-09-27)
- Add sra-to-pmid conversion [#241](https://github.com/saketkc/pysradb/pull/241). Thanks [@andrewdavidsmith](https://github.com/andrewdavidsmith) for the idea.
# 2.3.0 (2025-08-24)
- Download logic improvements: removed requests-ftp as a requirement
- Fix for handling missing metadata keys [#223](https://github.com/saketkc/pysradb/pull/223). Thanks [@andrewdavidsmith](https://github.com/andrewdavidsmith)
# 2.2.2 (2024-10-03)
- Fix for handling ENA urls for paired end data
# 2.2.1 (2024-08-21)
- Fix for handling ENA urls
- Migrated to pyproject.toml
# 2.2.0 (2023-09-17)
- Add support for Biosamples and bioproject [#199](https://github.com/saketkc/pysradb/pull/198)
- Use retmode xml for Geo search [#200](https://github.com/saketkc/pysradb/pull/200)
- Documentation fixes
## 2.1.0 (2023-05-16)
- Fix for [gse-to-srp] returning unrequested GSEs [#186](https://github.com/saketkc/pysradb/issues/190)
- Fix for [download] using [public_urls]
- Fix for [gsm-to-srx] returning false positives [#165](https://github.com/saketkc/pysradb/issues/165)
- Fix for delimiter not being consistent when metadata is printed on
terminal [#147](https://github.com/saketkc/pysradb/issues/147)
- ENA search is currently broken because of an API change
## 2.0.2 (2023-04-09)
- Fix for [gse-to-srp] to handle cases where a project is
missing but SRXs are returned [#186](https://github.com/saketkc/pysradb/issues/186)
- Fix gse-to-gsm [#187](https://github.com/saketkc/pysradb/issues/187)
## 2.0.1 (2023-03-18)
- Fix for [pysradb download] - using [public_url]
- Fix for SRX -\> SRR and related conversions [#183](https://github.com/saketkc/pysradb/pull/183)
## 2.0.0 (2023-02-23)
- BREAKING change: Overhaul of how urls and associated metadata are
returned (not backward compatible); all column names are lower cased
by default
- Fix extra space in \"organism_taxid\" column
- Added support for Experiment attributes [#89](https://github.com/saketkc/pysradb/issues/89#issuecomment-1439319532)
## 1.4.2 (06-17-2022)
- Fix ENA fastq fetching [#163](https://github.com/saketkc/pysradb/issues/163)
## 1.4.1 (06-04-2022)
- Fix for fetching alternative URLs
## 1.4.0 (06-04-2022)
- Added ability to fetch alternative URLs (GCP/AWS) for metadata
[#161](https://github.com/saketkc/pysradb/issues/161)
- Fix for xmldict 0.13.0 no longer defaulting to OrderedDict [#159](https://github.com/saketkc/pysradb/pull/159)
- Fix for missing experiment model and description in metadata [#160](https://github.com/saketkc/pysradb/issues/160)
## 1.3.0 (02-18-2022)
- Add [study_title] to [\--detailed] flag
([#152](https://github.com/saketkc/pysradb/issues/152))
- Fix [KeyError] in [metadata] where some new
IDs do not have any metadata
([#151](https://github.com/saketkc/pysradb/issues/151))
## 1.2.0 (01-10-2022)
- Do not exit if a query returns no hits ([#149](https://github.com/saketkc/pysradb/pull/149))
## 1.1.0 (12-12-2021)
- Fixed [gsm-to-gse] failure
([#128](https://github.com/saketkc/pysradb/pull/128))
- Fixed case sensitivity bug for ENA search
([#144](https://github.com/saketkc/pysradb/pull/144))
- Fixed publication date bug for search
([#146](https://github.com/saketkc/pysradb/pull/146))
- Added support for downloading data from GEO [pysradb download -g
GSE]
([#129](https://github.com/saketkc/pysradb/pull/129))
## 1.0.1 (01-10-2021)
- Dropped Python 3.6 since pandas 1.2 is not supported
## 1.0.0 (01-09-2021)
- Retired `metadb` and `SRAdb` based search through CLI - everything
defaults to `SRAweb`
- `SRAweb` now supports
[search](https://saket-choudhary.me/pysradb/quickstart.html#search)
- [N/A] is now replaced with [pd.NA]
- Two new fields in \`\--detailed\`: [instrument_model]
and [instrument_model_desc]
[#75](https://github.com/saketkc/pysradb/issues/75)
- Updated documentation
## 0.11.1 (09-18-2020)
- [library_layout] is now outputted in metadata #56
- [-detailed] unifies columns for ENA fastq links instead
of appending \_x/\_y #59
- bugfix for parsing namespace in xml outputs #65
- XML errors from NCBI are now handled more gracefully #69
- Documentation and dependency updates
## 0.11.0 (09-04-2020)
- [pysradb download] now supports multiple threads for
parallel downloads
- [pysradb download] also supports ultra fast downloads of
FASTQs from ENA using aspera-client
## 0.10.3 (03-26-2020)
- Added test cases for SRAweb
- API limit exceeding errors are automagically handled
- Bug fixes for GSE \<=\> SRR
- Bug fix for metadata - supports multiple SRPs
Contributors
- Dibya Gautam
- Marius van den Beek
## 0.10.2 (02-05-2020)
- Bug fix: Handle API-rate limit exceeding =\> Retries
- Enhancement: \'Alternatives\' URLs are now part of
[\--detailed]
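The retry behaviour mentioned above can be pictured with a generic exponential-backoff wrapper. This is a minimal sketch, not pysradb's actual implementation; `call_with_retries`, its parameters, and the use of `RuntimeError` as a stand-in for a rate-limit error are assumptions made for illustration.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `func`, retrying on RuntimeError with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Usage with a fake request that fails twice, then succeeds.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate limit exceeded")
    return "ok"


delays = []
result = call_with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)  # ok [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test.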
## 0.10.1 (02-04-2020)
- Bug fix: Handle Python3.6 for capture_output in subprocess.run
## 0.10.0 (01-31-2020)
- All the subcommands (srx-to-srr, srx-to-srs) will now print
additional columns where the first two columns represent the
relevant conversion
- Fixed a bug when fetching entries with a single efetch record
## 0.9.9 (01-15-2020)
- Major fix: some SRRs would go missing as the experiment dict was
being created only once per SRR (See #15)
- Features: More detailed metadata by default in the SRAweb mode
- See notebook: <https://colab.research.google.com/drive/1C60V->
## 0.9.7 (01-20-2020)
- Feature: instrument, run size and total spots are now printed in the
metadata by default (SRAweb mode only)
- Issue: Fixed an issue with srapath failing on SRP. srapath is now
run on individual SRRs.
## 0.9.6 (07-20-2019)
- Introduced [SRAweb] to perform queries over the web if
the SQLite is missing or does not contain the relevant record.
## 0.9.0 (02-27-2019)
### Others
- This release completely changes the command line interface replacing
click with argparse ([#3](https://github.com/saketkc/pysradb/pull/3))
- Removed Python 2 compatible stale code
## 0.8.0 (02-26-2019)
### New methods/functionality
- \`srr-to-gsm\`: convert SRR to GSM
- SRAmetadb.sqlite.gz file is deleted by default after extraction
- When SRAmetadb is not found, confirmation is sought before
downloading
- Confirmation option before SRA downloads
### Bugfix
- download() works with wget
### Others
- [\--out_dir] is now [out-dir]
## 0.7.1 (02-18-2019)
Important: Python2 is no longer supported. Please consider moving to
Python3.
### Bugfix
- Included docs in the index which were missed out in the previous
release
## 0.7.0 (02-08-2019)
### New methods/functionality
- \`gsm-to-srr\`: convert GSM to SRR
- \`gsm-to-srx\`: convert GSM to SRX
- \`gsm-to-gse\`: convert GSM to GSE
### Renamed methods
The following command line options have been renamed and the changes are
not compatible with 0.6.0 release:
- [sra-metadata] -\> [metadata].
- [sra-search] -\> [search].
- [srametadb] -\> [metadb].
## 0.6.0 (12-25-2018)
### Bugfix
- Fixed bugs introduced in 0.5.0 with API changes where multiple
redundant columns were output in [sra-metadata]
### New methods/functionality
- [download] now allows piped inputs
## 0.5.0 (12-24-2018)
### New methods/functionality
- Support for filtering by SRX Id for SRA downloads.
- \`srr_to_srx\`: Convert SRR to SRX/SRP
- \`srp_to_srx\`: Convert SRP to SRX
- Stripped down [sra-metadata] to give minimal information
- Added [\--assay], [\--desc],
[\--detailed] flag for [sra-metadata]
- Improved table printing on terminal
## 0.4.2 (12-16-2018)
### Bugfix
- Fixed unicode error in tests for Python2
## 0.4.0 (12-12-2018)
### New methods/functionality
- Added a new [BASEdb] class to handle common database
connections
- Initial support for GEOmetadb through GEOdb class
- Initial support for a command line interface:
- download Download SRA project (SRPnnnn)
- gse-metadata Fetch metadata for GEO ID (GSEnnnn)
- gse-to-gsm Get GSM(s) for GSE
- gsm-metadata Fetch metadata for GSM ID (GSMnnnn)
- sra-metadata Fetch metadata for SRA project (SRPnnnn)
- Added three separate notebooks for SRAdb, GEOdb, CLI usage
## 0.3.0 (12-05-2018)
### New methods/functionality
- [sample_attribute] and
[experiment_attribute] are now included by default in
the df returned by [sra_metadata()]
- [expand_sample_attribute_columns]: expand metadata dataframe based on
attributes in [sample_attribute] column
- New methods to guess cell/tissue/strain:
[guess_cell_type()]/[guess_tissue_type()]/[guess_strain_type()]
- Improved README and usage instructions
## 0.2.2 (12-03-2018)
### New methods/functionality
- [search_sra()] allows full text search on SRA metadata.
## 0.2.0 (12-03-2018)
### Renamed methods
The following methods have been renamed and the changes are not
compatible with 0.1.0 release:
- [get_query()] -\> [query()].
- [sra_convert()] -\> [sra_metadata()].
- [get_table_counts()] -\> [all_row_counts()].
### New methods/functionality
- [download_sradb_file()] makes fetching [SRAmetadb.sqlite] file easy; wget is no longer required.
- [ftp] protocol is now supported besides [fsp] and hence [aspera-client] is now optional. However, we strongly recommend [aspera-client] for faster downloads.
### Bug fixes
- Silenced [SettingWithCopyWarning] by explicitly doing
operations on a copy of the dataframe instead of the original.
Besides these, all methods now follow a [numpydoc]
compatible documentation.
## 0.1.0 (12-01-2018)
- First release on PyPI.
================================================
FILE: LICENSE
================================================
BSD 3-Clause License
Copyright (c) 2020-2023, Saket Choudhary
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
================================================
FILE: MANIFEST.in
================================================
include AUTHORS.md
include CONTRIBUTING.md
include HISTORY.md
include LICENSE
include README.md
include requirements.txt
recursive-include tests *
recursive-exclude * __pycache__
recursive-exclude * *.py[co]
recursive-exclude * *.sqlite
recursive-exclude * *.sqlite.gz
recursive-include docs *.md conf.py Makefile make.bat *.jpg *.png *.gif *.rst
================================================
FILE: Makefile
================================================
.PHONY: clean clean-test clean-pyc clean-build docs help
.DEFAULT_GOAL := help
define BROWSER_PYSCRIPT
import os, webbrowser, sys
try:
from urllib import pathname2url
except:
from urllib.request import pathname2url
webbrowser.open("file://" + pathname2url(os.path.abspath(sys.argv[1])))
endef
export BROWSER_PYSCRIPT
define PRINT_HELP_PYSCRIPT
import re, sys
for line in sys.stdin:
match = re.match(r'^([a-zA-Z_-]+):.*?## (.*)$$', line)
if match:
target, help = match.groups()
print("%-20s %s" % (target, help))
endef
export PRINT_HELP_PYSCRIPT
BROWSER := python -c "$$BROWSER_PYSCRIPT"
help:
@python -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST)
clean: clean-build clean-pyc clean-test ## remove all build, test, coverage and Python artifacts
clean-build: ## remove build artifacts
rm -fr build/
rm -fr dist/
rm -fr .eggs/
find . -name '*.egg-info' -exec rm -fr {} +
find . -name '*.egg' -exec rm -f {} +
clean-pyc: ## remove Python file artifacts
find . -name '*.pyc' -exec rm -f {} +
find . -name '*.pyo' -exec rm -f {} +
find . -name '*~' -exec rm -f {} +
find . -name '__pycache__' -exec rm -fr {} +
clean-test: ## remove test and coverage artifacts
rm -fr .tox/
rm -f .coverage
rm -fr htmlcov/
rm -fr .pytest_cache
lint: ## check style with flake8
flake8 pysradb tests
test: ## run tests quickly with the default Python
pytest -s -v tests
test-all: ## run tests on every Python version with tox
tox
coverage: ## check code coverage quickly with the default Python
coverage run --source pysradb -m pytest
coverage report -m
coverage html
docs: ## generate Sphinx HTML documentation, including API docs
rm -f docs/pysradb.rst
rm -f docs/modules.rst
sphinx-apidoc -o docs/ pysradb
$(MAKE) -C docs clean
$(MAKE) -C docs html
servedocs: docs ## compile the docs watching for changes
#watchmedo shell-command -p '*.rst' -c '$(MAKE) -C docs html' -R -D .
watchmedo shell-command -p '*.md|*.rst' -c '$(MAKE) -C docs html' -R -D .
release: dist ## package and upload a release
python -m build
twine upload dist/*
dist: clean ## builds source and wheel package
python -m build
ls -l dist
install: clean ## install the package to the active Python's site-packages
pip install -e .
================================================
FILE: README.md
================================================
# A Python package for retrieving metadata from SRA/ENA/GEO
[PyPI](https://pypi.python.org/pypi/pysradb)
[Bioconda version](https://anaconda.org/bioconda/pysradb/badges/version.svg)
[Install with Bioconda](http://bioconda.github.io/recipes/pysradb/README.html)
[Downloads](https://pepy.tech/project/pysradb)
[Zenodo DOI](https://zenodo.org/badge/latestdoi/159590788)
[GitHub Actions](https://github.com/saketkc/pysradb/actions)
## Documentation
<https://saketkc.github.io/pysradb>
## CLI Usage
`pysradb` supports command line usage. See
[CLI](https://saket-choudhary.me/pysradb/cmdline.html) instructions or
[quickstart
guide](https://www.saket-choudhary.me/pysradb/quickstart.html).
$ pysradb
usage: pysradb [-h] [--version] [--citation]
{metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
...
pysradb: Query NGS metadata and data from NCBI Sequence Read Archive.
version: 3.0.0
Citation: 10.12688/f1000research.18676.1
options:
-h, --help show this help message and exit
--version show program's version number and exit
--citation how to cite
subcommands:
{metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
metadata Fetch metadata for SRA project (SRPnnnn)
download Download SRA project (SRPnnnn)
search Search SRA/ENA for matching text
gse-to-gsm Get GSM for a GSE
gse-to-srp Get SRP for a GSE
gsm-to-gse Get GSE for a GSM
gsm-to-srp Get SRP for a GSM
gsm-to-srr Get SRR for a GSM
gsm-to-srs Get SRS for a GSM
gsm-to-srx Get SRX for a GSM
srp-to-gse Get GSE for a SRP
srp-to-srr Get SRR for a SRP
srp-to-srs Get SRS for a SRP
srp-to-srx Get SRX for a SRP
srr-to-gsm Get GSM for a SRR
srr-to-srp Get SRP for a SRR
srr-to-srs Get SRS for a SRR
srr-to-srx Get SRX for a SRR
srs-to-gsm Get GSM for a SRS
srs-to-srx Get SRX for a SRS
srx-to-srp Get SRP for a SRX
srx-to-srr Get SRR for a SRX
srx-to-srs Get SRS for a SRX
geo-matrix Download and parse GEO Matrix files
srp-to-pmid Get PMIDs for SRP accessions
gse-to-pmid Get PMIDs for GSE accessions
pmid-to-gse Get GSE accessions from PMIDs
pmid-to-srp Get SRP accessions from PMIDs
pmc-to-identifiers Extract database identifiers from PMC articles
pmid-to-identifiers
Extract database identifiers from PubMed articles
doi-to-gse Get GSE accessions from DOIs
doi-to-srp Get SRP accessions from DOIs
doi-to-identifiers Extract database identifiers from articles via DOI
## Quickstart
A Google Colaboratory version of the most used commands is available in
this [Colab
Notebook](https://colab.research.google.com/drive/1C60V-jkcNZiaCra_V5iEyFs318jgVoUR).
Note that this requires only an active internet connection (no
additional downloads are made).
The following notebooks document all the possible features of
`pysradb`:
1. [Python
API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/01.Python-API_demo.ipynb)
2. [Downloading datasets from SRA - command
line](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/02.Commandline_download.ipynb)
3. [Download multiple datasets in parallel - Python
API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/03.ParallelDownload.ipynb)
4. [Converting SRA-to-fastq - command line (requires
conda)](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/04.SRA_to_fastq_conda.ipynb)
5. [Downloading subsets of a project - Python
API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/05.Downloading_subsets_of_a_project.ipynb)
6. [Metadata for multiple
SRPs](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/06.Multiple_SRPs.ipynb)
7. [Searching
SRA/GEO/ENA](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/07.Query_Search.ipynb)
8. [Extracting identifiers from PMC/DOI](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/08.PMC_DOI_Identifiers.ipynb)
9. [Metadata Enrichment with LLMs](https://colab.research.google.com/github/saketkc/pysradb/blob/develop/notebooks/09.Metadata_enrichment.ipynb)
## Installation
To install the stable version using `pip`:
```bash
pip install pysradb
```
Alternatively, if you use conda:
```bash
conda install -c bioconda pysradb
```
This step will install all the dependencies. If you have an existing
environment with a lot of pre-installed packages, conda might be
[slow](https://github.com/bioconda/bioconda-recipes/issues/13774).
Please consider creating a new environment for `pysradb`:
```bash
conda create -c bioconda -n pysradb python=3.13 pysradb
```
### Dependencies
pandas
requests
tqdm
xmltodict
### Installing pysradb in development mode
git clone https://github.com/saketkc/pysradb.git
cd pysradb && pip install -r requirements.txt
pip install -e .
## Using pysradb
### Obtaining SRA metadata
$ pysradb metadata SRP000941 | head
study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_strategy library_source library_selection sample_accession sample_title instrument total_spots total_size run_accession run_total_spots run_total_bases
SRP000941 SRX056722 Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC ChIP SRS184466 Illumina HiSeq 2000 26900401 531654480 SRR179707 26900401 807012030
SRP000941 SRX027889 Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells 9606 Homo sapiens ChIP-Seq GENOMIC ChIP SRS116481 Illumina Genome Analyzer II 37528590 779578968 SRR067978 37528590 1351029240
SRP000941 SRX027888 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC RANDOM SRS116483 Illumina Genome Analyzer II 13603127 3232309537 SRR067977 13603127 489712572
SRP000941 SRX027887 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC RANDOM SRS116562 Illumina Genome Analyzer II 22430523 506327844 SRR067976 22430523 807498828
SRP000941 SRX027886 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC RANDOM SRS116560 Illumina Genome Analyzer II 15342951 301720436 SRR067975 15342951 552346236
SRP000941 SRX027885 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC RANDOM SRS116482 Illumina Genome Analyzer II 39725232 851429082 SRR067974 39725232 1430108352
SRP000941 SRX027884 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC RANDOM SRS116481 Illumina Genome Analyzer II 32633277 544478483 SRR067973 32633277 1174797972
SRP000941 SRX027883 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC RANDOM SRS004118 Illumina Genome Analyzer II 22150965 3262293717 SRR067972 9357767 336879612
SRP000941 SRX027883 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC RANDOM SRS004118 Illumina Genome Analyzer II 22150965 3262293717 SRR067971 12793198 460555128
### Obtaining detailed SRA metadata
$ pysradb metadata SRP075720 --detailed | head
study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_strategy library_source library_selection sample_accession sample_title instrument total_spots total_size run_accession run_total_spots run_total_bases
SRP075720 SRX1800476 GSM2177569: Kcng4_2la_H9; Mus musculus; RNA-Seq GSM2177569: Kcng4_2la_H9; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS1467643 Illumina HiSeq 2500 2547148 97658407 SRR3587912 2547148 127357400
SRP075720 SRX1800475 GSM2177568: Kcng4_2la_H8; Mus musculus; RNA-Seq GSM2177568: Kcng4_2la_H8; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS1467642 Illumina HiSeq 2500 2676053 101904264 SRR3587911 2676053 133802650
SRP075720 SRX1800474 GSM2177567: Kcng4_2la_H7; Mus musculus; RNA-Seq GSM2177567: Kcng4_2la_H7; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS1467641 Illumina HiSeq 2500 1603567 61729014 SRR3587910 1603567 80178350
SRP075720 SRX1800473 GSM2177566: Kcng4_2la_H6; Mus musculus; RNA-Seq GSM2177566: Kcng4_2la_H6; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS1467640 Illumina HiSeq 2500 2498920 94977329 SRR3587909 2498920 124946000
SRP075720 SRX1800472 GSM2177565: Kcng4_2la_H5; Mus musculus; RNA-Seq GSM2177565: Kcng4_2la_H5; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS1467639 Illumina HiSeq 2500 2226670 83473957 SRR3587908 2226670 111333500
SRP075720 SRX1800471 GSM2177564: Kcng4_2la_H4; Mus musculus; RNA-Seq GSM2177564: Kcng4_2la_H4; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS1467638 Illumina HiSeq 2500 2269546 87486278 SRR3587907 2269546 113477300
SRP075720 SRX1800470 GSM2177563: Kcng4_2la_H3; Mus musculus; RNA-Seq GSM2177563: Kcng4_2la_H3; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS1467636 Illumina HiSeq 2500 2333284 88669838 SRR3587906 2333284 116664200
SRP075720 SRX1800469 GSM2177562: Kcng4_2la_H2; Mus musculus; RNA-Seq GSM2177562: Kcng4_2la_H2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS1467637 Illumina HiSeq 2500 2071159 79689296 SRR3587905 2071159 103557950
SRP075720 SRX1800468 GSM2177561: Kcng4_2la_H1; Mus musculus; RNA-Seq GSM2177561: Kcng4_2la_H1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS1467635 Illumina HiSeq 2500 2321657 89307894 SRR3587904 2321657 116082850
### Enriching metadata via CLI
Enrich metadata with standardized biological attributes using biomedical-specialized LLMs through the command line:
```bash
# Basic enrichment with default backend (Meditron)
$ pysradb metadata GSE286254 --detailed --enrich
# Using OpenBioLLM-8B (larger, trained on 500k+ biomedical entries)
$ pysradb metadata GSE286254 --detailed --enrich --enrich-backend ollama/openbiollm-8b
```
Available biomedical backends:
- `ollama/meditron` (default, 7B - optimized for medical text)
- `ollama/openbiollm-8b` (8B - trained on 500k+ biomedical entries, superior biomedical performance)
This returns the original metadata plus 9 enriched columns:
- `guessed_organ`
- `guessed_tissue`
- `guessed_anatomical_system`
- `guessed_cell_type`
- `guessed_disease`
- `guessed_sex`
- `guessed_development_stage`
- `guessed_assay`
- `guessed_organism`
For more details on enrichment features, prerequisites, and Python API usage, see the [Enriching metadata](#enriching-metadata) section below.
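The `metadata` invocations above can also be assembled from a script. The sketch below only uses flags shown in this README (`--detailed`, `--enrich`, `--enrich-backend`); the helper names `build_metadata_command` and `run_metadata` are hypothetical and not part of `pysradb`.

```python
import shutil
import subprocess


def build_metadata_command(accession, detailed=False, enrich=False, backend=None):
    """Assemble the argv list for `pysradb metadata` (hypothetical helper)."""
    cmd = ["pysradb", "metadata", accession]
    if detailed:
        cmd.append("--detailed")
    if enrich:
        cmd.append("--enrich")
        if backend is not None:
            # A backend is only passed along together with --enrich.
            cmd.extend(["--enrich-backend", backend])
    return cmd


def run_metadata(accession, **options):
    """Run the command and return its stdout.

    Assumes the `pysradb` executable is on PATH and that the network
    (and, for enrichment, a local Ollama install) is available.
    """
    if shutil.which("pysradb") is None:
        raise RuntimeError("pysradb executable not found on PATH")
    result = subprocess.run(
        build_metadata_command(accession, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_metadata("GSE286254", detailed=True, enrich=True)` would execute the first enrichment command shown above. Keeping the argument assembly separate from the execution makes the former easy to check without network access.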
### Converting SRP to GSE
$ pysradb srp-to-gse SRP075720
study_accession study_alias
SRP075720 GSE81903
### Converting GSM to SRP
$ pysradb gsm-to-srp GSM2177186
experiment_alias study_accession
GSM2177186 SRP075720
### Converting GSM to GSE
$ pysradb gsm-to-gse GSM2177186
experiment_alias study_alias
GSM2177186 GSE81903
### Converting GSM to SRX
$ pysradb gsm-to-srx GSM2177186
experiment_alias experiment_accession
GSM2177186 SRX1800089
### Converting GSM to SRR
$ pysradb gsm-to-srr GSM2177186
experiment_alias run_accession
GSM2177186 SRR3587529
### Converting SRP to PMID
$ pysradb srp-to-pmid SRP045778
srp_accession bioproject pmid
SRP045778 PRJNA257197 27373336
### Converting GSE to PMID
$ pysradb gse-to-pmid GSE253406
gse_accession pmid
GSE253406 39528918
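The conversion subcommands above follow a `<source>-to-<target>` naming pattern. As a rough illustration, the sketch below guesses the source type from an accession prefix and checks the resulting name against the subcommands listed in the usage text. The helpers `accession_type` and `conversion_subcommand` are hypothetical, the prefix patterns are simplified assumptions, and ENA-style accessions (for example `ERP...`) are not handled.

```python
import re

# Simplified accession patterns (assumption: a prefix followed by digits).
PREFIX_PATTERNS = {
    "gse": re.compile(r"^GSE\d+$"),
    "gsm": re.compile(r"^GSM\d+$"),
    "srp": re.compile(r"^SRP\d+$"),
    "srx": re.compile(r"^SRX\d+$"),
    "srr": re.compile(r"^SRR\d+$"),
    "srs": re.compile(r"^SRS\d+$"),
}

# Conversion subcommands copied from the `pysradb` usage text.
KNOWN_SUBCOMMANDS = {
    "gse-to-gsm", "gse-to-srp", "gsm-to-gse", "gsm-to-srp", "gsm-to-srr",
    "gsm-to-srs", "gsm-to-srx", "srp-to-gse", "srp-to-srr", "srp-to-srs",
    "srp-to-srx", "srr-to-gsm", "srr-to-srp", "srr-to-srs", "srr-to-srx",
    "srs-to-gsm", "srs-to-srx", "srx-to-srp", "srx-to-srr", "srx-to-srs",
    "srp-to-pmid", "gse-to-pmid",
}


def accession_type(accession):
    """Return the lowercase type ('srp', 'gsm', ...) of an accession."""
    for name, pattern in PREFIX_PATTERNS.items():
        if pattern.match(accession):
            return name
    raise ValueError(f"Unrecognized accession: {accession!r}")


def conversion_subcommand(accession, target):
    """Build '<source>-to-<target>' and verify it appears in the usage text."""
    subcommand = f"{accession_type(accession)}-to-{target.lower()}"
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"No such subcommand in the usage text: {subcommand}")
    return subcommand


print(conversion_subcommand("SRP075720", "gse"))
print(conversion_subcommand("GSM2177186", "srr"))
```

Validating against the known list avoids constructing names such as `srs-to-srp`, which the usage text does not include.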
### Extracting identifiers from PMC/DOI
Extract database identifiers (GSE, PRJNA, SRP, etc.) from PubMed Central articles or DOIs. This feature automatically converts between GSE and SRP identifiers even when papers only mention one type!
#### Get all identifiers from a PMID
$ pysradb pmid-to-identifiers 39528918
pmid pmc_id gse_ids prjna_ids srp_ids
39528918 PMC10802650 GSE253406 PRJNA1058002 SRP484103
#### Get only GSE or SRP from PMID
$ pysradb pmid-to-gse 39528918
pmid pmc_id gse_ids
39528918 PMC10802650 GSE253406
$ pysradb pmid-to-srp 39528918
pmid pmc_id srp_ids
39528918 PMC10802650 SRP484103
#### Extract from DOI
$ pysradb doi-to-identifiers 10.12688/f1000research.18676.1
doi pmid pmc_id gse_ids srp_ids
10.12688/f1000research.18676.1 30873266 PMC6411813 GSE... SRP...
#### Extract from PMC ID
$ pysradb pmc-to-identifiers PMC10802650
pmc_id gse_ids prjna_ids srp_ids
PMC10802650 GSE253406 PRJNA1058002 SRP484103
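To give a feel for what identifier extraction involves, here is a minimal regular-expression sketch that pulls GSE, PRJNA, and SRP accessions out of free text. It is an illustration only, not the implementation used by `pysradb`: the patterns are simplified assumptions, the sample sentence is made up, and the GSE/SRP cross-conversion described above is not reproduced.

```python
import re

# Simplified patterns (assumption: a known prefix followed by digits).
ID_PATTERNS = {
    "gse_ids": re.compile(r"\bGSE\d+\b"),
    "prjna_ids": re.compile(r"\bPRJNA\d+\b"),
    "srp_ids": re.compile(r"\bSRP\d+\b"),
}


def extract_identifiers(text):
    """Return sorted, de-duplicated identifiers found in free text."""
    return {
        key: sorted(set(pattern.findall(text)))
        for key, pattern in ID_PATTERNS.items()
    }


# Made-up sentence reusing accessions from the examples above.
sample = (
    "Data are available from GEO under GSE253406 (BioProject PRJNA1058002). "
    "Processed counts are also listed under GSE253406."
)
found = extract_identifiers(sample)
print(found)
```

In this sample the text never mentions an SRP accession, so `srp_ids` comes back empty; that is the gap the automatic GSE-to-SRP conversion is meant to fill.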
### Enriching metadata
Extract standardized biological metadata from SRA/GEO datasets using LLMs.
#### Quickstart
```python
from pysradb import SRAweb
client = SRAweb()
df = client.metadata("GSE286254", detailed=True, enrich=True)
# Returns original + 9 enriched columns (might not always be complete):
# guessed_organ, guessed_tissue, guessed_anatomical_system,
# guessed_cell_type, guessed_disease, guessed_sex,
# guessed_development_stage, guessed_assay, guessed_organism
```
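Because the enriched columns might not always be complete, it can help to check how many rows received a value for each of them. The sketch below works on plain dictionaries, such as those produced by `df.to_dict("records")`; the function name `completeness` and the sample rows are made up, and how missing values are represented in the real output is an assumption (here `None`, an empty string, or NaN).

```python
GUESSED_COLUMNS = [
    "guessed_organ", "guessed_tissue", "guessed_anatomical_system",
    "guessed_cell_type", "guessed_disease", "guessed_sex",
    "guessed_development_stage", "guessed_assay", "guessed_organism",
]


def is_missing(value):
    """Treat None, empty strings, and NaN as missing."""
    if value is None or value == "":
        return True
    # NaN is the only float that is not equal to itself.
    return isinstance(value, float) and value != value


def completeness(rows, columns=GUESSED_COLUMNS):
    """Fraction of rows with a non-missing value for each column."""
    total = len(rows)
    report = {}
    for column in columns:
        filled = sum(1 for row in rows if not is_missing(row.get(column)))
        report[column] = filled / total if total else 0.0
    return report


# Made-up rows standing in for `df.to_dict("records")`.
rows = [
    {"guessed_organ": "eye", "guessed_tissue": "retina", "guessed_sex": None},
    {"guessed_organ": "eye", "guessed_tissue": "", "guessed_sex": float("nan")},
]
report = completeness(rows)
for column, fraction in report.items():
    print(f"{column}: {fraction:.0%}")
```

Columns with a low fraction are candidates for manual review or for a second pass with a different backend.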
#### Prerequisites
Install Ollama: https://ollama.ai
```bash
# Default backend (recommended)
ollama pull meditron
# Or use OpenBioLLM-8B for better biomedical performance
ollama pull openbiollm-8b
```
#### Advanced Usage
```python
# Use OpenBioLLM-8B backend (trained on 500k+ biomedical entries)
from pysradb import SRAweb
client = SRAweb()
df = client.metadata("GSE286254", detailed=True, enrich=True,
                     enrich_backend="ollama/openbiollm-8b")
# Manual enrichment with custom settings
from pysradb.metadata_enrichment import create_metadata_extractor, load_ontology_reference
# LLM-based extraction with default backend (meditron)
extractor_llm = create_metadata_extractor(method="llm")
df_enriched = extractor_llm.enrich_dataframe(df, prefix="guessed_")
# LLM-based extraction with specific biomedical backend
extractor_bio = create_metadata_extractor(method="llm", backend="ollama/openbiollm-8b")
df_enriched = extractor_bio.enrich_dataframe(df, prefix="guessed_")
# Embedding-based extraction (faster, offline)
ontology_ref = load_ontology_reference()
extractor_emb = create_metadata_extractor(
method="embedding",
model="FremyCompany/BioLORD-2023",
reference_categories=ontology_ref
)
df_enriched = extractor_emb.enrich_dataframe(df, prefix="guessed_")
```
See [Notebook 09](notebooks/09.Metadata_enrichment.ipynb) for detailed examples.
### Downloading supplementary files from GEO
$ pysradb download -g GSE161707
### Downloading an entire SRA/ENA project (multithreaded)
`pysradb` makes it super easy to download datasets from SRA in parallel:
Using 8 threads to download:
$ pysradb download -y -t 8 --out-dir ./pysradb_downloads -p SRP063852
Downloads are organized by `SRP/SRX/SRR` mimicking the hierarchy of SRA
projects.
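The `SRP/SRX/SRR` hierarchy can be pictured as a nested mapping from study to experiment to runs. The sketch below builds that mapping and the corresponding relative paths from a few accessions taken from the `SRP000941` metadata output earlier in this README. The function names are hypothetical, and the exact file names and extensions that `pysradb download` writes are not covered here.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# (study, experiment, run) triples from the SRP000941 metadata output.
records = [
    ("SRP000941", "SRX056722", "SRR179707"),
    ("SRP000941", "SRX027883", "SRR067972"),
    ("SRP000941", "SRX027883", "SRR067971"),
]


def group_runs(records):
    """Nest run accessions as {SRP: {SRX: [SRR, ...]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for srp, srx, srr in records:
        tree[srp][srx].append(srr)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {srp: dict(experiments) for srp, experiments in tree.items()}


def relative_paths(records, out_dir="pysradb_downloads"):
    """One SRP/SRX/SRR path per run, relative to the output directory."""
    return [PurePosixPath(out_dir, srp, srx, srr) for srp, srx, srr in records]


print(group_runs(records))
for path in relative_paths(records):
    print(path)
```

Note that one experiment (`SRX027883`) owns two runs, which is why the run level sits below the experiment level.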
## Publication
> [pysradb: A Python package to query next-generation sequencing
> metadata and data from NCBI Sequence Read
> Archive](https://f1000research.com/articles/8-532/v1)
>
> Presentation slides from BOSC (ISMB-ECCB) 2019:
> <https://f1000research.com/slides/8-1183>
## Citation
Choudhary, Saket. "pysradb: A Python Package to Query next-Generation
Sequencing Metadata and Data from NCBI Sequence Read Archive."
F1000Research, vol. 8, F1000 (Faculty of 1000 Ltd), Apr. 2019, p. 532
(<https://f1000research.com/articles/8-532/v1>)
@article{Choudhary2019,
doi = {10.12688/f1000research.18676.1},
url = {https://doi.org/10.12688/f1000research.18676.1},
year = {2019},
month = apr,
publisher = {F1000 (Faculty of 1000 Ltd)},
volume = {8},
pages = {532},
author = {Saket Choudhary},
title = {pysradb: A {P}ython package to query next-generation sequencing metadata and data from {NCBI} {S}equence {R}ead {A}rchive},
journal = {F1000Research}
}
Zenodo archive: <https://zenodo.org/badge/latestdoi/159590788>
Zenodo DOI: 10.5281/zenodo.2306881
## Questions?
Open an [issue](https://github.com/saketkc/pysradb/issues).
================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = python -msphinx
SPHINXPROJ = pysradb
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
================================================
FILE: docs/_static/copy-button.js
================================================
// Add copy button to code blocks
document.addEventListener('DOMContentLoaded', function() {
// SVG icon for clipboard
const clipboardIcon = `<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M16 4h2a2 2 0 0 1 2 2v14a2 2 0 0 1-2 2H6a2 2 0 0 1-2-2V6a2 2 0 0 1 2-2h2"></path><rect x="8" y="2" width="8" height="4" rx="1" ry="1"></rect></svg>`;
// SVG icon for checkmark
const checkmarkIcon = `<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="3" stroke-linecap="round" stroke-linejoin="round"><polyline points="20 6 9 17 4 12"></polyline></svg>`;
// Find all code input blocks (from notebooks and regular code blocks)
// Strategy:
// 1. For notebooks: explicitly target pre tags inside input_area
// 2. For regular docs: target all highlight divs, then filter with skip conditions
let codeBlocks = document.querySelectorAll('div.input_area > div.highlight > pre');
// Also get regular documentation code blocks
const docBlocks = document.querySelectorAll('div.highlight > pre');
// Combine and deduplicate
codeBlocks = Array.from(codeBlocks).concat(
Array.from(docBlocks).filter(block => {
// Skip if already in notebook input_area
if (block.closest('div.input_area')) return false;
// Skip if in prompt or output
if (block.closest('.prompt')) return false;
if (block.closest('.nboutput')) return false;
if (block.closest('.output_area')) return false;
return true;
})
);
codeBlocks.forEach(function(codeBlock) {
// Don't add button if already present
if (codeBlock.querySelector('.copy-button') || codeBlock.parentElement.querySelector('.copy-button')) {
return;
}
// Create copy button
const button = document.createElement('button');
button.className = 'copy-button';
button.innerHTML = clipboardIcon;
button.title = 'Copy code to clipboard';
// Style the button
button.style.cssText = `
position: absolute;
top: 0.5rem;
right: 0.5rem;
padding: 0.4rem;
background-color: rgba(0, 0, 0, 0.3);
color: white;
border: 1px solid rgba(255, 255, 255, 0.3);
border-radius: 0.25rem;
cursor: pointer;
display: flex;
align-items: center;
justify-content: center;
z-index: 1;
transition: all 0.2s ease;
width: 28px;
height: 28px;
padding: 0;
`;
// Add hover effect
button.onmouseover = function() {
this.style.backgroundColor = 'rgba(0, 0, 0, 0.5)';
};
button.onmouseout = function() {
this.style.backgroundColor = 'rgba(0, 0, 0, 0.3)';
};
// Make pre block relative positioned
codeBlock.style.position = 'relative';
// Add click event
button.addEventListener('click', function() {
const code = codeBlock.querySelector('code');
const text = code ? code.textContent : codeBlock.textContent;
// Copy to clipboard
navigator.clipboard.writeText(text).then(function() {
// Change button icon and color temporarily
const originalHTML = button.innerHTML;
button.innerHTML = checkmarkIcon;
button.style.backgroundColor = 'rgba(34, 197, 94, 0.7)';
setTimeout(function() {
button.innerHTML = originalHTML;
button.style.backgroundColor = 'rgba(0, 0, 0, 0.3)';
}, 2000);
}).catch(function(err) {
console.error('Failed to copy:', err);
});
});
// Append button to code block
codeBlock.appendChild(button);
});
});
================================================
FILE: docs/_static/custom.css
================================================
/* Override Pygments code block background color for light mode */
.highlight {
background: #f5f5f5 !important;
}
/* Ensure code block background uses our color */
.highlight pre {
background: #f5f5f5 !important;
}
/* Override background of line-numbered code tables */
.highlighttable {
background: #f5f5f5 !important;
}
.highlighttable td.linenos {
background: #f5f5f5 !important;
}
/* Dark mode overrides */
[data-theme="dark"] .highlight {
background: #1e293b !important;
}
[data-theme="dark"] .highlight pre {
background: #1e293b !important;
}
[data-theme="dark"] .highlighttable {
background: #1e293b !important;
}
[data-theme="dark"] .highlighttable td.linenos {
background: #1e293b !important;
}
================================================
FILE: docs/authors.md
================================================
# Credits
## Contributors
- [Boshen Yan](https://github.com/bscrow)
- [Maarten van der Sande](https://github.com/Maarten-vd-Sande)
- [Dibya Gautam](https://github.com/dibyaaaaax)
- [Marius van den Beek](https://github.com/mvdbeek)
- [Devang Thakkar](https://github.com/DevangThakkar)
## Maintainer
- Saket Choudhary \<<saketkc@gmail.com>\>
================================================
FILE: docs/case_studies.md
================================================
# Case Studies
## Case Study 1
Consider a scenario where someone is interested in searching for
single-cell RNA-seq datasets. In particular, the interest is in studying
retina:
$ pysradb search --query "single-cell rna-seq retina"
study_accession experiment_accession experiment_title sample_taxon_id sample_scientific_name experiment_library_strategy experiment_library_source experiment_library_selection sample_accession sample_alias experiment_instrument_model pool_member_spots run_1_size run_1_accession run_1_total_spots run_1_total_bases
SRP299803 SRX9756769 GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq 10090 Mus musculus ATAC-seq GENOMIC other SRS7946094 GSM4995565 Illumina NovaSeq 6000 55435867 2637580797 SRR13329759 55435867 6874047508
SRP299803 SRX9756768 GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7946093 GSM4995564 Illumina NovaSeq 6000 96123725 4107807391 SRR13329758 96123725 12688331700
SRP299803 SRX9756767 GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7946092 GSM4995563 Illumina NovaSeq 6000 94345783 4056010488 SRR13329757 94345783 12453643356
SRP299803 SRX9756766 GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7946091 GSM4995562 Illumina NovaSeq 6000 99487074 4240172698 SRR13329756 99487074 13132293768
SRP299803 SRX9756765 GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7946090 GSM4995561 Illumina NovaSeq 6000 88048461 3817540828 SRR13329755 88048461 11622396852
SRP257758 SRX9537754 GSM4916438: Pou4f2-tdTomato/+ E17.5 scRNA-seq; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7743995 GSM4916438 Illumina HiSeq 2500 364683840 8246658699 SRR13091939 364683840 32456861760
SRP257758 SRX9537753 GSM4916437: Atoh7-zsGreen/lacZ E17.5 scRNA-seq; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7743994 GSM4916437 Illumina HiSeq 2500 530456067 11895864680 SRR13091938 530456067 47210589963
SRP257758 SRX9537752 GSM4916436: Atoh7-zsGreen/+ E17.5 scRNA-seq; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7743993 GSM4916436 Illumina HiSeq 2500 389849416 8671923722 SRR13091937 389849416 34696598024
SRP257758 SRX9537751 GSM4916435: Atoh7-zsGreen/lacZ E14.5 scRNA-seq; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7743992 GSM4916435 Illumina HiSeq 2500 328878355 7875737709 SRR13091936 328878355 29270173595
SRP257758 SRX9537750 GSM4916434: Atoh7-zsGreen/+ E14.5 scRNA-seq; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7743991 GSM4916434 Illumina HiSeq 2500 522040155 12760941656 SRR13091935 522040155 46461573795
ERP118072 ERX3614517 NextSeq 500 sequencing; 3' mRNA-seq of protrusions and cell bodies of BJ, PC-3M, RPE-1, U-87 and WM-266.4 cells 9606 Homo sapiens OTHER TRANSCRIPTOMIC Oligo-dT ERS3920269 SAMEA6120013 NextSeq 500 5818488 43355751 ERR3619129 1457318 109897743
ERP118072 ERX3614516 NextSeq 500 sequencing; 3' mRNA-seq of protrusions and cell bodies of BJ, PC-3M, RPE-1, U-87 and WM-266.4 cells 9606 Homo sapiens OTHER TRANSCRIPTOMIC Oligo-dT ERS3920268 SAMEA6120012 NextSeq 500 5422441 40645479 ERR3619125 1359663 102468758
SRP288715 SRX9369597 RPE1_SS119_p10 9606 Homo sapiens OTHER GENOMIC other SRS7591452 RPE1_SS119_p10.bam Illumina HiSeq 2000 5062938 88426773 SRR12904705 5062938 202517520
SRP288715 SRX9369596 RPE1_SS119_p0 9606 Homo sapiens OTHER GENOMIC other SRS7591451 RPE1_SS119_p0.bam Illumina HiSeq 2000 978835 19219630 SRR12904706 978835 39153400
SRP288715 SRX9369595 RPE1_SS111_p10 9606 Homo sapiens OTHER GENOMIC other SRS7591450 RPE1_SS111_p10.bam Illumina HiSeq 2000 6205827 108129733 SRR12904707 6205827 248233080
SRP288715 SRX9369594 RPE1_SS111_p0 9606 Homo sapiens OTHER GENOMIC other SRS7591449 RPE1_SS111_p0.bam Illumina HiSeq 2000 928703 18488436 SRR12904708 928703 37148120
SRP288715 SRX9369593 RPE1_SS51_p10 9606 Homo sapiens OTHER GENOMIC other SRS7591448 RPE1_SS51_p10.bam Illumina HiSeq 2000 6088168 106065537 SRR12904709 6088168 243526720
SRP288715 SRX9369592 RPE1_SS51_p0 9606 Homo sapiens OTHER GENOMIC other SRS7591447 RPE1_SS51_p0.bam Illumina HiSeq 2000 1624227 30610200 SRR12904710 1624227 64969080
SRP288715 SRX9369591 RPE1_SS48_p10 9606 Homo sapiens OTHER GENOMIC other SRS7591446 RPE1_SS48_p10.bam Illumina HiSeq 2000 8117881 139408135 SRR12904711 8117881 324715240
SRP288715 SRX9369590 RPE1_SS48_p0 9606 Homo sapiens OTHER GENOMIC other SRS7591445 RPE1_SS48_p0.bam Illumina HiSeq 2000 776140 15821200 SRR12904712 776140 31045600
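When scanning hits like these, a quick tally of experiments per study shows which projects dominate the results. The sketch below counts the 20 hits listed above by `study_accession`; the pairs are copied by hand from the table, and the variable names are illustrative only.

```python
from collections import Counter

# (study_accession, experiment_accession) pairs copied from the hits above.
hits = [
    ("SRP299803", "SRX9756769"), ("SRP299803", "SRX9756768"),
    ("SRP299803", "SRX9756767"), ("SRP299803", "SRX9756766"),
    ("SRP299803", "SRX9756765"), ("SRP257758", "SRX9537754"),
    ("SRP257758", "SRX9537753"), ("SRP257758", "SRX9537752"),
    ("SRP257758", "SRX9537751"), ("SRP257758", "SRX9537750"),
    ("ERP118072", "ERX3614517"), ("ERP118072", "ERX3614516"),
    ("SRP288715", "SRX9369597"), ("SRP288715", "SRX9369596"),
    ("SRP288715", "SRX9369595"), ("SRP288715", "SRX9369594"),
    ("SRP288715", "SRX9369593"), ("SRP288715", "SRX9369592"),
    ("SRP288715", "SRX9369591"), ("SRP288715", "SRX9369590"),
]

# Count how many experiments each study contributes to the result set.
per_study = Counter(study for study, _ in hits)
for study, count in per_study.most_common():
    print(study, count)
```

The counts reflect only the returned hits, not the full size of each project, so they are a starting point for choosing which study to inspect with `pysradb metadata`.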
By default, `search` returns the first 20 hits. `SRP299803` seems like a
project of interest. However, the information returned by the `search`
command is fairly limited. To look up more detailed information
about this project:
$ pysradb metadata SRP299803 | head
study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title instrument instrument_model instrument_model_desc total_spots total_size run_accession run_total_spots run_total_bases
SRP299803 SRX9756769 GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq 10090 Mus musculus ATAC-seq GENOMIC other PAIRED SRS7946094 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 55435867 2637580797 SRR13329759 55435867 6874047508
SRP299803 SRX9756768 GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946093 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 96123725 4107807391 SRR13329758 96123725 12688331700
SRP299803 SRX9756767 GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946092 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 94345783 4056010488 SRR13329757 94345783 12453643356
SRP299803 SRX9756766 GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946091 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 99487074 4240172698 SRR13329756 99487074 13132293768
SRP299803 SRX9756765 GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946090 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 88048461 3817540828 SRR13329755 88048461 11622396852
It is also possible to get more detailed information using the
`--detailed` flag:
$ pysradb metadata SRP299803 --detailed
run_accession study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title instrument instrument_model instrument_model_desc total_spots total_size run_total_spots run_total_bases run_alias sra_url experiment_alias source_name strain background genotype tissue/cell type molecule subtype ena_fastq_http ena_fastq_http_1 ena_fastq_http_2 ena_fastq_ftp ena_fastq_ftp_1 ena_fastq_ftp_2
SRR13329759 SRP299803 SRX9756769 GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq 10090 Mus musculus ATAC-seq GENOMIC other PAIRED SRS7946094 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 55435867 2637580797 55435867 6874047508 GSM4995565_r1 https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/013017/SRR13329759 GSM4995565 wild type_retina C57BL/6 wild type retina http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/059/SRR13329759/SRR13329759_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/059/SRR13329759/SRR13329759_2.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/059/SRR13329759/SRR13329759_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/059/SRR13329759/SRR13329759_2.fastq.gz
SRR13329758 SRP299803 SRX9756768 GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946093 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 96123725 4107807391 96123725 12688331700 GSM4995564_r1 https://sra-download.ncbi.nlm.nih.gov/traces/sra70/SRR/013017/SRR13329758 GSM4995564 Vsx2SE Δ/Δ_retina C57BL/6 Vsx2SE {delta}/{delta} retina 3' RNA http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/058/SRR13329758/SRR13329758_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/058/SRR13329758/SRR13329758_2.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/058/SRR13329758/SRR13329758_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/058/SRR13329758/SRR13329758_2.fastq.gz
SRR13329757 SRP299803 SRX9756767 GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946092 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 94345783 4056010488 94345783 12453643356 GSM4995563_r1 https://sra-download.ncbi.nlm.nih.gov/traces/sra79/SRR/013017/SRR13329757 GSM4995563 Vsx2SE Δ/Δ_retina C57BL/6 Vsx2SE {delta}/{delta} retina 3' RNA http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/057/SRR13329757/SRR13329757_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/057/SRR13329757/SRR13329757_2.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/057/SRR13329757/SRR13329757_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/057/SRR13329757/SRR13329757_2.fastq.gz
SRR13329756 SRP299803 SRX9756766 GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946091 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 99487074 4240172698 99487074 13132293768 GSM4995562_r1 https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/013017/SRR13329756 GSM4995562 wild type_retina C57BL/6 wild type retina 3' RNA http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/056/SRR13329756/SRR13329756_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/056/SRR13329756/SRR13329756_2.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/056/SRR13329756/SRR13329756_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/056/SRR13329756/SRR13329756_2.fastq.gz
SRR13329755 SRP299803 SRX9756765 GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946090 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 88048461 3817540828 88048461 11622396852 GSM4995561_r1 https://sra-download.ncbi.nlm.nih.gov/traces/sra72/SRR/013017/SRR13329755 GSM4995561 wild type_retina C57BL/6 wild type retina 3' RNA http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/055/SRR13329755/SRR13329755_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/055/SRR13329755/SRR13329755_2.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/055/SRR13329755/SRR13329755_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/055/SRR13329755/SRR13329755_2.fastq.gz
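The `ena_fastq_http_*` values in the output above follow a regular directory layout, so they can be rebuilt from a run accession alone. The following is a minimal sketch, assuming the layout visible in these rows (the first six characters of the accession, then a zero-padded subdirectory built from any characters beyond the ninth). `ena_fastq_urls` is a hypothetical helper written for illustration and is not part of pysradb; ENA may not host FASTQ files for every run.

```python
def ena_fastq_urls(run_accession, paired=True,
                   base="http://ftp.sra.ebi.ac.uk/vol1/fastq"):
    """Build ENA FASTQ URLs for a run accession (illustrative, not a pysradb API)."""
    parts = [base, run_accession[:6]]
    # Accessions longer than nine characters get an extra directory made of
    # the trailing characters, zero-padded to three, e.g. SRR13329759 -> "059".
    extra = run_accession[9:]
    if extra:
        parts.append(extra.zfill(3))
    parts.append(run_accession)
    directory = "/".join(parts)
    suffixes = ["_1", "_2"] if paired else [""]
    return [f"{directory}/{run_accession}{s}.fastq.gz" for s in suffixes]


urls = ena_fastq_urls("SRR13329759")
```

For `SRR13329759` this reproduces the two `ena_fastq_http` URLs listed in the first row of the detailed output.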
Having made sure this dataset is indeed of interest, we want to save
some work and see if the processed dataset has been made available on
GEO by the authors:
$ pysradb srp-to-gse SRP299803
study_accession study_alias
SRP299803 GSE164044
So a GEO project does exist for this SRA dataset.
Notice that the GEO information was also visible in the output of
`metadata --detailed`. Now assume we had started with only the GSM
id of one of the experiments, say `GSM4995565`.
Starting from this GSM id, we want to get the following information:
- SRP id of the project
- GSE id of the project
- SRX id of the experiment
- SRR id(s) corresponding to the experiment
Get SRP id:
$ pysradb gsm-to-srp GSM4995565
experiment_alias study_accession
GSM4995565 SRP299803
Get GSE id:
$ pysradb gsm-to-gse GSM4995565
experiment_alias study_alias
GSM4995565 GSE164044
Get SRX id:
$ pysradb gsm-to-srx GSM4995565
experiment_alias experiment_accession
GSM4995565 SRX9756769
Getting SRR id(s):
$ pysradb gsm-to-srr GSM4995565
experiment_alias run_accession
GSM4995565 SRR13329759
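Each conversion above starts from one kind of accession and ends at another, so it helps to know which kind you are holding before picking a subcommand. The sketch below classifies an id by its prefix. It only covers the six prefixes used in this case study; `classify_accession` is a hypothetical helper, not a pysradb function, and other prefixes (for example those used by ENA or DDBJ) are deliberately rejected.

```python
import re

# Accession prefixes that appear in this case study.
ACCESSION_KINDS = {
    "SRP": "SRA study",
    "SRX": "SRA experiment",
    "SRS": "SRA sample",
    "SRR": "SRA run",
    "GSE": "GEO series",
    "GSM": "GEO sample",
}


def classify_accession(accession):
    """Return a human-readable kind for an accession such as 'GSM4995565'."""
    match = re.fullmatch(r"(SRP|SRX|SRS|SRR|GSE|GSM)\d+", accession.strip())
    if match is None:
        raise ValueError(f"unrecognized accession: {accession!r}")
    return ACCESSION_KINDS[match.group(1)]


kinds = {acc: classify_accession(acc)
         for acc in ["GSM4995565", "GSE164044", "SRP299803", "SRX9756769", "SRR13329759"]}
```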
## Case Study 2
Our first case study included metadata search. Next, we explore
downloading datasets.
We have an SRP id to start off with: `SRP000941`. We want to quickly
check out its contents:
$ pysradb metadata SRP000941 --detailed | head
study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title instrument instrument_model instrument_model_desc total_spots total_size run_accession run_total_spots run_total_bases
SRP000941 SRX056722 Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells 9606 Homo sapiens SAK270 ChIP-Seq GENOMIC ChIP SINGLE SRS184466 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 26900401 531654480 SRR179707 26900401 807012030
SRP000941 SRX027889 Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells 9606 Homo sapiens SAK201 ChIP-Seq GENOMIC ChIP SINGLE SRS116481 Illumina Genome Analyzer II Illumina Genome Analyzer II ILLUMINA 37528590 779578968 SRR067978 37528590 1351029240
SRP000941 SRX027888 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens LLH1U ChIP-Seq GENOMIC RANDOM SINGLE SRS116483 Illumina Genome Analyzer II Illumina Genome Analyzer II ILLUMINA 13603127 3232309537 SRR067977 13603127 489712572
SRP000941 SRX027887 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens DM219 ChIP-Seq GENOMIC RANDOM SINGLE SRS116562 Illumina Genome Analyzer II Illumina Genome Analyzer II ILLUMINA 22430523 506327844 SRR067976 22430523 807498828
This project is a collection of multiple assays.
$ pysradb metadata SRP000941 --detailed | tr -s ' ' | cut -f5 -d ' ' | sort | uniq -c
999 Bisulfite-Seq
768 ChIP-Seq
1 library_strategy
121 OTHER
353 RNA-Seq
28 WGS
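The `tr | cut | sort | uniq -c` pipeline above selects a column by position after squeezing spaces, which can pick the wrong field when a value itself contains spaces. A minimal sketch of the same tally done by column name follows. It assumes the metadata is available as tab-separated text with a header row (an assumption; the on-screen output shown above is space-aligned), and the three sample rows are trimmed from the `SRP299803` output earlier on this page.

```python
import csv
import io
from collections import Counter


def count_library_strategies(tsv_text):
    """Count rows per library_strategy in tab-separated metadata text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["library_strategy"] for row in reader)


# Three rows trimmed to the columns needed for the example.
sample = (
    "study_accession\texperiment_accession\tlibrary_strategy\n"
    "SRP299803\tSRX9756769\tATAC-seq\n"
    "SRP299803\tSRX9756768\tRNA-Seq\n"
    "SRP299803\tSRX9756767\tRNA-Seq\n"
)
counts = count_library_strategies(sample)
```

Because the header row is consumed as column names, it does not show up as a spurious `library_strategy` entry in the counts.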
However, we only want to download the `RNA-seq` samples:
$ pysradb metadata SRP000941 --detailed | grep 'study\|RNA-Seq' | pysradb download
This will download all `RNA-seq` samples coming from this project using
`aspera-client`, if available. Alternatively, it can also use `wget`.
Downloading an entire project is easy:
$ pysradb download -p SRP000941
Downloads are organized by `SRP/SRX/SRR`, mimicking the hierarchy of SRA
projects.
================================================
FILE: docs/cmdline.md
================================================
# CLI
$ pysradb
usage: pysradb [-h] [--version] [--citation]
{metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
...
pysradb: Query NGS metadata and data from NCBI Sequence Read Archive.
Citation: 10.12688/f1000research.18676.1
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--citation how to cite
subcommands:
{metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
metadata Fetch metadata for SRA project (SRPnnnn)
download Download SRA project (SRPnnnn)
search Search SRA/ENA for matching text
gse-to-gsm Get GSM for a GSE
gse-to-srp Get SRP for a GSE
gsm-to-gse Get GSE for a GSM
gsm-to-srp Get SRP for a GSM
gsm-to-srr Get SRR for a GSM
gsm-to-srs Get SRS for a GSM
gsm-to-srx Get SRX for a GSM
srp-to-gse Get GSE for a SRP
srp-to-srr Get SRR for a SRP
srp-to-srs Get SRS for a SRP
srp-to-srx Get SRX for a SRP
srr-to-gsm Get GSM for a SRR
srr-to-srp Get SRP for a SRR
srr-to-srs Get SRS for a SRR
srr-to-srx Get SRX for a SRR
srs-to-gsm Get GSM for a SRS
srs-to-srx Get SRX for a SRS
srx-to-srp Get SRP for a SRX
srx-to-srr Get SRR for a SRX
srx-to-srs Get SRS for a SRX
geo-matrix Download and parse GEO Matrix files
srp-to-pmid Get PMIDs for SRP accessions
gse-to-pmid Get PMIDs for GSE accessions
pmid-to-gse Get GSE accessions from PMIDs
pmid-to-srp Get SRP accessions from PMIDs
pmc-to-identifiers Extract database identifiers from PMC articles
pmid-to-identifiers Extract database identifiers from PubMed articles
doi-to-gse Get GSE accessions from DOIs
doi-to-srp Get SRP accessions from DOIs
doi-to-identifiers Extract database identifiers from articles via DOI
## Enriching metadata
Extract standardized biological metadata from SRA/GEO datasets using LLMs.
### Quickstart
```python
from pysradb import SRAweb
client = SRAweb()
df = client.metadata("GSE286254", detailed=True, enrich=True)
# Returns original + 9 enriched columns (might not always be complete):
# guessed_organ, guessed_tissue, guessed_anatomical_system,
# guessed_cell_type, guessed_disease, guessed_sex,
# guessed_development_stage, guessed_assay, guessed_organism
```
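Since the enriched columns might not always be complete, a quick check of which ones came back can be useful before further analysis. The sketch below works on any list of column names (for example `df.columns` from the call above). `missing_enriched_columns` is a hypothetical helper, not part of pysradb, and it only detects absent columns, not empty values inside them.

```python
# The nine enriched fields listed above, without the "guessed_" prefix.
ENRICHED_FIELDS = [
    "organ", "tissue", "anatomical_system", "cell_type", "disease",
    "sex", "development_stage", "assay", "organism",
]


def missing_enriched_columns(columns, prefix="guessed_"):
    """Return the expected enriched column names that are not in `columns`."""
    present = set(columns)
    return [prefix + field for field in ENRICHED_FIELDS
            if prefix + field not in present]


# Example with a made-up subset of columns.
missing = missing_enriched_columns(
    ["run_accession", "guessed_organ", "guessed_tissue", "guessed_sex"]
)
```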
### Prerequisites
Install Ollama: <https://ollama.ai>
```bash
ollama pull phi3
```
### Advanced Usage
```python
# Use different model
df = client.metadata("GSE286254", detailed=True, enrich=True,
enrich_backend="ollama/llama3.2")
# Manual enrichment with custom settings
from pysradb.metadata_enrichment import create_metadata_extractor, load_ontology_reference
# LLM-based extraction
extractor_llm = create_metadata_extractor(method="llm", backend="ollama/phi3")
df_enriched = extractor_llm.enrich_dataframe(df, prefix="guessed_")
# Embedding-based extraction (faster, offline)
ontology_ref = load_ontology_reference()
extractor_emb = create_metadata_extractor(
method="embedding",
model="FremyCompany/BioLORD-2023",
reference_categories=ontology_ref
)
df_enriched = extractor_emb.enrich_dataframe(df, prefix="guessed_")
```
See [Notebook 09](https://github.com/saketkc/pysradb/blob/develop/notebooks/09.Metadata_enrichment.ipynb) for detailed examples.
## Getting metadata for a SRA project (SRP)
The most basic information associated with any SRA project is its list
of experiments and run accessions.
$ pysradb metadata SRP098789
study_accession experiment_accession sample_accession run_accession
SRP098789 SRX2536403 SRS1956353 SRR5227288
SRP098789 SRX2536404 SRS1956354 SRR5227289
SRP098789 SRX2536405 SRS1956355 SRR5227290
SRP098789 SRX2536406 SRS1956356 SRR5227291
SRP098789 SRX2536407 SRS1956357 SRR5227292
SRP098789 SRX2536408 SRS1956358 SRR5227293
SRP098789 SRX2536409 SRS1956359 SRR5227294
Listing SRX and SRRs for a SRP is often not useful. We might want to
take a quick look at the metadata associated with the samples:
$ pysradb metadata SRP098789
study_accession experiment_accession sample_accession run_accession sample_attribute
SRP098789 SRX2536403 SRS1956353 SRR5227288 source_name: Huh7_1.5 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
SRP098789 SRX2536404 SRS1956354 SRR5227289 source_name: Huh7_1.5 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
SRP098789 SRX2536405 SRS1956355 SRR5227290 source_name: Huh7_1.5 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
SRP098789 SRX2536406 SRS1956356 SRR5227291 source_name: Huh7_0.3 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
SRP098789 SRX2536407 SRS1956357 SRR5227292 source_name: Huh7_0.3 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
SRP098789 SRX2536408 SRS1956358 SRR5227293 source_name: Huh7_0.3 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
The example here came from a Ribosome profiling study and consists of a
collection of both Ribo-seq and RNA-seq samples. We can filter out only
the RNA-seq samples:
$ pysradb metadata SRP098789 --detailed | grep 'study\|RNA-Seq'
SRP098789 SRX2536422 SRR5227307 RNA-Seq SINGLE -
SRP098789 SRX2536424 SRR5227309 RNA-Seq SINGLE -
SRP098789 SRX2536426 SRR5227311 RNA-Seq SINGLE -
SRP098789 SRX2536428 SRR5227313 RNA-Seq SINGLE -
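A `grep` filter matches the pattern anywhere on the line, so a row whose title or description merely mentions RNA-Seq would be kept as well. The sketch below filters on the `library_strategy` column only. It assumes the metadata is available as tab-separated text with a header row (an assumption about the input, not a statement about pysradb's output format), and the `OTHER` value in the first sample row is a placeholder.

```python
import csv
import io


def keep_strategy(tsv_text, strategy="RNA-Seq"):
    """Return the rows whose library_strategy column equals `strategy` exactly."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["library_strategy"] == strategy]


sample = (
    "study_accession\texperiment_accession\trun_accession\tlibrary_strategy\n"
    "SRP098789\tSRX2536403\tSRR5227288\tOTHER\n"  # strategy value is a placeholder
    "SRP098789\tSRX2536422\tSRR5227307\tRNA-Seq\n"
    "SRP098789\tSRX2536424\tSRR5227309\tRNA-Seq\n"
)
rna_seq_runs = [row["run_accession"] for row in keep_strategy(sample)]
```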
A more complicated example will consist of multiple assays. For example
`SRP000941`:
$ pysradb metadata SRP000941 --detailed | tr -s ' ' | cut -f5 -d ' ' | sort | uniq -c
999 Bisulfite-Seq
768 ChIP-Seq
1 library_strategy
121 OTHER
353 RNA-Seq
28 WGS
## Enriching metadata
You can enrich metadata with standardized biological attributes using biomedical-specialized LLMs through the `--enrich` flag:
### Basic enrichment (using default backend)
$ pysradb metadata GSE286254 --detailed --enrich
The default uses **Meditron** (7B parameters, trained on medical literature and guidelines), which is optimized for biomedical text understanding.
This returns the original metadata plus 9 enriched columns:
- `guessed_organ`
- `guessed_tissue`
- `guessed_anatomical_system`
- `guessed_cell_type`
- `guessed_disease`
- `guessed_sex`
- `guessed_development_stage`
- `guessed_assay`
- `guessed_organism`
### Using alternative biomedical backends
$ pysradb metadata GSE286254 --detailed --enrich --enrich-backend ollama/openbiollm-8b
Available biomedical backends:
- `ollama/meditron` (default, 7B - optimized for medical text)
- `ollama/openbiollm-8b` (8B - trained on 500k+ biomedical entries, superior biomedical performance)
Both models are specialized for biomedical and clinical text understanding, making them ideal for SRA metadata enrichment.
For more details on enrichment features and prerequisites, see the [Enriching metadata](#enriching-metadata) section above.
## Experiment accessions for a project (SRP =\> SRX)
A frequently encountered task involves getting all the experiments (SRX)
for a particular study accession (SRP). Consider project `SRP048759`:
$ pysradb srp-to-srx SRP048759
## Sample accessions for a project (SRP =\> SRS)
Each project involves one or more biological samples (SRS), which
are put through different experiments (SRX).
$ pysradb srp-to-srs SRP048759
study_accession sample_accession
SRP048759 SRS718878
SRP048759 SRS718879
SRP048759 SRS718880
SRP048759 SRS718881
SRP048759 SRS718882
SRP048759 SRS718883
SRP048759 SRS718884
SRP048759 SRS718885
SRP048759 SRS718886
This is very limited information. More detail is available with the
`--detailed` flag:
$ pysradb srp-to-srs --detailed SRP048759
study_accession sample_accession experiment_accession run_accession study_alias sample_alias experiment_alias run_alias
SRP048759 SRS718878 SRX729552 SRR1608490 GSE62190 GSM1521543 GSM1521543 GSM1521543_r1
SRP048759 SRS718878 SRX729552 SRR1608491 GSE62190 GSM1521543 GSM1521543 GSM1521543_r2
SRP048759 SRS718878 SRX729552 SRR1608492 GSE62190 GSM1521543 GSM1521543 GSM1521543_r3
SRP048759 SRS718878 SRX729552 SRR1608493 GSE62190 GSM1521543 GSM1521543 GSM1521543_r4
SRP048759 SRS718879 SRX729553 SRR1608494 GSE62190 GSM1521544 GSM1521544 GSM1521544_r1
SRP048759 SRS718879 SRX729553 SRR1608495 GSE62190 GSM1521544 GSM1521544 GSM1521544_r2
## Run accessions for experiments (SRX =\> SRR)
Another frequently encountered task involves fetching the run accessions
(SRR) for a particular experiment (SRX). Consider experiments
`SRX217956` and `SRX2536403`. We want to be able
to resolve the run accessions for these experiments:
$ pysradb srx-to-srr SRX217956 SRX2536403 --detailed
experiment_accession run_accession study_accession sample_attribute
SRX217956 SRR649752 SRP017942 source_name: 3T3 cells || treatment: control || cell line: 3T3 cells || assay type: Riboseq
SRX2536403 SRR5227288 SRP098789 source_name: Huh7_1.5 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
## Experiment accessions for runs (SRR =\> SRX)
For fetching experiment accessions (SRX) for one or multiple run
accessions (SRR):
$ pysradb srr-to-srx SRR5227288 SRR649752 --detailed
run_accession study_accession experiment_accession sample_attribute
SRR649752 SRP017942 SRX217956 source_name: 3T3 cells || treatment: control || cell line: 3T3 cells || assay type: Riboseq
SRR5227288 SRP098789 SRX2536403 source_name: Huh7_1.5 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
## Downloading an entire project
$ pysradb metadata --detailed SRP098789 | pysradb download
## GEO accessions for studies (SRP =\> GSE)
$ pysradb srp-to-gse SRP090415
study_accession study_alias
SRP090415 GSE87328
But not all SRPs will have an associated GEO id (GSE):
$ pysradb srp-to-gse SRP029589
study_accession study_alias
SRP029589 PRJNA218051
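As the second example shows, the `study_alias` column may hold a BioProject id instead of a GEO series. A minimal sketch for telling the two apart when post-processing the output follows; `geo_series_or_none` is a hypothetical helper and not part of pysradb.

```python
def geo_series_or_none(study_alias):
    """Return the alias if it looks like a GEO series id (GSE + digits), else None."""
    alias = study_alias.strip()
    if alias.startswith("GSE") and alias[3:].isdigit():
        return alias
    return None


with_geo = geo_series_or_none("GSE87328")
without_geo = geo_series_or_none("PRJNA218051")
```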
## Converting GSM to SRP
$ pysradb gsm-to-srp GSM2177186
experiment_alias study_accession
GSM2177186 SRP075720
## Converting GSM to GSE
$ pysradb gsm-to-gse GSM2177186
experiment_alias study_alias
GSM2177186 GSE81903
## Converting GSM to SRX
$ pysradb gsm-to-srx GSM2177186
experiment_alias experiment_accession
GSM2177186 SRX1800089
## Converting GSM to SRR
$ pysradb gsm-to-srr GSM2177186
experiment_alias run_accession
GSM2177186 SRR3587529
## SRA accessions for GEO studies (GSE =\> SRP)
$ pysradb gse-to-srp GSE87328
study_alias study_accession
GSE87328 SRP090415
## Converting SRP to PMID
$ pysradb srp-to-pmid SRP045778
srp_accession bioproject pmid
SRP045778 PRJNA257197 27373336
## Converting GSE to PMID
$ pysradb gse-to-pmid GSE253406
gse_accession pmid
GSE253406 39528918
## Extracting identifiers from PMC/DOI
Extract database identifiers (GSE, PRJNA, SRP, etc.) from PubMed Central articles or DOIs.
### Get all identifiers from a PMID
$ pysradb pmid-to-identifiers 39528918
pmid pmc_id gse_ids prjna_ids srp_ids
39528918 PMC10802650 GSE253406 PRJNA1058002 SRP484103
### Get only GSE or SRP from PMID
$ pysradb pmid-to-gse 39528918
pmid pmc_id gse_ids
39528918 PMC10802650 GSE253406
$ pysradb pmid-to-srp 39528918
pmid pmc_id srp_ids
39528918 PMC10802650 SRP484103
### Extract from DOI
$ pysradb doi-to-identifiers 10.12688/f1000research.18676.1
doi pmid pmc_id gse_ids srp_ids
10.12688/f1000research.18676.1 30873266 PMC6411813 GSE... SRP...
### Extract from PMC ID
$ pysradb pmc-to-identifiers PMC10802650
pmc_id gse_ids prjna_ids srp_ids
PMC10802650 GSE253406 PRJNA1058002 SRP484103
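Conceptually, extracting identifiers from an article comes down to scanning its text for accession-shaped tokens. The sketch below shows that idea with regular expressions. It is an illustration under stated assumptions, not pysradb's actual implementation: the pattern set covers only the three id types shown in the tables above, and the example sentence is made up.

```python
import re

# One pattern per identifier type shown in the output columns above.
PATTERNS = {
    "gse_ids": re.compile(r"\bGSE\d+\b"),
    "prjna_ids": re.compile(r"\bPRJNA\d+\b"),
    "srp_ids": re.compile(r"\bSRP\d+\b"),
}


def extract_identifiers(text):
    """Return the sorted, de-duplicated identifiers of each type found in `text`."""
    return {name: sorted(set(pattern.findall(text)))
            for name, pattern in PATTERNS.items()}


# Made-up sentence in the style of a data availability statement.
text = ("Data are available under GSE253406 "
        "(BioProject PRJNA1058002; SRA SRP484103; see also GSE253406).")
found = extract_identifiers(text)
```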
## Downloading supplementary files from GEO
$ pysradb download -g GSE161707
## Downloading an entire SRA/ENA project (multithreaded)
`pysradb` makes it super easy to download datasets from SRA in parallel:
Using 8 threads to download:
$ pysradb download -y -t 8 --out-dir ./pysradb_downloads -p SRP063852
Downloads are organized by `SRP/SRX/SRR` mimicking the hierarchy of SRA
projects.
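The general idea behind a multithreaded download is a pool of workers that each fetch one run at a time. The following is a minimal sketch of that pattern using the standard library. It is not pysradb's implementation: `download_all` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that performs no network access.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(run_accessions, fetch, threads=8):
    """Apply `fetch` to every accession using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with accessions.
        return dict(zip(run_accessions, pool.map(fetch, run_accessions)))


def fake_fetch(run_accession):
    # Stand-in for a real download; it only reports what it was asked to fetch.
    return f"downloaded {run_accession}"


results = download_all(["SRR5227288", "SRR5227289", "SRR5227290"],
                       fake_fetch, threads=2)
```

Threads suit this workload because downloads spend most of their time waiting on the network rather than computing.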
================================================
FILE: docs/commands.rst
================================================
API Documentation
=================
See :doc:`pysradb` for the Python API reference documentation.
================================================
FILE: docs/conf.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# pysradb documentation build configuration file, created by
# sphinx-quickstart on Fri Jun 9 13:47:02 2017.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
# If extensions (or modules to document with autodoc) are in another
# directory, add these directories to sys.path here. If the directory is
# relative to the documentation root, use os.path.abspath to make it
# absolute, like shown here.
#
import os
import sys
# import guzzle_sphinx_theme
# Extend sys.path before importing pysradb so the local checkout is found.
sys.path.insert(0, os.path.abspath(".."))
import pysradb  # noqa: E402
autodoc_mock_imports = ["xmltodict", "numpy", "pandas", "requests", "tqdm"]
# -- General configuration ---------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = [
"IPython.sphinxext.ipython_directive",
"IPython.sphinxext.ipython_console_highlighting",
"sphinx.ext.mathjax",
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"sphinx.ext.doctest",
"sphinx.ext.viewcode",
"sphinx.ext.inheritance_diagram",
"numpydoc",
"sphinx_tabs.tabs",
"sphinx_panels",
"sphinxcontrib.gtagjs",
"myst_parser",
"nbsphinx",
]
gtagjs_ids = [
"G-CKQZFCEENZ",
]
panels_add_bootstrap_css = False
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = [".rst", ".md"]
# source_suffix = ".md"
# The master toctree document.
master_doc = "index"
# General information about the project.
project = "pysradb"
copyright = "2023, Saket Choudhary"
author = "Saket Choudhary"
# The version info for the project you're documenting, acts as replacement
# for |version| and |release|, also used in various other places throughout
# the built documents.
#
# The short X.Y version.
version = pysradb.__version__
# The full version, including alpha/beta/rc tags.
release = pysradb.__version__
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = "en"
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# These patterns also affect html_static_path and html_extra_path
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False
# -- Options for HTML output -------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "furo"
# Theme options are theme-specific and customize the look and feel of a
# theme further. For a list of options available for each theme, see the
# documentation.
#
html_theme_options = {
"light_css_variables": {
"color-brand-primary": "#0066cc",
"color-brand-content": "#0066cc",
"color-code-background": "#f5f5f5",
"color-inline-code-background": "#f0f0f0",
},
"dark_css_variables": {
"color-brand-primary": "#3b82f6",
"color-brand-content": "#3b82f6",
"color-code-background": "#1e293b",
"color-inline-code-background": "#334155",
},
}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
# -- Options for HTMLHelp output ---------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = "pysradbdoc"
# -- Options for LaTeX output ------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass
# [howto, manual, or own class]).
latex_documents = [
(master_doc, "pysradb.tex", "pysradb Documentation", "Saket Choudhary", "manual")
]
# -- Options for manual page output ------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, "pysradb", "pysradb Documentation", [author], 1)]
# -- Options for Texinfo output ----------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(
master_doc,
"pysradb",
"pysradb Documentation",
author,
"pysradb",
"One line description of project.",
"Miscellaneous",
)
]
numpydoc_show_class_members = False
##html_theme_path = guzzle_sphinx_theme.html_theme_path()
##html_theme = "guzzle_sphinx_theme"
##
### Register the theme as an extension to generate a sitemap.xml
##extensions.append("guzzle_sphinx_theme")
##
### Guzzle theme options (see theme.conf for more information)
##html_theme_options = {
## # Set the name of the project to appear in the sidebar
## "project_nav_name": "pysradb"
##}
scv_greatest_tag = True
scv_show_banner = True
html_logo = "_static/pysradb_v3.png"
# Load custom JavaScript for copy-to-clipboard functionality
html_js_files = [
"copy-button.js",
]
# Load custom CSS to override Pygments background colors
html_css_files = [
"custom.css",
]
# NBSphinx configuration
nbsphinx_execute = "never"
exclude_patterns.append("**/.ipynb_checkpoints")
exclude_patterns.append("notebooks/.ipynb_checkpoints")
================================================
FILE: docs/contributing.md
================================================
# Contributing
Contributions are welcome, and they are greatly appreciated! Every
little bit helps, and credit will always be given.
You can contribute in many ways:
## Types of Contributions
### Report Bugs
Report bugs at <https://github.com/saketkc/pysradb/issues>.
If you are reporting a bug, please include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in
troubleshooting.
- Detailed steps to reproduce the bug.
### Fix Bugs
Look through the GitHub issues for bugs. Anything tagged with "bug"
and "help wanted" is open to whoever wants to implement it.
### Implement Features
Look through the GitHub issues for features. Anything tagged with
"enhancement" and "help wanted" is open to whoever wants to
implement it.
### Write Documentation
pysradb could always use more documentation, whether as part of the
official pysradb docs, in docstrings, or even on the web in blog posts,
articles, and such.
### Submit Feedback
The best way to send feedback is to file an issue at
<https://github.com/saketkc/pysradb/issues>.
If you are proposing a feature:
- Explain in detail how it would work.
- Keep the scope as narrow as possible, to make it easier to
implement.
- Remember that this is a volunteer-driven project, and that
contributions are welcome :)
## Get Started!
Ready to contribute? Here's how to set up `pysradb` for
local development.
1. Fork the `pysradb` repo on GitHub.
2. Clone your fork locally:
``` shell
$ git clone git@github.com:your_name_here/pysradb.git
```
3. Install your local copy into a virtualenv. Assuming you have
virtualenvwrapper installed, this is how you set up your fork for
local development (if `python --version` reports a version below 3.0,
run `$ mkvirtualenv pysradb --python=py3` instead):
``` shell
$ mkvirtualenv pysradb
$ cd pysradb/
$ python setup.py develop
```
4. Create a branch for local development:
``` shell
$ git checkout -b name-of-your-bugfix-or-feature
```
Now you can make your changes locally.
5. When you're done making changes, check that your changes pass
flake8 and the tests, including testing other Python versions with
tox:
``` shell
$ flake8 pysradb tests
$ python setup.py test or py.test
$ tox
```
To get flake8 and tox, just pip install them into your virtualenv.
6. Commit your changes and push your branch to GitHub:
``` shell
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
```
7. Submit a pull request through the GitHub website.
## Pull Request Guidelines
Before you submit a pull request, check that it meets these guidelines:
1. The pull request should include tests.
2. If the pull request adds functionality, the docs should be updated.
Put your new functionality into a function with a docstring, and add
the feature to the list in README.md.
3. The pull request should work for Python 2.7, 3.4, 3.5 and 3.6, and
for PyPy. Make sure that the tests pass for all supported Python
versions.
## Tips
To run a subset of tests:
``` shell
$ py.test tests.test_pysradb
```
## Deploying
A reminder for the maintainers on how to deploy. Make sure all your
changes are committed (including an entry in HISTORY.md). Then run:
``` shell
$ bumpversion patch # possible: major / minor / patch
$ git push
$ git push --tags
```
CI will then deploy to PyPI if tests pass.
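To make the `major / minor / patch` choice concrete, the sketch below shows how each part would change a `MAJOR.MINOR.PATCH` version string. It only illustrates the usual semantics; `bump` is a hypothetical function, and the actual behaviour of bumpversion depends on the project's configuration, which is not shown here.

```python
def bump(version, part):
    """Return `version` ("MAJOR.MINOR.PATCH") with `part` incremented.

    Hypothetical illustration of the usual scheme, not bumpversion itself:
    bumping a part resets every part to its right to zero.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("2.5.0", "patch"))  # 2.5.1
print(bump("2.4.1", "minor"))  # 2.5.0
print(bump("2.5.1", "major"))  # 3.0.0
```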
================================================
FILE: docs/history.md
================================================
# History
<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.5.1 (2025-10-29)
</summary>
- Add prjna support in doi-to-identifiers [#249](https://github.com/saketkc/pysradb/pull/249)
</details>
<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.5.0 (2025-10-19)
</summary>
- Add pmid/doi-to-gse/srp conversion [#246](https://github.com/saketkc/pysradb/pull/246).
</details>
<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.4.1 (2025-09-27)
</summary>
- Add gse-to-pmid conversion [#244](https://github.com/saketkc/pysradb/pull/244).
</details>
<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.4.0 (2025-09-27)
</summary>
- Add sra-to-pmid conversion [#241](https://github.com/saketkc/pysradb/pull/241). Thanks [@andrewdavidsmith](https://github.com/andrewdavidsmith) for the idea.
</details>
<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.3.0 (2025-08-24)
</summary>
- Download logic improvements: removed requests-ftp as a requirement
- Fix for handling missing metadata keys [#223](https://github.com/saketkc/pysradb/pull/223). Thanks [@andrewdavidsmith](https://github.com/andrewdavidsmith)
</details>
<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.2.2 (2024-10-03)
</summary>
- Fix for handling ENA urls for paired end data
</details>
<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.2.1 (2024-08-21)
</summary>
- Fix for handling ENA urls
- Migrated to pyproject.toml
</details>
<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.2.0 (2023-09-17)
</summary>
- Add support for Biosamples and bioproject [#199](https://github.com/saketkc/pysradb/pull/198)
- Use retmode xml for Geo search [#200](https://github.com/saketkc/pysradb/pull/200)
- Documentation fixes
## 2.1.0 (2023-05-16)
- Fix for `gse-to-srp` returning unrequested GSEs [#186](https://github.com/saketkc/pysradb/issues/190)
- Fix for `download` using `public_urls`
- Fix for `gsm-to-srx` returning false positives [#165](https://github.com/saketkc/pysradb/issues/165)
- Fix for delimiter not being consistent when metadata is printed on
terminal [#147](https://github.com/saketkc/pysradb/issues/147)
- ENA search is currently broken because of an API change
## 2.0.2 (2023-04-09)
- Fix for `gse-to-srp` to handle cases where a project is
missing but SRXs are returned [#186](https://github.com/saketkc/pysradb/issues/186)
- Fix gse-to-gsm [#187](https://github.com/saketkc/pysradb/issues/187)
## 2.0.1 (2023-03-18)
- Fix for `pysradb download` - using `public_url`
- Fix for SRX -> SRR and related conversions [#183](https://github.com/saketkc/pysradb/pull/183)
## 2.0.0 (2023-02-23)
- BREAKING change: Overhaul of how urls and associated metadata are
returned (not backward compatible); all column names are lower cased
by default
- Fix extra space in `organism_taxid` column
- Added support for Experiment attributes [#89](https://github.com/saketkc/pysradb/issues/89#issuecomment-1439319532)
## 1.4.2 (06-17-2022)
- Fix ENA fastq fetching [#163](https://github.com/saketkc/pysradb/issues/163)
## 1.4.1 (06-04-2022)
- Fix for fetching alternative URLs
## 1.4.0 (06-04-2022)
- Added ability to fetch alternative URLs (GCP/AWS) for metadata
[#161](https://github.com/saketkc/pysradb/issues/161)
- Fix for xmldict 0.13.0 no longer defaulting to OrderedDict [#159](https://github.com/saketkc/pysradb/pull/159)
- Fix for missing experiment model and description in metadata [#160](https://github.com/saketkc/pysradb/issues/160)
## 1.3.0 (02-18-2022)
- Add `study_title` to `--detailed` flag
([#152](https://github.com/saketkc/pysradb/issues/152))
- Fix `KeyError` in `metadata` where some new
IDs do not have any metadata
([#151](https://github.com/saketkc/pysradb/issues/151))
## 1.2.0 (01-10-2022)
- Do not exit if a query returns no hits ([#149](https://github.com/saketkc/pysradb/pull/149))
## 1.1.0 (12-12-2021)
- Fixed `gsm-to-gse` failure
([#128](https://github.com/saketkc/pysradb/pull/128))
- Fixed case sensitivity bug for ENA search
([#144](https://github.com/saketkc/pysradb/pull/144))
- Fixed publication date bug for search
([#146](https://github.com/saketkc/pysradb/pull/146))
- Added support for downloading data from GEO: `pysradb download -g GSE`
([#129](https://github.com/saketkc/pysradb/pull/129))
## 1.0.1 (01-10-2021)
- Dropped Python 3.6 since pandas 1.2 is not supported
## 1.0.0 (01-09-2021)
- Retired `metadb` and `SRAdb` based search through CLI - everything
defaults to `SRAweb`
- `SRAweb` now supports
[search](https://saket-choudhary.me/pysradb/quickstart.html#search)
- `N/A` is now replaced with `pd.NA`
- Two new fields in `--detailed`: `instrument_model`
and `instrument_model_desc`
[#75](https://github.com/saketkc/pysradb/issues/75)
- Updated documentation
## 0.11.1 (09-18-2020)
- `library_layout` is now included in the metadata output #56
- `--detailed` unifies columns for ENA fastq links instead
of appending `_x`/`_y` #59
- bugfix for parsing namespace in xml outputs #65
- XML errors from NCBI are now handled more gracefully #69
- Documentation and dependency updates
## 0.11.0 (09-04-2020)
- `pysradb download` now supports multiple threads for
parallel downloads
- `pysradb download` also supports ultra fast downloads of
FASTQs from ENA using aspera-client
## 0.10.3 (03-26-2020)
- Added test cases for SRAweb
- API limit exceeding errors are automagically handled
- Bug fixes for GSE \<=\> SRR
- Bug fix for metadata - supports multiple SRPs
Contributors
- Dibya Gautam
- Marius van den Beek
## 0.10.2 (02-05-2020)
- Bug fix: Handle API-rate limit exceeding => Retries
- Enhancement: 'Alternatives' URLs are now part of
`--detailed`
## 0.10.1 (02-04-2020)
- Bug fix: Handle Python3.6 for capture_output in subprocess.run
## 0.10.0 (01-31-2020)
- All the subcommands (srx-to-srr, srx-to-srs) will now print
additional columns where the first two columns represent the
relevant conversion
- Fixed a bug when fetching entries with a single efetch record
## 0.9.9 (01-15-2020)
- Major fix: some SRRs would go missing as the experiment dict was
being created only once per SRR (See #15)
- Features: More detailed metadata by default in the SRAweb mode
- See notebook: <https://colab.research.google.com/drive/1C60V->
## 0.9.7 (01-20-2020)
- Feature: instrument, run size and total spots are now printed in the
metadata by default (SRAweb mode only)
- Issue: Fixed an issue with srapath failing on SRP. srapath is now
run on individual SRRs.
## 0.9.6 (07-20-2019)
- Introduced `SRAweb` to perform queries over the web if
the SQLite file is missing or does not contain the relevant record.
## 0.9.0 (02-27-2019)
### Others
- This release completely changes the command line interface replacing
click with argparse ([#3](https://github.com/saketkc/pysradb/pull/3))
- Removed stale Python 2 compatible code
## 0.8.0 (02-26-2019)
### New methods/functionality
- `srr-to-gsm`: convert SRR to GSM
- SRAmetadb.sqlite.gz file is deleted by default after extraction
- When SRAmetadb is not found, confirmation is sought before
downloading
- Confirmation option before SRA downloads
### Bugfix
- download() works with wget
### Others
- `--out_dir` is now `--out-dir`
## 0.7.1 (02-18-2019)
Important: Python2 is no longer supported. Please consider moving to
Python3.
### Bugfix
- Included docs in the index which were missed out in the previous
release
## 0.7.0 (02-08-2019)
### New methods/functionality
- `gsm-to-srr`: convert GSM to SRR
- `gsm-to-srx`: convert GSM to SRX
- `gsm-to-gse`: convert GSM to GSE
### Renamed methods
The following command line options have been renamed and the changes are
not compatible with the 0.6.0 release:
- `sra-metadata` -> `metadata`.
- `sra-search` -> `search`.
- `srametadb` -> `metadb`.
## 0.6.0 (12-25-2018)
### Bugfix
- Fixed bugs introduced in 0.5.0 with API changes where multiple
redundant columns were output in `sra-metadata`
### New methods/functionality
- `download` now allows piped inputs
## 0.5.0 (12-24-2018)
### New methods/functionality
- Support for filtering by SRX Id for SRA downloads.
- `srr_to_srx`: Convert SRR to SRX/SRP
- `srp_to_srx`: Convert SRP to SRX
- Stripped down `sra-metadata` to give minimal information
- Added `--assay`, `--desc`,
`--detailed` flags for `sra-metadata`
- Improved table printing on terminal
## 0.4.2 (12-16-2018)
### Bugfix
- Fixed unicode error in tests for Python2
## 0.4.0 (12-12-2018)
### New methods/functionality
- Added a new `BASEdb` class to handle common database
connections
- Initial support for GEOmetadb through GEOdb class
- Initial support for a command line interface:
  - `download`: Download SRA project (SRPnnnn)
  - `gse-metadata`: Fetch metadata for GEO ID (GSEnnnn)
  - `gse-to-gsm`: Get GSM(s) for GSE
  - `gsm-metadata`: Fetch metadata for GSM ID (GSMnnnn)
  - `sra-metadata`: Fetch metadata for SRA project (SRPnnnn)
- Added three separate notebooks for SRAdb, GEOdb, CLI usage
## 0.3.0 (12-05-2018)
### New methods/functionality
- `sample_attribute` and
`experiment_attribute` are now included by default in
the df returned by `sra_metadata()`
- `expand_sample_attribute_columns`: expand metadata dataframe based on
attributes in the `sample_attribute` column
- New methods to guess cell/tissue/strain:
`guess_cell_type()`/`guess_tissue_type()`/`guess_strain_type()`
- Improved README and usage instructions
## 0.2.2 (12-03-2018)
### New methods/functionality
- `search_sra()` allows full text search on SRA metadata.
## 0.2.0 (12-03-2018)
### Renamed methods
The following methods have been renamed and the changes are not
compatible with the 0.1.0 release:
- `get_query()` -> `query()`.
- `sra_convert()` -> `sra_metadata()`.
- `get_table_counts()` -> `all_row_counts()`.
### New methods/functionality
- `download_sradb_file()` makes fetching the `SRAmetadb.sqlite` file easy; wget is no longer required.
- The `ftp` protocol is now supported besides `fasp`, and hence `aspera-client` is now optional. However, we strongly recommend `aspera-client` for faster downloads.
### Bug fixes
- Silenced `SettingWithCopyWarning` by explicitly doing
operations on a copy of the dataframe instead of the original.
Besides these, all methods now follow `numpydoc`
compatible documentation.
## 0.1.0 (12-01-2018)
- First release on PyPI.
</details>
================================================
FILE: docs/index.rst
================================================
============
Introduction
============
``pysradb`` provides a simple method to programmatically access metadata
and download sequencing data from NCBI's Sequence Read Archive (SRA) and European Bioinformatics
Institute's European Nucleotide Archive (ENA).
=============
Quick Example
=============
To fetch metadata associated with project accession ``SRP265425``:
.. code-block:: console
$ pysradb metadata SRP265425
study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title instrument instrument_model instrument_model_desc total_spots total_size run_accession run_total_spots run_total_bases
SRP265425 SRX8434255 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 63-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745319 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1311358 83306910 SRR11886735 1311358 109594216
SRP265425 SRX8434254 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 62-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745320 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 2614109 204278682 SRR11886736 2614109 262305651
SRP265425 SRX8434253 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 61-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745318 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 2286312 183516004 SRR11886737 2286312 263304134
SRP265425 SRX8434252 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 60-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745317 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 5202567 507524965 SRR11886738 5202567 781291588
SRP265425 SRX8434251 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 38-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745315 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 3313960 356104406 SRR11886739 3313960 612430817
SRP265425 SRX8434250 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 37-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745316 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 5155733 565882351 SRR11886740 5155733 954342917
SRP265425 SRX8434249 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 36-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745313 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1324589 175619046 SRR11886741 1324589 216531400
SRP265425 SRX8434248 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 35-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745314 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1639851 198973268 SRR11886742 1639851 245466005
SRP265425 SRX8434247 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 68-2020-05-07 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745312 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 3921389 210198580 SRR11886743 3921389 332935558
SRP265425 SRX8434246 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 66-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745311 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 14295475 2150005008 SRR11886744 14295475 2967829315
SRP265425 SRX8434245 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 65-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745310 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 5124692 294846140 SRR11886745 5124692 431819462
SRP265425 SRX8434244 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 64-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745309 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 2986306 205666872 SRR11886746 2986306 275400959
SRP265425 SRX8434243 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 34-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745308 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1182690 59471336 SRR11886747 1182690 86350631
SRP265425 SRX8434242 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 33-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745307 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 6031816 749323230 SRR11886748 6031816 928054297
To fetch detailed metadata, which includes links to raw sequencing files, specify ``--detailed``:
.. code-block:: console
$ pysradb metadata SRP265425 --detailed
run_accession study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title instrument instrument_model instrument_model_desc total_spots total_size run_total_spots run_total_bases run_alias sra_url_alt1 sra_url_alt2 sra_url experiment_alias isolate collected_by collection_date geo_loc_name host host_disease isolation_source lat_lon BioSampleModel sra_url_alt3 ena_fastq_http ena_fastq_http_1 ena_fastq_http_2 ena_fastq_ftp ena_fastq_ftp_1 ena_fastq_ftp_2
SRR11886735 SRP265425 SRX8434255 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 63-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745319 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1311358 83306910 1311358 109594216 IonXpress_063_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam gs://sra-pub-src-9/SRR11886735/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886735/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra0/SRR/011608/SRR11886735 GC-20 NA 02-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/035/SRR11886735/SRR11886735.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/035/SRR11886735/SRR11886735.fastq.gz
SRR11886736 SRP265425 SRX8434254 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 62-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745320 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 2614109 204278682 2614109 262305651 IonXpress_062_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam gs://sra-pub-src-16/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRZ/011886/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra50/SRR/011608/SRR11886736 GC-51 NA 14-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/036/SRR11886736/SRR11886736.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/036/SRR11886736/SRR11886736.fastq.gz
SRR11886737 SRP265425 SRX8434253 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 61-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745318 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 2286312 183516004 2286312 263304134 IonXpress_061_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam gs://sra-pub-src-16/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra29/SRZ/011886/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra17/SRR/011608/SRR11886737 GC-24 NA 07-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/037/SRR11886737/SRR11886737.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/037/SRR11886737/SRR11886737.fastq.gz
SRR11886738 SRP265425 SRX8434252 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 60-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745317 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 5202567 507524965 5202567 781291588 IonXpress_060_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam gs://sra-pub-src-15/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRZ/011886/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/011608/SRR11886738 GC-23 NA 08-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/038/SRR11886738/SRR11886738.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/038/SRR11886738/SRR11886738.fastq.gz
SRR11886739 SRP265425 SRX8434251 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 38-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745315 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 3313960 356104406 3313960 612430817 IonXpress_038_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-13/SRR11886739/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886739/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra24/SRR/011608/SRR11886739 GC-11b NA 24-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/039/SRR11886739/SRR11886739.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/039/SRR11886739/SRR11886739.fastq.gz
SRR11886740 SRP265425 SRX8434250 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 37-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745316 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 5155733 565882351 5155733 954342917 IonXpress_037_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-5/SRR11886740/IonXpress_037_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886740/IonXpress_037_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra13/SRR/011608/SRR11886740 GC-14b NA 28-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/040/SRR11886740/SRR11886740.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/040/SRR11886740/SRR11886740.fastq.gz
SRR11886741 SRP265425 SRX8434249 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 36-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745313 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1324589 175619046 1324589 216531400 IonXpress_036_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-11/SRR11886741/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886741/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra57/SRR/011608/SRR11886741 GC-12 NA 24-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/041/SRR11886741/SRR11886741.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/041/SRR11886741/SRR11886741.fastq.gz
SRR11886742 SRP265425 SRX8434248 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 35-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745314 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1639851 198973268 1639851 245466005 IonXpress_035_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-11/SRR11886742/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886742/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRR/011608/SRR11886742 GC-13 NA 23-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/042/SRR11886742/SRR11886742.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/042/SRR11886742/SRR11886742.fastq.gz
SRR11886743 SRP265425 SRX8434247 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 68-2020-05-07 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745312 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 3921389 210198580 3921389 332935558 IonXpress_068_R_2020_05_07_11_47_51_user_GCEID-S5-60-SARS_CoV2_SA4.bam gs://sra-pub-src-17/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra64/SRZ/011886/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra54/SRR/011608/SRR11886743 GC-55 NA 24-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/043/SRR11886743/SRR11886743.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/043/SRR11886743/SRR11886743.fastq.gz
SRR11886744 SRP265425 SRX8434246 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 66-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745311 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 14295475 2150005008 14295475 2967829315 IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq gs://sra-pub-src-11/SRR11886744/IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886744/IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra20/SRR/011608/SRR11886744 GC-26 NA 07-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/044/SRR11886744/SRR11886744.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/044/SRR11886744/SRR11886744.fastq.gz
SRR11886745 SRP265425 SRX8434245 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 65-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745310 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 5124692 294846140 5124692 431819462 IonXpress_065_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.bam gs://sra-pub-src-16/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRZ/011886/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra19/SRR/011608/SRR11886745 GC-25 NA 10-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/045/SRR11886745/SRR11886745.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/045/SRR11886745/SRR11886745.fastq.gz
SRR11886746 SRP265425 SRX8434244 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 64-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745309 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 2986306 205666872 2986306 275400959 IonXpress_064_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam gs://sra-pub-src-17/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra59/SRZ/011886/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra47/SRR/011608/SRR11886746 GC-21 NA 03-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/046/SRR11886746/SRR11886746.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/046/SRR11886746/SRR11886746.fastq.gz
SRR11886747 SRP265425 SRX8434243 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 34-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745308 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1182690 59471336 1182690 86350631 IonXpress_034_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-16/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRZ/011886/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra13/SRR/011608/SRR11886747 GC-11a NA 24-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/047/SRR11886747/SRR11886747.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/047/SRR11886747/SRR11886747.fastq.gz
SRR11886748 SRP265425 SRX8434242 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 33-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745307 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 6031816 749323230 6031816 928054297 IonXpress_033_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-15/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra43/SRZ/011886/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam https://sra-download.ncbi.nlm.nih.gov/traces/sra66/SRR/011608/SRR11886748 GC-14a NA 28-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/048/SRR11886748/SRR11886748.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/048/SRR11886748/SRR11886748.fastq.gz
See :doc:`quickstart` for other examples.
.. toctree::
:hidden:
:maxdepth: 1
installation
quickstart
cmdline
python-api-usage
case_studies
notebooks
commands
contributing
authors
history
modules
===========
Publication
===========
`pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive <https://f1000research.com/articles/8-532/v1>`_
Presentation slides from BOSC (ISMB-ECCB) 2019: https://f1000research.com/slides/8-1183
===========================================================================
========
Citation
========
Choudhary, Saket. "pysradb: A Python Package to Query next-Generation Sequencing Metadata and Data from NCBI Sequence Read Archive." F1000Research, vol. 8, F1000 (Faculty of 1000 Ltd), Apr. 2019, p. 532 (https://f1000research.com/articles/8-532/v1)
::
@article{Choudhary2019,
doi = {10.12688/f1000research.18676.1},
url = {https://doi.org/10.12688/f1000research.18676.1},
year = {2019},
month = apr,
publisher = {F1000 (Faculty of 1000 Ltd)},
volume = {8},
pages = {532},
author = {Saket Choudhary},
title = {pysradb: A {P}ython package to query next-generation sequencing metadata and data from {NCBI} {S}equence {R}ead {A}rchive},
journal = {F1000Research}
}
Zenodo archive: https://zenodo.org/badge/latestdoi/159590788
Zenodo DOI: 10.5281/zenodo.2306881
================================================
FILE: docs/installation.md
================================================
# Installation
## Stable release
To install pysradb, run this command in your terminal:
``` console
$ pip install pysradb
```
This is the preferred method to install pysradb, as it will always
install the most recent stable release.
If you don't have [pip](https://pip.pypa.io) installed, this [Python
installation
guide](http://docs.python-guide.org/en/latest/starting/installation/)
can guide you through the process.
Alternatively, you may use conda:
``` bash
conda install -c bioconda pysradb
```
This step will install all the dependencies except aspera-client (which
is not required, but highly recommended). If you have an existing
environment with a lot of pre-installed packages, conda might be
[slow](https://github.com/bioconda/bioconda-recipes/issues/13774).
Please consider creating a new environment for `pysradb`:
``` bash
conda create -c bioconda -n pysradb python=3 pysradb
```
## From sources
The source files for pysradb can be downloaded from the [GitHub
repo](https://github.com/saketkc/pysradb).
You can either clone the public repository:
``` console
$ git clone https://github.com/saketkc/pysradb
```
Or download the
[tarball](https://github.com/saketkc/pysradb/tarball/master):
``` console
$ curl -OL https://github.com/saketkc/pysradb/tarball/master
```
Once you have a copy of the source, you can install it with:
``` console
$ pip install .
```
================================================
FILE: docs/make.bat
================================================
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=python -msphinx
)
set SOURCEDIR=.
set BUILDDIR=_build
set SPHINXPROJ=pysradb
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The Sphinx module was not found. Make sure you have Sphinx installed,
echo.then set the SPHINXBUILD environment variable to point to the full
echo.path of the 'sphinx-build' executable. Alternatively you may add the
echo.Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
:end
popd
================================================
FILE: docs/modules.rst
================================================
pysradb
=======
.. toctree::
:maxdepth: 4
pysradb
================================================
FILE: docs/notebooks.rst
================================================
Tutorials & Notebooks
=====================
The following Jupyter notebooks demonstrate various features of pysradb:
.. toctree::
:maxdepth: 1
:caption:
notebooks/README
notebooks/01.Python-API_demo.ipynb
notebooks/02.Commandline_download.ipynb
notebooks/03.ParallelDownload.ipynb
notebooks/04.SRA_to_fastq_conda.ipynb
notebooks/05.Downloading_subsets_of_a_project.ipynb
notebooks/06.Multiple_SRPs.ipynb
notebooks/07.Query_Search.ipynb
notebooks/08.PMC_DOI_Identifiers.ipynb
notebooks/09.Metadata_enrichment.ipynb
You can also view the complete `notebooks directory on GitHub <https://github.com/saketkc/pysradb/tree/develop/notebooks>`_ for additional tutorials and examples.
================================================
FILE: docs/pysradb.rst
================================================
pysradb package
===============
Submodules
----------
pysradb.basedb module
---------------------
.. automodule:: pysradb.basedb
:members:
:undoc-members:
:show-inheritance:
pysradb.cli module
------------------
.. automodule:: pysradb.cli
:members:
:undoc-members:
:show-inheritance:
pysradb.download module
-----------------------
.. automodule:: pysradb.download
:members:
:undoc-members:
:show-inheritance:
pysradb.exceptions module
-------------------------
.. automodule:: pysradb.exceptions
:members:
:undoc-members:
:show-inheritance:
pysradb.filter\_attrs module
----------------------------
.. automodule:: pysradb.filter_attrs
:members:
:undoc-members:
:show-inheritance:
pysradb.geodb module
--------------------
.. automodule:: pysradb.geodb
:members:
:undoc-members:
:show-inheritance:
pysradb.geoweb module
---------------------
.. automodule:: pysradb.geoweb
:members:
:undoc-members:
:show-inheritance:
pysradb.metadata\_enrichment module
-----------------------------------
.. automodule:: pysradb.metadata_enrichment
:members:
:undoc-members:
:show-inheritance:
pysradb.search module
---------------------
.. automodule:: pysradb.search
:members:
:undoc-members:
:show-inheritance:
pysradb.sradb module
--------------------
.. automodule:: pysradb.sradb
:members:
:undoc-members:
:show-inheritance:
pysradb.sraweb module
---------------------
.. automodule:: pysradb.sraweb
:members:
:undoc-members:
:show-inheritance:
pysradb.taxid2name module
-------------------------
.. automodule:: pysradb.taxid2name
:members:
:undoc-members:
:show-inheritance:
pysradb.utils module
--------------------
.. automodule:: pysradb.utils
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: pysradb
:members:
:undoc-members:
:show-inheritance:
================================================
FILE: docs/python-api-usage.md
================================================
# Python API
## Use Case 1: Fetch the metadata table (SRA-runtable)
The simplest use case of `pysradb` is when you know the SRA
project ID (SRP) and simply want to fetch the metadata associated
with it. This is generally reflected in the
`SraRunTable.txt` that you get from NCBI's website. See an
[example](https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP098789) of a
SraRunTable.
``` python
from pysradb import SRAweb
client = SRAweb()
df = client.sra_metadata('SRP098789')
df.head()
```
| study_accession | experiment_accession | experiment_title | run_accession | taxon_id | library_selection | library_layout | library_strategy | library_source | library_name | bases | spots | adapter_spec | avg_read_length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SRP098789 | SRX2536403 | GSM2475997: 1.5 µM PF-067446846, 10 min, rep 1; Homo sapiens; OTHER | SRR5227288 | 9606 | other | SINGLE - | OTHER | TRANSCRIPTOMIC | | 2104142750 | 42082855 | | 50 |
| SRP098789 | SRX2536404 | GSM2475998: 1.5 µM PF-067446846, 10 min, rep 2; Homo sapiens; OTHER | SRR5227289 | 9606 | other | SINGLE - | OTHER | TRANSCRIPTOMIC | | 2082873050 | 41657461 | | 50 |
| SRP098789 | SRX2536405 | GSM2475999: 1.5 µM PF-067446846, 10 min, rep 3; Homo sapiens; OTHER | SRR5227290 | 9606 | other | SINGLE - | OTHER | TRANSCRIPTOMIC | | 2023148650 | 40462973 | | 50 |
| SRP098789 | SRX2536406 | GSM2476000: 0.3 µM PF-067446846, 10 min, rep 1; Homo sapiens; OTHER | SRR5227291 | 9606 | other | SINGLE - | OTHER | TRANSCRIPTOMIC | | 2057165950 | 41143319 | | 50 |
| SRP098789 | SRX2536407 | GSM2476001: 0.3 µM PF-067446846, 10 min, rep 2; Homo sapiens; OTHER | SRR5227292 | 9606 | other | SINGLE - | OTHER | TRANSCRIPTOMIC | | 3027621850 | 60552437 | | 50 |
The metadata is returned as a `pandas` dataframe, so all the regular
select/query operations available through `pandas` apply to it.
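Because the result is an ordinary dataframe, the usual pandas idioms apply. The sketch below builds a small stand-in dataframe by hand from a few columns of the table above, so it runs without network access, and shows a boolean filter, a `query` call and a sort. With real data, `df` would come from `client.sra_metadata(...)`; the string-to-number conversion is an assumption about how the values arrive, so check `df.dtypes` first.

```python
import pandas as pd

# Hand-built stand-in for three rows of the SRP098789 table shown above.
# Values are entered as strings on purpose; real results may already be numeric.
df = pd.DataFrame(
    {
        "experiment_accession": ["SRX2536403", "SRX2536404", "SRX2536407"],
        "run_accession": ["SRR5227288", "SRR5227289", "SRR5227292"],
        "spots": ["42082855", "41657461", "60552437"],
    }
)

# Convert the column we want to compare numerically.
df["spots"] = pd.to_numeric(df["spots"])

# Boolean mask: runs with more than 42 million spots.
deep_runs = df[df["spots"] > 42_000_000]

# The same selection expressed with DataFrame.query.
deep_runs_q = df.query("spots > 42000000")

# Sort by depth, deepest first, and keep two columns.
ranked = df.sort_values("spots", ascending=False)[["run_accession", "spots"]]

print(deep_runs["run_accession"].tolist())
print(ranked.iloc[0]["run_accession"])
```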
## Use Case 2: Downloading an entire project arranged experiment wise
Once you have fetched the metadata and made sure this is the project
you were looking for, you will want to download everything at once.
NCBI follows this hierarchy: `SRP => SRX => SRR`. Each
`SRP` (project) has multiple `SRX` (experiments)
and each `SRX` in turn has multiple `SRR` (runs)
inside it. We want to mimic this hierarchy in our downloads. The reason
is simple: in most cases you care about `SRX` the
most, and want to "merge" your SRRs in one way or the other.
Having this hierarchy ensures your downstream code can handle such
cases easily, without worrying about which runs (SRR) need to be merged.
We strongly recommend installing `aspera-client` which uses
UDP and is [designed to be faster](http://www.skullbox.net/tcpudp.php).
``` python
from pysradb import SRAweb
client = SRAweb()
df = client.sra_metadata('SRP017942')
client.download(df=df)
```
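To make the SRP => SRX => SRR layout concrete, the following sketch groups run accessions by experiment and derives one target path per run. It uses plain dictionaries rather than pysradb. The accessions are copied from the SRP017942 example used on this page, while the `out_dir/SRP/SRX/SRR.sra` path pattern and the helper names are assumptions made for illustration, not a statement of how `download` names its files.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# (study, experiment, run) records; accessions from the SRP017942 example.
records = [
    ("SRP017942", "SRX217028", "SRR648667"),
    ("SRP017942", "SRX217029", "SRR648668"),
    ("SRP017942", "SRX217030", "SRR648669"),
]


def group_runs(rows):
    """Build a nested mapping: study -> experiment -> list of runs."""
    tree = defaultdict(lambda: defaultdict(list))
    for study, experiment, run in rows:
        tree[study][experiment].append(run)
    return tree


def planned_paths(tree, out_dir="pysradb_downloads"):
    """Return one hypothetical target path per run, mirroring the hierarchy."""
    paths = []
    for study, experiments in tree.items():
        for experiment, runs in experiments.items():
            for run in runs:
                target = PurePosixPath(out_dir, study, experiment, f"{run}.sra")
                paths.append(str(target))
    return paths


tree = group_runs(records)
for path in planned_paths(tree):
    print(path)
```

Grouping by SRX first means that all runs of an experiment land in one directory, which keeps any later merge step simple.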
## Use Case 3: Downloading a subset of experiments
Often, you need to process only a subset of the samples from a project
(SRP). Consider this project, which has data spanning several library strategies.
``` python
df = client.sra_metadata('SRP000941')
print(df.library_strategy.unique())
# ['ChIP-Seq' 'Bisulfite-Seq' 'RNA-Seq' 'WGS' 'OTHER']
```
But you might be interested only in analyzing the `RNA-seq`
samples and want to download just that subset. This is simple
using `pysradb`, since the metadata can be subset just as you
would subset a dataframe in pandas.
``` python
df_rna = df[df.library_strategy == 'RNA-Seq']
client.download(df=df_rna, out_dir='/pysradb_downloads')
```
## Use Case 4: Getting cell-type/treatment information from sample_attributes
Cell type/tissue information is usually hidden in the
`sample_attributes` column, which can be expanded:
``` python
from pysradb.filter_attrs import expand_sample_attribute_columns
df = client.sra_metadata('SRP017942')
expand_sample_attribute_columns(df).head()
```
<table>
<thead>
<tr class="header">
<th>study_accession</th>
<th>experiment_accession</th>
<th>experiment_title</th>
<th>experiment_attribute</th>
<th>sample_attribute</th>
<th>run_accession</th>
<th>taxon_id</th>
<th>library_selection</th>
<th>library_layout</th>
<th>library_strategy</th>
<th>library_source</th>
<th>library_name</th>
<th>bases</th>
<th>spots</th>
<th>adapter_spec</th>
<th>avg_read_length</th>
<th>assay_type</th>
<th>cell_line</th>
<th>source_name</th>
<th>transfected_with</th>
<th>treatment</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>SRP017942</td>
<td>SRX217028</td>
<td>GSM1063575: 293T_GFP; Homo sapiens; RNA-Seq</td>
<td>GEO Accession: GSM1063575</td>
<td>source_name: 293T cells || cell line: 293T cells || transfected with: 3XFLAG-GFP || assay type: Riboseq</td>
<td>SRR648667</td>
<td>9606</td>
<td>other</td>
<td>SINGLE -</td>
<td>RNA-Seq</td>
<td>TRANSCRIPTOMIC</td>
<td></td>
<td>1806641316</td>
<td>50184481</td>
<td></td>
<td>36</td>
<td>riboseq</td>
<td>293t cells</td>
<td>293t cells</td>
<td>3xflag-gfp</td>
<td>NaN</td>
</tr>
<tr class="even">
<td>SRP017942</td>
<td>SRX217029</td>
<td>GSM1063576: 293T_GFP_2hrs_severe_Heat_Shock; Homo sapiens; RNA-Seq</td>
<td>GEO Accession: GSM1063576</td>
<td>source_name: 293T cells || cell line: 293T cells || transfected with: 3XFLAG-GFP || treatment: severe heat shock (44C 2 hours) || assay type: Riboseq</td>
<td>SRR648668</td>
<td>9606</td>
<td>other</td>
<td>SINGLE -</td>
<td>RNA-Seq</td>
<td>TRANSCRIPTOMIC</td>
<td></td>
<td>3436984836</td>
<td>95471801</td>
<td></td>
<td>36</td>
<td>riboseq</td>
<td>293t cells</td>
<td>293t cells</td>
<td>3xflag-gfp</td>
<td>severe heat shock (44c 2 hours)</td>
</tr>
<tr class="odd">
<td>SRP017942</td>
<td>SRX217030</td>
<td>GSM1063577: 293T_Hspa1a; Homo sapiens; RNA-Seq</td>
<td>GEO Accession: GSM1063577</td>
<td>source_name: 293T cells || cell line: 293T cells || transfected with: 3XFLAG-Hspa1a || assay type: Riboseq</td>
<td>SRR648669</td>
<td>9606</td>
<td>other</td>
<td>SINGLE -</td>
<td>RNA-Seq</td>
<td>TRANSCRIPTOMIC</td>
<td></td>
<td>3330909216</td>
<td>92525256</td>
<td></td>
<td>36</td>
<td>riboseq</td>
<td>293t cells</td>
<td>293t cells</td>
<td>3xflag-hspa1a</td>
<td>NaN</td>
</tr>
<tr class="even">
<td>SRP017942</td>
<td>SRX217031</td>
<td>GSM1063578: 293T_Hspa1a_2hrs_severe_Heat_Shock; Homo sapiens; RNA-Seq</td>
<td>GEO Accession: GSM1063578</td>
<td>source_name: 293T cells || cell line: 293T cells || transfected with: 3XFLAG-Hspa1a || treatment: severe heat shock (44C 2 hours) || assay type: Riboseq</td>
<td>SRR648670</td>
<td>9606</td>
<td>other</td>
<td>SINGLE -</td>
<td>RNA-Seq</td>
<td>TRANSCRIPTOMIC</td>
<td></td>
<td>3622123512</td>
<td>100614542</td>
<td></td>
<td>36</td>
<td>riboseq</td>
<td>293t cells</td>
<td>293t cells</td>
<td>3xflag-hspa1a</td>
<td>severe heat shock (44c 2 hours)</td>
</tr>
<tr class="odd">
<td>SRP017942</td>
<td>SRX217956</td>
<td>GSM794854: 3T3-Control-Riboseq; Mus musculus; RNA-Seq</td>
<td>GEO Accession: GSM794854</td>
<td>source_name: 3T3 cells || treatment: control || cell line: 3T3 cells || assay type: Riboseq</td>
<td>SRR649752</td>
<td>10090</td>
<td>cDNA</td>
<td>SINGLE -</td>
<td>RNA-Seq</td>
<td>TRANSCRIPTOMIC</td>
<td></td>
<td>594945396</td>
<td>16526261</td>
<td></td>
<td>36</td>
<td>riboseq</td>
<td>3t3 cells</td>
<td>3t3 cells</td>
<td>NaN</td>
<td>control</td>
</tr>
</tbody>
</table>
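The expansion can be pictured as splitting each `sample_attribute` string on `||` and then on the first `:`. The sketch below is a simplified stand-in written for this page, not the code behind `expand_sample_attribute_columns`. The lower-casing of values and the underscore-for-space key normalisation are inferred from the table above, and the function name is hypothetical.

```python
def parse_sample_attribute(text):
    """Split 'key: value || key: value' into a dict with normalised keys."""
    parsed = {}
    for chunk in text.split("||"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Split on the first colon only, since values may contain colons.
        key, sep, value = chunk.partition(":")
        if not sep:
            continue  # fragment without 'key: value' structure, skip it
        key = key.strip().lower().replace(" ", "_")
        parsed[key] = value.strip().lower()
    return parsed


example = (
    "source_name: 293T cells || cell line: 293T cells || "
    "transfected with: 3XFLAG-GFP || assay type: Riboseq"
)
print(parse_sample_attribute(example))
```

Attributes missing from a given sample (such as `treatment` here) simply produce no key, which corresponds to the `NaN` entries once the rows are combined into one table.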
## Use Case 5: Searching for datasets
Another common operation on SRA is plain-text search.
To look up all projects where `ribosome profiling`
appears somewhere in the description:
``` python
df = client.search_sra(search_str='"ribosome profiling"')
df.head()
```
<table>
<thead>
<tr class="header">
<th>study_accession</th>
<th>experiment_accession</th>
<th>experiment_title</th>
<th>run_accession</th>
<th>taxon_id</th>
<th>library_selection</th>
<th>library_layout</th>
<th>library_strategy</th>
<th>library_source</th>
<th>library_name</th>
<th>bases</th>
<th>spots</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>DRP003075</td>
<td>DRX019536</td>
<td>Illumina Genome Analyzer IIx sequencing of SAMD00018584</td>
<td>DRR021383</td>
<td>83333</td>
<td>other</td>
<td>SINGLE -</td>
<td>OTHER</td>
<td>TRANSCRIPTOMIC</td>
<td>GAII05_3</td>
<td>978776480</td>
<td>12234706</td>
</tr>
<tr class="even">
<td>DRP003075</td>
<td>DRX019537</td>
<td>Illumina Genome Analyzer IIx sequencing of SAMD00018585</td>
<td>DRR021384</td>
<td>83333</td>
<td>other</td>
<td>SINGLE -</td>
<td>OTHER</td>
<td>TRANSCRIPTOMIC</td>
<td>GAII05_4</td>
<td>894201680</td>
<td>11177521</td>
</tr>
<tr class="odd">
<td>DRP003075</td>
<td>DRX019538</td>
<td>Illumina Genome Analyzer IIx sequencing of SAMD00018586</td>
<td>DRR021385</td>
<td>83333</td>
<td>other</td>
<td>SINGLE -</td>
<td>OTHER</td>
<td>TRANSCRIPTOMIC</td>
<td>GAII05_5</td>
<td>931536720</td>
<td>11644209</td>
</tr>
<tr class="even">
<td>DRP003075</td>
<td>DRX019540</td>
<td>Illumina Genome Analyzer IIx sequencing of SAMD00018588</td>
<td>DRR021387</td>
<td>83333</td>
<td>other</td>
<td>SINGLE -</td>
<td>OTHER</td>
<td>TRANSCRIPTOMIC</td>
<td>GAII07_4</td>
<td>2759398700</td>
<td>27593987</td>
</tr>
<tr class="odd">
<td>DRP003075</td>
<td>DRX019541</td>
<td>Illumina Genome Analyzer IIx sequencing of SAMD00018589</td>
<td>DRR021388</td>
<td>83333</td>
<td>other</td>
<td>SINGLE -</td>
<td>OTHER</td>
<td>TRANSCRIPTOMIC</td>
<td>GAII07_5</td>
<td>2386196500</td>
<td>23861965</td>
</tr>
</tbody>
</table>
Again, the results are available as a `pandas` dataframe, so
you can perform all subset operations on them after your query. Your query
doesn't need to be exact.
## Use Case 6: Finding publications (PMIDs) associated with SRA data
Sometimes you have SRA accessions and want to find the publications that describe the data generation.
``` python
from pysradb import SRAweb
client = SRAweb()
# Get PMIDs for a study accession (SRP)
pmids_df = client.srp_to_pmid('SRP002605')
pmids_df.head()
```
| sra_accession | bioproject | pmid |
|---|---|---|
| SRP002605 | PRJNA129385 | 20703300 |
You can also get PMIDs for other SRA accession types:
``` python
# Get PMIDs for run accessions (SRR)
srr_pmids = client.srr_to_pmid('SRR057511')
# Get PMIDs for experiment accessions (SRX)
srx_pmids = client.srx_to_pmid('SRX021967')
# Get PMIDs for sample accessions (SRS)
srs_pmids = client.srs_to_pmid('SRS079386')
# Get PMIDs for multiple accessions at once
multi_pmids = client.sra_to_pmid(['SRP002605', 'SRP016501'])
```
You can also directly query BioProject accessions for their associated publications:
``` python
# Get PMIDs directly from BioProject accessions
bioproject_pmids = client.fetch_bioproject_pmids(['PRJNA257197', 'PRJNA129385'])
print(bioproject_pmids)
# Output: {'PRJNA257197': ['25214632'], 'PRJNA129385': ['20703300']}
```
**Note**: This functionality relies on the cross-references maintained between BioProjects and PubMed. Not all SRA datasets have associated publications, and some publications may not be properly cross-referenced in the NCBI databases. The success rate depends on:
- Whether the authors included SRA/BioProject accessions in their manuscript
- Whether NCBI has established the cross-references
- The publication date relative to data submission
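Since some accessions will come back without a publication, downstream code should handle both outcomes. The sketch below works on a plain dictionary in the shape printed above; it assumes that a missing cross-reference shows up as an empty list, which is not confirmed here. The accession `PRJNA000000` and the helper name are made up for illustration.

```python
# Mapping in the shape shown above. 'PRJNA000000' is a made-up accession added
# to illustrate a BioProject without a cross-referenced publication.
bioproject_pmids = {
    "PRJNA257197": ["25214632"],
    "PRJNA129385": ["20703300"],
    "PRJNA000000": [],
}


def split_by_publication(mapping):
    """Return (rows, missing): one (bioproject, pmid) row per PMID,
    plus the accessions that have no PMID at all."""
    rows, missing = [], []
    for accession, pmids in mapping.items():
        if not pmids:
            missing.append(accession)
            continue
        rows.extend((accession, pmid) for pmid in pmids)
    return rows, missing


rows, missing = split_by_publication(bioproject_pmids)
for accession, pmid in rows:
    print(f"{accession}\thttps://pubmed.ncbi.nlm.nih.gov/{pmid}/")
print("No publication found for:", ", ".join(missing))
```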
================================================
FILE: docs/quickstart.md
================================================
# Quickstart
Most features in `pysradb` are accessible both from the command line and
as a Python package. Select the corresponding tab below to see the usage
for each interface.
```{note}
If you have any questions along the way, please head over to the
[Python API Usage](python-api-usage.md) or the
[Command Line](cmdline.md) for more information. You may
also wish to refer to the [API Documentation](commands.rst).
```
------------------------------------------------------------------------
## Notebooks
A Google Colaboratory version of the most-used commands is available in
this [Colab
Notebook](https://colab.research.google.com/drive/1C60V-jkcNZiaCra_V5iEyFs318jgVoUR).
Colab runs Python 3.6 while `pysradb` requires Python 3.7+, so
the notebooks no longer run on Colab, but they can be downloaded and run
locally.
The following notebooks document the features of `pysradb`:
1. [Python
API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/01.Python-API_demo.ipynb)
2. [Downloading datasets from SRA - command
line](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/02.Commandline_download.ipynb)
3. [Download multiple datasets in parallel - Python
API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/03.ParallelDownload.ipynb)
4. [Converting SRA-to-fastq - command line (requires
conda)](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/04.SRA_to_fastq_conda.ipynb)
5. [Downloading subsets of a project - Python
API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/05.Downloading_subsets_of_a_project.ipynb)
6. [Download
BAMs](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/06.Download_BAMs.ipynb)
7. [Metadata for multiple
SRPs](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/07.Multiple_SRPs.ipynb)
8. [Multithreaded fastq downloads using Aspera
Client](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/08.pysradb_ascp_multithreaded.ipynb)
9. [Searching
SRA/GEO/ENA](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/09.Query_Search.ipynb)
## Metadata
`pysradb` makes it very easy to obtain metadata from SRA/EBI:
`````{tabs}
````{tab} Console
``` bash
$ pysradb metadata SRP265425
```
````
````{tab} Python
``` python
from pysradb.sraweb import SRAweb
client = SRAweb()
df = client.sra_metadata("SRP265425")
df
```
````
`````
Output:
| study_accession | experiment_accession | experiment_title | experiment_desc | organism_taxid | organism_name | library_name | library_strategy | library_source | library_selection | library_layout | sample_accession | sample_title | instrument | instrument_model | instrument_model_desc | total_spots | total_size | run_accession | run_total_spots | run_total_bases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SRP265425 | SRX8434255 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 63-2020-04-22 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745319 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 1311358 | 83306910 | SRR11886735 | 1311358 | 109594216 |
| SRP265425 | SRX8434254 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 62-2020-04-22 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745320 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 2614109 | 204278682 | SRR11886736 | 2614109 | 262305651 |
| SRP265425 | SRX8434253 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 61-2020-04-22 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745318 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 2286312 | 183516004 | SRR11886737 | 2286312 | 263304134 |
| SRP265425 | SRX8434252 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 60-2020-04-22 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745317 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 5202567 | 507524965 | SRR11886738 | 5202567 | 781291588 |
| SRP265425 | SRX8434251 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 38-2020-04-03 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745315 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 3313960 | 356104406 | SRR11886739 | 3313960 | 612430817 |
| SRP265425 | SRX8434250 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 37-2020-04-03 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745316 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 5155733 | 565882351 | SRR11886740 | 5155733 | 954342917 |
| SRP265425 | SRX8434249 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 36-2020-04-03 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745313 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 1324589 | 175619046 | SRR11886741 | 1324589 | 216531400 |
| SRP265425 | SRX8434248 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 35-2020-04-03 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745314 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 1639851 | 198973268 | SRR11886742 | 1639851 | 245466005 |
| SRP265425 | SRX8434247 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 68-2020-05-07 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745312 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 3921389 | 210198580 | SRR11886743 | 3921389 | 332935558 |
| SRP265425 | SRX8434246 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 66-2020-04-22 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745311 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 14295475 | 2150005008 | SRR11886744 | 14295475 | 2967829315 |
| SRP265425 | SRX8434245 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 65-2020-04-22 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745310 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 5124692 | 294846140 | SRR11886745 | 5124692 | 431819462 |
| SRP265425 | SRX8434244 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 64-2020-04-22 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745309 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 2986306 | 205666872 | SRR11886746 | 2986306 | 275400959 |
| SRP265425 | SRX8434243 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 34-2020-04-03 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745308 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 1182690 | 59471336 | SRR11886747 | 1182690 | 86350631 |
| SRP265425 | SRX8434242 | Ampliseq of SARS-CoV-2 | Ampliseq of SARS-CoV-2 | 2697049 | Severe acute respiratory syndrome coronavirus 2 | 33-2020-04-03 | AMPLICON | VIRAL RNA | RT-PCR | SINGLE | SRS6745307 | | Ion Torrent S5 XL | Ion Torrent S5 XL | ION_TORRENT | 6031816 | 749323230 | SRR11886748 | 6031816 | 928054297 |
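Once the metadata is in tabular form, it can be summarised with a few lines of standard-library Python. The sketch below assumes the table has been saved as tab-separated text: the inline `TSV` string stands in for such a file and holds three columns from the first three rows of the output above. It totals `run_total_spots`; the helper name `total_spots` is hypothetical.

```python
import csv
import io

# Stand-in for a saved metadata table: three columns from the first rows above.
TSV = (
    "experiment_accession\trun_accession\trun_total_spots\n"
    "SRX8434255\tSRR11886735\t1311358\n"
    "SRX8434254\tSRR11886736\t2614109\n"
    "SRX8434253\tSRR11886737\t2286312\n"
)


def total_spots(handle):
    """Sum the run_total_spots column of a tab-separated metadata table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return sum(int(row["run_total_spots"]) for row in reader)


print(total_spots(io.StringIO(TSV)))
```

The same function accepts an open file handle, so a table saved to disk can be passed in place of the `io.StringIO` wrapper.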
Additionally, to obtain locations of `.fastq/.sra` files and other
metadata:
`````{tabs}
````{tab} Console
``` bash
$ pysradb metadata SRP265425 --detailed
```
````
````{tab} Python
``` python
from pysradb.sraweb import SRAweb
client = SRAweb()
df = client.sra_metadata("SRP265425", detailed=True)
df
```
````
`````
Output:
run_accession study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title instrument instrument_model instrument_model_desc total_spots total_size run_total_spots run_total_bases run_alias sra_url_alt1 sra_url_alt2 sra_url experiment_alias isolate collected_by collection_date geo_loc_name host host_disease isolation_source lat_lon BioSampleModel sra_url_alt3 ena_fastq_http ena_fastq_http_1 ena_fastq_http_2 ena_fastq_ftp ena_fastq_ftp_1 ena_fastq_ftp_2
SRR11886735 SRP265425 SRX8434255 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 63-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745319 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1311358 83306910 1311358 109594216 IonXpress_063_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam gs://sra-pub-src-9/SRR11886735/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886735/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra0/SRR/011608/SRR11886735 GC-20 NA 02-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/035/SRR11886735/SRR11886735.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/035/SRR11886735/SRR11886735.fastq.gz
SRR11886736 SRP265425 SRX8434254 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 62-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745320 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 2614109 204278682 2614109 262305651 IonXpress_062_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam gs://sra-pub-src-16/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRZ/011886/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra50/SRR/011608/SRR11886736 GC-51 NA 14-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/036/SRR11886736/SRR11886736.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/036/SRR11886736/SRR11886736.fastq.gz
SRR11886737 SRP265425 SRX8434253 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 61-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745318 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 2286312 183516004 2286312 263304134 IonXpress_061_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam gs://sra-pub-src-16/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra29/SRZ/011886/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra17/SRR/011608/SRR11886737 GC-24 NA 07-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/037/SRR11886737/SRR11886737.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/037/SRR11886737/SRR11886737.fastq.gz
SRR11886738 SRP265425 SRX8434252 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 60-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745317 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 5202567 507524965 5202567 781291588 IonXpress_060_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam gs://sra-pub-src-15/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRZ/011886/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/011608/SRR11886738 GC-23 NA 08-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/038/SRR11886738/SRR11886738.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/038/SRR11886738/SRR11886738.fastq.gz
SRR11886739 SRP265425 SRX8434251 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 38-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745315 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 3313960 356104406 3313960 612430817 IonXpress_038_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-13/SRR11886739/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886739/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra24/SRR/011608/SRR11886739 GC-11b NA 24-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/039/SRR11886739/SRR11886739.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/039/SRR11886739/SRR11886739.fastq.gz
SRR11886740 SRP265425 SRX8434250 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 37-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745316 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 5155733 565882351 5155733 954342917 IonXpress_037_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-5/SRR11886740/IonXpress_037_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886740/IonXpress_037_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra13/SRR/011608/SRR11886740 GC-14b NA 28-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/040/SRR11886740/SRR11886740.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/040/SRR11886740/SRR11886740.fastq.gz
SRR11886741 SRP265425 SRX8434249 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 36-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745313 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1324589 175619046 1324589 216531400 IonXpress_036_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-11/SRR11886741/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886741/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra57/SRR/011608/SRR11886741 GC-12 NA 24-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/041/SRR11886741/SRR11886741.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/041/SRR11886741/SRR11886741.fastq.gz
SRR11886742 SRP265425 SRX8434248 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 35-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745314 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1639851 198973268 1639851 245466005 IonXpress_035_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-11/SRR11886742/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886742/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRR/011608/SRR11886742 GC-13 NA 23-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/042/SRR11886742/SRR11886742.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/042/SRR11886742/SRR11886742.fastq.gz
SRR11886743 SRP265425 SRX8434247 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 68-2020-05-07 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745312 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 3921389 210198580 3921389 332935558 IonXpress_068_R_2020_05_07_11_47_51_user_GCEID-S5-60-SARS_CoV2_SA4.bam gs://sra-pub-src-17/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra64/SRZ/011886/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra54/SRR/011608/SRR11886743 GC-55 NA 24-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/043/SRR11886743/SRR11886743.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/043/SRR11886743/SRR11886743.fastq.gz
SRR11886744 SRP265425 SRX8434246 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 66-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745311 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 14295475 2150005008 14295475 2967829315 IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq gs://sra-pub-src-11/SRR11886744/IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886744/IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra20/SRR/011608/SRR11886744 GC-26 NA 07-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/044/SRR11886744/SRR11886744.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/044/SRR11886744/SRR11886744.fastq.gz
SRR11886745 SRP265425 SRX8434245 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 65-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745310 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 5124692 294846140 5124692 431819462 IonXpress_065_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.bam gs://sra-pub-src-16/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRZ/011886/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra19/SRR/011608/SRR11886745 GC-25 NA 10-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/045/SRR11886745/SRR11886745.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/045/SRR11886745/SRR11886745.fastq.gz
SRR11886746 SRP265425 SRX8434244 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 64-2020-04-22 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745309 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 2986306 205666872 2986306 275400959 IonXpress_064_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam gs://sra-pub-src-17/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra59/SRZ/011886/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra47/SRR/011608/SRR11886746 GC-21 NA 03-Apr-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/046/SRR11886746/SRR11886746.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/046/SRR11886746/SRR11886746.fastq.gz
SRR11886747 SRP265425 SRX8434243 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 34-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745308 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 1182690 59471336 1182690 86350631 IonXpress_034_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-16/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRZ/011886/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta https://sra-download.ncbi.nlm.nih.gov/traces/sra13/SRR/011608/SRR11886747 GC-11a NA 24-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/047/SRR11886747/SRR11886747.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/047/SRR11886747/SRR11886747.fastq.gz
SRR11886748 SRP265425 SRX8434242 Ampliseq of SARS-CoV-2 Ampliseq of SARS-CoV-2 2697049 Severe acute respiratory syndrome coronavirus 2 33-2020-04-03 AMPLICON VIRAL RNA RT-PCR SINGLE SRS6745307 Ion Torrent S5 XL Ion Torrent S5 XL ION_TORRENT 6031816 749323230 6031816 928054297 IonXpress_033_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam gs://sra-pub-src-15/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra43/SRZ/011886/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam https://sra-download.ncbi.nlm.nih.gov/traces/sra66/SRR/011608/SRR11886748 GC-14a NA 28-Mar-2020 Australia: Victoria Homo sapiens COVID-19 swab NA Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/048/SRR11886748/SRR11886748.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/048/SRR11886748/SRR11886748.fastq.gz
## Converting between accession numbers
`pysradb` provides a suite of commands for converting between
accession numbers.
### Convert SRP to SRX
`````{tabs}
````{tab} Console
``` bash
$ pysradb srp-to-srx SRP098789
```
````
````{tab} Python
``` python
from pysradb.sraweb import SRAweb
client = SRAweb()
df = client.srp_to_srx("SRP098789")
df
```
````
`````
Output:
study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_strategy library_source library_selection sample_accession sample_title instrument total_spots total_size run_accession run_total_spots run_total_bases study_accesssion
SRP098789 SRX2536428 GSM2476022: vehicle, 60 min, rep 5-mRNAseq; Homo sapiens; RNA-Seq GSM2476022: vehicle, 60 min, rep 5-mRNAseq; Homo sapiens; RNA-Seq 9606 Homo sapiens RNA-Seq TRANSCRIPTOMIC cDNA SRS1956378 Illumina HiSeq 2500 69422931 1545681856 SRR5227313 69422931 3540569481 SRP098789
SRP098789 SRX2536427 GSM2476021: PF-06446846, 60 min, rep 5 -mRNA-seq; Homo sapiens; OTHER GSM2476021: PF-06446846, 60 min, rep 5 -mRNA-seq; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956377 Illumina HiSeq 2500 58065134 1302369810 SRR5227312 58065134 2961321834 SRP098789
SRP098789 SRX2536426 GSM2476020: vehicle, 60 min, rep 4-mRNAseq; Homo sapiens; RNA-Seq GSM2476020: vehicle, 60 min, rep 4-mRNAseq; Homo sapiens; RNA-Seq 9606 Homo sapiens RNA-Seq TRANSCRIPTOMIC cDNA SRS1956376 Illumina HiSeq 2500 63720205 1416818619 SRR5227311 63720205 3249730455 SRP098789
SRP098789 SRX2536425 GSM2476019: PF-06446846, 60 min, rep 4 -mRNA-seq; Homo sapiens; OTHER GSM2476019: PF-06446846, 60 min, rep 4 -mRNA-seq; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956375 Illumina HiSeq 2500 66363585 1482728577 SRR5227310 66363585 3384542835 SRP098789
SRP098789 SRX2536424 GSM2476018: vehicle, 60 min, rep 5-Ribo-seq; Homo sapiens; RNA-Seq GSM2476018: vehicle, 60 min, rep 5-Ribo-seq; Homo sapiens; RNA-Seq 9606 Homo sapiens RNA-Seq TRANSCRIPTOMIC cDNA SRS1956374 Illumina HiSeq 2500 40062613 904488287 SRR5227309 40062613 2043193263 SRP098789
SRP098789 SRX2536423 GSM2476017: 1.5 ?M PF-067446846, 60 min, rep 5 -riboseq; Homo sapiens; OTHER GSM2476017: 1.5 ?M PF-067446846, 60 min, rep 5 -riboseq; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956373 Illumina HiSeq 2500 65591217 1499668100 SRR5227308 65591217 3345152067 SRP098789
SRP098789 SRX2536422 GSM2476016: Vehicle, 60 min, rep 4-ribo-seq; Homo sapiens; RNA-Seq GSM2476016: Vehicle, 60 min, rep 4-ribo-seq; Homo sapiens; RNA-Seq 9606 Homo sapiens RNA-Seq TRANSCRIPTOMIC cDNA SRS1956372 Illumina HiSeq 2500 66480991 1564636133 SRR5227307 66480991 3390530541 SRP098789
SRP098789 SRX2536421 GSM2476015: 1.5 ?M PF-067446846, 60 min, rep 4 -riboseq; Homo sapiens; OTHER GSM2476015: 1.5 ?M PF-067446846, 60 min, rep 4 -riboseq; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956371 Illumina HiSeq 2500 57588015 1357395400 SRR5227306 57588015 2936988765 SRP098789
SRP098789 SRX2536420 GSM2476014: vehicle, 60 min rep 3; Homo sapiens; OTHER GSM2476014: vehicle, 60 min rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956370 Illumina HiSeq 2000 48405034 1530784033 SRR5227305 48405034 2420251700 SRP098789
SRP098789 SRX2536419 GSM2476013: vehicle, 60 min rep 2; Homo sapiens; OTHER GSM2476013: vehicle, 60 min rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956369 Illumina HiSeq 2000 47139057 1489018603 SRR5227304 47139057 2356952850 SRP098789
SRP098789 SRX2536418 GSM2476012: vehicle, 60 min rep 1; Homo sapiens; OTHER GSM2476012: vehicle, 60 min rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956368 Illumina HiSeq 2000 50956178 1495757884 SRR5227303 50956178 2547808900 SRP098789
SRP098789 SRX2536417 GSM2476011: 0.3 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER GSM2476011: 0.3 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956367 Illumina HiSeq 2000 44258180 1404548468 SRR5227302 44258180 2212909000 SRP098789
SRP098789 SRX2536416 GSM2476010: 0.3 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER GSM2476010: 0.3 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956366 Illumina HiSeq 2000 49129512 1536091510 SRR5227301 49129512 2456475600 SRP098789
SRP098789 SRX2536415 GSM2476009: 0.3 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER GSM2476009: 0.3 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956365 Illumina HiSeq 2000 30043362 903983724 SRR5227300 30043362 1502168100 SRP098789
SRP098789 SRX2536414 GSM2476008: 1.5 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER GSM2476008: 1.5 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956364 Illumina HiSeq 2000 48766213 1530350854 SRR5227299 48766213 2438310650 SRP098789
SRP098789 SRX2536413 GSM2476007: 1.5 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER GSM2476007: 1.5 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956363 Illumina HiSeq 2000 49334392 1475414353 SRR5227298 49334392 2466719600 SRP098789
SRP098789 SRX2536412 GSM2476006: 1.5 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER GSM2476006: 1.5 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956362 Illumina HiSeq 2000 60381365 1801283052 SRR5227297 60381365 3019068250 SRP098789
SRP098789 SRX2536411 GSM2476005: vehicle, 10 min rep 3; Homo sapiens; OTHER GSM2476005: vehicle, 10 min rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956361 Illumina HiSeq 2000 52737784 1644829192 SRR5227296 52737784 2636889200 SRP098789
SRP098789 SRX2536410 GSM2476004: vehicle, 10 min rep 2; Homo sapiens; OTHER GSM2476004: vehicle, 10 min rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956360 Illumina HiSeq 2000 46137148 1455541408 SRR5227295 46137148 2306857400 SRP098789
SRP098789 SRX2536409 GSM2476003: vehicle, 10 min rep 1; Homo sapiens; OTHER GSM2476003: vehicle, 10 min rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956359 Illumina HiSeq 2000 76002122 1552821132 SRR5227294 76002122 3800106100 SRP098789
SRP098789 SRX2536408 GSM2476002: 0.3 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER GSM2476002: 0.3 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956358 Illumina HiSeq 2000 42709138 1338829352 SRR5227293 42709138 2135456900 SRP098789
SRP098789 SRX2536407 GSM2476001: 0.3 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER GSM2476001: 0.3 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956357 Illumina HiSeq 2000 60552437 1875910244 SRR5227292 60552437 3027621850 SRP098789
SRP098789 SRX2536406 GSM2476000: 0.3 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER GSM2476000: 0.3 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956356 Illumina HiSeq 2000 41143319 843881081 SRR5227291 41143319 2057165950 SRP098789
SRP098789 SRX2536405 GSM2475999: 1.5 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER GSM2475999: 1.5 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956355 Illumina HiSeq 2000 40462973 1287284933 SRR5227290 40462973 2023148650 SRP098789
SRP098789 SRX2536404 GSM2475998: 1.5 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER GSM2475998: 1.5 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956354 Illumina HiSeq 2000 41657461 1360366732 SRR5227289 41657461 2082873050 SRP098789
SRP098789 SRX2536403 GSM2475997: 1.5 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER GSM2475997: 1.5 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956353 Illumina HiSeq 2000 42082855 916745706 SRR5227288 42082855 2104142750 SRP098789
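The result can be reshaped with ordinary pandas operations, assuming the Python call returns a pandas DataFrame with the columns shown above. The sketch below rebuilds three rows of that output by hand so that it runs offline, then collects run accessions per experiment. The helper name `runs_per_experiment` is hypothetical and not part of `pysradb`.

```python
import pandas as pd

# Three rows copied from the output above; a real call would be
# SRAweb().srp_to_srx("SRP098789"), which needs network access.
df = pd.DataFrame(
    {
        "study_accession": ["SRP098789"] * 3,
        "experiment_accession": ["SRX2536428", "SRX2536427", "SRX2536426"],
        "run_accession": ["SRR5227313", "SRR5227312", "SRR5227311"],
    }
)


def runs_per_experiment(frame):
    """Map each experiment accession to the sorted list of its run accessions."""
    grouped = frame.groupby("experiment_accession")["run_accession"]
    return {srx: sorted(runs) for srx, runs in grouped}


mapping = runs_per_experiment(df)
```

In this project each experiment has a single run, so every list holds one entry; the same helper would also cover experiments with several runs.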
### Convert GSE to SRP
`````{tabs}
````{tab} Console
``` bash
$ pysradb srp-to-srx SRP098789
```
````
````{tab} Python
``` python
from pysradb.sraweb import SRAweb
client = SRAweb()
df = client.srp_to_srx("SRP098789")
df
```
````
`````
Output:
study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_strategy library_source library_selection sample_accession sample_title instrument total_spots total_size run_accession run_total_spots run_total_bases study_accesssion
SRP098789 SRX2536428 GSM2476022: vehicle, 60 min, rep 5-mRNAseq; Homo sapiens; RNA-Seq GSM2476022: vehicle, 60 min, rep 5-mRNAseq; Homo sapiens; RNA-Seq 9606 Homo sapiens RNA-Seq TRANSCRIPTOMIC cDNA SRS1956378 Illumina HiSeq 2500 69422931 1545681856 SRR5227313 69422931 3540569481 SRP098789
SRP098789 SRX2536427 GSM2476021: PF-06446846, 60 min, rep 5 -mRNA-seq; Homo sapiens; OTHER GSM2476021: PF-06446846, 60 min, rep 5 -mRNA-seq; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956377 Illumina HiSeq 2500 58065134 1302369810 SRR5227312 58065134 2961321834 SRP098789
SRP098789 SRX2536426 GSM2476020: vehicle, 60 min, rep 4-mRNAseq; Homo sapiens; RNA-Seq GSM2476020: vehicle, 60 min, rep 4-mRNAseq; Homo sapiens; RNA-Seq 9606 Homo sapiens RNA-Seq TRANSCRIPTOMIC cDNA SRS1956376 Illumina HiSeq 2500 63720205 1416818619 SRR5227311 63720205 3249730455 SRP098789
SRP098789 SRX2536425 GSM2476019: PF-06446846, 60 min, rep 4 -mRNA-seq; Homo sapiens; OTHER GSM2476019: PF-06446846, 60 min, rep 4 -mRNA-seq; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956375 Illumina HiSeq 2500 66363585 1482728577 SRR5227310 66363585 3384542835 SRP098789
SRP098789 SRX2536424 GSM2476018: vehicle, 60 min, rep 5-Ribo-seq; Homo sapiens; RNA-Seq GSM2476018: vehicle, 60 min, rep 5-Ribo-seq; Homo sapiens; RNA-Seq 9606 Homo sapiens RNA-Seq TRANSCRIPTOMIC cDNA SRS1956374 Illumina HiSeq 2500 40062613 904488287 SRR5227309 40062613 2043193263 SRP098789
SRP098789 SRX2536423 GSM2476017: 1.5 ?M PF-067446846, 60 min, rep 5 -riboseq; Homo sapiens; OTHER GSM2476017: 1.5 ?M PF-067446846, 60 min, rep 5 -riboseq; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956373 Illumina HiSeq 2500 65591217 1499668100 SRR5227308 65591217 3345152067 SRP098789
SRP098789 SRX2536422 GSM2476016: Vehicle, 60 min, rep 4-ribo-seq; Homo sapiens; RNA-Seq GSM2476016: Vehicle, 60 min, rep 4-ribo-seq; Homo sapiens; RNA-Seq 9606 Homo sapiens RNA-Seq TRANSCRIPTOMIC cDNA SRS1956372 Illumina HiSeq 2500 66480991 1564636133 SRR5227307 66480991 3390530541 SRP098789
SRP098789 SRX2536421 GSM2476015: 1.5 ?M PF-067446846, 60 min, rep 4 -riboseq; Homo sapiens; OTHER GSM2476015: 1.5 ?M PF-067446846, 60 min, rep 4 -riboseq; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956371 Illumina HiSeq 2500 57588015 1357395400 SRR5227306 57588015 2936988765 SRP098789
SRP098789 SRX2536420 GSM2476014: vehicle, 60 min rep 3; Homo sapiens; OTHER GSM2476014: vehicle, 60 min rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956370 Illumina HiSeq 2000 48405034 1530784033 SRR5227305 48405034 2420251700 SRP098789
SRP098789 SRX2536419 GSM2476013: vehicle, 60 min rep 2; Homo sapiens; OTHER GSM2476013: vehicle, 60 min rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956369 Illumina HiSeq 2000 47139057 1489018603 SRR5227304 47139057 2356952850 SRP098789
SRP098789 SRX2536418 GSM2476012: vehicle, 60 min rep 1; Homo sapiens; OTHER GSM2476012: vehicle, 60 min rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956368 Illumina HiSeq 2000 50956178 1495757884 SRR5227303 50956178 2547808900 SRP098789
SRP098789 SRX2536417 GSM2476011: 0.3 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER GSM2476011: 0.3 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956367 Illumina HiSeq 2000 44258180 1404548468 SRR5227302 44258180 2212909000 SRP098789
SRP098789 SRX2536416 GSM2476010: 0.3 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER GSM2476010: 0.3 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956366 Illumina HiSeq 2000 49129512 1536091510 SRR5227301 49129512 2456475600 SRP098789
SRP098789 SRX2536415 GSM2476009: 0.3 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER GSM2476009: 0.3 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956365 Illumina HiSeq 2000 30043362 903983724 SRR5227300 30043362 1502168100 SRP098789
SRP098789 SRX2536414 GSM2476008: 1.5 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER GSM2476008: 1.5 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956364 Illumina HiSeq 2000 48766213 1530350854 SRR5227299 48766213 2438310650 SRP098789
SRP098789 SRX2536413 GSM2476007: 1.5 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER GSM2476007: 1.5 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956363 Illumina HiSeq 2000 49334392 1475414353 SRR5227298 49334392 2466719600 SRP098789
SRP098789 SRX2536412 GSM2476006: 1.5 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER GSM2476006: 1.5 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956362 Illumina HiSeq 2000 60381365 1801283052 SRR5227297 60381365 3019068250 SRP098789
SRP098789 SRX2536411 GSM2476005: vehicle, 10 min rep 3; Homo sapiens; OTHER GSM2476005: vehicle, 10 min rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956361 Illumina HiSeq 2000 52737784 1644829192 SRR5227296 52737784 2636889200 SRP098789
SRP098789 SRX2536410 GSM2476004: vehicle, 10 min rep 2; Homo sapiens; OTHER GSM2476004: vehicle, 10 min rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956360 Illumina HiSeq 2000 46137148 1455541408 SRR5227295 46137148 2306857400 SRP098789
SRP098789 SRX2536409 GSM2476003: vehicle, 10 min rep 1; Homo sapiens; OTHER GSM2476003: vehicle, 10 min rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956359 Illumina HiSeq 2000 76002122 1552821132 SRR5227294 76002122 3800106100 SRP098789
SRP098789 SRX2536408 GSM2476002: 0.3 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER GSM2476002: 0.3 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956358 Illumina HiSeq 2000 42709138 1338829352 SRR5227293 42709138 2135456900 SRP098789
SRP098789 SRX2536407 GSM2476001: 0.3 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER GSM2476001: 0.3 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956357 Illumina HiSeq 2000 60552437 1875910244 SRR5227292 60552437 3027621850 SRP098789
SRP098789 SRX2536406 GSM2476000: 0.3 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER GSM2476000: 0.3 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956356 Illumina HiSeq 2000 41143319 843881081 SRR5227291 41143319 2057165950 SRP098789
SRP098789 SRX2536405 GSM2475999: 1.5 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER GSM2475999: 1.5 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956355 Illumina HiSeq 2000 40462973 1287284933 SRR5227290 40462973 2023148650 SRP098789
SRP098789 SRX2536404 GSM2475998: 1.5 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER GSM2475998: 1.5 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956354 Illumina HiSeq 2000 41657461 1360366732 SRR5227289 41657461 2082873050 SRP098789
SRP098789 SRX2536403 GSM2475997: 1.5 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER GSM2475997: 1.5 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER 9606 Homo sapiens OTHER TRANSCRIPTOMIC other SRS1956353 Illumina HiSeq 2000 42082855 916745706 SRR5227288 42082855 2104142750 SRP098789
------------------------------------------------------------------------
## Downloading sequencing data
`pysradb` can also be used to download either `.fastq` or `.sra`
files from both ENA and SRA.
### Downloading via accession number
`````{tabs}
````{tab} Console
``` bash
$ pysradb download SRP098789
```
````
````{tab} Python
``` python
from pysradb.sraweb import SRAweb
client = SRAweb()
client.download("SRP098789")
```
````
`````
It is also possible to pipe the dataframe from `metadata` or
`search` to `download`, after filtering the dataframe entries:
`````{tabs}
````{tab} Console
``` bash
$ pysradb metadata SRP276671 --detailed | pysradb download
```
````
````{tab} Python
``` python
from pysradb.sraweb import SRAweb
client = SRAweb()
df = client.sra_metadata('SRP016501', detailed=True)
client.download(df=df)
```
````
`````
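The filtering step mentioned above is not shown in the example. A minimal sketch follows, assuming the metadata DataFrame carries the `library_strategy` and `run_accession` columns seen in earlier output. The helper `select_strategy` is hypothetical, the toy rows stand in for a real `sra_metadata` result, and the final `download` call is left commented out because it needs network access.

```python
import pandas as pd


def select_strategy(metadata_df, strategy):
    """Keep only rows whose library_strategy equals the given value."""
    mask = metadata_df["library_strategy"] == strategy
    return metadata_df[mask].reset_index(drop=True)


# Toy stand-in for client.sra_metadata(..., detailed=True)
metadata_df = pd.DataFrame(
    {
        "run_accession": ["SRR5227313", "SRR5227312", "SRR5227311"],
        "library_strategy": ["RNA-Seq", "OTHER", "RNA-Seq"],
    }
)

rna_seq_df = select_strategy(metadata_df, "RNA-Seq")

# With network access, the filtered frame could then be passed on:
# from pysradb.sraweb import SRAweb
# SRAweb().download(df=rna_seq_df)
```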
### Ultrafast fastq downloads
With
[aspera-client](https://downloads.asperasoft.com/en/downloads/8?list)
installed, `pysradb` can perform ultrafast downloads.
For example, to download all original fastqs with `aspera-client`
using 8 threads:
`````{tabs}
````{tab} Console
``` console
$ pysradb download -t 8 --use_ascp -p SRP002605
```
````
````{tab} Python
``` python
from pysradb.sraweb import SRAweb
client = SRAweb()
client.download("SRP098789", use_ascp=True, threads=8)
```
````
`````
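Before requesting an Aspera transfer, it can help to check that the client is reachable. The sketch below assumes the Aspera command-line binary is named `ascp` and is on `PATH`; `pysradb` may locate it differently. The helper name `ascp_available` is hypothetical.

```python
import shutil


def ascp_available(binary="ascp"):
    """Return True if the named executable can be found on PATH."""
    return shutil.which(binary) is not None


use_ascp = ascp_available()

# The flag could then be forwarded, mirroring the example above:
# from pysradb.sraweb import SRAweb
# SRAweb().download("SRP098789", use_ascp=use_ascp, threads=8)
```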
SYMBOL INDEX (324 symbols across 13 files)
FILE: pysradb/cli.py
class CustomFormatterArgP (line 33) | class CustomFormatterArgP(
class ArgParser (line 39) | class ArgParser(argparse.ArgumentParser):
method error (line 40) | def error(self, message):
function pretty_print_df (line 46) | def pretty_print_df(df, include_header=True):
function _create_table (line 92) | def _create_table(df, terminal_width, include_header, format_value):
function _print_save_df (line 149) | def _print_save_df(df, saveto=None):
function metadata (line 189) | def metadata(
function download (line 248) | def download(
function search (line 299) | def search(saveto, db, verbosity, return_max, fields):
function get_geo_search_info (line 370) | def get_geo_search_info():
function gse_to_gsm (line 385) | def gse_to_gsm(gse_ids, saveto, detailed, desc, expand):
function gse_to_srp (line 400) | def gse_to_srp(gse_ids, saveto, detailed, desc, expand):
function gsm_to_gse (line 415) | def gsm_to_gse(gsm_ids, saveto, detailed, desc, expand):
function gsm_to_srp (line 430) | def gsm_to_srp(gsm_ids, saveto, detailed, desc, expand):
function gsm_to_srr (line 445) | def gsm_to_srr(gsm_ids, saveto, detailed, desc, expand):
function gsm_to_srs (line 460) | def gsm_to_srs(gsm_ids, saveto, detailed, desc, expand):
function gsm_to_srx (line 475) | def gsm_to_srx(gsm_ids, saveto, detailed, desc, expand):
function srp_to_gse (line 490) | def srp_to_gse(srp_id, saveto, detailed, desc, expand):
function srp_to_srr (line 505) | def srp_to_srr(srp_id, saveto, detailed, desc, expand):
function srp_to_srs (line 520) | def srp_to_srs(srp_id, saveto, detailed, desc, expand):
function srp_to_srx (line 535) | def srp_to_srx(srp_id, saveto, detailed, desc, expand):
function srr_to_gsm (line 550) | def srr_to_gsm(srr_ids, saveto, detailed, desc, expand):
function srr_to_srp (line 565) | def srr_to_srp(srr_ids, saveto, detailed, desc, expand):
function srr_to_srs (line 580) | def srr_to_srs(srr_ids, saveto, detailed, desc, expand):
function srr_to_srx (line 595) | def srr_to_srx(srr_ids, saveto, detailed, desc, expand):
function srs_to_gsm (line 610) | def srs_to_gsm(srs_ids, saveto, detailed, desc, expand):
function srs_to_srx (line 625) | def srs_to_srx(srs_ids, saveto, detailed, desc, expand):
function srx_to_srp (line 640) | def srx_to_srp(srx_ids, saveto, detailed, desc, expand):
function srx_to_srr (line 655) | def srx_to_srr(srx_ids, saveto, detailed, desc, expand):
function srx_to_srs (line 670) | def srx_to_srs(srx_ids, saveto, detailed, desc, expand):
function srp_to_pmid (line 681) | def srp_to_pmid(srp_ids, saveto):
function sra_to_pmid (line 687) | def sra_to_pmid(sra_ids, saveto):
function gse_to_pmid (line 694) | def gse_to_pmid(gse_ids, saveto):
function pmid_to_gse (line 700) | def pmid_to_gse(pmid_ids, saveto):
function pmid_to_srp (line 706) | def pmid_to_srp(pmid_ids, saveto):
function pmc_to_identifiers (line 712) | def pmc_to_identifiers(pmc_ids, saveto):
function pmid_to_identifiers (line 718) | def pmid_to_identifiers(pmid_ids, saveto):
function doi_to_gse (line 724) | def doi_to_gse(doi_ids, saveto):
function doi_to_srp (line 730) | def doi_to_srp(doi_ids, saveto):
function doi_to_identifiers (line 736) | def doi_to_identifiers(doi_ids, saveto):
function geo_matrix (line 746) | def geo_matrix(accession, to_tsv, output_dir):
function parse_args (line 765) | def parse_args(args=None):
FILE: pysradb/download.py
function _get_ftp_file_size (line 24) | def _get_ftp_file_size(url):
function _download_ftp_file (line 48) | def _download_ftp_file(
function millify (line 120) | def millify(n):
function get_file_size (line 145) | def get_file_size(row, url_col):
function md5_validate_file (line 192) | def md5_validate_file(file_path, md5_hash):
function download_file (line 218) | def download_file(
FILE: pysradb/exceptions.py
class MissingQueryException (line 4) | class MissingQueryException(Exception):
method __init__ (line 13) | def __init__(self):
class IncorrectFieldException (line 23) | class IncorrectFieldException(Exception):
FILE: pysradb/filter_attrs.py
function _get_sample_attr_keys (line 8) | def _get_sample_attr_keys(sample_attribute):
function expand_sample_attribute_columns (line 60) | def expand_sample_attribute_columns(metadata_df):
function guess_cell_type (line 123) | def guess_cell_type(sample_attribute):
function guess_tissue_type (line 163) | def guess_tissue_type(sample_attribute):
function guess_strain_type (line 189) | def guess_strain_type(sample_attribute):
FILE: pysradb/geoweb.py
class GEOweb (line 21) | class GEOweb(object):
method __init__ (line 22) | def __init__(self):
method get_download_links (line 25) | def get_download_links(self, gse):
method download (line 62) | def download(self, links, root_url, gse, verbose=False, out_dir=None):
function download_geo_matrix (line 111) | def download_geo_matrix(accession, output_dir="."):
function parse_geo_matrix_to_tsv (line 141) | def parse_geo_matrix_to_tsv(input_file, output_file):
FILE: pysradb/metadata_enrichment.py
function _prompt_install_enrichment_dependencies (line 19) | def _prompt_install_enrichment_dependencies() -> bool:
class MetadataExtractor (line 60) | class MetadataExtractor(ABC):
method __init__ (line 63) | def __init__(self):
method extract_metadata (line 67) | def extract_metadata(
method extract_batch (line 82) | def extract_batch(
method _find_column_variant (line 97) | def _find_column_variant(self, df: pd.DataFrame, target_col: str) -> O...
method enrich_dataframe (line 121) | def enrich_dataframe(
class _MetadataExtraction (line 211) | class _MetadataExtraction(BaseModel):
function load_ontology_reference (line 256) | def load_ontology_reference() -> Dict[str, List[str]]:
class LLMMetadataExtractor (line 279) | class LLMMetadataExtractor(MetadataExtractor):
method __init__ (line 282) | def __init__(
method _provider_env_key (line 305) | def _provider_env_key(self) -> Optional[str]:
method _check_ollama_available (line 319) | def _check_ollama_available(self) -> bool:
method _initialize_client (line 346) | def _initialize_client(self):
method _create_extraction_prompt (line 381) | def _create_extraction_prompt(
method _call_llm (line 486) | def _call_llm(self, prompt: str) -> Dict[str, Any]:
method extract_metadata (line 507) | def extract_metadata(
class EmbeddingMetadataExtractor (line 542) | class EmbeddingMetadataExtractor(MetadataExtractor):
method __init__ (line 545) | def __init__(
method _load_model (line 578) | def _load_model(self):
method _get_cache_path (line 611) | def _get_cache_path(self) -> str:
method _compute_reference_embeddings (line 625) | def _compute_reference_embeddings(self) -> Dict[str, Any]:
method _find_best_match (line 660) | def _find_best_match(
method _parse_structured_fields (line 682) | def _parse_structured_fields(self, text: str) -> Dict[str, str]:
method _match_value_or_text (line 704) | def _match_value_or_text(
method extract_metadata (line 751) | def extract_metadata(
function create_metadata_extractor (line 826) | def create_metadata_extractor(
function apply_dataframe_enrichment (line 865) | def apply_dataframe_enrichment(
FILE: pysradb/search.py
class QuerySearch (line 22) | class QuerySearch:
method __init__ (line 94) | def __init__(
method _input_multi_regex_checker (line 171) | def _input_multi_regex_checker(self, regex_matcher, input_query, error...
method _validate_fields (line 218) | def _validate_fields(self):
method _list_stat (line 451) | def _list_stat(self, stat_header):
method show_result_statistics (line 461) | def show_result_statistics(self):
method visualise_results (line 494) | def visualise_results(
method search (line 576) | def search(self):
method get_df (line 579) | def get_df(self):
method get_plot_objects (line 583) | def get_plot_objects(self):
method _plot_graph (line 587) | def _plot_graph(self, plt, axes, show, savedir, too_many_organisms):
class SraSearch (line 681) | class SraSearch(QuerySearch):
method __init__ (line 718) | def __init__(
method search (line 755) | def search(self):
method get_uids (line 803) | def get_uids(self):
method _format_query_string (line 811) | def _format_query_string(self):
method _format_request (line 840) | def _format_request(self):
method _format_response (line 849) | def _format_response(self, content):
method _format_result (line 870) | def _format_result(self):
method _parse_entry (line 931) | def _parse_entry(self, entry_root):
method _update_entry (line 1096) | def _update_entry(self, field_name, field_content):
method _update_stats (line 1122) | def _update_stats(self):
method _merge_selected_columns (line 1204) | def _merge_selected_columns(self, regex):
class EnaSearch (line 1217) | class EnaSearch(QuerySearch):
method search (line 1252) | def search(self):
method _format_query_string (line 1277) | def _format_query_string(self):
method _format_request (line 1342) | def _format_request(self):
method _format_result (line 1381) | def _format_result(self, content):
method _update_stats (line 1421) | def _update_stats(self):
class GeoSearch (line 1499) | class GeoSearch(SraSearch):
method __init__ (line 1537) | def __init__(
method _format_geo_query_string (line 1620) | def _format_geo_query_string(self):
method _format_geo_request (line 1637) | def _format_geo_request(self):
method _format_request (line 1647) | def _format_request(self):
method search (line 1660) | def search(self):
method _combine_uids (line 1749) | def _combine_uids(self, uids_from_sra, uids_from_geo):
method info (line 1769) | def info(cls):
FILE: pysradb/sraweb.py
function xmlescape (line 23) | def xmlescape(data):
function _make_hashable (line 27) | def _make_hashable(obj):
function _order_first (line 45) | def _order_first(df, column_order_list):
function _retry_response (line 59) | def _retry_response(base_url, payload, key, max_retries=10):
function get_retmax (line 74) | def get_retmax(n_records, retmax=500):
class SRAweb (line 80) | class SRAweb(object):
method __init__ (line 81) | def __init__(self, api_key=None):
method format_xml (line 131) | def format_xml(string):
method xml_to_json (line 147) | def xml_to_json(xml):
method bioproject_to_srp (line 169) | def bioproject_to_srp(self, bioproject):
method fetch_ena_fastq (line 236) | def fetch_ena_fastq(self, srp):
method create_esummary_params (line 321) | def create_esummary_params(self, esearchresult, db="sra"):
method get_esummary_response (line 339) | def get_esummary_response(self, db, term, usehistory="y"):
method get_efetch_response (line 411) | def get_efetch_response(self, db, term, usehistory="y"):
method sra_metadata (line 495) | def sra_metadata(
method fetch_gds_results (line 936) | def fetch_gds_results(self, gse, **kwargs):
method fetch_gsm_soft (line 972) | def fetch_gsm_soft(self, gsm_ids):
method geo_metadata (line 1039) | def geo_metadata(
method metadata (line 1465) | def metadata(self, accession, **kwargs):
method gse_to_gsm (line 1506) | def gse_to_gsm(self, gse, **kwargs):
method gse_to_srp (line 1536) | def gse_to_srp(self, gse, **kwargs):
method gsm_to_srp (line 1604) | def gsm_to_srp(self, gsm, **kwargs):
method gsm_to_srr (line 1612) | def gsm_to_srr(self, gsm, **kwargs):
method gsm_to_srs (line 1627) | def gsm_to_srs(self, gsm, **kwargs):
method gsm_to_srx (line 1641) | def gsm_to_srx(self, gsm, **kwargs):
method gsm_to_gse (line 1652) | def gsm_to_gse(self, gsm, **kwargs):
method srp_to_gse (line 1697) | def srp_to_gse(self, srp, **kwargs):
method srp_to_srr (line 1722) | def srp_to_srr(self, srp, **kwargs):
method srp_to_srs (line 1727) | def srp_to_srs(self, srp, **kwargs):
method srp_to_srx (line 1732) | def srp_to_srx(self, srp, **kwargs):
method srr_to_gsm (line 1738) | def srr_to_gsm(self, srr, **kwargs):
method srr_to_srp (line 1758) | def srr_to_srp(self, srr, **kwargs):
method srr_to_srs (line 1768) | def srr_to_srs(self, srr, **kwargs):
method srr_to_srx (line 1776) | def srr_to_srx(self, srr, **kwargs):
method srs_to_gsm (line 1784) | def srs_to_gsm(self, srs, **kwargs):
method srx_to_gsm (line 1795) | def srx_to_gsm(self, srx, **kwargs):
method srs_to_srx (line 1805) | def srs_to_srx(self, srs, **kwargs):
method srx_to_srp (line 1810) | def srx_to_srp(self, srx, **kwargs):
method srx_to_srr (line 1815) | def srx_to_srr(self, srx, **kwargs):
method srx_to_srs (line 1820) | def srx_to_srs(self, srx, **kwargs):
method search (line 1825) | def search(self, *args, **kwargs):
method fetch_bioproject_pmids (line 1828) | def fetch_bioproject_pmids(self, bioprojects):
method srp_to_pmid (line 1912) | def srp_to_pmid(self, srp_accessions):
method _search_fallback_pmids (line 1965) | def _search_fallback_pmids(self, srp_accessions):
method _extract_sra_accession (line 2000) | def _extract_sra_accession(self, row):
method _get_smallest_pmid (line 2010) | def _get_smallest_pmid(self, pmids):
method extract_external_sources (line 2025) | def extract_external_sources(self, metadata_df):
method _search_gse_gsm_pmids (line 2064) | def _search_gse_gsm_pmids(self, metadata_df, sra_accessions):
method _bioproject_to_gse (line 2128) | def _bioproject_to_gse(self, bioproject):
method _srp_to_gse_via_elink (line 2187) | def _srp_to_gse_via_elink(self, srp_id):
method _search_pmc_by_bioproject (line 2270) | def _search_pmc_by_bioproject(self, bioproject_id):
method search_pmc_for_external_sources (line 2337) | def search_pmc_for_external_sources(self, external_sources):
method sra_to_pmid (line 2411) | def sra_to_pmid(self, sra_accessions):
method srr_to_pmid (line 2437) | def srr_to_pmid(self, srr):
method srx_to_pmid (line 2441) | def srx_to_pmid(self, srx):
method srs_to_pmid (line 2445) | def srs_to_pmid(self, srs):
method gse_to_pmid (line 2449) | def gse_to_pmid(self, gse_accessions):
method doi_to_pmid (line 2479) | def doi_to_pmid(self, dois):
method pmid_to_pmc (line 2529) | def pmid_to_pmc(self, pmids):
method fetch_pmc_fulltext (line 2579) | def fetch_pmc_fulltext(self, pmc_id):
method extract_identifiers_from_text (line 2610) | def extract_identifiers_from_text(self, text):
method pmc_to_identifiers (line 2651) | def pmc_to_identifiers(self, pmc_ids, convert_missing=True):
method pmid_to_identifiers (line 2802) | def pmid_to_identifiers(self, pmids):
method pmid_to_gse (line 2873) | def pmid_to_gse(self, pmids):
method pmid_to_srp (line 2889) | def pmid_to_srp(self, pmids):
method doi_to_identifiers (line 2905) | def doi_to_identifiers(self, dois):
method doi_to_gse (line 2976) | def doi_to_gse(self, dois):
method doi_to_srp (line 2992) | def doi_to_srp(self, dois):
FILE: pysradb/utils.py
function path_leaf (line 26) | def path_leaf(path):
function requests_3_retries (line 43) | def requests_3_retries():
function scientific_name_to_taxid (line 64) | def scientific_name_to_taxid(name):
function unique (line 110) | def unique(sequence):
class TqdmUpTo (line 126) | class TqdmUpTo(tqdm):
method update_to (line 136) | def update_to(self, b=1, bsize=1, tsize=None):
function _extract_first_field (line 150) | def _extract_first_field(data):
function _find_aspera_keypath (line 155) | def _find_aspera_keypath(aspera_dir=None):
function mkdir_p (line 177) | def mkdir_p(path):
function order_dataframe (line 195) | def order_dataframe(df, columns):
function _get_url (line 212) | def _get_url(url, download_to, show_progress=True):
function run_command (line 237) | def run_command(command, verbose=False):
function get_gzip_uncompressed_size (line 255) | def get_gzip_uncompressed_size(filepath):
function confirm (line 272) | def confirm(preceeding_text):
function copyfileobj (line 295) | def copyfileobj(fsrc, fdst, bufsize=16384, filesize=None, desc=""):
FILE: tests/test_geoweb.py
function geoweb_connection (line 13) | def geoweb_connection():
function test_valid_download_links (line 19) | def test_valid_download_links(geoweb_connection):
function test_invalid_download_links (line 25) | def test_invalid_download_links(geoweb_connection):
function test_file_download (line 31) | def test_file_download(geoweb_connection):
FILE: tests/test_search.py
function valid_search_inputs_1 (line 25) | def valid_search_inputs_1():
function valid_search_inputs_2 (line 224) | def valid_search_inputs_2():
function valid_search_inputs_geo (line 343) | def valid_search_inputs_geo():
function empty_search_inputs (line 499) | def empty_search_inputs():
function empty_search_inputs_geo (line 523) | def empty_search_inputs_geo():
function invalid_search_inputs (line 550) | def invalid_search_inputs():
function sra_response_xml_1 (line 781) | def sra_response_xml_1():
function sra_formatted_responses_1 (line 786) | def sra_formatted_responses_1():
function sra_response_xml_2 (line 812) | def sra_response_xml_2():
function sra_formatted_responses_2 (line 817) | def sra_formatted_responses_2():
function sra_uids (line 843) | def sra_uids():
function ena_responses_json (line 850) | def ena_responses_json():
function ena_formatted_responses (line 859) | def ena_formatted_responses():
function missing_query_test (line 887) | def missing_query_test(empty_search_inputs):
function test_invalid_search_query (line 893) | def test_invalid_search_query(invalid_search_inputs):
function test_sra_search_1 (line 919) | def test_sra_search_1():
function test_sra_uids (line 930) | def test_sra_uids(sra_uids):
function test_valid_search_query_1_sra (line 938) | def test_valid_search_query_1_sra(valid_search_inputs_1):
function test_valid_search_query_2_sra (line 960) | def test_valid_search_query_2_sra(valid_search_inputs_2):
function test_sra_search_format_request (line 982) | def test_sra_search_format_request():
function test_sra_search_format_result_1 (line 995) | def test_sra_search_format_result_1(sra_response_xml_1, sra_formatted_re...
function test_sra_search_format_result_2 (line 1023) | def test_sra_search_format_result_2(sra_response_xml_2, sra_formatted_re...
function _test_ena_search_1 (line 1049) | def _test_ena_search_1():
function _test_ena_search_2 (line 1061) | def _test_ena_search_2(capsys):
function _test_ena_search_3 (line 1068) | def _test_ena_search_3(capsys):
function _test_valid_search_query_1_ena (line 1076) | def _test_valid_search_query_1_ena(valid_search_inputs_1):
function _test_valid_search_query_2_ena (line 1104) | def _test_valid_search_query_2_ena(valid_search_inputs_2):
function test_ena_search_format_request (line 1134) | def test_ena_search_format_request():
function test_ena_search_format_result (line 1153) | def test_ena_search_format_result(ena_responses_json, ena_formatted_resp...
function missing_query_test_geo (line 1171) | def missing_query_test_geo(empty_search_inputs_geo):
function test_geo_search_1 (line 1177) | def test_geo_search_1():
function test_valid_search_query_geo (line 1191) | def test_valid_search_query_geo(valid_search_inputs_geo):
function test_geo_search_format_request (line 1219) | def test_geo_search_format_request():
function test_geo_info (line 1232) | def test_geo_info():
FILE: tests/test_sraweb.py
function sraweb_connection (line 12) | def sraweb_connection():
function test_sra_metadata (line 18) | def test_sra_metadata(sraweb_connection):
function test_sra_metadata_missing_orgname (line 24) | def test_sra_metadata_missing_orgname(sraweb_connection):
function test_sra_metadata_multiple (line 31) | def test_sra_metadata_multiple(sraweb_connection):
function test_sra_metadata_multiple_detailed (line 41) | def test_sra_metadata_multiple_detailed(sraweb_connection):
function test_tissue_column (line 57) | def test_tissue_column(sraweb_connection):
function test_metadata_exp_accession (line 63) | def test_metadata_exp_accession(sraweb_connection):
function test_fetch_gds_results (line 69) | def test_fetch_gds_results(sraweb_connection):
function test_srp_to_gse (line 75) | def test_srp_to_gse(sraweb_connection):
function test_srp_to_srr (line 81) | def test_srp_to_srr(sraweb_connection):
function test_srp_to_srs (line 93) | def test_srp_to_srs(sraweb_connection):
function test_srp_to_srx (line 106) | def test_srp_to_srx(sraweb_connection):
function test_gse_to_gsm (line 112) | def test_gse_to_gsm(sraweb_connection):
function test_gse_to_gsm2 (line 118) | def test_gse_to_gsm2(sraweb_connection):
function test_gse_to_gsm1 (line 124) | def test_gse_to_gsm1(sraweb_connection):
function test_gse_to_srp (line 130) | def test_gse_to_srp(sraweb_connection):
function test_gse_to_srp2 (line 136) | def test_gse_to_srp2(sraweb_connection):
function test_gse_to_srp_with_nan_sra (line 143) | def test_gse_to_srp_with_nan_sra(sraweb_connection):
function test_gsm_to_srp (line 159) | def test_gsm_to_srp(sraweb_connection):
function test_gsm_to_gse (line 165) | def test_gsm_to_gse(sraweb_connection):
function test_gsm_to_gse_multiple_gses (line 171) | def test_gsm_to_gse_multiple_gses(sraweb_connection):
function test_gsm_to_srr (line 190) | def test_gsm_to_srr(sraweb_connection):
function test_gsm_to_srs (line 196) | def test_gsm_to_srs(sraweb_connection):
function test_gsm_to_srx (line 202) | def test_gsm_to_srx(sraweb_connection):
function test_srr_to_gsm (line 208) | def test_srr_to_gsm(sraweb_connection):
function test_srr_to_srp (line 213) | def test_srr_to_srp(sraweb_connection):
function test_srr_to_srp1 (line 219) | def test_srr_to_srp1(sraweb_connection):
function test_srr_to_srs (line 225) | def test_srr_to_srs(sraweb_connection):
function test_srr_to_srx (line 231) | def test_srr_to_srx(sraweb_connection):
function test_srs_to_gsm (line 237) | def test_srs_to_gsm(sraweb_connection):
function test_srs_to_srx (line 243) | def test_srs_to_srx(sraweb_connection):
function test_srx_to_gsm (line 249) | def test_srx_to_gsm(sraweb_connection):
function test_srx_to_srp (line 255) | def test_srx_to_srp(sraweb_connection):
function test_srx_to_srr (line 261) | def test_srx_to_srr(sraweb_connection):
function test_srx_to_srr1 (line 267) | def test_srx_to_srr1(sraweb_connection):
function test_srx_to_srs (line 273) | def test_srx_to_srs(sraweb_connection):
function _test_xmlns_id (line 280) | def _test_xmlns_id(sraweb_connection):
function test_GCP_url (line 287) | def test_GCP_url(sraweb_connection):
function test_GCP_url2 (line 292) | def test_GCP_url2(sraweb_connection):
function test_gse_to_srp3 (line 297) | def test_gse_to_srp3(sraweb_connection):
function test_gse_to_srp_multiple_srps (line 303) | def test_gse_to_srp_multiple_srps(sraweb_connection):
function test_geo_metadata_for_gse_without_srp (line 326) | def test_geo_metadata_for_gse_without_srp(sraweb_connection):
function test_geo_metadata_with_sample_attributes (line 335) | def test_geo_metadata_with_sample_attributes(sraweb_connection):
function test_geo_metadata_covid19_characteristics (line 342) | def test_geo_metadata_covid19_characteristics(sraweb_connection):
function test_fetch_bioproject_pmids (line 417) | def test_fetch_bioproject_pmids(sraweb_connection):
function test_fetch_bioproject_pmids_multiple (line 425) | def test_fetch_bioproject_pmids_multiple(sraweb_connection):
function test_search_pmc_by_bioproject (line 438) | def test_search_pmc_by_bioproject(sraweb_connection):
function test_fetch_bioproject_pmids_with_pmc_fallback (line 448) | def test_fetch_bioproject_pmids_with_pmc_fallback(sraweb_connection):
function test_srp_to_pmid_with_pmc_fallback (line 460) | def test_srp_to_pmid_with_pmc_fallback(sraweb_connection):
function test_sra_to_pmid (line 476) | def test_sra_to_pmid(sraweb_connection):
function test_srp_to_pmid (line 484) | def test_srp_to_pmid(sraweb_connection):
function test_srr_to_pmid (line 492) | def test_srr_to_pmid(sraweb_connection):
function test_sra_to_pmid_multiple (line 500) | def test_sra_to_pmid_multiple(sraweb_connection):
function test_srp_to_pmid_multiple (line 507) | def test_srp_to_pmid_multiple(sraweb_connection):
function test_gse_to_pmid (line 516) | def test_gse_to_pmid(sraweb_connection):
function test_gse_to_pmid_multiple (line 525) | def test_gse_to_pmid_multiple(sraweb_connection):
function test_pmid_to_pmc (line 534) | def test_pmid_to_pmc(sraweb_connection):
function test_pmid_to_pmc_multiple (line 541) | def test_pmid_to_pmc_multiple(sraweb_connection):
function test_extract_identifiers_from_text (line 548) | def test_extract_identifiers_from_text(sraweb_connection):
function test_pmc_to_identifiers (line 560) | def test_pmc_to_identifiers(sraweb_connection):
function test_pmid_to_identifiers (line 571) | def test_pmid_to_identifiers(sraweb_connection):
function test_pmid_to_gse (line 581) | def test_pmid_to_gse(sraweb_connection):
function test_pmid_to_srp (line 591) | def test_pmid_to_srp(sraweb_connection):
function test_doi_to_pmid (line 600) | def test_doi_to_pmid(sraweb_connection):
function test_doi_to_pmid_multiple (line 607) | def test_doi_to_pmid_multiple(sraweb_connection):
function test_doi_to_identifiers (line 617) | def test_doi_to_identifiers(sraweb_connection):
function test_doi_to_gse (line 627) | def test_doi_to_gse(sraweb_connection):
function test_doi_to_srp (line 637) | def test_doi_to_srp(sraweb_connection):
function test_unified_metadata_with_gse (line 646) | def test_unified_metadata_with_gse(sraweb_connection):
function test_unified_metadata_with_srp (line 654) | def test_unified_metadata_with_srp(sraweb_connection):
function test_unified_metadata_with_multiple_gse (line 662) | def test_unified_metadata_with_multiple_gse(sraweb_connection):
function test_unified_metadata_invalid_accession (line 671) | def test_unified_metadata_invalid_accession(sraweb_connection):
FILE: tests/test_utils.py
function invalid_name (line 9) | def invalid_name():
function valid_name (line 14) | def valid_name():
function invalid_scientific_name_to_taxid (line 18) | def invalid_scientific_name_to_taxid(invalid_name):
function valid_scientific_name_to_taxid (line 24) | def valid_scientific_name_to_taxid(valid_name):
Condensed preview — 94 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (18,918K chars).
[
{
"path": ".coveragerc",
"chars": 131,
"preview": "[run]\nomit =\n pysradb/filter_attrs.py\n pysradb/geodb.py\n pysradb/sradb.py\n pysradb/taxid2name.py\n pysradb"
},
{
"path": ".editorconfig",
"chars": 292,
"preview": "# http://editorconfig.org\n\nroot = true\n\n[*]\nindent_style = space\nindent_size = 4\ntrim_trailing_whitespace = true\ninsert_"
},
{
"path": ".gitattributes",
"chars": 93,
"preview": "*.rst linguist-documentation\n*.html linguist-documentation\n*.ipynb linguist-language=python\n\n"
},
{
"path": ".github/FUNDING.yml",
"chars": 65,
"preview": "# These are supported funding model platforms\n\ngithub: [saketkc]\n"
},
{
"path": ".github/ISSUE_TEMPLATE/bug_report.md",
"chars": 446,
"preview": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: \"[BUG]\"\nlabels: bug\nassignees: ''\n\n---\n\n**Describe"
},
{
"path": ".github/ISSUE_TEMPLATE/feature_request.md",
"chars": 379,
"preview": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: \"[ENH]\"\nlabels: enhancement\nassignees: ''\n\n---\n"
},
{
"path": ".github/ISSUE_TEMPLATE.md",
"chars": 318,
"preview": "* pysradb version:\n* Python version:\n* Operating System:\n\n### Description\n\nDescribe what you were trying to get done.\nTe"
},
{
"path": ".github/dependabot.yml",
"chars": 501,
"preview": "# To get started with Dependabot version updates, you'll need to specify which\n# package ecosystems to update and where "
},
{
"path": ".github/workflows/codeql-analysis.yml",
"chars": 2436,
"preview": "# For most projects, this workflow file will not need changing; you simply need\n# to commit it to your repository.\n#\n# Y"
},
{
"path": ".github/workflows/publish.yml",
"chars": 605,
"preview": "name: publish\n\non:\n release:\n types: [created]\n\njobs:\n deploy:\n runs-on: ubuntu-latest\n steps:\n - uses: ac"
},
{
"path": ".github/workflows/pull_request.yml",
"chars": 2205,
"preview": "name: pull_request\n\non: [pull_request]\n\njobs:\n test:\n runs-on: ubuntu-latest\n strategy:\n matrix:\n pyt"
},
{
"path": ".github/workflows/push.yml",
"chars": 2032,
"preview": "name: push\n\non: [push]\n\njobs:\n test:\n runs-on: ubuntu-latest\n strategy:\n matrix:\n python-version: [3."
},
{
"path": ".gitignore",
"chars": 1263,
"preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
},
{
"path": "AUTHORS.md",
"chars": 356,
"preview": "# Credits\n\n## Contributors\n\n- [Boshen Yan](https://github.com/bscrow)\n- [Maarten van der Sande](https://github.com/M"
},
{
"path": "CITATION.cff",
"chars": 326,
"preview": "cff-version: 1.2.0\nmessage: \"If you use this software, please cite it as below.\"\nauthors:\n- family-names: \"Choudhary\"\n "
},
{
"path": "CODE_OF_CONDUCT.md",
"chars": 3349,
"preview": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nIn the interest of fostering an open and welcoming environment, w"
},
{
"path": "CONTRIBUTING.md",
"chars": 3607,
"preview": "# Contributing\n\nContributions are welcome, and they are greatly appreciated! Every\nlittle bit helps, and credit will alw"
},
{
"path": "HISTORY.md",
"chars": 10601,
"preview": "# History\n\n# 3.0.0 (Unreleased) - BREAKING CHANGES\n\n## Removal of legacy SQLite support\n\n**All local SQLite database sup"
},
{
"path": "LICENSE",
"chars": 1520,
"preview": "BSD 3-Clause License\n\nCopyright (c) 2020-2023, Saket Choudhary\nAll rights reserved.\n\nRedistribution and use in source an"
},
{
"path": "MANIFEST.in",
"chars": 349,
"preview": "include AUTHORS.md\ninclude CONTRIBUTING.md\ninclude HISTORY.md\ninclude LICENSE\ninclude README.md\ninclude requirements.txt"
},
{
"path": "Makefile",
"chars": 2236,
"preview": ".PHONY: clean clean-test clean-pyc clean-build docs help\n.DEFAULT_GOAL := help\n\ndefine BROWSER_PYSCRIPT\nimport os, webbr"
},
{
"path": "README.md",
"chars": 21709,
"preview": "# A Python package for retrieving metadata from SRA/ENA/GEO\n\n[ {\n // SVG icon for clipboa"
},
{
"path": "docs/_static/custom.css",
"chars": 712,
"preview": "/* Override Pygments code block background color for light mode */\n.highlight {\n background: #f5f5f5 !important;\n}\n\n/* "
},
{
"path": "docs/authors.md",
"chars": 356,
"preview": "# Credits\n\n## Contributors\n\n- [Boshen Yan](https://github.com/bscrow)\n- [Maarten van der Sande](https://github.com/M"
},
{
"path": "docs/case_studies.md",
"chars": 16626,
"preview": "# Case Studies \n\n## Case Study 1\n\nConsider a scenario where somone is interested in searching for\nsingle-cell RNA-seq da"
},
{
"path": "docs/cmdline.md",
"chars": 14684,
"preview": "# CLI\n\n $ pysradb\n usage: pysradb [-h] [--version] [--citation]\n {metadata,download,search,gse-t"
},
{
"path": "docs/commands.rst",
"chars": 100,
"preview": "API Documentation\n=================\n\nSee :doc:`pysradb` for the Python API reference documentation.\n"
},
{
"path": "docs/conf.py",
"chars": 6650,
"preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n#\n# pysradb documentation build configuration file, created by\n# sphinx-qu"
},
{
"path": "docs/contributing.md",
"chars": 3607,
"preview": "# Contributing\n\nContributions are welcome, and they are greatly appreciated! Every\nlittle bit helps, and credit will alw"
},
{
"path": "docs/history.md",
"chars": 11101,
"preview": "# History\n\n<details open>\n<summary style=\"cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;\">\n2.5"
},
{
"path": "docs/index.rst",
"chars": 20476,
"preview": "============\nIntroduction\n============\n\n\n``pysradb`` provides a simple method to programmatically access metadata\nand do"
},
{
"path": "docs/installation.md",
"chars": 1399,
"preview": "# Installation\n\n## Stable release\n\nTo install pysradb, run this command in your terminal:\n\n``` console\n$ pip install pys"
},
{
"path": "docs/make.bat",
"chars": 769,
"preview": "@ECHO OFF\n\npushd %~dp0\n\nREM Command file for Sphinx documentation\n\nif \"%SPHINXBUILD%\" == \"\" (\n\tset SPHINXBUILD=python -m"
},
{
"path": "docs/modules.rst",
"chars": 58,
"preview": "pysradb\n=======\n\n.. toctree::\n :maxdepth: 4\n\n pysradb\n"
},
{
"path": "docs/notebooks.rst",
"chars": 717,
"preview": "Tutorials & Notebooks\n=====================\n\nThe following Jupyter notebooks demonstrate various features of pysradb:\n\n."
},
{
"path": "docs/pysradb.rst",
"chars": 1943,
"preview": "pysradb package\n===============\n\nSubmodules\n----------\n\npysradb.basedb module\n---------------------\n\n.. automodule:: pys"
},
{
"path": "docs/python-api-usage.md",
"chars": 12265,
"preview": "# Python API \n\n## Use Case 1: Fetch the metadata table (SRA-runtable)\n\nThe simplest use case of [pysradb]{.title-ref} is"
},
{
"path": "docs/quickstart.md",
"chars": 62761,
"preview": "# Quickstart\n\nMost features in `pysradb` are accessible both from the command-line and\nas a python package. `pysradb` us"
},
{
"path": "notebooks/01.Python-API_demo.ipynb",
"chars": 75582,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {},\n \"source\": [\n \"[\n"
},
{
"path": "pysradb/cli.py",
"chars": 56662,
"preview": "\"\"\"Command line interface for pysradb\"\"\"\n\nimport argparse\nimport os\nimport re\nimport sys\nimport warnings\nfrom io import "
},
{
"path": "pysradb/download.py",
"chars": 8295,
"preview": "\"\"\"Utility function to download data\"\"\"\n\nimport hashlib\nimport math\nimport os\nimport shutil\nimport sys\nimport warnings\nf"
},
{
"path": "pysradb/exceptions.py",
"chars": 777,
"preview": "\"\"\"This file contains custom Exceptions for pysradb\"\"\"\n\n\nclass MissingQueryException(Exception):\n \"\"\"Exception raised"
},
{
"path": "pysradb/filter_attrs.py",
"chars": 7970,
"preview": "import re\nimport warnings\n\nimport numpy as np\nimport pandas as pd\n\n\ndef _get_sample_attr_keys(sample_attribute):\n if "
},
{
"path": "pysradb/geoweb.py",
"chars": 5503,
"preview": "\"\"\"Utilities to interact with GEO online\"\"\"\n\nimport gzip\nimport os\nimport re\nimport sys\nfrom io import StringIO\n\nimport "
},
{
"path": "pysradb/metadata_enrichment.py",
"chars": 35460,
"preview": "\"\"\"\nMetadata enrichment for SRA/GEO datasets using LLMs and embeddings.\n\"\"\"\n\nimport logging\nimport os\nimport subprocess\n"
},
{
"path": "pysradb/ontology_reference.json",
"chars": 11524,
"preview": "{\n \"organs\": [\n \"brain\", \"heart\", \"liver\", \"lung\", \"kidney\", \"spleen\", \"pancreas\", \"stomach\",\n \"intestine\", \"colo"
},
{
"path": "pysradb/search.py",
"chars": 72580,
"preview": "\"\"\"This file contains the search classes for the search feature.\"\"\"\n\nimport os\nimport re\nimport sys\nimport time\nimport u"
},
{
"path": "pysradb/sraweb.py",
"chars": 119025,
"preview": "\"\"\"Utilities to interact with SRA online\"\"\"\n\nimport concurrent.futures\nimport os\nimport re\nimport sys\nimport time\nimport"
},
{
"path": "pysradb/taxid2name.py",
"chars": 386928,
"preview": "TAXID_TO_NAME = {\n 0: \"not_available\",\n 1: \"root\",\n 2: \"Bacteria\",\n 6: \"Azorhizobium\",\n 7: \"Azorhizobium "
},
{
"path": "pysradb/utils.py",
"chars": 8132,
"preview": "import errno\nimport gzip\nimport io\nimport ntpath\nimport os\nimport shlex\nimport subprocess\nimport urllib.request as urlli"
},
{
"path": "requirements.txt",
"chars": 74,
"preview": "lxml>=4.6.3\npandas>=1.3.2\nrequests>=2.26.0\ntqdm>=4.62.1\nxmltodict>=0.12.0\n"
},
{
"path": "setup.cfg",
"chars": 668,
"preview": "[bumpversion]\ncurrent_version = 2.4.1\ncommit = True\ntag = False\nparse = (?P<major>\\d+)\\.(?P<minor>\\d+)\\.(?P<patch>\\d+)(\\"
},
{
"path": "tests/conftest.py",
"chars": 114,
"preview": "# contents of conftest.py\nimport pytest\n\n# Test fixtures will be added here as needed for SRAweb and GEOweb tests\n"
},
{
"path": "tests/data/test_search/ena_search_test1.txt",
"chars": 784,
"preview": "run_accession\nSRR492850\nSRR500270\nSRR609956\nSRR609957\nSRR609958\nSRR609959\nSRR609960\nSRR609961\nSRR609962\nSRR609963\nSRR609"
},
{
"path": "tests/data/test_search/ena_test_verbosity_0.csv",
"chars": 10994,
"preview": "run_accession\nERR1190989\nERR1190990\nERR1190991\nERR1190992\nERR1190993\nERR1190994\nERR1190995\nERR1190996\nERR1190997\nERR1190"
},
{
"path": "tests/data/test_search/ena_test_verbosity_0.json",
"chars": 821724,
"preview": "[{\"study_accession\": \"PRJEB12126\", \"experiment_accession\": \"ERX1264364\", \"experiment_title\": \"Illumina HiSeq 2000 sequen"
},
{
"path": "tests/data/test_search/ena_test_verbosity_1.csv",
"chars": 132777,
"preview": "run_accession,description\nERR1190989,Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene"
},
{
"path": "tests/data/test_search/ena_test_verbosity_1.json",
"chars": 821724,
"preview": "[{\"study_accession\": \"PRJEB12126\", \"experiment_accession\": \"ERX1264364\", \"experiment_title\": \"Illumina HiSeq 2000 sequen"
},
{
"path": "tests/data/test_search/ena_test_verbosity_2.csv",
"chars": 415681,
"preview": "study_accession,experiment_accession,experiment_title,description,tax_id,scientific_name,library_strategy,library_source"
},
{
"path": "tests/data/test_search/ena_test_verbosity_2.json",
"chars": 821724,
"preview": "[{\"study_accession\": \"PRJEB12126\", \"experiment_accession\": \"ERX1264364\", \"experiment_title\": \"Illumina HiSeq 2000 sequen"
},
{
"path": "tests/data/test_search/ena_test_verbosity_3.csv",
"chars": 2187514,
"preview": "study_accession,experiment_accession,experiment_title,description,tax_id,scientific_name,library_strategy,library_source"
},
{
"path": "tests/data/test_search/ena_test_verbosity_3.json",
"chars": 4334941,
"preview": "[{\"study_accession\": \"PRJEB12126\", \"secondary_study_accession\": \"ERP013565\", \"sample_accession\": \"SAMEA3708907\", \"second"
},
{
"path": "tests/data/test_search/geo_search_test1.txt",
"chars": 857,
"preview": "SRX8089313\nSRX8089314\nSRX8089315\nSRX8089316\nSRX8089317\nSRX8089318\nSRX8089319\nSRX8089320\nSRX8089286\nSRX8089275\nSRX8089276"
},
{
"path": "tests/data/test_search/sra_search_test1.txt",
"chars": 19,
"preview": "SRX137370\nSRX137371"
},
{
"path": "tests/data/test_search/sra_test.xml",
"chars": 648589,
"preview": "<?xml version=\"1.0\" ?>\n<EXPERIMENT_PACKAGE_SET>\n<EXPERIMENT_PACKAGE><EXPERIMENT alias=\"GSM4369051\" accession=\"SRX7830165"
},
{
"path": "tests/data/test_search/sra_test_2_verbosity_0.csv",
"chars": 25,
"preview": "run_accession\nERR4229796\n"
},
{
"path": "tests/data/test_search/sra_test_2_verbosity_1.csv",
"chars": 76,
"preview": "run_accession,experiment_title\nERR4229796,HiSeq X Ten paired end sequencing\n"
},
{
"path": "tests/data/test_search/sra_test_2_verbosity_2.csv",
"chars": 516,
"preview": "study_accession,experiment_accession,experiment_title,sample_taxon_id,sample_scientific_name,experiment_library_strategy"
},
{
"path": "tests/data/test_search/sra_test_2_verbosity_3.csv",
"chars": 9398,
"preview": "study_accession,experiment_accession,experiment_title,sample_taxon_id,sample_scientific_name,experiment_library_strategy"
},
{
"path": "tests/data/test_search/sra_test_ERS3331676.xml",
"chars": 10356,
"preview": "<?xml version=\"1.0\" ?>\n<EXPERIMENT_PACKAGE_SET>\n<EXPERIMENT_PACKAGE><EXPERIMENT accession=\"ERX4190585\" alias=\"SC_EXP_296"
},
{
"path": "tests/data/test_search/sra_test_verbosity_0.csv",
"chars": 1033,
"preview": "run_accession\nSRR11217925\nSRR11217924\nSRR11217923\nSRR11217922\nSRR11217921\nSRR11217920\nSRR11217919\nSRR11217918\nSRR1121791"
},
{
"path": "tests/data/test_search/sra_test_verbosity_1.csv",
"chars": 6333,
"preview": "run_accession,experiment_title\nSRR11217925,GSM4369051: rnaH27nsun3; Caenorhabditis elegans; RNA-Seq\nSRR11217924,GSM43690"
},
{
"path": "tests/data/test_search/sra_test_verbosity_2.csv",
"chars": 20039,
"preview": "study_accession,experiment_accession,experiment_title,sample_taxon_id,sample_scientific_name,experiment_library_strategy"
},
{
"path": "tests/data/test_search/sra_test_verbosity_3.csv",
"chars": 329691,
"preview": "study_accession,experiment_accession,experiment_title,sample_taxon_id,sample_scientific_name,experiment_library_strategy"
},
{
"path": "tests/data/test_search/sra_uids.txt",
"chars": 14,
"preview": "155791\n155790\n"
},
{
"path": "tests/test_geoweb.py",
"chars": 1170,
"preview": "\"\"\"Tests for GEOweb\"\"\"\n\nimport os\nimport time\n\nimport pandas as pd\nimport pytest\n\nfrom pysradb.geoweb import GEOweb\n\n\n@p"
},
{
"path": "tests/test_search.py",
"chars": 31453,
"preview": "\"\"\"Tests for search.py\"\"\"\n\nimport json\n\nimport pandas as pd\nimport pytest\n\nfrom pysradb.search import *\n\n# ============="
},
{
"path": "tests/test_sraweb.py",
"chars": 24247,
"preview": "\"\"\"Tests for SRAweb\"\"\"\n\nimport time\n\nimport pandas as pd\nimport pytest\n\nfrom pysradb.sraweb import SRAweb\n\n\n@pytest.fixt"
},
{
"path": "tests/test_utils.py",
"chars": 546,
"preview": "\"\"\"Tests for utils.py\"\"\"\n\nimport pytest\n\nfrom pysradb.utils import *\n\n\n@pytest.fixture(scope=\"module\")\ndef invalid_name("
}
]
About this extraction
This page contains the full source code of the saketkc/pysradb GitHub repository, extracted and formatted as plain text. The extraction includes 94 files (16.8 MB), approximately 4.4M tokens, and a symbol index with 324 extracted functions, classes, methods, constants, and types.