Repository: saketkc/pysradb
Branch: develop
Commit: 801eb8fa1f1a
Files: 94
Total size: 16.8 MB

Directory structure:
gitextract_rf093ld_/

├── .coveragerc
├── .editorconfig
├── .gitattributes
├── .github/
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   ├── ISSUE_TEMPLATE.md
│   ├── dependabot.yml
│   └── workflows/
│       ├── codeql-analysis.yml
│       ├── publish.yml
│       ├── pull_request.yml
│       └── push.yml
├── .gitignore
├── AUTHORS.md
├── CITATION.cff
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── HISTORY.md
├── LICENSE
├── MANIFEST.in
├── Makefile
├── README.md
├── docs/
│   ├── Makefile
│   ├── _static/
│   │   ├── copy-button.js
│   │   └── custom.css
│   ├── authors.md
│   ├── case_studies.md
│   ├── cmdline.md
│   ├── commands.rst
│   ├── conf.py
│   ├── contributing.md
│   ├── history.md
│   ├── index.rst
│   ├── installation.md
│   ├── make.bat
│   ├── modules.rst
│   ├── notebooks.rst
│   ├── pysradb.rst
│   ├── python-api-usage.md
│   └── quickstart.md
├── notebooks/
│   ├── 01.Python-API_demo.ipynb
│   ├── 02.Commandline_download.ipynb
│   ├── 03.ParallelDownload.ipynb
│   ├── 04.SRA_to_fastq_conda.ipynb
│   ├── 05.Downloading_subsets_of_a_project.ipynb
│   ├── 06.Multiple_SRPs.ipynb
│   ├── 07.Query_Search.ipynb
│   ├── 08.PMC_DOI_Identifiers.ipynb
│   ├── 09.Metadata_enrichment.ipynb
│   ├── 11.Parse_Bioscience_Search.ipynb
│   └── README.md
├── pyproject.toml
├── pysradb/
│   ├── __init__.py
│   ├── __main__.py
│   ├── cli.py
│   ├── download.py
│   ├── exceptions.py
│   ├── filter_attrs.py
│   ├── geoweb.py
│   ├── metadata_enrichment.py
│   ├── ontology_reference.json
│   ├── search.py
│   ├── sraweb.py
│   ├── taxid2name.py
│   └── utils.py
├── requirements.txt
├── setup.cfg
└── tests/
    ├── conftest.py
    ├── data/
    │   └── test_search/
    │       ├── ena_search_test1.txt
    │       ├── ena_test_verbosity_0.csv
    │       ├── ena_test_verbosity_0.json
    │       ├── ena_test_verbosity_1.csv
    │       ├── ena_test_verbosity_1.json
    │       ├── ena_test_verbosity_2.csv
    │       ├── ena_test_verbosity_2.json
    │       ├── ena_test_verbosity_3.csv
    │       ├── ena_test_verbosity_3.json
    │       ├── geo_search_test1.txt
    │       ├── sra_search_test1.txt
    │       ├── sra_test.xml
    │       ├── sra_test_2_verbosity_0.csv
    │       ├── sra_test_2_verbosity_1.csv
    │       ├── sra_test_2_verbosity_2.csv
    │       ├── sra_test_2_verbosity_3.csv
    │       ├── sra_test_ERS3331676.xml
    │       ├── sra_test_verbosity_0.csv
    │       ├── sra_test_verbosity_1.csv
    │       ├── sra_test_verbosity_2.csv
    │       ├── sra_test_verbosity_3.csv
    │       └── sra_uids.txt
    ├── test_geoweb.py
    ├── test_search.py
    ├── test_sraweb.py
    └── test_utils.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .coveragerc
================================================
[run]
omit =
    pysradb/filter_attrs.py
    pysradb/geodb.py
    pysradb/sradb.py
    pysradb/taxid2name.py
    pysradb/utils.py



================================================
FILE: .editorconfig
================================================
# http://editorconfig.org

root = true

[*]
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
insert_final_newline = true
charset = utf-8
end_of_line = lf

[*.bat]
indent_style = tab
end_of_line = crlf

[LICENSE]
insert_final_newline = false

[Makefile]
indent_style = tab


================================================
FILE: .gitattributes
================================================
*.rst linguist-documentation
*.html linguist-documentation
*.ipynb linguist-language=python



================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms

github: [saketkc]


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: "[BUG]"
labels: bug
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
`pysradb <command> SRPxxx`


**Desktop (please complete the following information):**
 - OS: [e.g. Ubuntu 20.04]
 - Python version [e.g. 3.8]

**Additional context**
Add any other context about the problem here.


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for this project
title: "[ENH]"
labels: enhancement
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.


================================================
FILE: .github/ISSUE_TEMPLATE.md
================================================
* pysradb version:
* Python version:
* Operating System:

### Description

Describe what you were trying to get done.
Tell us what happened, what went wrong, and what you expected to happen.

### What I Did

```
Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.
```


================================================
FILE: .github/dependabot.yml
================================================
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://help.github.com/github/administering-a-repository/configuration-options-for-dependency-updates

version: 2
updates:
  - package-ecosystem: "pip" # See documentation for possible values
    directory: "/" # Location of package manifests
    schedule:
      interval: "daily"


================================================
FILE: .github/workflows/codeql-analysis.yml
================================================
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  push:
    branches: [ master ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ master ]
  schedule:
    - cron: '35 5 * * 1'

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: [ 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python' ]
        # Learn more:
        # https://docs.github.com/en/free-pro-team@latest/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#changing-the-languages-that-are-analyzed

    steps:
    - name: Checkout repository
      uses: actions/checkout@v2

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v1
      with:
        languages: ${{ matrix.language }}
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.
        # queries: ./path/to/local/query, your-org/your-repo/queries@main

    # Autobuild attempts to build any compiled languages  (C/C++, C#, or Java).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v1

    # ℹ️ Command-line programs to run using the OS shell.
    # 📚 https://git.io/JvXDl

    # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
    #    and modify them (or add more) to build your code if your project
    #    uses a compiled language

    #- run: |
    #   make bootstrap
    #   make release

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v1


================================================
FILE: .github/workflows/publish.yml
================================================
name: publish

on:
  release:
    types: [created]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v1
      with:
        python-version: '3.x'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install build twine
    - name: Build and publish
      env:
        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
      run: |
        # The repository ships pyproject.toml and no setup.py, so build via PEP 517
        python -m build
        twine upload dist/*


================================================
FILE: .github/workflows/pull_request.yml
================================================
name: pull_request

on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.9, '3.10', '3.11', '3.12', '3.13']

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v1
      with:
        python-version: ${{ matrix.python-version }}

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -U pip
        pip install -r requirements.txt

    - name: Install Ollama
      run: |
        curl -fsSL https://ollama.com/install.sh | sh
        # Start Ollama service in background
        ollama serve &
        # Wait for Ollama to be ready
        sleep 5
        # Pull required models for testing
        ollama pull phi3
        ollama pull meditron
        # Verify installation
        ollama list

    - name: Lint with flake8
      run: |
        pip install -U pytest coverage pytest-cov codecov black flake8
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
        black --check .

    - name: Test with pytest
      continue-on-error: true
      run: |
        pip install --editable ".[enrichment]"
        pip install pytest
        pytest
        make coverage
        codecov

  docs:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.11
      uses: actions/setup-python@v1
      with:
        python-version: '3.11'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
        pip install ".[enrichment]"
        pip install sphinx myst-parser sphinxcontrib-gtagjs ipython numpydoc sphinx-tabs furo nbsphinx sphinx-panels
    - name: Install Pandoc
      run: |
        sudo apt-get update
        sudo apt-get install -y pandoc
    - name: Build documentation
      run: |
        make docs


================================================
FILE: .github/workflows/push.yml
================================================
name: push

on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.9, '3.10', '3.11', '3.12', '3.13']

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v1
      with:
        python-version: ${{ matrix.python-version }}

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -U pip
        pip install -r requirements.txt
    - name: Lint with flake8
      run: |
        pip install -U pytest coverage pytest-cov codecov black flake8
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
        black --check .
    - name: Test with pytest
      continue-on-error: true
      run: |
        pip install --editable ".[enrichment]"
        pip install pytest
        pytest
        make coverage
        codecov

  docs:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.11
      uses: actions/setup-python@v1
      with:
        python-version: '3.11'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
        pip install ".[enrichment]"
        pip install sphinx myst-parser sphinxcontrib-gtagjs ipython numpydoc sphinx-tabs furo nbsphinx sphinx-panels
    - name: Install Pandoc
      run: |
        sudo apt-get update
        sudo apt-get install -y pandoc
    - name: Build documentation
      run: |
        make docs
    - name: Deploy
      uses: peaceiris/actions-gh-pages@v3
      if: github.ref == 'refs/heads/develop'
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: ./docs/_build/html/


================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# dotenv
.env

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
*.sqlite
*.sqlite.gz

geoweb_downloads/


================================================
FILE: AUTHORS.md
================================================
# Credits

## Contributors

-   [Boshen Yan](https://github.com/bscrow)
-   [Maarten van der Sande](https://github.com/Maarten-vd-Sande)
-   [Dibya Gautam](https://github.com/dibyaaaaax)
-   [Marius van den Beek](https://github.com/mvdbeek)
-   [Devang Thakkar](https://github.com/DevangThakkar)

## Maintainer

-   Saket Choudhary \<<saketkc@gmail.com>\>


================================================
FILE: CITATION.cff
================================================
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Choudhary"
  given-names: "Saket"
  orcid: "https://orcid.org/0000-0001-5202-7633"
title: "pysradb"
version: 2.4.1
doi: 10.12688/f1000research.18676.1
date-released: 2025-09-28
url: "https://github.com/saketkc/pysradb"


================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
 advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
 address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
 professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at saketkc@gmail.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing

Contributions are welcome, and they are greatly appreciated! Every
little bit helps, and credit will always be given.

You can contribute in many ways:

## Types of Contributions

### Report Bugs

Report bugs at <https://github.com/saketkc/pysradb/issues>.

If you are reporting a bug, please include:

-   Your operating system name and version.
-   Any details about your local setup that might be helpful in
    troubleshooting.
-   Detailed steps to reproduce the bug.

### Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with "bug"
and "help wanted" is open to whoever wants to implement it.

### Implement Features

Look through the GitHub issues for features. Anything tagged with
"enhancement" and "help wanted" is open to whoever wants to
implement it.

### Write Documentation

pysradb could always use more documentation, whether as part of the
official pysradb docs, in docstrings, or even on the web in blog posts,
articles, and such.

### Submit Feedback

The best way to send feedback is to file an issue at
<https://github.com/saketkc/pysradb/issues>.

If you are proposing a feature:

-   Explain in detail how it would work.
-   Keep the scope as narrow as possible, to make it easier to
    implement.
-   Remember that this is a volunteer-driven project, and that
    contributions are welcome :)

## Get Started!

Ready to contribute? Here's how to set up `pysradb` for
local development.

1.  Fork the `pysradb` repo on GitHub.

2.  Clone your fork locally:

    ``` shell
    $ git clone git@github.com:your_name_here/pysradb.git
    ```

3.  Install your local copy into a virtualenv. Assuming you have
    virtualenvwrapper installed, this is how you set up your fork for
    local development (if `python --version` reports a version below
    3.0, run `mkvirtualenv pysradb --python=py3` instead):

    ``` shell
    $ mkvirtualenv pysradb
    $ cd pysradb/
    $ pip install --editable .
    ```

4.  Create a branch for local development:

    ``` shell
    $ git checkout -b name-of-your-bugfix-or-feature
    ```

    Now you can make your changes locally.

5.  When you're done making changes, check that your changes pass
    flake8 and the tests, including testing other Python versions with
    tox:

    ``` shell
    $ flake8 pysradb tests
    $ pytest
    $ tox
    ```

    To get flake8 and tox, just pip install them into your virtualenv.

6.  Commit your changes and push your branch to GitHub:

    ``` shell
    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    ```

7.  Submit a pull request through the GitHub website.

## Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

1.  The pull request should include tests.
2.  If the pull request adds functionality, the docs should be updated.
    Put your new functionality into a function with a docstring, and add
    the feature to the list in README.md.
3.  The pull request should work for Python 3.9 through 3.13, the
    versions tested in CI. Make sure that the tests pass for all
    supported Python versions.

## Tips

To run a subset of tests:

``` shell
$ pytest tests/test_sraweb.py
```

## Deploying

A reminder for the maintainers on how to deploy. Make sure all your
changes are committed (including an entry in HISTORY.md). Then run:

``` shell
$ bumpversion patch # possible: major / minor / patch
$ git push
$ git push --tags
```

CI will then deploy to PyPI if tests pass.


================================================
FILE: HISTORY.md
================================================
# History

## 3.0.0 (Unreleased) - BREAKING CHANGES

### Removal of legacy SQLite support

**All local SQLite database support has been removed.** This is a major breaking change and was long overdue.

- **Removed**: `SRAdb`, `GEOdb`, and `BASEdb` classes
- **Removed**: `download_sradb_file()` and `download_geodb_file()` functions
- **Removed**: Files: `sradb.py`, `geodb.py`, `basedb.py`
- **Why**: Legacy local SQLite databases are outdated and rarely used. SRAweb (API-based) provides better, real-time data with no maintenance overhead.


## 2.5.1 (2025-10-29)

- Add PRJNA support in doi-to-identifiers [#249](https://github.com/saketkc/pysradb/pull/249)

## 2.5.0 (2025-10-19)

- Add pmid/doi-to-gse/srp conversion [#246](https://github.com/saketkc/pysradb/pull/246).

## 2.4.1 (2025-09-27)

- Add gse-to-pmid conversion [#244](https://github.com/saketkc/pysradb/pull/244).

## 2.4.0 (2025-09-27)

- Add sra-to-pmid conversion [#241](https://github.com/saketkc/pysradb/pull/241). Thanks [@andrewdavidsmith](https://github.com/andrewdavidsmith) for the idea.

## 2.3.0 (2025-08-24)

- Download logic improvements: removed requests-ftp as a requirement
- Fix for handling missing metadata keys [#223](https://github.com/saketkc/pysradb/pull/223). Thanks [@andrewdavidsmith](https://github.com/andrewdavidsmith)

## 2.2.2 (2024-10-03)

- Fix for handling ENA URLs for paired-end data

## 2.2.1 (2024-08-21)

- Fix for handling ENA URLs
- Migrated to pyproject.toml

## 2.2.0 (2023-09-17)

- Add support for Biosamples and bioproject [#199](https://github.com/saketkc/pysradb/pull/198)
- Use retmode xml for Geo search [#200](https://github.com/saketkc/pysradb/pull/200)
- Documentation fixes

## 2.1.0 (2023-05-16)

-   Fix for `gse-to-srp` returning unrequested GSEs [#190](https://github.com/saketkc/pysradb/issues/190)
-   Fix for `download` using `public_urls`
-   Fix for `gsm-to-srx` returning false positives [#165](https://github.com/saketkc/pysradb/issues/165)
-   Fix for delimiter not being consistent when metadata is printed on
    terminal [#147](https://github.com/saketkc/pysradb/issues/147)
-   ENA search is currently broken because of an API change

## 2.0.2 (2023-04-09)

-   Fix for `gse-to-srp` to handle cases where a project is
    missing but SRXs are returned [#186](https://github.com/saketkc/pysradb/issues/186)
-   Fix gse-to-gsm [#187](https://github.com/saketkc/pysradb/issues/187)

## 2.0.1 (2023-03-18)

-   Fix for `pysradb download` - using `public_url`
-   Fix for SRX -> SRR and related conversions [#183](https://github.com/saketkc/pysradb/pull/183)

## 2.0.0 (2023-02-23)

-   BREAKING change: Overhaul of how urls and associated metadata are
    returned (not backward compatible); all column names are lower cased
    by default
-   Fix extra space in \"organism_taxid\" column
-   Added support for Experiment attributes [#89](https://github.com/saketkc/pysradb/issues/89#issuecomment-1439319532)

## 1.4.2 (06-17-2022)

-   Fix ENA fastq fetching [#163](https://github.com/saketkc/pysradb/issues/163)

## 1.4.1 (06-04-2022)

-   Fix for fetching alternative URLs

## 1.4.0 (06-04-2022)

-   Added ability to fetch alternative URLs (GCP/AWS) for metadata
    [#161](https://github.com/saketkc/pysradb/issues/161)
-   Fix for xmldict 0.13.0 no longer defaulting to OrderedDict [#159](https://github.com/saketkc/pysradb/pull/159)
-   Fix for missing experiment model and description in metadata [#160](https://github.com/saketkc/pysradb/issues/160)

## 1.3.0 (02-18-2022)

-   Add `study_title` to `--detailed` flag
    ([#152](https://github.com/saketkc/pysradb/issues/152))
-   Fix `KeyError` in `metadata` where some new
    IDs do not have any metadata
    ([#151](https://github.com/saketkc/pysradb/issues/151))

## 1.2.0 (01-10-2022)

-   Do not exit if a query returns no hits ([#149](https://github.com/saketkc/pysradb/pull/149))

## 1.1.0 (12-12-2021)

-   Fixed `gsm-to-gse` failure
    ([#128](https://github.com/saketkc/pysradb/pull/128))
-   Fixed case sensitivity bug for ENA search
    ([#144](https://github.com/saketkc/pysradb/pull/144))
-   Fixed publication date bug for search
    ([#146](https://github.com/saketkc/pysradb/pull/146))
-   Added support for downloading data from GEO: `pysradb download -g
    GSE`
    ([#129](https://github.com/saketkc/pysradb/pull/129))

## 1.0.1 (01-10-2021)

-   Dropped Python 3.6 since pandas 1.2 is not supported

## 1.0.0 (01-09-2021)

-   Retired `metadb` and `SRAdb` based search through CLI - everything
    defaults to `SRAweb`
-   `SRAweb` now supports
    [search](https://saket-choudhary.me/pysradb/quickstart.html#search)
-   `N/A` is now replaced with `pd.NA`
-   Two new fields in `--detailed`: `instrument_model`
    and `instrument_model_desc`
    [#75](https://github.com/saketkc/pysradb/issues/75)
-   Updated documentation

## 0.11.1 (09-18-2020)

-   `library_layout` is now output in metadata #56
-   `--detailed` unifies columns for ENA fastq links instead
    of appending \_x/\_y #59
-   bugfix for parsing namespace in xml outputs #65
-   XML errors from NCBI are now handled more gracefully #69
-   Documentation and dependency updates

## 0.11.0 (09-04-2020)

-   `pysradb download` now supports multiple threads for
    parallel downloads
-   `pysradb download` also supports ultra fast downloads of
    FASTQs from ENA using aspera-client

## 0.10.3 (03-26-2020)

-   Added test cases for SRAweb
-   API limit exceeding errors are automagically handled
-   Bug fixes for GSE \<=\> SRR
-   Bug fix for metadata - supports multiple SRPs

Contributors

-   Dibya Gautam
-   Marius van den Beek

## 0.10.2 (02-05-2020)

-   Bug fix: Handle API-rate limit exceeding => Retries
-   Enhancement: 'Alternatives' URLs are now part of
    `--detailed`

## 0.10.1 (02-04-2020)

-   Bug fix: Handle Python3.6 for capture_output in subprocess.run

## 0.10.0 (01-31-2020)

-   All the subcommands (srx-to-srr, srx-to-srs) will now print
    additional columns where the first two columns represent the
    relevant conversion
-   Fixed a bug when fetching entries with a single efetch record

## 0.9.9 (01-15-2020)

-   Major fix: some SRRs would go missing as the experiment dict was
    being created only once per SRR (See #15)
-   Features: More detailed metadata by default in the SRAweb mode
-   See notebook: <https://colab.research.google.com/drive/1C60V->

## 0.9.7 (01-20-2020)

-   Feature: instrument, run size and total spots are now printed in the
    metadata by default (SRAweb mode only)
-   Issue: Fixed an issue with srapath failing on SRP. srapath is now
    run on individual SRRs.

## 0.9.6 (07-20-2019)

-   Introduced `SRAweb` to perform queries over the web if
    the SQLite database is missing or does not contain the relevant record.

## 0.9.0 (02-27-2019)

### Others

-   This release completely changes the command line interface replacing
    click with argparse ([#3](https://github.com/saketkc/pysradb/pull/3))
-   Removed stale Python 2 compatible code

## 0.8.0 (02-26-2019)

### New methods/functionality

-   `srr-to-gsm`: convert SRR to GSM
-   SRAmetadb.sqlite.gz file is deleted by default after extraction
-   When SRAmetadb is not found, confirmation is requested before
    downloading
-   Confirmation option before SRA downloads

### Bugfix

-   download() works with wget

### Others

-   `--out_dir` is now `--out-dir`

## 0.7.1 (02-18-2019)

Important: Python2 is no longer supported. Please consider moving to
Python3.

### Bugfix

-   Included docs in the index which were missed out in the previous
    release

## 0.7.0 (02-08-2019)

### New methods/functionality

-   `gsm-to-srr`: convert GSM to SRR
-   `gsm-to-srx`: convert GSM to SRX
-   `gsm-to-gse`: convert GSM to GSE

### Renamed methods

The following command line options have been renamed and the changes are
not compatible with the 0.6.0 release:

-   `sra-metadata` -> `metadata`.
-   `sra-search` -> `search`.
-   `srametadb` -> `metadb`.

## 0.6.0 (12-25-2018)

### Bugfix

-   Fixed bugs introduced in 0.5.0 with API changes where multiple
    redundant columns were output in `sra-metadata`

### New methods/functionality

-   [download] now allows piped inputs

## 0.5.0 (12-24-2018)

### New methods/functionality

-   Support for filtering by SRX Id for SRA downloads.
-   `srr_to_srx`: Convert SRR to SRX/SRP
-   `srp_to_srx`: Convert SRP to SRX
-   Stripped down [sra-metadata] to give minimal information
-   Added [\--assay], [\--desc],
    [\--detailed] flag for [sra-metadata]
-   Improved table printing on terminal

## 0.4.2 (12-16-2018)

### Bugfix

-   Fixed unicode error in tests for Python2

## 0.4.0 (12-12-2018)

### New methods/functionality

-   Added a new [BASEdb] class to handle common database
    connections
-   Initial support for GEOmetadb through GEOdb class
-   Initial support for a command line interface:
    -   download Download SRA project (SRPnnnn)
    -   gse-metadata Fetch metadata for GEO ID (GSEnnnn)
    -   gse-to-gsm Get GSM(s) for GSE
    -   gsm-metadata Fetch metadata for GSM ID (GSMnnnn)
    -   sra-metadata Fetch metadata for SRA project (SRPnnnn)
-   Added three separate notebooks for SRAdb, GEOdb, CLI usage

## 0.3.0 (12-05-2018)

### New methods/functionality

-   [sample_attribute] and
    [experiment_attribute] are now included by default in
    the df returned by [sra_metadata()]
-   [expand_sample_attribute_columns]: expand metadata dataframe based
    on attributes in the [sample_attribute] column
-   New methods to guess cell/tissue/strain:
    [guess_cell_type()]/[guess_tissue_type()]/[guess_strain_type()]
-   Improved README and usage instructions

## 0.2.2 (12-03-2018)

### New methods/functionality

-   [search_sra()] allows full text search on SRA metadata.

## 0.2.0 (12-03-2018)

### Renamed methods

The following methods have been renamed and the changes are not
compatible with 0.1.0 release:

-   [get_query()] -\> [query()].
-   [sra_convert()] -\> [sra_metadata()].
-   [get_table_counts()] -\> [all_row_counts()].

### New methods/functionality

-   [download_sradb_file()] makes fetching [SRAmetadb.sqlite] file easy; wget is no longer required.
-   [ftp] protocol is now supported besides [fasp], so [aspera-client] is now optional. However, we strongly recommend [aspera-client] for faster downloads.

### Bug fixes

-   Silenced [SettingWithCopyWarning] by explicitly doing
    operations on a copy of the dataframe instead of the original.

Besides these, all methods now follow [numpydoc]-compatible
documentation.

## 0.1.0 (12-01-2018)

-   First release on PyPI.


================================================
FILE: LICENSE
================================================
BSD 3-Clause License

Copyright (c) 2020-2023, Saket Choudhary
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


================================================
FILE: MANIFEST.in
================================================
include AUTHORS.md
include CONTRIBUTING.md
include HISTORY.md
include LICENSE
include README.md
include requirements.txt

recursive-include tests *
recursive-exclude * __pycache__
recursive-exclude * *.py[co]
recursive-exclude * *.sqlite
recursive-exclude * *.sqlite.gz

recursive-include docs *.md conf.py Makefile make.bat *.jpg *.png *.gif *.rst


================================================
FILE: Makefile
================================================
.PHONY: clean clean-test clean-pyc clean-build docs help
.DEFAULT_GOAL := help

define BROWSER_PYSCRIPT
import os, webbrowser, sys

try:
	from urllib import pathname2url
except ImportError:
	from urllib.request import pathname2url

webbrowser.open("file://" + pathname2url(os.path.abspath(sys.argv[1])))
endef
export BROWSER_PYSCRIPT

define PRINT_HELP_PYSCRIPT
import re, sys

for line in sys.stdin:
	match = re.match(r'^([a-zA-Z_-]+):.*?## (.*)$$', line)
	if match:
		target, help = match.groups()
		print("%-20s %s" % (target, help))
endef
export PRINT_HELP_PYSCRIPT

BROWSER := python -c "$$BROWSER_PYSCRIPT"

help:
	@python -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST)

clean: clean-build clean-pyc clean-test ## remove all build, test, coverage and Python artifacts

clean-build: ## remove build artifacts
	rm -fr build/
	rm -fr dist/
	rm -fr .eggs/
	find . -name '*.egg-info' -exec rm -fr {} +
	find . -name '*.egg' -exec rm -f {} +

clean-pyc: ## remove Python file artifacts
	find . -name '*.pyc' -exec rm -f {} +
	find . -name '*.pyo' -exec rm -f {} +
	find . -name '*~' -exec rm -f {} +
	find . -name '__pycache__' -exec rm -fr {} +

clean-test: ## remove test and coverage artifacts
	rm -fr .tox/
	rm -f .coverage
	rm -fr htmlcov/
	rm -fr .pytest_cache

lint: ## check style with flake8
	flake8 pysradb tests

test: ## run tests quickly with the default Python
	pytest -s -v tests

test-all: ## run tests on every Python version with tox
	tox

coverage: ## check code coverage quickly with the default Python
	coverage run --source pysradb -m pytest
	coverage report -m
	coverage html

docs: ## generate Sphinx HTML documentation, including API docs
	rm -f docs/pysradb.rst
	rm -f docs/modules.rst
	sphinx-apidoc -o docs/ pysradb
	$(MAKE) -C docs clean
	$(MAKE) -C docs html

servedocs: docs ## compile the docs watching for changes
	#watchmedo shell-command -p '*.rst' -c '$(MAKE) -C docs html' -R -D .
	watchmedo shell-command -p '*.md|*.rst' -c '$(MAKE) -C docs html' -R -D .

release: dist ## package and upload a release
	python -m build
	twine upload dist/*

dist: clean ## builds source and wheel package
	python -m build
	ls -l dist

install: clean ## install the package to the active Python's site-packages
	pip install -e .


================================================
FILE: README.md
================================================
# A Python package for retrieving metadata from SRA/ENA/GEO

[![image](https://img.shields.io/pypi/v/pysradb.svg?style=flat-square)](https://pypi.python.org/pypi/pysradb)
[![image](https://anaconda.org/bioconda/pysradb/badges/version.svg)](https://anaconda.org/bioconda/pysradb/badges/version.svg)
[![image](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/pysradb/README.html)
[![image](https://static.pepy.tech/personalized-badge/pysradb?period=month&units=international_system&left_color=black&right_color=brightgreen&left_text=Downloads/month)](https://pepy.tech/project/pysradb)
[![image](https://zenodo.org/badge/159590788.svg)](https://zenodo.org/badge/latestdoi/159590788)
[![image](https://github.com/saketkc/pysradb/workflows/push/badge.svg)](https://github.com/saketkc/pysradb/actions)

## Documentation

<https://saketkc.github.io/pysradb>

## CLI Usage

`pysradb` supports command line usage. See
[CLI](https://saket-choudhary.me/pysradb/cmdline.html) instructions or
[quickstart
guide](https://www.saket-choudhary.me/pysradb/quickstart.html).

    $ pysradb
    usage: pysradb [-h] [--version] [--citation]
                   {metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
                   ...

    pysradb: Query NGS metadata and data from NCBI Sequence Read Archive.
    version: 3.0.0
    Citation: 10.12688/f1000research.18676.1

    options:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      --citation            how to cite

    subcommands:
      {metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
        metadata            Fetch metadata for SRA project (SRPnnnn)
        download            Download SRA project (SRPnnnn)
        search              Search SRA/ENA for matching text
        gse-to-gsm          Get GSM for a GSE
        gse-to-srp          Get SRP for a GSE
        gsm-to-gse          Get GSE for a GSM
        gsm-to-srp          Get SRP for a GSM
        gsm-to-srr          Get SRR for a GSM
        gsm-to-srs          Get SRS for a GSM
        gsm-to-srx          Get SRX for a GSM
        srp-to-gse          Get GSE for a SRP
        srp-to-srr          Get SRR for a SRP
        srp-to-srs          Get SRS for a SRP
        srp-to-srx          Get SRX for a SRP
        srr-to-gsm          Get GSM for a SRR
        srr-to-srp          Get SRP for a SRR
        srr-to-srs          Get SRS for a SRR
        srr-to-srx          Get SRX for a SRR
        srs-to-gsm          Get GSM for a SRS
        srs-to-srx          Get SRX for a SRS
        srx-to-srp          Get SRP for a SRX
        srx-to-srr          Get SRR for a SRX
        srx-to-srs          Get SRS for a SRX
        geo-matrix          Download and parse GEO Matrix files
        srp-to-pmid         Get PMIDs for SRP accessions
        gse-to-pmid         Get PMIDs for GSE accessions
        pmid-to-gse         Get GSE accessions from PMIDs
        pmid-to-srp         Get SRP accessions from PMIDs
        pmc-to-identifiers  Extract database identifiers from PMC articles
        pmid-to-identifiers
                            Extract database identifiers from PubMed articles
        doi-to-gse          Get GSE accessions from DOIs
        doi-to-srp          Get SRP accessions from DOIs
        doi-to-identifiers  Extract database identifiers from articles via DOI

## Quickstart

A Google Colaboratory version of the most used commands is available in
this [Colab
Notebook](https://colab.research.google.com/drive/1C60V-jkcNZiaCra_V5iEyFs318jgVoUR).
Note that this requires only an active internet connection (no
additional downloads are made).

The following notebooks document all the possible features of
`pysradb`:

1.  [Python
    API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/01.Python-API_demo.ipynb)
2.  [Downloading datasets from SRA - command
    line](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/02.Commandline_download.ipynb)
3.  [Download multiple datasets in parallel - Python
    API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/03.ParallelDownload.ipynb)
4.  [Converting SRA-to-fastq - command line (requires
    conda)](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/04.SRA_to_fastq_conda.ipynb)
5.  [Downloading subsets of a project - Python
    API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/05.Downloading_subsets_of_a_project.ipynb)
6.  [Metadata for multiple
    SRPs](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/06.Multiple_SRPs.ipynb)
7.  [Searching
    SRA/GEO/ENA](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/07.Query_Search.ipynb)
8. [Extracting identifiers from PMC/DOI](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/08.PMC_DOI_Identifiers.ipynb)
9. [Metadata Enrichment with LLMs](https://colab.research.google.com/github/saketkc/pysradb/blob/develop/notebooks/09.Metadata_enrichment.ipynb)

## Installation

To install the stable version using `pip`:

```bash
pip install pysradb
```

Alternatively, if you use conda:

```bash
conda install -c bioconda pysradb
```

This step will install all the dependencies. If you have an existing
environment with a lot of pre-installed packages, conda might be
[slow](https://github.com/bioconda/bioconda-recipes/issues/13774).
Please consider creating a new environment for `pysradb`:

```bash
conda create -c bioconda -n pysradb python=3.13 pysradb
```

### Dependencies

    pandas
    requests
    tqdm
    xmltodict

### Installing pysradb in development mode

    git clone https://github.com/saketkc/pysradb.git
    cd pysradb && pip install -r requirements.txt
    pip install -e .

## Using pysradb

### Obtaining SRA metadata

    $ pysradb metadata SRP000941 | head

    study_accession experiment_accession experiment_title                                                                                                                 experiment_desc                                                                                                                  organism_taxid  organism_name library_strategy library_source  library_selection sample_accession sample_title instrument                    total_spots total_size    run_accession run_total_spots run_total_bases
    SRP000941       SRX056722                                                                         Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells                                                               Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC    ChIP            SRS184466                              Illumina HiSeq 2000    26900401     531654480   SRR179707     26900401         807012030
    SRP000941       SRX027889                                                                            Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells                                                                  Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC    ChIP            SRS116481                      Illumina Genome Analyzer II    37528590     779578968   SRR067978     37528590        1351029240
    SRP000941       SRX027888                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116483                      Illumina Genome Analyzer II    13603127    3232309537   SRR067977     13603127         489712572
    SRP000941       SRX027887                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116562                      Illumina Genome Analyzer II    22430523     506327844   SRR067976     22430523         807498828
    SRP000941       SRX027886                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116560                      Illumina Genome Analyzer II    15342951     301720436   SRR067975     15342951         552346236
    SRP000941       SRX027885                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116482                      Illumina Genome Analyzer II    39725232     851429082   SRR067974     39725232        1430108352
    SRP000941       SRX027884                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116481                      Illumina Genome Analyzer II    32633277     544478483   SRR067973     32633277        1174797972
    SRP000941       SRX027883                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS004118                      Illumina Genome Analyzer II    22150965    3262293717   SRR067972      9357767         336879612
    SRP000941       SRX027883                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS004118                      Illumina Genome Analyzer II    22150965    3262293717   SRR067971     12793198         460555128
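
Each row of this output is one run (SRR), so an experiment with several runs appears more than once. In the rows above, SRX027883 has two runs, and its `total_spots` value matches the sum of their `run_total_spots`. A minimal sketch of that roll-up, using values copied from the table (the helper name `spots_per_experiment` is hypothetical and not part of `pysradb`):

```python
from collections import defaultdict

# (experiment_accession, run_accession, run_total_spots), copied from the output above
runs = [
    ("SRX027884", "SRR067973", 32633277),
    ("SRX027883", "SRR067972", 9357767),
    ("SRX027883", "SRR067971", 12793198),
]


def spots_per_experiment(rows):
    """Sum run-level spot counts for each experiment accession."""
    totals = defaultdict(int)
    for srx, _srr, spots in rows:
        totals[srx] += spots
    return dict(totals)


totals = spots_per_experiment(runs)
print(totals["SRX027883"])  # 22150965, the total_spots shown for SRX027883
```

Whether this relationship holds for every record is an assumption based on the rows shown here.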

### Obtaining detailed SRA metadata

    $ pysradb metadata SRP075720 --detailed | head

    study_accession experiment_accession experiment_title                                  experiment_desc                                   organism_taxid  organism_name library_strategy library_source  library_selection sample_accession sample_title instrument           total_spots total_size run_accession run_total_spots run_total_bases
    SRP075720       SRX1800476            GSM2177569: Kcng4_2la_H9; Mus musculus; RNA-Seq   GSM2177569: Kcng4_2la_H9; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467643                    Illumina HiSeq 2500  2547148      97658407  SRR3587912    2547148         127357400
    SRP075720       SRX1800475            GSM2177568: Kcng4_2la_H8; Mus musculus; RNA-Seq   GSM2177568: Kcng4_2la_H8; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467642                    Illumina HiSeq 2500  2676053     101904264  SRR3587911    2676053         133802650
    SRP075720       SRX1800474            GSM2177567: Kcng4_2la_H7; Mus musculus; RNA-Seq   GSM2177567: Kcng4_2la_H7; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467641                    Illumina HiSeq 2500  1603567      61729014  SRR3587910    1603567          80178350
    SRP075720       SRX1800473            GSM2177566: Kcng4_2la_H6; Mus musculus; RNA-Seq   GSM2177566: Kcng4_2la_H6; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467640                    Illumina HiSeq 2500  2498920      94977329  SRR3587909    2498920         124946000
    SRP075720       SRX1800472            GSM2177565: Kcng4_2la_H5; Mus musculus; RNA-Seq   GSM2177565: Kcng4_2la_H5; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467639                    Illumina HiSeq 2500  2226670      83473957  SRR3587908    2226670         111333500
    SRP075720       SRX1800471            GSM2177564: Kcng4_2la_H4; Mus musculus; RNA-Seq   GSM2177564: Kcng4_2la_H4; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467638                    Illumina HiSeq 2500  2269546      87486278  SRR3587907    2269546         113477300
    SRP075720       SRX1800470            GSM2177563: Kcng4_2la_H3; Mus musculus; RNA-Seq   GSM2177563: Kcng4_2la_H3; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467636                    Illumina HiSeq 2500  2333284      88669838  SRR3587906    2333284         116664200
    SRP075720       SRX1800469            GSM2177562: Kcng4_2la_H2; Mus musculus; RNA-Seq   GSM2177562: Kcng4_2la_H2; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467637                    Illumina HiSeq 2500  2071159      79689296  SRR3587905    2071159         103557950
    SRP075720       SRX1800468            GSM2177561: Kcng4_2la_H1; Mus musculus; RNA-Seq   GSM2177561: Kcng4_2la_H1; Mus musculus; RNA-Seq  10090           Mus musculus  RNA-Seq          TRANSCRIPTOMIC  cDNA              SRS1467635                    Illumina HiSeq 2500  2321657      89307894  SRR3587904    2321657         116082850

### Enriching metadata via CLI

Enrich metadata with standardized biological attributes using biomedical-specialized LLMs through the command line:

```bash
# Basic enrichment with default backend (Meditron)
$ pysradb metadata GSE286254 --detailed --enrich

# Using OpenBioLLM-8B (larger, trained on 500k+ biomedical entries)
$ pysradb metadata GSE286254 --detailed --enrich --enrich-backend ollama/openbiollm-8b
```

Available biomedical backends:
- `ollama/meditron` (default, 7B - optimized for medical text)
- `ollama/openbiollm-8b` (8B - trained on 500k+ biomedical entries, superior biomedical performance)

This returns the original metadata plus 9 enriched columns:
- `guessed_organ`
- `guessed_tissue`
- `guessed_anatomical_system`
- `guessed_cell_type`
- `guessed_disease`
- `guessed_sex`
- `guessed_development_stage`
- `guessed_assay`
- `guessed_organism`

For more details on enrichment features, prerequisites, and Python API usage, see the [Enriching metadata](#enriching-metadata) section below.

### Converting SRP to GSE

    $ pysradb srp-to-gse SRP075720

    study_accession study_alias
    SRP075720       GSE81903

### Converting GSM to SRP

    $ pysradb gsm-to-srp GSM2177186

    experiment_alias study_accession
    GSM2177186       SRP075720

### Converting GSM to GSE

    $ pysradb gsm-to-gse GSM2177186

    experiment_alias study_alias
    GSM2177186       GSE81903

### Converting GSM to SRX

    $ pysradb gsm-to-srx GSM2177186

    experiment_alias experiment_accession
    GSM2177186       SRX1800089

### Converting GSM to SRR

    $ pysradb gsm-to-srr GSM2177186

    experiment_alias run_accession
    GSM2177186       SRR3587529

### Converting SRP to PMID

    $ pysradb srp-to-pmid SRP045778

    srp_accession bioproject pmid
    SRP045778     PRJNA257197 27373336

### Converting GSE to PMID

    $ pysradb gse-to-pmid GSE253406

    gse_accession pmid
    GSE253406     39528918
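
The conversion subcommands above all print a header row followed by whitespace-aligned values. If you capture that text in a script, it could be turned into records roughly as follows. This is only a sketch: `parse_conversion_table` is a hypothetical helper, and it assumes no value contains spaces, which holds for the accession columns shown here but not for the full `metadata` output.

```python
def parse_conversion_table(text):
    """Turn whitespace-separated CLI output into a list of dicts keyed by header."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


# Sample text copied from the `gsm-to-srr` example above
sample = """
experiment_alias run_accession
GSM2177186       SRR3587529
"""

records = parse_conversion_table(sample)
print(records)
# [{'experiment_alias': 'GSM2177186', 'run_accession': 'SRR3587529'}]
```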

### Extracting identifiers from PMC/DOI

Extract database identifiers (GSE, PRJNA, SRP, etc.) from PubMed Central articles or DOIs. This feature automatically converts between GSE and SRP identifiers even when papers only mention one type!

#### Get all identifiers from a PMID

    $ pysradb pmid-to-identifiers 39528918

    pmid      pmc_id       gse_ids     prjna_ids    srp_ids
    39528918  PMC10802650  GSE253406   PRJNA1058002 SRP484103

#### Get only GSE or SRP from PMID

    $ pysradb pmid-to-gse 39528918

    pmid      pmc_id       gse_ids
    39528918  PMC10802650  GSE253406

    $ pysradb pmid-to-srp 39528918

    pmid      pmc_id       srp_ids
    39528918  PMC10802650  SRP484103


#### Extract from DOI

    $ pysradb doi-to-identifiers 10.12688/f1000research.18676.1

    doi                                 pmid      pmc_id      gse_ids  srp_ids
    10.12688/f1000research.18676.1      30873266  PMC6411813  GSE...   SRP...

#### Extract from PMC ID

    $ pysradb pmc-to-identifiers PMC10802650

    pmc_id       gse_ids     prjna_ids    srp_ids
    PMC10802650  GSE253406   PRJNA1058002 SRP484103
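
To illustrate the general idea of pulling accessions out of article text, here is a simplified sketch using regular expressions. It is not the `pysradb` implementation; the patterns and the `find_identifiers` helper are assumptions made for this example, and the sentence is invented around the accessions shown above.

```python
import re

# Simplified accession patterns (assumptions for illustration only)
PATTERNS = {
    "gse_ids": re.compile(r"\bGSE\d+\b"),
    "prjna_ids": re.compile(r"\bPRJNA\d+\b"),
    "srp_ids": re.compile(r"\bSRP\d+\b"),
}


def find_identifiers(text):
    """Return sorted unique matches for each identifier type."""
    return {name: sorted(set(p.findall(text))) for name, p in PATTERNS.items()}


text = "Data are available under GSE253406 (BioProject PRJNA1058002)."
found = find_identifiers(text)
print(found)
# {'gse_ids': ['GSE253406'], 'prjna_ids': ['PRJNA1058002'], 'srp_ids': []}
```

The empty `srp_ids` list shows why a conversion step between GSE and SRP is useful when a paper mentions only one identifier type.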


### Enriching metadata

Extract standardized biological metadata from SRA/GEO datasets using LLMs.

#### Quickstart

```python
from pysradb import SRAweb

client = SRAweb()

df = client.metadata("GSE286254", detailed=True, enrich=True)

# Returns original + 9 enriched columns (might not always be complete):
# guessed_organ, guessed_tissue, guessed_anatomical_system,
# guessed_cell_type, guessed_disease, guessed_sex,
# guessed_development_stage, guessed_assay, guessed_organism
```
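
Since the enriched columns might not always be complete, it can help to check which of them are present before relying on them. A minimal sketch, assuming you pass in the column names of the returned DataFrame (for example `df.columns`); `missing_enriched_columns` is a hypothetical helper, not part of `pysradb`:

```python
# The nine enriched fields listed above, without the "guessed_" prefix
ENRICHED_FIELDS = [
    "organ", "tissue", "anatomical_system", "cell_type", "disease",
    "sex", "development_stage", "assay", "organism",
]


def missing_enriched_columns(columns, prefix="guessed_"):
    """Return the expected enriched column names that are absent from `columns`."""
    present = set(columns)
    return [prefix + field for field in ENRICHED_FIELDS if prefix + field not in present]


# Example column names; with a real result you would pass df.columns
columns = ["run_accession", "guessed_organ", "guessed_tissue"]
missing = missing_enriched_columns(columns)
print(missing)
```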


#### Prerequisites

Install Ollama: https://ollama.ai

```bash
# Default backend (recommended)
ollama pull meditron

# Or use OpenBioLLM-8B for better biomedical performance
ollama pull openbiollm-8b
```

#### Advanced Usage

```python
# Use OpenBioLLM-8B backend (trained on 500k+ biomedical entries)
client = SRAweb()
df = client.metadata("GSE286254", detailed=True, enrich=True,
                enrich_backend="ollama/openbiollm-8b")

# Manual enrichment with custom settings
from pysradb.metadata_enrichment import create_metadata_extractor, load_ontology_reference

# LLM-based extraction with default backend (meditron)
extractor_llm = create_metadata_extractor(method="llm")
df_enriched = extractor_llm.enrich_dataframe(df, prefix="guessed_")

# LLM-based extraction with specific biomedical backend
extractor_bio = create_metadata_extractor(method="llm", backend="ollama/openbiollm-8b")
df_enriched = extractor_bio.enrich_dataframe(df, prefix="guessed_")

# Embedding-based extraction (faster, offline)
ontology_ref = load_ontology_reference()
extractor_emb = create_metadata_extractor(
    method="embedding",
    model="FremyCompany/BioLORD-2023",
    reference_categories=ontology_ref
)
df_enriched = extractor_emb.enrich_dataframe(df, prefix="guessed_")
```

See [Notebook 09](notebooks/09.Metadata_enrichment.ipynb) for detailed examples.


### Downloading supplementary files from GEO

    $ pysradb download -g GSE161707

### Downloading an entire SRA/ENA project (multithreaded)

`pysradb` makes it easy to download datasets from SRA in parallel.
For example, to download with 8 threads:

    $ pysradb download -y -t 8 --out-dir ./pysradb_downloads -p SRP063852

Downloads are organized by `SRP/SRX/SRR` mimicking the hierarchy of SRA
projects.
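
As a rough sketch of that layout, the location for a given run could be computed as below. The helper `run_directory` and the placeholder accessions `SRXnnnn` and `SRRnnnn` are hypothetical; whether the final SRR level is a directory or a file name is not specified here.

```python
from pathlib import PurePosixPath


def run_directory(out_dir, srp, srx, srr):
    """Build an SRP/SRX/SRR path under the chosen output directory."""
    return PurePosixPath(out_dir) / srp / srx / srr


# Placeholder accessions; substitute real SRX/SRR values from `pysradb metadata`
path = run_directory("./pysradb_downloads", "SRP063852", "SRXnnnn", "SRRnnnn")
print(path)  # pysradb_downloads/SRP063852/SRXnnnn/SRRnnnn
```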

## Publication

> [pysradb: A Python package to query next-generation sequencing
> metadata and data from NCBI Sequence Read
> Archive](https://f1000research.com/articles/8-532/v1)
>
> Presentation slides from BOSC (ISMB-ECCB) 2019:
> <https://f1000research.com/slides/8-1183>

## Citation

Choudhary, Saket. "pysradb: A Python Package to Query next-Generation
Sequencing Metadata and Data from NCBI Sequence Read Archive."
F1000Research, vol. 8, F1000 (Faculty of 1000 Ltd), Apr. 2019, p. 532
(<https://f1000research.com/articles/8-532/v1>)

    @article{Choudhary2019,
    doi = {10.12688/f1000research.18676.1},
    url = {https://doi.org/10.12688/f1000research.18676.1},
    year = {2019},
    month = apr,
    publisher = {F1000 (Faculty of 1000 Ltd)},
    volume = {8},
    pages = {532},
    author = {Saket Choudhary},
    title = {pysradb: A {P}ython package to query next-generation sequencing metadata and data from {NCBI} {S}equence {R}ead {A}rchive},
    journal = {F1000Research}
    }

Zenodo archive: <https://zenodo.org/badge/latestdoi/159590788>

Zenodo DOI: 10.5281/zenodo.2306881

## Questions?

Open an [issue](https://github.com/saketkc/pysradb/issues).


================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = python -msphinx
SPHINXPROJ    = pysradb
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


================================================
FILE: docs/_static/copy-button.js
================================================
// Add copy button to code blocks
document.addEventListener('DOMContentLoaded', function() {
    // SVG icon for clipboard
    const clipboardIcon = `<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M16 4h2a2 2 0 0 1 2 2v14a2 2 0 0 1-2 2H6a2 2 0 0 1-2-2V6a2 2 0 0 1 2-2h2"></path><rect x="8" y="2" width="8" height="4" rx="1" ry="1"></rect></svg>`;

    // SVG icon for checkmark
    const checkmarkIcon = `<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="3" stroke-linecap="round" stroke-linejoin="round"><polyline points="20 6 9 17 4 12"></polyline></svg>`;

    // Find all code input blocks (from notebooks and regular code blocks)
    // Strategy:
    // 1. For notebooks: explicitly target pre tags inside input_area
    // 2. For regular docs: target all highlight divs, then filter with skip conditions
    let codeBlocks = document.querySelectorAll('div.input_area > div.highlight > pre');

    // Also get regular documentation code blocks
    const docBlocks = document.querySelectorAll('div.highlight > pre');

    // Combine and deduplicate
    codeBlocks = Array.from(codeBlocks).concat(
        Array.from(docBlocks).filter(block => {
            // Skip if already in notebook input_area
            if (block.closest('div.input_area')) return false;
            // Skip if in prompt or output
            if (block.closest('.prompt')) return false;
            if (block.closest('.nboutput')) return false;
            if (block.closest('.output_area')) return false;
            return true;
        })
    );

    codeBlocks.forEach(function(codeBlock) {
        // Don't add button if already present
        if (codeBlock.querySelector('.copy-button') || codeBlock.parentElement.querySelector('.copy-button')) {
            return;
        }

        // Create copy button
        const button = document.createElement('button');
        button.className = 'copy-button';
        button.innerHTML = clipboardIcon;
        button.title = 'Copy code to clipboard';

        // Style the button
        button.style.cssText = `
            position: absolute;
            top: 0.5rem;
            right: 0.5rem;
            background-color: rgba(0, 0, 0, 0.3);
            color: white;
            border: 1px solid rgba(255, 255, 255, 0.3);
            border-radius: 0.25rem;
            cursor: pointer;
            display: flex;
            align-items: center;
            justify-content: center;
            z-index: 1;
            transition: all 0.2s ease;
            width: 28px;
            height: 28px;
            padding: 0;
        `;

        // Add hover effect
        button.onmouseover = function() {
            this.style.backgroundColor = 'rgba(0, 0, 0, 0.5)';
        };
        button.onmouseout = function() {
            this.style.backgroundColor = 'rgba(0, 0, 0, 0.3)';
        };

        // Make pre block relative positioned
        codeBlock.style.position = 'relative';

        // Add click event
        button.addEventListener('click', function() {
            const code = codeBlock.querySelector('code');
            const text = code ? code.textContent : codeBlock.textContent;

            // Copy to clipboard
            navigator.clipboard.writeText(text).then(function() {
                // Change button icon and color temporarily
                const originalHTML = button.innerHTML;
                button.innerHTML = checkmarkIcon;
                button.style.backgroundColor = 'rgba(34, 197, 94, 0.7)';

                setTimeout(function() {
                    button.innerHTML = originalHTML;
                    button.style.backgroundColor = 'rgba(0, 0, 0, 0.3)';
                }, 2000);
            }).catch(function(err) {
                console.error('Failed to copy:', err);
            });
        });

        // Append button to code block
        codeBlock.appendChild(button);
    });
});


================================================
FILE: docs/_static/custom.css
================================================
/* Override Pygments code block background color for light mode */
.highlight {
  background: #f5f5f5 !important;
}

/* Ensure code block background uses our color */
.highlight pre {
  background: #f5f5f5 !important;
}

/* Override code blocks rendered with line numbers (table layout) */
.highlighttable {
  background: #f5f5f5 !important;
}

.highlighttable td.linenos {
  background: #f5f5f5 !important;
}

/* Dark mode overrides */
[data-theme="dark"] .highlight {
  background: #1e293b !important;
}

[data-theme="dark"] .highlight pre {
  background: #1e293b !important;
}

[data-theme="dark"] .highlighttable {
  background: #1e293b !important;
}

[data-theme="dark"] .highlighttable td.linenos {
  background: #1e293b !important;
}


================================================
FILE: docs/authors.md
================================================
# Credits

## Contributors

-   [Boshen Yan](https://github.com/bscrow)
-   [Maarten van der Sande](https://github.com/Maarten-vd-Sande)
-   [Dibya Gautam](https://github.com/dibyaaaaax)
-   [Marius van den Beek](https://github.com/mvdbeek)
-   [Devang Thakkar](https://github.com/DevangThakkar)

## Maintainer

-   Saket Choudhary \<<saketkc@gmail.com>\>


================================================
FILE: docs/case_studies.md
================================================
# Case Studies 

## Case Study 1

Consider a scenario where someone is interested in searching for
single-cell RNA-seq datasets, in particular datasets that study the
retina:

    $ pysradb search --query "single-cell rna-seq retina"

     study_accession experiment_accession    experiment_title    sample_taxon_id sample_scientific_name  experiment_library_strategy experiment_library_source   experiment_library_selection    sample_accession    sample_alias    experiment_instrument_model pool_member_spots   run_1_size  run_1_accession run_1_total_spots   run_1_total_bases
     SRP299803   SRX9756769  GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq    10090   Mus musculus    ATAC-seq    GENOMIC other   SRS7946094  GSM4995565  Illumina NovaSeq 6000   55435867    2637580797  SRR13329759 55435867    6874047508
     SRP299803   SRX9756768  GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq   10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS7946093  GSM4995564  Illumina NovaSeq 6000   96123725    4107807391  SRR13329758 96123725    12688331700
     SRP299803   SRX9756767  GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq   10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS7946092  GSM4995563  Illumina NovaSeq 6000   94345783    4056010488  SRR13329757 94345783    12453643356
     SRP299803   SRX9756766  GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq 10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS7946091  GSM4995562  Illumina NovaSeq 6000   99487074    4240172698  SRR13329756 99487074    13132293768
     SRP299803   SRX9756765  GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq 10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS7946090  GSM4995561  Illumina NovaSeq 6000   88048461    3817540828  SRR13329755 88048461    11622396852
     SRP257758   SRX9537754  GSM4916438: Pou4f2-tdTomato/+ E17.5 scRNA-seq; Mus musculus; RNA-Seq    10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS7743995  GSM4916438  Illumina HiSeq 2500 364683840   8246658699  SRR13091939 364683840   32456861760
     SRP257758   SRX9537753  GSM4916437: Atoh7-zsGreen/lacZ E17.5 scRNA-seq; Mus musculus; RNA-Seq   10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS7743994  GSM4916437  Illumina HiSeq 2500 530456067   11895864680 SRR13091938 530456067   47210589963
     SRP257758   SRX9537752  GSM4916436: Atoh7-zsGreen/+ E17.5 scRNA-seq; Mus musculus; RNA-Seq  10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS7743993  GSM4916436  Illumina HiSeq 2500 389849416   8671923722  SRR13091937 389849416   34696598024
     SRP257758   SRX9537751  GSM4916435: Atoh7-zsGreen/lacZ E14.5 scRNA-seq; Mus musculus; RNA-Seq   10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS7743992  GSM4916435  Illumina HiSeq 2500 328878355   7875737709  SRR13091936 328878355   29270173595
     SRP257758   SRX9537750  GSM4916434: Atoh7-zsGreen/+ E14.5 scRNA-seq; Mus musculus; RNA-Seq  10090   Mus musculus    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS7743991  GSM4916434  Illumina HiSeq 2500 522040155   12760941656 SRR13091935 522040155   46461573795
     ERP118072   ERX3614517  NextSeq 500 sequencing; 3' mRNA-seq of protrusions and cell bodies of BJ, PC-3M, RPE-1, U-87 and WM-266.4 cells 9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  Oligo-dT    ERS3920269  SAMEA6120013    NextSeq 500 5818488 43355751    ERR3619129  1457318 109897743
     ERP118072   ERX3614516  NextSeq 500 sequencing; 3' mRNA-seq of protrusions and cell bodies of BJ, PC-3M, RPE-1, U-87 and WM-266.4 cells 9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  Oligo-dT    ERS3920268  SAMEA6120012    NextSeq 500 5422441 40645479    ERR3619125  1359663 102468758
     SRP288715   SRX9369597  RPE1_SS119_p10  9606    Homo sapiens    OTHER   GENOMIC other   SRS7591452  RPE1_SS119_p10.bam  Illumina HiSeq 2000 5062938 88426773    SRR12904705 5062938 202517520
     SRP288715   SRX9369596  RPE1_SS119_p0   9606    Homo sapiens    OTHER   GENOMIC other   SRS7591451  RPE1_SS119_p0.bam   Illumina HiSeq 2000 978835  19219630    SRR12904706 978835  39153400
     SRP288715   SRX9369595  RPE1_SS111_p10  9606    Homo sapiens    OTHER   GENOMIC other   SRS7591450  RPE1_SS111_p10.bam  Illumina HiSeq 2000 6205827 108129733   SRR12904707 6205827 248233080
     SRP288715   SRX9369594  RPE1_SS111_p0   9606    Homo sapiens    OTHER   GENOMIC other   SRS7591449  RPE1_SS111_p0.bam   Illumina HiSeq 2000 928703  18488436    SRR12904708 928703  37148120
     SRP288715   SRX9369593  RPE1_SS51_p10   9606    Homo sapiens    OTHER   GENOMIC other   SRS7591448  RPE1_SS51_p10.bam   Illumina HiSeq 2000 6088168 106065537   SRR12904709 6088168 243526720
     SRP288715   SRX9369592  RPE1_SS51_p0    9606    Homo sapiens    OTHER   GENOMIC other   SRS7591447  RPE1_SS51_p0.bam    Illumina HiSeq 2000 1624227 30610200    SRR12904710 1624227 64969080
     SRP288715   SRX9369591  RPE1_SS48_p10   9606    Homo sapiens    OTHER   GENOMIC other   SRS7591446  RPE1_SS48_p10.bam   Illumina HiSeq 2000 8117881 139408135   SRR12904711 8117881 324715240
     SRP288715   SRX9369590  RPE1_SS48_p0    9606    Homo sapiens    OTHER   GENOMIC other   SRS7591445  RPE1_SS48_p0.bam    Illumina HiSeq 2000 776140  15821200    SRR12904712 776140  31045600

By default, `search` returns the first 20 hits. `SRP299803` looks like a
project of interest. However, the information returned by the `search`
command is fairly limited, so we look up more detailed information
about this project:

    $ pysradb metadata SRP299803 | head
     study_accession experiment_accession    experiment_title    experiment_desc organism_taxid  organism_name   library_name    library_strategy    library_source  library_selection   library_layout  sample_accession    sample_title    instrument  instrument_model    instrument_model_desc   total_spots total_size  run_accession   run_total_spots run_total_bases
     SRP299803   SRX9756769  GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq    GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq    10090   Mus musculus        ATAC-seq    GENOMIC other   PAIRED  SRS7946094      Illumina NovaSeq 6000   Illumina NovaSeq 6000   ILLUMINA    55435867    2637580797  SRR13329759 55435867    6874047508
     SRP299803   SRX9756768  GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq   GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq   10090   Mus musculus        RNA-Seq TRANSCRIPTOMIC  cDNA    PAIRED  SRS7946093      Illumina NovaSeq 6000   Illumina NovaSeq 6000   ILLUMINA    96123725    4107807391  SRR13329758 96123725    12688331700
     SRP299803   SRX9756767  GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq   GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq   10090   Mus musculus        RNA-Seq TRANSCRIPTOMIC  cDNA    PAIRED  SRS7946092      Illumina NovaSeq 6000   Illumina NovaSeq 6000   ILLUMINA    94345783    4056010488  SRR13329757 94345783    12453643356
     SRP299803   SRX9756766  GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq 10090   Mus musculus        RNA-Seq TRANSCRIPTOMIC  cDNA    PAIRED  SRS7946091      Illumina NovaSeq 6000   Illumina NovaSeq 6000   ILLUMINA    99487074    4240172698  SRR13329756 99487074    13132293768
     SRP299803   SRX9756765  GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq 10090   Mus musculus        RNA-Seq TRANSCRIPTOMIC  cDNA    PAIRED  SRS7946090      Illumina NovaSeq 6000   Illumina NovaSeq 6000   ILLUMINA    88048461    3817540828  SRR13329755 88048461    11622396852

It is also possible to get more detailed information using the
`--detailed` flag:

    $ pysradb metadata SRP299803 --detailed

     run_accession   study_accession experiment_accession    experiment_title    experiment_desc organism_taxid  organism_name   library_name    library_strategy    library_source  library_selection   library_layout  sample_accession    sample_title    instrument  instrument_model    instrument_model_desc   total_spots total_size  run_total_spots run_total_bases run_alias   sra_url experiment_alias    source_name strain background   genotype    tissue/cell type    molecule subtype    ena_fastq_http  ena_fastq_http_1    ena_fastq_http_2    ena_fastq_ftp   ena_fastq_ftp_1 ena_fastq_ftp_2
     SRR13329759 SRP299803   SRX9756769  GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq    GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq    10090   Mus musculus        ATAC-seq    GENOMIC other   PAIRED  SRS7946094      Illumina NovaSeq 6000   Illumina NovaSeq 6000   ILLUMINA    55435867    2637580797  55435867    6874047508  GSM4995565_r1   https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/013017/SRR13329759   GSM4995565  wild type_retina    C57BL/6 wild type   retina          http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/059/SRR13329759/SRR13329759_1.fastq.gz   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/059/SRR13329759/SRR13329759_2.fastq.gz       era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/059/SRR13329759/SRR13329759_1.fastq.gz    era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/059/SRR13329759/SRR13329759_2.fastq.gz
     SRR13329758 SRP299803   SRX9756768  GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq   GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq   10090   Mus musculus        RNA-Seq TRANSCRIPTOMIC  cDNA    PAIRED  SRS7946093      Illumina NovaSeq 6000   Illumina NovaSeq 6000   ILLUMINA    96123725    4107807391  96123725    12688331700 GSM4995564_r1   https://sra-download.ncbi.nlm.nih.gov/traces/sra70/SRR/013017/SRR13329758   GSM4995564  Vsx2SE Δ/Δ_retina   C57BL/6 Vsx2SE {delta}/{delta}  retina  3' RNA      http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/058/SRR13329758/SRR13329758_1.fastq.gz   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/058/SRR13329758/SRR13329758_2.fastq.gz       era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/058/SRR13329758/SRR13329758_1.fastq.gz    era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/058/SRR13329758/SRR13329758_2.fastq.gz
     SRR13329757 SRP299803   SRX9756767  GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq   GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq   10090   Mus musculus        RNA-Seq TRANSCRIPTOMIC  cDNA    PAIRED  SRS7946092      Illumina NovaSeq 6000   Illumina NovaSeq 6000   ILLUMINA    94345783    4056010488  94345783    12453643356 GSM4995563_r1   https://sra-download.ncbi.nlm.nih.gov/traces/sra79/SRR/013017/SRR13329757   GSM4995563  Vsx2SE Δ/Δ_retina   C57BL/6 Vsx2SE {delta}/{delta}  retina  3' RNA      http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/057/SRR13329757/SRR13329757_1.fastq.gz   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/057/SRR13329757/SRR13329757_2.fastq.gz       era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/057/SRR13329757/SRR13329757_1.fastq.gz    era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/057/SRR13329757/SRR13329757_2.fastq.gz
     SRR13329756 SRP299803   SRX9756766  GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq 10090   Mus musculus        RNA-Seq TRANSCRIPTOMIC  cDNA    PAIRED  SRS7946091      Illumina NovaSeq 6000   Illumina NovaSeq 6000   ILLUMINA    99487074    4240172698  99487074    13132293768 GSM4995562_r1   https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/013017/SRR13329756   GSM4995562  wild type_retina    C57BL/6 wild type   retina  3' RNA      http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/056/SRR13329756/SRR13329756_1.fastq.gz   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/056/SRR13329756/SRR13329756_2.fastq.gz       era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/056/SRR13329756/SRR13329756_1.fastq.gz    era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/056/SRR13329756/SRR13329756_2.fastq.gz
     SRR13329755 SRP299803   SRX9756765  GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq 10090   Mus musculus        RNA-Seq TRANSCRIPTOMIC  cDNA    PAIRED  SRS7946090      Illumina NovaSeq 6000   Illumina NovaSeq 6000   ILLUMINA    88048461    3817540828  88048461    11622396852 GSM4995561_r1   https://sra-download.ncbi.nlm.nih.gov/traces/sra72/SRR/013017/SRR13329755   GSM4995561  wild type_retina    C57BL/6 wild type   retina  3' RNA      http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/055/SRR13329755/SRR13329755_1.fastq.gz   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/055/SRR13329755/SRR13329755_2.fastq.gz       era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/055/SRR13329755/SRR13329755_1.fastq.gz    era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/055/SRR13329755/SRR13329755_2.fastq.gz

Having made sure this dataset is indeed of interest, we want to save
some work and see if the processed dataset has been made available on
GEO by the authors:

    $ pysradb srp-to-gse SRP299803

    study_accession  study_alias
    SRP299803        GSE164044

So indeed a GEO project exists for this SRA dataset.

Notice that the GEO information was also visible in the output of
`metadata --detailed`. Assume we were in possession of the GSM
id of one of the experiments to start off with, say `GSM4995565`.
Starting from this GSM id, we want to get the following information:

-   SRP id of the project
-   GSE id of the project
-   SRX id of the experiment
-   SRR id(s) corresponding to the experiment

Get SRP id:

    $ pysradb gsm-to-srp GSM4995565

    experiment_alias study_accession
    GSM4995565       SRP299803

Get GSE id:

    $ pysradb gsm-to-gse GSM4995565

    experiment_alias study_alias
    GSM4995565       GSE164044

Get SRX id:

    $ pysradb gsm-to-srx GSM4995565

    experiment_alias experiment_accession
    GSM4995565       SRX9756769

Get SRR id(s):

    $ pysradb gsm-to-srr GSM4995565

    experiment_alias run_accession
    GSM4995565       SRR13329759
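
Mixing up accession types (for example typing a GSE prefix where a GSM id is meant) is an easy mistake when chaining these conversions. The sketch below is a small helper, not part of pysradb, that classifies an accession by its prefix before it is passed to a converter. The function name and the assumption that every accession is a prefix followed only by digits are illustrative; ENA-style prefixes such as `ERP` are deliberately not covered.

```python
import re

# Prefixes that appear in this case study. The "prefix + digits only"
# shape is an assumption made for this sketch.
_ACCESSION_RE = re.compile(r"^(GSE|GSM|SRP|SRX|SRR|SRS)(\d+)$")


def accession_type(accession: str) -> str:
    """Return the prefix (e.g. 'GSM') of an accession or raise ValueError."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"unrecognized accession: {accession!r}")
    return match.group(1)


print(accession_type("GSM4995565"))  # GSM
print(accession_type("SRP299803"))   # SRP
```

Checking the prefix first makes it clear which `gsm-to-*` or `srp-to-*` subcommand applies to a given id.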

## Case Study 2

Our first case study included metadata search. Next, we explore
downloading datasets.

We have an SRP id to start off with: `SRP000941`. We want to quickly
check out its contents:

    $ pysradb metadata SRP000941 --detailed | head

    study_accession experiment_accession    experiment_title    experiment_desc organism_taxid  organism_name   library_name    library_strategy    library_source  library_selection   library_layout  sample_accession    sample_title    instrument  instrument_model    instrument_model_desc   total_spots total_size  run_accession   run_total_spots run_total_bases
    SRP000941   SRX056722   Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells  Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells  9606    Homo sapiens    SAK270  ChIP-Seq    GENOMIC ChIP    SINGLE  SRS184466       Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA    26900401    531654480   SRR179707   26900401    807012030
    SRP000941   SRX027889   Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells 9606    Homo sapiens    SAK201  ChIP-Seq    GENOMIC ChIP    SINGLE  SRS116481       Illumina Genome Analyzer II Illumina Genome Analyzer II ILLUMINA    37528590    779578968   SRR067978   37528590    1351029240
    SRP000941   SRX027888   Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606    Homo sapiens    LLH1U   ChIP-Seq    GENOMIC RANDOM  SINGLE  SRS116483       Illumina Genome Analyzer II Illumina Genome Analyzer II ILLUMINA    13603127    3232309537  SRR067977   13603127    489712572
    SRP000941   SRX027887   Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606    Homo sapiens    DM219   ChIP-Seq    GENOMIC RANDOM  SINGLE  SRS116562       Illumina Genome Analyzer II Illumina Genome Analyzer II ILLUMINA    22430523    506327844   SRR067976   22430523    807498828

This project is a collection of multiple assays.

    $ pysradb metadata SRP000941 --detailed  | tr -s '  ' | cut -f5 -d ' ' | sort | uniq -c

    999 Bisulfite-Seq
    768 ChIP-Seq
      1 library_strategy
    121 OTHER
    353 RNA-Seq
     28 WGS
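
The same tally can be computed by column name instead of by field position, which is less sensitive to spaces inside titles. The following is a minimal sketch that assumes the metadata is available as tab-separated text with a header row; the helper name `count_column` and the three-row sample are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_column(tsv_text: str, column: str) -> Counter:
    """Count the values of one named column in tab-separated text."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return Counter(row[column] for row in reader)


# Tiny illustrative sample, not real pysradb output.
sample = (
    "study_accession\tlibrary_strategy\n"
    "SRP000941\tChIP-Seq\n"
    "SRP000941\tRNA-Seq\n"
    "SRP000941\tChIP-Seq\n"
)
for strategy, count in count_column(sample, "library_strategy").most_common():
    print(count, strategy)
```

Because the header row is consumed by the reader, the stray `library_strategy` line seen in the shell output does not appear in the counts.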

However, we only want to download the `RNA-Seq` samples:

    $ pysradb metadata SRP000941 --detailed | grep 'study\|RNA-Seq' | pysradb download

This will download all `RNA-seq` samples coming from this project using
`aspera-client`, if available. Alternatively, it can also use `wget`.
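
A `grep` on `RNA-Seq` matches the string anywhere in a line, including experiment titles. As a hedged alternative, the sketch below keeps the header plus only those rows whose `library_strategy` column equals the requested value. It assumes tab-separated input with a header row; `keep_strategy` is a hypothetical helper and not a pysradb function.

```python
def keep_strategy(lines, strategy="RNA-Seq", column="library_strategy"):
    """Yield the header and the rows whose `column` equals `strategy`."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:  # empty input: nothing to yield
        return
    header = header.rstrip("\n")
    index = header.split("\t").index(column)
    yield header
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Exact match on one column, not a substring match on the whole line.
        if len(fields) > index and fields[index] == strategy:
            yield line.rstrip("\n")


# Illustrative rows: the ChIP-Seq row mentions RNA-Seq only in its title.
sample_rows = [
    "study_accession\tlibrary_strategy\texperiment_title\n",
    "SRP000941\tChIP-Seq\tcontrol for RNA-Seq\n",
    "SRP000941\tRNA-Seq\tmRNA profiling\n",
]
for row in keep_strategy(sample_rows):
    print(row)
```

The filtered lines can then be written to a file or piped onward in the same way as the `grep` output.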

Downloading an entire project is easy:

    $ pysradb download -p SRP000941

Downloads are organized by `SRP/SRX/SRR`, mimicking the hierarchy of SRA
projects.
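
To locate a run after a download, the layout described above can be expressed as a path. This is only a sketch: the output directory name `pysradb_downloads` is an assumption, `run_location` is a hypothetical helper, and whether the last component is a directory or a file stem is not specified here.

```python
from pathlib import Path


def run_location(out_dir: str, srp: str, srx: str, srr: str) -> Path:
    """Build the SRP/SRX/SRR location under an output directory."""
    return Path(out_dir) / srp / srx / srr


# Accessions taken from the metadata listing above; out_dir is assumed.
location = run_location("pysradb_downloads", "SRP000941", "SRX056722", "SRR179707")
print(location.as_posix())  # pysradb_downloads/SRP000941/SRX056722/SRR179707
```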


================================================
FILE: docs/cmdline.md
================================================
# CLI

    $ pysradb
    usage: pysradb [-h] [--version] [--citation]
                   {metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
                   ...

    pysradb: Query NGS metadata and data from NCBI Sequence Read Archive.
    Citation: 10.12688/f1000research.18676.1

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      --citation            how to cite

    subcommands:
      {metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
        metadata            Fetch metadata for SRA project (SRPnnnn)
        download            Download SRA project (SRPnnnn)
        search              Search SRA/ENA for matching text
        gse-to-gsm          Get GSM for a GSE
        gse-to-srp          Get SRP for a GSE
        gsm-to-gse          Get GSE for a GSM
        gsm-to-srp          Get SRP for a GSM
        gsm-to-srr          Get SRR for a GSM
        gsm-to-srs          Get SRS for a GSM
        gsm-to-srx          Get SRX for a GSM
        srp-to-gse          Get GSE for a SRP
        srp-to-srr          Get SRR for a SRP
        srp-to-srs          Get SRS for a SRP
        srp-to-srx          Get SRX for a SRP
        srr-to-gsm          Get GSM for a SRR
        srr-to-srp          Get SRP for a SRR
        srr-to-srs          Get SRS for a SRR
        srr-to-srx          Get SRX for a SRR
        srs-to-gsm          Get GSM for a SRS
        srs-to-srx          Get SRX for a SRS
        srx-to-srp          Get SRP for a SRX
        srx-to-srr          Get SRR for a SRX
        srx-to-srs          Get SRS for a SRX
        geo-matrix          Download and parse GEO Matrix files
        srp-to-pmid         Get PMIDs for SRP accessions
        gse-to-pmid         Get PMIDs for GSE accessions
        pmid-to-gse         Get GSE accessions from PMIDs
        pmid-to-srp         Get SRP accessions from PMIDs
        pmc-to-identifiers  Extract database identifiers from PMC articles
        pmid-to-identifiers Extract database identifiers from PubMed articles
        doi-to-gse          Get GSE accessions from DOIs
        doi-to-srp          Get SRP accessions from DOIs
        doi-to-identifiers  Extract database identifiers from articles via DOI
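
The accession converters follow a regular `<source>-to-<target>` naming scheme, but not every pair exists. As an illustration, the sketch below builds the argument list for a converter and rejects pairs that are absent from the help text above. The helper `converter_command` is hypothetical; only the subcommand names are copied from the listing.

```python
# Accession-to-accession converters copied from the help output above.
CONVERTERS = frozenset({
    "gse-to-gsm", "gse-to-srp",
    "gsm-to-gse", "gsm-to-srp", "gsm-to-srr", "gsm-to-srs", "gsm-to-srx",
    "srp-to-gse", "srp-to-srr", "srp-to-srs", "srp-to-srx",
    "srr-to-gsm", "srr-to-srp", "srr-to-srs", "srr-to-srx",
    "srs-to-gsm", "srs-to-srx",
    "srx-to-srp", "srx-to-srr", "srx-to-srs",
})


def converter_command(source: str, target: str, accession: str) -> list:
    """Return an argv list such as ['pysradb', 'gsm-to-srp', 'GSM2177186']."""
    name = f"{source.lower()}-to-{target.lower()}"
    if name not in CONVERTERS:
        raise ValueError(f"no such converter subcommand: {name}")
    return ["pysradb", name, accession]


print(converter_command("GSM", "SRP", "GSM2177186"))
```

The returned list is in the form expected by `subprocess.run`, so it can be executed directly if pysradb is installed.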

## Enriching metadata

Extract standardized biological metadata from SRA/GEO datasets using LLMs.

### Quickstart

```python
from pysradb import SRAweb

client = SRAweb()

df = client.metadata("GSE286254", detailed=True, enrich=True)

# Returns original + 9 enriched columns (might not always be complete):
# guessed_organ, guessed_tissue, guessed_anatomical_system,
# guessed_cell_type, guessed_disease, guessed_sex,
# guessed_development_stage, guessed_assay, guessed_organism
```

### Prerequisites

Install Ollama: <https://ollama.ai>

```bash
ollama pull phi3
```

### Advanced Usage

```python
# Use different model
df = client.metadata("GSE286254", detailed=True, enrich=True,
                enrich_backend="ollama/llama3.2")

# Manual enrichment with custom settings
from pysradb.metadata_enrichment import create_metadata_extractor, load_ontology_reference

# LLM-based extraction
extractor_llm = create_metadata_extractor(method="llm", backend="ollama/phi3")
df_enriched = extractor_llm.enrich_dataframe(df, prefix="guessed_")

# Embedding-based extraction (faster, offline)
ontology_ref = load_ontology_reference()
extractor_emb = create_metadata_extractor(
    method="embedding",
    model="FremyCompany/BioLORD-2023",
    reference_categories=ontology_ref
)
df_enriched = extractor_emb.enrich_dataframe(df, prefix="guessed_")
```

See [Notebook 09](https://github.com/saketkc/pysradb/blob/develop/notebooks/09.Metadata_enrichment.ipynb) for detailed examples.

## Getting metadata for a SRA project (SRP)

The most basic information associated with any SRA project is its list
of experiments and run accessions.

    $ pysradb metadata SRP098789

     study_accession experiment_accession sample_accession run_accession
     SRP098789       SRX2536403           SRS1956353       SRR5227288
     SRP098789       SRX2536404           SRS1956354       SRR5227289
     SRP098789       SRX2536405           SRS1956355       SRR5227290
     SRP098789       SRX2536406           SRS1956356       SRR5227291
     SRP098789       SRX2536407           SRS1956357       SRR5227292
     SRP098789       SRX2536408           SRS1956358       SRR5227293
     SRP098789       SRX2536409           SRS1956359       SRR5227294

A bare list of SRX and SRR accessions for an SRP is often not enough. We
might want to take a quick look at the metadata associated with the samples:

    $ pysradb metadata SRP098789 --detailed

     study_accession experiment_accession sample_accession run_accession sample_attribute
     SRP098789       SRX2536403           SRS1956353       SRR5227288    source_name: Huh7_1.5 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
     SRP098789       SRX2536404           SRS1956354       SRR5227289    source_name: Huh7_1.5 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
     SRP098789       SRX2536405           SRS1956355       SRR5227290    source_name: Huh7_1.5 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
     SRP098789       SRX2536406           SRS1956356       SRR5227291    source_name: Huh7_0.3 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
     SRP098789       SRX2536407           SRS1956357       SRR5227292    source_name: Huh7_0.3 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq
     SRP098789       SRX2536408           SRS1956358       SRR5227293    source_name: Huh7_0.3 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq

The example here comes from a ribosome profiling study and consists of a
collection of both Ribo-seq and RNA-seq samples. We can keep only
the RNA-seq samples:

    $ pysradb metadata SRP098789 --detailed | grep 'study\|RNA-Seq'

    SRP098789       SRX2536422           SRR5227307    RNA-Seq          SINGLE -
    SRP098789       SRX2536424           SRR5227309    RNA-Seq          SINGLE -
    SRP098789       SRX2536426           SRR5227311    RNA-Seq          SINGLE -
    SRP098789       SRX2536428           SRR5227313    RNA-Seq          SINGLE -

A more complicated example will consist of multiple assays, for example
`SRP000941`:

    $ pysradb metadata SRP000941 --detailed  | tr -s '  ' | cut -f5 -d ' ' | sort | uniq -c
    999 Bisulfite-Seq
    768 ChIP-Seq
      1 library_strategy
    121 OTHER
    353 RNA-Seq
     28 WGS

## Enriching metadata

You can enrich metadata with standardized biological attributes using biomedical-specialized LLMs through the `--enrich` flag:

### Basic enrichment (using default backend)

    $ pysradb metadata GSE286254 --detailed --enrich

The default uses **Meditron** (7B parameters, trained on medical literature and guidelines), which is optimized for biomedical text understanding.

This returns the original metadata plus 9 enriched columns:
- `guessed_organ`
- `guessed_tissue`
- `guessed_anatomical_system`
- `guessed_cell_type`
- `guessed_disease`
- `guessed_sex`
- `guessed_development_stage`
- `guessed_assay`
- `guessed_organism`
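
Since every enriched column shares the `guessed_` prefix, the original and enriched columns can be told apart by name alone. The sketch below shows this on a plain list of column names; `split_columns` is a hypothetical helper and the column list is a shortened example.

```python
def split_columns(columns, prefix="guessed_"):
    """Separate original column names from enriched ones by prefix."""
    original = [name for name in columns if not name.startswith(prefix)]
    enriched = [name for name in columns if name.startswith(prefix)]
    return original, enriched


# Shortened, illustrative column list.
columns = ["run_accession", "organism_name", "guessed_organ", "guessed_sex"]
original, enriched = split_columns(columns)
print(enriched)  # ['guessed_organ', 'guessed_sex']
```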

### Using alternative biomedical backends

    $ pysradb metadata GSE286254 --detailed --enrich --enrich-backend ollama/openbiollm-8b

Available biomedical backends:
- `ollama/meditron` (default, 7B - optimized for medical text)
- `ollama/openbiollm-8b` (8B - trained on 500k+ biomedical entries, superior biomedical performance)

Both models are specialized for biomedical and clinical text understanding, making them ideal for SRA metadata enrichment.

For more details on enrichment features and prerequisites, see the [Enriching metadata](#enriching-metadata) section above.

## Experiment accessions for a project (SRP =\> SRX)

A frequently encountered task involves getting all the experiments (SRX)
for a particular study accession (SRP). Consider project `SRP048759`:

    $ pysradb srp-to-srx SRP048759

## Sample accessions for a project (SRP =\> SRS)

Each project involves one or more biological samples (SRS), which
are put through different experiments (SRX).

    $ pysradb srp-to-srs SRP048759

    study_accession sample_accession
    SRP048759       SRS718878
    SRP048759       SRS718879
    SRP048759       SRS718880
    SRP048759       SRS718881
    SRP048759       SRS718882
    SRP048759       SRS718883
    SRP048759       SRS718884
    SRP048759       SRS718885
    SRP048759       SRS718886

This is very limited information. It can again be expanded using the
`--detailed` flag:

    $ pysradb srp-to-srs --detailed SRP048759

    study_accession sample_accession        experiment_accession    run_accession   study_alias     sample_alias    experiment_alias        run_alias
    SRP048759       SRS718878       SRX729552       SRR1608490      GSE62190        GSM1521543      GSM1521543      GSM1521543_r1
    SRP048759       SRS718878       SRX729552       SRR1608491      GSE62190        GSM1521543      GSM1521543      GSM1521543_r2
    SRP048759       SRS718878       SRX729552       SRR1608492      GSE62190        GSM1521543      GSM1521543      GSM1521543_r3
    SRP048759       SRS718878       SRX729552       SRR1608493      GSE62190        GSM1521543      GSM1521543      GSM1521543_r4
    SRP048759       SRS718879       SRX729553       SRR1608494      GSE62190        GSM1521544      GSM1521544      GSM1521544_r1
    SRP048759       SRS718879       SRX729553       SRR1608495      GSE62190        GSM1521544      GSM1521544      GSM1521544_r2
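
The detailed listing repeats each sample once per run, so it is often handy to group runs by sample. The following is a minimal sketch that assumes the output is tab-separated with the header shown above; `runs_by_sample` is a hypothetical helper, and the sample text reuses a few rows from the listing with only two columns.

```python
import csv
import io
from collections import defaultdict


def runs_by_sample(tsv_text: str) -> dict:
    """Map each sample_accession to the list of its run_accession values."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["sample_accession"]].append(row["run_accession"])
    return dict(grouped)


listing = (
    "sample_accession\trun_accession\n"
    "SRS718878\tSRR1608490\n"
    "SRS718878\tSRR1608491\n"
    "SRS718879\tSRR1608494\n"
)
for sample_id, runs in runs_by_sample(listing).items():
    print(sample_id, ",".join(runs))
```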

## Run accessions for experiments (SRX =\> SRR)

Another frequently encountered task involves fetching the run accessions
(SRR) for a particular experiment (SRX). Consider experiments
`SRX217956` and `SRX2536403`. We want to
resolve the run accessions for these experiments:

    $ pysradb srx-to-srr SRX217956  SRX2536403 --detailed

    experiment_accession run_accession study_accession sample_attribute
    SRX217956            SRR649752     SRP017942       source_name: 3T3 cells || treatment: control || cell line: 3T3 cells || assay type: Riboseq
    SRX2536403           SRR5227288    SRP098789       source_name: Huh7_1.5 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq

## Experiment accessions for runs (SRR =\> SRX)

For fetching experiment accessions (SRX) for one or multiple run
accessions (SRR):

    $ pysradb srr-to-srx SRR5227288 SRR649752 --detailed
    run_accession study_accession experiment_accession sample_attribute
    SRR649752     SRP017942       SRX217956            source_name: 3T3 cells || treatment: control || cell line: 3T3 cells || assay type: Riboseq
    SRR5227288    SRP098789       SRX2536403           source_name: Huh7_1.5 µM PF-067446846_10 min_ribo-seq || cell line: Huh7 || treatment time: 10 min || library type: ribo-seq

## Downloading an entire project

    $ pysradb metadata --detailed SRP098789 | pysradb download

## GEO accessions for studies (SRP =\> GSE)

    $ pysradb srp-to-gse SRP090415

    study_accession study_alias
    SRP090415       GSE87328

Not all SRPs have an associated GEO ID (GSE). In that case the
`study_alias` column holds a different alias, such as a BioProject
accession:

    $ pysradb srp-to-gse SRP029589

    study_accession study_alias
    SRP029589       PRJNA218051
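
When post-processing this output, it can help to check whether the `study_alias` value is actually a GEO series accession. A minimal sketch, assuming GEO series accessions have the form `GSE` followed by digits; `geo_series_from_alias` is a hypothetical helper, not part of `pysradb`:

```python
def geo_series_from_alias(study_alias):
    """Return the alias if it looks like a GEO series accession, else None."""
    alias = study_alias.strip()
    # Assumption: a GEO series accession is 'GSE' followed only by digits.
    if alias.startswith("GSE") and alias[3:].isdigit():
        return alias
    return None


print(geo_series_from_alias("GSE87328"))     # GSE87328
print(geo_series_from_alias("PRJNA218051"))  # None
```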

## Converting GSM to SRP

    $ pysradb gsm-to-srp GSM2177186

    experiment_alias study_accession
    GSM2177186       SRP075720

## Converting GSM to GSE

    $ pysradb gsm-to-gse GSM2177186

    experiment_alias study_alias
    GSM2177186       GSE81903

## Converting GSM to SRX

    $ pysradb gsm-to-srx GSM2177186

    experiment_alias experiment_accession
    GSM2177186       SRX1800089

## Converting GSM to SRR

    $ pysradb gsm-to-srr GSM2177186

    experiment_alias run_accession
    GSM2177186       SRR3587529

## SRA accessions for GEO studies (GSE =\> SRP)

    $ pysradb gse-to-srp GSE87328

    study_alias study_accession
    GSE87328    SRP090415

## Converting SRP to PMID

    $ pysradb srp-to-pmid SRP045778

    srp_accession bioproject pmid
    SRP045778     PRJNA257197 27373336

## Converting GSE to PMID

    $ pysradb gse-to-pmid GSE253406

    gse_accession pmid
    GSE253406     39528918

## Extracting identifiers from PMC/DOI

Extract database identifiers (GSE, PRJNA, SRP, etc.) from PubMed Central articles or DOIs.

### Get all identifiers from a PMID

    $ pysradb pmid-to-identifiers 39528918

    pmid      pmc_id       gse_ids     prjna_ids    srp_ids
    39528918  PMC10802650  GSE253406   PRJNA1058002 SRP484103

### Get only GSE or SRP from PMID

    $ pysradb pmid-to-gse 39528918

    pmid      pmc_id       gse_ids
    39528918  PMC10802650  GSE253406

    $ pysradb pmid-to-srp 39528918

    pmid      pmc_id       srp_ids
    39528918  PMC10802650  SRP484103

### Extract from DOI

    $ pysradb doi-to-identifiers 10.12688/f1000research.18676.1

    doi                                 pmid      pmc_id      gse_ids  srp_ids
    10.12688/f1000research.18676.1      30873266  PMC6411813  GSE...   SRP...

### Extract from PMC ID

    $ pysradb pmc-to-identifiers PMC10802650

    pmc_id       gse_ids     prjna_ids    srp_ids
    PMC10802650  GSE253406   PRJNA1058002 SRP484103
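
The general idea of pulling accession-like identifiers out of article text can be sketched with regular expressions. This is an illustration only and not necessarily how `pysradb` implements these commands; the patterns, the `find_identifiers` helper, and the sample sentence are assumptions made for this example:

```python
import re

# Assumed accession shapes: a fixed prefix followed by digits.
PATTERNS = {
    "gse_ids": re.compile(r"\bGSE\d+\b"),
    "prjna_ids": re.compile(r"\bPRJNA\d+\b"),
    "srp_ids": re.compile(r"\bSRP\d+\b"),
}


def find_identifiers(text):
    """Return unique identifiers per category, in order of first appearance."""
    found = {}
    for name, pattern in PATTERNS.items():
        # dict.fromkeys removes duplicates while keeping first-seen order.
        found[name] = list(dict.fromkeys(pattern.findall(text)))
    return found


# Made-up sentence that reuses the identifiers from the output above.
text = (
    "Data are available under GSE253406 (BioProject PRJNA1058002; "
    "SRA SRP484103). See GSE253406 for processed files."
)
ids = find_identifiers(text)
print(ids)
```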

## Downloading supplementary files from GEO

    $ pysradb download -g GSE161707

## Downloading an entire SRA/ENA project (multithreaded)

`pysradb` can download datasets from SRA in parallel. For example, to
download using 8 threads:

    $ pysradb download -y -t 8 --out-dir ./pysradb_downloads -p SRP063852

Downloads are organized by `SRP/SRX/SRR` mimicking the hierarchy of SRA
projects.
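
The resulting layout can be sketched as a path built from the three accessions. This is an illustration of the hierarchy described above; `run_location` is a hypothetical helper, and whether the last component is a directory or a file name prefix is not specified here:

```python
from pathlib import PurePosixPath


def run_location(out_dir, srp, srx, srr):
    """Build the out_dir/SRP/SRX/SRR location for one run (hypothetical helper)."""
    return PurePosixPath(out_dir) / srp / srx / srr


location = run_location("pysradb_downloads", "SRP098789", "SRX2536403", "SRR5227288")
print(location)  # pysradb_downloads/SRP098789/SRX2536403/SRR5227288
```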


================================================
FILE: docs/commands.rst
================================================
API Documentation
=================

See :doc:`pysradb` for the Python API reference documentation.


================================================
FILE: docs/conf.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# pysradb documentation build configuration file, created by
# sphinx-quickstart on Fri Jun  9 13:47:02 2017.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

# If extensions (or modules to document with autodoc) are in another
# directory, add these directories to sys.path here. If the directory is
# relative to the documentation root, use os.path.abspath to make it
# absolute, like shown here.
#
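import os
import sys

# Put the repository root on sys.path before importing pysradb, so that
# the import below can resolve to the local checkout.
sys.path.insert(0, os.path.abspath(".."))

# import guzzle_sphinx_theme
import pysradb

autodoc_mock_imports = ["xmltodict", "numpy", "pandas", "requests", "tqdm"]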


# -- General configuration ---------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = [
    "IPython.sphinxext.ipython_directive",
    "IPython.sphinxext.ipython_console_highlighting",
    "sphinx.ext.mathjax",
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.doctest",
    "sphinx.ext.viewcode",
    "sphinx.ext.inheritance_diagram",
    "numpydoc",
    "sphinx_tabs.tabs",
    "sphinx_panels",
    "sphinxcontrib.gtagjs",
    "myst_parser",
    "nbsphinx",
]
gtagjs_ids = [
    "G-CKQZFCEENZ",
]

panels_add_bootstrap_css = False

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = [".rst", ".md"]
# source_suffix = ".md"

# The master toctree document.
master_doc = "index"

# General information about the project.
project = "pysradb"
copyright = "2023, Saket Choudhary"
author = "Saket Choudhary"
# The version info for the project you're documenting, acts as replacement
# for |version| and |release|, also used in various other places throughout
# the built documents.
#
# The short X.Y version.
version = pysradb.__version__
# The full version, including alpha/beta/rc tags.
release = pysradb.__version__

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = "en"

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# These patterns also affect html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False


# -- Options for HTML output -------------------------------------------

# The theme to use for HTML and HTML Help pages.  See the documentation for
# a list of builtin themes.
#
html_theme = "furo"

# Theme options are theme-specific and customize the look and feel of a
# theme further.  For a list of options available for each theme, see the
# documentation.
#
html_theme_options = {
    "light_css_variables": {
        "color-brand-primary": "#0066cc",
        "color-brand-content": "#0066cc",
        "color-code-background": "#f5f5f5",
        "color-inline-code-background": "#f0f0f0",
    },
    "dark_css_variables": {
        "color-brand-primary": "#3b82f6",
        "color-brand-content": "#3b82f6",
        "color-code-background": "#1e293b",
        "color-inline-code-background": "#334155",
    },
}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]


# -- Options for HTMLHelp output ---------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = "pysradbdoc"


# -- Options for LaTeX output ------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #
    # 'papersize': 'letterpaper',
    # The font size ('10pt', '11pt' or '12pt').
    #
    # 'pointsize': '10pt',
    # Additional stuff for the LaTeX preamble.
    #
    # 'preamble': '',
    # Latex figure (float) alignment
    #
    # 'figure_align': 'htbp',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass
# [howto, manual, or own class]).
latex_documents = [
    (master_doc, "pysradb.tex", "pysradb Documentation", "Saket Choudhary", "manual")
]


# -- Options for manual page output ------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, "pysradb", "pysradb Documentation", [author], 1)]


# -- Options for Texinfo output ----------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
#  dir menu entry, description, category)
texinfo_documents = [
    (
        master_doc,
        "pysradb",
        "pysradb Documentation",
        author,
        "pysradb",
        "One line description of project.",
        "Miscellaneous",
    )
]


numpydoc_show_class_members = False


##html_theme_path = guzzle_sphinx_theme.html_theme_path()
##html_theme = "guzzle_sphinx_theme"
##
### Register the theme as an extension to generate a sitemap.xml
##extensions.append("guzzle_sphinx_theme")
##
### Guzzle theme options (see theme.conf for more information)
##html_theme_options = {
##    # Set the name of the project to appear in the sidebar
##    "project_nav_name": "pysradb"
##}

scv_greatest_tag = True
scv_show_banner = True

html_logo = "_static/pysradb_v3.png"

# Load custom JavaScript for copy-to-clipboard functionality
html_js_files = [
    "copy-button.js",
]

# Load custom CSS to override Pygments background colors
html_css_files = [
    "custom.css",
]

# NBSphinx configuration
nbsphinx_execute = "never"
exclude_patterns.append("**/.ipynb_checkpoints")
exclude_patterns.append("notebooks/.ipynb_checkpoints")


================================================
FILE: docs/contributing.md
================================================
# Contributing

Contributions are welcome, and they are greatly appreciated! Every
little bit helps, and credit will always be given.

You can contribute in many ways:

## Types of Contributions

### Report Bugs

Report bugs at <https://github.com/saketkc/pysradb/issues>.

If you are reporting a bug, please include:

-   Your operating system name and version.
-   Any details about your local setup that might be helpful in
    troubleshooting.
-   Detailed steps to reproduce the bug.

### Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with \"bug\"
and \"help wanted\" is open to whoever wants to implement it.

### Implement Features

Look through the GitHub issues for features. Anything tagged with
\"enhancement\" and \"help wanted\" is open to whoever wants to
implement it.

### Write Documentation

pysradb could always use more documentation, whether as part of the
official pysradb docs, in docstrings, or even on the web in blog posts,
articles, and such.

### Submit Feedback

The best way to send feedback is to file an issue at
<https://github.com/saketkc/pysradb/issues>.

If you are proposing a feature:

-   Explain in detail how it would work.
-   Keep the scope as narrow as possible, to make it easier to
    implement.
-   Remember that this is a volunteer-driven project, and that
    contributions are welcome :)

## Get Started!

Ready to contribute? Here's how to set up `pysradb` for
local development.

1.  Fork the `pysradb` repo on GitHub.

2.  Clone your fork locally:

    ``` shell
    $ git clone git@github.com:your_name_here/pysradb.git
    ```

3.  Install your local copy into a virtualenv. Assuming you have
    virtualenvwrapper installed, this is how you set up your fork for
    local development (if `python --version` reports a version below
    3.0, run `$ mkvirtualenv pysradb --python=py3` instead):

    ``` shell
    $ mkvirtualenv pysradb
    $ cd pysradb/
    $ pip install -e .
    ```

4.  Create a branch for local development:

    ``` shell
    $ git checkout -b name-of-your-bugfix-or-feature
    ```

    Now you can make your changes locally.

5.  When you\'re done making changes, check that your changes pass
    flake8 and the tests, including testing other Python versions with
    tox:

    ``` shell
    $ flake8 pysradb tests
    $ pytest
    $ tox
    ```

    To get flake8 and tox, just pip install them into your virtualenv.

6.  Commit your changes and push your branch to GitHub:

    ``` shell
    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    ```

7.  Submit a pull request through the GitHub website.

## Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

1.  The pull request should include tests.
2.  If the pull request adds functionality, the docs should be updated.
    Put your new functionality into a function with a docstring, and add
    the feature to the list in README.md.
3.  The pull request should work for all supported Python versions.
    Make sure that the tests pass for each of them.

## Tips

To run a subset of tests:

``` shell
$ py.test tests.test_pysradb
```

## Deploying

A reminder for the maintainers on how to deploy. Make sure all your
changes are committed (including an entry in HISTORY.md). Then run:

``` shell
$ bumpversion patch # possible: major / minor / patch
$ git push
$ git push --tags
```

CI will then deploy to PyPI if tests pass.


================================================
FILE: docs/history.md
================================================
# History

<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.5.1 (2025-10-29)
</summary>


- Add PRJNA support in doi-to-identifiers [#249](https://github.com/saketkc/pysradb/pull/249).


</details>

<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.5.0 (2025-10-19)
</summary>


- Add pmid/doi-to-gse/srp conversion [#246](https://github.com/saketkc/pysradb/pull/246).


</details>

<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.4.1 (2025-09-27)
</summary>


- Add gse-to-pmid conversion [#244](https://github.com/saketkc/pysradb/pull/244).


</details>

<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.4.0 (2025-09-27)
</summary>


- Add sra-to-pmid conversion [#241](https://github.com/saketkc/pysradb/pull/241). Thanks [@andrewdavidsmith](https://github.com/andrewdavidsmith) for the idea.


</details>

<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.3.0 (2025-08-24)
</summary>


- Download logic improvements: removed requests-ftp as a requirement
- Fix for handling missing metadata keys [#223](https://github.com/saketkc/pysradb/pull/223). Thanks [@andrewdavidsmith](https://github.com/andrewdavidsmith)


</details>

<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.2.2 (2024-10-03)
</summary>


- Fix for handling ENA urls for paired end data


</details>

<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.2.1 (2024-08-21)
</summary>


- Fix for handling ENA urls
- Migrated to pyproject.toml



</details>

<details open>
<summary style="cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;">
2.2.0 (2023-09-17)
</summary>


- Add support for Biosamples and bioproject [#199](https://github.com/saketkc/pysradb/pull/198)
- Use retmode xml for Geo search [#200](https://github.com/saketkc/pysradb/pull/200)
- Documentation fixes

## 2.1.0 (2023-05-16)

-   Fix for `gse-to-srp` returning unrequested GSEs [#190](https://github.com/saketkc/pysradb/issues/190)
-   Fix for `download` using `public_urls`
-   Fix for `gsm-to-srx` returning false positives [#165](https://github.com/saketkc/pysradb/issues/165)
-   Fix for delimiter not being consistent when metadata is printed on
    terminal [#147](https://github.com/saketkc/pysradb/issues/147)
-   ENA search is currently broken because of an API change

## 2.0.2 (2023-04-09)

-   Fix for `gse-to-srp` to handle cases where a project is
    missing but SRXs are returned [#186](https://github.com/saketkc/pysradb/issues/186)
-   Fix gse-to-gsm [#187](https://github.com/saketkc/pysradb/issues/187)

## 2.0.1 (2023-03-18)

-   Fix for `pysradb download` - using `public_url`
-   Fix for SRX -\> SRR and related conversions [#183](https://github.com/saketkc/pysradb/pull/183)

## 2.0.0 (2023-02-23)

-   BREAKING change: Overhaul of how urls and associated metadata are
    returned (not backward compatible); all column names are lower cased
    by default
-   Fix extra space in \"organism_taxid\" column
-   Added support for Experiment attributes [#89](https://github.com/saketkc/pysradb/issues/89#issuecomment-1439319532)

## 1.4.2 (06-17-2022)

-   Fix ENA fastq fetching [#163](https://github.com/saketkc/pysradb/issues/163)

## 1.4.1 (06-04-2022)

-   Fix for fetching alternative URLs

## 1.4.0 (06-04-2022)

-   Added ability to fetch alternative URLs (GCP/AWS) for metadata
    [#161](https://github.com/saketkc/pysradb/issues/161)
-   Fix for xmldict 0.13.0 no longer defaulting to OrderedDict [#159](https://github.com/saketkc/pysradb/pull/159)
-   Fix for missing experiment model and description in metadata [#160](https://github.com/saketkc/pysradb/issues/160)

## 1.3.0 (02-18-2022)

-   Add `study_title` to `--detailed` flag
    ([#152](https://github.com/saketkc/pysradb/issues/152))
-   Fix `KeyError` in `metadata` where some new
    IDs do not have any metadata
    ([#151](https://github.com/saketkc/pysradb/issues/151))

## 1.2.0 (01-10-2022)

-   Do not exit if a query returns no hits ([#149](https://github.com/saketkc/pysradb/pull/149))

## 1.1.0 (12-12-2021)

-   Fixed `gsm-to-gse` failure
    ([#128](https://github.com/saketkc/pysradb/pull/128))
-   Fixed case sensitivity bug for ENA search
    ([#144](https://github.com/saketkc/pysradb/pull/144))
-   Fixed publication date bug for search
    ([#146](https://github.com/saketkc/pysradb/pull/146))
-   Added support for downloading data from GEO:
    `pysradb download -g GSE`
    ([#129](https://github.com/saketkc/pysradb/pull/129))

## 1.0.1 (01-10-2021)

-   Dropped Python 3.6 support since pandas 1.2 does not support it

## 1.0.0 (01-09-2021)

-   Retired `metadb` and `SRAdb` based search through CLI - everything
    defaults to `SRAweb`
-   `SRAweb` now supports
    [search](https://saket-choudhary.me/pysradb/quickstart.html#search)
-   `N/A` is now replaced with `pd.NA`
-   Two new fields in `--detailed`: `instrument_model`
    and `instrument_model_desc`
    [#75](https://github.com/saketkc/pysradb/issues/75)
-   Updated documentation

## 0.11.1 (09-18-2020)

-   `library_layout` is now outputted in metadata #56
-   `--detailed` unifies columns for ENA fastq links instead
    of appending \_x/\_y #59
-   bugfix for parsing namespace in xml outputs #65
-   XML errors from NCBI are now handled more gracefully #69
-   Documentation and dependency updates

## 0.11.0 (09-04-2020)

-   `pysradb download` now supports multiple threads for
    parallel downloads
-   `pysradb download` also supports ultra fast downloads of
    FASTQs from ENA using aspera-client

## 0.10.3 (03-26-2020)

-   Added test cases for SRAweb
-   API limit exceeding errors are automagically handled
-   Bug fixes for GSE \<=\> SRR
-   Bug fix for metadata - supports multiple SRPs

Contributors

-   Dibya Gautam
-   Marius van den Beek

## 0.10.2 (02-05-2020)

-   Bug fix: Handle API-rate limit exceeding =\> Retries
-   Enhancement: 'Alternatives' URLs are now part of
    `--detailed`

## 0.10.1 (02-04-2020)

-   Bug fix: Handle Python3.6 for capture_output in subprocess.run

## 0.10.0 (01-31-2020)

-   All the subcommands (srx-to-srr, srx-to-srs) will now print
    additional columns where the first two columns represent the
    relevant conversion
-   Fixed a bug when fetching entries with a single efetch record

## 0.9.9 (01-15-2020)

-   Major fix: some SRRs would go missing as the experiment dict was
    being created only once per SRR (See #15)
-   Features: More detailed metadata by default in the SRAweb mode
-   See notebook: <https://colab.research.google.com/drive/1C60V->

## 0.9.7 (01-20-2020)

-   Feature: instrument, run size and total spots are now printed in the
    metadata by default (SRAweb mode only)
-   Issue: Fixed an issue with srapath failing on SRP. srapath is now
    run on individual SRRs.

## 0.9.6 (07-20-2019)

-   Introduced `SRAweb` to perform queries over the web if
    the SQLite is missing or does not contain the relevant record.

## 0.9.0 (02-27-2019)

### Others

-   This release completely changes the command line interface replacing
    click with argparse ([#3](https://github.com/saketkc/pysradb/pull/3))
-   Removed Python 2 compatible stale code

## 0.8.0 (02-26-2019)

### New methods/functionality

-   `srr-to-gsm`: convert SRR to GSM
-   SRAmetadb.sqlite.gz file is deleted by default after extraction
-   When SRAmetadb is not found, confirmation is sought before
    downloading
-   Confirmation option before SRA downloads

### Bugfix

-   download() works with wget

### Others

-   `--out_dir` is now `--out-dir`

## 0.7.1 (02-18-2019)

Important: Python2 is no longer supported. Please consider moving to
Python3.

### Bugfix

-   Included docs in the index which were missed out in the previous
    release

## 0.7.0 (02-08-2019)

### New methods/functionality

-   `gsm-to-srr`: convert GSM to SRR
-   `gsm-to-srx`: convert GSM to SRX
-   `gsm-to-gse`: convert GSM to GSE

### Renamed methods

The following command line options have been renamed and the changes are
not compatible with 0.6.0 release:

-   `sra-metadata` -> `metadata`.
-   `sra-search` -> `search`.
-   `srametadb` -> `metadb`.

## 0.6.0 (12-25-2018)

### Bugfix

-   Fixed bugs introduced in 0.5.0 with API changes where multiple
    redundant columns were output in `sra-metadata`

### New methods/functionality

-   `download` now allows piped inputs

## 0.5.0 (12-24-2018)

### New methods/functionality

-   Support for filtering by SRX Id for SRA downloads.
-   `srr_to_srx`: Convert SRR to SRX/SRP
-   `srp_to_srx`: Convert SRP to SRX
-   Stripped down `sra-metadata` to give minimal information
-   Added `--assay`, `--desc`, `--detailed` flags for `sra-metadata`
-   Improved table printing on terminal

## 0.4.2 (12-16-2018)

### Bugfix

-   Fixed unicode error in tests for Python2

## 0.4.0 (12-12-2018)

### New methods/functionality

-   Added a new `BASEdb` class to handle common database
    connections
-   Initial support for GEOmetadb through GEOdb class
-   Initial support for a command line interface:
    -   download Download SRA project (SRPnnnn)
    -   gse-metadata Fetch metadata for GEO ID (GSEnnnn)
    -   gse-to-gsm Get GSM(s) for GSE
    -   gsm-metadata Fetch metadata for GSM ID (GSMnnnn)
    -   sra-metadata Fetch metadata for SRA project (SRPnnnn)
-   Added three separate notebooks for SRAdb, GEOdb, CLI usage

## 0.3.0 (12-05-2018)

### New methods/functionality

-   `sample_attribute` and `experiment_attribute` are now included by
    default in the df returned by `sra_metadata()`
-   `expand_sample_attribute_columns`: expand metadata dataframe based on
    attributes in `sample_attribute` column
-   New methods to guess cell/tissue/strain:
    `guess_cell_type()`/`guess_tissue_type()`/`guess_strain_type()`
-   Improved README and usage instructions

## 0.2.2 (12-03-2018)

### New methods/functionality

-   `search_sra()` allows full text search on SRA metadata.

## 0.2.0 (12-03-2018)

### Renamed methods

The following methods have been renamed and the changes are not
compatible with 0.1.0 release:

-   `get_query()` -> `query()`.
-   `sra_convert()` -> `sra_metadata()`.
-   `get_table_counts()` -> `all_row_counts()`.

### New methods/functionality

-   `download_sradb_file()` makes fetching `SRAmetadb.sqlite` file easy; wget is no longer required.
-   `ftp` protocol is now supported besides `fasp` and hence `aspera-client` is now optional. However, we strongly recommend `aspera-client` for faster downloads.

### Bug fixes

-   Silenced `SettingWithCopyWarning` by explicitly doing
    operations on a copy of the dataframe instead of the original.

Besides these, all methods now follow `numpydoc`-compatible
documentation.

## 0.1.0 (12-01-2018)

-   First release on PyPI.

</details>



================================================
FILE: docs/index.rst
================================================
============
Introduction
============


``pysradb`` provides a simple method to programmatically access metadata
and download sequencing data from NCBI's Sequence Read Archive (SRA) and European Bioinformatics
Institute's European Nucleotide Archive (ENA).



=============
Quick Example
=============

To fetch metadata associated with project accession ``SRP265425``:

.. code-block:: console

    $ pysradb metadata SRP265425

    study_accession	experiment_accession	experiment_title	experiment_desc	organism_taxid 	organism_name	library_name	library_strategy	library_source	library_selection	library_layout	sample_accession	sample_title	instrument	instrument_model	instrument_model_desc	total_spots	total_size	run_accession	run_total_spots	run_total_bases
    SRP265425	SRX8434255	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	63-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745319		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	1311358	83306910	SRR11886735	1311358	109594216
    SRP265425	SRX8434254	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	62-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745320		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	2614109	204278682	SRR11886736	2614109	262305651
    SRP265425	SRX8434253	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	61-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745318		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	2286312	183516004	SRR11886737	2286312	263304134
    SRP265425	SRX8434252	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	60-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745317		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	5202567	507524965	SRR11886738	5202567	781291588
    SRP265425	SRX8434251	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	38-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745315		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	3313960	356104406	SRR11886739	3313960	612430817
    SRP265425	SRX8434250	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	37-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745316		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	5155733	565882351	SRR11886740	5155733	954342917
    SRP265425	SRX8434249	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	36-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745313		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	1324589	175619046	SRR11886741	1324589	216531400
    SRP265425	SRX8434248	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	35-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745314		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	1639851	198973268	SRR11886742	1639851	245466005
    SRP265425	SRX8434247	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	68-2020-05-07	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745312		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	3921389	210198580	SRR11886743	3921389	332935558
    SRP265425	SRX8434246	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	66-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745311		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	14295475	2150005008	SRR11886744	14295475	2967829315
    SRP265425	SRX8434245	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	65-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745310		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	5124692	294846140	SRR11886745	5124692	431819462
    SRP265425	SRX8434244	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	64-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745309		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	2986306	205666872	SRR11886746	2986306	275400959
    SRP265425	SRX8434243	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	34-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745308		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	1182690	59471336	SRR11886747	1182690	86350631
    SRP265425	SRX8434242	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	33-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745307		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	6031816	749323230	SRR11886748	6031816	928054297


To fetch detailed metadata, which includes links to raw sequencing files, specify ``--detailed``:

.. code-block:: console

    $ pysradb metadata SRP265425 --detailed

    run_accession	study_accession	experiment_accession	experiment_title	experiment_desc	organism_taxid 	organism_name	library_name	library_strategy	library_source	library_selection	library_layout	sample_accession	sample_title	instrument	instrument_model	instrument_model_desc	total_spots	total_size	run_total_spots	run_total_bases	run_alias	sra_url_alt1	sra_url_alt2	sra_url	experiment_alias	isolate	collected_by	collection_date	geo_loc_name	host	host_disease	isolation_source	lat_lon	BioSampleModel	sra_url_alt3	ena_fastq_http	ena_fastq_http_1	ena_fastq_http_2	ena_fastq_ftp	ena_fastq_ftp_1	ena_fastq_ftp_2
    SRR11886735	SRP265425	SRX8434255	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	63-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745319		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	1311358	83306910	1311358	109594216	IonXpress_063_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam	gs://sra-pub-src-9/SRR11886735/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886735/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra0/SRR/011608/SRR11886735		GC-20	NA	02-Apr-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl		http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/035/SRR11886735/SRR11886735.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/035/SRR11886735/SRR11886735.fastq.gz
    SRR11886736	SRP265425	SRX8434254	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	62-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745320		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	2614109	204278682	2614109	262305651	IonXpress_062_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam	gs://sra-pub-src-16/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRZ/011886/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta	https://sra-download.ncbi.nlm.nih.gov/traces/sra50/SRR/011608/SRR11886736		GC-51	NA	14-Apr-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/036/SRR11886736/SRR11886736.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/036/SRR11886736/SRR11886736.fastq.gz
    SRR11886737	SRP265425	SRX8434253	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	61-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745318		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	2286312	183516004	2286312	263304134	IonXpress_061_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam	gs://sra-pub-src-16/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra29/SRZ/011886/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta	https://sra-download.ncbi.nlm.nih.gov/traces/sra17/SRR/011608/SRR11886737		GC-24	NA	07-Apr-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/037/SRR11886737/SRR11886737.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/037/SRR11886737/SRR11886737.fastq.gz
    SRR11886738	SRP265425	SRX8434252	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	60-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745317		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	5202567	507524965	5202567	781291588	IonXpress_060_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam	gs://sra-pub-src-15/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRZ/011886/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam	https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/011608/SRR11886738		GC-23	NA	08-Apr-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam.1	http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/038/SRR11886738/SRR11886738.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/038/SRR11886738/SRR11886738.fastq.gz
    SRR11886739	SRP265425	SRX8434251	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	38-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745315		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	3313960	356104406	3313960	612430817	IonXpress_038_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam	gs://sra-pub-src-13/SRR11886739/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886739/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra24/SRR/011608/SRR11886739		GC-11b	NA	24-Mar-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl		http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/039/SRR11886739/SRR11886739.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/039/SRR11886739/SRR11886739.fastq.gz
    SRR11886740	SRP265425	SRX8434250	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	37-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745316		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	5155733	565882351	5155733	954342917	IonXpress_037_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam	gs://sra-pub-src-5/SRR11886740/IonXpress_037_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886740/IonXpress_037_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra13/SRR/011608/SRR11886740		GC-14b	NA	28-Mar-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl		http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/040/SRR11886740/SRR11886740.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/040/SRR11886740/SRR11886740.fastq.gz
    SRR11886741	SRP265425	SRX8434249	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	36-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745313		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	1324589	175619046	1324589	216531400	IonXpress_036_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam	gs://sra-pub-src-11/SRR11886741/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886741/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra57/SRR/011608/SRR11886741		GC-12	NA	24-Mar-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl		http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/041/SRR11886741/SRR11886741.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/041/SRR11886741/SRR11886741.fastq.gz
    SRR11886742	SRP265425	SRX8434248	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	35-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745314		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	1639851	198973268	1639851	245466005	IonXpress_035_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam	gs://sra-pub-src-11/SRR11886742/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886742/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRR/011608/SRR11886742		GC-13	NA	23-Mar-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl		http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/042/SRR11886742/SRR11886742.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/042/SRR11886742/SRR11886742.fastq.gz
    SRR11886743	SRP265425	SRX8434247	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	68-2020-05-07	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745312		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	3921389	210198580	3921389	332935558	IonXpress_068_R_2020_05_07_11_47_51_user_GCEID-S5-60-SARS_CoV2_SA4.bam	gs://sra-pub-src-17/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra64/SRZ/011886/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta	https://sra-download.ncbi.nlm.nih.gov/traces/sra54/SRR/011608/SRR11886743		GC-55	NA	24-Apr-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/043/SRR11886743/SRR11886743.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/043/SRR11886743/SRR11886743.fastq.gz
    SRR11886744	SRP265425	SRX8434246	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	66-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745311		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	14295475	2150005008	14295475	2967829315	IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq	gs://sra-pub-src-11/SRR11886744/IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq.1	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886744/IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra20/SRR/011608/SRR11886744		GC-26	NA	07-Mar-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl		http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/044/SRR11886744/SRR11886744.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/044/SRR11886744/SRR11886744.fastq.gz
    SRR11886745	SRP265425	SRX8434245	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	65-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745310		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	5124692	294846140	5124692	431819462	IonXpress_065_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.bam	gs://sra-pub-src-16/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRZ/011886/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta	https://sra-download.ncbi.nlm.nih.gov/traces/sra19/SRR/011608/SRR11886745		GC-25	NA	10-Apr-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/045/SRR11886745/SRR11886745.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/045/SRR11886745/SRR11886745.fastq.gz
    SRR11886746	SRP265425	SRX8434244	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	64-2020-04-22	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745309		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	2986306	205666872	2986306	275400959	IonXpress_064_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam	gs://sra-pub-src-17/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra59/SRZ/011886/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta	https://sra-download.ncbi.nlm.nih.gov/traces/sra47/SRR/011608/SRR11886746		GC-21	NA	03-Apr-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/046/SRR11886746/SRR11886746.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/046/SRR11886746/SRR11886746.fastq.gz
    SRR11886747	SRP265425	SRX8434243	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	34-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745308		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	1182690	59471336	1182690	86350631	IonXpress_034_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam	gs://sra-pub-src-16/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRZ/011886/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta	https://sra-download.ncbi.nlm.nih.gov/traces/sra13/SRR/011608/SRR11886747		GC-11a	NA	24-Mar-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1	http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/047/SRR11886747/SRR11886747.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/047/SRR11886747/SRR11886747.fastq.gz
    SRR11886748	SRP265425	SRX8434242	Ampliseq of SARS-CoV-2	Ampliseq of SARS-CoV-2	2697049	Severe acute respiratory syndrome coronavirus 2	33-2020-04-03	AMPLICON	VIRAL RNA	RT-PCR	SINGLE	SRS6745307		Ion Torrent S5 XL	Ion Torrent S5 XL	ION_TORRENT	6031816	749323230	6031816	928054297	IonXpress_033_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam	gs://sra-pub-src-15/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1	https://sra-download.ncbi.nlm.nih.gov/traces/sra43/SRZ/011886/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam	https://sra-download.ncbi.nlm.nih.gov/traces/sra66/SRR/011608/SRR11886748		GC-14a	NA	28-Mar-2020	Australia: Victoria	Homo sapiens	COVID-19	swab	NA	Pathogen.cl	https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1	http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/048/SRR11886748/SRR11886748.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/048/SRR11886748/SRR11886748.fastq.gz

See :doc:`quickstart` for other examples.

.. toctree::
   :hidden:
   :maxdepth: 1

   installation
   quickstart
   cmdline
   python-api-usage
   case_studies
   notebooks
   commands
   contributing
   authors
   history
   modules


===========
Publication
===========

 `pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive <https://f1000research.com/articles/8-532/v1>`_


 Presentation slides from BOSC (ISMB-ECCB) 2019: https://f1000research.com/slides/8-1183

===========================================================================

========
Citation
========

Choudhary, Saket. "pysradb: A Python Package to Query next-Generation Sequencing Metadata and Data from NCBI Sequence Read Archive." F1000Research, vol. 8, F1000 (Faculty of 1000 Ltd), Apr. 2019, p. 532 (https://f1000research.com/articles/8-532/v1)

::

    @article{Choudhary2019,
    doi = {10.12688/f1000research.18676.1},
    url = {https://doi.org/10.12688/f1000research.18676.1},
    year = {2019},
    month = apr,
    publisher = {F1000 (Faculty of 1000 Ltd)},
    volume = {8},
    pages = {532},
    author = {Saket Choudhary},
    title = {pysradb: A {P}ython package to query next-generation sequencing metadata and data from {NCBI} {S}equence {R}ead {A}rchive},
    journal = {F1000Research}
    }


Zenodo archive: https://zenodo.org/badge/latestdoi/159590788

Zenodo DOI: 10.5281/zenodo.2306881


================================================
FILE: docs/installation.md
================================================
# Installation

## Stable release

To install pysradb, run this command in your terminal:

``` console
$ pip install pysradb
```

This is the preferred method to install pysradb, as it will always
install the most recent stable release.

If you don't have [pip](https://pip.pypa.io) installed, this [Python
installation
guide](http://docs.python-guide.org/en/latest/starting/installation/)
can guide you through the process.

Alternatively, you may use conda:

``` bash
conda install -c bioconda pysradb
```

This step will install all the dependencies except aspera-client (which
is not required, but highly recommended). If you have an existing
environment with a lot of pre-installed packages, conda might be
[slow](https://github.com/bioconda/bioconda-recipes/issues/13774).
Please consider creating a new environment for `pysradb`:

``` bash
conda create -c bioconda -n pysradb python=3 pysradb
```

## From sources

The source files for pysradb can be downloaded from the [Github
repo](https://github.com/saketkc/pysradb).

You can either clone the public repository:

``` console
$ git clone https://github.com/saketkc/pysradb.git
```

Or download the
[tarball](https://github.com/saketkc/pysradb/tarball/master):

``` console
$ curl -OL https://github.com/saketkc/pysradb/tarball/master
```

Once you have a copy of the source, you can install it with:

``` console
$ pip install .
```
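
Whichever route you choose, it can help to confirm that the package is
visible to the interpreter you plan to use. The snippet below is a minimal
sketch that relies only on the standard library (Python 3.8 or newer);
`installed_version` is a hypothetical helper name used for illustration and
is not part of `pysradb`.

``` python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="pysradb"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found:
    print(f"pysradb {found} is installed")
else:
    print("pysradb is not installed in this environment")
```

If the second message is printed inside a conda environment, the most likely
cause is that a different environment is currently active.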


================================================
FILE: docs/make.bat
================================================
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=python -msphinx
)
set SOURCEDIR=.
set BUILDDIR=_build
set SPHINXPROJ=pysradb

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The Sphinx module was not found. Make sure you have Sphinx installed,
	echo.then set the SPHINXBUILD environment variable to point to the full
	echo.path of the 'sphinx-build' executable. Alternatively you may add the
	echo.Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%

:end
popd


================================================
FILE: docs/modules.rst
================================================
pysradb
=======

.. toctree::
   :maxdepth: 4

   pysradb


================================================
FILE: docs/notebooks.rst
================================================
Tutorials & Notebooks
=====================

The following Jupyter notebooks demonstrate various features of pysradb:

.. toctree::
   :maxdepth: 1
   :caption:

   notebooks/README
   notebooks/01.Python-API_demo.ipynb
   notebooks/02.Commandline_download.ipynb
   notebooks/03.ParallelDownload.ipynb
   notebooks/04.SRA_to_fastq_conda.ipynb
   notebooks/05.Downloading_subsets_of_a_project.ipynb
   notebooks/06.Multiple_SRPs.ipynb
   notebooks/07.Query_Search.ipynb
   notebooks/08.PMC_DOI_Identifiers.ipynb
   notebooks/09.Metadata_enrichment.ipynb

You can also view the complete `notebooks directory on GitHub <https://github.com/saketkc/pysradb/tree/develop/notebooks>`_ for additional tutorials and examples.


================================================
FILE: docs/pysradb.rst
================================================
pysradb package
===============

Submodules
----------

pysradb.basedb module
---------------------

.. automodule:: pysradb.basedb
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.cli module
------------------

.. automodule:: pysradb.cli
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.download module
-----------------------

.. automodule:: pysradb.download
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.exceptions module
-------------------------

.. automodule:: pysradb.exceptions
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.filter\_attrs module
----------------------------

.. automodule:: pysradb.filter_attrs
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.geodb module
--------------------

.. automodule:: pysradb.geodb
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.geoweb module
---------------------

.. automodule:: pysradb.geoweb
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.metadata\_enrichment module
-----------------------------------

.. automodule:: pysradb.metadata_enrichment
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.search module
---------------------

.. automodule:: pysradb.search
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.sradb module
--------------------

.. automodule:: pysradb.sradb
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.sraweb module
---------------------

.. automodule:: pysradb.sraweb
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.taxid2name module
-------------------------

.. automodule:: pysradb.taxid2name
   :members:
   :undoc-members:
   :show-inheritance:

pysradb.utils module
--------------------

.. automodule:: pysradb.utils
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: pysradb
   :members:
   :undoc-members:
   :show-inheritance:


================================================
FILE: docs/python-api-usage.md
================================================
# Python API 

## Use Case 1: Fetch the metadata table (SRA-runtable)

The simplest use case of `pysradb` is when you know the SRA
project ID (SRP) and simply want to fetch the metadata associated
with it. This is generally reflected in the
`SraRunTable.txt` that you get from NCBI's website. See an
[example](https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP098789) of a
SraRunTable.

``` python
from pysradb import SRAweb
client = SRAweb()
df = client.sra_metadata('SRP098789')
df.head()
```

    ===============  ====================  ======================================================================  =============  ========  =================  ==============  ================  ==============  ============  ==========  ========  ============  ===============
    study_accession  experiment_accession                             experiment_title                             run_accession  taxon_id  library_selection  library_layout  library_strategy  library_source  library_name    bases      spots    adapter_spec  avg_read_length
    ===============  ====================  ======================================================================  =============  ========  =================  ==============  ================  ==============  ============  ==========  ========  ============  ===============
    SRP098789        SRX2536403            GSM2475997: 1.5 µM PF-067446846, 10 min, rep 1; Homo sapiens; OTHER  SRR5227288         9606  other              SINGLE -        OTHER             TRANSCRIPTOMIC                2104142750  42082855                             50
    SRP098789        SRX2536404            GSM2475998: 1.5 µM PF-067446846, 10 min, rep 2; Homo sapiens; OTHER  SRR5227289         9606  other              SINGLE -        OTHER             TRANSCRIPTOMIC                2082873050  41657461                             50
    SRP098789        SRX2536405            GSM2475999: 1.5 µM PF-067446846, 10 min, rep 3; Homo sapiens; OTHER  SRR5227290         9606  other              SINGLE -        OTHER             TRANSCRIPTOMIC                2023148650  40462973                             50
    SRP098789        SRX2536406            GSM2476000: 0.3 µM PF-067446846, 10 min, rep 1; Homo sapiens; OTHER  SRR5227291         9606  other              SINGLE -        OTHER             TRANSCRIPTOMIC                2057165950  41143319                             50
    SRP098789        SRX2536407            GSM2476001: 0.3 µM PF-067446846, 10 min, rep 2; Homo sapiens; OTHER  SRR5227292         9606  other              SINGLE -        OTHER             TRANSCRIPTOMIC                3027621850  60552437                             50
    ===============  ====================  ======================================================================  =============  ========  =================  ==============  ================  ==============  ============  ==========  ========  ============  ===============

The metadata is returned as a `pandas` dataframe, so you can
perform all the regular select/query operations available
through `pandas`.

## Use Case 2: Downloading an entire project arranged experiment wise

Once you have fetched the metadata and made sure this is the project
you were looking for, you will probably want to download everything at once.
NCBI follows this hierarchy: `SRP => SRX => SRR`. Each
`SRP` (project) has multiple `SRX` (experiments),
and each `SRX` in turn has multiple `SRR` (runs)
inside it. We want to mimic this hierarchy in our downloads. The reason
is simple: in most cases you care about `SRX` the
most, and would want to "merge" your SRRs in one way or another.
Having this hierarchy ensures your downstream code can handle such
cases easily, without worrying about which runs (SRR) need to be merged.
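
To make the nesting concrete, here is a minimal standard-library sketch that
groups run accessions under their experiment and project. The accessions are
copied from the SRP098789 table in Use Case 1; the helper names
(`group_runs`, `planned_paths`) and the `pysradb_downloads` directory are
assumptions made for illustration, and the layout shown is not guaranteed to
match what `pysradb` writes to disk.

``` python
from collections import defaultdict
from pathlib import PurePosixPath

# (SRP, SRX, SRR) triples from the first three rows of the Use Case 1 table.
RUNS = [
    ("SRP098789", "SRX2536403", "SRR5227288"),
    ("SRP098789", "SRX2536404", "SRR5227289"),
    ("SRP098789", "SRX2536405", "SRR5227290"),
]


def group_runs(records):
    """Nest (SRP, SRX, SRR) tuples as {SRP: {SRX: [SRR, ...]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for srp, srx, srr in records:
        tree[srp][srx].append(srr)
    return tree


def planned_paths(records, out_dir="pysradb_downloads"):
    """Return one out_dir/SRP/SRX/SRR path per run (illustrative layout only)."""
    paths = []
    for srp, experiments in group_runs(records).items():
        for srx, srrs in experiments.items():
            for srr in srrs:
                paths.append(str(PurePosixPath(out_dir) / srp / srx / srr))
    return paths


for path in planned_paths(RUNS):
    print(path)
```

Because every run sits under its experiment, merging the runs of one `SRX`
reduces to reading everything inside a single directory.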

We strongly recommend installing `aspera-client`, which uses
UDP and is [designed to be faster](http://www.skullbox.net/tcpudp.php).

``` python
from pysradb import SRAweb
client = SRAweb()
df = client.sra_metadata('SRP017942')
client.download(df=df)
```

## Use Case 3: Downloading a subset of experiments

Often, you need to process only a smaller set of samples from a project
(SRP). Consider this project, which has data spanning several assay types.

``` python
df = client.sra_metadata('SRP000941')
print(df.library_strategy.unique())
# Output: ['ChIP-Seq' 'Bisulfite-Seq' 'RNA-Seq' 'WGS' 'OTHER']
```

However, you might only be interested in analyzing the `RNA-seq`
samples and want to download just that subset. This is simple
with `pysradb`, since the metadata can be subset just as you
would subset a dataframe in pandas.

``` python
df_rna = df[df.library_strategy == 'RNA-Seq']
client.download(df=df_rna, out_dir='/pysradb_downloads')
```

## Use Case 4: Getting cell-type/treatment information from sample_attributes

Cell type and tissue information is usually hidden in the
`sample_attribute` column, which can be expanded into separate columns:

``` python
from pysradb.filter_attrs import expand_sample_attribute_columns
df = client.sra_metadata('SRP017942')
expand_sample_attribute_columns(df).head()
```

<table>
<thead>
<tr class="header">
<th>study_accession</th>
<th>experiment_accession</th>
<th>experiment_title</th>
<th>experiment_attribute</th>
<th>sample_attribute</th>
<th>run_accession</th>
<th>taxon_id</th>
<th>library_selection</th>
<th>library_layout</th>
<th>library_strategy</th>
<th>library_source</th>
<th>library_name</th>
<th>bases</th>
<th>spots</th>
<th>adapter_spec</th>
<th>avg_read_length</th>
<th>assay_type</th>
<th>cell_line</th>
<th>source_name</th>
<th>transfected_with</th>
<th>treatment</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>SRP017942</td>
<td>SRX217028</td>
<td>GSM1063575: 293T_GFP; Homo sapiens; RNA-Seq</td>
<td>GEO Accession: GSM1063575</td>
<td>source_name: 293T cells || cell line: 293T cells || transfected with: 3XFLAG-GFP || assay type: Riboseq</td>
<td>SRR648667</td>
<td>9606</td>
<td>other</td>
<td>SINGLE -</td>
<td>RNA-Seq</td>
<td>TRANSCRIPTOMIC</td>
<td></td>
<td>1806641316</td>
<td>50184481</td>
<td></td>
<td>36</td>
<td>riboseq</td>
<td>293t cells</td>
<td>293t cells</td>
<td>3xflag-gfp</td>
<td>NaN</td>
</tr>
<tr class="even">
<td>SRP017942</td>
<td>SRX217029</td>
<td>GSM1063576: 293T_GFP_2hrs_severe_Heat_Shock; Homo sapiens; RNA-Seq</td>
<td>GEO Accession: GSM1063576</td>
<td>source_name: 293T cells || cell line: 293T cells || transfected with: 3XFLAG-GFP || treatment: severe heat shock (44C 2 hours) || assay type: Riboseq</td>
<td>SRR648668</td>
<td>9606</td>
<td>other</td>
<td>SINGLE -</td>
<td>RNA-Seq</td>
<td>TRANSCRIPTOMIC</td>
<td></td>
<td>3436984836</td>
<td>95471801</td>
<td></td>
<td>36</td>
<td>riboseq</td>
<td>293t cells</td>
<td>293t cells</td>
<td>3xflag-gfp</td>
<td>severe heat shock (44c 2 hours)</td>
</tr>
<tr class="odd">
<td>SRP017942</td>
<td>SRX217030</td>
<td>GSM1063577: 293T_Hspa1a; Homo sapiens; RNA-Seq</td>
<td>GEO Accession: GSM1063577</td>
<td>source_name: 293T cells || cell line: 293T cells || transfected with: 3XFLAG-Hspa1a || assay type: Riboseq</td>
<td>SRR648669</td>
<td>9606</td>
<td>other</td>
<td>SINGLE -</td>
<td>RNA-Seq</td>
<td>TRANSCRIPTOMIC</td>
<td></td>
<td>3330909216</td>
<td>92525256</td>
<td></td>
<td>36</td>
<td>riboseq</td>
<td>293t cells</td>
<td>293t cells</td>
<td>3xflag-hspa1a</td>
<td>NaN</td>
</tr>
<tr class="even">
<td>SRP017942</td>
<td>SRX217031</td>
<td>GSM1063578: 293T_Hspa1a_2hrs_severe_Heat_Shock; Homo sapiens; RNA-Seq</td>
<td>GEO Accession: GSM1063578</td>
<td>source_name: 293T cells || cell line: 293T cells || transfected with: 3XFLAG-Hspa1a || treatment: severe heat shock (44C 2 hours) || assay type: Riboseq</td>
<td>SRR648670</td>
<td>9606</td>
<td>other</td>
<td>SINGLE -</td>
<td>RNA-Seq</td>
<td>TRANSCRIPTOMIC</td>
<td></td>
<td>3622123512</td>
<td>100614542</td>
<td></td>
<td>36</td>
<td>riboseq</td>
<td>293t cells</td>
<td>293t cells</td>
<td>3xflag-hspa1a</td>
<td>severe heat shock (44c 2 hours)</td>
</tr>
<tr class="odd">
<td>SRP017942</td>
<td>SRX217956</td>
<td>GSM794854: 3T3-Control-Riboseq; Mus musculus; RNA-Seq</td>
<td>GEO Accession: GSM794854</td>
<td>source_name: 3T3 cells || treatment: control || cell line: 3T3 cells || assay type: Riboseq</td>
<td>SRR649752</td>
<td>10090</td>
<td>cDNA</td>
<td>SINGLE -</td>
<td>RNA-Seq</td>
<td>TRANSCRIPTOMIC</td>
<td></td>
<td>594945396</td>
<td>16526261</td>
<td></td>
<td>36</td>
<td>riboseq</td>
<td>293t cells</td>
<td>3t3 cells</td>
<td>NaN</td>
<td>control</td>
</tr>
</tbody>
</table>
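
The expansion can be pictured as splitting each `sample_attribute` string on
`||` and then on the first `:`. The sketch below is only an illustration of
that idea using the standard library: `parse_sample_attribute` is a
hypothetical helper, and its normalization (lowercasing, spaces in keys
replaced by underscores) is an assumption chosen to mirror the columns in the
table above, not necessarily how `expand_sample_attribute_columns` is
implemented.

``` python
def parse_sample_attribute(attribute):
    """Split a 'key: value || key: value' string into a dict of normalized pairs."""
    parsed = {}
    for field in attribute.split("||"):
        key, sep, value = field.partition(":")
        if not sep:
            continue  # skip fragments that do not look like 'key: value'
        normalized_key = key.strip().lower().replace(" ", "_")
        parsed[normalized_key] = value.strip().lower()
    return parsed


# sample_attribute value of run SRR648667, copied from the table above.
example = (
    "source_name: 293T cells || cell line: 293T cells || "
    "transfected with: 3XFLAG-GFP || assay type: Riboseq"
)
print(parse_sample_attribute(example))
```

Keys that are missing for a given run, such as `treatment` in this example,
are simply absent from the result, which corresponds to the `NaN` entries in
the expanded table.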

## Use Case 5: Searching for datasets

Another common operation on SRA is plain-text search.

If you want to look up all projects where `ribosome
profiling` appears somewhere in the description:

``` python
df = client.search_sra(search_str='"ribosome profiling"')
df.head()
```

<table>
<thead>
<tr class="header">
<th>study_accession</th>
<th>experiment_accession</th>
<th>experiment_title</th>
<th>run_accession</th>
<th>taxon_id</th>
<th>library_selection</th>
<th>library_layout</th>
<th>library_strategy</th>
<th>library_source</th>
<th>library_name</th>
<th>bases</th>
<th>spots</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>DRP003075</td>
<td>DRX019536</td>
<td>Illumina Genome Analyzer IIx sequencing of SAMD00018584</td>
<td>DRR021383</td>
<td>83333</td>
<td>other</td>
<td>SINGLE -</td>
<td>OTHER</td>
<td>TRANSCRIPTOMIC</td>
<td>GAII05_3</td>
<td>978776480</td>
<td>12234706</td>
</tr>
<tr class="even">
<td>DRP003075</td>
<td>DRX019537</td>
<td>Illumina Genome Analyzer IIx sequencing of SAMD00018585</td>
<td>DRR021384</td>
<td>83333</td>
<td>other</td>
<td>SINGLE -</td>
<td>OTHER</td>
<td>TRANSCRIPTOMIC</td>
<td>GAII05_4</td>
<td>894201680</td>
<td>11177521</td>
</tr>
<tr class="odd">
<td>DRP003075</td>
<td>DRX019538</td>
<td>Illumina Genome Analyzer IIx sequencing of SAMD00018586</td>
<td>DRR021385</td>
<td>83333</td>
<td>other</td>
<td>SINGLE -</td>
<td>OTHER</td>
<td>TRANSCRIPTOMIC</td>
<td>GAII05_5</td>
<td>931536720</td>
<td>11644209</td>
</tr>
<tr class="even">
<td>DRP003075</td>
<td>DRX019540</td>
<td>Illumina Genome Analyzer IIx sequencing of SAMD00018588</td>
<td>DRR021387</td>
<td>83333</td>
<td>other</td>
<td>SINGLE -</td>
<td>OTHER</td>
<td>TRANSCRIPTOMIC</td>
<td>GAII07_4</td>
<td>2759398700</td>
<td>27593987</td>
</tr>
<tr class="odd">
<td>DRP003075</td>
<td>DRX019541</td>
<td>Illumina Genome Analyzer IIx sequencing of SAMD00018589</td>
<td>DRR021388</td>
<td>83333</td>
<td>other</td>
<td>SINGLE -</td>
<td>OTHER</td>
<td>TRANSCRIPTOMIC</td>
<td>GAII07_5</td>
<td>2386196500</td>
<td>23861965</td>
</tr>
</tbody>
</table>

Again, the results are available as a `pandas` dataframe, so
you can apply any subsetting operation to them after the query. Your query
doesn't need to be exact.

## Use Case 6: Finding publications (PMIDs) associated with SRA data

Sometimes you have SRA accessions and want to find the publications that describe the data generation.

``` python
from pysradb import SRAweb
client = SRAweb()

# Get PMIDs for a study accession (SRP)
pmids_df = client.srp_to_pmid('SRP002605')
pmids_df.head()
```

    sra_accession   bioproject      pmid
    SRP002605      PRJNA129385   20703300

You can also get PMIDs for other SRA accession types:

``` python
# Get PMIDs for run accessions (SRR)
srr_pmids = client.srr_to_pmid('SRR057511')

# Get PMIDs for experiment accessions (SRX) 
srx_pmids = client.srx_to_pmid('SRX021967')

# Get PMIDs for sample accessions (SRS)
srs_pmids = client.srs_to_pmid('SRS079386')

# Get PMIDs for multiple accessions at once
multi_pmids = client.sra_to_pmid(['SRP002605', 'SRP016501'])
```

You can also directly query BioProject accessions for their associated publications:

``` python  
# Get PMIDs directly from BioProject accessions
bioproject_pmids = client.fetch_bioproject_pmids(['PRJNA257197', 'PRJNA129385'])
print(bioproject_pmids)
# Output: {'PRJNA257197': ['25214632'], 'PRJNA129385': ['20703300']}
```

**Note**: This functionality relies on the cross-references maintained between BioProjects and PubMed. Not all SRA datasets have associated publications, and some publications may not be properly cross-referenced in the NCBI databases. The success rate depends on:

- Whether the authors included SRA/BioProject accessions in their manuscript
- Whether NCBI has established the cross-references 
- The publication date relative to data submission
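
Since some accessions will come back without a publication, it is worth
handling both cases explicitly. The following sketch post-processes a
dictionary shaped like the `fetch_bioproject_pmids` output shown above, using
only the standard library. The helper names are hypothetical, `PRJNA000000`
is a made-up accession, and representing "no publication found" as an empty
list is an assumption made for this illustration.

``` python
def pmid_rows(bioproject_pmids):
    """Flatten {bioproject: [pmid, ...]} into (bioproject, pmid, PubMed URL) rows."""
    rows = []
    for bioproject, pmids in sorted(bioproject_pmids.items()):
        for pmid in pmids:
            rows.append((bioproject, pmid, f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"))
    return rows


def without_publications(bioproject_pmids):
    """List the BioProjects for which no PMID was found."""
    return sorted(bp for bp, pmids in bioproject_pmids.items() if not pmids)


# The first two entries come from the example output above;
# PRJNA000000 is a placeholder standing in for an unlinked project.
bioproject_pmids = {
    "PRJNA257197": ["25214632"],
    "PRJNA129385": ["20703300"],
    "PRJNA000000": [],
}

for row in pmid_rows(bioproject_pmids):
    print(*row, sep="\t")
print("No PMID found for:", without_publications(bioproject_pmids))
```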



================================================
FILE: docs/quickstart.md
================================================
# Quickstart

Most features in `pysradb` are accessible both from the command line and
as a Python package. Select the corresponding tab below to see how
`pysradb` is used through each interface.

```{note}
If you have any questions along the way, please head over to the
[Python API Usage](python-api-usage.md) or the
[Command Line](cmdline.md) for more information. You may
also wish to refer to the [API Documentation](commands.rst).
```

------------------------------------------------------------------------

## Notebooks

A Google Colaboratory version of the most used commands is available in
this [Colab
Notebook](https://colab.research.google.com/drive/1C60V-jkcNZiaCra_V5iEyFs318jgVoUR).
Colab runs Python 3.6 while `pysradb` requires Python 3.7+, so
the notebooks no longer run on Colab, but they can be downloaded and run
locally.

The following notebooks document all the possible features of `pysradb`:

1.  [Python
    API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/01.Python-API_demo.ipynb)
2.  [Downloading datasets from SRA - command
    line](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/02.Commandline_download.ipynb)
3.  [Download multiple datasets in parallel - Python
    API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/03.ParallelDownload.ipynb)
4.  [Converting SRA-to-fastq - command line (requires
    conda)](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/04.SRA_to_fastq_conda.ipynb)
5.  [Downloading subsets of a project - Python
    API](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/05.Downloading_subsets_of_a_project.ipynb)
6.  [Download
    BAMs](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/06.Download_BAMs.ipynb)
7.  [Metadata for multiple
    SRPs](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/07.Multiple_SRPs.ipynb)
8.  [Multithreaded fastq downloads using Aspera
    Client](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/08.pysradb_ascp_multithreaded.ipynb)
9.  [Searching
    SRA/GEO/ENA](https://colab.research.google.com/github/saketkc/pysradb/blob/master/notebooks/09.Query_Search.ipynb)

## Metadata

`pysradb` makes it very easy to obtain metadata from SRA/EBI:

`````{tabs}
````{tab} Console
``` bash
$ pysradb metadata SRP265425
```
````

````{tab} Python
``` python
from pysradb.sraweb import SRAweb

client = SRAweb()
df = client.metadata("SRP265425")
df
```
````
`````

Output:

    study_accession experiment_accession    experiment_title    experiment_desc organism_taxid  organism_name   library_name    library_strategy    library_source  library_selection   library_layout  sample_accession    sample_title    instrument  instrument_model    instrument_model_desc   total_spots total_size  run_accession   run_total_spots run_total_bases
    SRP265425   SRX8434255  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 63-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745319      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 1311358 83306910    SRR11886735 1311358 109594216
    SRP265425   SRX8434254  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 62-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745320      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 2614109 204278682   SRR11886736 2614109 262305651
    SRP265425   SRX8434253  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 61-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745318      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 2286312 183516004   SRR11886737 2286312 263304134
    SRP265425   SRX8434252  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 60-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745317      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 5202567 507524965   SRR11886738 5202567 781291588
    SRP265425   SRX8434251  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 38-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745315      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 3313960 356104406   SRR11886739 3313960 612430817
    SRP265425   SRX8434250  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 37-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745316      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 5155733 565882351   SRR11886740 5155733 954342917
    SRP265425   SRX8434249  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 36-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745313      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 1324589 175619046   SRR11886741 1324589 216531400
    SRP265425   SRX8434248  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 35-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745314      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 1639851 198973268   SRR11886742 1639851 245466005
    SRP265425   SRX8434247  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 68-2020-05-07   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745312      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 3921389 210198580   SRR11886743 3921389 332935558
    SRP265425   SRX8434246  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 66-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745311      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 14295475    2150005008  SRR11886744 14295475    2967829315
    SRP265425   SRX8434245  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 65-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745310      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 5124692 294846140   SRR11886745 5124692 431819462
    SRP265425   SRX8434244  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 64-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745309      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 2986306 205666872   SRR11886746 2986306 275400959
    SRP265425   SRX8434243  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 34-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745308      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 1182690 59471336    SRR11886747 1182690 86350631
    SRP265425   SRX8434242  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 33-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745307      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 6031816 749323230   SRR11886748 6031816 928054297
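
Once the table is available, it can be summarised with ordinary Python. The following is a minimal sketch, assuming the output has been saved as tab-separated text; `SAMPLE_TSV` and `total_bases` are illustrative names and not part of `pysradb`. The sample keeps only three columns from the first two rows shown above.

```python
import csv
import io

# Three columns copied from the first two rows of the output above.
# Assumption: the table was saved as tab-separated text.
SAMPLE_TSV = (
    "run_accession\trun_total_spots\trun_total_bases\n"
    "SRR11886735\t1311358\t109594216\n"
    "SRR11886736\t2614109\t262305651\n"
)


def total_bases(tsv_text):
    """Sum the run_total_bases column of a tab-separated metadata table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(int(row["run_total_bases"]) for row in reader)


print(total_bases(SAMPLE_TSV))
```

In the Python API the result is already a DataFrame, so the same total could presumably be obtained by summing the `run_total_bases` column directly.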

To additionally obtain the locations of `.fastq`/`.sra` files and other
metadata, pass the `--detailed` flag (`detailed=True` in Python):

`````{tabs}
````{tab} Console
``` bash
$ pysradb metadata SRP265425 --detailed
```
````

````{tab} Python
``` python
from pysradb.sraweb import SRAweb

client = SRAweb()
df = client.metadata("SRP265425", detailed=True)
df
```
````
`````

Output:

    run_accession   study_accession experiment_accession    experiment_title    experiment_desc organism_taxid  organism_name   library_name    library_strategy    library_source  library_selection   library_layout  sample_accession    sample_title    instrument  instrument_model    instrument_model_desc   total_spots total_size  run_total_spots run_total_bases run_alias   sra_url_alt1    sra_url_alt2    sra_url experiment_alias    isolate collected_by    collection_date geo_loc_name    host    host_disease    isolation_source    lat_lon BioSampleModel  sra_url_alt3    ena_fastq_http  ena_fastq_http_1    ena_fastq_http_2    ena_fastq_ftp   ena_fastq_ftp_1 ena_fastq_ftp_2
    SRR11886735 SRP265425   SRX8434255  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 63-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745319      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 1311358 83306910    1311358 109594216   IonXpress_063_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam  gs://sra-pub-src-9/SRR11886735/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886735/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   https://sra-download.ncbi.nlm.nih.gov/traces/sra0/SRR/011608/SRR11886735        GC-20   NA  02-Apr-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl     http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/035/SRR11886735/SRR11886735.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/035/SRR11886735/SRR11886735.fastq.gz
    SRR11886736 SRP265425   SRX8434254  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 62-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745320      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 2614109 204278682   2614109 262305651   IonXpress_062_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam  gs://sra-pub-src-16/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1  https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRZ/011886/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta  https://sra-download.ncbi.nlm.nih.gov/traces/sra50/SRR/011608/SRR11886736       GC-51   NA  14-Apr-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886736/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/036/SRR11886736/SRR11886736.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/036/SRR11886736/SRR11886736.fastq.gz
    SRR11886737 SRP265425   SRX8434253  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 61-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745318      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 2286312 183516004   2286312 263304134   IonXpress_061_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam  gs://sra-pub-src-16/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1  https://sra-download.ncbi.nlm.nih.gov/traces/sra29/SRZ/011886/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta  https://sra-download.ncbi.nlm.nih.gov/traces/sra17/SRR/011608/SRR11886737       GC-24   NA  07-Apr-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886737/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/037/SRR11886737/SRR11886737.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/037/SRR11886737/SRR11886737.fastq.gz
    SRR11886738 SRP265425   SRX8434252  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 60-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745317      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 5202567 507524965   5202567 781291588   IonXpress_060_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam  gs://sra-pub-src-15/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam.1    https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRZ/011886/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam    https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/011608/SRR11886738       GC-23   NA  08-Apr-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886738/IonXpress_060_R_2020_04_22_15_56_22_user_GCEID_S5_58_SARS_CoV2_SA4.bam.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/038/SRR11886738/SRR11886738.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/038/SRR11886738/SRR11886738.fastq.gz
    SRR11886739 SRP265425   SRX8434251  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 38-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745315      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 3313960 356104406   3313960 612430817   IonXpress_038_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam  gs://sra-pub-src-13/SRR11886739/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1  https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886739/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   https://sra-download.ncbi.nlm.nih.gov/traces/sra24/SRR/011608/SRR11886739       GC-11b  NA  24-Mar-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl     http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/039/SRR11886739/SRR11886739.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/039/SRR11886739/SRR11886739.fastq.gz
    SRR11886740 SRP265425   SRX8434250  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 37-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745316      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 5155733 565882351   5155733 954342917   IonXpress_037_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam  gs://sra-pub-src-5/SRR11886740/IonXpress_037_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886740/IonXpress_037_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 https://sra-download.ncbi.nlm.nih.gov/traces/sra13/SRR/011608/SRR11886740       GC-14b  NA  28-Mar-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl     http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/040/SRR11886740/SRR11886740.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/040/SRR11886740/SRR11886740.fastq.gz
    SRR11886741 SRP265425   SRX8434249  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 36-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745313      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 1324589 175619046   1324589 216531400   IonXpress_036_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam  gs://sra-pub-src-11/SRR11886741/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1  https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886741/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   https://sra-download.ncbi.nlm.nih.gov/traces/sra57/SRR/011608/SRR11886741       GC-12   NA  24-Mar-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl     http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/041/SRR11886741/SRR11886741.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/041/SRR11886741/SRR11886741.fastq.gz
    SRR11886742 SRP265425   SRX8434248  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 35-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745314      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 1639851 198973268   1639851 245466005   IonXpress_035_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam  gs://sra-pub-src-11/SRR11886742/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1  https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886742/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRR/011608/SRR11886742       GC-13   NA  23-Mar-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl     http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/042/SRR11886742/SRR11886742.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/042/SRR11886742/SRR11886742.fastq.gz
    SRR11886743 SRP265425   SRX8434247  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 68-2020-05-07   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745312      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 3921389 210198580   3921389 332935558   IonXpress_068_R_2020_05_07_11_47_51_user_GCEID-S5-60-SARS_CoV2_SA4.bam  gs://sra-pub-src-17/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1  https://sra-download.ncbi.nlm.nih.gov/traces/sra64/SRZ/011886/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta  https://sra-download.ncbi.nlm.nih.gov/traces/sra54/SRR/011608/SRR11886743       GC-55   NA  24-Apr-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886743/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/043/SRR11886743/SRR11886743.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/043/SRR11886743/SRR11886743.fastq.gz
    SRR11886744 SRP265425   SRX8434246  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 66-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745311      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 14295475    2150005008  14295475    2967829315  IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq    gs://sra-pub-src-11/SRR11886744/IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq.1  https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886744/IonXpress_066_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.fastq.1   https://sra-download.ncbi.nlm.nih.gov/traces/sra20/SRR/011608/SRR11886744       GC-26   NA  07-Mar-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl     http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/044/SRR11886744/SRR11886744.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/044/SRR11886744/SRR11886744.fastq.gz
    SRR11886745 SRP265425   SRX8434245  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 65-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745310      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 5124692 294846140   5124692 431819462   IonXpress_065_R_2020_04_22_11_10_56_user_GCEID-S5-57-SARS_CoV2_SA4.bam  gs://sra-pub-src-16/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1  https://sra-download.ncbi.nlm.nih.gov/traces/sra69/SRZ/011886/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta  https://sra-download.ncbi.nlm.nih.gov/traces/sra19/SRR/011608/SRR11886745       GC-25   NA  10-Apr-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886745/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/045/SRR11886745/SRR11886745.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/045/SRR11886745/SRR11886745.fastq.gz
    SRR11886746 SRP265425   SRX8434244  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 64-2020-04-22   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745309      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 2986306 205666872   2986306 275400959   IonXpress_064_R_2020_04_22_15_56_22_user_GCEID-S5-58-SARS_CoV2_SA4.bam  gs://sra-pub-src-17/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1  https://sra-download.ncbi.nlm.nih.gov/traces/sra59/SRZ/011886/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta  https://sra-download.ncbi.nlm.nih.gov/traces/sra47/SRR/011608/SRR11886746       GC-21   NA  03-Apr-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886746/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/046/SRR11886746/SRR11886746.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/046/SRR11886746/SRR11886746.fastq.gz
    SRR11886747 SRP265425   SRX8434243  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 34-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745308      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 1182690 59471336    1182690 86350631    IonXpress_034_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam  gs://sra-pub-src-16/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1  https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRZ/011886/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta  https://sra-download.ncbi.nlm.nih.gov/traces/sra13/SRR/011608/SRR11886747       GC-11a  NA  24-Mar-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886747/Wuhan_Hu_1_NC_045512_21500_and_subgenomics_SA4.fasta.1   http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/047/SRR11886747/SRR11886747.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/047/SRR11886747/SRR11886747.fastq.gz
    SRR11886748 SRP265425   SRX8434242  Ampliseq of SARS-CoV-2  Ampliseq of SARS-CoV-2  2697049 Severe acute respiratory syndrome coronavirus 2 33-2020-04-03   AMPLICON    VIRAL RNA   RT-PCR  SINGLE  SRS6745307      Ion Torrent S5 XL   Ion Torrent S5 XL   ION_TORRENT 6031816 749323230   6031816 928054297   IonXpress_033_R_2020_04_03_10_09_05_user_GCEID-S5-55-SARS_CoV2_SA4.bam  gs://sra-pub-src-15/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1    https://sra-download.ncbi.nlm.nih.gov/traces/sra43/SRZ/011886/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam    https://sra-download.ncbi.nlm.nih.gov/traces/sra66/SRR/011608/SRR11886748       GC-14a  NA  28-Mar-2020 Australia: Victoria Homo sapiens    COVID-19    swab    NA  Pathogen.cl https://sra-pub-sars-cov2.s3.amazonaws.com/sra-src/SRR11886748/IonXpress_033_R_2020_04_03_10_09_05_user_GCEID_S5_55_SARS_CoV2_SA4.bam.1 http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/048/SRR11886748/SRR11886748.fastq.gz         era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR118/048/SRR11886748/SRR11886748.fastq.gz
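
The `ena_fastq_http` column of the detailed output is a convenient starting point for building a download list. The sketch below is illustrative only: `DETAILED_ROWS` and `fastq_urls` are hypothetical names, the first two rows are abridged from the output above, and the third row is invented to show how a run without a fastq link would be skipped.

```python
# Rows abridged from the detailed output above; only two columns are kept.
DETAILED_ROWS = [
    {
        "run_accession": "SRR11886735",
        "ena_fastq_http": "http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/035/SRR11886735/SRR11886735.fastq.gz",
    },
    {
        "run_accession": "SRR11886736",
        "ena_fastq_http": "http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR118/036/SRR11886736/SRR11886736.fastq.gz",
    },
    # Hypothetical run with no fastq link, included only to show the filter.
    {"run_accession": "SRR00000000", "ena_fastq_http": ""},
]


def fastq_urls(rows):
    """Map each run accession to its ENA fastq URL, skipping runs without one."""
    return {
        row["run_accession"]: row["ena_fastq_http"]
        for row in rows
        if row.get("ena_fastq_http")
    }


for run, url in fastq_urls(DETAILED_ROWS).items():
    print(run, url)
```

The resulting mapping can be handed to any downloader; it makes no assumption about how the files are fetched.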

## Converting between accession numbers

`pysradb` provides a suite of commands for converting between
accession numbers.

### Convert SRP to SRX

`````{tabs}
````{tab} Console
``` bash
$ pysradb srp-to-srx SRP098789
```
````

````{tab} Python
``` python
from pysradb.sraweb import SRAweb

client = SRAweb()
df = client.srp_to_srx("SRP098789")
df
```
````
`````

Output:

    study_accession experiment_accession    experiment_title        experiment_desc organism_taxid  organism_name   library_strategy        library_source  library_selection       sample_accession        sample_title    instrument      total_spots     total_size      run_accession   run_total_spots run_total_bases study_accesssion
    SRP098789       SRX2536428      GSM2476022: vehicle, 60 min, rep 5-mRNAseq; Homo sapiens; RNA-Seq       GSM2476022: vehicle, 60 min, rep 5-mRNAseq; Homo sapiens; RNA-Seq       9606    Homo sapiens    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS1956378      Illumina HiSeq 2500     69422931        1545681856      SRR5227313      69422931        3540569481      SRP098789
    SRP098789       SRX2536427      GSM2476021: PF-06446846, 60 min, rep 5 -mRNA-seq; Homo sapiens; OTHER   GSM2476021: PF-06446846, 60 min, rep 5 -mRNA-seq; Homo sapiens; OTHER   9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956377      Illumina HiSeq 2500     58065134        1302369810      SRR5227312      58065134        2961321834      SRP098789
    SRP098789       SRX2536426      GSM2476020: vehicle, 60 min, rep 4-mRNAseq; Homo sapiens; RNA-Seq       GSM2476020: vehicle, 60 min, rep 4-mRNAseq; Homo sapiens; RNA-Seq       9606    Homo sapiens    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS1956376      Illumina HiSeq 2500     63720205        1416818619      SRR5227311      63720205        3249730455      SRP098789
    SRP098789       SRX2536425      GSM2476019: PF-06446846, 60 min, rep 4 -mRNA-seq; Homo sapiens; OTHER   GSM2476019: PF-06446846, 60 min, rep 4 -mRNA-seq; Homo sapiens; OTHER   9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956375      Illumina HiSeq 2500     66363585        1482728577      SRR5227310      66363585        3384542835      SRP098789
    SRP098789       SRX2536424      GSM2476018: vehicle, 60 min, rep 5-Ribo-seq; Homo sapiens; RNA-Seq      GSM2476018: vehicle, 60 min, rep 5-Ribo-seq; Homo sapiens; RNA-Seq      9606    Homo sapiens    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS1956374      Illumina HiSeq 2500     40062613        904488287       SRR5227309      40062613        2043193263      SRP098789
    SRP098789       SRX2536423      GSM2476017: 1.5 ?M PF-067446846, 60 min, rep 5 -riboseq; Homo sapiens; OTHER    GSM2476017: 1.5 ?M PF-067446846, 60 min, rep 5 -riboseq; Homo sapiens; OTHER    9606    Homo sapiens    OTHER   TRANSCRIPTOMIC other    SRS1956373      Illumina HiSeq 2500     65591217        1499668100      SRR5227308      65591217        3345152067      SRP098789
    SRP098789       SRX2536422      GSM2476016: Vehicle, 60 min, rep 4-ribo-seq; Homo sapiens; RNA-Seq      GSM2476016: Vehicle, 60 min, rep 4-ribo-seq; Homo sapiens; RNA-Seq      9606    Homo sapiens    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS1956372      Illumina HiSeq 2500     66480991        1564636133      SRR5227307      66480991        3390530541      SRP098789
    SRP098789       SRX2536421      GSM2476015: 1.5 ?M PF-067446846, 60 min, rep 4 -riboseq; Homo sapiens; OTHER    GSM2476015: 1.5 ?M PF-067446846, 60 min, rep 4 -riboseq; Homo sapiens; OTHER    9606    Homo sapiens    OTHER   TRANSCRIPTOMIC other    SRS1956371      Illumina HiSeq 2500     57588015        1357395400      SRR5227306      57588015        2936988765      SRP098789
    SRP098789       SRX2536420      GSM2476014: vehicle, 60 min rep 3; Homo sapiens; OTHER  GSM2476014: vehicle, 60 min rep 3; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956370      Illumina HiSeq 2000    48405034 1530784033      SRR5227305      48405034        2420251700      SRP098789
    SRP098789       SRX2536419      GSM2476013: vehicle, 60 min rep 2; Homo sapiens; OTHER  GSM2476013: vehicle, 60 min rep 2; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956369      Illumina HiSeq 2000    47139057 1489018603      SRR5227304      47139057        2356952850      SRP098789
    SRP098789       SRX2536418      GSM2476012: vehicle, 60 min rep 1; Homo sapiens; OTHER  GSM2476012: vehicle, 60 min rep 1; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956368      Illumina HiSeq 2000    50956178 1495757884      SRR5227303      50956178        2547808900      SRP098789
    SRP098789       SRX2536417      GSM2476011: 0.3 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER     GSM2476011: 0.3 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956367      Illumina HiSeq 2000     44258180        1404548468      SRR5227302      44258180        2212909000      SRP098789
    SRP098789       SRX2536416      GSM2476010: 0.3 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER     GSM2476010: 0.3 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956366      Illumina HiSeq 2000     49129512        1536091510      SRR5227301      49129512        2456475600      SRP098789
    SRP098789       SRX2536415      GSM2476009: 0.3 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER     GSM2476009: 0.3 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956365      Illumina HiSeq 2000     30043362        903983724       SRR5227300      30043362        1502168100      SRP098789
    SRP098789       SRX2536414      GSM2476008: 1.5 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER     GSM2476008: 1.5 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956364      Illumina HiSeq 2000     48766213        1530350854      SRR5227299      48766213        2438310650      SRP098789
    SRP098789       SRX2536413      GSM2476007: 1.5 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER     GSM2476007: 1.5 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956363      Illumina HiSeq 2000     49334392        1475414353      SRR5227298      49334392        2466719600      SRP098789
    SRP098789       SRX2536412      GSM2476006: 1.5 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER     GSM2476006: 1.5 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956362      Illumina HiSeq 2000     60381365        1801283052      SRR5227297      60381365        3019068250      SRP098789
    SRP098789       SRX2536411      GSM2476005: vehicle, 10 min rep 3; Homo sapiens; OTHER  GSM2476005: vehicle, 10 min rep 3; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956361      Illumina HiSeq 2000    52737784 1644829192      SRR5227296      52737784        2636889200      SRP098789
    SRP098789       SRX2536410      GSM2476004: vehicle, 10 min rep 2; Homo sapiens; OTHER  GSM2476004: vehicle, 10 min rep 2; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956360      Illumina HiSeq 2000    46137148 1455541408      SRR5227295      46137148        2306857400      SRP098789
    SRP098789       SRX2536409      GSM2476003: vehicle, 10 min rep 1; Homo sapiens; OTHER  GSM2476003: vehicle, 10 min rep 1; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956359      Illumina HiSeq 2000    76002122 1552821132      SRR5227294      76002122        3800106100      SRP098789
    SRP098789       SRX2536408      GSM2476002: 0.3 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER     GSM2476002: 0.3 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956358      Illumina HiSeq 2000     42709138        1338829352      SRR5227293      42709138        2135456900      SRP098789
    SRP098789       SRX2536407      GSM2476001: 0.3 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER     GSM2476001: 0.3 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956357      Illumina HiSeq 2000     60552437        1875910244      SRR5227292      60552437        3027621850      SRP098789
    SRP098789       SRX2536406      GSM2476000: 0.3 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER     GSM2476000: 0.3 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956356      Illumina HiSeq 2000     41143319        843881081       SRR5227291      41143319        2057165950      SRP098789
    SRP098789       SRX2536405      GSM2475999: 1.5 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER     GSM2475999: 1.5 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956355      Illumina HiSeq 2000     40462973        1287284933      SRR5227290      40462973        2023148650      SRP098789
    SRP098789       SRX2536404      GSM2475998: 1.5 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER     GSM2475998: 1.5 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956354      Illumina HiSeq 2000     41657461        1360366732      SRR5227289      41657461        2082873050      SRP098789
    SRP098789       SRX2536403      GSM2475997: 1.5 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER     GSM2475997: 1.5 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956353      Illumina HiSeq 2000     42082855        916745706       SRR5227288      42082855        2104142750      SRP098789
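
A common follow-up is to collect the runs that belong to each experiment. The sketch below assumes the `experiment_accession` and `run_accession` columns have been extracted as pairs; `PAIRS` and `runs_by_experiment` are hypothetical names, and the values are copied from the first three rows above.

```python
from collections import defaultdict

# (experiment_accession, run_accession) pairs from the first three rows above.
PAIRS = [
    ("SRX2536428", "SRR5227313"),
    ("SRX2536427", "SRR5227312"),
    ("SRX2536426", "SRR5227311"),
]


def runs_by_experiment(pairs):
    """Group run accessions under their experiment accession."""
    grouped = defaultdict(list)
    for experiment, run in pairs:
        grouped[experiment].append(run)
    return dict(grouped)


for experiment, runs in runs_by_experiment(PAIRS).items():
    print(experiment, ", ".join(runs))
```

In this project each experiment has a single run, but the grouping also handles experiments with several runs.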

### Convert GSE to SRP

`````{tabs}
````{tab} Console
``` bash
$ pysradb srp-to-srx SRP098789
```
````

````{tab} Python
``` python
from pysradb.sraweb import SRAweb

client = SRAweb()
df = client.srp_to_srx("SRP098789")
df
```
````
`````

Output:

    study_accession experiment_accession    experiment_title        experiment_desc organism_taxid  organism_name   library_strategy        library_source  library_selection       sample_accession        sample_title    instrument      total_spots     total_size      run_accession   run_total_spots run_total_bases study_accesssion
    SRP098789       SRX2536428      GSM2476022: vehicle, 60 min, rep 5-mRNAseq; Homo sapiens; RNA-Seq       GSM2476022: vehicle, 60 min, rep 5-mRNAseq; Homo sapiens; RNA-Seq       9606    Homo sapiens    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS1956378      Illumina HiSeq 2500     69422931        1545681856      SRR5227313      69422931        3540569481      SRP098789
    SRP098789       SRX2536427      GSM2476021: PF-06446846, 60 min, rep 5 -mRNA-seq; Homo sapiens; OTHER   GSM2476021: PF-06446846, 60 min, rep 5 -mRNA-seq; Homo sapiens; OTHER   9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956377      Illumina HiSeq 2500     58065134        1302369810      SRR5227312      58065134        2961321834      SRP098789
    SRP098789       SRX2536426      GSM2476020: vehicle, 60 min, rep 4-mRNAseq; Homo sapiens; RNA-Seq       GSM2476020: vehicle, 60 min, rep 4-mRNAseq; Homo sapiens; RNA-Seq       9606    Homo sapiens    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS1956376      Illumina HiSeq 2500     63720205        1416818619      SRR5227311      63720205        3249730455      SRP098789
    SRP098789       SRX2536425      GSM2476019: PF-06446846, 60 min, rep 4 -mRNA-seq; Homo sapiens; OTHER   GSM2476019: PF-06446846, 60 min, rep 4 -mRNA-seq; Homo sapiens; OTHER   9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956375      Illumina HiSeq 2500     66363585        1482728577      SRR5227310      66363585        3384542835      SRP098789
    SRP098789       SRX2536424      GSM2476018: vehicle, 60 min, rep 5-Ribo-seq; Homo sapiens; RNA-Seq      GSM2476018: vehicle, 60 min, rep 5-Ribo-seq; Homo sapiens; RNA-Seq      9606    Homo sapiens    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS1956374      Illumina HiSeq 2500     40062613        904488287       SRR5227309      40062613        2043193263      SRP098789
    SRP098789       SRX2536423      GSM2476017: 1.5 ?M PF-067446846, 60 min, rep 5 -riboseq; Homo sapiens; OTHER    GSM2476017: 1.5 ?M PF-067446846, 60 min, rep 5 -riboseq; Homo sapiens; OTHER    9606    Homo sapiens    OTHER   TRANSCRIPTOMIC other    SRS1956373      Illumina HiSeq 2500     65591217        1499668100      SRR5227308      65591217        3345152067      SRP098789
    SRP098789       SRX2536422      GSM2476016: Vehicle, 60 min, rep 4-ribo-seq; Homo sapiens; RNA-Seq      GSM2476016: Vehicle, 60 min, rep 4-ribo-seq; Homo sapiens; RNA-Seq      9606    Homo sapiens    RNA-Seq TRANSCRIPTOMIC  cDNA    SRS1956372      Illumina HiSeq 2500     66480991        1564636133      SRR5227307      66480991        3390530541      SRP098789
    SRP098789       SRX2536421      GSM2476015: 1.5 ?M PF-067446846, 60 min, rep 4 -riboseq; Homo sapiens; OTHER    GSM2476015: 1.5 ?M PF-067446846, 60 min, rep 4 -riboseq; Homo sapiens; OTHER    9606    Homo sapiens    OTHER   TRANSCRIPTOMIC other    SRS1956371      Illumina HiSeq 2500     57588015        1357395400      SRR5227306      57588015        2936988765      SRP098789
    SRP098789       SRX2536420      GSM2476014: vehicle, 60 min rep 3; Homo sapiens; OTHER  GSM2476014: vehicle, 60 min rep 3; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956370      Illumina HiSeq 2000    48405034 1530784033      SRR5227305      48405034        2420251700      SRP098789
    SRP098789       SRX2536419      GSM2476013: vehicle, 60 min rep 2; Homo sapiens; OTHER  GSM2476013: vehicle, 60 min rep 2; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956369      Illumina HiSeq 2000    47139057 1489018603      SRR5227304      47139057        2356952850      SRP098789
    SRP098789       SRX2536418      GSM2476012: vehicle, 60 min rep 1; Homo sapiens; OTHER  GSM2476012: vehicle, 60 min rep 1; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956368      Illumina HiSeq 2000    50956178 1495757884      SRR5227303      50956178        2547808900      SRP098789
    SRP098789       SRX2536417      GSM2476011: 0.3 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER     GSM2476011: 0.3 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956367      Illumina HiSeq 2000     44258180        1404548468      SRR5227302      44258180        2212909000      SRP098789
    SRP098789       SRX2536416      GSM2476010: 0.3 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER     GSM2476010: 0.3 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956366      Illumina HiSeq 2000     49129512        1536091510      SRR5227301      49129512        2456475600      SRP098789
    SRP098789       SRX2536415      GSM2476009: 0.3 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER     GSM2476009: 0.3 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956365      Illumina HiSeq 2000     30043362        903983724       SRR5227300      30043362        1502168100      SRP098789
    SRP098789       SRX2536414      GSM2476008: 1.5 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER     GSM2476008: 1.5 ?M PF-067446846, 60 min, rep 3; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956364      Illumina HiSeq 2000     48766213        1530350854      SRR5227299      48766213        2438310650      SRP098789
    SRP098789       SRX2536413      GSM2476007: 1.5 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER     GSM2476007: 1.5 ?M PF-067446846, 60 min, rep 2; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956363      Illumina HiSeq 2000     49334392        1475414353      SRR5227298      49334392        2466719600      SRP098789
    SRP098789       SRX2536412      GSM2476006: 1.5 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER     GSM2476006: 1.5 ?M PF-067446846, 60 min, rep 1; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956362      Illumina HiSeq 2000     60381365        1801283052      SRR5227297      60381365        3019068250      SRP098789
    SRP098789       SRX2536411      GSM2476005: vehicle, 10 min rep 3; Homo sapiens; OTHER  GSM2476005: vehicle, 10 min rep 3; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956361      Illumina HiSeq 2000    52737784 1644829192      SRR5227296      52737784        2636889200      SRP098789
    SRP098789       SRX2536410      GSM2476004: vehicle, 10 min rep 2; Homo sapiens; OTHER  GSM2476004: vehicle, 10 min rep 2; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956360      Illumina HiSeq 2000    46137148 1455541408      SRR5227295      46137148        2306857400      SRP098789
    SRP098789       SRX2536409      GSM2476003: vehicle, 10 min rep 1; Homo sapiens; OTHER  GSM2476003: vehicle, 10 min rep 1; Homo sapiens; OTHER  9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956359      Illumina HiSeq 2000    76002122 1552821132      SRR5227294      76002122        3800106100      SRP098789
    SRP098789       SRX2536408      GSM2476002: 0.3 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER     GSM2476002: 0.3 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956358      Illumina HiSeq 2000     42709138        1338829352      SRR5227293      42709138        2135456900      SRP098789
    SRP098789       SRX2536407      GSM2476001: 0.3 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER     GSM2476001: 0.3 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956357      Illumina HiSeq 2000     60552437        1875910244      SRR5227292      60552437        3027621850      SRP098789
    SRP098789       SRX2536406      GSM2476000: 0.3 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER     GSM2476000: 0.3 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956356      Illumina HiSeq 2000     41143319        843881081       SRR5227291      41143319        2057165950      SRP098789
    SRP098789       SRX2536405      GSM2475999: 1.5 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER     GSM2475999: 1.5 ?M PF-067446846, 10 min, rep 3; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956355      Illumina HiSeq 2000     40462973        1287284933      SRR5227290      40462973        2023148650      SRP098789
    SRP098789       SRX2536404      GSM2475998: 1.5 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER     GSM2475998: 1.5 ?M PF-067446846, 10 min, rep 2; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956354      Illumina HiSeq 2000     41657461        1360366732      SRR5227289      41657461        2082873050      SRP098789
    SRP098789       SRX2536403      GSM2475997: 1.5 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER     GSM2475997: 1.5 ?M PF-067446846, 10 min, rep 1; Homo sapiens; OTHER     9606    Homo sapiens    OTHER   TRANSCRIPTOMIC  other   SRS1956353      Illumina HiSeq 2000     42082855        916745706       SRR5227288      42082855        2104142750      SRP098789
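
The converted table can be grouped further before deciding what to download. The following is a minimal sketch, assuming the table has been saved as tab-separated text; `runs_by_strategy` is a hypothetical helper (not part of `pysradb`), and the excerpt keeps only three rows and four columns of the output above.

```python
import csv
import io

# Three rows excerpted from the output above, reduced to four columns
# and joined with tabs for illustration (an assumed on-disk layout).
excerpt = (
    "experiment_accession\tlibrary_strategy\tinstrument\trun_accession\n"
    "SRX2536428\tRNA-Seq\tIllumina HiSeq 2500\tSRR5227313\n"
    "SRX2536427\tOTHER\tIllumina HiSeq 2500\tSRR5227312\n"
    "SRX2536403\tOTHER\tIllumina HiSeq 2000\tSRR5227288\n"
)


def runs_by_strategy(table_text):
    """Map each library_strategy value to the run accessions that use it."""
    groups = {}
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        groups.setdefault(row["library_strategy"], []).append(row["run_accession"])
    return groups


groups = runs_by_strategy(excerpt)
for strategy, runs in groups.items():
    print(strategy, len(runs), ",".join(runs))
```

Grouping on `library_strategy` is only one choice; any other column in the table (for example `instrument`) could be used the same way.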

------------------------------------------------------------------------

## Downloading sequencing data

`pysradb` can also be used to download either `.fastq` or `.sra`
files, from both ENA and SRA.

### Downloading via accession number

`````{tabs}
````{tab} Console
``` bash
$ pysradb download SRP098789
```
````

````{tab} Python
``` python
from pysradb.sraweb import SRAweb

client = SRAweb()
client.download("SRP098789")
```
````
`````

It is also possible to pipe the dataframe from `metadata` or
`search` into `download`, after filtering the dataframe entries:

`````{tabs}
````{tab} Console
``` bash
$ pysradb metadata SRP276671 --detailed | pysradb download
```
````

````{tab} Python
``` python
from pysradb.sraweb import SRAweb
client = SRAweb()
df = client.sra_metadata('SRP016501', detailed=True)
client.download(df=df)
```
````
`````
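
The filtering step itself is not shown in the examples above. As a minimal sketch, assuming the piped table is line-oriented text whose first line is the header, a filter needs to keep that header and drop the unwanted rows. `filter_table` is a hypothetical helper, not a `pysradb` function, and the sample lines are abbreviated for illustration.

```python
def filter_table(lines, keyword):
    """Keep the header line plus every data row that contains `keyword`."""
    lines = list(lines)
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [header] + [row for row in rows if keyword in row]


# Abbreviated, illustrative table: a header followed by two data rows.
table = [
    "study_accession experiment_accession library_strategy run_accession",
    "SRP098789 SRX2536428 RNA-Seq SRR5227313",
    "SRP098789 SRX2536427 OTHER SRR5227312",
]

kept = filter_table(table, "RNA-Seq")
print("\n".join(kept))
```

Keeping the header is the important design choice here: a filter that drops it would leave the downstream command without column names. The same function could be applied to the lines of `sys.stdin` in a small script placed between the two `pysradb` commands, assuming `download` accepts the reduced table.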

### Ultrafast fastq downloads

With
[aspera-client](https://downloads.asperasoft.com/en/downloads/8?list)
installed, `pysradb` can perform ultrafast downloads.

To download all original fastqs with `aspera-client`
installed, using 8 threads:

`````{tabs}
````{tab} Console
``` console
$ pysradb download -t 8 --use_ascp -p SRP002605
```
````

````{tab} Python
``` python
from pysradb.sraweb import SRAweb

client = SRAweb()
client.download("SRP098789", use_ascp=True, threads=8)
```
````
`````
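
Before requesting an Aspera download, it can help to check whether the client is present at all. The sketch below assumes the Aspera command-line binary is named `ascp` and is discoverable on `PATH`; `ascp_available` is a hypothetical helper. This check may be necessary but not sufficient, since an Aspera transfer can also depend on a key file that the check does not look for.

```python
import shutil


def ascp_available():
    """Return True if an `ascp` executable is found on PATH."""
    return shutil.which("ascp") is not None


use_ascp = ascp_available()
print(f"ascp found on PATH: {use_ascp}")

# Hypothetical use with the call shown above (requires network access):
# from pysradb.sraweb import SRAweb
# SRAweb().download("SRP098789", use_ascp=use_ascp, threads=8)
```

Passing the result as `use_ascp` lets the same script run on machines with and without the Aspera client installed.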
SYMBOL INDEX (324 symbols across 13 files)

FILE: pysradb/cli.py
  class CustomFormatterArgP (line 33) | class CustomFormatterArgP(
  class ArgParser (line 39) | class ArgParser(argparse.ArgumentParser):
    method error (line 40) | def error(self, message):
  function pretty_print_df (line 46) | def pretty_print_df(df, include_header=True):
  function _create_table (line 92) | def _create_table(df, terminal_width, include_header, format_value):
  function _print_save_df (line 149) | def _print_save_df(df, saveto=None):
  function metadata (line 189) | def metadata(
  function download (line 248) | def download(
  function search (line 299) | def search(saveto, db, verbosity, return_max, fields):
  function get_geo_search_info (line 370) | def get_geo_search_info():
  function gse_to_gsm (line 385) | def gse_to_gsm(gse_ids, saveto, detailed, desc, expand):
  function gse_to_srp (line 400) | def gse_to_srp(gse_ids, saveto, detailed, desc, expand):
  function gsm_to_gse (line 415) | def gsm_to_gse(gsm_ids, saveto, detailed, desc, expand):
  function gsm_to_srp (line 430) | def gsm_to_srp(gsm_ids, saveto, detailed, desc, expand):
  function gsm_to_srr (line 445) | def gsm_to_srr(gsm_ids, saveto, detailed, desc, expand):
  function gsm_to_srs (line 460) | def gsm_to_srs(gsm_ids, saveto, detailed, desc, expand):
  function gsm_to_srx (line 475) | def gsm_to_srx(gsm_ids, saveto, detailed, desc, expand):
  function srp_to_gse (line 490) | def srp_to_gse(srp_id, saveto, detailed, desc, expand):
  function srp_to_srr (line 505) | def srp_to_srr(srp_id, saveto, detailed, desc, expand):
  function srp_to_srs (line 520) | def srp_to_srs(srp_id, saveto, detailed, desc, expand):
  function srp_to_srx (line 535) | def srp_to_srx(srp_id, saveto, detailed, desc, expand):
  function srr_to_gsm (line 550) | def srr_to_gsm(srr_ids, saveto, detailed, desc, expand):
  function srr_to_srp (line 565) | def srr_to_srp(srr_ids, saveto, detailed, desc, expand):
  function srr_to_srs (line 580) | def srr_to_srs(srr_ids, saveto, detailed, desc, expand):
  function srr_to_srx (line 595) | def srr_to_srx(srr_ids, saveto, detailed, desc, expand):
  function srs_to_gsm (line 610) | def srs_to_gsm(srs_ids, saveto, detailed, desc, expand):
  function srs_to_srx (line 625) | def srs_to_srx(srs_ids, saveto, detailed, desc, expand):
  function srx_to_srp (line 640) | def srx_to_srp(srx_ids, saveto, detailed, desc, expand):
  function srx_to_srr (line 655) | def srx_to_srr(srx_ids, saveto, detailed, desc, expand):
  function srx_to_srs (line 670) | def srx_to_srs(srx_ids, saveto, detailed, desc, expand):
  function srp_to_pmid (line 681) | def srp_to_pmid(srp_ids, saveto):
  function sra_to_pmid (line 687) | def sra_to_pmid(sra_ids, saveto):
  function gse_to_pmid (line 694) | def gse_to_pmid(gse_ids, saveto):
  function pmid_to_gse (line 700) | def pmid_to_gse(pmid_ids, saveto):
  function pmid_to_srp (line 706) | def pmid_to_srp(pmid_ids, saveto):
  function pmc_to_identifiers (line 712) | def pmc_to_identifiers(pmc_ids, saveto):
  function pmid_to_identifiers (line 718) | def pmid_to_identifiers(pmid_ids, saveto):
  function doi_to_gse (line 724) | def doi_to_gse(doi_ids, saveto):
  function doi_to_srp (line 730) | def doi_to_srp(doi_ids, saveto):
  function doi_to_identifiers (line 736) | def doi_to_identifiers(doi_ids, saveto):
  function geo_matrix (line 746) | def geo_matrix(accession, to_tsv, output_dir):
  function parse_args (line 765) | def parse_args(args=None):

FILE: pysradb/download.py
  function _get_ftp_file_size (line 24) | def _get_ftp_file_size(url):
  function _download_ftp_file (line 48) | def _download_ftp_file(
  function millify (line 120) | def millify(n):
  function get_file_size (line 145) | def get_file_size(row, url_col):
  function md5_validate_file (line 192) | def md5_validate_file(file_path, md5_hash):
  function download_file (line 218) | def download_file(

FILE: pysradb/exceptions.py
  class MissingQueryException (line 4) | class MissingQueryException(Exception):
    method __init__ (line 13) | def __init__(self):
  class IncorrectFieldException (line 23) | class IncorrectFieldException(Exception):

FILE: pysradb/filter_attrs.py
  function _get_sample_attr_keys (line 8) | def _get_sample_attr_keys(sample_attribute):
  function expand_sample_attribute_columns (line 60) | def expand_sample_attribute_columns(metadata_df):
  function guess_cell_type (line 123) | def guess_cell_type(sample_attribute):
  function guess_tissue_type (line 163) | def guess_tissue_type(sample_attribute):
  function guess_strain_type (line 189) | def guess_strain_type(sample_attribute):

FILE: pysradb/geoweb.py
  class GEOweb (line 21) | class GEOweb(object):
    method __init__ (line 22) | def __init__(self):
    method get_download_links (line 25) | def get_download_links(self, gse):
    method download (line 62) | def download(self, links, root_url, gse, verbose=False, out_dir=None):
  function download_geo_matrix (line 111) | def download_geo_matrix(accession, output_dir="."):
  function parse_geo_matrix_to_tsv (line 141) | def parse_geo_matrix_to_tsv(input_file, output_file):

FILE: pysradb/metadata_enrichment.py
  function _prompt_install_enrichment_dependencies (line 19) | def _prompt_install_enrichment_dependencies() -> bool:
  class MetadataExtractor (line 60) | class MetadataExtractor(ABC):
    method __init__ (line 63) | def __init__(self):
    method extract_metadata (line 67) | def extract_metadata(
    method extract_batch (line 82) | def extract_batch(
    method _find_column_variant (line 97) | def _find_column_variant(self, df: pd.DataFrame, target_col: str) -> O...
    method enrich_dataframe (line 121) | def enrich_dataframe(
  class _MetadataExtraction (line 211) | class _MetadataExtraction(BaseModel):
  function load_ontology_reference (line 256) | def load_ontology_reference() -> Dict[str, List[str]]:
  class LLMMetadataExtractor (line 279) | class LLMMetadataExtractor(MetadataExtractor):
    method __init__ (line 282) | def __init__(
    method _provider_env_key (line 305) | def _provider_env_key(self) -> Optional[str]:
    method _check_ollama_available (line 319) | def _check_ollama_available(self) -> bool:
    method _initialize_client (line 346) | def _initialize_client(self):
    method _create_extraction_prompt (line 381) | def _create_extraction_prompt(
    method _call_llm (line 486) | def _call_llm(self, prompt: str) -> Dict[str, Any]:
    method extract_metadata (line 507) | def extract_metadata(
  class EmbeddingMetadataExtractor (line 542) | class EmbeddingMetadataExtractor(MetadataExtractor):
    method __init__ (line 545) | def __init__(
    method _load_model (line 578) | def _load_model(self):
    method _get_cache_path (line 611) | def _get_cache_path(self) -> str:
    method _compute_reference_embeddings (line 625) | def _compute_reference_embeddings(self) -> Dict[str, Any]:
    method _find_best_match (line 660) | def _find_best_match(
    method _parse_structured_fields (line 682) | def _parse_structured_fields(self, text: str) -> Dict[str, str]:
    method _match_value_or_text (line 704) | def _match_value_or_text(
    method extract_metadata (line 751) | def extract_metadata(
  function create_metadata_extractor (line 826) | def create_metadata_extractor(
  function apply_dataframe_enrichment (line 865) | def apply_dataframe_enrichment(

FILE: pysradb/search.py
  class QuerySearch (line 22) | class QuerySearch:
    method __init__ (line 94) | def __init__(
    method _input_multi_regex_checker (line 171) | def _input_multi_regex_checker(self, regex_matcher, input_query, error...
    method _validate_fields (line 218) | def _validate_fields(self):
    method _list_stat (line 451) | def _list_stat(self, stat_header):
    method show_result_statistics (line 461) | def show_result_statistics(self):
    method visualise_results (line 494) | def visualise_results(
    method search (line 576) | def search(self):
    method get_df (line 579) | def get_df(self):
    method get_plot_objects (line 583) | def get_plot_objects(self):
    method _plot_graph (line 587) | def _plot_graph(self, plt, axes, show, savedir, too_many_organisms):
  class SraSearch (line 681) | class SraSearch(QuerySearch):
    method __init__ (line 718) | def __init__(
    method search (line 755) | def search(self):
    method get_uids (line 803) | def get_uids(self):
    method _format_query_string (line 811) | def _format_query_string(self):
    method _format_request (line 840) | def _format_request(self):
    method _format_response (line 849) | def _format_response(self, content):
    method _format_result (line 870) | def _format_result(self):
    method _parse_entry (line 931) | def _parse_entry(self, entry_root):
    method _update_entry (line 1096) | def _update_entry(self, field_name, field_content):
    method _update_stats (line 1122) | def _update_stats(self):
    method _merge_selected_columns (line 1204) | def _merge_selected_columns(self, regex):
  class EnaSearch (line 1217) | class EnaSearch(QuerySearch):
    method search (line 1252) | def search(self):
    method _format_query_string (line 1277) | def _format_query_string(self):
    method _format_request (line 1342) | def _format_request(self):
    method _format_result (line 1381) | def _format_result(self, content):
    method _update_stats (line 1421) | def _update_stats(self):
  class GeoSearch (line 1499) | class GeoSearch(SraSearch):
    method __init__ (line 1537) | def __init__(
    method _format_geo_query_string (line 1620) | def _format_geo_query_string(self):
    method _format_geo_request (line 1637) | def _format_geo_request(self):
    method _format_request (line 1647) | def _format_request(self):
    method search (line 1660) | def search(self):
    method _combine_uids (line 1749) | def _combine_uids(self, uids_from_sra, uids_from_geo):
    method info (line 1769) | def info(cls):

FILE: pysradb/sraweb.py
  function xmlescape (line 23) | def xmlescape(data):
  function _make_hashable (line 27) | def _make_hashable(obj):
  function _order_first (line 45) | def _order_first(df, column_order_list):
  function _retry_response (line 59) | def _retry_response(base_url, payload, key, max_retries=10):
  function get_retmax (line 74) | def get_retmax(n_records, retmax=500):
  class SRAweb (line 80) | class SRAweb(object):
    method __init__ (line 81) | def __init__(self, api_key=None):
    method format_xml (line 131) | def format_xml(string):
    method xml_to_json (line 147) | def xml_to_json(xml):
    method bioproject_to_srp (line 169) | def bioproject_to_srp(self, bioproject):
    method fetch_ena_fastq (line 236) | def fetch_ena_fastq(self, srp):
    method create_esummary_params (line 321) | def create_esummary_params(self, esearchresult, db="sra"):
    method get_esummary_response (line 339) | def get_esummary_response(self, db, term, usehistory="y"):
    method get_efetch_response (line 411) | def get_efetch_response(self, db, term, usehistory="y"):
    method sra_metadata (line 495) | def sra_metadata(
    method fetch_gds_results (line 936) | def fetch_gds_results(self, gse, **kwargs):
    method fetch_gsm_soft (line 972) | def fetch_gsm_soft(self, gsm_ids):
    method geo_metadata (line 1039) | def geo_metadata(
    method metadata (line 1465) | def metadata(self, accession, **kwargs):
    method gse_to_gsm (line 1506) | def gse_to_gsm(self, gse, **kwargs):
    method gse_to_srp (line 1536) | def gse_to_srp(self, gse, **kwargs):
    method gsm_to_srp (line 1604) | def gsm_to_srp(self, gsm, **kwargs):
    method gsm_to_srr (line 1612) | def gsm_to_srr(self, gsm, **kwargs):
    method gsm_to_srs (line 1627) | def gsm_to_srs(self, gsm, **kwargs):
    method gsm_to_srx (line 1641) | def gsm_to_srx(self, gsm, **kwargs):
    method gsm_to_gse (line 1652) | def gsm_to_gse(self, gsm, **kwargs):
    method srp_to_gse (line 1697) | def srp_to_gse(self, srp, **kwargs):
    method srp_to_srr (line 1722) | def srp_to_srr(self, srp, **kwargs):
    method srp_to_srs (line 1727) | def srp_to_srs(self, srp, **kwargs):
    method srp_to_srx (line 1732) | def srp_to_srx(self, srp, **kwargs):
    method srr_to_gsm (line 1738) | def srr_to_gsm(self, srr, **kwargs):
    method srr_to_srp (line 1758) | def srr_to_srp(self, srr, **kwargs):
    method srr_to_srs (line 1768) | def srr_to_srs(self, srr, **kwargs):
    method srr_to_srx (line 1776) | def srr_to_srx(self, srr, **kwargs):
    method srs_to_gsm (line 1784) | def srs_to_gsm(self, srs, **kwargs):
    method srx_to_gsm (line 1795) | def srx_to_gsm(self, srx, **kwargs):
    method srs_to_srx (line 1805) | def srs_to_srx(self, srs, **kwargs):
    method srx_to_srp (line 1810) | def srx_to_srp(self, srx, **kwargs):
    method srx_to_srr (line 1815) | def srx_to_srr(self, srx, **kwargs):
    method srx_to_srs (line 1820) | def srx_to_srs(self, srx, **kwargs):
    method search (line 1825) | def search(self, *args, **kwargs):
    method fetch_bioproject_pmids (line 1828) | def fetch_bioproject_pmids(self, bioprojects):
    method srp_to_pmid (line 1912) | def srp_to_pmid(self, srp_accessions):
    method _search_fallback_pmids (line 1965) | def _search_fallback_pmids(self, srp_accessions):
    method _extract_sra_accession (line 2000) | def _extract_sra_accession(self, row):
    method _get_smallest_pmid (line 2010) | def _get_smallest_pmid(self, pmids):
    method extract_external_sources (line 2025) | def extract_external_sources(self, metadata_df):
    method _search_gse_gsm_pmids (line 2064) | def _search_gse_gsm_pmids(self, metadata_df, sra_accessions):
    method _bioproject_to_gse (line 2128) | def _bioproject_to_gse(self, bioproject):
    method _srp_to_gse_via_elink (line 2187) | def _srp_to_gse_via_elink(self, srp_id):
    method _search_pmc_by_bioproject (line 2270) | def _search_pmc_by_bioproject(self, bioproject_id):
    method search_pmc_for_external_sources (line 2337) | def search_pmc_for_external_sources(self, external_sources):
    method sra_to_pmid (line 2411) | def sra_to_pmid(self, sra_accessions):
    method srr_to_pmid (line 2437) | def srr_to_pmid(self, srr):
    method srx_to_pmid (line 2441) | def srx_to_pmid(self, srx):
    method srs_to_pmid (line 2445) | def srs_to_pmid(self, srs):
    method gse_to_pmid (line 2449) | def gse_to_pmid(self, gse_accessions):
    method doi_to_pmid (line 2479) | def doi_to_pmid(self, dois):
    method pmid_to_pmc (line 2529) | def pmid_to_pmc(self, pmids):
    method fetch_pmc_fulltext (line 2579) | def fetch_pmc_fulltext(self, pmc_id):
    method extract_identifiers_from_text (line 2610) | def extract_identifiers_from_text(self, text):
    method pmc_to_identifiers (line 2651) | def pmc_to_identifiers(self, pmc_ids, convert_missing=True):
    method pmid_to_identifiers (line 2802) | def pmid_to_identifiers(self, pmids):
    method pmid_to_gse (line 2873) | def pmid_to_gse(self, pmids):
    method pmid_to_srp (line 2889) | def pmid_to_srp(self, pmids):
    method doi_to_identifiers (line 2905) | def doi_to_identifiers(self, dois):
    method doi_to_gse (line 2976) | def doi_to_gse(self, dois):
    method doi_to_srp (line 2992) | def doi_to_srp(self, dois):

FILE: pysradb/utils.py
  function path_leaf (line 26) | def path_leaf(path):
  function requests_3_retries (line 43) | def requests_3_retries():
  function scientific_name_to_taxid (line 64) | def scientific_name_to_taxid(name):
  function unique (line 110) | def unique(sequence):
  class TqdmUpTo (line 126) | class TqdmUpTo(tqdm):
    method update_to (line 136) | def update_to(self, b=1, bsize=1, tsize=None):
  function _extract_first_field (line 150) | def _extract_first_field(data):
  function _find_aspera_keypath (line 155) | def _find_aspera_keypath(aspera_dir=None):
  function mkdir_p (line 177) | def mkdir_p(path):
  function order_dataframe (line 195) | def order_dataframe(df, columns):
  function _get_url (line 212) | def _get_url(url, download_to, show_progress=True):
  function run_command (line 237) | def run_command(command, verbose=False):
  function get_gzip_uncompressed_size (line 255) | def get_gzip_uncompressed_size(filepath):
  function confirm (line 272) | def confirm(preceeding_text):
  function copyfileobj (line 295) | def copyfileobj(fsrc, fdst, bufsize=16384, filesize=None, desc=""):

FILE: tests/test_geoweb.py
  function geoweb_connection (line 13) | def geoweb_connection():
  function test_valid_download_links (line 19) | def test_valid_download_links(geoweb_connection):
  function test_invalid_download_links (line 25) | def test_invalid_download_links(geoweb_connection):
  function test_file_download (line 31) | def test_file_download(geoweb_connection):

FILE: tests/test_search.py
  function valid_search_inputs_1 (line 25) | def valid_search_inputs_1():
  function valid_search_inputs_2 (line 224) | def valid_search_inputs_2():
  function valid_search_inputs_geo (line 343) | def valid_search_inputs_geo():
  function empty_search_inputs (line 499) | def empty_search_inputs():
  function empty_search_inputs_geo (line 523) | def empty_search_inputs_geo():
  function invalid_search_inputs (line 550) | def invalid_search_inputs():
  function sra_response_xml_1 (line 781) | def sra_response_xml_1():
  function sra_formatted_responses_1 (line 786) | def sra_formatted_responses_1():
  function sra_response_xml_2 (line 812) | def sra_response_xml_2():
  function sra_formatted_responses_2 (line 817) | def sra_formatted_responses_2():
  function sra_uids (line 843) | def sra_uids():
  function ena_responses_json (line 850) | def ena_responses_json():
  function ena_formatted_responses (line 859) | def ena_formatted_responses():
  function missing_query_test (line 887) | def missing_query_test(empty_search_inputs):
  function test_invalid_search_query (line 893) | def test_invalid_search_query(invalid_search_inputs):
  function test_sra_search_1 (line 919) | def test_sra_search_1():
  function test_sra_uids (line 930) | def test_sra_uids(sra_uids):
  function test_valid_search_query_1_sra (line 938) | def test_valid_search_query_1_sra(valid_search_inputs_1):
  function test_valid_search_query_2_sra (line 960) | def test_valid_search_query_2_sra(valid_search_inputs_2):
  function test_sra_search_format_request (line 982) | def test_sra_search_format_request():
  function test_sra_search_format_result_1 (line 995) | def test_sra_search_format_result_1(sra_response_xml_1, sra_formatted_re...
  function test_sra_search_format_result_2 (line 1023) | def test_sra_search_format_result_2(sra_response_xml_2, sra_formatted_re...
  function _test_ena_search_1 (line 1049) | def _test_ena_search_1():
  function _test_ena_search_2 (line 1061) | def _test_ena_search_2(capsys):
  function _test_ena_search_3 (line 1068) | def _test_ena_search_3(capsys):
  function _test_valid_search_query_1_ena (line 1076) | def _test_valid_search_query_1_ena(valid_search_inputs_1):
  function _test_valid_search_query_2_ena (line 1104) | def _test_valid_search_query_2_ena(valid_search_inputs_2):
  function test_ena_search_format_request (line 1134) | def test_ena_search_format_request():
  function test_ena_search_format_result (line 1153) | def test_ena_search_format_result(ena_responses_json, ena_formatted_resp...
  function missing_query_test_geo (line 1171) | def missing_query_test_geo(empty_search_inputs_geo):
  function test_geo_search_1 (line 1177) | def test_geo_search_1():
  function test_valid_search_query_geo (line 1191) | def test_valid_search_query_geo(valid_search_inputs_geo):
  function test_geo_search_format_request (line 1219) | def test_geo_search_format_request():
  function test_geo_info (line 1232) | def test_geo_info():

FILE: tests/test_sraweb.py
  function sraweb_connection (line 12) | def sraweb_connection():
  function test_sra_metadata (line 18) | def test_sra_metadata(sraweb_connection):
  function test_sra_metadata_missing_orgname (line 24) | def test_sra_metadata_missing_orgname(sraweb_connection):
  function test_sra_metadata_multiple (line 31) | def test_sra_metadata_multiple(sraweb_connection):
  function test_sra_metadata_multiple_detailed (line 41) | def test_sra_metadata_multiple_detailed(sraweb_connection):
  function test_tissue_column (line 57) | def test_tissue_column(sraweb_connection):
  function test_metadata_exp_accession (line 63) | def test_metadata_exp_accession(sraweb_connection):
  function test_fetch_gds_results (line 69) | def test_fetch_gds_results(sraweb_connection):
  function test_srp_to_gse (line 75) | def test_srp_to_gse(sraweb_connection):
  function test_srp_to_srr (line 81) | def test_srp_to_srr(sraweb_connection):
  function test_srp_to_srs (line 93) | def test_srp_to_srs(sraweb_connection):
  function test_srp_to_srx (line 106) | def test_srp_to_srx(sraweb_connection):
  function test_gse_to_gsm (line 112) | def test_gse_to_gsm(sraweb_connection):
  function test_gse_to_gsm2 (line 118) | def test_gse_to_gsm2(sraweb_connection):
  function test_gse_to_gsm1 (line 124) | def test_gse_to_gsm1(sraweb_connection):
  function test_gse_to_srp (line 130) | def test_gse_to_srp(sraweb_connection):
  function test_gse_to_srp2 (line 136) | def test_gse_to_srp2(sraweb_connection):
  function test_gse_to_srp_with_nan_sra (line 143) | def test_gse_to_srp_with_nan_sra(sraweb_connection):
  function test_gsm_to_srp (line 159) | def test_gsm_to_srp(sraweb_connection):
  function test_gsm_to_gse (line 165) | def test_gsm_to_gse(sraweb_connection):
  function test_gsm_to_gse_multiple_gses (line 171) | def test_gsm_to_gse_multiple_gses(sraweb_connection):
  function test_gsm_to_srr (line 190) | def test_gsm_to_srr(sraweb_connection):
  function test_gsm_to_srs (line 196) | def test_gsm_to_srs(sraweb_connection):
  function test_gsm_to_srx (line 202) | def test_gsm_to_srx(sraweb_connection):
  function test_srr_to_gsm (line 208) | def test_srr_to_gsm(sraweb_connection):
  function test_srr_to_srp (line 213) | def test_srr_to_srp(sraweb_connection):
  function test_srr_to_srp1 (line 219) | def test_srr_to_srp1(sraweb_connection):
  function test_srr_to_srs (line 225) | def test_srr_to_srs(sraweb_connection):
  function test_srr_to_srx (line 231) | def test_srr_to_srx(sraweb_connection):
  function test_srs_to_gsm (line 237) | def test_srs_to_gsm(sraweb_connection):
  function test_srs_to_srx (line 243) | def test_srs_to_srx(sraweb_connection):
  function test_srx_to_gsm (line 249) | def test_srx_to_gsm(sraweb_connection):
  function test_srx_to_srp (line 255) | def test_srx_to_srp(sraweb_connection):
  function test_srx_to_srr (line 261) | def test_srx_to_srr(sraweb_connection):
  function test_srx_to_srr1 (line 267) | def test_srx_to_srr1(sraweb_connection):
  function test_srx_to_srs (line 273) | def test_srx_to_srs(sraweb_connection):
  function _test_xmlns_id (line 280) | def _test_xmlns_id(sraweb_connection):
  function test_GCP_url (line 287) | def test_GCP_url(sraweb_connection):
  function test_GCP_url2 (line 292) | def test_GCP_url2(sraweb_connection):
  function test_gse_to_srp3 (line 297) | def test_gse_to_srp3(sraweb_connection):
  function test_gse_to_srp_multiple_srps (line 303) | def test_gse_to_srp_multiple_srps(sraweb_connection):
  function test_geo_metadata_for_gse_without_srp (line 326) | def test_geo_metadata_for_gse_without_srp(sraweb_connection):
  function test_geo_metadata_with_sample_attributes (line 335) | def test_geo_metadata_with_sample_attributes(sraweb_connection):
  function test_geo_metadata_covid19_characteristics (line 342) | def test_geo_metadata_covid19_characteristics(sraweb_connection):
  function test_fetch_bioproject_pmids (line 417) | def test_fetch_bioproject_pmids(sraweb_connection):
  function test_fetch_bioproject_pmids_multiple (line 425) | def test_fetch_bioproject_pmids_multiple(sraweb_connection):
  function test_search_pmc_by_bioproject (line 438) | def test_search_pmc_by_bioproject(sraweb_connection):
  function test_fetch_bioproject_pmids_with_pmc_fallback (line 448) | def test_fetch_bioproject_pmids_with_pmc_fallback(sraweb_connection):
  function test_srp_to_pmid_with_pmc_fallback (line 460) | def test_srp_to_pmid_with_pmc_fallback(sraweb_connection):
  function test_sra_to_pmid (line 476) | def test_sra_to_pmid(sraweb_connection):
  function test_srp_to_pmid (line 484) | def test_srp_to_pmid(sraweb_connection):
  function test_srr_to_pmid (line 492) | def test_srr_to_pmid(sraweb_connection):
  function test_sra_to_pmid_multiple (line 500) | def test_sra_to_pmid_multiple(sraweb_connection):
  function test_srp_to_pmid_multiple (line 507) | def test_srp_to_pmid_multiple(sraweb_connection):
  function test_gse_to_pmid (line 516) | def test_gse_to_pmid(sraweb_connection):
  function test_gse_to_pmid_multiple (line 525) | def test_gse_to_pmid_multiple(sraweb_connection):
  function test_pmid_to_pmc (line 534) | def test_pmid_to_pmc(sraweb_connection):
  function test_pmid_to_pmc_multiple (line 541) | def test_pmid_to_pmc_multiple(sraweb_connection):
  function test_extract_identifiers_from_text (line 548) | def test_extract_identifiers_from_text(sraweb_connection):
  function test_pmc_to_identifiers (line 560) | def test_pmc_to_identifiers(sraweb_connection):
  function test_pmid_to_identifiers (line 571) | def test_pmid_to_identifiers(sraweb_connection):
  function test_pmid_to_gse (line 581) | def test_pmid_to_gse(sraweb_connection):
  function test_pmid_to_srp (line 591) | def test_pmid_to_srp(sraweb_connection):
  function test_doi_to_pmid (line 600) | def test_doi_to_pmid(sraweb_connection):
  function test_doi_to_pmid_multiple (line 607) | def test_doi_to_pmid_multiple(sraweb_connection):
  function test_doi_to_identifiers (line 617) | def test_doi_to_identifiers(sraweb_connection):
  function test_doi_to_gse (line 627) | def test_doi_to_gse(sraweb_connection):
  function test_doi_to_srp (line 637) | def test_doi_to_srp(sraweb_connection):
  function test_unified_metadata_with_gse (line 646) | def test_unified_metadata_with_gse(sraweb_connection):
  function test_unified_metadata_with_srp (line 654) | def test_unified_metadata_with_srp(sraweb_connection):
  function test_unified_metadata_with_multiple_gse (line 662) | def test_unified_metadata_with_multiple_gse(sraweb_connection):
  function test_unified_metadata_invalid_accession (line 671) | def test_unified_metadata_invalid_accession(sraweb_connection):

FILE: tests/test_utils.py
  function invalid_name (line 9) | def invalid_name():
  function valid_name (line 14) | def valid_name():
  function invalid_scientific_name_to_taxid (line 18) | def invalid_scientific_name_to_taxid(invalid_name):
  function valid_scientific_name_to_taxid (line 24) | def valid_scientific_name_to_taxid(valid_name):
Condensed preview — 94 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (18,918K chars).
[
  {
    "path": ".coveragerc",
    "chars": 131,
    "preview": "[run]\nomit =\n    pysradb/filter_attrs.py\n    pysradb/geodb.py\n    pysradb/sradb.py\n    pysradb/taxid2name.py\n    pysradb"
  },
  {
    "path": ".editorconfig",
    "chars": 292,
    "preview": "# http://editorconfig.org\n\nroot = true\n\n[*]\nindent_style = space\nindent_size = 4\ntrim_trailing_whitespace = true\ninsert_"
  },
  {
    "path": ".gitattributes",
    "chars": 93,
    "preview": "*.rst linguist-documentation\n*.html linguist-documentation\n*.ipynb linguist-language=python\n\n"
  },
  {
    "path": ".github/FUNDING.yml",
    "chars": 65,
    "preview": "# These are supported funding model platforms\n\ngithub: [saketkc]\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 446,
    "preview": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: \"[BUG]\"\nlabels: bug\nassignees: ''\n\n---\n\n**Describe"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 379,
    "preview": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: \"[ENH]\"\nlabels: enhancement\nassignees: ''\n\n---\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE.md",
    "chars": 318,
    "preview": "* pysradb version:\n* Python version:\n* Operating System:\n\n### Description\n\nDescribe what you were trying to get done.\nTe"
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 501,
    "preview": "# To get started with Dependabot version updates, you'll need to specify which\n# package ecosystems to update and where "
  },
  {
    "path": ".github/workflows/codeql-analysis.yml",
    "chars": 2436,
    "preview": "# For most projects, this workflow file will not need changing; you simply need\n# to commit it to your repository.\n#\n# Y"
  },
  {
    "path": ".github/workflows/publish.yml",
    "chars": 605,
    "preview": "name: publish\n\non:\n  release:\n    types: [created]\n\njobs:\n  deploy:\n    runs-on: ubuntu-latest\n    steps:\n    - uses: ac"
  },
  {
    "path": ".github/workflows/pull_request.yml",
    "chars": 2205,
    "preview": "name: pull_request\n\non: [pull_request]\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n    strategy:\n      matrix:\n        pyt"
  },
  {
    "path": ".github/workflows/push.yml",
    "chars": 2032,
    "preview": "name: push\n\non: [push]\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n    strategy:\n      matrix:\n        python-version: [3."
  },
  {
    "path": ".gitignore",
    "chars": 1263,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": "AUTHORS.md",
    "chars": 356,
    "preview": "# Credits\n\n## Contributors\n\n-   [Boshen Yan](https://github.com/bscrow)\n-   [Maarten van der Sande](https://github.com/M"
  },
  {
    "path": "CITATION.cff",
    "chars": 326,
    "preview": "cff-version: 1.2.0\nmessage: \"If you use this software, please cite it as below.\"\nauthors:\n- family-names: \"Choudhary\"\n  "
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "chars": 3349,
    "preview": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nIn the interest of fostering an open and welcoming environment, w"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 3607,
    "preview": "# Contributing\n\nContributions are welcome, and they are greatly appreciated! Every\nlittle bit helps, and credit will alw"
  },
  {
    "path": "HISTORY.md",
    "chars": 10601,
    "preview": "# History\n\n# 3.0.0 (Unreleased) - BREAKING CHANGES\n\n## Removal of legacy SQLite support\n\n**All local SQLite database sup"
  },
  {
    "path": "LICENSE",
    "chars": 1520,
    "preview": "BSD 3-Clause License\n\nCopyright (c) 2020-2023, Saket Choudhary\nAll rights reserved.\n\nRedistribution and use in source an"
  },
  {
    "path": "MANIFEST.in",
    "chars": 349,
    "preview": "include AUTHORS.md\ninclude CONTRIBUTING.md\ninclude HISTORY.md\ninclude LICENSE\ninclude README.md\ninclude requirements.txt"
  },
  {
    "path": "Makefile",
    "chars": 2236,
    "preview": ".PHONY: clean clean-test clean-pyc clean-build docs help\n.DEFAULT_GOAL := help\n\ndefine BROWSER_PYSCRIPT\nimport os, webbr"
  },
  {
    "path": "README.md",
    "chars": 21709,
    "preview": "# A Python package for retrieving metadata from SRA/ENA/GEO\n\n[![image](https://img.shields.io/pypi/v/pysradb.svg?style=f"
  },
  {
    "path": "docs/Makefile",
    "chars": 608,
    "preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line.\nSPHINXOPTS    =\nSPHI"
  },
  {
    "path": "docs/_static/copy-button.js",
    "chars": 4135,
    "preview": "// Add copy button to code blocks\ndocument.addEventListener('DOMContentLoaded', function() {\n    // SVG icon for clipboa"
  },
  {
    "path": "docs/_static/custom.css",
    "chars": 712,
    "preview": "/* Override Pygments code block background color for light mode */\n.highlight {\n  background: #f5f5f5 !important;\n}\n\n/* "
  },
  {
    "path": "docs/authors.md",
    "chars": 356,
    "preview": "# Credits\n\n## Contributors\n\n-   [Boshen Yan](https://github.com/bscrow)\n-   [Maarten van der Sande](https://github.com/M"
  },
  {
    "path": "docs/case_studies.md",
    "chars": 16626,
    "preview": "# Case Studies \n\n## Case Study 1\n\nConsider a scenario where somone is interested in searching for\nsingle-cell RNA-seq da"
  },
  {
    "path": "docs/cmdline.md",
    "chars": 14684,
    "preview": "# CLI\n\n    $ pysradb\n    usage: pysradb [-h] [--version] [--citation]\n                   {metadata,download,search,gse-t"
  },
  {
    "path": "docs/commands.rst",
    "chars": 100,
    "preview": "API Documentation\n=================\n\nSee :doc:`pysradb` for the Python API reference documentation.\n"
  },
  {
    "path": "docs/conf.py",
    "chars": 6650,
    "preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n#\n# pysradb documentation build configuration file, created by\n# sphinx-qu"
  },
  {
    "path": "docs/contributing.md",
    "chars": 3607,
    "preview": "# Contributing\n\nContributions are welcome, and they are greatly appreciated! Every\nlittle bit helps, and credit will alw"
  },
  {
    "path": "docs/history.md",
    "chars": 11101,
    "preview": "# History\n\n<details open>\n<summary style=\"cursor: pointer; font-weight: bold; font-size: 1.1em; margin-top: 0.5em;\">\n2.5"
  },
  {
    "path": "docs/index.rst",
    "chars": 20476,
    "preview": "============\nIntroduction\n============\n\n\n``pysradb`` provides a simple method to programmatically access metadata\nand do"
  },
  {
    "path": "docs/installation.md",
    "chars": 1399,
    "preview": "# Installation\n\n## Stable release\n\nTo install pysradb, run this command in your terminal:\n\n``` console\n$ pip install pys"
  },
  {
    "path": "docs/make.bat",
    "chars": 769,
    "preview": "@ECHO OFF\n\npushd %~dp0\n\nREM Command file for Sphinx documentation\n\nif \"%SPHINXBUILD%\" == \"\" (\n\tset SPHINXBUILD=python -m"
  },
  {
    "path": "docs/modules.rst",
    "chars": 58,
    "preview": "pysradb\n=======\n\n.. toctree::\n   :maxdepth: 4\n\n   pysradb\n"
  },
  {
    "path": "docs/notebooks.rst",
    "chars": 717,
    "preview": "Tutorials & Notebooks\n=====================\n\nThe following Jupyter notebooks demonstrate various features of pysradb:\n\n."
  },
  {
    "path": "docs/pysradb.rst",
    "chars": 1943,
    "preview": "pysradb package\n===============\n\nSubmodules\n----------\n\npysradb.basedb module\n---------------------\n\n.. automodule:: pys"
  },
  {
    "path": "docs/python-api-usage.md",
    "chars": 12265,
    "preview": "# Python API \n\n## Use Case 1: Fetch the metadata table (SRA-runtable)\n\nThe simplest use case of [pysradb]{.title-ref} is"
  },
  {
    "path": "docs/quickstart.md",
    "chars": 62761,
    "preview": "# Quickstart\n\nMost features in `pysradb` are accessible both from the command-line and\nas a python package. `pysradb` us"
  },
  {
    "path": "notebooks/01.Python-API_demo.ipynb",
    "chars": 75582,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"[![Open In Colab](https://colab.res"
  },
  {
    "path": "notebooks/02.Commandline_download.ipynb",
    "chars": 16827,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"[![Open In Colab](https://colab.res"
  },
  {
    "path": "notebooks/03.ParallelDownload.ipynb",
    "chars": 103166,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"[![Open In Colab](https://colab.res"
  },
  {
    "path": "notebooks/04.SRA_to_fastq_conda.ipynb",
    "chars": 44564,
    "preview": "{\n \"nbformat\": 4,\n \"nbformat_minor\": 0,\n \"metadata\": {\n  \"colab\": {\n   \"name\": \"04.SRA-to-fastq-conda.ipynb\",\n   \"proven"
  },
  {
    "path": "notebooks/05.Downloading_subsets_of_a_project.ipynb",
    "chars": 41299,
    "preview": "{\n \"nbformat\": 4,\n \"nbformat_minor\": 0,\n \"metadata\": {\n  \"colab\": {\n   \"name\": \"05.Downloading subsets of a project.ipyn"
  },
  {
    "path": "notebooks/06.Multiple_SRPs.ipynb",
    "chars": 6699,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"[![Open In Colab](https://colab.res"
  },
  {
    "path": "notebooks/07.Query_Search.ipynb",
    "chars": 81612,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"[![Open In Colab](https://colab.res"
  },
  {
    "path": "notebooks/08.PMC_DOI_Identifiers.ipynb",
    "chars": 34203,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"[![Open In Colab](https://colab.res"
  },
  {
    "path": "notebooks/09.Metadata_enrichment.ipynb",
    "chars": 96370,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"5sioFsNrTU7L\"\n   },\n   \"source\": [\n    \"[![Ope"
  },
  {
    "path": "notebooks/11.Parse_Bioscience_Search.ipynb",
    "chars": 5548623,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"[![Open In Colab](https://colab.res"
  },
  {
    "path": "notebooks/README.md",
    "chars": 1103,
    "preview": "# Notebooks demonstrating functionalities of pysradb\n\n1. [Python API](https://colab.research.google.com/github/saketkc/p"
  },
  {
    "path": "pyproject.toml",
    "chars": 1572,
    "preview": "[build-system]\nrequires = [\"hatchling\"]\nbuild-backend = \"hatchling.build\"\n\n[project]\nname = \"pysradb\"\ndynamic = [\"versio"
  },
  {
    "path": "pysradb/__init__.py",
    "chars": 315,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"Top-level package for pysradb.\"\"\"\n\n__author__ = \"\"\"Saket Choudhary\"\"\"\n__email__ = \"saketkc@gm"
  },
  {
    "path": "pysradb/__main__.py",
    "chars": 97,
    "preview": "import sys\n\nfrom .cli import parse_args\n\nif __name__ == \"__main__\":\n    parse_args(sys.argv[1:])\n"
  },
  {
    "path": "pysradb/cli.py",
    "chars": 56662,
    "preview": "\"\"\"Command line interface for pysradb\"\"\"\n\nimport argparse\nimport os\nimport re\nimport sys\nimport warnings\nfrom io import "
  },
  {
    "path": "pysradb/download.py",
    "chars": 8295,
    "preview": "\"\"\"Utility function to download data\"\"\"\n\nimport hashlib\nimport math\nimport os\nimport shutil\nimport sys\nimport warnings\nf"
  },
  {
    "path": "pysradb/exceptions.py",
    "chars": 777,
    "preview": "\"\"\"This file contains custom Exceptions for pysradb\"\"\"\n\n\nclass MissingQueryException(Exception):\n    \"\"\"Exception raised"
  },
  {
    "path": "pysradb/filter_attrs.py",
    "chars": 7970,
    "preview": "import re\nimport warnings\n\nimport numpy as np\nimport pandas as pd\n\n\ndef _get_sample_attr_keys(sample_attribute):\n    if "
  },
  {
    "path": "pysradb/geoweb.py",
    "chars": 5503,
    "preview": "\"\"\"Utilities to interact with GEO online\"\"\"\n\nimport gzip\nimport os\nimport re\nimport sys\nfrom io import StringIO\n\nimport "
  },
  {
    "path": "pysradb/metadata_enrichment.py",
    "chars": 35460,
    "preview": "\"\"\"\nMetadata enrichment for SRA/GEO datasets using LLMs and embeddings.\n\"\"\"\n\nimport logging\nimport os\nimport subprocess\n"
  },
  {
    "path": "pysradb/ontology_reference.json",
    "chars": 11524,
    "preview": "{\n  \"organs\": [\n    \"brain\", \"heart\", \"liver\", \"lung\", \"kidney\", \"spleen\", \"pancreas\", \"stomach\",\n    \"intestine\", \"colo"
  },
  {
    "path": "pysradb/search.py",
    "chars": 72580,
    "preview": "\"\"\"This file contains the search classes for the search feature.\"\"\"\n\nimport os\nimport re\nimport sys\nimport time\nimport u"
  },
  {
    "path": "pysradb/sraweb.py",
    "chars": 119025,
    "preview": "\"\"\"Utilities to interact with SRA online\"\"\"\n\nimport concurrent.futures\nimport os\nimport re\nimport sys\nimport time\nimport"
  },
  {
    "path": "pysradb/taxid2name.py",
    "chars": 386928,
    "preview": "TAXID_TO_NAME = {\n    0: \"not_available\",\n    1: \"root\",\n    2: \"Bacteria\",\n    6: \"Azorhizobium\",\n    7: \"Azorhizobium "
  },
  {
    "path": "pysradb/utils.py",
    "chars": 8132,
    "preview": "import errno\nimport gzip\nimport io\nimport ntpath\nimport os\nimport shlex\nimport subprocess\nimport urllib.request as urlli"
  },
  {
    "path": "requirements.txt",
    "chars": 74,
    "preview": "lxml>=4.6.3\npandas>=1.3.2\nrequests>=2.26.0\ntqdm>=4.62.1\nxmltodict>=0.12.0\n"
  },
  {
    "path": "setup.cfg",
    "chars": 668,
    "preview": "[bumpversion]\ncurrent_version = 2.4.1\ncommit = True\ntag = False\nparse = (?P<major>\\d+)\\.(?P<minor>\\d+)\\.(?P<patch>\\d+)(\\"
  },
  {
    "path": "tests/conftest.py",
    "chars": 114,
    "preview": "# contents of conftest.py\nimport pytest\n\n# Test fixtures will be added here as needed for SRAweb and GEOweb tests\n"
  },
  {
    "path": "tests/data/test_search/ena_search_test1.txt",
    "chars": 784,
    "preview": "run_accession\nSRR492850\nSRR500270\nSRR609956\nSRR609957\nSRR609958\nSRR609959\nSRR609960\nSRR609961\nSRR609962\nSRR609963\nSRR609"
  },
  {
    "path": "tests/data/test_search/ena_test_verbosity_0.csv",
    "chars": 10994,
    "preview": "run_accession\nERR1190989\nERR1190990\nERR1190991\nERR1190992\nERR1190993\nERR1190994\nERR1190995\nERR1190996\nERR1190997\nERR1190"
  },
  {
    "path": "tests/data/test_search/ena_test_verbosity_0.json",
    "chars": 821724,
    "preview": "[{\"study_accession\": \"PRJEB12126\", \"experiment_accession\": \"ERX1264364\", \"experiment_title\": \"Illumina HiSeq 2000 sequen"
  },
  {
    "path": "tests/data/test_search/ena_test_verbosity_1.csv",
    "chars": 132777,
    "preview": "run_accession,description\nERR1190989,Illumina HiSeq 2000 sequencing; Analysis of coronavirus and infected host-cell gene"
  },
  {
    "path": "tests/data/test_search/ena_test_verbosity_1.json",
    "chars": 821724,
    "preview": "[{\"study_accession\": \"PRJEB12126\", \"experiment_accession\": \"ERX1264364\", \"experiment_title\": \"Illumina HiSeq 2000 sequen"
  },
  {
    "path": "tests/data/test_search/ena_test_verbosity_2.csv",
    "chars": 415681,
    "preview": "study_accession,experiment_accession,experiment_title,description,tax_id,scientific_name,library_strategy,library_source"
  },
  {
    "path": "tests/data/test_search/ena_test_verbosity_2.json",
    "chars": 821724,
    "preview": "[{\"study_accession\": \"PRJEB12126\", \"experiment_accession\": \"ERX1264364\", \"experiment_title\": \"Illumina HiSeq 2000 sequen"
  },
  {
    "path": "tests/data/test_search/ena_test_verbosity_3.csv",
    "chars": 2187514,
    "preview": "study_accession,experiment_accession,experiment_title,description,tax_id,scientific_name,library_strategy,library_source"
  },
  {
    "path": "tests/data/test_search/ena_test_verbosity_3.json",
    "chars": 4334941,
    "preview": "[{\"study_accession\": \"PRJEB12126\", \"secondary_study_accession\": \"ERP013565\", \"sample_accession\": \"SAMEA3708907\", \"second"
  },
  {
    "path": "tests/data/test_search/geo_search_test1.txt",
    "chars": 857,
    "preview": "SRX8089313\nSRX8089314\nSRX8089315\nSRX8089316\nSRX8089317\nSRX8089318\nSRX8089319\nSRX8089320\nSRX8089286\nSRX8089275\nSRX8089276"
  },
  {
    "path": "tests/data/test_search/sra_search_test1.txt",
    "chars": 19,
    "preview": "SRX137370\nSRX137371"
  },
  {
    "path": "tests/data/test_search/sra_test.xml",
    "chars": 648589,
    "preview": "<?xml version=\"1.0\" ?>\n<EXPERIMENT_PACKAGE_SET>\n<EXPERIMENT_PACKAGE><EXPERIMENT alias=\"GSM4369051\" accession=\"SRX7830165"
  },
  {
    "path": "tests/data/test_search/sra_test_2_verbosity_0.csv",
    "chars": 25,
    "preview": "run_accession\nERR4229796\n"
  },
  {
    "path": "tests/data/test_search/sra_test_2_verbosity_1.csv",
    "chars": 76,
    "preview": "run_accession,experiment_title\nERR4229796,HiSeq X Ten paired end sequencing\n"
  },
  {
    "path": "tests/data/test_search/sra_test_2_verbosity_2.csv",
    "chars": 516,
    "preview": "study_accession,experiment_accession,experiment_title,sample_taxon_id,sample_scientific_name,experiment_library_strategy"
  },
  {
    "path": "tests/data/test_search/sra_test_2_verbosity_3.csv",
    "chars": 9398,
    "preview": "study_accession,experiment_accession,experiment_title,sample_taxon_id,sample_scientific_name,experiment_library_strategy"
  },
  {
    "path": "tests/data/test_search/sra_test_ERS3331676.xml",
    "chars": 10356,
    "preview": "<?xml version=\"1.0\" ?>\n<EXPERIMENT_PACKAGE_SET>\n<EXPERIMENT_PACKAGE><EXPERIMENT accession=\"ERX4190585\" alias=\"SC_EXP_296"
  },
  {
    "path": "tests/data/test_search/sra_test_verbosity_0.csv",
    "chars": 1033,
    "preview": "run_accession\nSRR11217925\nSRR11217924\nSRR11217923\nSRR11217922\nSRR11217921\nSRR11217920\nSRR11217919\nSRR11217918\nSRR1121791"
  },
  {
    "path": "tests/data/test_search/sra_test_verbosity_1.csv",
    "chars": 6333,
    "preview": "run_accession,experiment_title\nSRR11217925,GSM4369051: rnaH27nsun3; Caenorhabditis elegans; RNA-Seq\nSRR11217924,GSM43690"
  },
  {
    "path": "tests/data/test_search/sra_test_verbosity_2.csv",
    "chars": 20039,
    "preview": "study_accession,experiment_accession,experiment_title,sample_taxon_id,sample_scientific_name,experiment_library_strategy"
  },
  {
    "path": "tests/data/test_search/sra_test_verbosity_3.csv",
    "chars": 329691,
    "preview": "study_accession,experiment_accession,experiment_title,sample_taxon_id,sample_scientific_name,experiment_library_strategy"
  },
  {
    "path": "tests/data/test_search/sra_uids.txt",
    "chars": 14,
    "preview": "155791\n155790\n"
  },
  {
    "path": "tests/test_geoweb.py",
    "chars": 1170,
    "preview": "\"\"\"Tests for GEOweb\"\"\"\n\nimport os\nimport time\n\nimport pandas as pd\nimport pytest\n\nfrom pysradb.geoweb import GEOweb\n\n\n@p"
  },
  {
    "path": "tests/test_search.py",
    "chars": 31453,
    "preview": "\"\"\"Tests for search.py\"\"\"\n\nimport json\n\nimport pandas as pd\nimport pytest\n\nfrom pysradb.search import *\n\n# ============="
  },
  {
    "path": "tests/test_sraweb.py",
    "chars": 24247,
    "preview": "\"\"\"Tests for SRAweb\"\"\"\n\nimport time\n\nimport pandas as pd\nimport pytest\n\nfrom pysradb.sraweb import SRAweb\n\n\n@pytest.fixt"
  },
  {
    "path": "tests/test_utils.py",
    "chars": 546,
    "preview": "\"\"\"Tests for utils.py\"\"\"\n\nimport pytest\n\nfrom pysradb.utils import *\n\n\n@pytest.fixture(scope=\"module\")\ndef invalid_name("
  }
]

About this extraction

This page contains the full source code of the saketkc/pysradb GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 94 files (16.8 MB), approximately 4.4M tokens, and a symbol index with 324 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
