Full Code of laminlabs/lamindb for AI

Repository: laminlabs/lamindb
Branch: main
Commit: 44563e03eeae
Files: 288
Total size: 2.7 MB

Directory structure:
gitextract_xlz91t15/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.yml
│   │   ├── config.yml
│   │   ├── enhancement.yml
│   │   └── usage_question.yml
│   └── workflows/
│       ├── build.yml
│       └── doc-changes.yml
├── .gitignore
├── .gitmodules
├── .pre-commit-config.yaml
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── docs/
│   ├── api.md
│   ├── arrays.md
│   ├── bionty.md
│   ├── changelog.md
│   ├── curate.md
│   ├── faq/
│   │   ├── acid.md
│   │   ├── curate-any.md
│   │   ├── idempotency.md
│   │   ├── import-modules.md
│   │   ├── keep-artifacts-local.md
│   │   ├── pydantic-pandera.md
│   │   ├── reference-field.md
│   │   ├── search.md
│   │   ├── symbol-mapping.md
│   │   ├── test_notebooks.py
│   │   ├── track-run-inputs.md
│   │   ├── trash-archive.md
│   │   └── validate-fields.md
│   ├── faq.md
│   ├── guide.md
│   ├── index.md
│   ├── lightning.md
│   ├── manage-changes.md
│   ├── manage-ontologies.md
│   ├── organize.md
│   ├── pertdb.md
│   ├── query-search.md
│   ├── registries.md
│   ├── scripts/
│   │   ├── curate_anndata_flexible.py
│   │   ├── curate_anndata_uns.py
│   │   ├── curate_dataframe_attrs.py
│   │   ├── curate_dataframe_external_features.py
│   │   ├── curate_dataframe_flexible.py
│   │   ├── curate_dataframe_minimal_errors.py
│   │   ├── curate_dataframe_union_features.py
│   │   ├── curate_mudata.py
│   │   ├── curate_soma_experiment.py
│   │   ├── curate_spatialdata.py
│   │   ├── define_schema_anndata_uns.py
│   │   ├── define_schema_df_metadata.py
│   │   ├── define_schema_spatialdata.py
│   │   ├── my_workflow.py
│   │   ├── my_workflow_with_click.py
│   │   ├── my_workflow_with_step.py
│   │   ├── run_script_with_step.py
│   │   ├── run_track_and_finish.py
│   │   ├── run_track_with_features_and_params.py
│   │   ├── run_track_with_params.py
│   │   └── synced_with_git.py
│   ├── storage/
│   │   ├── add-replace-cache.ipynb
│   │   ├── anndata-accessor.ipynb
│   │   ├── prepare-sync-local-to-cloud.ipynb
│   │   ├── sync-local-to-cloud.ipynb
│   │   ├── test-files/
│   │   │   └── iris.data
│   │   ├── test_notebooks.py
│   │   ├── upload.ipynb
│   │   └── vitessce.ipynb
│   ├── storage.md
│   ├── sync.md
│   ├── test_notebooks.py
│   └── track.md
├── lamindb/
│   ├── __init__.py
│   ├── _finish.py
│   ├── _secret_redaction.py
│   ├── _view.py
│   ├── base/
│   │   ├── __init__.py
│   │   ├── dtypes.py
│   │   ├── fields.py
│   │   ├── ids.py
│   │   ├── types.py
│   │   ├── uids.py
│   │   ├── users.py
│   │   └── utils.py
│   ├── core/
│   │   ├── __init__.py
│   │   ├── _compat.py
│   │   ├── _context.py
│   │   ├── _functions.py
│   │   ├── _mapped_collection.py
│   │   ├── _settings.py
│   │   ├── _sync_git.py
│   │   ├── _track_environment.py
│   │   ├── exceptions.py
│   │   ├── loaders.py
│   │   ├── storage/
│   │   │   ├── __init__.py
│   │   │   ├── _anndata_accessor.py
│   │   │   ├── _backed_access.py
│   │   │   ├── _polars_lazy_df.py
│   │   │   ├── _pyarrow_dataset.py
│   │   │   ├── _spatialdata_accessor.py
│   │   │   ├── _tiledbsoma.py
│   │   │   ├── _valid_suffixes.py
│   │   │   ├── _zarr.py
│   │   │   ├── objects.py
│   │   │   ├── paths.py
│   │   │   └── types.py
│   │   └── subsettings/
│   │       ├── __init__.py
│   │       ├── _annotation_settings.py
│   │       └── _creation_settings.py
│   ├── curators/
│   │   ├── __init__.py
│   │   └── core.py
│   ├── errors.py
│   ├── examples/
│   │   ├── __init__.py
│   │   ├── cellxgene/
│   │   │   ├── __init__.py
│   │   │   └── _cellxgene.py
│   │   ├── croissant/
│   │   │   ├── __init__.py
│   │   │   └── mini_immuno.anndata.zarr_metadata.json
│   │   ├── datasets/
│   │   │   ├── __init__.py
│   │   │   ├── _core.py
│   │   │   ├── _fake.py
│   │   │   ├── _small.py
│   │   │   ├── define_mini_immuno_features_labels.py
│   │   │   ├── define_mini_immuno_schema_flexible.py
│   │   │   ├── mini_immuno.py
│   │   │   └── save_mini_immuno_datasets.py
│   │   ├── fixtures/
│   │   │   ├── __init__.py
│   │   │   └── sheets.py
│   │   ├── mlflow/
│   │   │   └── __init__.py
│   │   ├── schemas/
│   │   │   ├── __init__.py
│   │   │   ├── _anndata.py
│   │   │   ├── _simple.py
│   │   │   ├── define_schema_anndata_ensembl_gene_ids_and_valid_features_in_obs.py
│   │   │   └── define_valid_features.py
│   │   └── wandb/
│   │       └── __init__.py
│   ├── integrations/
│   │   ├── __init__.py
│   │   ├── _croissant.py
│   │   ├── _vitessce.py
│   │   └── lightning.py
│   ├── migrations/
│   │   ├── 0177_squashed.py
│   │   ├── 0178_v2_2.py
│   │   ├── 0179_v2_2_part_2.py
│   │   ├── 0180_v2_2_part_3.py
│   │   ├── 0181_v2_2_part_4.py
│   │   ├── 0182_v2_2_part_5.py
│   │   ├── 0183_v2_2_part_6.py
│   │   ├── 0184_alter_transformrecord_feature.py
│   │   ├── 0185_alter_runrecord_feature.py
│   │   ├── 0186_v2_4.py
│   │   ├── 0187_squashed.py
│   │   ├── 0187_v2_4_part_2.py
│   │   ├── README.md
│   │   └── __init__.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── _describe.py
│   │   ├── _django.py
│   │   ├── _feature_manager.py
│   │   ├── _from_values.py
│   │   ├── _is_versioned.py
│   │   ├── _label_manager.py
│   │   ├── _relations.py
│   │   ├── _run_cleanup.py
│   │   ├── artifact.py
│   │   ├── artifact_set.py
│   │   ├── block.py
│   │   ├── can_curate.py
│   │   ├── collection.py
│   │   ├── feature.py
│   │   ├── has_parents.py
│   │   ├── project.py
│   │   ├── query_manager.py
│   │   ├── query_set.py
│   │   ├── record.py
│   │   ├── run.py
│   │   ├── save.py
│   │   ├── schema.py
│   │   ├── sqlrecord.py
│   │   ├── storage.py
│   │   ├── transform.py
│   │   └── ulabel.py
│   ├── py.typed
│   └── setup/
│       ├── __init__.py
│       ├── _merge.py
│       ├── _switch.py
│       ├── core/
│       │   └── __init__.py
│       ├── errors/
│       │   └── __init__.py
│       └── types/
│           └── __init__.py
├── lamindb_full.py
├── noxfile.py
├── pyproject.full.toml
├── pyproject.toml
├── scripts/
│   └── migrate_test_instances.py
└── tests/
    ├── core/
    │   ├── _dataset_fixtures.py
    │   ├── conftest.py
    │   ├── notebooks/
    │   │   ├── basic-r-notebook.Rmd.cleaned.html
    │   │   ├── basic-r-notebook.Rmd.html
    │   │   ├── duplicate/
    │   │   │   └── with-title-initialized-consecutive-finish.ipynb
    │   │   ├── load_schema.ipynb
    │   │   ├── no-title.ipynb
    │   │   ├── with-title-initialized-consecutive-finish-not-last-cell.ipynb
    │   │   └── with-title-initialized-consecutive-finish.ipynb
    │   ├── scripts/
    │   │   ├── duplicate1/
    │   │   │   └── script-to-test-versioning.py
    │   │   ├── duplicate2/
    │   │   │   └── script-to-test-versioning.py
    │   │   ├── duplicate3/
    │   │   │   └── script-to-test-versioning.py
    │   │   ├── duplicate4/
    │   │   │   └── script-to-test-versioning.py
    │   │   ├── duplicate5/
    │   │   │   └── script-to-test-versioning.py
    │   │   ├── script-to-test-filename-change.py
    │   │   └── script-to-test-versioning.py
    │   ├── test_artifact_anndata_with_curation.py
    │   ├── test_artifact_basics.py
    │   ├── test_artifact_dataframe_with_curation.py
    │   ├── test_artifact_describe_to_dataframe.py
    │   ├── test_artifact_features_annotations.py
    │   ├── test_artifact_parquet.py
    │   ├── test_blocks.py
    │   ├── test_branches.py
    │   ├── test_can_curate.py
    │   ├── test_collection.py
    │   ├── test_curator_basics.py
    │   ├── test_data_migrations.py
    │   ├── test_db.py
    │   ├── test_delete.py
    │   ├── test_feature.py
    │   ├── test_feature_dtype.py
    │   ├── test_from_values.py
    │   ├── test_has_parents.py
    │   ├── test_has_type.py
    │   ├── test_integrity.py
    │   ├── test_is_versioned.py
    │   ├── test_label_manager.py
    │   ├── test_load.py
    │   ├── test_manager.py
    │   ├── test_merge.py
    │   ├── test_nbconvert.py
    │   ├── test_notebooks.py
    │   ├── test_querydb.py
    │   ├── test_queryset.py
    │   ├── test_record_basics.py
    │   ├── test_record_sheet_examples.py
    │   ├── test_rename_features_labels.py
    │   ├── test_run.py
    │   ├── test_save.py
    │   ├── test_schema.py
    │   ├── test_search.py
    │   ├── test_settings.py
    │   ├── test_sqlrecord.py
    │   ├── test_storage.py
    │   ├── test_switch.py
    │   ├── test_track_flow.py
    │   ├── test_track_script_or_notebook.py
    │   ├── test_track_step.py
    │   ├── test_transform.py
    │   ├── test_transform_from_git.py
    │   └── test_view.py
    ├── curators/
    │   ├── conftest.py
    │   ├── test_cellxgene_curation.py
    │   ├── test_curate_from_croissant.py
    │   ├── test_curators_examples.py
    │   ├── test_curators_remote.py
    │   └── test_dataframe_curation.py
    ├── integrations/
    │   ├── conftest.py
    │   └── test_lightning.py
    ├── no_instance/
    │   ├── conftest.py
    │   ├── test_connect_dynamic_import.py
    │   ├── test_import_side_effects.py
    │   └── test_no_default_instance.py
    ├── permissions/
    │   ├── conftest.py
    │   ├── jwt_utils.py
    │   ├── scripts/
    │   │   ├── check_lamin_dev.py
    │   │   ├── setup_access.py
    │   │   └── setup_instance.py
    │   └── test_rls_dbwritelog.py
    ├── profiling/
    │   ├── import_lamindb.py
    │   ├── import_lamindb_and_connect.py
    │   ├── import_lamindb_core_storage.py
    │   └── import_records_from_dataframe.py
    ├── storage/
    │   ├── conftest.py
    │   ├── test_artifact_storage.py
    │   ├── test_artifact_zarr.py
    │   ├── test_cache.py
    │   ├── test_connect_reconnect.py
    │   ├── test_storage_lifecycle.py
    │   ├── test_streaming.py
    │   └── test_transfer.py
    └── tiledbsoma/
        ├── conftest.py
        ├── test_artifact_basics.py
        ├── test_curators.py
        └── test_storage.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.yml
================================================
name: Report a bug
description: Report a bug.
labels:
  - ":bug: bug"
body:
  - type: textarea
    id: report
    attributes:
      label: Add a description
      placeholder: |
        Describe the bug and consider providing version information. Please ensure you're on the latest version of lamindb.
        This is a public repository!
        Do not reveal any internal information.
    validations:
      required: true


================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: true
contact_links:
    - name: LaminHub issues
      url: https://github.com/laminlabs/laminhub-public
      about: If you have issues with the GUI/web app at lamin.ai, please report them here.
    - name: Enterprise support
      url: https://lamin.ai/contact
      about: If you have other questions, contact us directly.


================================================
FILE: .github/ISSUE_TEMPLATE/enhancement.yml
================================================
name: Propose an enhancement
description: Propose an enhancement.
body:
  - type: textarea
    id: description
    attributes:
      label: Add a description
      placeholder: |
        This is a public repository!
        Do not reveal any internal information.
    validations:
      required: true


================================================
FILE: .github/ISSUE_TEMPLATE/usage_question.yml
================================================
name: Ask a usage question
description: Ask a usage question.
labels:
  - "usage question"
body:
  - type: textarea
    id: description
    attributes:
      label: Add a description
      placeholder: |
        This is a public repository!
        Do not reveal any internal information.
    validations:
      required: true


================================================
FILE: .github/workflows/build.yml
================================================
name: build

on:
  push:
    branches: [release]
  pull_request:

jobs:
  pre-filter:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0

      - uses: dorny/paths-filter@v3
        id: changes
        if: github.event_name != 'push'
        with:
          filters: |
            curator:
              - 'lamindb/curators/**'
              - 'lamindb/examples/cellxgene/**'
              - 'tests/curators/**'
            integrations:
              - 'lamindb/integrations/**'
              - 'tests/integrations/**'

      - id: set-matrix
        shell: bash
        run: |
          BASE_GROUPS=$(jq -n -c '["unit-core-sqlite", "unit-core-postgres", "unit-storage", "tutorial", "guide", "tiledbsoma", "biology", "faq", "storage", "cli", "permissions", "no-instance"]')
          ADDITIONAL_GROUPS=[]

          if [[ "${{ github.event_name }}" == "push" || "${{ github.event_name }}" == "repository_dispatch" ]]; then
            # Run everything on push and dispatch
            ADDITIONAL_GROUPS=$(jq -n -c '["curator", "integrations"]')
          else
            # Otherwise check which paths changed
            if [[ "${{ steps.changes.outputs.curator }}" == "true" ]]; then
              ADDITIONAL_GROUPS=$(jq -n -c --argjson groups "$ADDITIONAL_GROUPS" '$groups + ["curator"]')
            fi
            if [[ "${{ steps.changes.outputs.integrations }}" == "true" ]]; then
              ADDITIONAL_GROUPS=$(jq -n -c --argjson groups "$ADDITIONAL_GROUPS" '$groups + ["integrations"]')
            fi
          fi

          # Combine base groups with any additional groups
          MATRIX=$(jq -n -c --argjson base "$BASE_GROUPS" --argjson additional "$ADDITIONAL_GROUPS" '{group: ($base + $additional)}')

          # Output as single line for GitHub Actions
          echo "matrix=$(echo "$MATRIX" | jq -c .)" >> $GITHUB_OUTPUT

          # Pretty print for debugging
          echo "Generated matrix:"
          echo "$MATRIX" | jq .

  test:
    needs: pre-filter
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix: ${{fromJson(needs.pre-filter.outputs.matrix)}}
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v6
        with:
          submodules: recursive
          fetch-depth: 0

      - uses: actions/checkout@v6
        if: ${{ matrix.group == 'permissions' }}
        with:
          repository: laminlabs/laminhub
          token: ${{ secrets.GH_TOKEN_DEPLOY_LAMINAPP }}
          path: laminhub
          ref: main

      - uses: actions/setup-python@v6
        with:
          python-version: |
            ${{ matrix.group == 'tiledbsoma' && '3.13' ||
                matrix.group == 'permissions' && '3.14' ||
                github.ref == 'refs/heads/release' && '3.11' ||
                '3.14'
                }}

      - name: cache pre-commit
        uses: actions/cache@v4
        with:
          path: ~/.cache/pre-commit
          key: pre-commit-${{ runner.os }}-${{ hashFiles('.pre-commit-config.yaml') }}

      - name: cache postgres
        if: ${{ matrix.group == 'faq' || matrix.group == 'unit-core-postgres' || matrix.group == 'unit-storage' || matrix.group == 'permissions'}}
        id: cache-postgres
        uses: actions/cache@v4
        with:
          path: ~/postgres.tar
          key: cache-postgres-0
          restore-keys: |
            cache-postgres-
      - name: cache postgres miss
        if: ${{ (matrix.group == 'faq' || matrix.group == 'unit-core-postgres' || matrix.group == 'unit-storage' || matrix.group == 'permissions') && steps.cache-postgres.outputs.cache-hit != 'true' }}
        run: docker pull postgres:latest && docker image save postgres:latest --output ~/postgres.tar
      - name: cache postgres use
        if: ${{ (matrix.group == 'faq' || matrix.group == 'unit-core-postgres' || matrix.group == 'unit-storage' || matrix.group == 'permissions') && steps.cache-postgres.outputs.cache-hit == 'true' }}
        run: docker image load --input ~/postgres.tar

      - run: pip install "laminci@git+https://github.com/laminlabs/laminci"

      - run: nox -s configure_coverage -- '${{needs.pre-filter.outputs.matrix}}'

      - name: install postgres
        if: ${{ matrix.group == 'faq' }}
        run: sudo apt-get install libpq-dev

      - name: install graphviz
        if: ${{ matrix.group == 'tutorial' || matrix.group == 'guide' || matrix.group == 'biology' || matrix.group == 'faq'}}
        run: sudo apt-get -y install graphviz

      # - run: nox -s lint
      #   if: ${{ matrix.group == 'tutorial' }}

      - run: nox -s "install_ci(group='${{ matrix.group }}')"

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - run: nox -s prepare
        if: ${{ !startsWith(matrix.group, 'unit-') && !startsWith(matrix.group, 'permissions') }}
      - run: nox -s "test(group='${{ matrix.group }}')"

      - name: upload coverage
        uses: actions/upload-artifact@v4
        with:
          name: coverage--${{ matrix.group }}
          path: .coverage
          include-hidden-files: true

      - name: upload docs
        if: ${{ matrix.group == 'tutorial' || matrix.group == 'guide' || matrix.group == 'tiledbsoma' || matrix.group == 'biology' || matrix.group == 'faq' || matrix.group == 'storage' }}
        uses: actions/upload-artifact@v4
        with:
          name: docs-${{ matrix.group }}
          path: ./docs/${{ matrix.group }}

  profile:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      LAMIN_API_KEY: ${{ secrets.LAMIN_API_KEY_TESTUSER1 }}
    steps:
      - uses: actions/checkout@v6
        with:
          submodules: recursive
          fetch-depth: 0
      - uses: actions/setup-python@v6
        with:
          python-version: |
            ${{ github.ref == 'refs/heads/release' && '3.11' ||
                '3.14'
                }}
      - run: pip install git+https://github.com/laminlabs/laminci
      - run: nox -s "install_ci(group='unit-core-sqlite')"
      - run: uv pip install --system git+https://github.com/laminlabs/laminprofiler
      - run: lamin login
      - run: laminprofiler check tests/profiling/import_lamindb_and_connect.py --threshold 3.5
      - run: lamin connect laminlabs/lamindata
      - run: laminprofiler check tests/profiling/import_lamindb.py --threshold 1.5
      - run: laminprofiler check tests/profiling/import_lamindb_core_storage.py --threshold 1.5

  docs:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          submodules: recursive
          fetch-depth: 0

      - name: checkout lndocs
        uses: actions/checkout@v6
        with:
          repository: laminlabs/lndocs
          ssh-key: ${{ secrets.READ_LNDOCS }}
          path: lndocs
          ref: main

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - uses: actions/setup-python@v6
        with:
          python-version: "3.12"
      - run: pip install "laminci@git+https://x-access-token:${{ secrets.LAMIN_BUILD_DOCS }}@github.com/laminlabs/laminci"
      - run: nox -s "install_ci(group='docs')"
      - uses: actions/download-artifact@v4
      - run: nox -s clidocs
      - run: nox -s prepare
      - run: nox -s docs
      - run: rm -r ./_build/html/.doctrees # do not want to deploy with cloudflare
      - uses: cloudflare/wrangler-action@v3
        id: cloudflare
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          accountId: 472bdad691b4483dea759eadb37110bd
          command: pages deploy "_build/html" --project-name=lamindb
          gitHubToken: ${{ secrets.GITHUB_TOKEN }}
      - uses: edumserrano/find-create-or-update-comment@v2
        if: github.event_name == 'pull_request'
        with:
          issue-number: ${{ github.event.pull_request.number }}
          body-includes: "Deployment URL"
          comment-author: "github-actions[bot]"
          body: |
            Deployment URL: ${{ steps.cloudflare.outputs.deployment-url }}
          edit-mode: replace

      - uses: peter-evans/repository-dispatch@v2
        if: ${{ github.event_name == 'push' }}
        with:
          token: ${{ secrets.LAMIN_BUILD_DOCS }}
          repository: "laminlabs/lamin-docs"
          event-type: build

  coverage:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-python@v6
        with:
          python-version: "3.14"
      - run: |
          python -m pip install -U uv
          uv pip install --system coverage[toml]
          uv pip install --system --no-deps .

      - uses: actions/download-artifact@v4
      - name: run coverage
        run: |
          coverage combine coverage--*/.coverage*
          coverage report --fail-under=0
          coverage xml
      - uses: codecov/codecov-action@v2
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

  dispatch:
    if: ${{ github.event_name == 'push' }}
    runs-on: ubuntu-latest
    steps:
      - uses: peter-evans/repository-dispatch@v2
        with:
          token: ${{ secrets.LAMIN_BUILD_DOCS }}
          repository: "laminlabs/lamindb-dispatch"
          event-type: build


================================================
FILE: .github/workflows/doc-changes.yml
================================================
name: doc-changes

on:
  pull_request_target:
    branches:
      - main
      - release
    types:
      - closed

jobs:
  doc-changes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install "laminci[doc-changes]@git+https://x-access-token:${{ secrets.LAMIN_BUILD_DOCS }}@github.com/laminlabs/laminci"
      - run: laminci doc-changes
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          docs_token: ${{ secrets.LAMIN_BUILD_DOCS }}
          changelog_file: lamin-docs/docs/changelog/soon/lamindb.md


================================================
FILE: .gitignore
================================================
__MACOSX/

# LaminDB
README_stripped.md
docs/scripts/test_artifact_parquet.py
README.ipynb
docs/sample.fasta
docs/faq/sample.fasta
docs/faq/test-acid/
docs/scripts/define_mini_immuno_features_labels.py
docs/scripts/define_mini_immuno_schema_flexible.py
docs/scripts/define_schema_anndata_ensembl_gene_ids_and_valid_features_in_obs.py
docs/scripts/define_valid_features.py
docs/scripts/save_mini_immuno_datasets.py
profile_output*
docs/cli.md
.coveragerc
*.db
*.lndb
*.jpg
*.zarr/
docsbuild/
docs/lamin.md
docs/guide/data-validation.ipynb
docs/guide/bionty.ipynb
docs/guide/lnschema-core.ipynb
docs/paradisi05_laminopathic_nuclei.jpg
bionty_docs/
lamindb_docs/
_build
mydata/
lamin-intro/
lamin-tutorial/
mytest/
rds/
mydb/
docs/test-registries/
docs/test-annotate-flexible/
docs/lamindb.*
lamin_sphinx
docs/conf.py
lamindb/setup/.env
_secrets.py
_configuration.py
lamin.db
docs/generated/*
_docs_tmp*
docs/guide/Laminopathic_nuclei.jpg
docs/guide/paradisi05_laminopathic_nuclei.jpg
nocodb
docs/guide/SRR4238351_subsamp.fastq.gz
docs/faq/paradisi05_laminopathic_nuclei.jpg
docs/faq/tostore/
docs/faq/mydata_postgres/
docs/guide/myobjects/
docs/faq/test-run-inputs/
docs/intro/paradisi05_laminopathic_nuclei.jpg
docs/guide/figures/
docs/test-annotate/
docs/test-track/
suo22/
docs/biology/test-flow/
docs/biology/test-scrna/
docs/biology/test-registries/
docs/biology/test-multimodal/
default_storage
default_storage_unit_core
default_storage_unit_storage
test.ipynb
test2.ipynb
run-tests
test-django-validation/
curate.tiledbsoma
small_dataset.tiledbsoma
nonregistered_storage
registered_storage
tests/core/notebooks/no-uid-renamed.ipynb

# General
.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# ruff
.ruff_cache

# Pyre type checker
.pyre/

# data files
data/
_build
*.csv
*.fcs
*.zip
*.feather
*.h5ad
*.h5mu
*.parquet
*.bam
*.fastq.gz
*.pt

# Pycharm
.idea

# VSCode
.vscode

# CELLxGENE
!lamindb/examples/cellxgene/cellxgene_schema_versions.csv

# ml
lightning_logs
mlruns
download_mnist
checkpoints
test_lightning


================================================
FILE: .gitmodules
================================================
[submodule "sub/lamindb-setup"]
	path = sub/lamindb-setup
	url = https://github.com/laminlabs/lamindb-setup
[submodule "sub/lamin-cli"]
	path = sub/lamin-cli
	url = https://github.com/laminlabs/lamin-cli
[submodule "sub/bionty"]
	path = sub/bionty
	url = https://github.com/laminlabs/bionty
[submodule "sub/pertdb"]
	path = sub/pertdb
	url = https://github.com/laminlabs/pertdb
[submodule "sub/cellxgene-lamin"]
	path = sub/cellxgene-lamin
	url = https://github.com/laminlabs/cellxgene-lamin.git


================================================
FILE: .pre-commit-config.yaml
================================================
fail_fast: false
default_language_version:
  python: python3
default_stages:
  - pre-commit
  - pre-push
minimum_pre_commit_version: 2.16.0
repos:
  - repo: https://github.com/rbubley/mirrors-prettier
    rev: v3.5.1
    hooks:
      - id: prettier
        exclude: |
          (?x)(
            docs/changelog.md|.github/ISSUE_TEMPLATE/config.yml|tests/core/notebooks/basic-r-notebook.Rmd.cleaned.html|README.md
          )
  - repo: https://github.com/kynan/nbstripout
    rev: 0.8.1
    hooks:
      - id: nbstripout
        exclude: |
          (?x)(
              docs/examples/|
              docs/notes/
          )
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.9.10
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix, --unsafe-fixes]
      - id: ruff-format
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: detect-private-key
      - id: check-ast
      - id: end-of-file-fixer
        exclude: |
          (?x)(
              .github/workflows/latest-changes.jinja2
            )
      - id: mixed-line-ending
        args: [--fix=lf]
      - id: trailing-whitespace
        exclude: |
          (?x)(
              tests/core/notebooks/basic-r-notebook.Rmd.cleaned.html
            )
      - id: check-case-conflict
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.14.1
    hooks:
      - id: mypy
        args:
          [
            --no-strict-optional,
            --ignore-missing-imports,
            --disable-error-code=annotation-unchecked,
            --disable-error-code=type-arg,
            --namespace-packages,
            --explicit-package-bases,
          ]
        additional_dependencies:
          ["types-requests", "types-attrs", "types-PyYAML"]
        exclude: |
          (?x)(
              test_notebooks.py|
              script-to-test-versioning.py|
              tests/storage/conftest.py|
              tests/curators/conftest.py|
              tests/permissions/conftest.py|
              tests/writelog/conftest.py|
              tests/writelog_sqlite/conftest.py|
              tests/curators/test_curators_examples.py|
              tests/core/conftest.py|
              docs/scripts/
          )


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing

Contributions are generally welcome. Please make an issue to discuss proposals.

## Installation

### PyPI

For installation from PyPI, see [docs.lamin.ai/setup](https://docs.lamin.ai/setup).

### GitHub

For installation from GitHub, call:

```bash
git clone --recursive https://github.com/laminlabs/lamindb
cd lamindb
pip install laminci
python -m venv .venv
source .venv/bin/activate
nox -s install
```

This will install a few dependencies from the git submodules linked [here](https://github.com/laminlabs/lamindb/tree/main/sub), as well as packages
like `pytest` and `pre-commit` that you'll need when developing.

lamindb depends on several other packages that may require modifications for pull requests to successfully pass the continuous integration build.
We suggest the following workflow if commits to any of the submodules are essential for the current modifications in lamindb:

1. Change directory into the submodule that you want to modify: `cd sub/SUBMODULE`.
2. Switch to a new feature branch: `git switch -c feature/NEWFEATURE`.
3. Make a pull request with your changes to the `SUBMODULE` and ensure that the CI passes.
4. In the repository root of lamindb, create a new commit and push:

```bash
cd ../..  # back to the lamindb repository root from sub/SUBMODULE
git add -u
git commit -m "Upgraded SUBMODULE"
git push
```

Your pull request to lamindb now includes the submodule changes, so you can test that the changes in the submodule and in lamindb are compatible.

## Running and writing tests

This package uses [pytest][] for automated testing.
Please add a test for every function added to the package.
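
As an illustration of the one-test-per-function convention, here is a minimal sketch. The helper `normalize_key` is hypothetical and not part of lamindb; it only shows the shape of a function paired with its pytest test.

```python
# Hypothetical helper, used only to illustrate the testing convention.
def normalize_key(key: str) -> str:
    """Strip surrounding whitespace and leading slashes from a storage key."""
    return key.strip().lstrip("/")


# pytest collects functions whose names start with `test_`.
def test_normalize_key():
    assert normalize_key("  /data/file.parquet ") == "data/file.parquet"
    assert normalize_key("file.csv") == "file.csv"
```

Placing the test in a file named `test_*.py` under `tests/` lets pytest discover it automatically.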

Running tests requires a running [Docker daemon][]. Once it is up, run the following at the root of the repository:

```bash
pytest --ignore=tests/storage --ignore=tests/permissions
```

We exclude specific directories in local `pytest` runs because they directly access external resources such as AWS, which require specific access keys.
Continuous integration will automatically run **all** tests on pull requests.
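
For orientation, the test suite is split into groups by the `pre-filter` job in `.github/workflows/build.yml`. The sketch below restates that selection logic in Python, assuming the group names listed in that workflow. The function `select_groups` and its parameters are illustrative names, not part of the repository.

```python
# Groups that the workflow always runs.
BASE_GROUPS = [
    "unit-core-sqlite", "unit-core-postgres", "unit-storage", "tutorial",
    "guide", "tiledbsoma", "biology", "faq", "storage", "cli",
    "permissions", "no-instance",
]
# Groups that run on pushes, or when matching paths changed in a pull request.
OPTIONAL_GROUPS = ["curator", "integrations"]


def select_groups(event_name: str, changed: tuple[str, ...] = ()) -> list[str]:
    """Return the test groups to run for a given GitHub event (illustrative)."""
    if event_name in {"push", "repository_dispatch"}:
        extra = list(OPTIONAL_GROUPS)
    else:
        # Only add optional groups whose path filters reported a change.
        extra = [group for group in OPTIONAL_GROUPS if group in changed]
    return BASE_GROUPS + extra


if __name__ == "__main__":
    print(select_groups("pull_request", changed=("curator",)))
```

Under these assumptions, a pull request that touches `lamindb/curators/` would additionally run the `curator` group, while a push to `release` would run every group.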

## Code-style

This project uses [pre-commit][] to enforce a consistent code style. On every commit, the pre-commit checks either
fix issues in the code automatically or raise an error message.

To enable pre-commit locally, simply run

```bash
pre-commit install
```

in the root of the repository. Pre-commit will automatically download all dependencies when it is run for the first time.

We further use [gitmoji][] to add emojis to commit messages.
These make commits easier to categorize and faster to filter visually.

It can be installed by running:

```bash
npm i -g gitmoji-cli
```

and enabled for the repository via:

```bash
gitmoji -i
```

If you don't have `sudo` in your working environment, follow [these instructions](https://github.com/sindresorhus/guides/blob/main/npm-global-without-sudo.md).

## Documentation

We build our documentation with an internal tool called `lndocs`.
We have not made it public yet, so external contributors need to rely on the GitHub Actions `docs` job to build the documentation.
If the `docs` job succeeds, a preview URL will be posted automatically as a comment to your pull request.

## Releases

Currently, only Lamin employees have release rights.

Release publishing is managed via `laminci release --pypi`. For `lamindb`, the
release flow now publishes two distributions in sequence:

- `lamindb-core` (contains the `lamindb/` namespace package)
- `lamindb` (meta-package that depends on `lamindb-core`)

Before the first production publish of a version, run a TestPyPI dry run by
building both wheels from `pyproject.toml` and `pyproject.full.toml`, then
uploading with `twine` to TestPyPI for verification.

[Docker daemon]: https://docs.docker.com/engine/install/
[gitmoji]: https://gitmoji.dev/
[pre-commit]: https://pre-commit.com/
[pytest]: https://docs.pytest.org/


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
[![docs](https://img.shields.io/badge/docs-yellow)](https://docs.lamin.ai) [![llms.txt](https://img.shields.io/badge/llms.txt-orange)](https://docs.lamin.ai/llms.txt) [![codecov](https://codecov.io/gh/laminlabs/lamindb/branch/main/graph/badge.svg?token=VKMRJ7OWR3)](https://codecov.io/gh/laminlabs/lamindb) [![pypi](https://img.shields.io/pypi/v/lamindb?color=blue&label=PyPI)](https://pypi.org/project/lamindb) [![cran](https://www.r-pkg.org/badges/version/laminr?color=green)](https://cran.r-project.org/package=laminr) [![stars](https://img.shields.io/github/stars/laminlabs/lamindb?style=flat&logo=GitHub&label=&color=gray)](https://github.com/laminlabs/lamindb) [![downloads](https://static.pepy.tech/personalized-badge/lamindb?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=GRAY&left_text=%E2%AC%87%EF%B8%8F)](https://pepy.tech/project/lamindb)

# LaminDB - Open-source data framework for biology

LaminDB allows you to query, trace, and validate datasets and models at scale.
You get context & memory through a lineage-native lakehouse that supports bio-formats, registries & ontologies while feeling as simple as a file system.

Agent? [llms.txt](https://docs.lamin.ai/llms.txt)

<details>
<summary>Why?</summary>

(1) Reproducing, tracing & understanding how datasets, models & results are created is critical to quality R&D.
Without context, humans & agents make mistakes and cannot close feedback loops across data generation & analysis.
Without memory, compute & intelligence are wasted on fragmented, non-compounding tasks — LLM context windows are small.

(2) Training & fine-tuning models with thousands of datasets — across LIMS, ELNs, orthogonal assays — is now a primary path to scaling R&D.
But without queryable & validated data or with data locked in organizational & infrastructure silos, it leads to garbage in, garbage out or is quite simply impossible.

Imagine building software without git or pull requests: an agent's actions would be impossible to verify.
While code has git and tables have dbt/warehouses, biological data has lacked a framework for managing its unique complexity.

LaminDB fills the gap.
It is a lineage-native lakehouse that understands bio-registries and formats (`AnnData`, `.zarr`, …) based on the established open data stack:
Postgres/SQLite for metadata and cross-platform storage for datasets.
By offering queries, tracing & validation in a single API, LaminDB provides the context & memory to turn messy, agentic biological R&D into a scalable process.

</details>

<img width="800px" src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/BunYmHkyFLITlM5M000D.svg">

How?

- **lineage** → track inputs & outputs of notebooks, scripts, functions & pipelines with a single line of code
- **lakehouse** → manage, monitor & validate schemas for standard and bio formats; query across many datasets
- **FAIR datasets** → validate & annotate `DataFrame`, `AnnData`, `SpatialData`, `parquet`, `zarr`, …
- **LIMS & ELN** → programmatic experimental design with bio-registries, ontologies & markdown notes
- **unified access** → storage locations (local, S3, GCP, …), SQL databases (Postgres, SQLite) & ontologies
- **reproducible** → auto-track source code & compute environments with data & code versioning
- **change management** → branching & merging similar to git, plan management for agents
- **zero lock-in** → runs anywhere on open standards (Postgres, SQLite, `parquet`, `zarr`, etc.)
- **scalable** → you hit storage & database directly through your `pydata` or R stack, no REST API involved
- **simple** → just `pip install` from PyPI or `install.packages('laminr')` from CRAN
- **distributed** → zero-copy & lineage-aware data sharing across infrastructure (databases & storage locations)
- **integrations** → [git](https://docs.lamin.ai/track#sync-code-with-git), [nextflow](https://docs.lamin.ai/nextflow), [vitessce](https://docs.lamin.ai/vitessce), [redun](https://docs.lamin.ai/redun), and [more](https://docs.lamin.ai/integrations)
- **extensible** → create custom plug-ins based on the Django ORM, the basis for LaminDB's registries

GUI, permissions, audit logs? [LaminHub](https://lamin.ai) is a collaboration hub built on LaminDB similar to how GitHub is built on git.

<details>
<summary>Who?</summary>

Scientists and engineers at leading research institutions and biotech companies, including:

- **Industry** → Pfizer, Altos Labs, Ensocell Therapeutics, ...
- **Academia & Research** → scverse, DZNE (National Research Center for Neuro-Degenerative Diseases), Helmholtz Munich (National Research Center for Environmental Health), ...
- **Research Hospitals** → Global Immunological Swarm Learning Network: Harvard, MIT, Stanford, ETH Zürich, Charité, U Bonn, Mount Sinai, ...

From personal research projects to pharma-scale deployments managing petabytes of data across:

entities | OOMs
--- | ---
observations & datasets | 10¹² & 10⁶
runs & transforms| 10⁹ & 10⁵
proteins & genes | 10⁹ & 10⁶
biosamples & species | 10⁵ & 10²
... | ...

</details>

## Quickstart

To install the Python package with recommended dependencies, use:

```shell
pip install lamindb
```

<details>
<summary>Install with minimal dependencies.</summary>

The `lamindb` package adds data-science-related dependencies, namely those that come with the `[full]` extra; see [here](https://github.com/laminlabs/lamindb/blob/2cc91adcf6077c5af69c1a098699085bb0844083/pyproject.toml#L30-L49).

If you want a maximally lightweight install of the `lamindb` namespace, use:

```shell
pip install lamindb-core
```

This suffices to support the basic functionality but you will get an `ImportError` if you're e.g. trying to validate a `DataFrame` because that requires `pandera`.
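
If you are unsure whether an optional dependency is present in a lightweight environment, one way to check before calling into validation is the standard library's `importlib`. This is a sketch, not part of the lamindb API; the helper name `has_module` is hypothetical.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it (hypothetical helper)."""
    return importlib.util.find_spec(name) is not None


# `pandera` is the optional dependency mentioned above
if not has_module("pandera"):
    print("pandera is not installed; DataFrame validation would raise an ImportError")
```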

</details>

### Query databases & load artifacts

You can browse public databases at [lamin.ai/explore](https://lamin.ai/explore). To query [laminlabs/cellxgene](https://lamin.ai/laminlabs/cellxgene), run:

```python
import lamindb as ln

db = ln.DB("laminlabs/cellxgene")  # a database object for queries
df = db.Artifact.to_dataframe()    # a dataframe listing datasets & models
```

To get a [specific dataset](https://lamin.ai/laminlabs/cellxgene/artifact/BnMwC3KZz0BuKftR), run:

```python
artifact = db.Artifact.get("BnMwC3KZz0BuKftR")  # a metadata object for a dataset
artifact.describe()                             # describe the context of the dataset
```

<details>
<summary>See the output.</summary>
<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/mxlUQiRLMU4Zos6k0001.png" width="550">
</details>

Access the content of the dataset via:

```python
local_path = artifact.cache()  # return a local path from a cache
adata = artifact.load()        # load object into memory
accessor = artifact.open()     # return a streaming accessor
```

You can query by biological entities like `Disease` through plug-in `bionty`:

```python
alzheimers = db.bionty.Disease.get(name="Alzheimer disease")
df = db.Artifact.filter(diseases=alzheimers).to_dataframe()
```

### Configure your database

You can create a LaminDB instance at [lamin.ai](https://lamin.ai) and invite collaborators.
To connect to an existing instance, run:

```shell
# log into LaminHub
lamin login
# then either
lamin connect account/name  # connect globally in your environment
# or
lamin connect --here account/name  # connect in your current development directory
```

If you prefer to init a new instance instead (no login required), run:

```shell
lamin init --storage ./quickstart-data --modules bionty
```

For more configuration, read: [docs.lamin.ai/setup](https://docs.lamin.ai/setup).

On the terminal and in a Python session, LaminDB will now auto-connect.

### Save files & folders as artifacts

To save a file or folder via the API:

```python
import lamindb as ln
# → connected lamindb: account/instance

open("sample.fasta", "w").write(">seq1\nACGT\n")        # create dataset
ln.Artifact("sample.fasta", key="sample.fasta").save()  # save dataset
```
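
The one-liner above relies on the interpreter to close the file handle. As a sketch of the same file-creation step with an explicit context manager, here is a hypothetical helper `write_fasta` (not part of lamindb) that writes named sequences in FASTA format:

```python
from pathlib import Path


def write_fasta(path, records: dict) -> Path:
    """Write `{name: sequence}` pairs to a FASTA file and return its path (hypothetical helper)."""
    path = Path(path)
    with path.open("w") as handle:  # the handle is closed when the block exits
        for name, sequence in records.items():
            handle.write(f">{name}\n{sequence}\n")
    return path
```

Calling `write_fasta("sample.fasta", {"seq1": "ACGT"})` produces the same file content as the snippet above, and the returned path can be passed to `ln.Artifact(...)` in the same way.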

To save a file or folder via the CLI, run:

```shell
lamin save sample.fasta --key sample.fasta
```

To load an artifact via the CLI into a local cache, run:

```shell
lamin load --key sample.fasta
```

Read more about the CLI: [docs.lamin.ai/cli](https://docs.lamin.ai/cli).

### Lineage: scripts & notebooks

To create a dataset while tracking source code, inputs, outputs, logs, and environment:

```python
import lamindb as ln
# → connected lamindb: account/instance

ln.track()                                              # track code execution
open("sample.fasta", "w").write(">seq1\nACGT\n")        # create dataset
ln.Artifact("sample.fasta", key="sample.fasta").save()  # save dataset
ln.finish()                                             # mark run as finished
```

Running this snippet as a script (`python create-fasta.py`) produces the following data lineage:

```python
artifact = ln.Artifact.get(key="sample.fasta")  # get artifact by key
artifact.describe()      # context of the artifact
artifact.view_lineage()  # fine-grained lineage
```

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/BOTCBgHDAvwglN3U0004.png" width="550"> <img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/EkQATsQL5wqC95Wj0006.png" width="140">

Watch a mini video: [youtu.be/jwnHu1PbA9Q](https://youtu.be/jwnHu1PbA9Q)

<details>
<summary>Access run & transform.</summary>

```python
run = artifact.run              # get the run object
transform = artifact.transform  # get the transform object
run.describe()                  # context of the run
```

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/rJrHr3XaITVS4wVJ0000.png" width="550" />

```python
transform.describe()  # context of the transform
```

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/JYwmHBbgf2MRCfgL0000.png" width="550" />

</details>

<details>
<summary>Track a project or an agent plan.</summary>

Pass a project/artifact to `ln.track()`, for example:

```python
ln.track(project="My project", plan="./plans/curate-dataset-x.md")
```

Note that you have to create a project or save the agent plan in case they don't yet exist:

```shell
# create a project with the CLI
lamin create project "My project"

# save an agent plan with the CLI
lamin save /path/to/.cursor/plans/curate-dataset-x.plan.md
lamin save /path/to/.claude/plans/curate-dataset-x.md
```

Or in Python:

```python
ln.Project(name="My project").save()  # create a project in Python
```

</details>


### Lineage: functions & workflows

You can achieve the same traceability for functions & workflows:

<!-- #skip_laminr -->

```python
import lamindb as ln

@ln.flow()
def create_fasta(fasta_file: str = "sample.fasta"):
    open(fasta_file, "w").write(">seq1\nACGT\n")    # create dataset
    ln.Artifact(fasta_file, key=fasta_file).save()  # save dataset

if __name__ == "__main__":
    create_fasta()
```

<!-- #end_skip_laminr -->

Beyond what you get for scripts & notebooks, this automatically tracks function & CLI params and integrates well with established Python workflow managers: [docs.lamin.ai/track](https://docs.lamin.ai/track). To integrate advanced bioinformatics pipeline managers like Nextflow, see [docs.lamin.ai/pipelines](https://docs.lamin.ai/pipelines).

<details>
<summary>A richer example.</summary>

Here is an automatically generated re-construction of the project of [Schmidt _et al._ (Science, 2022)](https://pubmed.ncbi.nlm.nih.gov/35113687/):

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/KQmzmmLOeBN0C8Yk0004.png" width="850">

A phenotypic CRISPRa screening result is integrated with scRNA-seq data. Here is the result of the screen input:

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/JvLaK9Icj11eswQn0000.png" width="850">

You can explore it [here](https://lamin.ai/laminlabs/lamindata/artifact/W1AiST5wLrbNEyVq) on LaminHub or [here](https://github.com/laminlabs/schmidt22) on GitHub.

</details>

### Labeling & queries by fields

You can label an artifact by running:

```python
my_label = ln.ULabel(name="My label").save()   # a universal label
project = ln.Project(name="My project").save() # a project label
artifact.ulabels.add(my_label)
artifact.projects.add(project)
```

Query for it:

```python
ln.Artifact.filter(ulabels=my_label, projects=project).to_dataframe()
```

You can also query by the metadata that lamindb automatically collects:

```python
ln.Artifact.filter(run=run).to_dataframe()              # by creating run
ln.Artifact.filter(transform=transform).to_dataframe()  # by creating transform
ln.Artifact.filter(size__gt=1e6).to_dataframe()         # size greater than 1MB
```

If you want to include more information into the resulting dataframe, pass `include`.

```python
ln.Artifact.to_dataframe(include=["created_by__name", "storage__root"])  # include fields from related registries
```

Note: The query syntax for `DB` objects and for your default database is the same.

### The core data model

Here is an overview that illustrates how `Artifact` links to all other registries:

<img width="800px" src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/HMfWLa1rFkxcxQEN0000.svg">

### Queries by features

You can annotate datasets and samples with features. Let's define some:

```python
from datetime import date

ln.Feature(name="gc_content", dtype=float).save()
ln.Feature(name="experiment_note", dtype=str).save()
ln.Feature(name="experiment_date", dtype=date, coerce=True).save()  # accept date strings
```

During annotation, feature names and data types are validated against these definitions.

```python
artifact.features.set_values({
    "gc_content": 0.55,
    "experiment_note": "Looks great",
    "experiment_date": "2025-10-24",
})
```

Query for it:

```python
ln.Artifact.filter(experiment_date="2025-10-24").to_dataframe()  # query all artifacts annotated with `experiment_date`
```

If you want to include the feature values into the dataframe, pass `include`.

```python
ln.Artifact.to_dataframe(include="features")  # include the feature annotations
```
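
The `gc_content` values used in this README are illustrative. As a sketch of how such a float feature value could be derived from a sequence before annotating, assuming the usual definition (fraction of G and C bases) and a hypothetical helper name:

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (hypothetical helper)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


value = gc_content("ACGT")  # two of the four bases are G or C
```

The resulting float can be passed as the `gc_content` entry of the dictionary given to `artifact.features.set_values(...)`.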

### Lake ♾️ LIMS ♾️ Sheets

You can create records for the entities underlying your experiments: samples, perturbations, instruments, etc., for example:

```python
ln.Record(name="Sample 1", features={"gc_content": 0.5}).save()
```

You can create relationships of entities:

```python
# create a flexible record type to track experiments
experiment_type = ln.Record(name="Experiment", is_type=True).save()

# create a record of type `Experiment` for your first experiment
ln.Record(name="Experiment 1", type=experiment_type).save()

# create a feature to link experiments in records, dataframes, etc.
ln.Feature(name="experiment", dtype=experiment_type).save()

# create a sample record that links the sample to `Experiment 1` via the `experiment` feature
ln.Record(name="Sample 2", features={"gc_content": 0.5, "experiment": "Experiment 1"}).save()
```

You can convert any record type to dataframe/sheet:

```python
experiment_type.to_dataframe()
```

<details>
<summary>You can edit records like Excel sheets on LaminHub.</summary>
<img width="800px" src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/XSzhWUb0EoHOejiw0001.png">
</details>

### Data versioning

If you change source code or datasets, LaminDB manages versioning for you.
Assume you run a new version of our `create-fasta.py` script to create a new version of `sample.fasta`.

```python
import lamindb as ln

ln.track()
open("sample.fasta", "w").write(">seq1\nTGCA\n")  # a new sequence
ln.Artifact("sample.fasta", key="sample.fasta", features={"experiment": "Experiment 1"}).save()  # annotate with the new experiment
ln.finish()
```

If you now query by `key`, you'll get the latest version of this artifact:

```python
artifact = ln.Artifact.get(key="sample.fasta")  # get artifact by key
artifact.versions.to_dataframe()                # see all versions of that artifact
```

### Change management

To create a contribution branch and switch to it, run:

```shell
lamin switch -c my_branch
```

To merge a contribution branch into `main`, run:

```shell
lamin switch main  # switch to the main branch
lamin merge my_branch  # merge contribution branch into main
```

Read more: [docs.lamin.ai/lamindb.branch](https://docs.lamin.ai/lamindb.branch).

### Data sharing

To share data in a lineage-aware way, sync objects from a source database to your default database:

```python
db = ln.DB("laminlabs/lamindata")
artifact = db.Artifact.get(key="example_datasets/mini_immuno/dataset1.h5ad")
artifact.save()
```

This is zero-copy for the artifact's data in storage. Read more: [docs.lamin.ai/sync](https://docs.lamin.ai/sync).

### Lakehouse ♾️ feature store

Here is how you ingest a `DataFrame`:

```python
from datetime import date

import pandas as pd

df = pd.DataFrame({
    "sequence_str": ["ACGT", "TGCA"],
    "gc_content": [0.55, 0.54],
    "experiment_note": ["Looks great", "Ok"],
    "experiment_date": [date(2025, 10, 24), date(2025, 10, 25)],
})
ln.Artifact.from_dataframe(df, key="my_datasets/sequences.parquet").save()  # no validation
```

To validate & annotate the content of the dataframe, use the built-in schema `valid_features`:

```python
ln.Feature(name="sequence_str", dtype=str).save()  # define a remaining feature
artifact = ln.Artifact.from_dataframe(
    df,
    key="my_datasets/sequences.parquet",
    schema="valid_features"  # validate columns against features
).save()
artifact.describe()
```

Watch a mini video: [youtu.be/Ji6E7hTnReQ](https://youtu.be/Ji6E7hTnReQ)

You can filter for datasets by schema and then launch distributed queries and batch loading.

### Lakehouse beyond tables

To validate an `AnnData` with built-in schema `ensembl_gene_ids_and_valid_features_in_obs`, call:

```python
import anndata as ad
import numpy as np
import pandas as pd

adata = ad.AnnData(
    X=np.ones((21, 10)),
    obs=pd.DataFrame({'cell_type_by_model': ['T cell', 'B cell', 'NK cell'] * 7}),
    var=pd.DataFrame(index=[f'ENSG{i:011d}' for i in range(10)])
)
artifact = ln.Artifact.from_anndata(
    adata,
    key="my_datasets/scrna.h5ad",
    schema="ensembl_gene_ids_and_valid_features_in_obs"
).save()
artifact.describe()
```
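
The `var` index above is built from synthetic ids of the form `ENSG` followed by 11 digits, which is the format of unversioned human Ensembl gene ids. A purely syntactic pre-check can be sketched as follows; it is not what the built-in schema does (that validates against the `bionty` registry), and the helper name is hypothetical:

```python
import re

# unversioned human Ensembl gene id: "ENSG" followed by exactly 11 digits
ENSEMBL_HUMAN_GENE_ID = re.compile(r"ENSG\d{11}")


def looks_like_ensembl_gene_id(value: str) -> bool:
    """Check the id format only; this does not confirm the gene exists (hypothetical helper)."""
    return ENSEMBL_HUMAN_GENE_ID.fullmatch(value) is not None


synthetic_ids = [f"ENSG{i:011d}" for i in range(10)]
all_well_formed = all(looks_like_ensembl_gene_id(i) for i in synthetic_ids)
```

Versioned ids such as `ENSG00000000003.15` would not pass this check and would need the version suffix stripped first.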

To validate a `SpatialData` or any other array-like dataset, you need to construct a `Schema`. You can do this by composing simple `pandera`-style schemas: [docs.lamin.ai/curate](https://docs.lamin.ai/curate).

### Ontologies

Plugin `bionty` gives you >20 public ontologies as `SQLRecord` registries. This was used to validate the `ENSG` ids in the `adata` just before.

```python
import bionty as bt

bt.CellType.import_source()  # import the default ontology
bt.CellType.to_dataframe()   # your extensible cell type ontology in a simple registry
```

You can then create objects, e.g. for labeling, analogous to `ULabel`, `Project`, or `Record`:

```python
t_cell = bt.CellType.get(name="T cell")
artifact.cell_types.add(t_cell)
```

Read more: [docs.lamin.ai/manage-ontologies](https://docs.lamin.ai/manage-ontologies).

Watch a mini video: [youtu.be/3vpWjHj3Kw8](https://youtu.be/3vpWjHj3Kw8)

### Save unstructured notes

When in your development directory, you can save markdown files as records:

```shell
lamin save <topic>/<my-note.md>
```


================================================
FILE: docs/api.md
================================================
# API Reference

<meta http-equiv="Refresh" content="0; url=./lamindb.html" />

```{toctree}
:maxdepth: 1
:caption: CLI & lamindb
:hidden:

cli
lamindb
```

```{toctree}
:maxdepth: 1
:caption: Modules
:hidden:

bionty
pertdb
```


================================================
FILE: docs/arrays.md
================================================
---
execute_via: python
---

# Stream datasets from storage

This guide walks through streaming datasets from disk or cloud storage.

```python
# replace with your username and S3 bucket
!lamin login testuser1
!lamin init --storage s3://lamindb-ci/test-arrays
```

Import lamindb and track this notebook.

```python
import lamindb as ln
import numpy as np

ln.track()
db = ln.DB("laminlabs/lamindata")  # we'll pull dataset from there
```

## DataFrame

### Streaming from a single artifact

A dataframe stored as sharded `parquet`.

```python
artifact = db.Artifact.get(key="sharded_parquet")
```

```python
artifact.path.view_tree()
```

```python
dataset = artifact.open()
```

This returns a [pyarrow dataset](https://arrow.apache.org/docs/python/dataset.html).

```python
dataset
```

```python
dataset.head(5).to_pandas()
```

### Streaming from a set of artifacts

You can open several parquet files as a single dataset by calling `.open()` on the result of a query:

```python
dataset = db.Artifact.filter(
    key__startswith="example_datasets/small", suffix=".parquet", is_latest=True
).open()  # open an ArtifactSet for streaming
dataset
```

The same is possible for the artifacts in a collection:

```python
collection = db.Collection.get(key="sharded_parquet_collection")
dataset = collection.open()
dataset
```

Once you have a storage-backed dataset, you can query it like this:

```python
dataset.to_table().to_pandas()
```

By default, `Artifact.open()` and `Collection.open()` use `pyarrow` to lazily open dataframes. You can also use `polars` by passing `engine="polars"`. Note that `.open(engine="polars")` returns a context manager that yields a [LazyFrame](https://docs.pola.rs/api/python/stable/reference/lazyframe/index.html).

```python
with collection.open(engine="polars", use_fsspec=True) as lazy_df:
    display(lazy_df.collect().to_pandas())
```

## AnnData

We'll need some test data:

```python
ln.Artifact("s3://lamindb-ci/test-arrays/pbmc68k.h5ad").save()
ln.Artifact("s3://lamindb-ci/test-arrays/testfile.hdf5").save()
```

An `h5ad` artifact stored on s3:

```python
artifact = ln.Artifact.get(key="pbmc68k.h5ad")
```

```python
artifact.path
```

```python
adata = artifact.open()
```

This object is an `AnnDataAccessor` object, an `AnnData` object backed in the cloud:

```python
adata
```

Without subsetting, the `AnnDataAccessor` object references underlying lazy `h5` or `zarr` arrays:

```python
adata.X
```

You can subset it like a normal `AnnData` object:

```python
obs_idx = adata.obs.cell_type.isin(["Dendritic cells", "CD14+ Monocytes"]) & (
    adata.obs.percent_mito <= 0.05
)
adata_subset = adata[obs_idx]
adata_subset
```

Subsets load arrays into memory upon direct access:

```python
adata_subset.X
```

To load the entire subset into memory as an actual `AnnData` object, use `to_memory()`:

```python
adata_subset.to_memory()
```

It is also possible to add columns to `.obs` and `.var` of cloud AnnData objects without downloading them. First, create a new `AnnData` `zarr` artifact:

```python
adata_subset.to_memory().write_zarr("adata_subset.zarr")
artifact = ln.Artifact(
    "adata_subset.zarr", description="test add column to adata"
).save()
```

This is how you add a column:

```python
with artifact.open(mode="r+") as adata_accessor:
    adata_accessor.add_column(where="obs", col_name="ones", col=np.ones(adata_accessor.shape[0]))
    display(adata_accessor)
```

The version of the artifact is updated after the modification.

```python
artifact
```

```python
artifact.delete(permanent=True)
```

## SpatialData

It is also possible to access `AnnData` objects inside `SpatialData` `tables`:

```python
artifact = ln.Artifact.connect("laminlabs/lamindata").get(
    key="visium_aligned_guide_min.zarr"
)

access = artifact.open()
```

```python
access
```

```python
access.tables
```

This gives you the same `AnnDataAccessor` object as for a normal `AnnData`.

```python
table = access.tables["table"]

table
```

You can subset it and read into memory as an actual `AnnData`:

```python
table_subset = table[table.obs["clone"] == "diploid"]

table_subset
```

<!-- #region -->

```python
adata = table_subset.to_memory()
```

<!-- #endregion -->

## Generic HDF5

Let us query a generic HDF5 artifact:

```python
artifact = ln.Artifact.get(key="testfile.hdf5")
```

And get a backed accessor:

```python
backed = artifact.open()
```

The returned object contains the `.connection` and the `h5py.File` or `zarr.Group` in `.storage`:

```python
backed
```

```python
backed.storage
```

```python
# clean up test instance
ln.setup.delete("test-arrays", force=True)
```


================================================
FILE: docs/bionty.md
================================================
# `bionty`

```{eval-rst}
.. automodule:: bionty
```


================================================
FILE: docs/changelog.md
================================================
# Changelog

Actual content in lamin-docs.


================================================
FILE: docs/curate.md
================================================
---
execute_via: python
---

# Validate & standardize datasets

Data curation with LaminDB ensures your datasets are **validated** and **queryable** through **annotation**.

```{raw} html
<iframe width="560" height="315" src="https://www.youtube.com/embed/Ji6E7hTnReQ?si=K0OnU2MTGv4fIhFo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
```

Curating a dataset with LaminDB means three things:

- **Validate** that the dataset matches a desired schema.
- **Standardize** the dataset (e.g., by fixing typos, mapping synonyms) or update registries if validation fails.
- **Annotate** the dataset by linking it against metadata entities so that it becomes queryable.
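
The three steps above can be mimicked in plain Python. This is a toy sketch, not LaminDB's implementation: the `registry` set, the `synonyms` map, and the three helper functions are hypothetical stand-ins for a registry and its curation methods.

```python
# Toy illustration only: plain Python, not LaminDB's implementation.
registry = {"B cell", "T cell"}   # hypothetical set of validated terms
synonyms = {"B-cell": "B cell"}   # hypothetical synonym map


def validate(values):
    """Return one boolean per value: True if it exactly matches the registry."""
    return [value in registry for value in values]


def standardize(values):
    """Map known synonyms to canonical names; leave other values unchanged."""
    return [synonyms.get(value, value) for value in values]


def annotate(values):
    """Collect the distinct validated terms a dataset could be linked to."""
    return sorted({value for value in values if value in registry})


raw = ["B-cell", "T cell", "T cel"]
print(validate(raw))            # [False, True, False]
standardized = standardize(raw)
print(validate(standardized))   # [True, True, False]
print(annotate(standardized))   # ['B cell', 'T cell']
```

The typo `"T cel"` survives standardization, which is why fixing typos or updating the registry remains a separate, manual decision.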

In this guide we'll curate common data structures.
Here is a [guide](/faq/curate-any) for the underlying low-level API.

Note: If you know either `pydantic` or `pandera`, here is an [FAQ](/faq/pydantic-pandera) that compares LaminDB with both of these tools.

```python
# pip install lamindb
!lamin init --storage ./test-curate --modules bionty
```

```python
import lamindb as ln

ln.track()
```

<!-- #region -->

## Schema design patterns

A {class}`~lamindb.Schema` in LaminDB is a specification that defines the expected structure, data types, and validation rules for a dataset.
It is similar to `pydantic.Model` for dictionaries and to `pandera.Schema` and `pyarrow.lib.Schema` for tables, but supports more complicated data structures.

Schemas ensure data consistency by defining:

- What {class}`~lamindb.Feature`s (dimensions) exist in your dataset
- What data types those features should have
- What values are valid for categorical features
- Which {class}`~lamindb.Feature`s are required vs optional

An example schema:

```python
import bionty as bt  # provides the CellType registry used below

schema = ln.Schema(
    name="experiment_schema",           # human-readable name
    features=[                          # required features
        ln.Feature(name="cell_type", dtype=bt.CellType),
        ln.Feature(name="treatment", dtype=str),
    ],
    otype="DataFrame"                   # object type (DataFrame, AnnData, etc.)
)
```
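
To make the four things a schema defines more concrete, here is a minimal plain-Python sketch. It assumes nothing about LaminDB internals: `toy_schema` and `check` are hypothetical names, and rows are represented as simple dictionaries instead of a dataframe.

```python
# Toy illustration only: a dictionary-based "schema" and a checker function.
toy_schema = {
    "cell_type": {"dtype": str, "categories": {"B cell", "T cell"}, "required": True},
    "treatment": {"dtype": str, "categories": None, "required": True},
    "dose": {"dtype": float, "categories": None, "required": False},
}


def check(rows, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    columns = set().union(*(row.keys() for row in rows))
    for name, spec in schema.items():
        if name not in columns:
            if spec["required"]:
                problems.append(f"missing required column {name!r}")
            continue  # optional columns may be absent
        for i, row in enumerate(rows):
            value = row.get(name)
            if not isinstance(value, spec["dtype"]):
                problems.append(f"row {i}: {name!r} has wrong type")
            elif spec["categories"] is not None and value not in spec["categories"]:
                problems.append(f"row {i}: {value!r} is not a valid value for {name!r}")
    return problems


rows = [
    {"cell_type": "B cell", "treatment": "DMSO"},
    {"cell_type": "B-cell", "treatment": "IFNG"},
]
print(check(rows, toy_schema))
```

The optional `dose` column is absent and raises no problem, while the unknown category `"B-cell"` is reported.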

For composite data structures using slots:

```{dropdown} What are slots?

For composite data structures, you need to specify which component contains which schema, for example, to validate both cell metadata in `.obs` and gene metadata in `.var` within the same schema.
Each slot is a key like `"obs"` for AnnData observations, `"rna:var"` for MuData modalities, or `"attrs:nested:key"` for SpatialData annotations.
```

```python
# AnnData with multiple "slots"
adata_schema = ln.Schema(
    otype="AnnData",
    slots={
        "obs": cell_metadata_schema,     # cell annotations
        "var.T": gene_id_schema          # gene-derived features
    }
)
```
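
One way to picture slot keys is as paths into a nested container. The sketch below is an assumption for illustration only: `resolve_slot` and `toy_container` are hypothetical, the container is a plain dictionary, and transposed slots such as `"var.T"` are not modeled.

```python
def resolve_slot(container, slot):
    """Follow a colon-separated slot key through nested dictionaries."""
    current = container
    for part in slot.split(":"):
        current = current[part]
    return current


# Toy stand-in for a composite dataset (not an AnnData, MuData, or SpatialData object)
toy_container = {
    "obs": ["cell_type", "treatment"],
    "rna": {"var": ["gene_symbol", "gene_type"]},
    "attrs": {"nested": {"key": {"assay": "visium"}}},
}

toy_slots = ["obs", "rna:var", "attrs:nested:key"]
resolved = {slot: resolve_slot(toy_container, slot) for slot in toy_slots}
for slot, component in resolved.items():
    print(slot, "->", component)
```

Each slot key selects exactly one component, so a separate schema can be attached to each of them.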

Before diving into curation, let's understand the different schema approaches and when to use each one.
Think of schemas as rules that define what valid data should look like.

<!-- #endregion -->

### Flexible schema

Use when: You want to validate those columns whose names match feature names in your `Feature` registry.

```{eval-rst}
.. literalinclude:: scripts/define_valid_features.py
   :language: python
```

### Minimal required schema

Use when: You need certain columns but want flexibility for additional metadata.

```{eval-rst}
.. literalinclude:: scripts/define_mini_immuno_schema_flexible.py
   :language: python
```

<!-- #region -->

### Strict schema

Use when: You need complete control over data structure and values.

```python
# Only allows specified columns
schema = ln.Schema(
    features=[...],
    minimal_set=True,  # whether all passed features are required
    maximal_set=True   # if True, no features beyond the listed ones are allowed
)
```

<!-- #endregion -->

## DataFrame

### Step 1: Load and examine your data

We'll be working with the mini immuno dataset:

```python
df = ln.examples.datasets.mini_immuno.get_dataset1(
    with_cell_type_synonym=True, with_cell_type_typo=True
)
df
```

### Step 2: Set up your metadata registries

Before creating a schema, ensure your registries have the right features and labels:

```{eval-rst}
.. literalinclude:: scripts/define_mini_immuno_features_labels.py
   :language: python
```

### Step 3: Create your schema

```python
schema = ln.examples.datasets.mini_immuno.define_mini_immuno_schema_flexible()
schema.describe()
```

<!-- #region -->

### Step 4: Initialize Curator and first validation

If you expect the validation to pass, you can directly register an artifact by providing the schema:

```python

artifact = ln.Artifact.from_dataframe(df, key="examples/my_curated_dataset.parquet", schema=schema).save()
```

<!-- #endregion -->

The {meth}`~lamindb.curators.core.Curator.validate` method validates that your dataset adheres to the criteria defined by the `schema`.
It identifies which values are already validated (exist in the registries) and which are potentially problematic (do not yet exist in our registries).

```python
try:
    curator = ln.curators.DataFrameCurator(df, schema)
    curator.validate()
except ln.errors.ValidationError as error:
    print(error)
```

### Step 5: Fix validation issues

```python
# check the non-validated terms
curator.cat.non_validated
```

For `cell_type_by_expert`, we saw 2 terms are not validated.

First, let's standardize the synonym "B-cell" as suggested.

```python
curator.cat.standardize("cell_type_by_expert")
```

```python
# now we have only one non-validated cell type left
curator.cat.non_validated
```

For "CD8-pos alpha-beta T cell", let's understand which cell type in the public ontology might be the actual match.

```python
# to check the correct spelling of categories, pass `public=True` to get a lookup object from public ontologies
# use `lookup = curator.cat.lookup()` to get a lookup object of existing records in your instance
lookup = curator.cat.lookup(public=True)
lookup
```

```python
# here is an example for the "cell_type_by_expert" column
cell_types = lookup["cell_type_by_expert"]
cell_types.cd8_positive_alpha_beta_t_cell
```

```python
# fix the cell type name
df["cell_type_by_expert"] = df["cell_type_by_expert"].cat.rename_categories(
    {"CD8-pos alpha-beta T cell": cell_types.cd8_positive_alpha_beta_t_cell.name}
)
```

For `perturbation`, we want to add the new values "DMSO" and "IFNG".

```python
# this adds perturbations that were _not_ validated
curator.cat.add_new_from("perturbation")
```

```python
ln.Feature.get(name="perturbation")
```

```python
# validate again
curator.validate()
```

### Step 6: Save your curated dataset

```python
artifact = curator.save_artifact(key="examples/my_curated_dataset.parquet")
```

```python
artifact.describe()
```

## Common fixes

This section covers the most frequent curation issues and their solutions.
Use this as a reference when validation fails.

### Feature validation issues

<!-- #region -->

**Issue**: "Column not in dataframe"

```
"column 'treatment' not in dataframe. Columns in dataframe: ['drug', 'timepoint', ...]"
```

**Solutions**:

```python
# Solution 1: Rename columns to match schema
df = df.rename(columns={
    'drug': 'treatment',  # existing column name -> name expected by the schema
    'timepoint': 'time',
    # ...
})

# Solution 2: Create missing columns
df['treatment'] = 'unknown'  # Add with default value (or define Feature.default_value)

# Solution 3: Modify schema to match your data
schema = ln.Schema(
    features=[
        ln.Feature.get(name="drug"),  # Use actual column name
        ln.Feature.get(name="timepoint"),
    ],
    ...
)
```

<!-- #endregion -->

### Value validation issues

<!-- #region -->

**Issue**: "Terms not validated in feature 'cell_type'"

```
2 terms not validated in feature 'cell_type': 'B-cell', 'CD8-pos alpha-beta T cell'
    1 synonym found: "B-cell" → "B cell"
    → curate synonyms via: .standardize("cell_type")
    for remaining terms:
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('cell_type')
```

**Solutions**:

```python
# Solution 1: Use automatic standardization if given a hint (handles synonyms)
curator.cat.standardize('cell_type')

# Solution 2: Manual mapping for complex cases
value_mapping = {
    'T-cells': 'T cell',
    'B-cells': 'B cell',
}
df['cell_type'] = df['cell_type'].map(value_mapping).fillna(df['cell_type'])

# Solution 3: Use public ontology lookup for correct names
lookup = curator.cat.lookup(public=True)
cell_types = lookup["cell_type"]
df['cell_type'] = df['cell_type'].cat.rename_categories({
    'CD8-pos T cell': cell_types.cd8_positive_alpha_beta_t_cell.name
})

# Solution 4: Add new legitimate terms
curator.cat.add_new_from("cell_type")
```

<!-- #endregion -->

### Data type issues

<!-- #region -->

**Issue**: "Expected categorical data, got object"

```
TypeError: Expected categorical data for cell_type, got object
```

**Solutions**:

```python
# Solution 1: Convert to categorical
df['cell_type'] = df['cell_type'].astype('category')

# Solution 2: Use coercion in feature definition
ln.Feature(name="cell_type", dtype=bt.CellType, coerce=True).save()
```

<!-- #endregion -->

### Organism-specific ontology issues

<!-- #region -->

**Issue**: "Terms not validated" for organism-specific ontologies like developmental stages

```
2 terms not validated in feature 'developmental_stage_ontology_id': 'MmusDv:0000142', 'MmusDv:0000022'
```

**Solution**: Specify organism-specific source in feature definition using `cat_filters`:

```python
# When defining the schema, specify the organism-specific source
mouse_source = bt.Source.filter(
    entity="bionty.DevelopmentalStage",
    organism="mouse"
).one()

schema = ln.Schema(
    features=[
        ln.Feature(
            name="developmental_stage_ontology_id",
            dtype=bt.DevelopmentalStage.ontology_id,
            cat_filters={"source": mouse_source}  # Specify organism-specific source
        )
    ],
    ...
)
```

This pattern applies to any ontology where the same registry serves multiple organisms (e.g., `DevelopmentalStage`, `Phenotype`, ...).

<!-- #endregion -->

## External data validation

Since not all metadata is always stored within the dataset itself, it is also possible to validate external metadata.

```{eval-rst}
.. literalinclude:: scripts/curate_dataframe_external_features.py
   :language: python
   :caption: curate_dataframe_external_features.py
```

```python
!python scripts/curate_dataframe_external_features.py
```

## Union dtypes

Some metadata columns might validate against several registries.

```{eval-rst}
.. literalinclude:: scripts/curate_dataframe_union_features.py
   :language: python
   :caption: curate_dataframe_union_features.py
```

```python
!python scripts/curate_dataframe_union_features.py
```

## AnnData

`AnnData`, like all other data structures that follow, is a composite structure that stores different arrays in different `slots`.

### Allow a flexible schema

We can also allow a flexible schema for an `AnnData` and only require that it's indexed with Ensembl gene IDs.

```{eval-rst}
.. literalinclude:: scripts/curate_anndata_flexible.py
   :language: python
   :caption: curate_anndata_flexible.py
```

Let's run the script.

```python
!python scripts/curate_anndata_flexible.py
```

Under the hood, this uses the following built-in schema ({func}`~lamindb.examples.schemas.anndata_ensembl_gene_ids_and_valid_features_in_obs`):

```{eval-rst}
.. literalinclude:: scripts/define_schema_anndata_ensembl_gene_ids_and_valid_features_in_obs.py
   :language: python
```

This schema transposes the `var` DataFrame during curation, so that one validates and annotates the columns of `var.T`, i.e., `[ENSG00000153563, ENSG00000010610, ENSG00000170458]`.
If one doesn't transpose, one would annotate the columns of `var`, i.e., `[gene_symbol, gene_type]`.

```{eval-rst}
.. image:: https://lamin-site-assets.s3.amazonaws.com/.lamindb/gLyfToATM7WUzkWW0001.png
    :width: 800px
```
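
The effect of the transposition can be sketched without any dataframe library. The following toy example is an illustration under stated assumptions: the table is a plain dictionary, `column_names` and `transpose` are hypothetical helpers, and the cell values are placeholders.

```python
# Toy stand-in for `var`: row labels (index) plus named columns
var = {
    "index": ["ENSG00000153563", "ENSG00000010610", "ENSG00000170458"],
    "columns": {
        "gene_symbol": ["symbol_a", "symbol_b", "symbol_c"],  # placeholder values
        "gene_type": ["type_a", "type_b", "type_c"],          # placeholder values
    },
}


def column_names(table):
    """Return the names that column-wise validation would look at."""
    return list(table["columns"])


def transpose(table):
    """Swap rows and columns: old row labels become the new column names."""
    names = list(table["columns"])
    return {
        "index": names,
        "columns": {
            row_label: [table["columns"][name][i] for name in names]
            for i, row_label in enumerate(table["index"])
        },
    }


print(column_names(var))             # ['gene_symbol', 'gene_type']
print(column_names(transpose(var)))  # the three Ensembl gene IDs
```

Validating the columns of the transposed table therefore means validating the gene identifiers themselves.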

### Fix validation issues

```python
adata = ln.examples.datasets.mini_immuno.get_dataset1(
    with_gene_typo=True, with_cell_type_typo=True, otype="AnnData"
)
adata
```

```python
schema = ln.examples.schemas.anndata_ensembl_gene_ids_and_valid_features_in_obs()
schema.describe()
```

Check the slots of a schema:

```python
schema.slots
```

```python
curator = ln.curators.AnnDataCurator(adata, schema)
try:
    curator.validate()
except ln.errors.ValidationError as error:
    print(error)
```

As above, we leverage a lookup object with valid cell types to find the correct name.

```python
valid_cell_types = curator.slots["obs"].cat.lookup()["cell_type_by_expert"]
adata.obs["cell_type_by_expert"] = adata.obs[
    "cell_type_by_expert"
].cat.rename_categories(
    {"CD8-pos alpha-beta T cell": valid_cell_types.cd8_positive_alpha_beta_t_cell.name}
)
```

Before saving the validated `AnnData` as an {class}`~lamindb.Artifact`, inspect the `obs` columns and register the gene identifiers that did not validate:

```python
adata.obs.columns
```

```python
curator.slots["var.T"].cat.add_new_from("columns")
```

```python
curator.validate()
```

```python
artifact = curator.save_artifact(key="examples/my_curated_anndata.h5ad")
```

Access the schema for each slot:

```python
artifact.features.slots
```

The saved artifact has been annotated with validated features and labels:

```python
artifact.describe()
```

## Unstructured dictionaries

Most data structures support unstructured metadata stored as dictionaries:

- Pandas DataFrames: `.attrs`
- AnnData: `.uns`
- MuData: `.uns` and `modality:uns`
- SpatialData: `.attrs`

Here, we show by example how to curate such metadata for AnnData:

```{eval-rst}
.. literalinclude:: scripts/define_schema_anndata_uns.py
   :language: python
   :caption: define_schema_anndata_uns.py
```

```python
!python scripts/define_schema_anndata_uns.py
```

```{eval-rst}
.. literalinclude:: scripts/curate_anndata_uns.py
   :language: python
   :caption: curate_anndata_uns.py
```

```python
!python scripts/curate_anndata_uns.py
```

## MuData

```{eval-rst}
.. literalinclude:: scripts/curate_mudata.py
   :language: python
   :caption: curate_mudata.py
```

```python
!python scripts/curate_mudata.py
```

## SpatialData

```{eval-rst}
.. literalinclude:: scripts/define_schema_spatialdata.py
   :language: python
   :caption: define_schema_spatialdata.py
```

```python
!python scripts/define_schema_spatialdata.py
```

```{eval-rst}
.. literalinclude:: scripts/curate_spatialdata.py
   :language: python
   :caption: curate_spatialdata.py
```

```python
!python scripts/curate_spatialdata.py
```

## TiledbsomaExperiment

```{eval-rst}
.. literalinclude:: scripts/curate_soma_experiment.py
   :language: python
   :caption: curate_soma_experiment.py
```

```python
!python scripts/curate_soma_experiment.py
```

## Other data structures

If you have other data structures, read: {doc}`/faq/curate-any`.

```python
!rm -rf ./test-curate
!rm -rf ./small_dataset.tiledbsoma
!lamin delete --force test-curate
```


================================================
FILE: docs/faq/acid.md
================================================
---
execute_via: python
---

# Will data & metadata stay in sync?

Here, we walk through different errors that can occur while saving artifacts & metadata records, and show that the LaminDB instance does not get corrupted by dangling metadata or artifacts.

Transactions within Python across data & metadata are [ACID](https://en.wikipedia.org/wiki/ACID).

If an upload process is externally killed and Python cannot run clean-up operations anymore, the artifact is internally still flagged with `artifact._storage_ongoing = True`. This is visible on the UI. You can then re-run `lamin save` or `artifact.save()` to attempt uploading the artifact a second time.
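
The difference between an in-process failure and an external kill can be sketched as follows. This is a toy model, not LaminDB's implementation: `toy_save`, `records`, and `ProcessKilled` are hypothetical, and the external kill is simulated with an exception that bypasses the `except Exception` clean-up.

```python
class ProcessKilled(BaseException):
    """Simulates an external kill: not caught by `except Exception`."""


def toy_save(records, name, upload):
    """Toy model of saving a metadata record together with its data."""
    records[name] = {"storage_ongoing": True}  # flag is set before the upload starts
    try:
        upload(name)
    except Exception:
        del records[name]  # in-process failure: roll back the record
        raise
    records[name]["storage_ongoing"] = False  # upload finished: clear the flag


def failing_upload(name):
    raise PermissionError("no write access")


def killed_upload(name):
    raise ProcessKilled()


def working_upload(name):
    pass


records = {}

try:
    toy_save(records, "sample.fasta", failing_upload)
except PermissionError:
    pass
after_failure = dict(records)  # empty: no dangling record

try:
    toy_save(records, "sample.fasta", killed_upload)
except ProcessKilled:
    pass
after_kill = dict(records["sample.fasta"])  # flag is still set

toy_save(records, "sample.fasta", working_upload)  # re-run the save
after_retry = dict(records["sample.fasta"])

print(after_failure, after_kill, after_retry)
```

In this model, the flag is the only trace of an interrupted upload, and a second save attempt clears it.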

```python
!lamin init --storage ./test-acid
```

```python
import pytest
import lamindb as ln
from upath import UPath

ln.settings.verbosity = "debug"
```

```python
open("sample.fasta", "w").write(">seq1\nACGT\n")
```

## Save error due to failed upload within Python

Let's try to save an artifact to a storage location without permission.

```python
artifact = ln.Artifact("sample.fasta", key="sample.fasta")
```

Because the public API only allows you to set a default storage for which you have permission, we need to hack it:

```python
ln.settings.storage._root = UPath("s3://nf-core-awsmegatests")
```

This raises an exception but nothing gets saved:

```python
with pytest.raises(PermissionError) as error:
    artifact.save()
print(error.exconly())
assert len(ln.Artifact.filter()) == 0
```

## Save error during bulk creation

```python
artifacts = [artifact, "this is not a record"]
```

This raises an exception but nothing gets saved:

```python
with pytest.raises(Exception) as error:
    ln.save(artifacts)
print(error.exconly())
assert len(ln.Artifact.filter()) == 0  # nothing got saved
```

If a list of data objects is passed to `ln.save()` and the upload of one of these data objects fails, the successful uploads are maintained and a `RuntimeError` is raised, listing the successfully uploaded data objects up until that point.
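
The behavior described above for partially failed bulk uploads could look roughly like this. The sketch is an assumption for illustration: `toy_bulk_upload` and `flaky_upload` are hypothetical and only model the error reporting, not actual storage operations.

```python
def toy_bulk_upload(items, upload):
    """Upload items in order; on failure, report what was already uploaded."""
    uploaded = []
    for item in items:
        try:
            upload(item)
        except Exception as exc:
            raise RuntimeError(
                f"upload of {item!r} failed; already uploaded: {uploaded}"
            ) from exc
        uploaded.append(item)
    return uploaded


def flaky_upload(item):
    """Hypothetical upload function that fails for one specific item."""
    if item == "c.parquet":
        raise OSError("connection lost")


message = ""
try:
    toy_bulk_upload(["a.parquet", "b.parquet", "c.parquet"], flaky_upload)
except RuntimeError as error:
    message = str(error)

print(message)
```

Earlier uploads are not undone, and the error message tells you which objects are already in storage.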

## Save error due to externally aborted upload

Back to a proper storage location:

```python
ln.settings.storage._root = UPath("./test-acid").absolute()
```

The save operation works:

```python
artifact.save()
```

Let's pretend the upload was killed.

```python
artifact._storage_ongoing = True
artifact.save()
artifact.path.unlink()
assert artifact._aux == {"so": 1}  # storage/upload is ongoing
```

We can re-run it:

```python
artifact = ln.Artifact("sample.fasta", key="sample.fasta").save()
```

```python
assert not artifact._storage_ongoing
assert artifact._aux is None
```

```python
!rm -r ./test-acid
!lamin delete --force test-acid
```


================================================
FILE: docs/faq/curate-any.md
================================================
---
execute_via: python
---

# How do I validate & annotate arbitrary data structures?

This guide walks through the low-level API that lets you validate iterables.

You can then use the records created during validation to annotate a dataset.

:::{dropdown} How do I validate based on a public ontology?

LaminDB makes it easy to validate categorical variables based on registries that inherit from {class}`~lamindb.models.CanCurate`.

{class}`~lamindb.models.CanCurate` methods validate against the registries in your LaminDB instance.
In {doc}`/manage-ontologies`, you'll see how to extend standard validation to validation against _public references_ using a `PublicOntology` object, e.g., via `public_genes = bt.Gene.public()`.
By default, {meth}`~lamindb.models.CanCurate.from_values` considers a match in a public reference a validated value for any {mod}`bionty` entity.

:::

```python
# pip install 'lamindb[zarr]'
!lamin init --storage ./test-curate-any --modules bionty
```

Define a test dataset.

```python
import lamindb as ln
import bionty as bt
import zarr
import numpy as np

data = zarr.open_group(store="data.zarr", mode="a")

data.create_dataset(name="temperature", shape=(3,), dtype="float32")
data.create_dataset(name="knockout_gene", shape=(3,), dtype=str)
data.create_dataset(name="disease", shape=(3,), dtype=str)

data["knockout_gene"][:] = np.array(
    ["ENSG00000139618", "ENSG00000141510", "ENSG00000133703"]
)
data["disease"][:] = np.random.default_rng().choice(
    ["MONDO:0004975", "MONDO:0004980"], 3
)
```

## Validate and standardize vectors

Read the `disease` array from the zarr group into memory.

```python
disease = data["disease"][:]
```

{meth}`~lamindb.models.CanCurate.validate` validates vector-like values against reference values in a registry.
It returns a boolean vector indicating where a value has an exact match in the reference values.

```python
bt.Disease.validate(disease, field=bt.Disease.ontology_id)
```

When validation fails, you can call {meth}`~lamindb.models.CanCurate.inspect` to figure out what to do.

{meth}`~lamindb.models.CanCurate.inspect` applies the same definition of validation as {meth}`~lamindb.models.CanCurate.validate`, but returns a rich {class}`~lamindb.models.InspectResult` object. Most importantly, it logs recommended curation steps that would render the data validated.

Note: you can use {meth}`~lamindb.models.CanCurate.standardize` to standardize synonyms.

```python
bt.Disease.inspect(disease, field=bt.Disease.ontology_id)
```
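
Conceptually, inspection partitions the input into validated and non-validated values and attaches a hint to each problem. The sketch below is a plain-Python assumption, not the library's code: `ToyInspectResult` and `toy_inspect` are hypothetical names, and the second ontology ID in the example input is deliberately malformed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyInspectResult:
    """Hypothetical stand-in for a rich inspection result."""

    validated: list = field(default_factory=list)
    non_validated: list = field(default_factory=list)
    hints: list = field(default_factory=list)


def toy_inspect(values, reference, synonyms):
    """Split values into validated and non-validated and suggest next steps."""
    result = ToyInspectResult()
    for value in dict.fromkeys(values):  # de-duplicate while keeping order
        if value in reference:
            result.validated.append(value)
        else:
            result.non_validated.append(value)
            if value in synonyms:
                result.hints.append(f"standardize {value!r} -> {synonyms[value]!r}")
            else:
                result.hints.append(f"fix or register {value!r}")
    return result


reference_ids = {"MONDO:0004975", "MONDO:0004980"}
result = toy_inspect(
    ["MONDO:0004975", "MONDO:4975", "MONDO:0004975"],
    reference_ids,
    synonyms={},
)
print(result.validated)      # ['MONDO:0004975']
print(result.non_validated)  # ['MONDO:4975']
print(result.hints)
```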

Bulk creating records using {meth}`~lamindb.models.CanCurate.from_values` only returns validated records.

```python
diseases = bt.Disease.from_values(disease, field=bt.Disease.ontology_id).save()
```

Repeat the process for more labels:

```python
experiments = ln.Record.from_values(
    ["Experiment A", "Experiment B"],
    field=ln.Record.name,
    create=True,  # create non-validated labels
).save()
genes = bt.Gene.from_values(
    data["knockout_gene"][:], field=bt.Gene.ensembl_gene_id
).save()
```

## Annotate the dataset

Register the dataset as an artifact:

```python
artifact = ln.Artifact("data.zarr", key="my_dataset.zarr").save()
```

Annotate with features:

```python
ln.Feature(name="experiment", dtype=ln.Record).save()
ln.Feature(name="disease", dtype=bt.Disease.ontology_id).save()
ln.Feature(name="knockout_gene", dtype=bt.Gene.ensembl_gene_id).save()
artifact.features.set_values(
    {"experiment": experiments, "knockout_gene": genes, "disease": diseases}
)
artifact.describe()
```

```python
# clean up test instance
!rm -r data.zarr
!rm -r ./test-curate-any
!lamin delete --force test-curate-any
```


================================================
FILE: docs/faq/idempotency.md
================================================
---
execute_via: python
---

# Will data get duplicated upon re-running code?

LaminDB's operations are idempotent in the sense defined here, which allows you to re-run code without duplicating data.

:::{admonition} SQLRecords with `name` field

When you instantiate {class}`~lamindb.models.SQLRecord` with a name, in case a name has an _exact match_ in a registry, the constructor returns it instead of creating a new record. In case records with _similar names_ exist, you'll see them in a table: you can then decide whether you want to save the new record or pick an existing record.

If you set {attr}`~lamindb.core.subsettings.CreationSettings.search_names` to `False`, you bypass these checks.

:::
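
The name check described in this admonition can be approximated with the standard library. This is a rough sketch under assumptions: `lookup_name` is hypothetical, and `difflib` similarity with a cutoff of `0.8` stands in for whatever search the registry actually performs.

```python
import difflib

existing_names = ["My label 1", "Another label"]


def lookup_name(name, names, cutoff=0.8):
    """Classify a name as an exact match, similar to existing names, or new."""
    if name in names:
        return "exact", [name]
    similar = difflib.get_close_matches(name, names, n=5, cutoff=cutoff)
    if similar:
        return "similar", similar
    return "new", []


print(lookup_name("My label 1", existing_names))   # ('exact', ['My label 1'])
print(lookup_name("My label 1a", existing_names))  # ('similar', ['My label 1'])
print(lookup_name("Zebra", existing_names))        # ('new', [])
```

An exact match returns the existing entry, while a similar match only produces a suggestion and leaves the decision to you.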

:::{admonition} Artifacts & collections

If you instantiate {class}`~lamindb.Artifact` from data that already exists as an artifact, the `Artifact()` constructor returns the existing artifact based on a hash lookup.

:::
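
A hash lookup of this kind can be sketched in a few lines. The example is a toy under stated assumptions: `ToyArtifactRegistry` is hypothetical, it hashes in-memory bytes with SHA-256, and it ignores versioning and storage entirely.

```python
import hashlib


class ToyArtifactRegistry:
    """Toy registry that returns an existing record for identical content."""

    def __init__(self):
        self._by_hash = {}

    def get_or_create(self, content: bytes, key: str) -> dict:
        digest = hashlib.sha256(content).hexdigest()
        existing = self._by_hash.get(digest)
        if existing is not None:
            return existing  # same content: reuse the record, create nothing
        record = {"id": len(self._by_hash) + 1, "key": key, "hash": digest}
        self._by_hash[digest] = record
        return record

    def __len__(self):
        return len(self._by_hash)


registry = ToyArtifactRegistry()
first = registry.get_or_create(b"FCS file content", key="my_fcs_file.fcs")
second = registry.get_or_create(b"FCS file content", key="my_fcs_file.fcs")
other = registry.get_or_create(b"different content", key="other.fcs")

print(first is second, len(registry))  # True 2
```

Re-running the same code therefore yields the same record instead of a duplicate.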

```python
# pip install lamindb
!lamin init --storage ./test-idempotency
```

```python
import lamindb as ln

ln.track("ANW20Fr4eZgM0000")
```

## SQLRecords with name field

```python
assert ln.settings.creation.search_names
```

Let us add a first record to the {class}`~lamindb.Record` registry:

```python
label = ln.Record(name="My label 1").save()
```

If we create a new record, we'll automatically get search results that give clues on whether we are prone to duplicating an entry:

```python
label = ln.Record(name="My label 1a")
```

Let's save the `1a` label because we actually intend to create it.

```python
label.save()
```

In case we match an existing name directly, we'll get the existing object:

```python
label = ln.Record(name="My label 1")
```

If we save it again, it will not create a new entry in the registry:

```python
label.save()
```

Now, if we create a third record, we'll get two alternatives:

```python
label = ln.Record(name="My label 1b")
```

If we prefer to not perform a search, e.g. for performance reasons, we can switch it off.

```python
ln.settings.creation.search_names = False
label = ln.Record(name="My label 1c")
```

Switch it back on:

```python
ln.settings.creation.search_names = True
```

## Artifacts & collections

```python
filepath = ln.examples.datasets.file_fcs()
```

Create an `Artifact`:

```python
artifact = ln.Artifact(filepath, key="my_fcs_file.fcs").save()
```

```python
assert artifact.hash == "rCPvmZB19xs4zHZ7p_-Wrg"
assert artifact.run == ln.context.run
assert not artifact.recreating_runs.exists()
```

Create an `Artifact` from the same path:

```python
artifact2 = ln.Artifact(filepath, key="my_fcs_file.fcs")
```

It gives us the existing object:

```python
assert artifact.id == artifact2.id
assert artifact.run == artifact2.run
assert not artifact.recreating_runs.exists()
```

If you save it again, nothing will happen (the operation is idempotent):

```python
artifact2.save()
```

In the hidden cell below, you'll see how this interplays with data lineage.

```python
ln.track(new_run=True)
artifact3 = ln.Artifact(filepath, key="my_fcs_file.fcs")
assert artifact3.id == artifact2.id
assert artifact3.run == artifact2.run != ln.context.run  # run is not updated
assert artifact2.recreating_runs.first() == ln.context.run
```

```python
!rm -rf ./test-idempotency
!lamin delete --force test-idempotency
```


================================================
FILE: docs/faq/import-modules.md
================================================
---
execute_via: python
---

# What happens if I import a schema module without lamindb?

```python
# !pip install 'lamindb[bionty]'
!lamin init --storage testmodule --modules bionty
```

Upon `import`, nothing happens yet:

```python
import bionty as bt
```

If you try to access an attribute (other than `model`), you'll load the instance in the same way as calling `import lamindb`.

Under the hood, `lamindb` is imported!

```python
assert bt.Organism(name="human") is not None
```

```python
!lamin delete --force testmodule
```


================================================
FILE: docs/faq/keep-artifacts-local.md
================================================
---
execute_via: python
---

# Keep artifacts local in a cloud instance

If you want to default to keeping artifacts local in a cloud instance, enable {attr}`~lamindb.setup.core.InstanceSettings.keep_artifacts_local`.

Let us first create a cloud instance that would store artifacts exclusively on S3.

```python
!lamin login testuser1
!lamin init --storage s3://lamindb-ci/keep-artifacts-local
```

Let's import lamindb and track the current notebook run.

```python
# pip install lamindb
import lamindb as ln

ln.track("l9lFf83aPwRc")
```

## Toggling setting "keep artifacts local"

You can checkmark the "Keep artifacts local" box on the instance settings tab.

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/6Kt20kV5sQIFyV0Q0000.png" width="400px">

Or toggle it through the following instance setting.

```python
ln.setup.settings.instance.keep_artifacts_local = True
```

## Create a local storage location

Call the following for a -- potentially pre-existing -- root path and a unique host identifier.

```python
ln.Storage(root="./our_local_storage", host="abc-institute-drive1").save()
```

Now, you have two storage locations: one in the S3 bucket, and the other locally.

```python
ln.Storage.to_dataframe()
```

You can now set it as a local default storage location.
Next time you connect to the instance, this won't be necessary and the location will be automatically detected as the local default.

```python
ln.settings.local_storage = "./our_local_storage"
```

## Use a local storage location

If you save an artifact in keep-artifacts-local mode, by default, it's stored in local storage.

```python
original_filepath = ln.examples.datasets.file_fcs()
artifact = ln.Artifact(original_filepath, key="example_datasets/file1.fcs").save()
local_path = artifact.path  # local storage path
local_path
```

You'll see the `.fcs` file named by the `uid` in your `.lamindb/` directory under `./our_local_storage/`:

```python
assert artifact.path.exists()
assert artifact.path.as_posix().startswith(ln.settings.local_storage.root.as_posix())
ln.settings.local_storage.root.view_tree()
```

## Pre-existing artifacts

Assume you already have a file in your local storage location:

```python
file_in_local_storage = ln.examples.datasets.file_bam()
file_in_local_storage.rename("./our_local_storage/output.bam")
ln.UPath("our_local_storage/").view_tree()
```

When registering an artifact for it, it remains where it is.

```python
my_existing_file = ln.Artifact("./our_local_storage/output.bam").save()
ln.UPath("our_local_storage/").view_tree()
```

The storage path of the artifact matches the pre-existing file:

```python
my_existing_file.path
```
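
The idea can be sketched with plain path arithmetic: if a file already lies below a storage root, its path relative to that root can identify it, and no copy is needed. This is a minimal sketch under that assumption; `relative_key` is a hypothetical helper, not a LaminDB function, and the paths are made up.

```python
from pathlib import PurePosixPath
from typing import Optional


def relative_key(filepath: str, storage_root: str) -> Optional[str]:
    """Return the path relative to the storage root, or None if outside it."""
    path = PurePosixPath(filepath)
    root = PurePosixPath(storage_root)
    try:
        return path.relative_to(root).as_posix()
    except ValueError:
        # the file is not located below the storage root
        return None


inside = relative_key("/data/our_local_storage/output.bam", "/data/our_local_storage")
outside = relative_key("/tmp/output.bam", "/data/our_local_storage")
```

Here `inside` is the relative path `output.bam`, while `outside` is `None` because the file would first have to be copied into the storage location.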

## Switching between local storage locations

You might have several local storage locations. Here is how you can switch between them.

```python
ln.Storage(root="./our_local_storage2", host="abc-institute-drive1").save()
ln.settings.local_storage = "./our_local_storage2"  # switch to the new storage location
```

Ingest a file into the new local storage location.

```python
filepath = ln.examples.datasets.file_fastq()
artifact3 = ln.Artifact(filepath, key="example_datasets/file.fastq.gz").save()
```

Inspect where all the files are.

```python
ln.Artifact.to_dataframe(include=["storage__root", "storage__region"])
```

## Upload a local artifact to the cloud

If you'd like to upload an artifact to the cloud storage location to more easily share it or view it through web applications, pass `upload=True` to the `save()` method.

```python
artifact.save(upload=True)
```

You now see the artifact in the S3 bucket:

```python
ln.settings.storage.root.view_tree()
```

And it's no longer present in local storage:

```python
assert artifact.path.exists()
assert not local_path.exists()
assert artifact.path.as_posix().startswith(ln.settings.storage.root.as_posix())
ln.settings.local_storage.root.view_tree()
```

## Upload directly to the cloud

You can also directly upload via `upload=True`:

```python
filepath = ln.examples.datasets.file_mini_csv()
artifact2 = ln.Artifact(filepath, key="example_datasets/mini.csv").save(upload=True)
artifact2.path
```

Now we have two files on S3:

```python
ln.Artifact.to_dataframe(include="storage__root")
```

## Update storage description

You can add a description to the storage location by using the `description` field.

```python
storage_record = ln.Storage.get(root__endswith="our_local_storage")
storage_record.description = "Our shared directory for project X"
storage_record.save()
ln.Storage.to_dataframe()
```

## Delete the test instance

Delete the artifacts:

```python
artifact.delete(permanent=True)
artifact2.delete(permanent=True)
artifact3.delete(permanent=True)
my_existing_file.delete(permanent=True, storage=False)
```

Delete the instance:

```python
ln.setup.delete("keep-artifacts-local", force=True)
```


================================================
FILE: docs/faq/pydantic-pandera.md
================================================
---
execute_via: python
---

# Pydantic & Pandera vs. LaminDB

This doc explains conceptual differences between data validation with `pydantic`, `pandera`, and `LaminDB`.

```python
!lamin init --storage test-pydantic-pandera --modules bionty
```

Let us work with a test dataframe.

```python
import pandas as pd
import pydantic
import lamindb as ln
import bionty as bt
import pandera.pandas as pandera
import pprint

from typing import Literal, Any

df = ln.examples.datasets.mini_immuno.get_dataset1()
df
```

## Define a schema

### pydantic

```python
Perturbation = Literal["DMSO", "IFNG"]
CellType = Literal["T cell", "B cell"]
OntologyID = Literal["EFO:0008913"]


class ImmunoSchema(pydantic.BaseModel):
    perturbation: Perturbation
    cell_type_by_model: CellType
    cell_type_by_expert: CellType
    assay_oid: OntologyID
    concentration: str
    treatment_time_h: int
    donor: str | None

    class Config:
        title = "My immuno schema"
```

### pandera

```python
pandera_schema = pandera.DataFrameSchema(
    {
        "perturbation": pandera.Column(
            str, checks=pandera.Check.isin(["DMSO", "IFNG"])
        ),
        "cell_type_by_model": pandera.Column(
            str, checks=pandera.Check.isin(["T cell", "B cell"])
        ),
        "cell_type_by_expert": pandera.Column(
            str, checks=pandera.Check.isin(["T cell", "B cell"])
        ),
        "assay_oid": pandera.Column(str, checks=pandera.Check.isin(["EFO:0008913"])),
        "concentration": pandera.Column(str),
        "treatment_time_h": pandera.Column(int),
        "donor": pandera.Column(str, nullable=True),
    },
    name="My immuno schema",
)
```

### LaminDB

Features & labels are defined on the level of the database instance.
You can either define a schema with required (and optional) columns.

```python
ln.Record(name="DMSO").save()
ln.Record(name="IFNG").save()

# leverage ontologies through types ln.Record, bt.CellType, bt.ExperimentalFactor
lamindb_schema = ln.Schema(
    name="My immuno schema",
    features=[
        ln.Feature(name="perturbation", dtype=ln.Record).save(),
        ln.Feature(name="cell_type_by_model", dtype=bt.CellType).save(),
        ln.Feature(name="cell_type_by_expert", dtype=bt.CellType).save(),
        ln.Feature(name="assay_oid", dtype=bt.ExperimentalFactor.ontology_id).save(),
        ln.Feature(name="concentration", dtype=str).save(),
        ln.Feature(name="treatment_time_h", dtype=int).save(),
        ln.Feature(name="donor", dtype=str, nullable=True).save(),
    ],
).save()
```

Or merely define a constraint on the feature identifier.

```python
lamindb_schema_only_itype = ln.Schema(
    name="Allow any valid features & labels", itype=ln.Feature
)
```

## Validate a dataframe

### pydantic

```python
class DataFrameValidationError(Exception):
    pass


def validate_dataframe(df: pd.DataFrame, model: type[pydantic.BaseModel]):
    errors = []

    for i, row in enumerate(df.to_dict(orient="records")):
        try:
            model(**row)
        except pydantic.ValidationError as e:
            errors.append(f"row {i} failed validation: {e}")

    if errors:
        error_message = "\n".join(errors)
        raise DataFrameValidationError(
            f"DataFrame validation failed with the following errors:\n{error_message}"
        )
```

```python
try:
    validate_dataframe(df, ImmunoSchema)
except DataFrameValidationError as e:
    print(e)
```

To fix the validation error, we need to update the `Literal` and re-run the model definition.

```python
Perturbation = Literal["DMSO", "IFNG"]
CellType = Literal[
    "T cell", "B cell", "CD8-positive, alpha-beta T cell"  # <-- updated
]
OntologyID = Literal["EFO:0008913"]


class ImmunoSchema(pydantic.BaseModel):
    perturbation: Perturbation
    cell_type_by_model: CellType
    cell_type_by_expert: CellType
    assay_oid: OntologyID
    concentration: str
    treatment_time_h: int
    donor: str | None

    class Config:
        title = "My immuno schema"
```

```python
validate_dataframe(df, ImmunoSchema)
```

### pandera

```python
try:
    pandera_schema.validate(df)
except pandera.errors.SchemaError as e:
    print(e)
```

### LaminDB

Because the term `"CD8-positive, alpha-beta T cell"` is part of the public `CellType` ontology, validation passes the first time.

If validation had not passed, we could have resolved the issue simply by adding a new term to the `CellType` registry rather than editing the code.
This also puts downstream data scientists into a position to update ontologies.

```python
curator = ln.curators.DataFrameCurator(df, lamindb_schema)
curator.validate()
```

What was the cell type validation based on? Let's inspect the `CellType` registry.

```python
bt.CellType.to_dataframe()
```

The `CellType` registry is hierarchical as it contains the Cell Ontology.

```python
bt.CellType.get(name="CD8-positive, alpha-beta T cell").view_parents()
```
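
The difference between labels that are hard-coded in a schema and labels that live in a registry can be illustrated with a toy example. This is only a sketch: `ToyLabelRegistry` is a hypothetical in-memory class and does not reflect how the `CellType` registry is implemented.

```python
# labels fixed in code: extending them means editing and re-running this line
HARD_CODED_CELL_TYPES = frozenset({"T cell", "B cell"})


class ToyLabelRegistry:
    """Hypothetical in-memory registry of valid labels."""

    def __init__(self, names):
        self._names = set(names)

    def add(self, name: str) -> None:
        self._names.add(name)

    def validate(self, values) -> list:
        # return the values that are not registered
        return [v for v in values if v not in self._names]


observed = ["T cell", "CD8-positive, alpha-beta T cell"]
label_registry = ToyLabelRegistry(["T cell", "B cell"])

invalid_hard_coded = [v for v in observed if v not in HARD_CODED_CELL_TYPES]
invalid_before = label_registry.validate(observed)
label_registry.add("CD8-positive, alpha-beta T cell")  # data change, no code change
invalid_after = label_registry.validate(observed)
```

With the registry, the validation code stays untouched; only the registry content changes.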

## Overview of validation properties

Importantly, LaminDB offers not only a `DataFrameCurator`, but also an `AnnDataCurator`, `MuDataCurator`, `SpatialDataCurator`, and `TiledbsomaCurator`.

The overview below only concerns validating dataframes.

### Experience of data engineer

| property                                                                                                                       | `pydantic`                                            | `pandera`                                             | `lamindb`                                                                             |
| ------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------------------------------------------- |
| define schema as code                                                                                                          | yes, in form of a `pydantic.BaseModel`                | yes, in form of a `pandera.DataFrameSchema`           | yes, in form of a `lamindb.Schema`                                                    |
| define schema as a set of constraints without the need of listing fields/columns/features; e.g. useful if validating 60k genes | no                                                    | no                                                    | yes                                                                                   |
| update labels independent of code                                                                                              | not possible because labels are enums/literals        | not possible because labels are hard-coded in `Check` | possible by adding new terms to a registry                                            |
| built-in validation from public ontologies                                                                                     | no                                                    | no                                                    | yes                                                                                   |
| sync labels with ELN/LIMS registries without code change                                                                       | no                                                    | no                                                    | yes                                                                                   |
| can re-use fields/columns/features across schemas                                                                              | limited via subclass                                  | only in same Python session                           | yes because persisted in database                                                     |
| schema modifications can invalidate previously validated datasets                                                              | yes                                                   | yes                                                   | no because LaminDB allows querying datasets that were validated with a schema version |
| can use columnar organization of dataframe                                                                                     | no, need to iterate over potentially millions of rows | yes                                                   | yes                                                                                   |

### Experience of data consumer

| property                                    | `pydantic`                                                                    | `pandera`             | `lamindb`                              |
| ------------------------------------------- | ----------------------------------------------------------------------------- | --------------------- | -------------------------------------- |
| dataset is queryable / findable             | no                                                                            | no                    | yes, by querying for labels & features |
| dataset is annotated                        | no                                                                            | no                    | yes                                    |
| user knows what validation constraints were | no, because might not have access to code and doesn't know which code was run | no (same as pydantic) | yes, via `artifact.schema`             |

## Annotation & queryability

### Engineer: annotate the dataset

Either use the `Curator` object:

```python
artifact = curator.save_artifact(key="our_datasets/dataset1.parquet")
```

If you don't expect a need for Curator functionality for updating ontologies and standardization, you can also use the `Artifact` constructor.

```python
artifact = ln.Artifact.from_dataframe(
    df, key="our_datasets/dataset1.parquet", schema=lamindb_schema
).save()
```

### Consumer: see annotations

```python
artifact.describe()
```

### Consumer: query the dataset

```python
ln.Artifact.filter(perturbation="IFNG").to_dataframe()
```

### Consumer: understand validation

By accessing `artifact.schema`, the consumer can understand _how_ the dataset was validated.

```python
artifact.schema
```

```python
artifact.schema.features.to_dataframe()
```

## Nested data with dynamic keys

We will now examine another more complex example where data is nested with potentially arbitrary (dynamic) keys.
The example is inspired by the [CELLxGENE schema](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/6.0.0/schema.md#uns-dataset-metadata) where annotations are stored as dictionaries in the AnnData `.uns` slot.

```python
uns_dict = ln.examples.datasets.dict_cellxgene_uns()
pprint.pprint(uns_dict)
```

### pydantic

Pydantic is primed to deal with nested data.

```python
class Images(pydantic.BaseModel):
    fullres: str
    hires: str


class Scalefactors(pydantic.BaseModel):
    spot_diameter_fullres: float
    tissue_hires_scalef: float


class Library(pydantic.BaseModel):
    images: Images
    scalefactors: Scalefactors


class Spatial(pydantic.BaseModel):
    is_single: bool
    model_config = {"extra": "allow"}

    def __init__(self, **data):
        libraries = {}
        other_fields = {}

        # store all libraries under a single key for validation
        for key, value in data.items():
            if key.startswith("library_"):
                libraries[key] = Library(**value)
            else:
                other_fields[key] = value

        other_fields["libraries"] = libraries
        super().__init__(**other_fields)


class SpatialDataSchema(pydantic.BaseModel):
    organism_ontology_term_id: str
    spatial: Spatial


validated_data = SpatialDataSchema(**uns_dict)
```

However, pydantic requires either that all dictionary keys are known beforehand to construct the model classes, or a workaround that collects all dynamic keys under a single field.

### pandera

Pandera cannot validate dictionaries because it is designed for structured dataframe data.
Therefore, we need to flatten the dictionary to transform it into a DataFrame:

```python
def _flatten_dict(d: dict[Any, Any], parent_key: str = "", sep: str = "_"):
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(_flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)
```
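
To see what the flattening produces, here is a small usage example. The input is a hypothetical, shortened dictionary and not the actual content of `uns_dict`; the helper is repeated so that the snippet runs on its own.

```python
from typing import Any


def _flatten_dict(d: dict[Any, Any], parent_key: str = "", sep: str = "_") -> dict:
    # same logic as above, repeated so that this snippet runs on its own
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(_flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)


# hypothetical, shortened input
example = {
    "organism_ontology_term_id": "NCBITaxon:9606",
    "spatial": {
        "is_single": True,
        "library_1": {"images": {"hires": "hires.png"}},
    },
}
flat_example = _flatten_dict(example)
```

The nested keys are joined into `spatial_is_single` and `spatial_library_1_images_hires`. Because the original keys also contain underscores, flattened keys cannot be split back into their nested form unambiguously when `sep="_"` is used.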

```python
def create_dynamic_schema(flattened_data: dict[str, Any]):
    schema_dict = {
        "organism_ontology_term_id": pandera.Column(str),
        "spatial_is_single": pandera.Column(bool),
    }

    for key in flattened_data.keys():
        if key.startswith("spatial_library_") and key.endswith("_images_fullres"):
            lib_prefix = key.replace("_images_fullres", "")
            schema_dict.update(
                {
                    f"{lib_prefix}_images_fullres": pandera.Column(str),
                    f"{lib_prefix}_images_hires": pandera.Column(str),
                    f"{lib_prefix}_scalefactors_spot_diameter_fullres": pandera.Column(
                        float
                    ),
                    f"{lib_prefix}_scalefactors_tissue_hires_scalef": pandera.Column(
                        float
                    ),
                }
            )

    return pandera.DataFrameSchema(schema_dict)


flattened = _flatten_dict(uns_dict)
df = pd.DataFrame([flattened])
spatial_schema = create_dynamic_schema(flattened)
validated_df = spatial_schema.validate(df)
```

Like pydantic, pandera has no out-of-the-box support for dynamically named keys.
Therefore, it is necessary to construct the pandera schema dynamically.

### LaminDB

Similarly, LaminDB currently requires constructing flattened dataframes to dynamically create features for the schema, which can then be used for validation with the DataFrameCurator.
Future improvements are expected, including support for a dictionary-specific curator.

```python
def create_dynamic_schema(flattened_data: dict[str, Any]) -> ln.Schema:
    features = []

    for key, value in flattened_data.items():
        if key == "organism_ontology_term_id":
            features.append(ln.Feature(name=key, dtype=bt.Organism.ontology_id).save())
        elif isinstance(value, bool):
            features.append(ln.Feature(name=key, dtype=bool).save())
        elif isinstance(value, (int, float)):
            features.append(ln.Feature(name=key, dtype=float).save())
        else:
            features.append(ln.Feature(name=key, dtype=str).save())

    return ln.Schema(name="Spatial data schema", features=features, coerce=True).save()


flattened = _flatten_dict(uns_dict)
flattened_df = pd.DataFrame([flattened])
spatial_schema = create_dynamic_schema(flattened)
curator = ln.curators.DataFrameCurator(flattened_df, spatial_schema)
curator.validate()
```

```{note}
Curators for scverse data structures allow for the specification of schema slots that access and validate dataframes in nested dictionary attributes like `.attrs` or `.uns`.
These schema slots use colon-separated paths like `'attrs:sample'` or `'uns:spatial:images'` to target specific dataframes for validation.
```
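
How a colon-separated path can address a nested entry is sketched below. This is an illustration only: `resolve_slot` and `toy_object` are hypothetical, and the sketch makes no claim about how the curators resolve slots internally.

```python
from typing import Any


def resolve_slot(container: dict, slot: str) -> Any:
    """Walk a nested dictionary along a colon-separated path."""
    current: Any = container
    for part in slot.split(":"):
        if not isinstance(current, dict) or part not in current:
            raise KeyError(f"slot {slot!r} cannot be resolved at {part!r}")
        current = current[part]
    return current


# hypothetical stand-in for an object with an `uns` dictionary
toy_object = {"uns": {"spatial": {"images": {"hires": "hires.png"}}}}
images = resolve_slot(toy_object, "uns:spatial:images")
```

A path that does not exist, such as `"uns:missing"`, raises a `KeyError` in this sketch instead of silently returning nothing.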


================================================
FILE: docs/faq/reference-field.md
================================================
---
execute_via: python
---

# Where to store external links and IDs?

When registering data in LaminDB, you might want to store a reference link or ID to indicate the source of the collection.

The `reference` and `reference_type` fields serve this purpose. They are available for {class}`~lamindb.Collection`, {class}`~lamindb.Transform`, {class}`~lamindb.Run`, and {class}`~lamindb.Record`.

```python
# !pip install lamindb
!lamin init --storage testreference
```

```python
import lamindb as ln
```

Let's say we have a few donor samples that came from Vendor X. To trace back the orders, we'd like to keep track of the donor IDs provided by the vendor:

```python
ln.Record(
    name="donor 001", reference="VX984545", reference_type="Donor ID from Vendor X"
)
```

```python
!lamin delete --force testreference
```


================================================
FILE: docs/faq/search.md
================================================
---
execute_via: python
---

# How does search work?

```python
from laminci.db import setup_local_test_postgres

pgurl = setup_local_test_postgres()
!lamin init --name benchmark_search --db {pgurl} --modules bionty --storage ./benchmark_search
```

Here we show how to perform text search on `SQLRecord` and evaluate some search queries for the {class}`bionty.CellType` ontology.

```python
import lamindb as ln
import bionty as bt

SEARCH_QUERIES_EXACT = (
    "t cell",
    "stem cell",
    "b cell",
    "regulatory B cell",
    "Be2 cell",
    "adipocyte",
)
SEARCH_QUERIES_CONTAINS = ("t cel", "t-cel", "neural", "kidney", "kidne")
TOP_N = 20

bt.CellType.import_source()
```

```python
ln.Record(name="cat[*_*]").save()
```

## Search the registry

```python
for query in SEARCH_QUERIES_EXACT:
    print("Query:", query)
    qs = bt.CellType.search(query)
    display(qs.to_dataframe())

    assert query.lower() == qs[0].name.lower()
```

```python
for query in SEARCH_QUERIES_CONTAINS:
    print("Query:", query)
    qs = bt.CellType.search(query)
    display(qs.to_dataframe())

    top_record = qs[0]
    query = query.lower()
    assert query in top_record.name.lower() or query in top_record.synonyms.lower()
```

Check escaping of special characters.

```python
assert len(ln.Record.search("cat[")) == 1
```

```python
assert len(ln.Record.search("*_*")) == 1
```
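
Both queries contain characters that are special in pattern languages. The sketch below uses the standard library `re` module to show why escaping matters; it does not show how `search()` is implemented.

```python
import re

name = "cat[*_*]"

# an unescaped "[" opens a character class, so the raw query is not a valid pattern
try:
    re.compile("cat[")
    raw_query_compiles = True
except re.error:
    raw_query_compiles = False

# escaping turns every special character into a literal one
escaped_bracket = re.escape("cat[")
escaped_stars = re.escape("*_*")
bracket_match = re.search(escaped_bracket, name)
stars_match = re.search(escaped_stars, name)
```

The raw query fails to compile, whereas both escaped queries match the literal text in the record name.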

## Search the public ontology

```python
ct_public = bt.CellType.public()

df = ct_public.search("b cell", limit=20)
assert df.iloc[0]["name"] == "B cell"
df
```

```python
!docker stop pgtest && docker rm pgtest
!lamin delete --force benchmark_search
```


================================================
FILE: docs/faq/symbol-mapping.md
================================================
---
execute_via: python
---

# Why should I not index datasets with gene symbols?

Gene symbols are widely used for readability, particularly for visualization. However, indexing datasets with gene symbols presents challenges:

- A single gene may have multiple symbols or aliases.
- Gene symbols change over time (e.g., _BRCA2_ was once _FACD_) without version tracking.
- The same symbol can represent different genes across species.
- Symbols may be misinterpreted by software (e.g., _SEPT9_ as "September 9" in Excel).
- Formatting inconsistencies exist (e.g., case sensitivity, special characters).

Using unique identifiers like ENSEMBL gene IDs addresses these issues by providing:

- A direct, stable mapping to genomic coordinates.
- Consistency across databases.
- Species-specific prefixes to prevent cross-species confusion.
- Unique, permanent identifiers with standardized formatting.

Storing ENSEMBL gene IDs alongside gene symbols offers readability for visualization while maintaining robust data integrity. During curation, validating against ENSEMBL gene IDs ensures accurate mapping.

If only symbols are available for a dataset, you can map them to ENSEMBL IDs using {meth}`~bionty.Gene.standardize`.

```python
# !pip install 'lamindb[bionty]'
!lamin init --storage test-symbols --modules bionty
```

```python
import lamindb as ln
import bionty as bt
import numpy as np
import pandas as pd
import anndata as ad

# create example AnnData object with gene symbols
rng = np.random.default_rng(42)
X = rng.integers(0, 100, size=(5, 10))
var = pd.DataFrame(
    index=pd.Index(
        [
            "BRCA1",
            "TP53",
            "EGFR",
            "KRAS",
            "PTEN",
            "MYC",
            "VEGFA",
            "IL6",
            "TNF",
            "GAPDH",
        ],
        name="symbol",
    )
)
adata = ad.AnnData(X=X, var=var)
adata.var
```

```python
# map Gene symbols to ENSEMBL IDs
gene_mapper = bt.Gene.standardize(
    adata.var.index,
    field=bt.Gene.symbol,
    return_field=bt.Gene.ensembl_gene_id,
    return_mapper=True,
    organism="human",
)
adata.var["ensembl_id"] = adata.var.index.map(
    lambda gene_id: gene_mapper.get(gene_id, gene_id)
)
adata.var
```
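
Note that the `.get(gene_id, gene_id)` pattern above passes unmapped symbols through unchanged, so the `ensembl_id` column can still contain symbols. A minimal sketch of making this explicit, with a hypothetical mapper restricted to two genes and one made-up symbol:

```python
# hypothetical mapper, restricted to two symbols for brevity
toy_mapper = {"BRCA1": "ENSG00000012048", "TP53": "ENSG00000141510"}
symbols = ["BRCA1", "TP53", "NOT_A_GENE"]

# mapped symbols are replaced, unmapped ones are passed through unchanged
mapped = [toy_mapper.get(symbol, symbol) for symbol in symbols]
# collect the pass-through values so they can be reviewed before validation
unmapped = [symbol for symbol in symbols if symbol not in toy_mapper]
```

Reviewing `unmapped` before validation helps to distinguish typos from symbols that are missing in the registry.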

```python
standardized_genes = bt.Gene.from_values(
    [
        "ENSG00000141510",
        "ENSG00000133703",
        "ENSG00000111640",
        "ENSG00000171862",
        "ENSG00000204490",
        "ENSG00000112715",
        "ENSG00000146648",
        "ENSG00000136997",
        "ENSG00000012048",
        "ENSG00000136244",
    ],
    field=bt.Gene.ensembl_gene_id,
    organism="human",
)
ln.save(standardized_genes)
```

This allows for validating the `ensembl_id` column against the `Gene` registry using the `bt.Gene.ensembl_gene_id` field.

```python
bt.Gene.validate(adata.var["ensembl_id"], field=bt.Gene.ensembl_gene_id)
```

```{note}
Gene symbols do not map one-to-one with ENSEMBL IDs. A single gene symbol may correspond to multiple ENSEMBL IDs due to:

1. **Gene Paralogs**: Similar symbols can be shared among paralogous genes within the same species, resulting in one symbol linking to multiple ENSEMBL IDs.
2. **Pseudogenes**: Some symbols represent both functional genes and their non-functional pseudogenes, each with distinct ENSEMBL IDs.
3. **Transcript Variants**: One symbol may map to multiple ENSEMBL transcript IDs, each representing different isoforms or splice variants.

{meth}`~bionty.Gene.standardize` retrieves the first match in cases of multiple hits, which is generally sufficient but not perfectly accurate.
```
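
The "first match" behavior can be pictured as follows. This is a sketch with placeholder data: the symbols and IDs are hypothetical and not real ENSEMBL IDs, and the code does not reproduce the internals of `standardize`.

```python
from collections import defaultdict

# hypothetical symbol-to-ID pairs; the IDs are placeholders
pairs = [
    ("GENE_A", "ID_0001"),
    ("GENE_B", "ID_0002"),
    ("GENE_A", "ID_0003"),  # second hit for GENE_A
]

hits = defaultdict(list)
for symbol, gene_id in pairs:
    hits[symbol].append(gene_id)

# keep the first hit per symbol, and report symbols with more than one hit
first_match = {symbol: ids[0] for symbol, ids in hits.items()}
ambiguous = sorted(symbol for symbol, ids in hits.items() if len(ids) > 1)
```

Listing the ambiguous symbols makes it possible to resolve them manually where the first match is not good enough.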

```python
!lamin delete --force test-symbols
```


================================================
FILE: docs/faq/test_notebooks.py
================================================
from pathlib import Path

import nbproject_test as test

import lamindb as ln


def test_notebooks():
    nbdir = Path(__file__).parent
    ln.setup.login("testuser1")
    ln.setup.init(storage=nbdir / "mydata")
    test.execute_notebooks(nbdir, write=True)


================================================
FILE: docs/faq/track-run-inputs.md
================================================
---
execute_via: python
---

# Can I disable tracking run inputs?

Yes, if you switch {attr}`~lamindb.core.Settings.track_run_inputs` to `False`.

```python
# pip install lamindb
!lamin init --storage test-run-inputs
```

```python
import lamindb as ln
```

Some test artifacts:

```python
ln.track(transform=ln.Transform(key="Dummy pipeline"))
ln.Artifact(ln.examples.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact(ln.examples.datasets.file_mini_csv(), description="My csv").save()
```

Call `ln.track()`:

```python
ln.track("Rx2s9aPTMQLY0000")
```

## Don't track artifact as run input

```python
ln.settings.track_run_inputs = False
```

```python
artifact = ln.Artifact.get(description="My image")
```

```python
artifact.cache()
```

No run inputs are linked to the current notebook run:

```python
ln.Run.get(id=ln.context.run.id).input_artifacts.all()
```

```python
artifact.view_lineage()
```

```python
assert len(ln.Run.get(id=ln.context.run.id).input_artifacts.all()) == 0
```

## Manually track artifact as run input

Let us manually track an artifact by passing `is_run_input` to either `.cache()`, `.load()` or `.open()`:

```python
artifact.cache(is_run_input=True)
```

You can see that the image artifact has now been added to the run inputs:

```python
for input in ln.Run.get(id=ln.context.run.id).input_artifacts.all():
    print(input)
```

```python
artifact.view_lineage()
```

```python
assert len(ln.Run.get(id=ln.context.run.id).input_artifacts.all()) == 1
```

## Automatically track artifacts as run input

If you switch on the following setting, any call to `.load()`, `.cache()`, or `.open()` will track the artifact as run input.

```python
ln.settings.track_run_inputs = True
```

```python
artifact = ln.Artifact.get(description="My csv")
```

```python
artifact.load()
```

```python
for input in ln.Run.get(id=ln.context.run.id).input_artifacts.all():
    print(input)
```

```python
artifact.view_lineage()
```

```python
assert len(ln.Run.get(id=ln.context.run.id).input_artifacts.all()) == 2
```
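
The interplay between the global setting and the `is_run_input` argument seen above can be summarized in a small decision function. This is a hypothetical helper for illustration, not LaminDB code; it assumes that an explicitly passed argument takes precedence over the setting.

```python
from typing import Optional


def should_track_input(is_run_input: Optional[bool], track_run_inputs: bool) -> bool:
    """Hypothetical helper: an explicit argument wins over the global setting."""
    if is_run_input is not None:
        return is_run_input
    return track_run_inputs


# explicit argument while the setting is off (first manual example above)
explicit_on = should_track_input(True, track_run_inputs=False)
# no argument while the setting is off
default_off = should_track_input(None, track_run_inputs=False)
# no argument while the setting is on
default_on = should_track_input(None, track_run_inputs=True)
```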

```python
!lamin delete --force test-run-inputs
```


================================================
FILE: docs/faq/trash-archive.md
================================================
# How do I trash or archive objects?

Any object in LaminDB has the following 3 levels of visibility through 3 default branches:

- `main`: visible
- `archive`: excluded from query & search
- `trash`: excluded from query & search, scheduled for deletion

Let's look at an example for an `Artifact` object while noting that the same applies to any other `SQLRecord`.

```python
import lamindb as ln
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
artifact = ln.Artifact.from_dataframe(df, key="dataset.parquet").save()
```

An artifact is by default created on the `main` branch.

```python
assert artifact.branch.name == "main"
ln.Artifact.filter(key="dataset.parquet").to_dataframe()
# the artifact shows up
```

If you delete an artifact, it gets moved into the `trash` branch.

```python
artifact.delete()
assert artifact.branch.name == "trash"
```

Artifacts in trash won't show up in queries with default arguments:

```python
ln.Artifact.filter(key="dataset.parquet").to_dataframe()
# the artifact does not show up
```

You can query for them by adding the `trash` branch to the filter.

```python
ln.Artifact.filter(key="dataset.parquet", branch__name="trash").to_dataframe()
# the artifact shows up
```

You can restore an artifact from trash:

```python
artifact.restore()
ln.Artifact.filter(key="dataset.parquet").to_dataframe()
# the artifact shows up
```
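
Conceptually, trashing and restoring only change the branch of a record, and default queries only look at `main`. The following toy model sketches this idea; `ToyRecord` and `toy_query` are hypothetical and do not reflect the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyRecord:
    """Hypothetical record that lives on exactly one branch."""

    key: str
    branch: str = "main"


def toy_query(records, key, branch="main"):
    # default queries only see the `main` branch
    return [r for r in records if r.key == key and r.branch == branch]


toy_records = [ToyRecord("dataset.parquet")]
visible_initially = toy_query(toy_records, "dataset.parquet")
toy_records[0].branch = "trash"  # what a non-permanent delete amounts to here
visible_after_delete = toy_query(toy_records, "dataset.parquet")
found_in_trash = toy_query(toy_records, "dataset.parquet", branch="trash")
toy_records[0].branch = "main"  # restore
visible_after_restore = toy_query(toy_records, "dataset.parquet")
```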


================================================
FILE: docs/faq/validate-fields.md
================================================
---
execute_via: python
---

# Django field validation

[Django field validation](https://docs.djangoproject.com/en/5.1/ref/validators/) is enabled for models that inherit the `ValidateFields` class.

```python
# pip install lamindb
!lamin init --storage ./test-django-validation
```

```python
import lamindb as ln
from lamindb.core.exceptions import FieldValidationError
```

```python
try:
    ln.Reference(name="my ref", doi="abc.ef", url="myurl.com")
except FieldValidationError as e:
    print(e)
```

```python
!lamin delete --force test-django-validation
```


================================================
FILE: docs/faq.md
================================================
# FAQ

```{toctree}
:maxdepth: 1

faq/pydantic-pandera
faq/idempotency
faq/acid
faq/track-run-inputs
faq/curate-any
faq/import-modules
faq/reference-field
faq/trash-archive
faq/keep-artifacts-local
faq/validate-fields
faq/symbol-mapping
faq/search
```


================================================
FILE: docs/guide.md
================================================
# Guide

```{toctree}
:hidden:
:caption: "Overview"

README
```

```{toctree}
:hidden:
:caption: "How to"

query-search
track
organize
manage-changes
manage-ontologies
sync
```

```{toctree}
:hidden:
:caption: Use cases

lightning
```

```{toctree}
:hidden:
:caption: Other topics

faq
storage
```


================================================
FILE: docs/index.md
================================================
```{include} ../README.md
:start-line: 0
:end-line: 5
```

<meta http-equiv="Refresh" content="0; url=./guide.html" />

```{toctree}
:maxdepth: 1
:hidden:

guide
api
changelog
```


================================================
FILE: docs/lightning.md
================================================
# Lightning

This guide offers more context on the {class}`lamindb.integrations.lightning.Checkpoint` callback. For end-to-end examples, see the following guides:

- {doc}`docs:clearml`
- {doc}`docs:wandb`
- {doc}`docs:mlflow`

## Quickstart

Pass `ll.Checkpoint` and a logger into `Trainer`. The logger is what gives
checkpoints meaningful, namespaced artifact keys — without it, keys fall back
to a bare `checkpoints/` prefix (or just the run UID when `ln.track()` is
active).

Any logger implementing Lightning's `Logger` interface works (`TensorBoardLogger`,
`WandbLogger`, `MLFlowLogger`, `CSVLogger`, etc.). We use `TensorBoardLogger`
in the examples below.

```python
import lamindb as ln
import lightning.pytorch as pl
from lightning.pytorch.loggers import TensorBoardLogger
from lamindb.integrations import lightning as ll

ln.track()

logger = TensorBoardLogger(save_dir="logs")
checkpoint = ll.Checkpoint(monitor="val_loss", mode="min", save_top_k=3)

trainer = pl.Trainer(
    max_epochs=10,
    callbacks=[checkpoint],
    logger=logger,
)
trainer.fit(model, datamodule=datamodule)
```

After training, each saved checkpoint file is a LaminDB artifact:

```python
checkpoint.last_checkpoint_artifact
checkpoint.last_checkpoint_artifact.key
# e.g. "logs/lightning_logs/2r5pIRnK7z0q/checkpoints/epoch=0-step=100.ckpt"

checkpoint.checkpoint_key_prefix
# e.g. "logs/lightning_logs/2r5pIRnK7z0q/checkpoints"
```

### How is a run organized?

A Lightning `Trainer` coordinates three concerns during training:

1. **Logger** — writes metrics (loss curves, learning rate, etc.) to a dashboard directory. The logger determines the local directory layout: `{save_dir}/{name}/{version}/`.
2. **ModelCheckpoint** — saves model snapshots (`.ckpt` files) into a `checkpoints/` subdirectory underneath the logger's directory.
3. **SaveConfigCallback** — when using `LightningCLI`, writes the fully resolved `config.yaml` into the logger's directory so you can reproduce exactly which hyperparameters were used.

All three share the same directory tree. The logger creates it, the checkpoint callback writes into it, and the config callback stores beside it:

```
logs/                          # logger save_dir
  lightning_logs/              # logger name
    version_0/                 # logger version (local filesystem)
      events.out.tfevents.*    # ← logger output (TensorBoard)
      config.yaml              # ← SaveConfigCallback
      checkpoints/
        epoch=0-step=100.ckpt  # ← ModelCheckpoint
        epoch=1-step=200.ckpt
        hparams.yaml           # ← auto-generated by Lightning
```

LaminDB's integration replaces `ModelCheckpoint` with `ll.Checkpoint` and
Lightning's `SaveConfigCallback` with `ll.SaveConfigCallback`. Checkpoint
files, the config, and `hparams.yaml` become `lamindb.Artifact` records with
lineage tracking and optional feature annotations.

Note that artifact keys in LaminDB do **not** mirror the local directory layout
exactly — the callback uses the LaminDB run UID instead of Lightning's
auto-incrementing `version_N` directory by default. See
[How are artifact keys derived?](#how-are-artifact-keys-derived) for details.

### Which kind of artifacts?

`Checkpoint` saves three kinds of artifacts:

| Kind         | Example key                           | When                                     |
| ------------ | ------------------------------------- | ---------------------------------------- |
| `checkpoint` | `…/checkpoints/epoch=0-step=100.ckpt` | Every time Lightning writes a checkpoint |
| `config`     | `…/config.yaml`                       | When using `ll.SaveConfigCallback`       |
| `hparams`    | `…/checkpoints/hparams.yaml`          | When Lightning generates it              |

Checkpoints and `hparams.yaml` live under the `checkpoints/` subdirectory,
while the config sits directly under the base prefix.

The callback tracks the latest artifact of each kind:

```python
checkpoint.last_checkpoint_artifact
checkpoint.last_config_artifact
checkpoint.last_hparams_artifact
checkpoint.last_artifact_event
```

### How is data lineage tracked?

When a run is being tracked with `ln.track()`:

- `checkpoint` artifacts are recorded as **run outputs** — they are produced by the training run.
- `config` artifacts are recorded as **run inputs** — the resolved config is part of the run specification.
- `hparams.yaml` is saved as an artifact but not linked as a run input.

## How are artifact keys derived?

LaminDB artifact keys are **not** necessarily a mirror of the local filesystem layout.
Lightning uses auto-incrementing version directories (`version_0`, `version_1`,
…) on disk, but these are meaningless as artifact identifiers — they depend on
what already exists locally and cannot reliably distinguish runs across
machines.

Instead, when `ln.track()` is active, the callback uses the **LaminDB run UID**
as the version segment by default (`run_uid_is_version=True`). This guarantees
that every tracked run produces unique artifact keys regardless of local state.

The base prefix is determined by priority:

| Scenario                 | Base prefix                            |
| ------------------------ | -------------------------------------- |
| `dirpath` set (± logger) | `{dirpath}/{run_uid}`                  |
| No `dirpath` + logger    | `{save_dir_basename}/{name}/{run_uid}` |
| No `dirpath` + no logger | `{run_uid}`                            |

`run_uid` above refers to the active LaminDB run UID (from `ln.context.run.uid`).
When no run is tracked or `run_uid_is_version=False`, the callback falls back
to the logger's own version (e.g. `version_0`) or omits the segment entirely.

**Checkpoint & hparams keys:**

| Scenario                      | LaminDB key pattern                                           |
| ----------------------------- | ------------------------------------------------------------- |
| Logger present (recommended)  | `{save_dir_basename}/{name}/{run_uid}/checkpoints/{filename}` |
| No logger, explicit `dirpath` | `{dirpath}/{run_uid}/checkpoints/{filename}`                  |
| No logger, no `dirpath`       | `{run_uid}/checkpoints/{filename}`                            |

**Config keys:**

| Scenario                      | Key pattern                                        |
| ----------------------------- | -------------------------------------------------- |
| Logger present                | `{save_dir_basename}/{name}/{run_uid}/config.yaml` |
| No logger, explicit `dirpath` | `{dirpath}/{run_uid}/config.yaml`                  |
| No logger, no `dirpath`       | `{run_uid}/config.yaml`                            |

For example, with `TensorBoardLogger(save_dir="logs")` and a tracked run:

```
logs/lightning_logs/2r5pIRnK7z0q/       # base prefix ({save_dir_basename}/{name}/{run_uid})
  config.yaml                            # ← config artifact
  checkpoints/
    epoch=0-step=100.ckpt                # ← checkpoint artifact
    hparams.yaml                         # ← hparams artifact
```
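
The priority rules and key patterns above can be sketched in plain Python. This is an illustrative re-implementation of the tables, not the code that `ll.Checkpoint` runs; the function names `derive_base_prefix`, `checkpoint_key`, and `config_key` are hypothetical, and how `dirpath` combines with a logger version when `run_uid_is_version=False` is an assumption.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def derive_base_prefix(
    dirpath: str | None = None,
    save_dir: str | None = None,
    logger_name: str | None = None,
    logger_version: str | None = None,
    run_uid: str | None = None,
    run_uid_is_version: bool = True,
) -> str:
    """Toy model of the base-prefix priority table (hypothetical helper)."""
    # version segment: the run UID if a run is tracked and the option is on,
    # otherwise the logger's own version, otherwise no segment at all
    if run_uid is not None and run_uid_is_version:
        version = run_uid
    else:
        version = logger_version
    if dirpath is not None:  # an explicit dirpath wins over the logger
        parts = [dirpath]
    elif save_dir is not None and logger_name is not None:
        parts = [PurePosixPath(save_dir).name, logger_name]
    else:  # neither dirpath nor logger
        parts = []
    if version:
        parts.append(version)
    return "/".join(parts)


def checkpoint_key(base_prefix: str, filename: str) -> str:
    # checkpoints and hparams.yaml live under the checkpoints/ subdirectory
    return "/".join(p for p in (base_prefix, "checkpoints", filename) if p)


def config_key(base_prefix: str) -> str:
    # the config sits directly under the base prefix
    return "/".join(p for p in (base_prefix, "config.yaml") if p)


base = derive_base_prefix(
    save_dir="logs",
    logger_name="lightning_logs",
    logger_version="version_0",
    run_uid="2r5pIRnK7z0q",
)
print(base)
print(checkpoint_key(base, "epoch=0-step=100.ckpt"))
print(config_key(base))
```

With the inputs shown, the three printed keys match the example tree above.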

### Opting out of run UID keys

Pass `run_uid_is_version=False` to fall back to the logger-managed version
directory, matching Lightning's local layout more closely:

```python
checkpoint = ll.Checkpoint(
    monitor="val_loss",
    run_uid_is_version=False,
)
```

With this setting, the key uses the logger's version (`version_0`, etc.)
instead of the run UID. This is mainly useful when you don't call `ln.track()`
or when you want artifact keys that exactly mirror the local directory tree.

### Why run UIDs instead of `version_N`?

Lightning's auto-incrementing `version_N` depends on what directories already
exist at `save_dir`. Two runs on different machines — or the same machine after
clearing `logs/` — can both produce `version_0`. With `run_uid_is_version=True`
(the default), each tracked run gets a unique prefix derived from the Lamin
run, so artifact keys never collide.
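
To make the collision concrete, here is a small standard-library sketch that imitates an auto-incrementing `version_N` scheme. The helper `next_version_dir` is hypothetical and only approximates what a Lightning logger does; the two temporary directories stand in for two machines with empty log folders.

```python
import tempfile
from pathlib import Path


def next_version_dir(root: Path) -> str:
    """Return the next free `version_N` name under `root` (simplified imitation)."""
    existing = [
        int(p.name.split("_", 1)[1])
        for p in root.glob("version_*")
        if p.is_dir() and p.name.split("_", 1)[1].isdigit()
    ]
    return f"version_{max(existing) + 1 if existing else 0}"


# two "machines", each starting with an empty log directory
with tempfile.TemporaryDirectory() as machine_a, tempfile.TemporaryDirectory() as machine_b:
    first_on_a = next_version_dir(Path(machine_a))
    first_on_b = next_version_dir(Path(machine_b))
    # only after the first run exists locally does the counter move on
    (Path(machine_a) / first_on_a).mkdir()
    second_on_a = next_version_dir(Path(machine_a))

print(first_on_a, first_on_b, second_on_a)
```

Both first runs receive the same version name because the counter only sees local state, which is the situation a run UID avoids.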

## Use with the Lightning CLI

The Lightning CLI resolves a YAML config into concrete model and data module
arguments. To also store that resolved config as a LaminDB artifact, pass
`ll.SaveConfigCallback` in your training script and declare the trainer,
logger, callbacks, model, and data in a config file.

**`config.yaml`**

```yaml
trainer:
  max_epochs: 10

  logger:
    class_path: lightning.pytorch.loggers.TensorBoardLogger
    init_args:
      save_dir: logs

  callbacks:
    - class_path: lamindb.integrations.lightning.Checkpoint
      init_args:
        monitor: val/loss
        mode: min
        save_top_k: 3

model:
  learning_rate: 1.0e-3

data:
  batch_size: 64
```

**`train.py`**

```python
import lamindb as ln
from lightning.pytorch.cli import LightningCLI
from lamindb.integrations.lightning import SaveConfigCallback

ln.track()

def cli_main() -> None:
    LightningCLI(
        model_class=MyModel,
        datamodule_class=MyDataModule,
        save_config_callback=SaveConfigCallback,
    )

if __name__ == "__main__":
    cli_main()
```

```bash
python train.py fit --config config.yaml
```

`ll.SaveConfigCallback` extends Lightning's built-in version: it writes the
local file as usual and then delegates to whichever
`ArtifactPublishingModelCheckpoint` is registered on the trainer to persist the
config as an artifact.

## Annotating with features

Attach custom run-level and artifact-level feature values through `features=`:

```python
logger = TensorBoardLogger(save_dir="logs")
checkpoint = ll.Checkpoint(
    monitor="val_loss",
    features={
        "run": {"training_framework": "lightning"},
        "artifact": {"dataset_version": "2026-03"},
    },
)

trainer = pl.Trainer(callbacks=[checkpoint], logger=logger)
```

Feature names must already exist in Lamin.

The callback can also auto-track standard Lightning fields. Create the
corresponding LaminDB features once:

```python
ll.save_lightning_features()
```

This enables auto-features:

- Artifact-level: `is_best_model`, `is_last_model`, `score`, `model_rank`, `save_weights_only`, `monitor`, `mode`
- Run-level: `logger_name`, `logger_version`, `max_epochs`, `max_steps`,
  `precision`, `accumulate_grad_batches`, `gradient_clip_val`, `monitor`, `mode`

## Extending the callback

### Subclass `Checkpoint`

Subclass when you want to keep LaminDB persistence and additionally notify an
external system after each artifact is saved:

```python
from lamindb.integrations import lightning as ll
from my_model_registry import ModelRegistry


class ModelRegistryCheckpoint(ll.Checkpoint):
    """Register each checkpoint in an external model registry."""

    def __init__(self, *args, registry_project: str, **kwargs):
        super().__init__(*args, **kwargs)
        self.registry_project = registry_project
        self.model_registry = ModelRegistry()

    def on_artifact_saved(self, event: ll.ArtifactSavedEvent) -> None:
        if event.kind == "checkpoint":
            # register the model in your external system
            self.model_registry.register(
                project=self.registry_project,
                model_uri=event.storage_uri,
                metadata={"lamin_key": event.key},
            )


logger = TensorBoardLogger(save_dir="logs")
checkpoint = ModelRegistryCheckpoint(
    registry_project="my-project",
    monitor="val_loss",
    save_top_k=3,
)
trainer = pl.Trainer(callbacks=[checkpoint], logger=logger)
trainer.fit(model, datamodule=datamodule)
```

Each event gives you:

- `event.kind`: `"checkpoint"`, `"config"`, or `"hparams"`
- `event.artifact`: the persisted LaminDB artifact
- `event.key`: the LaminDB artifact key
- `event.local_path`: the local file path Lightning wrote
- `event.storage_uri`: the stable storage URI for downstream systems

### Attach an observer

Observers are useful when you want composition instead of inheritance:

```python
from lamindb.integrations import lightning as ll


class ArtifactLogger:
    def on_artifact_saved(self, event: ll.ArtifactSavedEvent) -> None:
        print(event.kind, event.storage_uri)

    def on_artifact_removed(self, event: ll.ArtifactRemovedEvent) -> None:
        print("removed", event.key)


logger = TensorBoardLogger(save_dir="logs")
checkpoint = ll.Checkpoint(
    monitor="val_loss",
    artifact_observers=[ArtifactLogger()],
)

trainer = pl.Trainer(callbacks=[checkpoint], logger=logger)
trainer.fit(model, datamodule=datamodule)
```

Observers receive the same events that subclasses see.
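
Because observers only need the two `on_artifact_*` methods, you can exercise one without running a training loop. The sketch below is an assumption-laden illustration: `UriCollector` is a hypothetical observer, the events are `SimpleNamespace` stand-ins carrying only the attributes listed above (`kind`, `key`, `storage_uri`), and the keys and URIs are made up.

```python
from collections import defaultdict
from types import SimpleNamespace


class UriCollector:
    """Observer that groups storage URIs by artifact kind (hypothetical example)."""

    def __init__(self) -> None:
        self.uris_by_kind = defaultdict(list)
        self.removed_keys = []

    def on_artifact_saved(self, event) -> None:
        self.uris_by_kind[event.kind].append(event.storage_uri)

    def on_artifact_removed(self, event) -> None:
        self.removed_keys.append(event.key)


# stand-in events; in a real run the callback passes its own event objects
collector = UriCollector()
collector.on_artifact_saved(
    SimpleNamespace(kind="checkpoint", key="run/checkpoints/a.ckpt", storage_uri="s3://bucket/a.ckpt")
)
collector.on_artifact_saved(
    SimpleNamespace(kind="config", key="run/config.yaml", storage_uri="s3://bucket/config.yaml")
)
collector.on_artifact_removed(SimpleNamespace(key="run/checkpoints/a.ckpt"))

print(dict(collector.uris_by_kind))
print(collector.removed_keys)
```

An instance of such a class would then be passed through `artifact_observers=[...]` as in the example above.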

## Integrating other systems

To register checkpoints in another system (e.g. ClearML, Weights & Biases,
MLflow, Neptune, or Comet), use the artifact lifecycle events rather than
re-deriving paths from Lightning internals.

The key hand-off value is `event.storage_uri`, which resolves to the persisted
artifact location. `event.artifact` gives you the full LaminDB record when you
need metadata beyond the URI.


================================================
FILE: docs/manage-changes.md
================================================
# Manage changes

Managing changes in LaminDB is largely analogous to managing code changes via branching in git and Pull Requests in GitHub.

For usage examples, read the `Examples` section of the {class}`~lamindb.Branch` class.


================================================
FILE: docs/manage-ontologies.md
================================================
---
execute_via: python
---

# Manage biological ontologies

This guide shows how to manage ontologies for basic biological entities.

```{raw} html
<iframe width="560" height="315" src="https://www.youtube.com/embed/3vpWjHj3Kw8?si=D0jxqL2zB4idh2QA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
```

If instead you're interested in

- accessing public ontologies, see {doc}`docs:public-ontologies`
- flexible bio registries for the wetlab (a LIMS), see {class}`~lamindb.Record` and {doc}`docs:records`

```python
# pip install lamindb
!lamin init --storage ./test-ontologies --modules bionty
```

## Import records from public ontologies

Let's first populate our {class}`~bionty.CellType` registry with the default public ontology (Cell Ontology).

```python
import lamindb as ln
import bionty as bt

# inspect the available public ontology versions
bt.Source.to_dataframe()
```

```python
# inspect which ontology version we're about to import
bt.Source.get(entity="bionty.CellType", currently_used=True)
```

```python
# populate the database with a public ontology
bt.CellType.import_source()
```

This is now your in-house cell type ontology in which you can add & modify records as you like. It's a registry just like `Artifact` or `Record`.

```python
# all public cell types are now available in LaminDB
bt.CellType.to_dataframe()
```

```python
# let's also populate the Gene registry with human and mouse genes
bt.Gene.import_source(organism="human")
bt.Gene.import_source(organism="mouse")
```

## Access records in in-house registries

Search by keyword:

```python
bt.CellType.search("gamma-delta T").to_dataframe().head(2)
```

Or look up with auto-complete:

```python
cell_types = bt.CellType.lookup()
hsc_record = cell_types.hematopoietic_stem_cell
hsc_record
```

Filter by fields and relationships:

```python
gdt_cell = bt.CellType.get(ontology_id="CL:0000798", created_by__handle="testuser1")
gdt_cell
```

View the ontological hierarchy:

```python
gdt_cell.view_parents()  # pass with_children=True to also view children
```

Or access the parents and children directly:

```python
gdt_cell.parents.to_dataframe()
```

```python
gdt_cell.children.to_dataframe()
```

It is also possible to recursively query parents or children, getting direct parents (children), their parents, and so forth.

```python
gdt_cell.query_parents().to_dataframe()
```

```python
gdt_cell.query_children().to_dataframe()
```

## Construct custom hierarchies of records

You can add a child of a parent record:

```python
# register a new cell type
my_celltype = bt.CellType(name="my new T-cell subtype").save()
# specify "gamma-delta T cell" as a parent
my_celltype.parents.add(gdt_cell)

# visualize hierarchy
my_celltype.view_parents(distance=3)
```

## Create new records

When accessing datasets, one often encounters bulk references to entities that might be corrupted or standardized using different standardization schemes.

Let's consider an example based on an `AnnData` object. In the `cell_type` annotations of this object, we find 4 references to cell types:

```python
adata = ln.examples.datasets.anndata_with_obs()
adata.obs.cell_type.value_counts()
```

We'd like to load the corresponding records in our in-house registry to annotate a dataset.

To this end, you'll typically use {class}`~lamindb.models.CanCurate.from_values`, which will both validate & retrieve records that match the values.

```python
cell_types = bt.CellType.from_values(adata.obs.cell_type)
cell_types
```

Logging informed us that 3 cell types were validated. Since we loaded these records at the same time, we could readily use them to annotate a dataset.

:::{dropdown} What happened under-the-hood?

`.from_values()` performs the following look ups:

1. If registry records match the values, load these records
2. If values match synonyms of registry records, load these records
3. If no record in the registry matches, attempt to load records from a public ontology
4. Same as 3. but based on synonyms

No records will be returned if all 4 look ups are unsuccessful.
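
The cascade can be pictured with a toy function over plain dictionaries. This is a sketch of the look-up order only, not how `.from_values()` is implemented; `resolve_values` is a hypothetical helper, and the names and synonyms are illustrative data rather than entries taken from an ontology.

```python
def resolve_values(values, registry, registry_synonyms, public, public_synonyms):
    """Toy version of the four look-ups; each mapping goes from value to record name."""
    found, unmatched = [], []
    for value in values:
        # try the four sources in priority order and stop at the first hit
        for source in (registry, registry_synonyms, public, public_synonyms):
            if value in source:
                found.append(source[value])
                break
        else:  # no source matched this value
            unmatched.append(value)
    return found, unmatched


found, unmatched = resolve_values(
    ["T cell", "T lymphocyte", "hepatocyte", "fat cell", "my new cell"],
    registry={"T cell": "T cell"},
    registry_synonyms={"T lymphocyte": "T cell"},
    public={"hepatocyte": "hepatocyte"},
    public_synonyms={"fat cell": "adipocyte"},
)
print(found)
print(unmatched)
```

Values that fall through all four look-ups end up in `unmatched`, mirroring the case where no record is returned.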

Sometimes, it's useful to treat validated records differently from non-validated records. Here is a way:

```python
original_values = ["gut", "gut2"]
inspector = bt.Tissue.inspect(original_values)
records_from_validated_values = bt.Tissue.from_values(inspector.validated)
```

:::

Alternatively, we can retrieve records based on ontology ids:

```python
adata.obs.cell_type_id.unique().tolist()
```

```python
bt.CellType.from_values(adata.obs.cell_type_id, field=bt.CellType.ontology_id)
```

## Validate & standardize

Simple validation of an iterable of values works like so:

```python
bt.CellType.validate(["fat cell", "blood forming stem cell"])
```

Because these values don't comply with the registry, they're not validated!

You can easily convert these values to validated standardized names based on synonyms like so:

```python
bt.CellType.standardize(["fat cell", "blood forming stem cell"])
```

Alternatively, you can use `.from_values()`, which will only ever return validated records and automatically standardize under-the-hood:

```python
bt.CellType.from_values(["fat cell", "blood forming stem cell"])
```

If you are not sure what to do, use `.inspect()` to get instructions:

```python
bt.CellType.inspect(["fat cell", "blood forming stem cell"]);
```

We can also add new synonyms to a record:

```python
hsc_record.add_synonym("HSC")
```

When we encounter this synonym as a value, it is now standardized via synonym lookup and mapped to the correct registry record:

```python
bt.CellType.standardize(["HSC"])
```

A special synonym is `.abbr` (short for abbreviation), which has its own field and can be assigned via:

```python
hsc_record.set_abbr("HSC")
```

You can create a lookup object from the `.abbr` field:

```python
cell_types = bt.CellType.lookup("abbr")
cell_types.hsc
```

The same workflow works for all of `bionty`'s registries.

## Manage ontologies across organisms

Several registries are organism-aware (they have an `.organism` field), for instance, {class}`~bionty.Gene`.

In this case, API calls that interact with multi-organism registries require an `organism` argument when there's ambiguity.

For instance, when validating gene symbols:

```python
bt.Gene.validate(["TCF7", "ABC1"], organism="human")
```

In contrast, working with Ensembl Gene IDs doesn't require passing `organism`, as there's no ambiguity:

```python
bt.Gene.validate(
    ["ENSG00000000419", "ENSMUSG00002076988"], field=bt.Gene.ensembl_gene_id
)
```

When working with the same organism throughout your analysis/workflow, you can omit the `organism` argument by configuring it globally:

```python
bt.settings.organism = "mouse"
bt.Gene.from_source(symbol="Ap5b1")
```

## Track ontology versions

Under-the-hood, source ontology versions are automatically tracked for each registry:

```python
bt.Source.filter(currently_used=True).to_dataframe()
```

Each record is linked to a versioned public source (if it was created from a public ontology):

```python
hepatocyte = bt.CellType.get(name="hepatocyte")
hepatocyte.source
```

## Create records from a specific ontology version

By default, new records are imported or created from the `"currently_used"` public sources which are configured during the instance initialization, e.g.:

```python
bt.Source.filter(entity="bionty.Phenotype", currently_used=True).to_dataframe()
```

Sometimes, the default source doesn't contain the ontology term you are looking for.

You can then create a record from a non-default source. For instance, we can use the `ncbitaxon` ontology:

```python
source = bt.Source.get(entity="bionty.Organism", name="ncbitaxon")
source
```

```python
# validate against the NCBI Taxonomy
bt.Organism.validate(
    ["iris setosa", "iris versicolor", "iris virginica"], source=source
)
```

```python
# since we didn't seed the Organism registry with the NCBITaxon public ontology
# we need to save the records to the database
records = bt.Organism.from_values(
    ["iris setosa", "iris versicolor", "iris virginica"], source=source
).save()

# now we can query an iris organism and view its parents and children
bt.Organism.get(name="iris").view_parents(with_children=True)
```

<!-- #region -->

## Access any Ensembl genes

Genes from all Ensembl versions and organisms can be accessed, even though they are not yet present in the `bt.Source` registry.

For instance, if you want to use `rabbit` genes from Ensembl version `release-103`:

```python

# pip install pymysql
import bionty as bt

# automatically download genes for a new organism
gene_ontology = bt.base.Gene(source="ensembl", organism="rabbit", version='release-103')

# register the new source in lamindb
gene_ontology.register_source_in_lamindb()

# now you can start using this source

# import all genes from this source to your Gene registry
source = bt.Source.get(entity="bionty.Gene", name="ensembl", organism="rabbit", version="release-103")
bt.Gene.import_source(source=source)
```

<!-- #endregion -->


================================================
FILE: docs/organize.md
================================================
# Organize datasets

```{toctree}
:maxdepth: 1
:hidden:

curate
```

This guide walks through organizing datasets using files & folders, database relationships, and versioned collections.

## Via files & folders

You can use LaminDB like a file system. Similar to AWS S3, you organize artifacts into virtual folders using `/`-separated keys. To ingest a single file into a `project1/` folder, you'd call:

```python
artifact1 = ln.Artifact("./dataset.csv", key="project1/dataset1.csv").save()
```

For convenience, if you want to create an artifact for every file in a directory, use {meth}`~lamindb.Artifact.from_dir`:

```python
artifacts = ln.Artifact.from_dir("./project1/").save()
```

You can then query for all artifacts in the `project1/` folder via:

```python
artifacts = ln.Artifact.filter(key__startswith="project1/")
```

Unlike a regular file system, every artifact is versioned and comes with rich metadata.

:::{dropdown} What if I do not care about the metadata and version of every file in a folder?

In some cases a folder _is_ the dataset and you don't need fine-grained information for every file.
In this scenario, save the entire directory as a single artifact:

```python
ln.Artifact("./folder_abc", key="folder_abc").save()
```

:::

## Via relationships in the database

### Annotating with projects

What if an artifact is relevant to multiple projects?
A dataset that's in the `project1/` folder cannot **also** reside in a `project2/` folder.
You can solve this problem with the `artifact.projects` relationship that links the {class}`~lamindb.Project` to {class}`~lamindb.Artifact`:

<img width="400" alt="image" src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/uVm5ptyqukPEKCix0000.png"/>

Here is how to annotate one artifact with two projects:

```python
project1 = ln.Project(name="Project 1").save()  # create project 1
project2 = ln.Project(name="Project 2").save()  # create project 2
artifact1.projects.add(project1, project2)      # annotate artifact1
```

This allows you to retrieve `artifact1` by querying any project it belongs to:

```python
artifacts_in_project1 = ln.Artifact.filter(projects=project1)
artifacts_in_project2 = ln.Artifact.filter(projects=project2)
```

Here, `artifact1` is part of both query results.

:::{dropdown} Three additional advantages to using related registries rather than folder structures.

1. Projects can be richly annotated (e.g., with start/end dates, parent projects, or member roles).
2. You no longer need to rely on fragile file paths. If a folder is renamed, path-based retrieval breaks, but a project query by `uid` will always work.[^protectproject]
3. You can run a constrained query or search against all projects in your database rather than trying to narrow a search to folder names.

:::

### Annotating with labels

You can annotate with other entity types, not just projects. LaminDB offers two main classes for this: {class}`~lamindb.Record` for metadata records and {class}`~lamindb.ULabel` for simple labels, both of which link to artifacts:

<img width="400" alt="image" src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/qvhxt6UuoUO2Bd820000.png"/>

Here is how to annotate with a ulabel and with a sample record:

```python
ulabel1 = ln.ULabel(name="raw_data").save()  # create a ulabel
artifact1.ulabels.add(ulabel1)               # annotate artifact1

sample_type = ln.Record(                     # create a record type "Samples"
    name="Samples",
    is_type=True
).save()
record1 = ln.Record(                         # create a sample record
    name="My sample",
    features={"gc_content": 0.5}
).save()
artifact1.records.add(record1)               # annotate artifact1
```

You can use records and ulabels alongside entity types in modules such as {mod}`bionty`:

```python
import bionty as bt

cell_type1 = bt.CellType.from_source(
    name="T cell"                            # create a cell type from a public ontology
).save()
artifact1.cell_types.add(cell_type1)         # annotate artifact1
```

### Annotating with features

To annotate with non-categorical data types or to disambiguate categorical annotations, use {class}`~lamindb.Feature` objects.

<img width="400" alt="image" src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/eT6SEny5HpQQNgFl0000.png"/>

Here is how to define features and annotate an artifact with feature values:

```python
exp_type = ln.Record.get(name="Experiments")          # query the entity type `Experiments`
ln.Feature(name="gc_content", dtype=float).save()     # define a feature with dtype float
ln.Feature(name="experiment", dtype=exp_type).save()  # define a feature with dtype `Experiments`
artifact.features.set_values({
    "gc_content": 0.55,                               # validated to be a float
    "experiment": "Experiment 1",                     # validated to exist under the `Experiments` record type
})
```

When you work with structured data formats like `DataFrame` or `AnnData`, it often makes sense to validate the content of their features. After validation, the parsed feature values are automatically used for annotation. The easiest way to use validation and auto-annotation is the built-in schema `"valid_features"`:

```python
# validate columns in the dataframe and map them on features
# auto-annotate with parsed metadata
ln.Artifact.from_dataframe(df, schema="valid_features").save()
```

Below is an example from the {doc}`docs:tutorial` illustrating how you get e.g. cell type, treatment, and assay annotations based on a dataframe's content. You can read more on this in {doc}`/curate`.

<img width="600px" src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/6sofuDVvTANB0f480003.png">

### Annotating with data-lineage

When you call {func}`~lamindb.track` or decorate a function with {func}`~lamindb.flow`, you automatically annotate artifacts with {class}`~lamindb.Run` and {class}`~lamindb.Transform` objects.

<img width="400" alt="image" src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/Z1iliqp5mInQQ2iY0000.png"/>

Here is how:

```{eval-rst}
.. literalinclude:: scripts/run_track_and_finish.py
   :language: python
```

Note that you can pass `project` to {func}`~lamindb.track` to auto-annotate all objects that are created in a run with a project label. Read more in {doc}`/track`.

### Overview of auto-generated annotations

The {class}`~lamindb.Artifact` registry has simple fields (such as `description`, `created_at`, `size`) and related fields (such as `projects`, `created_by`, `storage`). Many of these fields are automatically populated and you can use them to retrieve sets of artifacts.

<img width="800px" src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/HMfWLa1rFkxcxQEN0000.svg">

All other registries link to {class}`~lamindb.Artifact` to provide context for finding, querying, validating, and managing artifacts.[^starsnowflake]

:::{dropdown} Can you give me some example queries?

Here are examples leveraging auto-populated fields.

```python
artifacts = ln.Artifact.filter(
    created_at__gt="2023-06-24",    # created after June 24th, 2023
    size__lt=1e9,                   # smaller than 1GB
    suffix=".parquet",              # with a .parquet suffix
    n_observations__gt=1000,        # with more than 1000 observations
    n_files__gt=1000,               # folder-like artifacts with more than 1000 files
    otype="DataFrame",              # that are DataFrames
    created_on__name="my-branch",   # created on a specific branch or environment
    created_by__handle="falexwolf", # created by user with handle falexwolf
    run=run,                        # created by a specific run
    transform__name="my-script.py", # created by a specific script/notebook
)
```

:::

## Versioned collections of artifacts

If you want to group artifacts by metadata and version the entire set, use {class}`~lamindb.Collection`.

<img width="160" alt="image" src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/QR0KuktVEnVL08K90000.png"/>

Unlike during annotation, you have to pass an entire group of artifacts to a `Collection` constructor:

```python
collection = ln.Collection([artifact1, artifact2], key="my_data_release").save()
```

And unlike the folder-based or annotation-based sets of artifacts — which can change as artifacts are added or removed — a collection guarantees an exact, immutable set of artifacts.

Artifacts are versioned based on the hash of their content. Collections are versioned based on the top-level hash of their artifact hashes. If you use the {meth}`~lamindb.Collection.append` method, a new version of the collection is created, and the old version is left unchanged:

```python
collection_v2 = collection.append(artifact3)
```
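To make the versioning rule above concrete, here is a minimal pure-Python sketch of a top-level hash computed from artifact hashes. The helper names `content_hash` and `top_level_hash`, the use of SHA-256, and the sorting of hashes are assumptions for illustration only; LaminDB's actual hashing scheme may differ.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash the content of one artifact (illustrative, not LaminDB's scheme)."""
    return hashlib.sha256(data).hexdigest()


def top_level_hash(artifact_hashes: list[str]) -> str:
    """Hash the sorted artifact hashes so the result is independent of order."""
    joined = "".join(sorted(artifact_hashes))
    return hashlib.sha256(joined.encode()).hexdigest()


# hypothetical artifact contents
hashes_v1 = [content_hash(b"artifact 1"), content_hash(b"artifact 2")]
# appending an artifact yields a different set of hashes
hashes_v2 = hashes_v1 + [content_hash(b"artifact 3")]

hash_v1 = top_level_hash(hashes_v1)
hash_v2 = top_level_hash(hashes_v2)
print(hash_v1 != hash_v2)  # the appended set gets a new top-level hash
```

Under these assumptions, any change to the set of artifacts changes the top-level hash, which is why appending produces a new version rather than mutating the old one.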

While collections are indirectly annotated through the annotations of the artifacts they contain, you can also add collection-level annotations. Like artifacts, collections link to projects, runs, ulabels, records, and most other registries.

[^starsnowflake]: You can consider the SQL table underlying {class}`~lamindb.Artifact` your _fact table_ and all other tables for other entities your _dimension tables_ in a star or Snowflake schema ([see Wikipedia](https://en.wikipedia.org/wiki/Fact_table)).

[^protectproject]: The project annotation of the artifact is protected against the deletion of the project. If a user with necessary rights attempts to delete the project, they will get an error.


================================================
FILE: docs/pertdb.md
================================================
# `pertdb`

```{eval-rst}
.. automodule:: pertdb
```


================================================
FILE: docs/query-search.md
================================================
# Query, search & stream

```{toctree}
:maxdepth: 1

registries
arrays
```


================================================
FILE: docs/registries.md
================================================
---
execute_via: python
---

# Query & search registries

This guide walks through different ways of querying & searching LaminDB registries.
To understand the underlying cross-linking of objects in the SQL database, read {doc}`organize`.

If you already have a set of artifacts, e.g. in the form of parquet files, and you'd like to now query/stream their (validated) content, read {doc}`arrays`.

```python
# initialize a test database to run examples
!lamin init --storage ./test-registries --modules bionty
```

Let's start by creating a few example datasets:

```python
import lamindb as ln

ln.Artifact(ln.examples.datasets.file_fastq(), key="raw/my_fastq.fastq.gz").save()
ln.Artifact(ln.examples.datasets.file_jpg_paradisi05(), key="my_image.jpg").save()
ln.Artifact.from_dataframe(ln.examples.datasets.df_iris(), key="iris.parquet").save()
ln.examples.datasets.mini_immuno.save_mini_immuno_datasets()
```

## Get an overview

The easiest way to get an overview of all artifacts is to call {meth}`~lamindb.Artifact.to_dataframe`, which returns the 100 latest artifacts in the {class}`~lamindb.Artifact` registry.

```python
ln.Artifact.to_dataframe()
```

You can include features.

```python
ln.Artifact.to_dataframe(include="features")
```

You can include fields from other registries.

```python
ln.Artifact.to_dataframe(
    include=[
        "created_by__name",
        "records__name",
        "cell_types__name",
        "schemas__itype",
    ]
)
```

You can also get an overview of the entire database.

```python
ln.view()
```

## Auto-complete objects

For registries with fewer than 100k objects, auto-completing a `Lookup` object is the most convenient way of finding a record.

```python
records = ln.Record.lookup()
```

With auto-complete, we find a record:

```python
experiment_1 = records.experiment_1
experiment_1
```

This works for any {class}`~lamindb.models.BaseSQLRecord` class, including those of plugins such as `bionty`.

```python
import bionty as bt

cell_types = bt.CellType.lookup()
```

## Get one object

{meth}`~lamindb.models.BaseSQLRecord.get` raises an error if no object or more than one object matches.

```python
ln.Record.get(experiment_1.uid)  # by uid
ln.Record.get(name="Experiment 1")  # by field
```

## Query objects by fields

Use {meth}`~lamindb.models.BaseSQLRecord.filter` to query all artifacts by the `suffix` field:

```python
qs = ln.Artifact.filter(suffix=".h5ad")
qs
```

This returns a {class}`~lamindb.models.QuerySet`, which lazily references the set of {class}`~lamindb.models.BaseSQLRecord` objects that matches the filter statement. You can iteratively filter a queryset:

```python
qs = qs.filter(records__name="Experiment 1")
```

To access the results encoded in a queryset, call:

- {meth}`~lamindb.models.BasicQuerySet.to_dataframe`: A pandas `DataFrame` with each record in a row.
- {meth}`~lamindb.models.BasicQuerySet.one`: Exactly one record. Raises an error if there is none or more than one. Equivalent to the `.get()` method shown above.
- {meth}`~lamindb.models.BasicQuerySet.one_or_none`: Either one record or `None` if there is no query result.

Alternatively,

- use the `QuerySet` as an iterator
- get individual objects via `qs[0]`, `qs[1]`

For example:

```python
qs.to_dataframe()
```
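The difference between `one` and `one_or_none` can be sketched in plain Python on an ordinary list. This is an illustration of the semantics only, not LaminDB's implementation; the function bodies and the use of `LookupError` are assumptions, and the sketch assumes that more than one match is an error in both cases.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def one(results: Sequence[T]) -> T:
    """Return the single result; fail if there is none or more than one."""
    if len(results) == 0:
        raise LookupError("no result found")
    if len(results) > 1:
        raise LookupError("more than one result found")
    return results[0]


def one_or_none(results: Sequence[T]) -> Optional[T]:
    """Return the single result or None; fail only if there is more than one."""
    if len(results) == 0:
        return None
    if len(results) > 1:
        raise LookupError("more than one result found")
    return results[0]


print(one(["experiment_1"]))  # prints "experiment_1"
print(one_or_none([]))  # prints "None"
```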

Note that the `SQLRecord` classes in LaminDB are Django Models and any [Django query](https://docs.djangoproject.com/en/stable/topics/db/queries/) works.

## Query objects by features

The `Artifact`, `Record`, and `Run` registries can be queried by features.

```python
ln.Artifact.filter(perturbation="DMSO").to_dataframe(include="features")
```

You can also query by passing a `Feature` object, which is useful to disambiguate feature names.

```python
perturbation = ln.Feature.get(name="perturbation")  # can optionally pass a feature type to disambiguate
ln.Artifact.filter(perturbation == "DMSO")  # note this is now an expression using the == syntax
```

Just like for fields holding dictionary values, you can query for dictionary keys in features whose `dtype` is `dict`:

```python
ln.Artifact.filter(study_metadata__detail1="123").to_dataframe(include="features")
```

```python
ln.Artifact.filter(study_metadata__detail2=2).to_dataframe(include="features")
```

You can query for whether a dataset is annotated or not annotated by a feature.

```python
ln.Artifact.filter(perturbation__isnull=True).to_dataframe(include="features")
```

```python
ln.Artifact.filter(perturbation__isnull=False).to_dataframe(include="features")
```

## Query runs by parameters

Here is an example for querying by parameters: {ref}`track-run-parameters`.

## Search for objects

You can search every registry via {meth}`~lamindb.models.SQLRecord.search`. For example, search the `Artifact` registry:

```python
ln.Artifact.search("iris").to_dataframe()
```

Here is more background on search and examples for searching the entire cell type ontology: {doc}`/faq/search`

## Query related registries

Django has a double-underscore syntax to filter based on related tables.

This syntax enables you to traverse several layers of relations and leverage different comparators.

```python
ln.Artifact.filter(created_by__handle__startswith="testuse").to_dataframe()
```

The filter selects all artifacts created by users whose handle starts with `testuse`. Under the hood, in the SQL database, it's joining the artifact table with the user table.

Another typical example is querying all datasets that measure a particular feature, for instance, all datasets that measure `"CD8A"`:

```python
cd8a = bt.Gene.get(symbol="CD8A")
# query for all schemas (feature sets) that contain CD8A
schemas_with_cd8a = ln.Schema.filter(genes=cd8a)
# get all artifacts
ln.Artifact.filter(schemas__in=schemas_with_cd8a).to_dataframe()
```

Instead of splitting this across three queries, the double-underscore syntax allows you to define a path for one query.

```python
ln.Artifact.filter(schemas__genes__symbol="CD8A").to_dataframe()
```
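To see how such a path reads, the following pure-Python sketch splits a lookup string into the chain of relations and fields and the trailing comparator. The helper `split_lookup` and the fixed set `KNOWN_COMPARATORS` are hypothetical; Django resolves lookups against the actual model fields rather than a hard-coded list.

```python
# a small, hand-picked subset of comparator names (assumption for illustration)
KNOWN_COMPARATORS = {
    "exact", "gt", "lt", "in", "contains", "icontains", "startswith", "isnull"
}


def split_lookup(lookup: str) -> tuple[list[str], str]:
    """Split "a__b__comparator" into (["a", "b"], "comparator")."""
    parts = lookup.split("__")
    if len(parts) > 1 and parts[-1] in KNOWN_COMPARATORS:
        return parts[:-1], parts[-1]
    # without an explicit comparator, the comparison is an exact match
    return parts, "exact"


print(split_lookup("schemas__genes__symbol"))
# (['schemas', 'genes', 'symbol'], 'exact')
print(split_lookup("created_by__handle__startswith"))
# (['created_by', 'handle'], 'startswith')
```

Each element before the comparator corresponds to one hop: from the artifact to its schemas, from a schema to its genes, and finally to the `symbol` field of a gene.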

## Filter operators

You can qualify the type of comparison in a query by using a comparator.

Below is a list of the most important ones. Django supports about [two dozen field comparators](https://docs.djangoproject.com/en/stable/ref/models/querysets/#field-lookups) of the form `field__comparator=value`.
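As a rough mental model of what `field__comparator=value` means, here is a pure-Python sketch that evaluates a few comparators against in-memory dictionaries. The `matches` helper, the `COMPARATORS` table, and the example rows with their sizes are hypothetical; real queries are translated to SQL and run in the database.

```python
import operator

# map comparator names to plain Python checks (illustrative subset)
COMPARATORS = {
    "exact": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "in": lambda value, options: value in options,
    "contains": lambda value, part: part in value,
    "icontains": lambda value, part: part.lower() in value.lower(),
    "startswith": lambda value, prefix: value.startswith(prefix),
}


def matches(row: dict, **lookups) -> bool:
    """Return True if the row satisfies all lookups (they are AND-ed)."""
    for lookup, expected in lookups.items():
        field, _, comparator = lookup.partition("__")
        check = COMPARATORS[comparator or "exact"]
        if not check(row[field], expected):
            return False
    return True


# made-up rows that mimic a few artifact fields
rows = [
    {"key": "my_image.jpg", "suffix": ".jpg", "size": 5_000},
    {"key": "raw/my_fastq.fastq.gz", "suffix": ".fastq.gz", "size": 20_000},
    {"key": "iris.parquet", "suffix": ".parquet", "size": 8_000},
]

large = [r["key"] for r in rows if matches(r, size__gt=1e4)]
images_or_reads = [
    r["key"] for r in rows if matches(r, suffix__in=[".jpg", ".fastq.gz"])
]
print(large)  # ['raw/my_fastq.fastq.gz']
print(images_or_reads)  # ['my_image.jpg', 'raw/my_fastq.fastq.gz']
```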

### and

```python
ln.Artifact.filter(suffix=".h5ad", records=experiment_1).to_dataframe()
```

### less than/ greater than

Subset to artifacts larger than 10 kB by appending the `__gt` comparator to the `size` field.

```python
ln.Artifact.filter(records=experiment_1, size__gt=1e4).to_dataframe()
```

### in

```python
ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).to_dataframe()
```

### order by

```python
ln.Artifact.filter().order_by("created_at").to_dataframe()
```

```python
# reverse ordering
ln.Artifact.filter().order_by("-created_at").to_dataframe()
```

```python
ln.Artifact.filter().order_by("key").to_dataframe()
```

```python
# reverse ordering
ln.Artifact.filter().order_by("-key").to_dataframe()
```

### contains

```python
ln.Transform.filter(description__contains="search").to_dataframe().head(5)
```

And case-insensitive:

```python
ln.Transform.filter(description__icontains="Search").to_dataframe().head(5)
```

### startswith

```python
ln.Transform.filter(description__startswith="Query").to_dataframe()
```

### or

```python
ln.Artifact.filter(ln.Q(suffix=".jpg") | ln.Q(suffix=".fastq.gz")).to_dataframe()
```

### negate/ unequal

```python
ln.Artifact.filter(~ln.Q(suffix=".jpg")).to_dataframe()
```


================================================
FILE: docs/scripts/curate_anndata_flexible.py
================================================
import lamindb as ln

ln.examples.datasets.mini_immuno.define_features_labels()
adata = ln.examples.datasets.mini_immuno.get_dataset1(otype="AnnData")
artifact = ln.Artifact.from_anndata(
    adata,
    key="examples/mini_immuno.h5ad",
    schema="ensembl_gene_ids_and_valid_features_in_obs",
).save()
artifact.describe()


================================================
FILE: docs/scripts/curate_anndata_uns.py
================================================
import lamindb as ln

ln.examples.datasets.mini_immuno.define_features_labels()
adata = ln.examples.datasets.mini_immuno.get_dataset1(otype="AnnData")
schema = ln.Schema.get(name="Study metadata schema")
artifact = ln.Artifact.from_anndata(
    adata, schema=schema, key="examples/mini_immuno_uns.h5ad"
)
artifact.describe()


================================================
FILE: docs/scripts/curate_dataframe_attrs.py
================================================
import lamindb as ln

from .define_schema_df_metadata import study_metadata_schema

df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
schema = ln.Schema(
    features=[ln.Feature(name="perturbation", dtype="str").save()],
    slots={"attrs": study_metadata_schema},
    otype="DataFrame",
).save()
curator = ln.curators.DataFrameCurator(df, schema=schema)
curator.validate()
artifact = curator.save_artifact(key="examples/df_with_attrs.parquet")
artifact.describe()


================================================
FILE: docs/scripts/curate_dataframe_external_features.py
================================================
import lamindb as ln
from datetime import date

df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")

temperature = ln.Feature(name="temperature", dtype=float).save()
date_of_study = ln.Feature(name="date_of_study", dtype=date).save()
external_schema = ln.Schema(features=[temperature, date_of_study]).save()

concentration = ln.Feature(name="concentration", dtype=str).save()
donor = ln.Feature(name="donor", dtype=str, nullable=True).save()
schema = ln.Schema(
    features=[concentration, donor],
    slots={"__external__": external_schema},
    otype="DataFrame",
).save()

artifact = ln.Artifact.from_dataframe(
    df,
    key="examples/dataset1.parquet",
    features={"temperature": 21.6, "date_of_study": date(2024, 10, 1)},
    schema=schema,
).save()
artifact.describe()


================================================
FILE: docs/scripts/curate_dataframe_flexible.py
================================================
import lamindb as ln

ln.examples.datasets.mini_immuno.define_features_labels()
df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
artifact = ln.Artifact.from_dataframe(
    df, key="examples/dataset1.parquet", schema="valid_features"
).save()
artifact.describe()


================================================
FILE: docs/scripts/curate_dataframe_minimal_errors.py
================================================
import lamindb as ln

schema = ln.examples.datasets.mini_immuno.define_mini_immuno_schema_flexible()
df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
df.pop("donor")  # remove donor column to trigger validation error
try:
    artifact = ln.Artifact.from_dataframe(
        df, key="examples/dataset1.parquet", schema=schema
    ).save()
except ln.errors.ValidationError as error:
    print(error)


================================================
FILE: docs/scripts/curate_dataframe_union_features.py
================================================
import lamindb as ln
import pandas as pd

union_feature = ln.Feature(
    name="mixed_feature",
    dtype="cat[bionty.Tissue.ontology_id|bionty.CellType.ontology_id]",
).save()

df_mixed = pd.DataFrame({"mixed_feature": ["UBERON:0000178", "CL:0000540"]})

schema = ln.Schema(features=[union_feature], coerce=True).save()

curator = ln.curators.DataFrameCurator(df_mixed, schema)
curator.validate()


================================================
FILE: docs/scripts/curate_mudata.py
================================================
import lamindb as ln
import bionty as bt

from docs.scripts.define_schema_df_metadata import study_metadata_schema

# define labels
perturbation = ln.Record(name="Perturbation", is_type=True).save()
ln.Record(name="Perturbed", type=perturbation).save()
ln.Record(name="NT", type=perturbation).save()

replicate = ln.Record(name="Replicate", is_type=True).save()
ln.Record(name="rep1", type=replicate).save()
ln.Record(name="rep2", type=replicate).save()
ln.Record(name="rep3", type=replicate).save()

# define the global obs schema
obs_schema = ln.Schema(
    name="mudata_papalexi21_subset_obs_schema",
    features=[
        ln.Feature(name="perturbation", dtype="cat[Record[Perturbation]]").save(),
        ln.Feature(name="replicate", dtype="cat[Record[Replicate]]").save(),
    ],
).save()

# define the ['rna'].obs schema
obs_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_obs_schema",
    features=[
        ln.Feature(name="nCount_RNA", dtype=int).save(),
        ln.Feature(name="nFeature_RNA", dtype=int).save(),
        ln.Feature(name="percent.mito", dtype=float).save(),
    ],
).save()

# define the ['hto'].obs schema
obs_schema_hto = ln.Schema(
    name="mudata_papalexi21_subset_hto_obs_schema",
    features=[
        ln.Feature(name="nCount_HTO", dtype=float).save(),
        ln.Feature(name="nFeature_HTO", dtype=int).save(),
        ln.Feature(name="technique", dtype=bt.ExperimentalFactor).save(),
    ],
).save()

# define ['rna'].var schema
var_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_var_schema",
    itype=bt.Gene.symbol,
    dtype=float,
).save()

# define composite schema
mudata_schema = ln.Schema(
    name="mudata_papalexi21_subset_mudata_schema",
    otype="MuData",
    slots={
        "obs": obs_schema,
        "rna:obs": obs_schema_rna,
        "hto:obs": obs_schema_hto,
        "rna:var": var_schema_rna,
        "uns:study_metadata": study_metadata_schema,
    },
).save()

# curate a MuData
mdata = ln.examples.datasets.mudata_papalexi21_subset(with_uns=True)
bt.settings.organism = "human"  # set the organism to map gene symbols
curator = ln.curators.MuDataCurator(mdata, mudata_schema)
artifact = curator.save_artifact(key="examples/mudata_papalexi21_subset.h5mu")
assert artifact.schema == mudata_schema


================================================
FILE: docs/scripts/curate_soma_experiment.py
================================================
import lamindb as ln
import bionty as bt
import tiledbsoma as soma
import tiledbsoma.io

adata = ln.examples.datasets.mini_immuno.get_dataset1(otype="AnnData")
tiledbsoma.io.from_anndata("small_dataset.tiledbsoma", adata, measurement_name="RNA")

obs_schema = ln.Schema(
    name="soma_obs_schema",
    features=[
        ln.Feature(name="cell_type_by_expert", dtype=bt.CellType).save(),
        ln.Feature(name="cell_type_by_model", dtype=bt.CellType).save(),
    ],
).save()

var_schema = ln.Schema(
    name="soma_var_schema",
    features=[
        ln.Feature(name="var_id", dtype=bt.Gene.ensembl_gene_id).save(),
    ],
    coerce=True,
).save()

soma_schema = ln.Schema(
    name="soma_experiment_schema",
    otype="tiledbsoma",
    slots={
        "obs": obs_schema,
        "ms:RNA.T": var_schema,
    },
).save()

with soma.Experiment.open("small_dataset.tiledbsoma") as experiment:
    curator = ln.curators.TiledbsomaExperimentCurator(experiment, soma_schema)
    curator.validate()
    artifact = curator.save_artifact(
        key="examples/soma_experiment.tiledbsoma",
        description="SOMA experiment with schema validation",
    )
assert artifact.schema == soma_schema
artifact.describe()


================================================
FILE: docs/scripts/curate_spatialdata.py
================================================
import lamindb as ln

spatialdata = ln.examples.datasets.spatialdata_blobs()
sdata_schema = ln.Schema.get(name="spatialdata_blobs_schema")
curator = ln.curators.SpatialDataCurator(spatialdata, sdata_schema)
try:
    curator.validate()
except ln.errors.ValidationError:
    pass

spatialdata.tables["table"].var.drop(index="ENSG00000999999", inplace=True)

# validate again (must pass now) and save artifact
artifact = ln.Artifact.from_spatialdata(
    spatialdata, key="examples/spatialdata1.zarr", schema=sdata_schema
).save()
artifact.describe()


================================================
FILE: docs/scripts/define_schema_anndata_uns.py
================================================
import lamindb as ln

from define_schema_df_metadata import study_metadata_schema

anndata_uns_schema = ln.Schema(
    otype="AnnData",
    slots={
        "uns:study_metadata": study_metadata_schema,
    },
).save()


================================================
FILE: docs/scripts/define_schema_df_metadata.py
================================================
import lamindb as ln

study_metadata_schema = ln.Schema(
    name="Study metadata schema",
    features=[
        ln.Feature(name="temperature", dtype=float).save(),
        ln.Feature(name="experiment", dtype=str).save(),
    ],
).save()


================================================
FILE: docs/scripts/define_schema_spatialdata.py
================================================
import lamindb as ln
import bionty as bt

# a very comprehensive schema for different slots of a SpatialData object

# define or query features
bio_dict = ln.Feature(name="bio", dtype=dict).save()
tech_dict = ln.Feature(name="tech", dtype=dict).save()
disease = ln.Feature(name="disease", dtype=bt.Disease, coerce=True).save()
developmental_stage = ln.Feature(
    name="developmental_stage",
    dtype=bt.DevelopmentalStage,
    coerce=True,
).save()
assay = ln.Feature(name="assay", dtype=bt.ExperimentalFactor, coerce=True).save()
sample_region = ln.Feature(name="sample_region", dtype=str).save()
analysis = ln.Feature(name="analysis", dtype=str).save()

# define or query schema components
attrs_schema = ln.Schema([bio_dict, tech_dict]).save()
sample_schema = ln.Schema([disease, developmental_stage]).save()
tech_schema = ln.Schema([assay]).save()
obs_schema = ln.Schema([sample_region]).save()
uns_schema = ln.Schema([analysis]).save()
# enforces only registered Ensembl Gene IDs pass validation (maximal_set=True)
varT_schema = ln.Schema(itype=bt.Gene.ensembl_gene_id, maximal_set=True).save()

# compose the SpatialData schema
sdata_schema = ln.Schema(
    name="spatialdata_blobs_schema",
    otype="SpatialData",
    slots={
        "attrs:bio": sample_schema,
        "attrs:tech": tech_schema,
        "attrs": attrs_schema,
        "tables:table:obs": obs_schema,
        "tables:table:var.T": varT_schema,
    },
).save()


================================================
FILE: docs/scripts/my_workflow.py
================================================
import lamindb as ln


@ln.flow()
def ingest_dataset(key: str) -> ln.Artifact:
    df = ln.examples.datasets.mini_immuno.get_dataset1()
    artifact = ln.Artifact.from_dataframe(df, key=key).save()
    return artifact


if __name__ == "__main__":
    ingest_dataset(key="my_analysis/dataset.parquet")


================================================
FILE: docs/scripts/my_workflow_with_click.py
================================================
import click
import lamindb as ln


@click.command()
@click.option("--key", required=True)
@ln.flow()
def main(key: str):
    df = ln.examples.datasets.mini_immuno.get_dataset2()
    ln.Artifact.from_dataframe(df, key=key).save()


if __name__ == "__main__":
    main()


================================================
FILE: docs/scripts/my_workflow_with_step.py
================================================
import lamindb as ln


@ln.step()
def subset_dataframe(
    artifact: ln.Artifact,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> ln.Artifact:
    df = artifact.load()
    new_data = df.iloc[:subset_rows, :subset_cols]
    new_key = artifact.key.replace(".parquet", "_subsetted.parquet")
    return ln.Artifact.from_dataframe(new_data, key=new_key).save()


@ln.flow()
def ingest_dataset(key: str, subset: bool = False) -> ln.Artifact:
    df = ln.examples.datasets.mini_immuno.get_dataset1()
    artifact = ln.Artifact.from_dataframe(df, key=key).save()
    if subset:
        artifact = subset_dataframe(artifact)
    return artifact


if __name__ == "__main__":
    ingest_dataset(key="my_analysis/dataset.parquet", subset=True)


================================================
FILE: docs/scripts/run_script_with_step.py
================================================
import argparse
import lamindb as ln


@ln.step()
def subset_dataframe(
    artifact: ln.Artifact,
    subset_rows: int = 2,
    subset_cols: int = 2,
    run: ln.Run | None = None,
) -> ln.Artifact:
    dataset = artifact.load(is_run_input=run)
    new_data = dataset.iloc[:subset_rows, :subset_cols]
    new_key = artifact.key.replace(".parquet", "_subsetted.parquet")
    return ln.Artifact.from_dataframe(new_data, key=new_key, run=run).save()


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--subset", action="store_true")
    args = p.parse_args()

    params = {"is_subset": args.subset}

    ln.track(params=params)

    if args.subset:
        df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
        artifact = ln.Artifact.from_dataframe(
            df, key="my_analysis/dataset.parquet"
        ).save()
        subsetted_artifact = subset_dataframe(artifact)

    ln.finish()


================================================
FILE: docs/scripts/run_track_and_finish.py
================================================
import lamindb as ln

ln.track()  # initiate a tracked notebook/script run

# your code automatically tracks inputs & outputs

ln.finish()  # mark run as finished, save execution report, source code & environment


================================================
FILE: docs/scripts/run_track_with_features_and_params.py
================================================
import argparse
import lamindb as ln


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--s3-folder", type=str)
    p.add_argument("--experiment", type=str)
    args = p.parse_args()
    features = {
        "s3_folder": args.s3_folder,
        "experiment": args.experiment,
    }
    ln.track(features=features, params={"example_param": 42})

    # your code

    ln.finish()


================================================
FILE: docs/scripts/run_track_with_params.py
================================================
import argparse
import lamindb as ln

if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--input-dir", type=str)
    p.add_argument("--downsample", action="store_true")
    p.add_argument("--learning-rate", type=float)
    args = p.parse_args()
    params = {
        "input_dir": args.input_dir,
        "learning_rate": args.learning_rate,
        "preprocess_params": {
            "downsample": args.downsample,
            "normalization": "the_good_one",
        },
    }
    ln.track(params=params)

    # your code

    ln.finish()


================================================
FILE: docs/scripts/synced_with_git.py
================================================
import lamindb as ln

ln.settings.sync_git_repo = "https://github.com/..."
ln.track()
# your code
ln.finish()


================================================
FILE: docs/storage/add-replace-cache.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Add, replace, cache and delete artifacts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pytest\n",
    "import shutil\n",
    "import lamindb as ln"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "ln.setup.login(\"testuser1\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {
    "tags": [
     "hide-output",
     "hide-cell"
    ]
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    root_path = ln.UPath(\"s3://lamindb-ci/test-add-replace-cache\")\n",
    "    if root_path.exists():\n",
    "        root_path.rmdir()\n",
    "    ln.setup.delete(\"testuser1/test-add-replace-cache\", force=True)\n",
    "except BaseException:  # noqa: S110\n",
    "    pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "ln.setup.init(storage=\"s3://lamindb-ci/test-add-replace-cache\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "## Save with an auto-managed key (`key=None`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "AUTO_KEY_PREFIX = ln.core.storage.paths.AUTO_KEY_PREFIX\n",
    "root = ln.settings.storage.root"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact = ln.Artifact(\"./test-files/iris.csv\", description=\"iris.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "key_path = root / f\"{AUTO_KEY_PREFIX}{artifact.uid}{artifact.suffix}\"\n",
    "assert key_path.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "cache_csv_path = artifact.cache()\n",
    "print(cache_csv_path)\n",
    "assert cache_csv_path.suffix == \".csv\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.replace(\"./test-files/iris.data\")\n",
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "old_key_path = key_path\n",
    "new_key_path = root / f\"{AUTO_KEY_PREFIX}{artifact.uid}{artifact.suffix}\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "The suffix changed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(old_key_path)\n",
    "print(new_key_path)\n",
    "assert not old_key_path.exists()\n",
    "assert new_key_path.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "cache_data_path = artifact.cache()\n",
    "print(cache_data_path)\n",
    "assert cache_data_path.suffix == \".data\"\n",
    "assert cache_data_path.stat().st_mtime >= cache_csv_path.stat().st_mtime"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.delete(permanent=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "## Save with manually passed real `key`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "ln.settings.creation._artifact_use_virtual_keys = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact = ln.Artifact(\"./test-files/iris.csv\", key=\"iris.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {},
   "outputs": [],
   "source": [
    "key_path = root / \"iris.csv\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert key_path.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.replace(\"./test-files/new_iris.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.save()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25",
   "metadata": {},
   "source": [
    "Check paths: no changes here, as the suffix didn't change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26",
   "metadata": {},
   "outputs": [],
   "source": [
    "old_key_path = key_path\n",
    "new_key_path = root / \"new_iris.csv\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27",
   "metadata": {},
   "outputs": [],
   "source": [
    "old_key_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_key_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert old_key_path.exists()\n",
    "assert not new_key_path.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.replace(\"./test-files/iris.data\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_key_path = root / \"iris.data\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33",
   "metadata": {},
   "outputs": [],
   "source": [
    "old_key_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_key_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert not old_key_path.exists()\n",
    "assert new_key_path.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.delete(permanent=True, storage=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37",
   "metadata": {},
   "source": [
    "## Save from memory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39",
   "metadata": {},
   "outputs": [],
   "source": [
    "iris = pd.read_csv(\"./test-files/iris.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact = ln.Artifact.from_dataframe(\n",
    "    iris, description=\"iris_store\", key=\"iris.parquet\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "41",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42",
   "metadata": {},
   "outputs": [],
   "source": [
    "key_path = root / \"iris.parquet\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert key_path.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.replace(data=iris[:-1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert artifact.key == \"iris.parquet\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "46",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert key_path.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.replace(\"./test-files/new_iris.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50",
   "metadata": {},
   "outputs": [],
   "source": [
    "old_key_path = key_path\n",
    "new_key_path = root / \"iris.csv\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51",
   "metadata": {},
   "outputs": [],
   "source": [
    "old_key_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_key_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert not old_key_path.exists()\n",
    "assert new_key_path.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54",
   "metadata": {},
   "outputs": [],
   "source": [
    "# we use the path in the next sections\n",
    "path_in_storage = artifact.path\n",
    "artifact.delete(permanent=True, storage=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55",
   "metadata": {},
   "source": [
    "## Save with manually passed virtual `key`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "56",
   "metadata": {},
   "outputs": [],
   "source": [
    "ln.settings.creation._artifact_use_virtual_keys = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "57",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact = ln.Artifact(\"./test-files/iris.csv\", key=\"iris.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "59",
   "metadata": {},
   "outputs": [],
   "source": [
    "with pytest.raises(ValueError) as err:\n",
    "    artifact.replace(path_in_storage)\n",
    "assert err.exconly().startswith(\n",
    "    \"ValueError: Can only replace with a local path not in any Storage.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60",
   "metadata": {},
   "outputs": [],
   "source": [
    "# replace() returns the existing artifact if the hash is unchanged\n",
    "assert artifact == artifact.replace(\"./test-files/iris.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61",
   "metadata": {},
   "outputs": [],
   "source": [
    "fpath = artifact.path\n",
    "assert fpath.suffix == \".csv\" and fpath.stem == artifact.uid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.replace(\"./test-files/iris.data\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "64",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert artifact.key == \"iris.data\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert not fpath.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66",
   "metadata": {},
   "outputs": [],
   "source": [
    "fpath = artifact.path\n",
    "assert fpath.suffix == \".data\" and fpath.stem == artifact.uid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.delete(permanent=True, storage=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68",
   "metadata": {},
   "source": [
    "## Save in existing storage with a virtual `key`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact = ln.Artifact(path_in_storage, key=\"iris_test.csv\").save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert artifact._real_key.endswith(\"iris.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "71",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.replace(\"./test-files/iris.data\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert artifact._real_key.endswith(\"iris.data\")\n",
    "assert artifact._clear_storagekey.endswith(\"iris.csv\")\n",
    "assert artifact.key == \"iris_test.data\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74",
   "metadata": {},
   "outputs": [],
   "source": [
    "path = artifact.path\n",
    "\n",
    "assert path.name == \"iris.data\"\n",
    "assert path.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert not path_in_storage.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.delete(permanent=True, storage=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77",
   "metadata": {},
   "source": [
    "## Replace with folder artifacts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "78",
   "metadata": {},
   "outputs": [],
   "source": [
    "adata = ln.examples.datasets.anndata_pbmc68k_reduced()\n",
    "\n",
    "adata.write_zarr(\"./test-files/pbmc68k.zarr\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact = ln.Artifact(\"./test-files/pbmc68k.zarr\", key=\"pbmc68k.zarr\").save()\n",
    "save_hash = artifact.hash\n",
    "save_n_files = artifact.n_files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80",
   "metadata": {},
   "outputs": [],
   "source": [
    "with pytest.raises(ValueError) as err:\n",
    "    artifact.replace(\"./test-files/iris.csv\")\n",
    "assert err.exconly().endswith(\"It is not allowed to replace a folder with a file.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert save_hash is not None\n",
    "assert artifact.path.is_dir()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82",
   "metadata": {},
   "outputs": [],
   "source": [
    "adata.obs[\"add_new_col\"] = \"new\"\n",
    "\n",
    "adata.write_zarr(\"./test-files/pbmc68k_new.zarr\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.replace(\"./test-files/pbmc68k_new.zarr\")\n",
    "artifact.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert artifact.key == \"pbmc68k.zarr\"\n",
    "assert artifact.hash != save_hash\n",
    "assert artifact.n_files != save_n_files\n",
    "assert artifact.path.is_dir()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85",
   "metadata": {},
   "outputs": [],
   "source": [
    "shutil.rmtree(artifact.cache())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86",
   "metadata": {},
   "outputs": [],
   "source": [
    "with artifact.open() as store:\n",
    "    assert \"add_new_col\" in store.obs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87",
   "metadata": {},
   "outputs": [],
   "source": [
    "# checks that .open above opened the cloud path without syncing\n",
    "assert not artifact._cache_path.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88",
   "metadata": {},
   "outputs": [],
   "source": [
    "shutil.rmtree(\"./test-files/pbmc68k.zarr\")\n",
    "shutil.rmtree(\"./test-files/pbmc68k_new.zarr\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89",
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.delete(permanent=True, storage=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "ln.setup.delete(\"test-add-replace-cache\", force=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.16"
  },
  "nbproject": {
   "id": "uBQMCcdYwEjA",
   "parent": null,
   "pypackage": null,
   "time_init": "2023-04-04T16:26:17.675023+00:00",
   "user_handle": "Koncopd",
   "user_id": "qTQ5q0ar",
   "user_name": "Sergei Rybakov",
   "version": "0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: docs/storage/anndata-accessor.ipynb
================================================
{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Test `AnnDataAccessor`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import lamindb as ln\n",
    "\n",
    "ln.setup.init(storage=\"s3://lamindb-ci/test-anndata\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll need some test data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "ln.Artifact(\"s3://lamindb-ci/test-anndata/pbmc68k.h5ad\").save()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An `h5ad` artifact stored on S3:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact = ln.Artifact.filter(key=\"pbmc68k.h5ad\").one()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "artifact.path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata = artifact.open()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can access `AnnData` attributes without loading them into memory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [],
   "source": [
    "print(adata.obsm)\n",
    "print(adata.varm)\n",
    "print(adata.obsp)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, `.obs`, `.var`, and `.uns` are always fully loaded into memory when the `AnnDataAccessor` is initialized:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata.obs.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata.var.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata.uns.keys()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Without subsetting, the `AnnDataAccessor` object gives references to underlying lazy `h5` or `zarr` arrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata.X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata.obsm[\"X_pca\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And to a lazy `SparseDataset` from the `anndata` package:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata.obsp[\"distances\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get a subset of the object; its attributes are loaded only on explicit access:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs_idx = adata.obs.cell_type.isin([\"Dendritic cells\", \"CD14+ Monocytes\"]) & (\n",
    "    adata.obs.percent_mito <= 0.05\n",
    ")\n",
    "adata_subset = adata[obs_idx]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata_subset"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check the shape of the subset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [],
   "source": [
    "num_idx = sum(obs_idx)\n",
    "assert adata_subset.shape == (num_idx, adata.shape[1])\n",
    "assert (adata_subset.obs.cell_type == \"CD34+\").sum() == 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata_subset.obs.cell_type.value_counts()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Subsets load arrays into memory only on direct access:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(adata_subset.X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(adata_subset.obsm[\"X_pca\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [],
   "source": [
    "assert adata_subset.obsp[\"distances\"].shape[0] == num_idx"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To load the entire subset into memory as an actual `AnnData` object, use `to_memory()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata_subset.to_memory()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [],
   "source": [
    "!lamin delete --force test-anndata"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  },
  "nbproject": {
   "id": "YVUCtH4GfQOy",
   "parent": null,
   "pypackage": null,
   "time_init": "2023-01-23T08:28:32.097943+00:00",
   "user_handle": "testuser1",
   "user_id": "DzTjkKse",
   "user_name": "Test User1",
   "version": "0"
  },
  "vscode": {
   "interpreter": {
    "hash": "ae1fefc8646a06dd2e75004cd934adda7c5727b046986a772e3b44b0ffba9754"
   }
  }
 },
 "nbformat": 4,
 "n

SYMBOL INDEX (1902 symbols across 176 files)

FILE: docs/faq/test_notebooks.py
  function test_notebooks (line 8) | def test_notebooks():

FILE: docs/scripts/my_workflow.py
  function ingest_dataset (line 5) | def ingest_dataset(key: str) -> ln.Artifact:

FILE: docs/scripts/my_workflow_with_click.py
  function main (line 8) | def main(key: str):

FILE: docs/scripts/my_workflow_with_step.py
  function subset_dataframe (line 5) | def subset_dataframe(
  function ingest_dataset (line 17) | def ingest_dataset(key: str, subset: bool = False) -> ln.Artifact:

FILE: docs/scripts/run_script_with_step.py
  function subset_dataframe (line 6) | def subset_dataframe(

FILE: docs/storage/test_notebooks.py
  function test_notebooks (line 8) | def test_notebooks():

FILE: docs/test_notebooks.py
  function test_tutorial (line 13) | def test_tutorial():
  function test_guide (line 18) | def test_guide():
  function test_tiledbsoma (line 23) | def test_tiledbsoma():
  function test_biology (line 28) | def test_biology():

FILE: lamindb/_finish.py
  function get_save_notebook_message (line 22) | def get_save_notebook_message() -> str:
  function get_save_notebook_message_retry (line 28) | def get_save_notebook_message_retry() -> str:
  function check_consecutiveness (line 33) | def check_consecutiveness(
  function get_shortcut (line 84) | def get_shortcut() -> str:
  function get_seconds_since_modified (line 90) | def get_seconds_since_modified(filepath) -> float:
  function save_run_logs (line 94) | def save_run_logs(run: Run, save_run: bool = False) -> None:
  function prepare_notebook (line 113) | def prepare_notebook(
  function notebook_to_report (line 139) | def notebook_to_report(notebook_path: Path, output_path: Path) -> None:
  function notebook_to_script (line 162) | def notebook_to_script(  # type: ignore
  function clean_r_notebook_html (line 180) | def clean_r_notebook_html(file_path: Path) -> tuple[str | None, Path]:
  function check_filepath_recently_saved (line 210) | def check_filepath_recently_saved(filepath: Path, is_finish_retry: bool)...
  function save_context_core (line 239) | def save_context_core(

FILE: lamindb/_secret_redaction.py
  function normalize_sensitive_key_name (line 52) | def normalize_sensitive_key_name(key: str) -> str:
  function is_sensitive_param_key (line 58) | def is_sensitive_param_key(key: str) -> bool:
  function is_sensitive_param_value (line 62) | def is_sensitive_param_value(value: object) -> bool:
  function _redact_assignment_match (line 68) | def _redact_assignment_match(match: re.Match[str]) -> str:
  function redact_secrets_in_source_code (line 87) | def redact_secrets_in_source_code(source_code: str) -> tuple[str, int]:

FILE: lamindb/_view.py
  function display_df_with_descriptions (line 22) | def display_df_with_descriptions(
  function view (line 96) | def view(

FILE: lamindb/base/dtypes.py
  function is_list_of_type (line 13) | def is_list_of_type(value: Any, expected_type: Any) -> bool:
  function check_dtype (line 21) | def check_dtype(expected_type: Any, nullable: bool) -> Callable:
  function is_valid_datetime_str (line 90) | def is_valid_datetime_str(date_string: str) -> bool | str:
  function is_iterable_of_sqlrecord (line 98) | def is_iterable_of_sqlrecord(value: Any):

FILE: lamindb/base/fields.py
  class CharField (line 33) | class CharField(models.CharField):
    method __init__ (line 39) | def __init__(self, max_length: int = 255, **kwargs):
  class TextField (line 46) | class TextField(models.TextField):
    method __init__ (line 52) | def __init__(self, *args, **kwargs):
  class ForeignKey (line 58) | class ForeignKey(models.ForeignKey):
    method __init__ (line 64) | def __init__(self, *args, **kwargs):
  class BooleanField (line 77) | class BooleanField(models.BooleanField):
    method __init__ (line 83) | def __init__(self, *args, **kwargs):
  class DateField (line 89) | class DateField(models.DateField):
    method __init__ (line 95) | def __init__(self, *args, **kwargs):
  class DateTimeField (line 100) | class DateTimeField(models.DateTimeField):
    method __init__ (line 106) | def __init__(self, *args, **kwargs):
  class BigIntegerField (line 111) | class BigIntegerField(models.BigIntegerField):
    method __init__ (line 117) | def __init__(self, *args, **kwargs):
  class IntegerField (line 123) | class IntegerField(models.IntegerField):
    method __init__ (line 129) | def __init__(self, *args, **kwargs):
  class OneToOneField (line 134) | class OneToOneField(models.OneToOneField):
    method __init__ (line 140) | def __init__(self, *args, **kwargs):
  class FloatField (line 145) | class FloatField(models.FloatField):
    method __init__ (line 151) | def __init__(self, *args, **kwargs):
  class DecimalField (line 156) | class DecimalField(models.DecimalField):
    method __init__ (line 162) | def __init__(self, *args, **kwargs):
  class JSONField (line 167) | class JSONField(models.JSONField):
    method __init__ (line 173) | def __init__(self, *args, **kwargs):
  class DurationField (line 178) | class DurationField(models.DurationField):
    method __init__ (line 184) | def __init__(self, *args, **kwargs):
  class URLField (line 189) | class URLField(models.URLField):
    method __init__ (line 195) | def __init__(self, *args, **kwargs):
  class EmailField (line 200) | class EmailField(models.EmailField):
    method __init__ (line 206) | def __init__(self, *args, **kwargs):
  class TimeField (line 211) | class TimeField(models.TimeField):
    method __init__ (line 217) | def __init__(self, *args, **kwargs):
  class SlugField (line 222) | class SlugField(models.SlugField):
    method __init__ (line 228) | def __init__(self, *args, **kwargs):
  class UUIDField (line 233) | class UUIDField(models.UUIDField):
    method __init__ (line 239) | def __init__(self, *args, **kwargs):
  class PositiveIntegerField (line 244) | class PositiveIntegerField(models.PositiveIntegerField):
    method __init__ (line 250) | def __init__(self, *args, **kwargs):
  class PositiveSmallIntegerField (line 255) | class PositiveSmallIntegerField(models.PositiveSmallIntegerField):
    method __init__ (line 261) | def __init__(self, *args, **kwargs):
  class SmallIntegerField (line 266) | class SmallIntegerField(models.SmallIntegerField):
    method __init__ (line 272) | def __init__(self, *args, **kwargs):
  class BinaryField (line 277) | class BinaryField(models.BinaryField):
    method __init__ (line 283) | def __init__(self, *args, **kwargs):
  class GenericIPAddressField (line 288) | class GenericIPAddressField(models.GenericIPAddressField):
    method __init__ (line 294) | def __init__(self, *args, **kwargs):

FILE: lamindb/base/uids.py
  function base64 (line 65) | def base64(n_char: int) -> str:
  function base62 (line 72) | def base62(n_char: int) -> str:
  function base26 (line 79) | def base26(n_char: int):
  function base62_4 (line 86) | def base62_4() -> str:
  function base62_8 (line 90) | def base62_8() -> str:
  function base62_12 (line 95) | def base62_12() -> str:
  function base62_16 (line 100) | def base62_16() -> str:
  function base62_20 (line 105) | def base62_20() -> str:
  function base62_24 (line 110) | def base62_24() -> str:

FILE: lamindb/base/users.py
  function _user_has_write_access (line 4) | def _user_has_write_access() -> bool:
  function current_user_id (line 17) | def current_user_id() -> int:

FILE: lamindb/base/utils.py
  class class_and_instance_method (line 16) | class class_and_instance_method:
    method __init__ (line 19) | def __init__(self, func):
    method __get__ (line 23) | def __get__(self, instance, owner):
  class strict_classmethod (line 32) | class strict_classmethod:
    method __init__ (line 35) | def __init__(self, func):
    method __get__ (line 39) | def __get__(self, instance, owner):

FILE: lamindb/core/__init__.py
  function __getattr__ (line 47) | def __getattr__(name: str):

FILE: lamindb/core/_compat.py
  function is_package_installed (line 7) | def is_package_installed(package_name: str) -> bool:
  function with_package (line 12) | def with_package(package_name: str, operation: Callable[[Any], T]) -> T:
  function with_package_obj (line 33) | def with_package_obj(

FILE: lamindb/core/_context.py
  function get_key_from_module (line 49) | def get_key_from_module(caller_module: str) -> str:
  function detect_and_process_source_code_file (line 57) | def detect_and_process_source_code_file(
  function get_uid_ext (line 123) | def get_uid_ext(version: str) -> str:
  function get_notebook_path (line 132) | def get_notebook_path() -> tuple[Path, str]:
  function get_notebook_key_colab (line 150) | def get_notebook_key_colab() -> str:
  function get_cli_call (line 167) | def get_cli_call() -> tuple[str, str] | None:
  function pretty_pypackages (line 178) | def pretty_pypackages(dependencies: dict) -> str:
  function last_non_empty_r_block (line 189) | def last_non_empty_r_block(line: str) -> str:
  class LogStreamHandler (line 196) | class LogStreamHandler:
    method __init__ (line 197) | def __init__(self, log_stream: TextIO, file: TextIO, use_buffer: bool):
    method write (line 204) | def write(self, data: str) -> int:
    method flush (line 227) | def flush(self):
    method isatty (line 235) | def isatty(self) -> bool:
    method flush_buffer (line 240) | def flush_buffer(self):
  class LogStreamTracker (line 247) | class LogStreamTracker:
    method __init__ (line 248) | def __init__(self):
    method start (line 268) | def start(self, run: Run):
    method finish (line 299) | def finish(self):
    method cleanup (line 310) | def cleanup(self, signo=None, frame=None):
    method handle_exception (line 347) | def handle_exception(self, exc_type, exc_value, exc_traceback):
  function serialize_params_to_json (line 364) | def serialize_params_to_json(params: dict) -> dict:
  class Context (line 402) | class Context:
    method __init__ (line 408) | def __init__(self, uid: str | None = None, path: Path | None = None):
    method transform (line 426) | def transform(self) -> Transform | None:
    method description (line 431) | def description(self) -> str | None:
    method description (line 436) | def description(self, value: str | None):
    method uid (line 440) | def uid(self) -> str | None:
    method uid (line 445) | def uid(self, value: str | None):
    method version (line 449) | def version(self) -> str | None:
    method version (line 454) | def version(self, value: str | None):
    method project (line 458) | def project(self) -> Project | None:
    method space (line 463) | def space(self) -> Space | None:
    method branch (line 468) | def branch(self) -> Branch | None:
    method run (line 473) | def run(self) -> Run | None:
    method _track (line 477) | def _track(
    method _track_notebook (line 842) | def _track_notebook(
    method _process_aux_transform (line 886) | def _process_aux_transform(
    method _create_or_load_transform (line 933) | def _create_or_load_transform(
    method _finish (line 1190) | def _finish(self, ignore_non_consecutive: None | bool = None) -> None:

FILE: lamindb/core/_functions.py
  function get_current_tracked_run (line 23) | def get_current_tracked_run() -> Run | None:
  function _create_tracked_decorator (line 31) | def _create_tracked_decorator(
  function flow (line 132) | def flow(
  function step (line 186) | def step(uid: str | None = None) -> Callable[[Callable[P, R]], Callable[...
  function tracked (line 202) | def tracked(uid: str | None = None) -> Callable[[Callable[P, R]], Callab...

FILE: lamindb/core/_mapped_collection.py
  class _Connect (line 27) | class _Connect:
    method __init__ (line 28) | def __init__(self, storage):
    method __enter__ (line 37) | def __enter__(self):
    method __exit__ (line 40) | def __exit__(self, exc_type, exc_val, exc_tb):
    method close (line 43) | def close(self):
  class MappedCollection (line 55) | class MappedCollection:
    method __init__ (line 107) | def __init__(
    method _make_connections (line 259) | def _make_connections(self, path_list: list, parallel: bool):
    method _cache_keys (line 273) | def _cache_keys(self):
    method _cache_categories (line 284) | def _cache_categories(self, obs_keys: list):
    method _make_encoders (line 300) | def _make_encoders(self, encode_labels: list):
    method _read_vars (line 314) | def _read_vars(self):
    method _make_join_vars (line 323) | def _make_join_vars(self):
    method check_vars_sorted (line 348) | def check_vars_sorted(self, ascending: bool = True) -> bool:
    method check_vars_non_aligned (line 358) | def check_vars_non_aligned(self, vars: pd.Index | list) -> list[int]:
    method _check_csc_raise_error (line 369) | def _check_csc_raise_error(
    method __len__ (line 381) | def __len__(self):
    method shape (line 385) | def shape(self) -> tuple[int, int]:
    method original_shapes (line 390) | def original_shapes(self) -> list[tuple[int, int]]:
    method __getitem__ (line 398) | def __getitem__(self, idx: int):
    method _get_lazy_data (line 438) | def _get_lazy_data(self, store: StorageType, layers_key: str, storage_...
    method _get_data_idx (line 449) | def _get_data_idx(
    method _get_obs_idx (line 490) | def _get_obs_idx(
    method get_label_weights (line 520) | def get_label_weights(
    method get_merged_labels (line 564) | def get_merged_labels(self, label_key: str):
    method get_merged_categories (line 577) | def get_merged_categories(self, label_key: str):
    method _get_categories (line 597) | def _get_categories(self, storage: StorageType, label_key: str):
    method _get_codes (line 628) | def _get_codes(self, storage: StorageType, label_key: str):
    method _get_labels (line 640) | def _get_labels(
    method close (line 661) | def close(self):
    method closed (line 675) | def closed(self) -> bool:
    method __enter__ (line 682) | def __enter__(self):
    method __exit__ (line 685) | def __exit__(self, exc_type, exc_val, exc_tb):
    method torch_worker_init_fn (line 689) | def torch_worker_init_fn(cls, worker_id):

FILE: lamindb/core/_settings.py
  function raise_if_storage_managed_by_other_instance (line 41) | def raise_if_storage_managed_by_other_instance(storage) -> None:
  class Settings (line 49) | class Settings:
    method __init__ (line 55) | def __init__(self):
    method __repr__ (line 59) | def __repr__(self) -> str:  # pragma: no cover
    method creation (line 93) | def creation(self) -> CreationSettings:
    method annotation (line 102) | def annotation(self) -> AnnotationSettings:
    method _using_key (line 126) | def _using_key(self) -> str | None:
    method _using_key (line 131) | def _using_key(self, value: str | None):
    method _storage_settings (line 136) | def _storage_settings(self) -> ln_setup.core.StorageSettings:
    method sync_git_repo (line 144) | def sync_git_repo(self) -> str | None:
    method sync_git_repo (line 171) | def sync_git_repo(self, value) -> None:
    method storage (line 177) | def storage(self) -> StorageSettings:
    method storage (line 207) | def storage(self, path_kwargs: AnyPathStr | tuple[AnyPathStr, Mapping]):
    method instance_uid (line 239) | def instance_uid(self) -> str:
    method cache_dir (line 244) | def cache_dir(self) -> UPath:
    method local_storage (line 249) | def local_storage(self) -> StorageSettings:
    method local_storage (line 259) | def local_storage(self, local_root: Path | str):
    method verbosity (line 277) | def verbosity(self) -> str:
    method verbosity (line 290) | def verbosity(self, verbosity: str | int):

FILE: lamindb/core/_sync_git.py
  function get_git_repo_from_remote (line 14) | def get_git_repo_from_remote(url: str | None = None, depth: int | None =...
  function check_local_git_repo (line 41) | def check_local_git_repo() -> bool:
  function get_git_commit_hash (line 62) | def get_git_commit_hash(blob_hash: str, repo_dir: Path | None = None) ->...
  function get_filepath_within_git_repo (line 128) | def get_filepath_within_git_repo(
  function get_transform_reference_from_git_repo (line 175) | def get_transform_reference_from_git_repo(path: Path) -> str:
  function get_and_validate_git_metadata (line 195) | def get_and_validate_git_metadata(

FILE: lamindb/core/_track_environment.py
  function track_python_environment (line 14) | def track_python_environment(run: Run) -> None:

FILE: lamindb/core/loaders.py
  function load_fcs (line 43) | def load_fcs(*args, **kwargs) -> AnnData:
  function load_csv (line 56) | def load_csv(path: Path | str, **kwargs) -> DataFrame:
  function load_parquet (line 64) | def load_parquet(path: Path | str, **kwargs) -> DataFrame:
  function load_tsv (line 72) | def load_tsv(path: Path | str, **kwargs) -> DataFrame:
  function load_h5ad (line 80) | def load_h5ad(filepath: AnyPathStr, **kwargs) -> AnnData:
  function load_h5mu (line 91) | def load_h5mu(filepath: Path | str, **kwargs) -> MuData:
  function load_zarr (line 99) | def load_zarr(storepath, **kwargs):  # type: ignore
  function load_html (line 107) | def load_html(path: Path | str) -> None | Path | str:
  function load_json (line 129) | def load_json(path: Path | str) -> dict[str, Any] | list[Any]:
  function load_yaml (line 139) | def load_yaml(path: Path | str) -> dict[str, Any] | list[Any]:
  function load_image (line 149) | def load_image(path: Path | str) -> None | Path | str:
  function load_svg (line 161) | def load_svg(path: Path | str) -> None | Path | str:
  function load_txt (line 173) | def load_txt(path: Path | str) -> str:
  function load_rds (line 179) | def load_rds(path: Path | str) -> Path | str:
  function load_to_memory (line 217) | def load_to_memory(

FILE: lamindb/core/storage/__init__.py
  function __getattr__ (line 55) | def __getattr__(name: str) -> Any:

FILE: lamindb/core/storage/_anndata_accessor.py
  class CSRDataset (line 55) | class CSRDataset(SparseDataset):
    method format_str (line 57) | def format_str(self) -> str:
  function sparse_dataset (line 60) | def sparse_dataset(group):
  function _check_group_format (line 75) | def _check_group_format(*args):
  function _subset_sparse (line 82) | def _subset_sparse(sparse_ds: CSRDataset | SparseDataset, indices):
  function get_module_name (line 92) | def get_module_name(obj):
  function _records_to_df (line 96) | def _records_to_df(obj):
  class AccessRegistry (line 117) | class AccessRegistry:
    method __init__ (line 118) | def __init__(self):
    method register_open (line 122) | def register_open(self, module: str):
    method open (line 129) | def open(self, module: str, *args, **kwargs):
    method register (line 135) | def register(self, module: str):
    method __getattr__ (line 145) | def __getattr__(self, func_name: str):
  function open (line 162) | def open(filepath: AnyPathStr, mode: str = "r", compression: str | None ...
  function read_dataframe (line 189) | def read_dataframe(elem: h5py.Dataset | h5py.Group):
  function safer_read_partial (line 197) | def safer_read_partial(elem, indices):
  function keys (line 266) | def keys(storage: h5py.File):
  function open (line 304) | def open(filepath: AnyPathStr, mode: Literal["r", "r+", "a", "w", "w-"] ...
  function read_dataframe (line 324) | def read_dataframe(elem: Union[zarr.Array, zarr.Group]):  # noqa
  function safer_read_partial (line 331) | def safer_read_partial(elem, indices):
  function keys (line 370) | def keys(storage: zarr.Group):
  function _to_memory (line 432) | def _to_memory(elem):
  function _try_backed_full (line 441) | def _try_backed_full(elem):
  function _to_index (line 458) | def _to_index(elem: np.ndarray):
  function _safer_read_index (line 464) | def _safer_read_index(elem):
  class _MapAccessor (line 483) | class _MapAccessor:
    method __init__ (line 484) | def __init__(self, elem, name, indices=None):
    method __getitem__ (line 489) | def __getitem__(self, key):
    method keys (line 495) | def keys(self):
    method __repr__ (line 498) | def __repr__(self):
  function _safer_read_df (line 505) | def _safer_read_df(elem, indices=None):
  class _AnnDataAttrsMixin (line 516) | class _AnnDataAttrsMixin:
    method obs (line 521) | def obs(self) -> pd.DataFrame | None:
    method var (line 531) | def var(self) -> pd.DataFrame | None:
    method uns (line 541) | def uns(self):
    method X (line 547) | def X(self):
    method obsm (line 555) | def obsm(self):
    method varm (line 564) | def varm(self):
    method obsp (line 573) | def obsp(self):
    method varp (line 582) | def varp(self):
    method layers (line 591) | def layers(self):
    method obs_names (line 598) | def obs_names(self):
    method var_names (line 602) | def var_names(self):
    method shape (line 606) | def shape(self):
    method to_dict (line 609) | def to_dict(self):
    method to_memory (line 633) | def to_memory(self):
  class AnnDataAccessorSubset (line 638) | class AnnDataAccessorSubset(_AnnDataAttrsMixin):
    method __init__ (line 639) | def __init__(self, storage, indices, attrs_keys, obs_names, var_names,...
    method __getitem__ (line 648) | def __getitem__(self, index: Index):
    method __repr__ (line 664) | def __repr__(self):
    method raw (line 673) | def raw(self):
  class AnnDataRawAccessor (line 691) | class AnnDataRawAccessor(AnnDataAccessorSubset):
    method __init__ (line 692) | def __init__(
    method raw (line 723) | def raw(self):
  class AnnDataAccessor (line 727) | class AnnDataAccessor(_AnnDataAttrsMixin):
    method __init__ (line 730) | def __init__(
    method close (line 754) | def close(self):
    method closed (line 787) | def closed(self):
    method __enter__ (line 790) | def __enter__(self):
    method __exit__ (line 795) | def __exit__(self, exc_type, exc_val, exc_tb):
    method __getitem__ (line 798) | def __getitem__(self, index: Index) -> AnnDataAccessorSubset:
    method __repr__ (line 811) | def __repr__(self):
    method raw (line 821) | def raw(self):
    method add_column (line 828) | def add_column(
  function _anndata_n_observations (line 859) | def _anndata_n_observations(object: AnyPathStr | AnnData) -> int | None:

FILE: lamindb/core/storage/_backed_access.py
  function _track_writes_factory (line 29) | def _track_writes_factory(obj: Any, finalize: Callable):
  class BackedAccessor (line 63) | class BackedAccessor:
  function backed_access (line 72) | def backed_access(
  function _non_gz_suffix (line 149) | def _non_gz_suffix(suffixes: list[str]) -> str:
  function _flat_suffixes (line 161) | def _flat_suffixes(paths: UPath | list[UPath]) -> set[str]:
  function _open_dataframe (line 177) | def _open_dataframe(

FILE: lamindb/core/storage/_polars_lazy_df.py
  function _polars_options (line 17) | def _polars_options(storepath: UPath) -> dict:
  function _open_polars_lazy_df (line 73) | def _open_polars_lazy_df(

FILE: lamindb/core/storage/_pyarrow_dataset.py
  function _open_pyarrow_dataset (line 16) | def _open_pyarrow_dataset(paths: UPath | list[UPath], **kwargs) -> PyArr...

FILE: lamindb/core/storage/_spatialdata_accessor.py
  class _TablesAccessor (line 14) | class _TablesAccessor:
    method __init__ (line 15) | def __init__(self, tables: Group, artifact: Artifact | None = None):
    method __getitem__ (line 20) | def __getitem__(self, key: str) -> AnnDataAccessor:
    method keys (line 28) | def keys(self) -> list[str]:
    method __repr__ (line 31) | def __repr__(self) -> str:
  class SpatialDataAccessor (line 39) | class SpatialDataAccessor:
    method __init__ (line 45) | def __init__(self, storage: Group, name: str, artifact: Artifact | Non...
    method tables (line 52) | def tables(self) -> _TablesAccessor:
    method __repr__ (line 56) | def __repr__(self):

FILE: lamindb/core/storage/_tiledbsoma.py
  function _load_h5ad_zarr (line 31) | def _load_h5ad_zarr(objpath: UPath):
  class SOMAS3ContextFactory (line 46) | class SOMAS3ContextFactory:
    method __init__ (line 57) | def __init__(self, storepath: UPath):
    method _extract_refreshable_credentials (line 106) | def _extract_refreshable_credentials(self) -> dict:
    method get_context (line 123) | def get_context(self) -> SOMATileDBContext:
  function _open_tiledbsoma (line 132) | def _open_tiledbsoma(
  function save_tiledbsoma_experiment (line 166) | def save_tiledbsoma_experiment(
  function _soma_store_n_observations (line 352) | def _soma_store_n_observations(obj) -> int:
  function _soma_n_observations (line 372) | def _soma_n_observations(objectpath: UPath) -> int:

FILE: lamindb/core/storage/_valid_suffixes.py
  class VALID_SUFFIXES (line 15) | class VALID_SUFFIXES:

FILE: lamindb/core/storage/_zarr.py
  function get_zarr_store (line 32) | def get_zarr_store(
  function _identify_zarr_type_from_storage (line 50) | def _identify_zarr_type_from_storage(
  function identify_zarr_type (line 66) | def identify_zarr_type(
  function load_zarr (line 89) | def load_zarr(

FILE: lamindb/core/storage/objects.py
  function infer_suffix (line 20) | def infer_suffix(
  function _infer_anndata_suffix (line 62) | def _infer_anndata_suffix(format: str | dict[str, Any] | None) -> str:
  function _infer_dataframe_suffix (line 76) | def _infer_dataframe_suffix(format: str | dict[str, Any] | None) -> str:
  function _infer_spatialdata_suffix (line 86) | def _infer_spatialdata_suffix(format: str | dict[str, Any] | None) -> str:
  function write_to_disk (line 102) | def write_to_disk(dmem: SupportedDataTypes, filepath: Path | str, **kwar...
  function _write_anndata (line 134) | def _write_anndata(dmem: Any, filepath: Path | str, **kwargs) -> None:
  function _write_dataframe (line 146) | def _write_dataframe(dmem: Any, filepath: Path | str, **kwargs) -> None:

FILE: lamindb/core/storage/paths.py
  function auto_storage_key_from_artifact (line 25) | def auto_storage_key_from_artifact(artifact: Artifact):
  function auto_storage_key_from_artifact_uid (line 36) | def auto_storage_key_from_artifact_uid(
  function check_path_is_child_of_root (line 48) | def check_path_is_child_of_root(path: AnyPathStr, root: AnyPathStr) -> b...
  function attempt_accessing_path (line 69) | def attempt_accessing_path(
  function filepath_from_artifact (line 100) | def filepath_from_artifact(
  function _cache_key_from_artifact_storage (line 114) | def _cache_key_from_artifact_storage(
  function filepath_cache_key_from_artifact (line 134) | def filepath_cache_key_from_artifact(
  function store_file_or_folder (line 144) | def store_file_or_folder(
  function delete_storage_using_key (line 183) | def delete_storage_using_key(
  function delete_storage (line 195) | def delete_storage(

FILE: lamindb/core/subsettings/_annotation_settings.py
  class AnnotationSettings (line 1) | class AnnotationSettings:

FILE: lamindb/core/subsettings/_creation_settings.py
  class CreationSettings (line 1) | class CreationSettings:

FILE: lamindb/curators/__init__.py
  function __getattr__ (line 44) | def __getattr__(name: str):

FILE: lamindb/curators/core.py
  function strip_ansi_codes (line 66) | def strip_ansi_codes(text):
  class CatLookup (line 72) | class CatLookup:
    method __init__ (line 87) | def __init__(
    method __getattr__ (line 104) | def __getattr__(self, name):
    method __getitem__ (line 115) | def __getitem__(self, name):
    method __repr__ (line 126) | def __repr__(self) -> str:
  class Curator (line 175) | class Curator:
    method __init__ (line 188) | def __init__(
    method validate (line 240) | def validate(self) -> bool | str:
    method save_artifact (line 245) | def save_artifact(
    method __repr__ (line 258) | def __repr__(self) -> str:
  class SlotsCurator (line 317) | class SlotsCurator(Curator):
    method __init__ (line 327) | def __init__(
    method slots (line 351) | def slots(self) -> dict[str, ComponentCurator]:
    method validate (line 356) | def validate(self) -> None:
    method save_artifact (line 382) | def save_artifact(
  function convert_dict_to_dataframe_for_validation (line 441) | def convert_dict_to_dataframe_for_validation(d: dict, schema: Schema) ->...
  class ComponentCurator (line 462) | class ComponentCurator(Curator):
    method __init__ (line 475) | def __init__(
    method cat (line 643) | def cat(self) -> DataFrameCatManager:
    method standardize (line 647) | def standardize(self) -> None:
    method _cat_manager_validate (line 696) | def _cat_manager_validate(self) -> None:
    method validate (line 706) | def validate(self) -> None:
  class DataFrameCurator (line 725) | class DataFrameCurator(SlotsCurator):
    method __init__ (line 760) | def __init__(
    method cat (line 810) | def cat(self) -> DataFrameCatManager:
    method standardize (line 814) | def standardize(self) -> None:
    method validate (line 825) | def validate(self) -> None:
    method save_artifact (line 832) | def save_artifact(
  class ExperimentalDictCurator (line 847) | class ExperimentalDictCurator(DataFrameCurator):
    method __init__ (line 850) | def __init__(
  function _resolve_schema_slot_path (line 870) | def _resolve_schema_slot_path(
  function _handle_dict_slots (line 907) | def _handle_dict_slots(
  class AnnDataCurator (line 968) | class AnnDataCurator(SlotsCurator):
    method __init__ (line 995) | def __init__(
  function _assign_var_fields_categoricals_multimodal (line 1036) | def _assign_var_fields_categoricals_multimodal(
  class MuDataCurator (line 1068) | class MuDataCurator(SlotsCurator):
    method __init__ (line 1086) | def __init__(
  class SpatialDataCurator (line 1151) | class SpatialDataCurator(SlotsCurator):
    method __init__ (line 1169) | def __init__(
  class TiledbsomaExperimentCurator (line 1245) | class TiledbsomaExperimentCurator(SlotsCurator):
    method __init__ (line 1264) | def __init__(
  class CatVector (line 1316) | class CatVector:
    method __init__ (line 1319) | def __init__(
    method values (line 1387) | def values(self):
    method values (line 1394) | def values(self, new_values):
    method is_validated (line 1403) | def is_validated(self) -> bool:
    method _replace_synonyms (line 1420) | def _replace_synonyms(self) -> list[str]:
    method __repr__ (line 1455) | def __repr__(self) -> str:
    method _add_validated (line 1469) | def _add_validated(self) -> tuple[list, list]:
    method _add_new (line 1631) | def _add_new(
    method _validate (line 1670) | def _validate(
    method validate (line 1759) | def validate(self) -> None:
    method standardize (line 1765) | def standardize(self) -> None:
    method add_new (line 1782) | def add_new(self, **create_kwargs) -> None:
  class DataFrameCatManager (line 1799) | class DataFrameCatManager:
    method __init__ (line 1810) | def __init__(
    method non_validated (line 1898) | def non_validated(self) -> dict[str, list[str]]:
    method categoricals (line 1909) | def categoricals(self) -> list[Feature]:
    method __repr__ (line 1913) | def __repr__(self) -> str:
    method lookup (line 1937) | def lookup(self, public: bool = False) -> CatLookup:
    method validate (line 1950) | def validate(self) -> bool:
    method standardize (line 1971) | def standardize(self, key: str) -> None:
    method add_new_from (line 1993) | def add_new_from(self, key: str, **kwargs):
  function get_current_filter_kwargs (line 2012) | def get_current_filter_kwargs(
  function annotate_artifact (line 2029) | def annotate_artifact(
  function _flatten_unique (line 2159) | def _flatten_unique(series: pd.Series[list[Any] | Any]) -> list[Any]:

FILE: lamindb/errors.py
  class ValidationError (line 51) | class ValidationError(Exception):
  class InvalidArgument (line 57) | class InvalidArgument(Exception):
  class TrackNotCalled (line 63) | class TrackNotCalled(Exception):
  class NotebookNotSaved (line 69) | class NotebookNotSaved(Exception):
  class UnknownStorageLocation (line 75) | class UnknownStorageLocation(Exception):
  class NoStorageLocationForSpace (line 81) | class NoStorageLocationForSpace(Exception):
  class InconsistentKey (line 87) | class InconsistentKey(Exception):
  class FieldValidationError (line 93) | class FieldValidationError(Exception):
  class IntegrityError (line 104) | class IntegrityError(Exception):
  class MissingContextUID (line 114) | class MissingContextUID(Exception):
  class UpdateContext (line 120) | class UpdateContext(Exception):
  class BlobHashNotFound (line 126) | class BlobHashNotFound(Exception):
  class NoWriteAccess (line 137) | class NoWriteAccess(Exception):
  class FileNotInDevDir (line 143) | class FileNotInDevDir(Exception):
  class BranchAlreadyExists (line 149) | class BranchAlreadyExists(Exception):

FILE: lamindb/examples/cellxgene/_cellxgene.py
  function save_cellxgene_defaults (line 21) | def save_cellxgene_defaults() -> None:
  function create_cellxgene_schema (line 99) | def create_cellxgene_schema(

FILE: lamindb/examples/croissant/__init__.py
  function mini_immuno (line 11) | def mini_immuno(

FILE: lamindb/examples/datasets/__init__.py
  function __getattr__ (line 101) | def __getattr__(name: str):

FILE: lamindb/examples/datasets/_core.py
  function file_fcs (line 19) | def file_fcs() -> Path:
  function file_fcs_alpert19 (line 28) | def file_fcs_alpert19(populate_registries: bool = False) -> Path:
  function file_jpg_paradisi05 (line 59) | def file_jpg_paradisi05() -> Path:
  function file_tsv_rnaseq_nfcore_salmon_merged_gene_counts (line 71) | def file_tsv_rnaseq_nfcore_salmon_merged_gene_counts(
  function file_fastq (line 97) | def file_fastq(in_storage_root=False) -> Path:
  function file_bam (line 106) | def file_bam(in_storage_root=False) -> Path:
  function file_mini_csv (line 115) | def file_mini_csv(in_storage_root=False) -> Path:
  function file_tiff_suo22 (line 124) | def file_tiff_suo22() -> Path:
  function dir_iris_images (line 138) | def dir_iris_images() -> UPath:
  function anndata_mouse_sc_lymph_node (line 155) | def anndata_mouse_sc_lymph_node(
  function anndata_pbmc68k_reduced (line 234) | def anndata_pbmc68k_reduced() -> ad.AnnData:
  function anndata_file_pbmc68k_test (line 262) | def anndata_file_pbmc68k_test() -> Path:
  function anndata_pbmc3k_processed (line 284) | def anndata_pbmc3k_processed() -> ad.AnnData:
  function anndata_human_immune_cells (line 294) | def anndata_human_immune_cells(
  function anndata_suo22_Visium10X (line 342) | def anndata_suo22_Visium10X():
  function mudata_papalexi21_subset (line 355) | def mudata_papalexi21_subset(with_uns: bool = False) -> MuData:
  function dict_cellxgene_uns (line 427) | def dict_cellxgene_uns() -> dict[str, Any]:
  function df_iris (line 459) | def df_iris() -> pd.DataFrame:
  function df_iris_in_meter (line 470) | def df_iris_in_meter() -> pd.DataFrame:
  function df_iris_in_meter_study1 (line 491) | def df_iris_in_meter_study1() -> pd.DataFrame:
  function df_iris_in_meter_study2 (line 497) | def df_iris_in_meter_study2() -> pd.DataFrame:
  function dir_scrnaseq_cellranger (line 503) | def dir_scrnaseq_cellranger(
  function schmidt22_crispra_gws_IFNG (line 555) | def schmidt22_crispra_gws_IFNG(basedir=".") -> Path:
  function schmidt22_perturbseq (line 567) | def schmidt22_perturbseq(basedir=".") -> Path:
  function anndata_visium_mouse_cellxgene (line 593) | def anndata_visium_mouse_cellxgene() -> ad.AnnData:
  function spatialdata_blobs (line 605) | def spatialdata_blobs() -> SpatialData:

FILE: lamindb/examples/datasets/_fake.py
  function fake_bio_notebook_titles (line 4) | def fake_bio_notebook_titles(n=100) -> list[str]:

FILE: lamindb/examples/datasets/_small.py
  function small_dataset3_cellxgene (line 10) | def small_dataset3_cellxgene(
  function anndata_with_obs (line 107) | def anndata_with_obs() -> ad.AnnData:

FILE: lamindb/examples/datasets/mini_immuno.py
  function define_features_labels (line 34) | def define_features_labels() -> None:
  function define_mini_immuno_schema_flexible (line 43) | def define_mini_immuno_schema_flexible() -> Schema:
  function save_mini_immuno_datasets (line 57) | def save_mini_immuno_datasets():
  function get_dataset1 (line 66) | def get_dataset1(
  function get_dataset2 (line 148) | def get_dataset2(

FILE: lamindb/examples/fixtures/sheets.py
  function populate_sheets_compound_treatment (line 9) | def populate_sheets_compound_treatment():
  function populate_nextflow_sheet_with_samples (line 160) | def populate_nextflow_sheet_with_samples():

FILE: lamindb/examples/mlflow/__init__.py
  function save_mlflow_features (line 9) | def save_mlflow_features():

FILE: lamindb/examples/schemas/_anndata.py
  function anndata_ensembl_gene_ids_and_valid_features_in_obs (line 10) | def anndata_ensembl_gene_ids_and_valid_features_in_obs() -> Schema:

FILE: lamindb/examples/schemas/_simple.py
  function valid_features (line 10) | def valid_features() -> Schema:

FILE: lamindb/examples/wandb/__init__.py
  function save_wandb_features (line 9) | def save_wandb_features():

FILE: lamindb/integrations/__init__.py
  function __getattr__ (line 31) | def __getattr__(name: str):

FILE: lamindb/integrations/_croissant.py
  function curate_from_croissant (line 17) | def curate_from_croissant(

FILE: lamindb/integrations/_vitessce.py
  function save_vitessce_config (line 21) | def save_vitessce_config(

FILE: lamindb/integrations/lightning.py
  class ArtifactEvent (line 77) | class ArtifactEvent:
  class ArtifactSavedEvent (line 91) | class ArtifactSavedEvent(ArtifactEvent):
  class ArtifactRemovedEvent (line 104) | class ArtifactRemovedEvent(ArtifactEvent):
  class ArtifactObserver (line 115) | class ArtifactObserver(Protocol):
    method on_artifact_saved (line 122) | def on_artifact_saved(self, event: ArtifactSavedEvent) -> None: ...
    method on_artifact_removed (line 124) | def on_artifact_removed(self, event: ArtifactRemovedEvent) -> None: ...
  class ArtifactPublisher (line 127) | class ArtifactPublisher(Protocol):
    method create_artifact (line 135) | def create_artifact(
    method storage_uri (line 146) | def storage_uri(self, artifact: Any) -> str: ...
  class LaminArtifactPublisher (line 149) | class LaminArtifactPublisher:
    method create_artifact (line 157) | def create_artifact(
    method storage_uri (line 180) | def storage_uri(self, artifact: ln.Artifact) -> str:
  function save_lightning_features (line 184) | def save_lightning_features() -> None:
  class FeatureAnnotator (line 256) | class FeatureAnnotator:
    method __init__ (line 272) | def __init__(
    method setup (line 287) | def setup(self, trainer: pl.Trainer, pl_module: pl.LightningModule) ->...
    method _attach_user_run_features (line 298) | def _attach_user_run_features(self) -> None:
    method _validate_user_features (line 303) | def _validate_user_features(self) -> None:
    method _discover_auto_features (line 319) | def _discover_auto_features(self) -> None:
    method _discover_hparam_features (line 334) | def _discover_hparam_features(
    method _collect_hparam_names (line 346) | def _collect_hparam_names(*sources: Any) -> set[str]:
    method get (line 354) | def get(self, name: str) -> ln.Feature | None:
    method _set (line 358) | def _set(self, target: dict[str | ln.Feature, Any], name: str, value: ...
    method save_run_features (line 363) | def save_run_features(
    method _collect_run_features (line 381) | def _collect_run_features(
    method _add_trainer_config_features (line 409) | def _add_trainer_config_features(
    method _add_hparam_features (line 425) | def _add_hparam_features(
    method collect_checkpoint_features (line 439) | def collect_checkpoint_features(
    method clear_best_model_flags (line 484) | def clear_best_model_flags(self, checkpoint_key_prefix: str) -> None:
    method clear_last_model_flags (line 488) | def clear_last_model_flags(self, checkpoint_key_prefix: str) -> None:
    method _clear_flagged_model_feature (line 492) | def _clear_flagged_model_feature(
    method update_model_ranks (line 519) | def update_model_ranks(self, checkpoint_key_prefix: str, mode: str) ->...
    method _get_artifact_feature_rows (line 543) | def _get_artifact_feature_rows(
  class ArtifactPublishingModelCheckpoint (line 580) | class ArtifactPublishingModelCheckpoint(ModelCheckpoint):
    method __init__ (line 599) | def __init__(
    method last_checkpoint_artifact (line 619) | def last_checkpoint_artifact(self) -> Any | None:
    method last_config_artifact (line 624) | def last_config_artifact(self) -> Any | None:
    method last_hparams_artifact (line 629) | def last_hparams_artifact(self) -> Any | None:
    method last_artifact_event (line 634) | def last_artifact_event(self) -> ArtifactSavedEvent | ArtifactRemovedE...
    method get_last_artifact (line 638) | def get_last_artifact(self, kind: ArtifactKind) -> Any | None:
    method add_artifact_observer (line 642) | def add_artifact_observer(self, observer: ArtifactObserver) -> None:
    method remove_artifact_observer (line 646) | def remove_artifact_observer(self, observer: ArtifactObserver) -> None:
    method resolve_artifact_storage_uri (line 650) | def resolve_artifact_storage_uri(self, artifact: Any) -> str:
    method resolve_artifact_key (line 654) | def resolve_artifact_key(
    method _notify_artifact_saved (line 663) | def _notify_artifact_saved(
    method _notify_artifact_removed (line 686) | def _notify_artifact_removed(
    method _notify_artifact_observers (line 711) | def _notify_artifact_observers(
    method on_artifact_saved (line 721) | def on_artifact_saved(self, event: ArtifactSavedEvent) -> None:
    method on_artifact_removed (line 725) | def on_artifact_removed(self, event: ArtifactRemovedEvent) -> None:
    method save_checkpoint_artifact (line 729) | def save_checkpoint_artifact(
    method save_config_artifact (line 740) | def save_config_artifact(self, trainer: pl.Trainer, config_path: Path ...
    method save_hparams_artifact (line 745) | def save_hparams_artifact(
  class Checkpoint (line 753) | class Checkpoint(ArtifactPublishingModelCheckpoint):
    method __init__ (line 883) | def __init__(
    method setup (line 926) | def setup(
    method _base_prefix (line 946) | def _base_prefix(self, trainer: pl.Trainer) -> str:
    method _active_run_uid (line 963) | def _active_run_uid(self) -> str | None:
    method _logger_prefix (line 969) | def _logger_prefix(self, trainer: pl.Trainer, run_uid: str | None) -> ...
    method base_prefix (line 983) | def base_prefix(self) -> str:
    method checkpoint_key_prefix (line 995) | def checkpoint_key_prefix(self) -> str:
    method resolve_artifact_storage_uri (line 1004) | def resolve_artifact_storage_uri(self, artifact: ln.Artifact) -> str:
    method resolve_artifact_key (line 1012) | def resolve_artifact_key(
    method _create_lamin_artifact (line 1028) | def _create_lamin_artifact(
    method save_checkpoint_artifact (line 1048) | def save_checkpoint_artifact(
    method save_config_artifact (line 1085) | def save_config_artifact(
    method save_hparams_artifact (line 1113) | def save_hparams_artifact(
    method _save_hparams_yaml (line 1143) | def _save_hparams_yaml(self, trainer: pl.Trainer) -> None:
    method _save_checkpoint (line 1159) | def _save_checkpoint(self, trainer: pl.Trainer, filepath: str) -> None:
    method _remove_checkpoint (line 1190) | def _remove_checkpoint(self, trainer: pl.Trainer, filepath: str) -> None:
  class SaveConfigCallback (line 1211) | class SaveConfigCallback(_SaveConfigCallback):
    method setup (line 1247) | def setup(
    method _config_path (line 1279) | def _config_path(self, trainer: pl.Trainer) -> Path:
    method _save_config (line 1306) | def _save_config(self, trainer: pl.Trainer, config_path: Path) -> None:
    method _get_artifact_checkpoint_callback (line 1318) | def _get_artifact_checkpoint_callback(
  class Callback (line 1331) | class Callback(pl.Callback):
    method __init__ (line 1343) | def __init__(
    method on_train_start (line 1358) | def on_train_start(
    method on_train_epoch_end (line 1374) | def on_train_epoch_end(

FILE: lamindb/migrations/0177_squashed.py
  function apply_constraints (line 169) | def apply_constraints(apps, schema_editor):
  class Migration (line 185) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0178_v2_2.py
  class Migration (line 10) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0179_v2_2_part_2.py
  class Migration (line 10) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0180_v2_2_part_3.py
  class Migration (line 9) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0181_v2_2_part_4.py
  class Migration (line 7) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0182_v2_2_part_5.py
  class Migration (line 10) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0183_v2_2_part_6.py
  class Migration (line 6) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0184_alter_transformrecord_feature.py
  class Migration (line 9) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0185_alter_runrecord_feature.py
  class Migration (line 9) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0186_v2_4.py
  class Migration (line 9) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0187_squashed.py
  function apply_constraints (line 169) | def apply_constraints(apps, schema_editor):
  class Migration (line 185) | class Migration(migrations.Migration):

FILE: lamindb/migrations/0187_v2_4_part_2.py
  class Migration (line 9) | class Migration(migrations.Migration):

FILE: lamindb/models/_describe.py
  function strip_ansi_from_string (line 31) | def strip_ansi_from_string(text: str) -> str:
  function format_rich_tree (line 37) | def format_rich_tree(
  function format_run_title (line 82) | def format_run_title(
  function format_title_with_version (line 105) | def format_title_with_version(
  function describe_header (line 119) | def describe_header(record: BaseSQLRecord) -> Tree:
  function format_bytes (line 160) | def format_bytes(bytes_value):
  function append_uid_run (line 174) | def append_uid_run(record: TracksRun, two_column_items: list, fk_data=No...
  function append_branch_space_created_at_created_by (line 192) | def append_branch_space_created_at_created_by(
  function add_two_column_items_to_tree (line 212) | def add_two_column_items_to_tree(tree: Tree, two_column_items: list) -> ...
  function describe_artifact (line 230) | def describe_artifact(
  function describe_collection (line 306) | def describe_collection(
  function display_text (line 322) | def display_text(
  function describe_run (line 340) | def describe_run(
  function describe_record (line 427) | def describe_record(
  function describe_transform (line 474) | def describe_transform(
  function describe_branch (line 499) | def describe_branch(record: Branch) -> Tree:
  function describe_schema (line 514) | def describe_schema(record: Schema, slot: str | None = None) -> Tree:
  function describe_postgres (line 597) | def describe_postgres(record):
  function describe_sqlite (line 639) | def describe_sqlite(record):
  function append_readme_blocks_to_tree (line 690) | def append_readme_blocks_to_tree(
  function describe_postgres_sqlite (line 724) | def describe_postgres_sqlite(

FILE: lamindb/models/_django.py
  function patch_many_to_many_descriptor (line 21) | def patch_many_to_many_descriptor() -> None:
  function get_related_model (line 65) | def get_related_model(model, field_name):
  function get_artifact_or_run_with_related (line 81) | def get_artifact_or_run_with_related(
  function get_collection_with_related (line 295) | def get_collection_with_related(
  function get_schema_m2m_relations (line 352) | def get_schema_m2m_relations(artifact: Artifact, slot_schema: dict, limi...

FILE: lamindb/models/_feature_manager.py
  function get_accessor_by_registry_ (line 72) | def get_accessor_by_registry_(host: Artifact | Collection) -> dict:
  function get_schema_by_slot_ (line 83) | def get_schema_by_slot_(host: Artifact) -> dict[str, Schema]:
  function get_label_links (line 101) | def get_label_links(
  function get_schema_links (line 113) | def get_schema_links(host: Artifact | Collection) -> BasicQuerySet:
  function get_link_attr (line 119) | def get_link_attr(
  function get_categorical_link_info (line 132) | def get_categorical_link_info(
  function strip_cat (line 190) | def strip_cat(feature_dtype: str) -> str:
  function format_dtype_for_display (line 201) | def format_dtype_for_display(dtype_str: str) -> str:
  class GroupConcat (line 242) | class GroupConcat(Aggregate):
  function custom_aggregate (line 247) | def custom_aggregate(field, using: str):
  function get_categoricals_postgres (line 254) | def get_categoricals_postgres(
  function get_categoricals_sqlite (line 329) | def get_categoricals_sqlite(
  function get_non_categoricals (line 362) | def get_non_categoricals(
  function create_feature_table (line 435) | def create_feature_table(
  function get_features_data (line 452) | def get_features_data(
  function describe_features (line 605) | def describe_features(
  function infer_convert_dtype_key_value (line 719) | def infer_convert_dtype_key_value(
  function _filter_one_feature_clause (line 793) | def _filter_one_feature_clause(
  function filter_with_feature_predicates (line 911) | def filter_with_feature_predicates(
  function filter_base (line 936) | def filter_base(
  function filter_with_features (line 970) | def filter_with_features(
  class FeatureManager (line 1024) | class FeatureManager:
    method __init__ (line 1027) | def __init__(self, sqlrecord: Artifact | Run | Record):
    method __repr__ (line 1034) | def __repr__(self) -> str:
    method describe (line 1037) | def describe(self, return_str: bool = False) -> str | None:
    method get_values (line 1050) | def get_values(self, external_only: bool = False) -> dict[str, Any]:
    method __getitem__ (line 1060) | def __getitem__(
    method slots (line 1164) | def slots(self) -> dict[str, Schema]:
    method _accessor_by_registry (line 1177) | def _accessor_by_registry(self):
    method _add_label_feature_links (line 1183) | def _add_label_feature_links(
    method _get_feature_objects (line 1215) | def _get_feature_objects(self, dictionary, feature_field):
    method _resolve_feature_value_dictionary (line 1251) | def _resolve_feature_value_dictionary(
    method _merge_feature_objects (line 1319) | def _merge_feature_objects(
    method _raise_not_validated_values (line 1336) | def _raise_not_validated_values(
    method _collect_record_feature_writes (line 1352) | def _collect_record_feature_writes(
    method add_values (line 1503) | def add_values(
    method _add_values (line 1617) | def _add_values(
    method set_values (line 1752) | def set_values(
    method _get_external_schema (line 1859) | def _get_external_schema(self) -> Schema | None:
    method remove_values (line 1867) | def remove_values(
    method _remove_values (line 1898) | def _remove_values(
    method _add_schema (line 2002) | def _add_schema(self, schema: Schema, slot: str) -> None:
    method _add_from (line 2033) | def _add_from(self, data: Artifact | Collection, transfer_logs: dict =...
  function bulk_set_features_in_records (line 2119) | def bulk_set_features_in_records(records: Iterable[Record]) -> None:

FILE: lamindb/models/_from_values.py
  function _from_values (line 17) | def _from_values(
  function get_existing_records (line 99) | def get_existing_records(
  function create_records_from_source (line 192) | def create_records_from_source(
  function index_iterable (line 307) | def index_iterable(iterable: ListLike) -> Index:
  function _format_values (line 317) | def _format_values(
  function _bulk_create_dicts_from_df (line 336) | def _bulk_create_dicts_from_df(
  function get_organism_record_from_field (line 357) | def get_organism_record_from_field(  # type: ignore

FILE: lamindb/models/_is_versioned.py
  class IsVersioned (line 21) | class IsVersioned(models.Model):
    class Meta (line 24) | class Meta:
    method __init__ (line 38) | def __init__(
    method stem_uid (line 47) | def stem_uid(self) -> str:
    method version (line 60) | def version(self) -> str:
    method version (line 70) | def version(self, value: str | None) -> None:
    method versions (line 74) | def versions(self) -> QuerySet:
    method _add_to_version_family (line 88) | def _add_to_version_family(
  function bump_version (line 120) | def bump_version(
  function set_version (line 165) | def set_version(version: str | None = None, previous_version: str | None...
  function create_uid (line 180) | def create_uid(
  function process_revises (line 219) | def process_revises(
  function _adjust_is_latest_when_deleting_is_versioned (line 239) | def _adjust_is_latest_when_deleting_is_versioned(
  function reconcile_is_latest_within_branch (line 294) | def reconcile_is_latest_within_branch(

FILE: lamindb/models/_label_manager.py
  function _get_labels (line 37) | def _get_labels(
  function _get_labels_postgres (line 63) | def _get_labels_postgres(
  function describe_labels (line 76) | def describe_labels(
  function _save_validated_records (line 129) | def _save_validated_records(
  function save_validated_records (line 162) | def save_validated_records(
  class LabelManager (line 174) | class LabelManager:
    method __init__ (line 182) | def __init__(self, sqlrecord: Artifact | Collection) -> None:
    method __repr__ (line 187) | def __repr__(self) -> str:
    method describe (line 190) | def describe(self, return_str=True) -> str:
    method add (line 195) | def add(
    method get (line 210) | def get(
    method add_from (line 227) | def add_from(self, data: Artifact | Collection, transfer_logs: dict = ...

FILE: lamindb/models/_relations.py
  function get_schema_modules (line 19) | def get_schema_modules(instance: str | None) -> set[str]:
  function dict_module_name_to_model_name (line 45) | def dict_module_name_to_model_name(
  function dict_related_model_to_related_name (line 66) | def dict_related_model_to_related_name(
  function get_related_name (line 100) | def get_related_name(features_type: type[SQLRecord]) -> str:

FILE: lamindb/models/_run_cleanup.py
  function main (line 14) | def main() -> None:

FILE: lamindb/models/artifact.py
  function _lazy_load_storage_module (line 79) | def _lazy_load_storage_module():
  function _s (line 114) | def _s():
  function _identify_zarr_type (line 126) | def _identify_zarr_type(storepath, *, check: bool = True):
  function process_pathlike (line 177) | def process_pathlike(
  function process_data (line 257) | def process_data(
  function get_stat_or_artifact (line 343) | def get_stat_or_artifact(
  function check_path_in_existing_storage (line 434) | def check_path_in_existing_storage(
  function get_relative_path_to_directory (line 452) | def get_relative_path_to_directory(
  function get_artifact_kwargs_from_data (line 469) | def get_artifact_kwargs_from_data(
  function log_storage_hint (line 610) | def log_storage_hint(
  function data_is_dataframe (line 640) | def data_is_dataframe(data: Any) -> bool:
  function data_is_scversedatastructure (line 646) | def data_is_scversedatastructure(
  function data_is_soma_experiment (line 704) | def data_is_soma_experiment(data: SOMAExperiment | AnyPathStr) -> bool:
  function check_otype_artifact (line 713) | def check_otype_artifact(
  function populate_subsequent_run (line 751) | def populate_subsequent_run(record: Artifact | Collection, run: Run | No...
  function get_run (line 762) | def get_run(run: Run | None) -> Run | None:
  function save_staged_schemas (line 780) | def save_staged_schemas(self: Artifact) -> None:
  function save_schema_links (line 804) | def save_schema_links(self: Artifact) -> None:
  function validate_feature (line 819) | def validate_feature(feature: Feature, records: list[SQLRecord]) -> None:
  function get_labels (line 830) | def get_labels(
  function add_labels (line 882) | def add_labels(
  function delete_permanently (line 990) | def delete_permanently(artifact: Artifact, storage: bool | None, using_k...
  class LazyArtifact (line 1048) | class LazyArtifact:
    method __init__ (line 1070) | def __init__(self, suffix: str, overwrite_versions: bool, **kwargs):
    method path (line 1090) | def path(self) -> UPath:
    method save (line 1093) | def save(self, upload: bool | None = None, **kwargs) -> Artifact:
    method __repr__ (line 1097) | def __repr__(self) -> str:  # pragma: no cover
  function _sqlrecord_or_id (line 1107) | def _sqlrecord_or_id(
  class Artifact (line 1129) | class Artifact(SQLRecord, IsVersioned, TracksRun, TracksUpdates):
    class Meta (line 1296) | class Meta(SQLRecord.Meta, IsVersioned.Meta, TracksRun.Meta, TracksUpd...
    method features (line 1328) | def features(self) -> FeatureManager:
    method labels (line 1370) | def labels(self) -> LabelManager:
    method __init__ (line 1570) | def __init__(
    method __init__ (line 1589) | def __init__(
    method __init__ (line 1594) | def __init__(
    method transform (line 1864) | def transform(self) -> Transform | None:
    method overwrite_versions (line 1869) | def overwrite_versions(self) -> bool:
    method _storage_ongoing (line 1881) | def _storage_ongoing(self) -> bool:
    method _storage_ongoing (line 1898) | def _storage_ongoing(self, value: bool | None) -> None:
    method feature_sets (line 1912) | def feature_sets(self):
    method path (line 1916) | def path(self) -> UPath:
    method _cache_path (line 1937) | def _cache_path(self) -> UPath:
    method get (line 1948) | def get(
    method filter (line 1990) | def filter(
    method from_lazy (line 2019) | def from_lazy(
    method from_dataframe (line 2065) | def from_dataframe(
    method from_df (line 2172) | def from_df(
    method from_anndata (line 2194) | def from_anndata(
    method from_mudata (line 2322) | def from_mudata(
    method from_spatialdata (line 2382) | def from_spatialdata(
    method from_tiledbsoma (line 2460) | def from_tiledbsoma(
    method from_dir (line 2516) | def from_dir(
    method replace (line 2627) | def replace(
    method open (line 2738) | def open(
    method load (line 2935) | def load(
    method cache (line 3012) | def cache(
    method delete (line 3047) | def delete(
    method save (line 3088) | def save(
  function _update_artifact_keys_with_suffix (line 3336) | def _update_artifact_keys_with_suffix(artifact: Artifact, suffix: str):
  function _confirm_artifact_move (line 3346) | def _confirm_artifact_move(source_path_str: str, target_path_str: str) -...
  function _handle_non_virtual_key_change_on_save (line 3359) | def _handle_non_virtual_key_change_on_save(
  function _handle_suffix_change_on_save (line 3382) | def _handle_suffix_change_on_save(artifact: Artifact) -> bool:
  function _sorted_sizes (line 3402) | def _sorted_sizes(fs: AbstractFileSystem, path: str) -> list[int]:
  function _rm_catch_error (line 3407) | def _rm_catch_error(fs: AbstractFileSystem, path: str) -> Exception | None:
  function _safe_move (line 3416) | def _safe_move(fs: AbstractFileSystem, source: str, target: str):
  function _move_artifact_to_storage (line 3446) | def _move_artifact_to_storage(
  function _synchronize_cleanup_on_error (line 3467) | def _synchronize_cleanup_on_error(
  function _delete_skip_storage (line 3488) | def _delete_skip_storage(artifact, *args, **kwargs) -> None:
  function _save_skip_storage (line 3492) | def _save_skip_storage(artifact, **kwargs) -> None:
  class ArtifactJsonValue (line 3498) | class ArtifactJsonValue(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 3504) | class Meta:
  class ArtifactUser (line 3509) | class ArtifactUser(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 3517) | class Meta:
  class ArtifactRun (line 3524) | class ArtifactRun(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 3533) | class Meta:
  class ArtifactArtifact (line 3540) | class ArtifactArtifact(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 3549) | class Meta:
  function track_run_input (line 3556) | def track_run_input(
  function migrate_save_completed_to_aux_postgres (line 3727) | def migrate_save_completed_to_aux_postgres(schema_editor) -> None:

FILE: lamindb/models/artifact_set.py
  class ArtifactSet (line 32) | class ArtifactSet(Iterable):
    method load (line 45) | def load(
    method open (line 62) | def open(
    method mapped (line 83) | def mapped(
  function artifacts_from_path (line 133) | def artifacts_from_path(artifacts: ArtifactSet, path: AnyPathStr) -> Art...

FILE: lamindb/models/block.py
  function _init_versioned_attached_block (line 50) | def _init_versioned_attached_block(
  class BaseBlock (line 166) | class BaseBlock(IsVersioned):
    class Meta (line 167) | class Meta:
  class Block (line 205) | class Block(BaseBlock, SQLRecord):
    class Meta (line 211) | class Meta:
    method __init__ (line 232) | def __init__(
    method __init__ (line 243) | def __init__(
    method __init__ (line 248) | def __init__(
  class HasBranch (line 367) | class HasBranch(models.Model):
    class Meta (line 368) | class Meta:
  class RecordBlock (line 389) | class RecordBlock(BaseBlock, BaseSQLRecord, HasBranch):
    class Meta (line 392) | class Meta:
    method __init__ (line 398) | def __init__(self, *args, **kwargs):
  class ArtifactBlock (line 402) | class ArtifactBlock(BaseBlock, BaseSQLRecord, HasBranch):
    class Meta (line 405) | class Meta:
    method __init__ (line 411) | def __init__(self, *args, **kwargs):
  class TransformBlock (line 415) | class TransformBlock(BaseBlock, BaseSQLRecord, HasBranch):
    class Meta (line 418) | class Meta:
    method __init__ (line 428) | def __init__(self, *args, **kwargs):
  class RunBlock (line 434) | class RunBlock(BaseBlock, BaseSQLRecord, HasBranch):
    class Meta (line 437) | class Meta:
    method __init__ (line 443) | def __init__(self, *args, **kwargs):
  class CollectionBlock (line 447) | class CollectionBlock(BaseBlock, BaseSQLRecord, HasBranch):
    class Meta (line 450) | class Meta:
    method __init__ (line 458) | def __init__(self, *args, **kwargs):
  class SchemaBlock (line 462) | class SchemaBlock(BaseBlock, BaseSQLRecord, HasBranch):
    class Meta (line 465) | class Meta:
    method __init__ (line 471) | def __init__(self, *args, **kwargs):
  class FeatureBlock (line 475) | class FeatureBlock(BaseBlock, BaseSQLRecord, HasBranch):
    class Meta (line 478) | class Meta:
    method __init__ (line 484) | def __init__(self, *args, **kwargs):
  class ProjectBlock (line 488) | class ProjectBlock(BaseBlock, BaseSQLRecord, HasBranch):
    class Meta (line 491) | class Meta:
    method __init__ (line 497) | def __init__(self, *args, **kwargs):
  class SpaceBlock (line 501) | class SpaceBlock(BaseBlock, BaseSQLRecord, HasBranch):
    class Meta (line 504) | class Meta:
    method __init__ (line 510) | def __init__(self, *args, **kwargs):
  class ULabelBlock (line 514) | class ULabelBlock(BaseBlock, BaseSQLRecord, HasBranch):
    class Meta (line 517) | class Meta:
    method __init__ (line 523) | def __init__(self, *args, **kwargs):
  class BranchBlock (line 527) | class BranchBlock(BaseBlock, BaseSQLRecord):
    class Meta (line 530) | class Meta:
    method __init__ (line 536) | def __init__(self, *args, **kwargs):

FILE: lamindb/models/can_curate.py
  function _check_if_record_in_db (line 29) | def _check_if_record_in_db(record: str | SQLRecord | None, using_key: st...
  function _concat_lists (line 39) | def _concat_lists(values: ListLike | str) -> list[str]:
  function _inspect (line 56) | def _inspect(
  function _validate (line 159) | def _validate(
  function _standardize (line 218) | def _standardize(
  function _add_or_remove_synonyms (line 350) | def _add_or_remove_synonyms(
  function _check_synonyms_field_exist (line 430) | def _check_synonyms_field_exist(record: CanCurate):
  function _filter_queryset_with_organism (line 438) | def _filter_queryset_with_organism(
  class CanCurate (line 461) | class CanCurate:
    method inspect (line 465) | def inspect(
    method validate (line 521) | def validate(
    method from_values (line 575) | def from_values(
    method standardize (line 629) | def standardize(
    method add_synonym (line 710) | def add_synonym(
    method remove_synonym (line 746) | def remove_synonym(self, synonym: str | ListLike):
    method set_abbr (line 773) | def set_abbr(self, value: str):

FILE: lamindb/models/collection.py
  function _load_concat_artifacts (line 59) | def _load_concat_artifacts(
  class Collection (line 94) | class Collection(SQLRecord, IsVersioned, TracksRun, TracksUpdates):
    class Meta (line 124) | class Meta(SQLRecord.Meta, IsVersioned.Meta, TracksRun.Meta, TracksUpd...
    method __init__ (line 218) | def __init__(
    method __init__ (line 232) | def __init__(
    method __init__ (line 237) | def __init__(
    method get (line 341) | def get(
    method append (line 372) | def append(self, artifact: Artifact, run: Run | None = None) -> Collec...
    method open (line 399) | def open(
    method mapped (line 433) | def mapped(
    method cache (line 535) | def cache(self, is_run_input: bool | None = None) -> list[UPath]:
    method load (line 552) | def load(
    method save (line 569) | def save(self, using: str | None = None) -> Collection:
    method restore (line 599) | def restore(self) -> None:
    method transform (line 612) | def transform(self) -> Transform | None:
    method name (line 617) | def name(self) -> str:
    method ordered_artifacts (line 625) | def ordered_artifacts(self) -> QuerySet:
    method data_artifact (line 637) | def data_artifact(self) -> Artifact | None:
  function from_artifacts (line 650) | def from_artifacts(artifacts: Iterable[Artifact]) -> tuple[str, dict[str...
  class CollectionArtifact (line 668) | class CollectionArtifact(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 675) | class Meta:

FILE: lamindb/models/feature.py
  class FeaturePredicate (line 69) | class FeaturePredicate:
    method __bool__ (line 76) | def __bool__(self) -> bool:
  function parse_dtype (line 83) | def parse_dtype(
  function get_record_type_from_uid (line 127) | def get_record_type_from_uid(
  function get_record_type_from_nested_subtypes (line 144) | def get_record_type_from_nested_subtypes(
  function dtype_as_object (line 244) | def dtype_as_object(dtype_str: str, old_format: bool = False) -> type | ...
  function parse_cat_dtype (line 309) | def parse_cat_dtype(
  function parse_nested_brackets (line 394) | def parse_nested_brackets(dtype_str: str, old_format: bool = False) -> d...
  function extract_subtypes_and_filter (line 515) | def extract_subtypes_and_filter(subtype_str: str) -> dict[str, Any]:
  function serialize_dtype (line 578) | def serialize_dtype(
  function serialize_pandas_dtype (line 665) | def serialize_pandas_dtype(pandas_dtype: ExtensionDtype) -> str:
  function convert_to_pandas_dtype (line 689) | def convert_to_pandas_dtype(lamin_dtype: str) -> str | pd.CategoricalDtype:
  function parse_filter_string (line 713) | def parse_filter_string(filter_str: str) -> dict[str, tuple[str, str | N...
  function resolve_relation_filters (line 749) | def resolve_relation_filters(
  function migrate_dtype_to_uid_format (line 778) | def migrate_dtype_to_uid_format(connection, input_field: str = "_dtype_s...
  function process_init_feature_param (line 842) | def process_init_feature_param(args, kwargs):
  class Feature (line 931) | class Feature(SQLRecord, HasType, CanCurate, TracksRun, TracksUpdates):
    class Meta (line 1083) | class Meta(SQLRecord.Meta, TracksRun.Meta, TracksUpdates.Meta):
    method __init__ (line 1186) | def __init__(
    method __init__ (line 1202) | def __init__(
    method __init__ (line 1207) | def __init__(
    method __eq__ (line 1290) | def __eq__(self, other: object) -> bool:
    method __ne__ (line 1298) | def __ne__(self, other: object) -> bool:
    method __gt__ (line 1306) | def __gt__(self, value: Any) -> FeaturePredicate:
    method __ge__ (line 1309) | def __ge__(self, value: Any) -> FeaturePredicate:
    method __lt__ (line 1312) | def __lt__(self, value: Any) -> FeaturePredicate:
    method __le__ (line 1315) | def __le__(self, value: Any) -> FeaturePredicate:
    method query_features (line 1319) | def query_features(self) -> QuerySet:
    method from_dataframe (line 1328) | def from_dataframe(
    method from_df (line 1369) | def from_df(
    method from_dict (line 1375) | def from_dict(
    method save (line 1421) | def save(self, *args, **kwargs) -> Feature:
    method with_config (line 1426) | def with_config(self, optional: bool | None = None) -> tuple[Feature, ...
    method coerce_dtype (line 1434) | def coerce_dtype(self) -> bool | None:
    method coerce_dtype (line 1439) | def coerce_dtype(self, value: bool | None) -> None:
    method dtype (line 1444) | def dtype(self) -> str | None:
    method dtype_as_str (line 1464) | def dtype_as_str(self) -> DtypeStr | str | None:
    method dtype_as_object (line 1501) | def dtype_as_object(self) -> type | SQLRecord | FieldAttr | None:  # t...
  class JsonValue (line 1552) | class JsonValue(SQLRecord, TracksRun):
    class Meta (line 1584) | class Meta(BaseSQLRecord.Meta, TracksRun.Meta):
    method get_or_create (line 1589) | def get_or_create(cls, feature, value):
  function suggest_categorical_for_str_iterable (line 1604) | def suggest_categorical_for_str_iterable(
  function categoricals_from_df (line 1620) | def categoricals_from_df(df: pd.DataFrame) -> dict:

FILE: lamindb/models/has_parents.py
  function _query_relatives (line 34) | def _query_relatives(
  function keep_topmost_matches (line 74) | def keep_topmost_matches(records: list[HasType] | SQLRecordList) -> SQLR...
  function _query_ancestors_of_fk (line 135) | def _query_ancestors_of_fk(record: SQLRecord, attr: str) -> SQLRecordList:
  class HasParents (line 149) | class HasParents:
    method view_parents (line 152) | def view_parents(
    method view_children (line 187) | def view_children(
    method query_parents (line 220) | def query_parents(self) -> QuerySet:
    method query_children (line 224) | def query_children(self) -> QuerySet:
  function view_digraph (line 229) | def view_digraph(u: Digraph):
  function view_lineage (line 254) | def view_lineage(
  function view_parents (line 323) | def view_parents(
  function _get_parents (line 407) | def _get_parents(
  function _df_edges_from_parents (line 449) | def _df_edges_from_parents(
  function get_record_label (line 502) | def get_record_label(record: SQLRecord, field: str | None = None):
  function _get_all_parent_runs (line 527) | def _get_all_parent_runs(data: Artifact | Collection) -> list:
  function _get_all_child_runs (line 573) | def _get_all_child_runs(data: Artifact | Collection) -> list:
  function _df_edges_from_runs (line 634) | def _df_edges_from_runs(df_values: list):

FILE: lamindb/models/project.py
  class Reference (line 43) | class Reference(
    class Meta (line 65) | class Meta(SQLRecord.Meta, TracksRun.Meta, TracksUpdates.Meta):
    method __init__ (line 138) | def __init__(
    method __init__ (line 153) | def __init__(
    method __init__ (line 158) | def __init__(self, *args, **kwargs):
    method query_references (line 161) | def query_references(self) -> QuerySet:
  class Project (line 170) | class Project(SQLRecord, HasType, CanCurate, TracksRun, TracksUpdates, V...
    class Meta (line 187) | class Meta(SQLRecord.Meta, TracksRun.Meta, TracksUpdates.Meta):
    method __init__ (line 292) | def __init__(
    method __init__ (line 304) | def __init__(
    method __init__ (line 309) | def __init__(self, *args, **kwargs):
    method query_projects (line 312) | def query_projects(self) -> QuerySet:
  class ArtifactProject (line 321) | class ArtifactProject(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 333) | class Meta:
  class RunProject (line 339) | class RunProject(BaseSQLRecord, IsLink):
    class Meta (line 356) | class Meta:
  class BranchProject (line 361) | class BranchProject(BaseSQLRecord, IsLink):
    class Meta (line 366) | class Meta:
  class ProjectUser (line 371) | class ProjectUser(BaseSQLRecord, IsLink):
    class Meta (line 378) | class Meta:
  class TransformProject (line 383) | class TransformProject(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 388) | class Meta:
  class CollectionProject (line 393) | class CollectionProject(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 400) | class Meta:
  class ULabelProject (line 405) | class ULabelProject(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 410) | class Meta:
  class FeatureProject (line 415) | class FeatureProject(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 420) | class Meta:
  class SchemaProject (line 425) | class SchemaProject(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 430) | class Meta:
  class ReferenceRecord (line 436) | class ReferenceRecord(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 448) | class Meta:
  class RecordReference (line 453) | class RecordReference(BaseSQLRecord, IsLink):
    class Meta (line 461) | class Meta:
  class ProjectRecord (line 467) | class ProjectRecord(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 479) | class Meta:
  class RecordProject (line 484) | class RecordProject(BaseSQLRecord, IsLink):
    class Meta (line 490) | class Meta:
  class BlockProject (line 495) | class BlockProject(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 500) | class Meta:
  class ArtifactReference (line 505) | class ArtifactReference(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 517) | class Meta:
  class TransformReference (line 522) | class TransformReference(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 531) | class Meta:
  class CollectionReference (line 536) | class CollectionReference(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 545) | class Meta:

FILE: lamindb/models/query_manager.py
  function _search (line 34) | def _search(
  function _lookup (line 173) | def _lookup(
  class QueryManager (line 241) | class QueryManager(Manager):
    method to_list (line 264) | def to_list(self, field: str | None = None):
    method to_dataframe (line 271) | def to_dataframe(self, **kwargs):
    method df (line 279) | def df(self, **kwargs):
    method search (line 283) | def search(self, string: str, **kwargs):
    method lookup (line 288) | def lookup(self, field: StrField | None = None, **kwargs) -> NamedTuple:
    method get_queryset (line 292) | def get_queryset(self):
  class RelatedManager (line 302) | class RelatedManager(QueryManager):
    method add (line 333) | def add(self, *objs, bulk: bool = True) -> None:
    method set (line 337) | def set(self, objs, *, bulk: bool = True, clear: bool = False) -> None:
    method remove (line 341) | def remove(self, *objs, bulk: bool = True) -> None:
    method clear (line 345) | def clear(self) -> None:

FILE: lamindb/models/query_set.py
  function get_keys_from_df (line 84) | def get_keys_from_df(data: list, registry: SQLRecord) -> list[str]:
  function get_default_branch_ids (line 106) | def get_default_branch_ids(branch: Branch | None = None) -> list[int]:
  function one_helper (line 128) | def one_helper(
  function get_backward_compat_filter_kwargs (line 154) | def get_backward_compat_filter_kwargs(queryset, expressions):
  function process_expressions (line 251) | def process_expressions(queryset: QuerySet, queries: tuple, expressions:...
  function get (line 343) | def get(
  class SQLRecordList (line 404) | class SQLRecordList(UserList, Generic[T]):
    method __init__ (line 407) | def __init__(self, records: Iterable[T]):
    method to_dataframe (line 413) | def to_dataframe(self) -> pd.DataFrame:
    method df (line 421) | def df(self) -> pd.DataFrame:
    method to_list (line 424) | def to_list(
    method one (line 431) | def one(self) -> T:
    method save (line 435) | def save(self) -> SQLRecordList[T]:
  function get_basic_field_names (line 443) | def get_basic_field_names(
  function get_feature_annotate_kwargs (line 501) | def get_feature_annotate_kwargs(
  function analyze_lookup_cardinality (line 704) | def analyze_lookup_cardinality(
  function reorder_subset_columns_in_df (line 754) | def reorder_subset_columns_in_df(
  function encode_lamindb_fields_as_columns (line 765) | def encode_lamindb_fields_as_columns(
  function reshape_annotate_result (line 787) | def reshape_annotate_result(
  function process_links_features (line 982) | def process_links_features(
  function process_cols_from_include (line 1028) | def process_cols_from_include(
  function _queryset_class_factory (line 1047) | def _queryset_class_factory(
  class BasicQuerySet (line 1064) | class BasicQuerySet(models.QuerySet):
    method __new__ (line 1078) | def __new__(cls, model=None, query=None, using=None, hints=None):
    method _to_class (line 1082) | def _to_class(
    method _to_basic (line 1089) | def _to_basic(self, copy: bool = True) -> BasicQuerySet:
    method _to_non_basic (line 1093) | def _to_non_basic(self, copy: bool = True) -> QuerySet:
    method to_dataframe (line 1098) | def to_dataframe(
    method df (line 1228) | def df(
    method describe (line 1235) | def describe(self, return_str: bool = False) -> str | None:
    method delete (line 1239) | def delete(self, *args, permanent: bool | None = None, **kwargs):
    method to_list (line 1302) | def to_list(self, field: str | None = None) -> list[SQLRecord] | list[...
    method first (line 1318) | def first(self) -> SQLRecord | None:
    method one (line 1329) | def one(self) -> SQLRecord:
    method one_or_none (line 1333) | def one_or_none(self) -> SQLRecord | None:
    method search (line 1344) | def search(self, string: str, **kwargs):
    method lookup (line 1349) | def lookup(self, field: StrField | None = None, **kwargs) -> NamedTuple:
    method validate (line 1358) | def validate(self, values: ListLike, field: str | StrField | None = No...
    method inspect (line 1363) | def inspect(self, values: ListLike, field: str | StrField | None = Non...
    method standardize (line 1368) | def standardize(
  class QuerySet (line 1378) | class QuerySet(BasicQuerySet):
    method _handle_unknown_field (line 1394) | def _handle_unknown_field(self, error: FieldError) -> None:
    method get (line 1405) | def get(self, idlike: int | str | None = None, **expressions) -> SQLRe...
    method filter (line 1444) | def filter(self, *queries, **expressions) -> QuerySet:
  class NonInstantiableQuerySet (line 1512) | class NonInstantiableQuerySet:
    method __init__ (line 1515) | def __init__(self, qs: QuerySet, registry_name: str):
    method __repr__ (line 1519) | def __repr__(self) -> str:
    method __call__ (line 1522) | def __call__(self, *args, **kwargs):
    method __getattr__ (line 1528) | def __getattr__(self, attr):
  class ModuleNamespace (line 1532) | class ModuleNamespace:
    method __init__ (line 1540) | def __init__(self, query_db: DB, module_name: str):
    method __getattr__ (line 1545) | def __getattr__(self, name: str) -> NonInstantiableQuerySet:
    method __dir__ (line 1572) | def __dir__(self) -> list[str]:
  class BiontyDB (line 1589) | class BiontyDB(ModuleNamespace):
  class PertdbDB (line 1607) | class PertdbDB(ModuleNamespace):
  class DB (line 1619) | class DB:
    method __init__ (line 1681) | def __init__(self, instance: str):
    method __getattr__ (line 1694) | def __getattr__(self, name: str) -> NonInstantiableQuerySet | BiontyDB...
    method __repr__ (line 1741) | def __repr__(self) -> str:
    method __dir__ (line 1744) | def __dir__(self) -> list[str]:

FILE: lamindb/models/record.py
  class RecordBatch (line 56) | class RecordBatch:
    method __init__ (line 59) | def __init__(
    method __len__ (line 73) | def __len__(self) -> int:
    method type (line 77) | def type(self) -> Record:
    method _build_records (line 80) | def _build_records(self) -> list[Record]:
    method save (line 107) | def save(self) -> SQLRecordList[Record]:
  class Record (line 118) | class Record(SQLRecord, HasType, HasParents, CanCurate, TracksRun, Track...
    class Meta (line 225) | class Meta(SQLRecord.Meta, TracksRun.Meta, TracksUpdates.Meta):
    method __init__ (line 395) | def __init__(
    method __init__ (line 408) | def __init__(
    method __init__ (line 413) | def __init__(
    method save (line 463) | def save(self, *args, **kwargs) -> Record:
    method from_dataframe (line 472) | def from_dataframe(
    method features (line 560) | def features(self) -> FeatureManager:
    method is_sheet (line 570) | def is_sheet(self) -> bool:
    method query_parents (line 574) | def query_parents(self) -> QuerySet:
    method query_children (line 582) | def query_children(self) -> QuerySet:
    method query_records (line 590) | def query_records(self) -> QuerySet:
    method _set_export_run (line 598) | def _set_export_run(self, is_run_input: bool | Run | None = None) -> N...
    method to_dataframe (line 617) | def to_dataframe(
    method to_artifact (line 686) | def to_artifact(
  class RecordJson (line 722) | class RecordJson(BaseSQLRecord, IsLink):
    class Meta (line 728) | class Meta:
  class RecordRecord (line 736) | class RecordRecord(BaseSQLRecord, IsLink):
    class Meta (line 742) | class Meta:
  class RecordULabel (line 748) | class RecordULabel(BaseSQLRecord, IsLink):
    class Meta (line 754) | class Meta:
  class RecordUser (line 761) | class RecordUser(BaseSQLRecord, IsLink):
    class Meta (line 767) | class Meta:
  class RecordRun (line 774) | class RecordRun(BaseSQLRecord, IsLink):
    class Meta (line 780) | class Meta:
  class RunRecord (line 787) | class RunRecord(BaseSQLRecord, IsLink):
    class Meta (line 801) | class Meta:
  class RecordArtifact (line 807) | class RecordArtifact(BaseSQLRecord, IsLink):
    class Meta (line 813) | class Meta:
  class ArtifactRecord (line 819) | class ArtifactRecord(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 827) | class Meta:
  class RecordCollection (line 833) | class RecordCollection(BaseSQLRecord, IsLink):
    class Meta (line 841) | class Meta:
  class CollectionRecord (line 847) | class CollectionRecord(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 857) | class Meta:
  class RecordTransform (line 863) | class RecordTransform(BaseSQLRecord, IsLink):
    class Meta (line 871) | class Meta:
  class TransformRecord (line 877) | class TransformRecord(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 891) | class Meta:

FILE: lamindb/models/run.py
  function current_run (line 54) | def current_run() -> Run | None:
  class TracksRun (line 71) | class TracksRun(models.Model):
    class Meta (line 74) | class Meta:
  class TracksUpdates (line 95) | class TracksUpdates(models.Model):
    class Meta (line 98) | class Meta:
  class User (line 107) | class User(BaseSQLRecord, CanCurate):
    class Meta (line 121) | class Meta:
    method __init__ (line 163) | def __init__(
    method __init__ (line 171) | def __init__(
    method __init__ (line 176) | def __init__(
  class Run (line 184) | class Run(SQLRecord, TracksUpdates):
    class Meta (line 237) | class Meta:
    method __init__ (line 374) | def __init__(
    method __init__ (line 388) | def __init__(
    method __init__ (line 393) | def __init__(
    method status (line 438) | def status(self) -> RunStatus:
    method features (line 471) | def features(self) -> FeatureManager:
    method query_output_artifacts (line 480) | def query_output_artifacts(
    method filter (line 511) | def filter(
  function _permanent_delete_runs (line 539) | def _permanent_delete_runs(runs: Run | QuerySet) -> None:
  class RunJsonValue (line 587) | class RunJsonValue(BaseSQLRecord, IsLink):
    class Meta (line 601) | class Meta:
  class RunArtifact (line 608) | class RunArtifact(BaseSQLRecord, IsLink):
    class Meta (line 616) | class Meta:

FILE: lamindb/models/save.py
  function save (line 29) | def save(
  function bulk_create (line 128) | def bulk_create(
  function bulk_update (line 228) | def bulk_update(
  function check_and_attempt_upload (line 272) | def check_and_attempt_upload(
  function copy_or_move_to_cache (line 326) | def copy_or_move_to_cache(
  function check_and_attempt_clearing (line 380) | def check_and_attempt_clearing(
  function store_artifacts (line 410) | def store_artifacts(
  function prepare_error_message (line 472) | def prepare_error_message(records, stored_artifacts, exception) -> str:
  function upload_artifact (line 492) | def upload_artifact(

FILE: lamindb/models/schema.py
  function validate_features (line 67) | def validate_features(features: list[SQLRecord]) -> SQLRecord:
  function get_features_config (line 91) | def get_features_config(
  class SchemaOptionals (line 109) | class SchemaOptionals:
    method __init__ (line 112) | def __init__(self, schema) -> None:
    method get_uids (line 115) | def get_uids(self) -> list[str]:
    method get (line 129) | def get(self) -> QuerySet:
    method set (line 137) | def set(self, features: list[Feature]) -> None:
    method remove (line 147) | def remove(self, features: Feature | list[Feature]) -> None:
    method add (line 159) | def add(self, features: Feature | list[Feature]) -> None:
  class Schema (line 183) | class Schema(SQLRecord, HasType, CanCurate, TracksRun, TracksUpdates):
    class Meta (line 289) | class Meta(SQLRecord.Meta, TracksRun.Meta, TracksUpdates.Meta):
    method __init__ (line 412) | def __init__(
    method __init__ (line 437) | def __init__(
    method __init__ (line 442) | def __init__(
    method query_schemas (line 566) | def query_schemas(self) -> QuerySet:
    method _validate_kwargs_calculate_hash (line 574) | def _validate_kwargs_calculate_hash(
    method from_values (line 740) | def from_values(  # type: ignore
    method from_dataframe (line 819) | def from_dataframe(
    method from_df (line 864) | def from_df(
    method save (line 875) | def save(self, *args, **kwargs) -> Schema:
    method members (line 977) | def members(self) -> QuerySet:
    method dtype (line 1031) | def dtype(self) -> str | None:
    method dtype (line 1036) | def dtype(self, value: str | None) -> None:
    method coerce_dtype (line 1041) | def coerce_dtype(self) -> bool | None:
    method coerce_dtype (line 1046) | def coerce_dtype(self, value: bool | None) -> None:
    method n (line 1051) | def n(self) -> int | None:
    method n (line 1056) | def n(self, value: int | None) -> None:
    method index (line 1060) | def index(self) -> None | Feature:
    method index (line 1077) | def index(self, value: None | Feature) -> None:
    method _index_feature_uid (line 1087) | def _index_feature_uid(self) -> None | str:
    method _index_feature_uid (line 1095) | def _index_feature_uid(self, value: str | None) -> None:
    method slots (line 1103) | def slots(self) -> dict[str, Schema]:
    method optionals (line 1130) | def optionals(self) -> SchemaOptionals:
    method add_optional_features (line 1167) | def add_optional_features(self, features: list[Feature]) -> None:
    method remove_optional_features (line 1173) | def remove_optional_features(self, features: list[Feature]) -> None:
    method describe (line 1183) | def describe(cls_or_self, return_str: bool = False) -> None | str:
  function get_type_str (line 1195) | def get_type_str(dtype: str | None) -> str | None:
  function _get_related_name (line 1203) | def _get_related_name(self: Schema) -> str | None:
  class SchemaFeature (line 1213) | class SchemaFeature(BaseSQLRecord, IsLink):
    class Meta (line 1218) | class Meta:
  class ArtifactSchema (line 1223) | class ArtifactSchema(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 1230) | class Meta:
  class SchemaComponent (line 1235) | class SchemaComponent(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 1241) | class Meta:
  function migrate_auxiliary_fields_postgres (line 1253) | def migrate_auxiliary_fields_postgres(schema_editor) -> None:

FILE: lamindb/models/sqlrecord.py
  function _is_branch_sensitive_model (line 116) | def _is_branch_sensitive_model(model: type[BaseSQLRecord]) -> bool:
  class IsLink (line 160) | class IsLink:
  class HasType (line 164) | class HasType(models.Model):
    class Meta (line 178) | class Meta:
    method query_types (line 187) | def query_types(self) -> SQLRecordList:
  function deferred_attribute__repr__ (line 212) | def deferred_attribute__repr__(self):
  function unique_constraint_error_in_error_message (line 216) | def unique_constraint_error_in_error_message(error_msg: str) -> bool:
  function parse_violated_field_from_error_message (line 224) | def parse_violated_field_from_error_message(error_msg: str) -> list[str]...
  class ValidateFields (line 282) | class ValidateFields:
  function is_approx_pascal_case (line 286) | def is_approx_pascal_case(s: str) -> bool:
  function init_self_from_db (line 299) | def init_self_from_db(self: SQLRecord, existing_record: SQLRecord):
  function update_attributes (line 314) | def update_attributes(record: SQLRecord, attributes: dict[str, str]):
  function validate_literal_fields (line 331) | def validate_literal_fields(record: SQLRecord, kwargs) -> None:
  function validate_fields (line 377) | def validate_fields(record: SQLRecord, kwargs):
  function suggest_records_with_similar_names (line 436) | def suggest_records_with_similar_names(
  function delete_record (line 481) | def delete_record(record: BaseSQLRecord, is_soft: bool = True):
  function _synchronize_clone (line 529) | def _synchronize_clone(storage_root: str) -> str | None:
  class Registry (line 559) | class Registry(ModelBase):
    method __new__ (line 577) | def __new__(cls, name, bases, attrs, **kwargs):
    method __dir__ (line 584) | def __dir__(cls):
    method describe (line 615) | def describe(cls, return_str: bool = False) -> str | None:
    method lookup (line 631) | def lookup(
    method filter (line 640) | def filter(cls, *queries, **expressions) -> QuerySet:
    method get (line 663) | def get(
    method to_dataframe (line 692) | def to_dataframe(
    method df (line 743) | def df(
    method search (line 756) | def search(
    method using (line 774) | def using(
    method connect (line 782) | def connect(
    method __get_module_name__ (line 896) | def __get_module_name__(cls) -> str:
    method __get_name_with_module__ (line 903) | def __get_name_with_module__(cls) -> str:
    method __get_available_fields__ (line 911) | def __get_available_fields__(cls) -> set[str]:
  class BaseSQLRecord (line 926) | class BaseSQLRecord(models.Model, metaclass=Registry):
    class Meta (line 935) | class Meta:
    method __init__ (line 944) | def __init__(self, *args, **kwargs):
    method _populate_tracked_fields (line 1072) | def _populate_tracked_fields(self):
    method _field_changed (line 1090) | def _field_changed(self, field_name: str, check_is_saved: bool = True)...
    method save (line 1112) | def save(self: T, *args, **kwargs) -> T:
    method describe (line 1303) | def describe(
    method __repr__ (line 1324) | def __repr__(
    method __str__ (line 1384) | def __str__(self) -> str:
    method delete (line 1387) | def delete(self, permanent: bool | None = None):
  class Space (line 1406) | class Space(BaseSQLRecord):
    class Meta (line 1433) | class Meta:
    method __init__ (line 1465) | def __init__(
    method __init__ (line 1472) | def __init__(
    method __init__ (line 1477) | def __init__(
  class Branch (line 1500) | class Branch(BaseSQLRecord):
    class Meta (line 1616) | class Meta:
    method status (line 1689) | def status(self) -> BranchStatus:
    method status (line 1730) | def status(self, value: BranchStatus) -> None:
    method __init__ (line 1739) | def __init__(
    method __init__ (line 1746) | def __init__(
    method __init__ (line 1751) | def __init__(
  class BranchUser (line 1759) | class BranchUser(BaseSQLRecord, IsLink):
    class Meta (line 1760) | class Meta:
  class SQLRecord (line 1771) | class SQLRecord(BaseSQLRecord, metaclass=Registry):
    class Meta (line 1813) | class Meta:
    method restore (line 1816) | def restore(self) -> None:
    method delete (line 1824) | def delete(self, permanent: bool | None = None, **kwargs):
  function _format_django_validation_error (line 1911) | def _format_django_validation_error(record: SQLRecord, e: DjangoValidati...
  function _get_record_kwargs (line 1942) | def _get_record_kwargs(record_class) -> list[tuple[str, str]]:
  function get_name_field (line 1996) | def get_name_field(
  function add_db_connection (line 2039) | def add_db_connection(db: str, using: str):
  function update_fk_to_default_db (line 2052) | def update_fk_to_default_db(
  function transfer_fk_to_default_db_bulk (line 2103) | def transfer_fk_to_default_db_bulk(
  function get_transfer_run (line 2110) | def get_transfer_run(record) -> Run:
  function transfer_to_default_db (line 2151) | def transfer_to_default_db(
  function track_current_name_value (line 2204) | def track_current_name_value(record: SQLRecord):
  function check_name_change (line 2211) | def check_name_change(record: SQLRecord):
  function format_field_value (line 2277) | def format_field_value(value: datetime | str | Any, none: str = "None") ...
  class SQLRecordInfo (line 2294) | class SQLRecordInfo:
    method __init__ (line 2295) | def __init__(self, registry: Registry):
    method _get_type_for_field (line 2298) | def _get_type_for_field(self, field_name: str) -> str:
    method _get_base_class_fields (line 2307) | def _get_base_class_fields(self) -> list[str]:
    method _reorder_fields_by_class (line 2315) | def _reorder_fields_by_class(self, fields_to_order: list[Field]) -> li...
    method get_simple_fields (line 2329) | def get_simple_fields(self, return_str: bool = False) -> Any:
    method get_relational_fields (line 2356) | def get_relational_fields(self, return_str: bool = False):
  class Migration (line 2490) | class Migration(BaseSQLRecord):
    class Meta (line 2495) | class Meta:

FILE: lamindb/models/storage.py
  class Storage (line 41) | class Storage(SQLRecord, TracksRun, TracksUpdates):
    class Meta (line 158) | class Meta(SQLRecord.Meta, TracksRun.Meta, TracksUpdates.Meta):
    method __init__ (line 188) | def __init__(
    method __init__ (line 198) | def __init__(
    method __init__ (line 203) | def __init__(
    method host (line 314) | def host(self) -> str | None:
    method path (line 326) | def path(self) -> UPath:
    method save (line 334) | def save(self, *args, **kwargs):
    method delete (line 341) | def delete(self, permanent: bool | None = None) -> None:  # type: ignore

FILE: lamindb/models/transform.py
  class Transform (line 49) | class Transform(SQLRecord, IsVersioned):
    class Meta (line 131) | class Meta(SQLRecord.Meta, IsVersioned.Meta):
    method __init__ (line 229) | def __init__(
    method __init__ (line 243) | def __init__(
    method __init__ (line 248) | def __init__(
    method from_git (line 368) | def from_git(
    method latest_run (line 487) | def latest_run(self) -> Run:
    method type (line 493) | def type(self) -> TransformKind:
    method type (line 497) | def type(self, value: TransformKind):
    method view_lineage (line 500) | def view_lineage(self, with_successors: bool = False, distance: int = 5):
    method _update_source_code_from_path (line 517) | def _update_source_code_from_path(self, source_code_path: Path) -> Non...
  function _permanent_delete_transforms (line 548) | def _permanent_delete_transforms(transforms: Transform | QuerySet) -> None:
  class TransformTransform (line 570) | class TransformTransform(BaseSQLRecord, IsLink):
    class Meta (line 586) | class Meta:

FILE: lamindb/models/ulabel.py
  class ULabel (line 39) | class ULabel(SQLRecord, HasType, HasParents, CanCurate, TracksRun, Track...
    class Meta (line 83) | class Meta(SQLRecord.Meta, TracksRun.Meta, TracksUpdates.Meta):
    method __init__ (line 188) | def __init__(
    method __init__ (line 199) | def __init__(
    method __init__ (line 204) | def __init__(
    method query_ulabels (line 246) | def query_ulabels(self) -> QuerySet:
  class ArtifactULabel (line 255) | class ArtifactULabel(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 263) | class Meta:
  class TransformULabel (line 270) | class TransformULabel(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 275) | class Meta:
  class RunULabel (line 280) | class RunULabel(BaseSQLRecord, IsLink):
    class Meta (line 293) | class Meta:
  class BranchULabel (line 298) | class BranchULabel(BaseSQLRecord, IsLink):
    class Meta (line 305) | class Meta:
  class CollectionULabel (line 310) | class CollectionULabel(BaseSQLRecord, IsLink, TracksRun):
    class Meta (line 320) | class Meta:

FILE: lamindb/setup/_merge.py
  function merge (line 16) | def merge(branch: str | Branch) -> None:

FILE: lamindb/setup/_switch.py
  function switch (line 12) | def switch(target: str | Branch, *, space: bool = False, create: bool = ...

FILE: noxfile.py
  function lint (line 47) | def lint(session: nox.Session) -> None:
  function install (line 52) | def install(session):
  function install_ci (line 91) | def install_ci(session, group):
  function configure_coverage (line 194) | def configure_coverage(session) -> None:
  function prepare (line 227) | def prepare(session):
  function test (line 283) | def test(session, group):
  function clidocs (line 366) | def clidocs(session):
  function docs (line 452) | def docs(session):

FILE: scripts/migrate_test_instances.py
  function run (line 21) | def run(cmd: str) -> None:
  function main (line 27) | def main() -> None:

FILE: tests/core/_dataset_fixtures.py
  function get_small_adata (line 12) | def get_small_adata():
  function get_small_mdata (line 25) | def get_small_mdata():
  function get_small_sdata (line 48) | def get_small_sdata():
  function get_mini_csv (line 72) | def get_mini_csv() -> Generator[Path, None, None]:

FILE: tests/core/conftest.py
  function pytest_sessionstart (line 20) | def pytest_sessionstart():
  function pytest_sessionfinish (line 54) | def pytest_sessionfinish(session: pytest.Session):
  function ccaplog (line 64) | def ccaplog(caplog) -> pytest.LogCaptureFixture:
  function get_test_filepaths (line 87) | def get_test_filepaths(request):  # -> Tuple[bool, Path, Path, Path, str]
  function registered_storage_file_and_folder (line 125) | def registered_storage_file_and_folder():
  function example_dataframe (line 147) | def example_dataframe():
  function adata_file (line 152) | def adata_file():
  function tsv_file (line 166) | def tsv_file():
  function zip_file (line 174) | def zip_file():
  function yaml_file (line 182) | def yaml_file():
  function fcs_file (line 192) | def fcs_file():
  function mudata_file (line 199) | def mudata_file(get_small_mdata):
  function spatialdata_file (line 207) | def spatialdata_file(get_small_sdata):

FILE: tests/core/test_artifact_anndata_with_curation.py
  function test_create_anndata_with_curation (line 4) | def test_create_anndata_with_curation():

FILE: tests/core/test_artifact_basics.py
  function data (line 58) | def data(request):
  function test_basic_validation (line 70) | def test_basic_validation():
  function test_create_from_path_file (line 100) | def test_create_from_path_file(get_test_filepaths, key_is_virtual, key, ...
  function test_create_from_path_file_with_explicit_key_is_virtual (line 203) | def test_create_from_path_file_with_explicit_key_is_virtual(
  function test_create_from_empty_files_skips_hash_lookup (line 226) | def test_create_from_empty_files_skips_hash_lookup(tmp_path):
  function test_create_from_path_folder (line 247) | def test_create_from_path_folder(get_test_filepaths, key):
  function test_create_from_path_overwrite_versions_false (line 317) | def test_create_from_path_overwrite_versions_false(get_test_filepaths):
  function test_delete_permanently_from_trash_folder (line 344) | def test_delete_permanently_from_trash_folder(tmp_path):
  function test_create_from_path_set_branch (line 364) | def test_create_from_path_set_branch():
  function test_from_dir (line 376) | def test_from_dir(get_test_filepaths, key):
  function test_create_from_dataframe (line 402) | def test_create_from_dataframe(example_dataframe: pd.DataFrame):
  function test_dataframe_validate_suffix (line 522) | def test_dataframe_validate_suffix(example_dataframe: pd.DataFrame):
  function test_create_from_parquet_file_default_constructor (line 535) | def test_create_from_parquet_file_default_constructor(
  function test_create_from_anndata (line 545) | def test_create_from_anndata(get_small_adata, adata_file, example_datafr...
  function test_from_anndata_uses_h5ad_kwargs (line 567) | def test_from_anndata_uses_h5ad_kwargs(get_small_adata):
  function test_from_anndata_uses_zarr_kwargs (line 581) | def test_from_anndata_uses_zarr_kwargs(get_small_adata):
  function test_from_anndata_validate_suffix (line 596) | def test_from_anndata_validate_suffix(get_small_adata):
  function test_create_from_mudata (line 621) | def test_create_from_mudata(get_small_mdata, mudata_file, adata_file):
  function test_create_from_spatialdata (line 636) | def test_create_from_spatialdata(
  function test_create_from_anndata_in_storage (line 668) | def test_create_from_anndata_in_storage(data):
  function test_revise_recreate_artifact (line 685) | def test_revise_recreate_artifact(example_dataframe: pd.DataFrame, ccapl...
  function test_delete_and_restore_artifact (line 894) | def test_delete_and_restore_artifact(example_dataframe: pd.DataFrame):
  function test_delete_storage (line 925) | def test_delete_storage():
  function test_recreate_after_artifact_moved_in_storage (line 930) | def test_recreate_after_artifact_moved_in_storage(ccaplog):
  function test_move_artifact_exception_handling (line 948) | def test_move_artifact_exception_handling():
  function test_auto_storage_key_from_artifact_uid (line 1095) | def test_auto_storage_key_from_artifact_uid(suffix):
  function test_storage_root_upath_equivalence (line 1106) | def test_storage_root_upath_equivalence():
  function test_get_relative_path_to_directory (line 1112) | def test_get_relative_path_to_directory():
  function test_check_path_is_child_of_root (line 1150) | def test_check_path_is_child_of_root():
  function test_serialize_paths (line 1209) | def test_serialize_paths():
  function test_data_is_anndata_paths (line 1265) | def test_data_is_anndata_paths():
  function test_data_is_anndata_anndatacessor (line 1274) | def test_data_is_anndata_anndatacessor(get_small_adata):
  function test_data_is_mudata_paths (line 1283) | def test_data_is_mudata_paths():
  function test_data_is_spatialdata_paths (line 1288) | def test_data_is_spatialdata_paths():
  function test_data_is_scversedatastructure (line 1307) | def test_data_is_scversedatastructure(request, data, data_type, expected):
  function test_load_to_memory (line 1319) | def test_load_to_memory(tsv_file, zip_file, fcs_file, yaml_file):
  function test_bulk_delete (line 1341) | def test_bulk_delete():
  function test_no_unnecessary_imports (line 1387) | def test_no_unnecessary_imports(
  function test_artifact_get_tracking (line 1409) | def test_artifact_get_tracking(example_dataframe: pd.DataFrame):
  function test_get_by_path (line 1423) | def test_get_by_path(example_dataframe: pd.DataFrame):
  function test_update_suffix_for_registered_storage_with_real_key (line 1458) | def test_update_suffix_for_registered_storage_with_real_key(
  function test_update_suffix_for_registered_storage_folder_artifact (line 1486) | def test_update_suffix_for_registered_storage_folder_artifact(
  function test_update_non_virtual_key_for_registered_storage_file (line 1516) | def test_update_non_virtual_key_for_registered_storage_file(
  function test_update_non_virtual_key_for_registered_storage_file_invalid_suffix (line 1551) | def test_update_non_virtual_key_for_registered_storage_file_invalid_suffix(
  function test_update_key_to_none_raises_invalid_argument (line 1569) | def test_update_key_to_none_raises_invalid_argument(
  function test_update_non_virtual_key_before_save_raises_invalid_argument (line 1586) | def test_update_non_virtual_key_before_save_raises_invalid_argument(tsv_...
  function test_update_non_virtual_key_in_unmanaged_storage_raises_invalid_argument (line 1598) | def test_update_non_virtual_key_in_unmanaged_storage_raises_invalid_argu...
  function test_create_artifact_in_foreign_managed_storage_raises_value_error (line 1615) | def test_create_artifact_in_foreign_managed_storage_raises_value_error(t...
  function test_save_url_with_virtual_key_and_unmanaged_suffix_update_error (line 1629) | def test_save_url_with_virtual_key_and_unmanaged_suffix_update_error():
  function test_change_space_for_artifact_in_foreign_managed_storage_raises_value_error (line 1656) | def test_change_space_for_artifact_in_foreign_managed_storage_raises_val...
  function test_save_artifact_to_foreign_managed_storage_raises_value_error (line 1679) | def test_save_artifact_to_foreign_managed_storage_raises_value_error(tsv...
  function test_artifact_space_change (line 1693) | def test_artifact_space_change(tsv_file):
  function test_passing_foreign_keys_ids (line 1718) | def test_passing_foreign_keys_ids(tsv_file):

FILE: tests/core/test_artifact_dataframe_with_curation.py
  function two_internal_features (line 10) | def two_internal_features():
  function two_external_features (line 19) | def two_external_features():
  function test_create_artifact_with_external_feature_annotations (line 28) | def test_create_artifact_with_external_feature_annotations(
  function test_artifact_from_dataframe_with_schema (line 68) | def test_artifact_from_dataframe_with_schema(example_dataframe: pd.DataF...
  function test_artifact_dataframe_with_features (line 99) | def test_artifact_dataframe_with_features(example_dataframe: pd.DataFrame):
  function test_from_dataframe_with_external_schema (line 124) | def test_from_dataframe_with_external_schema(

FILE: tests/core/test_artifact_describe_to_dataframe.py
  function _check_df_equality (line 11) | def _check_df_equality(actual_df: pd.DataFrame, expected_df: pd.DataFram...
  function test_describe_to_dataframe_example_dataset (line 50) | def test_describe_to_dataframe_example_dataset():

FILE: tests/core/test_artifact_features_annotations.py
  function test_artifact_features_add_remove_query (line 13) | def test_artifact_features_add_remove_query():
  function test_features_name_duplicates_across_root_and_nested (line 383) | def test_features_name_duplicates_across_root_and_nested():
  function test_features_name_duplicates_across_equal_levels (line 399) | def test_features_name_duplicates_across_equal_levels():
  function test_feature_predicate_queries_safe_hybrid (line 440) | def test_feature_predicate_queries_safe_hybrid():
  function test_features_add_with_schema (line 505) | def test_features_add_with_schema():
  function test_artifact_feature_cat_filters_schema_end_to_end (line 527) | def test_artifact_feature_cat_filters_schema_end_to_end():
  function test_features_add_remove_error_behavior (line 575) | def test_features_add_remove_error_behavior():
  function test_add_remove_list_features (line 801) | def test_add_remove_list_features(ccaplog):
  function test_add_list_of_cat_features (line 837) | def test_add_list_of_cat_features():
  function test_artifact_features_accept_feature_object_keys (line 899) | def test_artifact_features_accept_feature_object_keys():

FILE: tests/core/test_artifact_parquet.py
  function test_parquet_kwargs (line 6) | def test_parquet_kwargs():

FILE: tests/core/test_blocks.py
  function test_block_recovery_based_on_hash (line 5) | def test_block_recovery_based_on_hash():
  function test_block_recovery_based_on_key (line 15) | def test_block_recovery_based_on_key():
  function test_readme_md_key_is_allowed_and_revises (line 25) | def test_readme_md_key_is_allowed_and_revises():
  function test_revise_blocks (line 39) | def test_revise_blocks():
  function test_record_block_readme_always_new_version (line 152) | def test_record_block_readme_always_new_version():
  function test_record_block_comment_always_new_block (line 165) | def test_record_block_comment_always_new_block():
  function test_record_block_recovery_based_on_record_and_kind (line 195) | def test_record_block_recovery_based_on_record_and_kind():
  function test_revise_record_blocks (line 206) | def test_revise_record_blocks():
  function test_record_block_filter_respects_default_branch_scope (line 271) | def test_record_block_filter_respects_default_branch_scope():

FILE: tests/core/test_branches.py
  function testbranch_id (line 4) | def testbranch_id():

FILE: tests/core/test_can_curate.py
  function test_inspect (line 8) | def test_inspect():
  function test_inspect_source (line 26) | def test_inspect_source():
  function test_standardize (line 55) | def test_standardize():
  function test_standardize_from_source (line 81) | def test_standardize_from_source():
  function test_add_remove_synonym (line 86) | def test_add_remove_synonym():
  function test_set_abbr (line 128) | def test_set_abbr():
  function test_validate_int (line 151) | def test_validate_int():
  function test_synonym_mapping (line 156) | def test_synonym_mapping():
  function test_validate_called_on_object_raises_error (line 173) | def test_validate_called_on_object_raises_error():
  function test_standardize_source (line 184) | def test_standardize_source():

FILE: tests/core/test_collection.py
  function df (line 13) | def df():
  function adata (line 18) | def adata():
  function adata2 (line 29) | def adata2():
  function test_from_single_artifact (line 38) | def test_from_single_artifact(adata):
  function test_edge_cases (line 68) | def test_edge_cases(df, ccaplog):
  function test_from_inconsistent_artifacts (line 96) | def test_from_inconsistent_artifacts(df, adata):
  function test_from_consistent_artifacts (line 126) | def test_from_consistent_artifacts(adata, adata2):
  function test_mapped (line 172) | def test_mapped(adata, adata2):
  function test_revise_collection (line 415) | def test_revise_collection(df, adata):
  function test_collection_append (line 477) | def test_collection_append(df, adata):
  function test_with_metadata (line 493) | def test_with_metadata(df, adata):
  function test_collection_get_tracking (line 510) | def test_collection_get_tracking(df):
  function test_describe_collection (line 527) | def test_describe_collection(adata, capsys):

FILE: tests/core/test_curator_basics.py
  function _strip_ansi (line 11) | def _strip_ansi(text: str) -> str:
  function df (line 18) | def df() -> pd.DataFrame:
  function df_missing_sample_type_column (line 29) | def df_missing_sample_type_column() -> pd.DataFrame:
  function df_missing_sample_name_column (line 39) | def df_missing_sample_name_column() -> pd.DataFrame:
  function df_changed_col_order (line 49) | def df_changed_col_order() -> pd.DataFrame:
  function df_extra_column (line 60) | def df_extra_column() -> pd.DataFrame:
  function df_disease (line 72) | def df_disease() -> pd.DataFrame:
  function disease_ontology_old (line 91) | def disease_ontology_old() -> bt.Source:
  function lists_df (line 100) | def lists_df():
  function cat_df (line 112) | def cat_df():
  function test_curator_df_multivalue (line 123) | def test_curator_df_multivalue(lists_df, cat_df):
  function test_curators_list_feature_nullable_empty_list (line 164) | def test_curators_list_feature_nullable_empty_list():
  function test_curator__repr__ (line 185) | def test_curator__repr__(df):
  function test_df_curator_typed_categorical (line 209) | def test_df_curator_typed_categorical(model_class):
  function test_df_curator_same_name_at_different_levels_involving_root (line 297) | def test_df_curator_same_name_at_different_levels_involving_root():
  function test_df_curator_same_name_at_different_levels_below_root (line 328) | def test_df_curator_same_name_at_different_levels_below_root():
  function test_df_curator_same_name_at_same_level (line 361) | def test_df_curator_same_name_at_same_level():
  function test_curator_schema_feature_mapping (line 397) | def test_curator_schema_feature_mapping():
  function test_dtypes_at_different_levels (line 416) | def test_dtypes_at_different_levels(ccaplog):
  function test_nullable (line 459) | def test_nullable():
  function test_pandera_dataframe_schema (line 483) | def test_pandera_dataframe_schema(
  function test_schema_not_saved (line 579) | def test_schema_not_saved(df):
  function test_schema_artifact_annotated (line 591) | def test_schema_artifact_annotated(df):
  function test_schema_optionals (line 610) | def test_schema_optionals():
  function test_schema_ordered_set (line 645) | def test_schema_ordered_set(df):
  function test_schema_minimal_set_var_allowed (line 670) | def test_schema_minimal_set_var_allowed(minimal_set):
  function test_schema_maximal_set_var (line 687) | def test_schema_maximal_set_var():
  function test_feature_dtype_path (line 707) | def test_feature_dtype_path():
  function test_cat_filters_specific_source_uid (line 786) | def test_cat_filters_specific_source_uid(df_disease, disease_ontology_old):
  function test_cat_filters_specific_source (line 806) | def test_cat_filters_specific_source(df_disease, disease_ontology_old):
  function test_cat_filters_multiple_relation_filters (line 827) | def test_cat_filters_multiple_relation_filters(df_disease, disease_ontol...
  function test_curate_columns (line 851) | def test_curate_columns(df):
  function test_wrong_datatype (line 879) | def test_wrong_datatype(df):
  function test_hash_index_feature (line 899) | def test_hash_index_feature(df):
  function test_add_new_from_subtype (line 942) | def test_add_new_from_subtype(df):
  function test_index_feature_exclusion_from_categoricals (line 975) | def test_index_feature_exclusion_from_categoricals(df):

FILE: tests/core/test_data_migrations.py
  function test_migrate_auxiliary_fields_postgres (line 13) | def test_migrate_auxiliary_fields_postgres():

FILE: tests/core/test_db.py
  function test_create_to_load (line 4) | def test_create_to_load():

FILE: tests/core/test_delete.py
  function test_delete_qs (line 7) | def test_delete_qs(permanent):
  function test_recreate_soft_deleted_record (line 22) | def test_recreate_soft_deleted_record():

FILE: tests/core/test_feature.py
  function dict_data (line 11) | def dict_data():
  function test_feature_init (line 23) | def test_feature_init():
  function test_cat_filters_empty_filter (line 106) | def test_cat_filters_empty_filter():
  function test_cat_filters_invalid_field_name (line 116) | def test_cat_filters_invalid_field_name():
  function test_cat_filters_artifact_schema_filter (line 134) | def test_cat_filters_artifact_schema_filter():
  function test_feature_from_df (line 149) | def test_feature_from_df():
  function test_feature_from_dict (line 204) | def test_feature_from_dict(dict_data):
  function test_feature_from_dict_type (line 226) | def test_feature_from_dict_type(dict_data):
  function test_feature_query_by_dtype (line 235) | def test_feature_query_by_dtype():

FILE: tests/core/test_feature_dtype.py
  function organism (line 19) | def organism():
  function test_serialize_basic_dtypes (line 31) | def test_serialize_basic_dtypes():
  function test_serialize_basic_list_dtypes (line 42) | def test_serialize_basic_list_dtypes():
  function test_seralize_pandas_numpy_dtypes (line 52) | def test_seralize_pandas_numpy_dtypes():
  function test_serialize_user (line 58) | def test_serialize_user(ccaplog):
  function test_serialize_record_objects (line 71) | def test_serialize_record_objects():
  function test_serialize_union_of_registries (line 114) | def test_serialize_union_of_registries():
  function test_serialize_with_field_information (line 121) | def test_serialize_with_field_information():
  function test_simple_record_with_subtype_and_field (line 133) | def test_simple_record_with_subtype_and_field():
  function test_multiple_records_with_subtypes_and_fields (line 150) | def test_multiple_records_with_subtypes_and_fields():
  function test_bionty_celltype_with_field (line 179) | def test_bionty_celltype_with_field():
  function test_bionty_perturbations_with_field (line 192) | def test_bionty_perturbations_with_field():
  function test_invalid_registry (line 212) | def test_invalid_registry():
  function test_empty_category (line 219) | def test_empty_category():
  function test_url_dtype_is_supported (line 225) | def test_url_dtype_is_supported():
  function test_malformed_categorical (line 231) | def test_malformed_categorical():
  function test_simple_registry_without_field (line 246) | def test_simple_registry_without_field():
  function test_registry_with_subtype_no_field (line 259) | def test_registry_with_subtype_no_field():
  function test_list_of_dtypes (line 276) | def test_list_of_dtypes():
  function test_registry_with_filter (line 295) | def test_registry_with_filter():
  function test_nested_cat_dtypes (line 308) | def test_nested_cat_dtypes():
  function test_nested_cat_with_filter (line 329) | def test_nested_cat_with_filter():
  function test_feature_dtype (line 357) | def test_feature_dtype():
  function test_cat_filters_incompatible_with_union_dtypes (line 379) | def test_cat_filters_incompatible_with_union_dtypes():
  function test_cat_filters_incompatible_with_nested_dtypes (line 392) | def test_cat_filters_incompatible_with_nested_dtypes():
  function test_parse_filter_string_basic (line 407) | def test_parse_filter_string_basic():
  function test_parse_filter_string_direct_fields (line 416) | def test_parse_filter_string_direct_fields():
  function test_parse_filter_string_empty (line 422) | def test_parse_filter_string_empty():
  function test_parse_filter_string_malformed (line 428) | def test_parse_filter_string_malformed():
  function test_parse_filter_string_missing_key (line 434) | def test_parse_filter_string_missing_key():
  function test_parse_filter_string_missing_value (line 440) | def test_parse_filter_string_missing_value():
  function test_resolve_direct_fields (line 446) | def test_resolve_direct_fields():
  function test_resolve_relation_filter_with_uid (line 452) | def test_resolve_relation_filter_with_uid():
  function test_resolve_relation_filter_with_name (line 469) | def test_resolve_relation_filter_with_name(organism):
  function test_resolve_multiple_relation_filters (line 476) | def test_resolve_multiple_relation_filters(organism):
  function test_resolve_nested_filter (line 496) | def test_resolve_nested_filter(organism):
  function test_resolve_relation_filter_failed_resolution (line 503) | def test_resolve_relation_filter_failed_resolution():
  function test_resolve_relation_filter_duplicate (line 509) | def test_resolve_relation_filter_duplicate():
  function test_convert_old_format_ulabel_string (line 523) | def test_convert_old_format_ulabel_string():
  function test_convert_old_format_record_string (line 539) | def test_convert_old_format_record_string():
  function test_convert_old_format_nested_record_string (line 555) | def test_convert_old_format_nested_record_string():
  function test_convert_old_format_list_string (line 573) | def test_convert_old_format_list_string():
  function test_feature_constructor_with_old_format_string (line 594) | def test_feature_constructor_with_old_format_string(ccaplog):
  function test_feature_constructor_with_old_format_nested_string (line 617) | def test_feature_constructor_with_old_format_nested_string(ccaplog):
  function test_bare_cat_dtype_backward_compatibility (line 643) | def test_bare_cat_dtype_backward_compatibility():
  function test_migrate_dtype_to_uid_format (line 655) | def test_migrate_dtype_to_uid_format():

FILE: tests/core/test_from_values.py
  function df (line 8) | def df():
  function test_from_values_name (line 19) | def test_from_values_name(df):
  function test_from_values_ontology_id (line 34) | def test_from_values_ontology_id(df):
  function test_from_values_multiple_match (line 43) | def test_from_values_multiple_match():
  function test_get_or_create_records (line 48) | def test_get_or_create_records():
  function test_from_values_synonyms_aware (line 57) | def test_from_values_synonyms_aware():
  function test_standardize (line 92) | def test_standardize():
  function test_from_values_no_source (line 105) | def test_from_values_no_source():

FILE: tests/core/test_has_parents.py
  function test_view_parents (line 5) | def test_view_parents():
  function test_query_parents_children (line 16) | def test_query_parents_children():
  function test_view_lineage_circular (line 33) | def test_view_lineage_circular():
  function test_view_parents_connected_instance (line 47) | def test_view_parents_connected_instance():
  function test_query_relatives_connected_instance (line 54) | def test_query_relatives_connected_instance():
  function test_view_lineage_connected_instance (line 65) | def test_view_lineage_connected_instance():

FILE: tests/core/test_has_type.py
  function test_invalid_type (line 19) | def test_invalid_type(model_class, extra_kwargs):
  function test_prevent_type_cycle (line 44) | def test_prevent_type_cycle(model_class):
  function test_query_sub_types_super_types_instances (line 73) | def test_query_sub_types_super_types_instances(model_class):

FILE: tests/core/test_integrity.py
  function test_migrate_check (line 4) | def test_migrate_check():
  function test_system_check (line 8) | def test_system_check():

FILE: tests/core/test_is_versioned.py
  function df1 (line 12) | def df1():
  function df2 (line 17) | def df2():
  function test_set_version (line 21) | def test_set_version():
  function test_bump_version (line 32) | def test_bump_version():
  function test_add_to_version_family (line 45) | def test_add_to_version_family(df1, df2):
  function test_transform_versioning_based_on_key (line 65) | def test_transform_versioning_based_on_key():
  function test_transform_versioning_based_on_revises (line 115) | def test_transform_versioning_based_on_revises():
  function test_transform_versioning_across_branches_preserves_main_latest (line 186) | def test_transform_versioning_across_branches_preserves_main_latest():
  function test_path_rename (line 228) | def test_path_rename():
  function test_version_backward_compatibility (line 243) | def test_version_backward_compatibility():
  function test_adjust_is_latest_when_deleting_is_versioned (line 300) | def test_adjust_is_latest_when_deleting_is_versioned():

FILE: tests/core/test_label_manager.py
  function adata (line 14) | def adata():
  function test_labels_add (line 22) | def test_labels_add(adata):
  function test_labels_add_using_anndata (line 99) | def test_labels_add_using_anndata(adata):
  function test_labels_get (line 261) | def test_labels_get(get_mini_csv: Path):  # noqa: F811
  function get_test_artifacts (line 279) | def get_test_artifacts():
  function test_add_from (line 293) | def test_add_from(get_test_artifacts):

FILE: tests/core/test_load.py
  function zip_file (line 13) | def zip_file():
  function html_filepath (line 22) | def html_filepath():
  function json_filepath (line 31) | def json_filepath():
  function csv_filepath (line 40) | def csv_filepath():
  function tsv_filepath (line 49) | def tsv_filepath():
  function parquet_filepath (line 58) | def parquet_filepath():
  function yaml_filepath (line 67) | def yaml_filepath():
  function image_filepath (line 76) | def image_filepath():
  function svg_filepath (line 85) | def svg_filepath():
  function rds_filepath (line 94) | def rds_filepath():
  function local_anndata_filepath (line 103) | def local_anndata_filepath():
  function adata (line 108) | def adata(local_anndata_filepath):
  function test_load_anndata (line 112) | def test_load_anndata(local_anndata_filepath, adata):
  function test_load_mudata (line 124) | def test_load_mudata(get_small_mdata):
  function test_load_spatialdata (line 131) | def test_load_spatialdata(get_small_sdata):
  function load_blobs__repr__ (line 138) | def load_blobs__repr__():
  function test_load_html (line 148) | def test_load_html(html_filepath):
  function test_load_json (line 153) | def test_load_json(json_filepath):
  function test_no_loader (line 159) | def test_no_loader(zip_file):
  function test_load_csv (line 165) | def test_load_csv(csv_filepath):
  function test_load_tsv (line 172) | def test_load_tsv(tsv_filepath):
  function test_load_parquet (line 179) | def test_load_parquet(parquet_filepath):
  function test_load_yaml (line 186) | def test_load_yaml(yaml_filepath):
  function test_load_image (line 193) | def test_load_image(image_filepath):
  function test_load_svg (line 199) | def test_load_svg(svg_filepath):
  function test_load_rds (line 205) | def test_load_rds(rds_filepath, ccaplog):

FILE: tests/core/test_manager.py
  function test_manager_list (line 4) | def test_manager_list():

FILE: tests/core/test_merge.py
  function test_merge_branch_into_main (line 7) | def test_merge_branch_into_main():
  function test_branch_status_values (line 44) | def test_branch_status_values():
  function test_draft_review_and_close_merge_request_status (line 70) | def test_draft_review_and_close_merge_request_status():
  function test_merge_nonexistent_branch_raises (line 92) | def test_merge_nonexistent_branch_raises():
  function test_merge_reconciles_is_latest_for_versioned_records (line 99) | def test_merge_reconciles_is_latest_for_versioned_records():
  function test_merge_updates_recordblock_branch (line 134) | def test_merge_updates_recordblock_branch():

FILE: tests/core/test_nbconvert.py
  function test_nbconvert (line 4) | def test_nbconvert():

FILE: tests/core/test_notebooks.py
  function test_all_notebooks (line 12) | def test_all_notebooks():
  function test_run_after_rename_no_uid (line 17) | def test_run_after_rename_no_uid():

FILE: tests/core/test_querydb.py
  function test_DB_multiple_instances (line 5) | def test_DB_multiple_instances():
  function test_DB_bionty (line 14) | def test_DB_bionty():
  function test_DB_missing_module (line 20) | def test_DB_missing_module():
  function test_DB_instantiate_class (line 33) | def test_DB_instantiate_class():
  function test_DB_rejects_invalid_attributes (line 52) | def test_DB_rejects_invalid_attributes(attr, expected_msg):
  function test_DB_cache (line 60) | def test_DB_cache():
  function test_queryset_caching (line 68) | def test_queryset_caching():
  function test_DB_dir (line 77) | def test_DB_dir():

FILE: tests/core/test_queryset.py
  function test_to_dataframe (line 15) | def test_to_dataframe():
  function test_complex_df_with_features (line 84) | def test_complex_df_with_features():
  function test_run_to_dataframe_includes_json_features (line 91) | def test_run_to_dataframe_includes_json_features():
  function test_one_first (line 107) | def test_one_first():
  function test_filter_related_field_name (line 129) | def test_filter_related_field_name():
  function test_filter_unknown_field (line 139) | def test_filter_unknown_field():
  function test_filter_status_field (line 145) | def test_filter_status_field():
  function test_get_id_type_error (line 168) | def test_get_id_type_error():
  function test_get_related_field_name (line 175) | def test_get_related_field_name():
  function test_get_unknown_field (line 185) | def test_get_unknown_field():
  function test_search (line 191) | def test_search():
  function test_lookup (line 202) | def test_lookup():
  function test_inspect (line 216) | def test_inspect():
  function test_validate (line 224) | def test_validate():
  function test_standardize (line 239) | def test_standardize():
  function test_get_doesnotexist_error (line 244) | def test_get_doesnotexist_error():
  function set_branch (line 259) | def set_branch(branch: ln.Branch):
  function test_get_filter_branch (line 268) | def test_get_filter_branch():
  function test_to_class (line 329) | def test_to_class():
  function test_queryset_soft_delete_error (line 348) | def test_queryset_soft_delete_error():
  function test_encode_lamindb_fields_as_columns (line 356) | def test_encode_lamindb_fields_as_columns():

FILE: tests/core/test_record_basics.py
  function test_record_docstring_examples (line 14) | def test_record_docstring_examples():
  function test_record_initialization (line 74) | def test_record_initialization():
  function test_record_lazy_features_on_save (line 88) | def test_record_lazy_features_on_save():
  function test_record_from_dataframe_bulk_save_paths (line 99) | def test_record_from_dataframe_bulk_save_paths():
  function test_record_from_dataframe_requires_named_type (line 133) | def test_record_from_dataframe_requires_named_type():
  function test_record_from_dataframe_with_string_type_creates_import_type (line 147) | def test_record_from_dataframe_with_string_type_creates_import_type():
  function test_record_from_dataframe_with_string_type_duplicate_name_errors (line 198) | def test_record_from_dataframe_with_string_type_duplicate_name_errors():
  function test_feature_manager_raise_not_validated_values (line 224) | def test_feature_manager_raise_not_validated_values():
  function test_name_lookup (line 248) | def test_name_lookup():
  function test_invalid_type_record_with_schema (line 266) | def test_invalid_type_record_with_schema():
  function test_record_features_add_remove_values (line 282) | def test_record_features_add_remove_values():
  function test_date_and_datetime_corruption (line 803) | def test_date_and_datetime_corruption():
  function test_only_list_type_features_and_field_qualifiers (line 847) | def test_only_list_type_features_and_field_qualifiers():
  function test_record_feature_predicate_query (line 921) | def test_record_feature_predicate_query():
  function test_record_features_accept_feature_object_keys (line 941) | def test_record_features_accept_feature_object_keys():

FILE: tests/core/test_record_sheet_examples.py
  function test_float_int_casting (line 9) | def test_float_int_casting():
  function test_record_example_compound_treatment (line 39) | def test_record_example_compound_treatment(
  function test_nextflow_sheet_with_samples (line 186) | def test_nextflow_sheet_with_samples(
  function test_record_soft_deleted_recreate (line 263) | def test_record_soft_deleted_recreate():
  function test_annotate_with_user_feature (line 282) | def test_annotate_with_user_feature():
  function test_to_artifact_exports_all_records (line 314) | def test_to_artifact_exports_all_records():
  function test_to_artifact_with_required_non_nullable_data_id_maximal_set_true (line 325) | def test_to_artifact_with_required_non_nullable_data_id_maximal_set_true():

FILE: tests/core/test_rename_features_labels.py
  function test_rename_feature (line 9) | def test_rename_feature(ccaplog):
  function test_rename_label (line 55) | def test_rename_label(model_class, ccaplog):

FILE: tests/core/test_run.py
  function test_run (line 7) | def test_run():
  function test_bulk_permanent_run_delete (line 58) | def test_bulk_permanent_run_delete(tmp_path):

FILE: tests/core/test_save.py
  function test_bulk_save_and_update (line 11) | def test_bulk_save_and_update():
  function test_prepare_error_message (line 24) | def test_prepare_error_message(get_mini_csv):
  function test_save_data_object (line 37) | def test_save_data_object(get_mini_csv):
  function test_store_artifacts_acid (line 44) | def test_store_artifacts_acid(get_mini_csv):
  function test_save_parents (line 60) | def test_save_parents():
  function test_save_batch_size (line 68) | def test_save_batch_size():
  function test_bulk_save_lazy_record_features (line 76) | def test_bulk_save_lazy_record_features():
  function test_bulk_save_lazy_record_features_requires_same_schema (line 119) | def test_bulk_save_lazy_record_features_requires_same_schema():
  function test_bulk_save_lazy_record_features_requires_schema (line 145) | def test_bulk_save_lazy_record_features_requires_schema():
  function test_bulk_resave_trashed_records (line 163) | def test_bulk_resave_trashed_records():

FILE: tests/core/test_schema.py
  function df (line 11) | def df():
  function test_schema_from_values (line 22) | def test_schema_from_values():
  function test_schema_from_records (line 68) | def test_schema_from_records(df):
  function test_schema_from_df (line 97) | def test_schema_from_df(df):
  function test_get_related_name (line 118) | def test_get_related_name():
  function test_validate_features (line 123) | def test_validate_features():
  function test_kwargs (line 138) | def test_kwargs():
  function test_edge_cases (line 143) | def test_edge_cases():
  function mini_immuno_schema_flexible (line 156) | def mini_immuno_schema_flexible():
  function test_schema_update_implicit_through_name_equality (line 169) | def test_schema_update_implicit_through_name_equality(
  function test_schema_update (line 249) | def test_schema_update(
  function test_schema_mutations_feature_removal (line 343) | def test_schema_mutations_feature_removal(
  function test_schema_add_remove_optional_features (line 368) | def test_schema_add_remove_optional_features(mini_immuno_schema_flexible...
  function test_schema_components (line 378) | def test_schema_components(mini_immuno_schema_flexible: ln.Schema):
  function test_mini_immuno_schema_flexible (line 425) | def test_mini_immuno_schema_flexible(mini_immuno_schema_flexible):
  function test_schema_recovery_based_on_hash (line 454) | def test_schema_recovery_based_on_hash(mini_immuno_schema_flexible: ln.S...
  function test_schemas_dataframe (line 466) | def test_schemas_dataframe():
  function test_schemas_anndata (line 487) | def test_schemas_anndata():
  function test_schema_already_saved_aux (line 534) | def test_schema_already_saved_aux():
  function test_schema_not_saved_describe (line 607) | def test_schema_not_saved_describe():
  function test_schema_is_type (line 614) | def test_schema_is_type():
  function test_composite_component (line 631) | def test_composite_component():
  function test_schema_describe_bracket_names (line 661) | def test_schema_describe_bracket_names():

FILE: tests/core/test_search.py
  function prepare_cell_type_registry (line 7) | def prepare_cell_type_registry():
  function test_search_synonyms (line 46) | def test_search_synonyms(prepare_cell_type_registry):
  function test_search_limit (line 51) | def test_search_limit(prepare_cell_type_registry):
  function test_search_case_sensitive (line 56) | def test_search_case_sensitive(prepare_cell_type_registry):
  function test_search_None (line 61) | def test_search_None():

FILE: tests/core/test_settings.py
  function test_settings_repr (line 5) | def test_settings_repr():
  function test_storage_setter_raises_on_foreign_managed_storage (line 18) | def test_storage_setter_raises_on_foreign_managed_storage(tmp_path):
  function test_local_storage_setter_raises_on_foreign_managed_storage (line 32) | def test_local_storage_setter_raises_on_foreign_managed_storage(tmp_path):

FILE: tests/core/test_sqlrecord.py
  function test_feature_describe (line 19) | def test_feature_describe():
  function test_artifact_describe (line 54) | def test_artifact_describe():
  function test_repr_describe (line 115) | def test_repr_describe():
  function test_record_describe_includes_features (line 121) | def test_record_describe_includes_features():
  function test_validate_literal_fields (line 135) | def test_validate_literal_fields():
  function test_init_with_args (line 141) | def test_init_with_args():
  function test_validate_required_fields (line 153) | def test_validate_required_fields():
  function get_search_test_filepaths (line 163) | def get_search_test_filepaths():
  function test_search_and_get (line 172) | def test_search_and_get(get_search_test_filepaths):
  function test_suggest_similar_names (line 238) | def test_suggest_similar_names():
  function test_pass_version (line 278) | def test_pass_version():
  function test_delete (line 296) | def test_delete():
  function test_get_name_field (line 315) | def test_get_name_field():
  function test_using (line 323) | def test_using():
  function test_get_record_kwargs (line 354) | def test_get_record_kwargs():
  function test_get_record_kwargs_empty (line 376) | def test_get_record_kwargs_empty():
  function test_soft_delete_error (line 389) | def test_soft_delete_error():
  function test_delete_return_value_permanent (line 397) | def test_delete_return_value_permanent():
  function test_unsaved_relationship_modification_attempts (line 415) | def test_unsaved_relationship_modification_attempts():
  function test_failed_connect (line 433) | def test_failed_connect():
  function test_unsaved_model_different_instance (line 441) | def test_unsaved_model_different_instance():
  function test_track_fields_with_deferred_columns (line 459) | def test_track_fields_with_deferred_columns(example_dataframe: pd.DataFr...
  function test_track_fields_must_exist_on_model (line 472) | def test_track_fields_must_exist_on_model(monkeypatch, example_dataframe...

FILE: tests/core/test_storage.py
  function test_reference_storage_location (line 9) | def test_reference_storage_location(ccaplog):
  function test_create_storage_locations_parallel (line 18) | def test_create_storage_locations_parallel():

FILE: tests/core/test_switch.py
  function test_switch_create_existing_branch_raises (line 7) | def test_switch_create_existing_branch_raises():

FILE: tests/core/test_track_flow.py
  function process_chunk (line 12) | def process_chunk(
  function test_flow (line 29) | def test_flow():
  function test_flow_track_arg_aliases_implicit (line 81) | def test_flow_track_arg_aliases_implicit():
  function test_flow_track_arg_aliases_false (line 96) | def test_flow_track_arg_aliases_false():

FILE: tests/core/test_track_script_or_notebook.py
  function test_serialize_params_to_json (line 27) | def test_serialize_params_to_json():
  function test_redact_secrets_in_source_code (line 68) | def test_redact_secrets_in_source_code():
  function test_redact_secrets_in_source_code_keeps_env_references (line 87) | def test_redact_secrets_in_source_code_keeps_env_references():
  function test_redact_secrets_in_source_code_ignores_annotations_and_forwarding (line 103) | def test_redact_secrets_in_source_code_ignores_annotations_and_forwardin...
  function test_serialize_params_to_json_redacts_provider_api_key_names (line 119) | def test_serialize_params_to_json_redacts_provider_api_key_names():
  function test_redact_secrets_in_source_code_redacts_provider_api_key_names (line 135) | def test_redact_secrets_in_source_code_redacts_provider_api_key_names():
  function test_track_basic_invocation (line 152) | def test_track_basic_invocation():
  function test_track_accepts_initiated_by_run_uid (line 240) | def test_track_accepts_initiated_by_run_uid():
  function test_track_uses_initiated_by_run_uid_from_env (line 272) | def test_track_uses_initiated_by_run_uid_from_env(monkeypatch: pytest.Mo...
  function test_track_with_plan_links_run (line 293) | def test_track_with_plan_links_run(tmp_path, pass_plan_as_key):
  function create_record (line 321) | def create_record():
  function test_track_input_record (line 344) | def test_track_input_record(create_record, kind):
  function test_track_notebook_colab (line 378) | def test_track_notebook_colab():
  function test_track_notebook_untitled (line 383) | def test_track_notebook_untitled():
  function test_detect_and_process_source_code_file_returns_key_from_module_for_package (line 393) | def test_detect_and_process_source_code_file_returns_key_from_module_for...
  function test_detect_and_process_source_code_file_returns_none_key_for_script (line 418) | def test_detect_and_process_source_code_file_returns_none_key_for_script():
  function test_finish_before_track (line 442) | def test_finish_before_track():
  function test_invalid_transform_kind (line 449) | def test_invalid_transform_kind():
  function test_create_or_load_transform (line 462) | def test_create_or_load_transform():
  function test_create_or_load_transform_warns_when_outside_dev_dir (line 500) | def test_create_or_load_transform_warns_when_outside_dev_dir(
  function test_run_scripts (line 527) | def test_run_scripts():
  function test_run_external_script (line 616) | def test_run_external_script():
  function test_track_notebook_or_script_manually (line 647) | def test_track_notebook_or_script_manually(type):
  function test_clean_r_notebook_html (line 657) | def test_clean_r_notebook_html():
  function test_notebook_to_script_notebooknode_metadata (line 672) | def test_notebook_to_script_notebooknode_metadata(tmp_path):
  class MockRun (line 691) | class MockRun:
    method __init__ (line 692) | def __init__(self, uid):
    method save (line 697) | def save(self):
  function test_logstream_tracker_multiple (line 701) | def test_logstream_tracker_multiple():
  function test_logstream_tracker_exception_handling (line 765) | def test_logstream_tracker_exception_handling():
  function test_logstream_tracker_cleanup_sigint_chains_to_keyboard_interrupt (line 805) | def test_logstream_tracker_cleanup_sigint_chains_to_keyboard_interrupt():

FILE: tests/core/test_track_step.py
  function process_chunk (line 11) | def process_chunk(
  function test_step_parallel (line 27) | def test_step_parallel():

FILE: tests/core/test_transform.py
  function test_transform_recovery_based_on_hash (line 8) | def test_transform_recovery_based_on_hash():
  function test_transform_recovery_based_on_key (line 18) | def test_transform_recovery_based_on_key():
  function test_revise_transforms (line 28) | def test_revise_transforms():
  function test_delete (line 159) | def test_delete():
  function test_successor_predecessor (line 195) | def test_successor_predecessor():
  function test_bulk_transform_permanent_delete (line 228) | def test_bulk_transform_permanent_delete(tmp_path):
  function test_single_transform_permanent_delete_delegates_to_queryset (line 263) | def test_single_transform_permanent_delete_delegates_to_queryset(tmp_path):
  function test_bulk_transform_soft_delete (line 291) | def test_bulk_transform_soft_delete():
  function test_bulk_transform_permanent_delete_promotes_previous_version (line 302) | def test_bulk_transform_permanent_delete_promotes_previous_version():
  function test_bulk_transform_soft_delete_promotes_previous_version (line 318) | def test_bulk_transform_soft_delete_promotes_previous_version():

FILE: tests/core/test_transform_from_git.py
  function test_transform_from_git (line 7) | def test_transform_from_git():
  function test_transform_from_git_with_entrypoint (line 51) | def test_transform_from_git_with_entrypoint():
  function test_transform_custom_key_and_hash_lookup (line 73) | def test_transform_custom_key_and_hash_lookup():
  function test_transform_from_git_failure_modes (line 92) | def test_transform_from_git_failure_modes():

FILE: tests/core/test_view.py
  function test_view (line 4) | def test_view():

FILE: tests/curators/conftest.py
  function pytest_sessionstart (line 8) | def pytest_sessionstart():
  function pytest_sessionfinish (line 15) | def pytest_sessionfinish(session: pytest.Session):
  function ccaplog (line 21) | def ccaplog(caplog):

FILE: tests/curators/test_cellxgene_curation.py
  function cellxgene_defaults (line 9) | def cellxgene_defaults() -> Generator:
  function test_cellxgene_curation (line 28) | def test_cellxgene_curation(cellxgene_defaults) -> None:

FILE: tests/curators/test_curate_from_croissant.py
  function test_curate_artifact_from_croissant (line 8) | def test_curate_artifact_from_croissant(filepath_prefix: str | None):
  function test_curate_collection_from_croissant (line 59) | def test_curate_collection_from_croissant():

FILE: tests/curators/test_curators_examples.py
  function mini_immuno_schema (line 17) | def mini_immuno_schema():
  function curator_params (line 62) | def curator_params():
  function mudata_papalexi21_subset_schema (line 75) | def mudata_papalexi21_subset_schema():
  function study_metadata_schema (line 147) | def study_metadata_schema():
  function anndata_uns_schema (line 157) | def anndata_uns_schema():
  function spatialdata_blobs_schema (line 167) | def spatialdata_blobs_schema():
  function test_dataframe_curator (line 189) | def test_dataframe_curator(mini_immuno_schema: ln.Schema):
  function test_dataframe_curator_index (line 292) | def test_dataframe_curator_index():
  function test_dataframe_curator_validate_all_annotate_cat (line 308) | def test_dataframe_curator_validate_all_annotate_cat(mini_immuno_schema):
  function test_same_name_different_type (line 334) | def test_same_name_different_type():
  function test_dataframe_curator_validate_all_annotate_cat2 (line 367) | def test_dataframe_curator_validate_all_annotate_cat2(mini_immuno_schema):
  function test_dataframe_attrs_validation (line 397) | def test_dataframe_attrs_validation(study_metadata_schema, include_attrs...
  function test_schema_new_genes (line 463) | def test_schema_new_genes(ccaplog):
  function test_schema_no_match_ensembl (line 493) | def test_schema_no_match_ensembl():
  function test_schema_mixed_ensembl_symbols (line 518) | def test_schema_mixed_ensembl_symbols(ccaplog):
  function test_schema_mixed_features (line 552) | def test_schema_mixed_features(ccaplog):
  function test_anndata_curator_different_components (line 579) | def test_anndata_curator_different_components(mini_immuno_schema: ln.Sch...
  function test_anndata_curator_varT_curation (line 656) | def test_anndata_curator_varT_curation():
  function test_anndata_curator_varT_curation_legacy (line 709) | def test_anndata_curator_varT_curation_legacy(ccaplog):
  function test_anndata_curator_nested_uns (line 753) | def test_anndata_curator_nested_uns(study_metadata_schema, anndata_uns_s...
  function test_anndata_curator_no_var (line 802) | def test_anndata_curator_no_var(mini_immuno_schema: ln.Schema):
  function test_mudata_curator (line 819) | def test_mudata_curator(
  function test_mudata_curator_nested_uns (line 860) | def test_mudata_curator_nested_uns(study_metadata_schema):
  function test_spatialdata_curator (line 907) | def test_spatialdata_curator(
  function test_specific_source (line 962) | def test_specific_source():

FILE: tests/curators/test_curators_remote.py
  function test_curator_remote (line 4) | def test_curator_remote():

FILE: tests/curators/test_dataframe_curation.py
  function transactions_schema (line 11) | def transactions_schema():
  function transactions_dataframe (line 68) | def transactions_dataframe():
  function test_schema_creation (line 92) | def test_schema_creation(transactions_schema):
  function test_data_curation (line 107) | def test_data_curation(
  function test_missing_required_feature (line 120) | def test_missing_required_feature(transactions_schema: ln.Schema):
  function test_invalid_label (line 139) | def test_invalid_label(transactions_schema: ln.Schema):
  function test_invalid_url_dtype (line 161) | def test_invalid_url_dtype(transactions_schema: ln.Schema):

FILE: tests/integrations/conftest.py
  function pytest_sessionstart (line 8) | def pytest_sessionstart():
  function pytest_sessionfinish (line 15) | def pytest_sessionfinish(session: pytest.Session):
  function ccaplog (line 21) | def ccaplog(caplog):

FILE: tests/integrations/test_lightning.py
  function cleanup_checkpoints (line 20) | def cleanup_checkpoints() -> Generator[None, None, None]:
  function cleanup_test_dir (line 29) | def cleanup_test_dir() -> Generator[None, None, None]:
  function simple_model (line 39) | def simple_model() -> pl.LightningModule:
  function dataloader (line 61) | def dataloader() -> DataLoader:
  function dirpath (line 68) | def dirpath(request: pytest.FixtureRequest) -> Generator[str, None, None]:
  function lightning_features (line 81) | def lightning_features() -> Generator[None, None, None]:
  function test_checkpoint_basic (line 99) | def test_checkpoint_basic(
  function test_checkpoint_with_features (line 121) | def test_checkpoint_with_features(
  function test_checkpoint_missing_features (line 159) | def test_checkpoint_missing_features(
  function test_checkpoint_auto_features (line 180) | def test_checkpoint_auto_features(
  function test_checkpoint_auto_features_with_duplicate_score_name (line 211) | def test_checkpoint_auto_features_with_duplicate_score_name(
  function test_checkpoint_best_model_with_duplicate_feature_names (line 237) | def test_checkpoint_best_model_with_duplicate_feature_names(
  function test_checkpoint_query_budget_scales_sublinearly_with_hparams (line 281) | def test_checkpoint_query_budget_scales_sublinearly_with_hparams(
  function test_model_rank_update_query_budget (line 319) | def test_model_rank_update_query_budget(
  function test_checkpoint_best_model_tracking (line 356) | def test_checkpoint_best_model_tracking(
  function test_checkpoint_model_rank (line 384) | def test_checkpoint_model_rank(
  function test_checkpoint_last_model_points_to_last_saved_artifact (line 414) | def test_checkpoint_last_model_points_to_last_saved_artifact(
  function test_checkpoint_semantic_paths (line 447) | def test_checkpoint_semantic_paths(
  function test_callback_deprecated (line 477) | def test_callback_deprecated(
  function test_checkpoint_overwrites_existing_artifact (line 505) | def test_checkpoint_overwrites_existing_artifact(
  function test_checkpoint_invalid_feature_keys (line 531) | def test_checkpoint_invalid_feature_keys(dirpath: str):
  function test_checkpoint_hparams (line 540) | def test_checkpoint_hparams(dataloader: DataLoader, dirpath: str, lightn...
  function test_checkpoint_datamodule_hparams (line 580) | def test_checkpoint_datamodule_hparams(
  function test_checkpoint_trainer_config (line 614) | def test_checkpoint_trainer_config(
  function test_checkpoint_hparams_yaml_with_hparams (line 661) | def test_checkpoint_hparams_yaml_with_hparams(
  function test_key_layout_matrix (line 726) | def test_key_layout_matrix(
  function test_run_uid_not_in_key_when_disabled (line 829) | def test_run_uid_not_in_key_when_disabled(
  function test_two_runs_same_logger_produce_different_keys (line 857) | def test_two_runs_same_logger_produce_different_keys(
  function test_checkpoint_artifact_key_prefix_matrix (line 898) | def test_checkpoint_artifact_key_prefix_matrix(
  function test_checkpoint_auto_features_without_dirpath (line 951) | def test_checkpoint_auto_features_without_dirpath(
  function test_save_config_artifact_key_matrix (line 1013) | def test_save_config_artifact_key_matrix(
  function test_save_config_artifact_tracked_as_run_input (line 1094) | def test_save_config_artifact_tracked_as_run_input(
  function test_checkpoint_subclass_receives_artifact_events (line 1149) | def test_checkpoint_subclass_receives_artifact_events(
  function test_checkpoint_artifact_observers_receive_shared_events (line 1245) | def test_checkpoint_artifact_observers_receive_shared_events(

FILE: tests/no_instance/conftest.py
  function ccaplog (line 5) | def ccaplog(caplog) -> pytest.LogCaptureFixture:

FILE: tests/no_instance/test_connect_dynamic_import.py
  function test_connect_dynamic_import (line 1) | def test_connect_dynamic_import(ccaplog):

FILE: tests/no_instance/test_import_side_effects.py
  function _probe_modules_loaded (line 73) | def _probe_modules_loaded(code: str) -> dict[str, bool]:
  function _assert_modules (line 104) | def _assert_modules(
  function test_storage_import_side_effects (line 120) | def test_storage_import_side_effects(

FILE: tests/no_instance/test_no_default_instance.py
  function test_no_read_only_warning (line 7) | def test_no_read_only_warning(ccaplog):
  function test_instance_not_connected (line 14) | def test_instance_not_connected():
  function test_query_artifacts_lamindata (line 21) | def test_query_artifacts_lamindata():
  function test_get_artifact_lamindata (line 28) | def test_get_artifact_lamindata():

FILE: tests/permissions/conftest.py
  function pytest_sessionstart (line 10) | def pytest_sessionstart():
  function pytest_sessionfinish (line 31) | def pytest_sessionfinish(session: pytest.Session):

FILE: tests/permissions/jwt_utils.py
  function sign_jwt (line 6) | def sign_jwt(db_url, payload: dict) -> str:

FILE: tests/permissions/scripts/check_lamin_dev.py
  function cleanup (line 9) | def cleanup(records):

FILE: tests/permissions/scripts/setup_access.py
  function create_jwt_user (line 21) | def create_jwt_user(dsn_admin: str, jwt_role_name: str):

FILE: tests/permissions/test_rls_dbwritelog.py
  function test_token_expiration (line 39) | def test_token_expiration():
  function test_authentication (line 56) | def test_authentication():
  function test_select_without_db_token (line 133) | def test_select_without_db_token():
  function test_fine_grained_permissions_account_and_dbwrite (line 178) | def test_fine_grained_permissions_account_and_dbwrite():
  function test_fine_grained_permissions_team (line 280) | def test_fine_grained_permissions_team():
  function test_fine_grained_permissions_single_records (line 285) | def test_fine_grained_permissions_single_records():
  function test_atomic (line 357) | def test_atomic():
  function test_utility_tables (line 371) | def test_utility_tables():
  function test_user_rls (line 392) | def test_user_rls():
  function test_write_role (line 415) | def test_write_role():
  function test_locking (line 441) | def test_locking():
  function test_tracking_error (line 449) | def test_tracking_error():
  function test_token_reset (line 494) | def test_token_reset():
  function test_dbwrite_uninstall (line 507) | def test_dbwrite_uninstall():
  function test_lamin_dev (line 537) | def test_lamin_dev():

FILE: tests/profiling/import_records_from_dataframe.py
  function generate_values (line 10) | def generate_values(dtype: str, n_rows: int, rng: Random):
  function main (line 26) | def main(n_rows: int):

FILE: tests/storage/conftest.py
  function create_test_instance (line 13) | def create_test_instance(pgurl: str):
  function pytest_sessionstart (line 29) | def pytest_sessionstart():
  function delete_test_instance (line 53) | def delete_test_instance():
  function pytest_sessionfinish (line 70) | def pytest_sessionfinish(session: pytest.Session):
  function ccaplog (line 76) | def ccaplog(caplog):

FILE: tests/storage/test_artifact_storage.py
  function test_create_from_anndata_in_existing_cloud_storage (line 11) | def test_create_from_anndata_in_existing_cloud_storage():
  function test_create_small_file_from_remote_path (line 30) | def test_create_small_file_from_remote_path(
  function test_versioning_arifact_from_existing_path (line 62) | def test_versioning_arifact_from_existing_path(ccaplog):
  function test_create_big_file_from_remote_path (line 74) | def test_create_big_file_from_remote_path():
  function test_delete_artifact_from_non_managed_storage (line 93) | def test_delete_artifact_from_non_managed_storage():
  function test_huggingface_paths (line 117) | def test_huggingface_paths():
  function test_gcp_paths (line 142) | def test_gcp_paths():
  function test_http_paths (line 162) | def test_http_paths():
  function test_folder_like_artifact_s3 (line 186) | def test_folder_like_artifact_s3():
  function test_single_file_directory_preserved (line 194) | def test_single_file_directory_preserved(tmp_path):

FILE: tests/storage/test_artifact_zarr.py
  function get_small_adata (line 16) | def get_small_adata():
  function test_zarr_upload_cache (line 25) | def test_zarr_upload_cache(get_small_adata):

FILE: tests/storage/test_cache.py
  function switch_storage (line 14) | def switch_storage():
  function test_local_cache (line 21) | def test_local_cache():
  function test_cloud_cache (line 69) | def test_cloud_cache(switch_storage):
  function test_cloud_cache_versions (line 141) | def test_cloud_cache_versions(switch_storage):
  function test_corrupted_cache_local (line 201) | def test_corrupted_cache_local():
  function test_corrupted_cache_cloud (line 217) | def test_corrupted_cache_cloud(switch_storage):

FILE: tests/storage/test_connect_reconnect.py
  function test_connect_reconnect (line 5) | def test_connect_reconnect():

FILE: tests/storage/test_storage_lifecycle.py
  function check_storage_location_on_hub_exists (line 8) | def check_storage_location_on_hub_exists(uid: str):
  function test_reference_storage_location (line 20) | def test_reference_storage_location(ccaplog):
  function test_switch_delete_storage_location (line 29) | def test_switch_delete_storage_location():

FILE: tests/storage/test_streaming.py
  function bad_adata_path (line 25) | def bad_adata_path():
  function test_anndata_io (line 51) | def test_anndata_io():
  function test_backed_access (line 67) | def test_backed_access(adata_format):
  function test_add_column (line 161) | def test_add_column():
  function test_to_index (line 195) | def test_to_index():
  function test_infer_suffix (line 205) | def test_infer_suffix():
  function test_write_to_disk (line 214) | def test_write_to_disk():
  function test_backed_bad_format (line 227) | def test_backed_bad_format(bad_adata_path):
  function test_backed_zarr_not_adata (line 248) | def test_backed_zarr_not_adata():
  function test_anndata_open_mode (line 262) | def test_anndata_open_mode():
  function test_from_lazy (line 277) | def test_from_lazy():
  function test_zarr_open_mode_overwrite_versions_false (line 303) | def test_zarr_open_mode_overwrite_versions_false():
  function test_from_lazy_cloud (line 317) | def test_from_lazy_cloud():
  function test_polars_options (line 333) | def test_polars_options():
  function test_open_dataframe_artifact (line 344) | def test_open_dataframe_artifact():
  function test_open_dataframe_collection (line 414) | def test_open_dataframe_collection():
  function test_backed_wrong_suffix (line 486) | def test_backed_wrong_suffix():
  function test_anndata_n_observations (line 498) | def test_anndata_n_observations(bad_adata_path):
  function _compress (line 524) | def _compress(input_filepath, output_filepath):
  function test_compressed (line 531) | def test_compressed(gz_suffix):

FILE: tests/storage/test_transfer.py
  function test_describe_artifact_from_remote_instance (line 9) | def test_describe_artifact_from_remote_instance(capsys):
  function test_transfer_from_remote_to_local (line 18) | def test_transfer_from_remote_to_local(ccaplog):
  function test_transfer_into_space (line 154) | def test_transfer_into_space():
  function test_using_record_organism (line 167) | def test_using_record_organism():
  function test_using_query_by_feature (line 215) | def test_using_query_by_feature():

FILE: tests/tiledbsoma/conftest.py
  function pytest_sessionstart (line 14) | def pytest_sessionstart():
  function pytest_sessionfinish (line 33) | def pytest_sessionfinish(session: pytest.Session):
  function adata_file (line 45) | def adata_file():
  function clean_soma_files (line 61) | def clean_soma_files(request):
  function soma_experiment_file (line 73) | def soma_experiment_file(clean_soma_files):

FILE: tests/tiledbsoma/test_artifact_basics.py
  function test_create_from_soma_experiment (line 6) | def test_create_from_soma_experiment(soma_experiment_file, adata_file):
  function test_data_is_soma_experiment_paths (line 21) | def test_data_is_soma_experiment_paths():
  function test_data_is_soma_experiment (line 26) | def test_data_is_soma_experiment(soma_experiment_file):

FILE: tests/tiledbsoma/test_curators.py
  function test_tiledbsoma_curator (line 10) | def test_tiledbsoma_curator(clean_soma_files):

FILE: tests/tiledbsoma/test_storage.py
  function test_write_read_tiledbsoma (line 19) | def test_write_read_tiledbsoma(storage):
  function test_from_tiledbsoma (line 182) | def test_from_tiledbsoma():
  function test_tiledb_config (line 211) | def test_tiledb_config():
  function test_tiledbsoma_in_managed_storage (line 220) | def test_tiledbsoma_in_managed_storage():

Condensed preview: 288 files, each showing path, character count, and a content snippet. The full structured content is 3,010K chars.

[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.yml",
    "chars": 417,
    "preview": "name: Report a bug\ndescription: Report a bug.\nlabels:\n  - \":bug: bug\"\nbody:\n  - type: textarea\n    id: report\n    attrib"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 347,
    "preview": "blank_issues_enabled: true\ncontact_links:\n    - name: LaminHub issues\n      url: https://github.com/laminlabs/laminhub-p"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/enhancement.yml",
    "chars": 302,
    "preview": "name: Propose an enhancement\ndescription: Propose an enhancement.\nbody:\n  - type: textarea\n    id: description\n    attri"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/usage_question.yml",
    "chars": 327,
    "preview": "name: Ask a usage question\ndescription: Ask a usage question.\nlabels:\n  - \"usage question\"\nbody:\n  - type: textarea\n    "
  },
  {
    "path": ".github/workflows/build.yml",
    "chars": 9618,
    "preview": "name: build\n\non:\n  push:\n    branches: [release]\n  pull_request:\n\njobs:\n  pre-filter:\n    runs-on: ubuntu-latest\n    out"
  },
  {
    "path": ".github/workflows/doc-changes.yml",
    "chars": 648,
    "preview": "name: doc-changes\n\non:\n  pull_request_target:\n    branches:\n      - main\n      - release\n    types:\n      - closed\n\njobs"
  },
  {
    "path": ".gitignore",
    "chars": 3755,
    "preview": "__MACOSX/\n\n# LaminDB\nREADME_stripped.md\ndocs/scripts/test_artifact_parquet.py\nREADME.ipynb\ndocs/sample.fasta\ndocs/faq/sa"
  },
  {
    "path": ".gitmodules",
    "chars": 496,
    "preview": "[submodule \"sub/lamindb-setup\"]\n\tpath = sub/lamindb-setup\n\turl = https://github.com/laminlabs/lamindb-setup\n[submodule \""
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 2252,
    "preview": "fail_fast: false\ndefault_language_version:\n  python: python3\ndefault_stages:\n  - pre-commit\n  - pre-push\nminimum_pre_com"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 3864,
    "preview": "# Contributing\n\nContributions are generally welcome. Please make an issue to discuss proposals.\n\n## Installation\n\n### Py"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "README.md",
    "chars": 19223,
    "preview": "[![docs](https://img.shields.io/badge/docs-yellow)](https://docs.lamin.ai) [![llms.txt](https://img.shields.io/badge/llm"
  },
  {
    "path": "docs/api.md",
    "chars": 229,
    "preview": "# API Reference\n\n<meta http-equiv=\"Refresh\" content=\"0; url=./lamindb.html\" />\n\n```{toctree}\n:maxdepth: 1\n:caption: CLI "
  },
  {
    "path": "docs/arrays.md",
    "chars": 4634,
    "preview": "---\nexecute_via: python\n---\n\n# Stream datasets from storage\n\nThis guide walks through streaming datasets from disk or cl"
  },
  {
    "path": "docs/bionty.md",
    "chars": 53,
    "preview": "# `bionty`\n\n```{eval-rst}\n.. automodule:: bionty\n```\n"
  },
  {
    "path": "docs/changelog.md",
    "chars": 43,
    "preview": "# Changelog\n\nActual content in lamin-docs.\n"
  },
  {
    "path": "docs/curate.md",
    "chars": 15017,
    "preview": "---\nexecute_via: python\n---\n\n# Validate & standardize datasets\n\nData curation with LaminDB ensures your datasets are **v"
  },
  {
    "path": "docs/faq/acid.md",
    "chars": 2666,
    "preview": "---\nexecute_via: python\n---\n\n# Will data & metadata stay in sync?\n\nHere, we walk through different errors that can occur"
  },
  {
    "path": "docs/faq/curate-any.md",
    "chars": 3670,
    "preview": "---\nexecute_via: python\n---\n\n# How do I validate & annotate arbitrary data structures?\n\nThis guide walks through the low"
  },
  {
    "path": "docs/faq/idempotency.md",
    "chars": 3311,
    "preview": "---\nexecute_via: python\n---\n\n# Will data get duplicated upon re-running code?\n\nLaminDB's operations are idempotent in th"
  },
  {
    "path": "docs/faq/import-modules.md",
    "chars": 534,
    "preview": "---\nexecute_via: python\n---\n\n# What happens if I import a schema module without lamindb?\n\n```python\n# !pip install 'lami"
  },
  {
    "path": "docs/faq/keep-artifacts-local.md",
    "chars": 4865,
    "preview": "---\nexecute_via: python\n---\n\n# Keep artifacts local in a cloud instance\n\nIf you want to default to keeping artifacts loc"
  },
  {
    "path": "docs/faq/pydantic-pandera.md",
    "chars": 15366,
    "preview": "---\nexecute_via: python\n---\n\n# Pydantic & Pandera vs. LaminDB\n\nThis doc explains conceptual differences between data val"
  },
  {
    "path": "docs/faq/reference-field.md",
    "chars": 828,
    "preview": "---\nexecute_via: python\n---\n\n# Where to store external links and IDs?\n\nWhen registering data in LaminDB, you might want "
  },
  {
    "path": "docs/faq/search.md",
    "chars": 1635,
    "preview": "---\nexecute_via: python\n---\n\n# How does search work?\n\n```python\nfrom laminci.db import setup_local_test_postgres\n\npgurl "
  },
  {
    "path": "docs/faq/symbol-mapping.md",
    "chars": 3629,
    "preview": "---\nexecute_via: python\n---\n\n# Why should I not index datasets with gene symbols?\n\nGene symbols are widely used for read"
  },
  {
    "path": "docs/faq/test_notebooks.py",
    "chars": 258,
    "preview": "from pathlib import Path\n\nimport nbproject_test as test\n\nimport lamindb as ln\n\n\ndef test_notebooks():\n    nbdir = Path(_"
  },
  {
    "path": "docs/faq/track-run-inputs.md",
    "chars": 2108,
    "preview": "---\nexecute_via: python\n---\n\n# Can I disable tracking run inputs?\n\nYes, if you switch {attr}`~lamindb.core.Settings.trac"
  },
  {
    "path": "docs/faq/trash-archive.md",
    "chars": 1386,
    "preview": "# How do I trash or archive objects?\n\nAny object in LaminDB has the following 3 levels of visibility through 3 default b"
  },
  {
    "path": "docs/faq/validate-fields.md",
    "chars": 568,
    "preview": "---\nexecute_via: python\n---\n\n# Django field validation\n\n[Django field validation](https://docs.djangoproject.com/en/5.1/"
  },
  {
    "path": "docs/faq.md",
    "chars": 252,
    "preview": "# FAQ\n\n```{toctree}\n:maxdepth: 1\n\nfaq/pydantic-pandera\nfaq/idempotency\nfaq/acid\nfaq/track-run-inputs\nfaq/curate-any\nfaq/"
  },
  {
    "path": "docs/guide.md",
    "chars": 298,
    "preview": "# Guide\n\n```{toctree}\n:hidden:\n:caption: \"Overview\"\n\nREADME\n```\n\n```{toctree}\n:hidden:\n:caption: \"How to\"\n\nquery-search\n"
  },
  {
    "path": "docs/index.md",
    "chars": 180,
    "preview": "```{include} ../README.md\n:start-line: 0\n:end-line: 5\n```\n\n<meta http-equiv=\"Refresh\" content=\"0; url=./guide.html\" />\n\n"
  },
  {
    "path": "docs/lightning.md",
    "chars": 12846,
    "preview": "# Lightning\n\nThis guide offers more context on the {class}`lamindb.integrations.lightning.Checkpoint` callback. For end-"
  },
  {
    "path": "docs/manage-changes.md",
    "chars": 230,
    "preview": "# Manage changes\n\nManaging changes in LaminDB is largely analogous to managing code changes via branching in git and Pul"
  },
  {
    "path": "docs/manage-ontologies.md",
    "chars": 9235,
    "preview": "---\nexecute_via: python\n---\n\n# Manage biological ontologies\n\nThis guide shows how to manage ontologies for basic biologi"
  },
  {
    "path": "docs/organize.md",
    "chars": 9434,
    "preview": "# Organize datasets\n\n```{toctree}\n:maxdepth: 1\n:hidden:\n\ncurate\n```\n\nThis guide walks through organizing datasets using "
  },
  {
    "path": "docs/pertdb.md",
    "chars": 53,
    "preview": "# `pertdb`\n\n```{eval-rst}\n.. automodule:: pertdb\n```\n"
  },
  {
    "path": "docs/query-search.md",
    "chars": 75,
    "preview": "# Query, search & stream\n\n```{toctree}\n:maxdepth: 1\n\nregistries\narrays\n```\n"
  },
  {
    "path": "docs/registries.md",
    "chars": 7542,
    "preview": "---\nexecute_via: python\n---\n\n# Query & search registries\n\nThis guide walks through different ways of querying & searchin"
  },
  {
    "path": "docs/scripts/curate_anndata_flexible.py",
    "chars": 322,
    "preview": "import lamindb as ln\n\nln.examples.datasets.mini_immuno.define_features_labels()\nadata = ln.examples.datasets.mini_immuno"
  },
  {
    "path": "docs/scripts/curate_anndata_uns.py",
    "chars": 325,
    "preview": "import lamindb as ln\n\nln.examples.datasets.mini_immuno.define_features_labels()\nadata = ln.examples.datasets.mini_immuno"
  },
  {
    "path": "docs/scripts/curate_dataframe_attrs.py",
    "chars": 486,
    "preview": "import lamindb as ln\n\nfrom .define_schema_df_metadata import study_metadata_schema\n\ndf = ln.examples.datasets.mini_immun"
  },
  {
    "path": "docs/scripts/curate_dataframe_external_features.py",
    "chars": 799,
    "preview": "import lamindb as ln\nfrom datetime import date\n\ndf = ln.examples.datasets.mini_immuno.get_dataset1(otype=\"DataFrame\")\n\nt"
  },
  {
    "path": "docs/scripts/curate_dataframe_flexible.py",
    "chars": 283,
    "preview": "import lamindb as ln\n\nln.examples.datasets.mini_immuno.define_features_labels()\ndf = ln.examples.datasets.mini_immuno.ge"
  },
  {
    "path": "docs/scripts/curate_dataframe_minimal_errors.py",
    "chars": 418,
    "preview": "import lamindb as ln\n\nschema = ln.examples.datasets.mini_immuno.define_mini_immuno_schema_flexible()\ndf = ln.examples.da"
  },
  {
    "path": "docs/scripts/curate_dataframe_union_features.py",
    "chars": 398,
    "preview": "import lamindb as ln\nimport pandas as pd\n\nunion_feature = ln.Feature(\n    name=\"mixed_feature\",\n    dtype=\"cat[bionty.Ti"
  },
  {
    "path": "docs/scripts/curate_mudata.py",
    "chars": 2287,
    "preview": "import lamindb as ln\nimport bionty as bt\n\nfrom docs.scripts.define_schema_df_metadata import study_metadata_schema\n\n# de"
  },
  {
    "path": "docs/scripts/curate_soma_experiment.py",
    "chars": 1210,
    "preview": "import lamindb as ln\nimport bionty as bt\nimport tiledbsoma as soma\nimport tiledbsoma.io\n\nadata = ln.examples.datasets.mi"
  },
  {
    "path": "docs/scripts/curate_spatialdata.py",
    "chars": 548,
    "preview": "import lamindb as ln\n\nspatialdata = ln.examples.datasets.spatialdata_blobs()\nsdata_schema = ln.Schema.get(name=\"spatiald"
  },
  {
    "path": "docs/scripts/define_schema_anndata_uns.py",
    "chars": 217,
    "preview": "import lamindb as ln\n\nfrom define_schema_df_metadata import study_metadata_schema\n\nanndata_uns_schema = ln.Schema(\n    o"
  },
  {
    "path": "docs/scripts/define_schema_df_metadata.py",
    "chars": 239,
    "preview": "import lamindb as ln\n\nstudy_metadata_schema = ln.Schema(\n    name=\"Study metadata schema\",\n    features=[\n        ln.Fea"
  },
  {
    "path": "docs/scripts/define_schema_spatialdata.py",
    "chars": 1438,
    "preview": "import lamindb as ln\nimport bionty as bt\n\n# a very comprehensive schema for different slots of a SpatialData object\n\n# d"
  },
  {
    "path": "docs/scripts/my_workflow.py",
    "chars": 301,
    "preview": "import lamindb as ln\n\n\n@ln.flow()\ndef ingest_dataset(key: str) -> ln.Artifact:\n    df = ln.examples.datasets.mini_immuno"
  },
  {
    "path": "docs/scripts/my_workflow_with_click.py",
    "chars": 270,
    "preview": "import click\nimport lamindb as ln\n\n\n@click.command()\n@click.option(\"--key\", required=True)\n@ln.flow()\ndef main(key: str)"
  },
  {
    "path": "docs/scripts/my_workflow_with_step.py",
    "chars": 742,
    "preview": "import lamindb as ln\n\n\n@ln.step()\ndef subset_dataframe(\n    artifact: ln.Artifact,\n    subset_rows: int = 2,\n    subset_"
  },
  {
    "path": "docs/scripts/run_script_with_step.py",
    "chars": 945,
    "preview": "import argparse\nimport lamindb as ln\n\n\n@ln.step()\ndef subset_dataframe(\n    artifact: ln.Artifact,\n    subset_rows: int "
  },
  {
    "path": "docs/scripts/run_track_and_finish.py",
    "chars": 213,
    "preview": "import lamindb as ln\n\nln.track()  # initiate a tracked notebook/script run\n\n# your code automatically tracks inputs & ou"
  },
  {
    "path": "docs/scripts/run_track_with_features_and_params.py",
    "chars": 410,
    "preview": "import argparse\nimport lamindb as ln\n\n\nif __name__ == \"__main__\":\n    p = argparse.ArgumentParser()\n    p.add_argument(\""
  },
  {
    "path": "docs/scripts/run_track_with_params.py",
    "chars": 570,
    "preview": "import argparse\nimport lamindb as ln\n\nif __name__ == \"__main__\":\n    p = argparse.ArgumentParser()\n    p.add_argument(\"-"
  },
  {
    "path": "docs/scripts/synced_with_git.py",
    "chars": 110,
    "preview": "import lamindb as ln\n\nln.settings.sync_git_repo = \"https://github.com/...\"\nln.track()\n# your code\nln.finish()\n"
  },
  {
    "path": "docs/storage/add-replace-cache.ipynb",
    "chars": 18914,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"0\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Add, replace, cache"
  },
  {
    "path": "docs/storage/anndata-accessor.ipynb",
    "chars": 6555,
    "preview": "{\n \"cells\": [\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Test `AnnDa"
  },
  {
    "path": "docs/storage/prepare-sync-local-to-cloud.ipynb",
    "chars": 2197,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Prepare sync artifacts from a loc"
  },
  {
    "path": "docs/storage/sync-local-to-cloud.ipynb",
    "chars": 2582,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Sync artifacts from a local insta"
  },
  {
    "path": "docs/storage/test-files/iris.data",
    "chars": 182,
    "preview": "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n6.4,3.2,4.5,1.5,Iris-versicolor\n"
  },
  {
    "path": "docs/storage/test_notebooks.py",
    "chars": 214,
    "preview": "from pathlib import Path\n\nimport nbproject_test as test\n\nimport lamindb as ln\n\n\ndef test_notebooks():\n    nbdir = Path(_"
  },
  {
    "path": "docs/storage/upload.ipynb",
    "chars": 7492,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"0\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Track artifacts, in"
  },
  {
    "path": "docs/storage/vitessce.ipynb",
    "chars": 4532,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Vitessce integration\"\n   ]\n  },\n "
  },
  {
    "path": "docs/storage.md",
    "chars": 189,
    "preview": "# Storage\n\n```{toctree}\n:maxdepth: 1\n\nstorage/upload\nstorage/add-replace-cache\nstorage/anndata-accessor\nstorage/prepare-"
  },
  {
    "path": "docs/sync.md",
    "chars": 2557,
    "preview": "---\nexecute_via: python\n---\n\n# Sync data across databases\n\nThis guide shows how to sync objects from a source database t"
  },
  {
    "path": "docs/test_notebooks.py",
    "chars": 711,
    "preview": "import sys\nfrom pathlib import Path\n\nimport nbproject_test as test\n\nsys.path[:0] = [str(Path(__file__).parent.parent)]\n\n"
  },
  {
    "path": "docs/track.md",
    "chars": 14325,
    "preview": "---\nexecute_via: python\n---\n\n# Track notebooks, scripts & workflows\n\nThis guide walks from tracking data lineage in a no"
  },
  {
    "path": "lamindb/__init__.py",
    "chars": 4417,
    "preview": "\"\"\"A data framework for biology.\n\nInstallation::\n\n   pip install lamindb\n\nIf you just want to *read* data from a LaminDB"
  },
  {
    "path": "lamindb/_finish.py",
    "chars": 21121,
    "preview": "from __future__ import annotations\n\nimport builtins\nimport re\nfrom datetime import datetime, timezone\nfrom time import s"
  },
  {
    "path": "lamindb/_secret_redaction.py",
    "chars": 3829,
    "preview": "from __future__ import annotations\n\nimport re\n\nREDACTED_SECRET_VALUE = \"***REDACTED***\"  # noqa: S105\nSENSITIVE_PARAM_KE"
  },
  {
    "path": "lamindb/_view.py",
    "chars": 4900,
    "preview": "from __future__ import annotations\n\nimport builtins\nimport importlib\nimport inspect\nfrom typing import TYPE_CHECKING\n\nfr"
  },
  {
    "path": "lamindb/base/__init__.py",
    "chars": 313,
    "preview": "\"\"\"Base library.\n\nIs available also when no instance is setup.\n\nModules\n-------\n\n.. autosummary::\n   :toctree: .\n\n   uid"
  },
  {
    "path": "lamindb/base/dtypes.py",
    "chars": 4298,
    "preview": "\"\"\"Dtype utils.\n\n.. autofunction:: check_dtype\n\n\"\"\"\n\nfrom datetime import datetime\nfrom typing import Any, Callable, Ite"
  },
  {
    "path": "lamindb/base/fields.py",
    "chars": 8367,
    "preview": "\"\"\"Fields.\n\nDjango fields with modified default arguments.\n\n.. autoclass:: CharField\n.. autoclass:: TextField\n.. autocla"
  },
  {
    "path": "lamindb/base/ids.py",
    "chars": 34,
    "preview": "from .uids import *  # noqa: F403\n"
  },
  {
    "path": "lamindb/base/types.py",
    "chars": 5995,
    "preview": "\"\"\"Base types.\n\nCentral object types\n--------------------\n\n.. autoclass:: ArtifactKind\n.. autoclass:: TransformKind\n.. a"
  },
  {
    "path": "lamindb/base/uids.py",
    "chars": 2183,
    "preview": "\"\"\"Universal IDs.\n\nBase generators\n===============\n\n.. autofunction:: base26\n.. autofunction:: base62\n.. autofunction:: "
  },
  {
    "path": "lamindb/base/users.py",
    "chars": 2116,
    "preview": "user_id_cache = {}\n\n\ndef _user_has_write_access() -> bool:\n    from django.db import connection\n\n    with connection.cur"
  },
  {
    "path": "lamindb/base/utils.py",
    "chars": 1480,
    "preview": "\"\"\"Utilities.\n\n.. autodecorator:: doc_args\n.. autodecorator:: deprecated\n.. autodecorator:: class_and_instance_method\n.."
  },
  {
    "path": "lamindb/core/__init__.py",
    "chars": 1392,
    "preview": "\"\"\"Core library.\n\nSettings & context:\n\n.. autosummary::\n   :toctree: .\n\n   Settings\n   subsettings\n   Context\n\nArtifact "
  },
  {
    "path": "lamindb/core/_compat.py",
    "chars": 2017,
    "preview": "import importlib.util\nfrom typing import Any, Callable, TypeVar\n\nT = TypeVar(\"T\")\n\n\ndef is_package_installed(package_nam"
  },
  {
    "path": "lamindb/core/_context.py",
    "chars": 54364,
    "preview": "from __future__ import annotations\n\nimport builtins\nimport hashlib\nimport os\nimport signal\nimport sys\nimport threading\ni"
  },
  {
    "path": "lamindb/core/_functions.py",
    "chars": 7860,
    "preview": "import functools\nimport inspect\nfrom contextvars import ContextVar\nfrom datetime import datetime, timezone\nfrom pathlib "
  },
  {
    "path": "lamindb/core/_mapped_collection.py",
    "chars": 28485,
    "preview": "from __future__ import annotations\n\nfrom collections import Counter\nfrom functools import reduce\nfrom typing import TYPE"
  },
  {
    "path": "lamindb/core/_settings.py",
    "chars": 10509,
    "preview": "from __future__ import annotations\n\nimport os\nimport sys\nfrom typing import TYPE_CHECKING\n\nimport lamindb_setup as ln_se"
  },
  {
    "path": "lamindb/core/_sync_git.py",
    "chars": 9650,
    "preview": "from __future__ import annotations\n\nimport subprocess\nfrom pathlib import Path\n\nfrom lamin_utils import logger\nfrom lami"
  },
  {
    "path": "lamindb/core/_track_environment.py",
    "chars": 974,
    "preview": "from __future__ import annotations\n\nimport subprocess\nimport sys\nfrom typing import TYPE_CHECKING\n\nimport lamindb_setup "
  },
  {
    "path": "lamindb/core/exceptions.py",
    "chars": 53,
    "preview": "from ..errors import *  # noqa: F403 backward compat\n"
  },
  {
    "path": "lamindb/core/loaders.py",
    "chars": 6744,
    "preview": "\"\"\"Loaders in :class:`lamindb.Artifact.load`.\n\n.. autodata:: SUPPORTED_SUFFIXES\n.. autofunction:: load_fcs\n.. autofuncti"
  },
  {
    "path": "lamindb/core/storage/__init__.py",
    "chars": 1913,
    "preview": "\"\"\"Storage API.\n\nValid suffixes.\n\n.. autodata:: VALID_SUFFIXES\n\nArray accessors.\n\n.. autoclass:: AnnDataAccessor\n.. auto"
  },
  {
    "path": "lamindb/core/storage/_anndata_accessor.py",
    "chars": 29854,
    "preview": "from __future__ import annotations\n\nimport inspect\nfrom functools import cached_property\nfrom importlib.metadata import "
  },
  {
    "path": "lamindb/core/storage/_backed_access.py",
    "chars": 8409,
    "preview": "from __future__ import annotations\n\nfrom dataclasses import dataclass\nfrom typing import TYPE_CHECKING, Any, Callable, L"
  },
  {
    "path": "lamindb/core/storage/_polars_lazy_df.py",
    "chars": 3916,
    "preview": "from __future__ import annotations\n\nfrom contextlib import contextmanager\nfrom typing import TYPE_CHECKING\n\nfrom lamindb"
  },
  {
    "path": "lamindb/core/storage/_pyarrow_dataset.py",
    "chars": 1398,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING\n\nimport pyarrow.dataset\nfrom lamindb_setup.core.upa"
  },
  {
    "path": "lamindb/core/storage/_spatialdata_accessor.py",
    "chars": 1718,
    "preview": "from __future__ import annotations\n\nfrom functools import cached_property\nfrom typing import TYPE_CHECKING\n\nfrom ._annda"
  },
  {
    "path": "lamindb/core/storage/_tiledbsoma.py",
    "chars": 14553,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING, Literal\nfrom urllib.parse import urlparse\n\nimport "
  },
  {
    "path": "lamindb/core/storage/_valid_suffixes.py",
    "chars": 501,
    "preview": "from __future__ import annotations\n\nfrom lamindb_setup.core.upath import VALID_COMPOSITE_SUFFIXES, VALID_SIMPLE_SUFFIXES"
  },
  {
    "path": "lamindb/core/storage/_zarr.py",
    "chars": 4459,
    "preview": "from __future__ import annotations\n\nfrom importlib.metadata import version as get_version\nfrom typing import TYPE_CHECKI"
  },
  {
    "path": "lamindb/core/storage/objects.py",
    "chars": 4212,
    "preview": "from __future__ import annotations\n\nfrom pathlib import Path\nfrom typing import TYPE_CHECKING, Any, TypeAlias\n\nfrom lami"
  },
  {
    "path": "lamindb/core/storage/paths.py",
    "chars": 7340,
    "preview": "from __future__ import annotations\n\nimport shutil\nfrom typing import TYPE_CHECKING\n\nimport fsspec\nfrom lamindb_setup.cor"
  },
  {
    "path": "lamindb/core/storage/types.py",
    "chars": 424,
    "preview": "\"\"\"Storage-related type definitions.\"\"\"\n\nfrom __future__ import annotations\n\nfrom typing import TYPE_CHECKING, Any\n\nif T"
  },
  {
    "path": "lamindb/core/subsettings/__init__.py",
    "chars": 192,
    "preview": "\"\"\"Sub settings.\n\n.. autoclass:: CreationSettings\n.. autoclass:: AnnotationSettings\n\n\"\"\"\n\nfrom ._annotation_settings imp"
  },
  {
    "path": "lamindb/core/subsettings/_annotation_settings.py",
    "chars": 370,
    "preview": "class AnnotationSettings:\n    n_max_records: int = 1000\n    \"\"\"Maximal number of records to annotate with during automat"
  },
  {
    "path": "lamindb/core/subsettings/_creation_settings.py",
    "chars": 906,
    "preview": "class CreationSettings:\n    search_names: bool = True\n    \"\"\"Switch off to speed up creating records (default `True`).\n\n"
  },
  {
    "path": "lamindb/curators/__init__.py",
    "chars": 1040,
    "preview": "\"\"\"Curators.\n\nHigh-level curators\n-------------------\n\n.. autoclass:: DataFrameCurator\n.. autoclass:: AnnDataCurator\n.. "
  },
  {
    "path": "lamindb/curators/core.py",
    "chars": 86644,
    "preview": "\"\"\"Curator utilities.\n\n.. autoclass:: Curator\n.. autoclass:: SlotsCurator\n.. autoclass:: ComponentCurator\n.. autoclass::"
  },
  {
    "path": "lamindb/errors.py",
    "chars": 3446,
    "preview": "\"\"\"Errors.\n\nDjango.\n\n.. autoexception:: ObjectDoesNotExist\n.. autoexception:: MultipleObjectsReturned\n\nLaminDB.\n\n.. auto"
  },
  {
    "path": "lamindb/examples/__init__.py",
    "chars": 213,
    "preview": "\"\"\"Examples.\n\n.. autosummary::\n   :toctree: .\n\n   schemas\n   datasets\n   cellxgene\n   croissant\n   mlflow\n   wandb\n\n\"\"\"\n"
  },
  {
    "path": "lamindb/examples/cellxgene/__init__.py",
    "chars": 201,
    "preview": "\"\"\"CELLxGENE utilities.\n\n.. autofunction:: save_cellxgene_defaults\n.. autofunction:: create_cellxgene_schema\n\n\"\"\"\n\nfrom "
  },
  {
    "path": "lamindb/examples/cellxgene/_cellxgene.py",
    "chars": 11086,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING, Collection, Literal, NamedTuple\n\nif TYPE_CHECKING:"
  },
  {
    "path": "lamindb/examples/croissant/__init__.py",
    "chars": 2292,
    "preview": "\"\"\"Examples for MLCommons Croissant files, which are used to store metadata about datasets.\n\n.. autofunction:: mini_immu"
  },
  {
    "path": "lamindb/examples/croissant/mini_immuno.anndata.zarr_metadata.json",
    "chars": 2921,
    "preview": "{\n  \"@context\": {\n    \"@vocab\": \"https://schema.org/\",\n    \"cr\": \"https://mlcommons.org/croissant/\",\n    \"ml\": \"http://m"
  },
  {
    "path": "lamindb/examples/datasets/__init__.py",
    "chars": 5966,
    "preview": "\"\"\"Example datasets.\n\nThe mini immuno dataset\n-----------------------\n\n.. autosummary::\n   :toctree: .\n\n   mini_immuno\n\n"
  },
  {
    "path": "lamindb/examples/datasets/_core.py",
    "chars": 21666,
    "preview": "from __future__ import annotations\n\nfrom pathlib import Path\nfrom typing import TYPE_CHECKING, Any\nfrom urllib.request i"
  },
  {
    "path": "lamindb/examples/datasets/_fake.py",
    "chars": 953,
    "preview": "from __future__ import annotations\n\n\ndef fake_bio_notebook_titles(n=100) -> list[str]:\n    \"\"\"A fake collection of study"
  },
  {
    "path": "lamindb/examples/datasets/_small.py",
    "chars": 4852,
    "preview": "from __future__ import annotations\n\nfrom typing import Any, Literal\n\nimport anndata as ad\nimport numpy as np\nimport pand"
  },
  {
    "path": "lamindb/examples/datasets/define_mini_immuno_features_labels.py",
    "chars": 878,
    "preview": "import bionty as bt\n\nimport lamindb as ln\n\n# define valid labels\nperturbation_type = ln.Record(name=\"Perturbation\", is_t"
  },
  {
    "path": "lamindb/examples/datasets/define_mini_immuno_schema_flexible.py",
    "chars": 459,
    "preview": "import lamindb as ln\n\nschema = ln.Schema(\n    name=\"Mini immuno schema\",\n    features=[\n        ln.Feature.get(name=\"per"
  },
  {
    "path": "lamindb/examples/datasets/mini_immuno.py",
    "chars": 6004,
    "preview": "\"\"\"Two \"mini immuno\" datasets.\n\nDatasets\n--------\n\n.. autofunction:: get_dataset1\n.. autofunction:: get_dataset2\n\nSchema"
  },
  {
    "path": "lamindb/examples/datasets/save_mini_immuno_datasets.py",
    "chars": 1704,
    "preview": "from datetime import date\n\nimport bionty as bt\n\nimport lamindb as ln\n\n## define valid labels\nln.Record.from_values([\"DMS"
  },
  {
    "path": "lamindb/examples/fixtures/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "lamindb/examples/fixtures/sheets.py",
    "chars": 12195,
    "preview": "import bionty as bt\nimport pandas as pd\nimport pytest\n\nimport lamindb as ln\n\n\n@pytest.fixture(scope=\"module\")\ndef popula"
  },
  {
    "path": "lamindb/examples/mlflow/__init__.py",
    "chars": 1314,
    "preview": "\"\"\"Examples and utilities for Mlflow.\n\n.. autofunction:: save_mlflow_features\n\"\"\"\n\nimport lamindb as ln\n\n\ndef save_mlflo"
  },
  {
    "path": "lamindb/examples/schemas/__init__.py",
    "chars": 238,
    "preview": "\"\"\"Example schemas.\n\n.. autofunction:: valid_features\n.. autofunction:: anndata_ensembl_gene_ids_and_valid_features_in_o"
  },
  {
    "path": "lamindb/examples/schemas/_anndata.py",
    "chars": 1021,
    "preview": "from __future__ import annotations\n\nimport importlib\nfrom typing import TYPE_CHECKING\n\nif TYPE_CHECKING:\n    from ... im"
  },
  {
    "path": "lamindb/examples/schemas/_simple.py",
    "chars": 722,
    "preview": "from __future__ import annotations\n\nimport importlib\nfrom typing import TYPE_CHECKING\n\nif TYPE_CHECKING:\n    from ... im"
  },
  {
    "path": "lamindb/examples/schemas/define_schema_anndata_ensembl_gene_ids_and_valid_features_in_obs.py",
    "chars": 358,
    "preview": "import bionty as bt\n\nimport lamindb as ln\n\nobs_schema = ln.examples.schemas.valid_features()\nvarT_schema = ln.Schema(\n  "
  },
  {
    "path": "lamindb/examples/schemas/define_valid_features.py",
    "chars": 89,
    "preview": "import lamindb as ln\n\nschema = ln.Schema(name=\"valid_features\", itype=ln.Feature).save()\n"
  },
  {
    "path": "lamindb/examples/wandb/__init__.py",
    "chars": 1322,
    "preview": "\"\"\"Examples and utilities for Weights & Biases.\n\n.. autofunction:: save_wandb_features\n\"\"\"\n\nimport lamindb as ln\n\n\ndef s"
  },
  {
    "path": "lamindb/integrations/__init__.py",
    "chars": 800,
    "preview": "\"\"\"Integrations.\n\nModules\n-------\n\n.. autosummary::\n   :toctree: .\n\n   lightning\n\nFunctions\n---------\n\n.. autofunction::"
  },
  {
    "path": "lamindb/integrations/_croissant.py",
    "chars": 5296,
    "preview": "from __future__ import annotations\n\nimport json\nfrom pathlib import Path\nfrom typing import TYPE_CHECKING, Any\n\nimport l"
  },
  {
    "path": "lamindb/integrations/_vitessce.py",
    "chars": 4032,
    "preview": "from __future__ import annotations\n\nimport json\nfrom datetime import datetime, timezone\nfrom typing import TYPE_CHECKING"
  },
  {
    "path": "lamindb/integrations/lightning.py",
    "chars": 54260,
    "preview": "\"\"\"PyTorch Lightning integration for LaminDB.\n\nThe public API has two layers:\n\n- :class:`Checkpoint` is the concrete Lam"
  },
  {
    "path": "lamindb/migrations/0177_squashed.py",
    "chars": 224328,
    "preview": "# Generated by Django 5.2 on 2026-01-10 23:06\n\nimport django.core.validators\nimport django.db.models.deletion\nimport dja"
  },
  {
    "path": "lamindb/migrations/0178_v2_2.py",
    "chars": 20145,
    "preview": "# Generated by Django 5.2 on 2026-02-15 11:25\n\nimport django.db.models.deletion\nfrom django.db import migrations, models"
  },
  {
    "path": "lamindb/migrations/0179_v2_2_part_2.py",
    "chars": 1840,
    "preview": "# Generated by Django 5.2 on 2026-02-15 14:12\n\nimport django.db.models.deletion\nfrom django.db import migrations, models"
  },
  {
    "path": "lamindb/migrations/0180_v2_2_part_3.py",
    "chars": 963,
    "preview": "# Generated by Django 5.2 on 2026-02-15 14:29\n\nimport django.db.models.deletion\nfrom django.db import migrations\n\nimport"
  },
  {
    "path": "lamindb/migrations/0181_v2_2_part_4.py",
    "chars": 749,
    "preview": "# Generated by Django 5.2 on 2026-02-15 15:43\n\nimport django.db.models.deletion\nfrom django.db import migrations, models"
  },
  {
    "path": "lamindb/migrations/0182_v2_2_part_5.py",
    "chars": 10910,
    "preview": "# Generated by Django 5.2 on 2026-02-17 16:33\n\nimport django.db.models.deletion\nfrom django.db import migrations, models"
  },
  {
    "path": "lamindb/migrations/0183_v2_2_part_6.py",
    "chars": 399,
    "preview": "# Generated by Django 5.2 on 2026-02-17 23:04\n\nfrom django.db import migrations\n\n\nclass Migration(migrations.Migration):"
  },
  {
    "path": "lamindb/migrations/0184_alter_transformrecord_feature.py",
    "chars": 661,
    "preview": "# Generated by Django 5.2 on 2026-03-07 12:16\n\nimport django.db.models.deletion\nfrom django.db import migrations\n\nimport"
  },
  {
    "path": "lamindb/migrations/0185_alter_runrecord_feature.py",
    "chars": 667,
    "preview": "# Generated by Django 5.2 on 2026-04-05 14:32\n\nimport django.db.models.deletion\nfrom django.db import migrations\n\nimport"
  },
  {
    "path": "lamindb/migrations/0186_v2_4.py",
    "chars": 8539,
    "preview": "# Generated by Django 5.2 on 2026-04-12 18:49\n\nimport django.db.models.deletion\nfrom django.db import migrations, models"
  },
  {
    "path": "lamindb/migrations/0187_squashed.py",
    "chars": 248028,
    "preview": "# Generated by Django 5.2 on 2026-04-16 06:44\n\nimport django.core.validators\nimport django.db.models.deletion\nimport dja"
  },
  {
    "path": "lamindb/migrations/0187_v2_4_part_2.py",
    "chars": 1151,
    "preview": "# Generated by Django 5.2 on 2026-04-16 06:38\n\nimport django.db.models.deletion\nfrom django.db import migrations\n\nimport"
  },
  {
    "path": "lamindb/migrations/README.md",
    "chars": 191,
    "preview": "# Attention\n\nRemember that lamindb schema changes that do not work on old databases (like adding columns or tables) cann"
  },
  {
    "path": "lamindb/migrations/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "lamindb/models/__init__.py",
    "chars": 5369,
    "preview": "\"\"\"Auxiliary models & database library.\n\nRegistry basics\n---------------\n\n.. autoclass:: BaseSQLRecord\n.. autoclass:: SQ"
  },
  {
    "path": "lamindb/models/_describe.py",
    "chars": 25667,
    "preview": "from __future__ import annotations\n\nimport re\nfrom types import SimpleNamespace\nfrom typing import TYPE_CHECKING, Litera"
  },
  {
    "path": "lamindb/models/_django.py",
    "chars": 15199,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING, Any\n\nfrom django.contrib.postgres.aggregates impor"
  },
  {
    "path": "lamindb/models/_feature_manager.py",
    "chars": 95302,
    "preview": "# ruff: noqa: TC004\nfrom __future__ import annotations\n\nfrom collections import defaultdict\nfrom collections.abc import "
  },
  {
    "path": "lamindb/models/_from_values.py",
    "chars": 14762,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING\n\nfrom lamin_utils import colors, logger\n\nif TYPE_CH"
  },
  {
    "path": "lamindb/models/_is_versioned.py",
    "chars": 11451,
    "preview": "from __future__ import annotations\n\nfrom pathlib import PurePosixPath\nfrom typing import TYPE_CHECKING, Any, Iterable, L"
  },
  {
    "path": "lamindb/models/_label_manager.py",
    "chars": 11468,
    "preview": "from __future__ import annotations\n\nfrom collections import defaultdict\nfrom typing import TYPE_CHECKING\n\nfrom django.db"
  },
  {
    "path": "lamindb/models/_relations.py",
    "chars": 4004,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING\n\nimport lamindb_setup as ln_setup\nfrom django.db.mo"
  },
  {
    "path": "lamindb/models/_run_cleanup.py",
    "chars": 1624,
    "preview": "\"\"\"Background cleanup of report/environment artifacts after Run bulk delete.\n\nRunnable as: python -m lamindb.models._run"
  },
  {
    "path": "lamindb/models/artifact.py",
    "chars": 155481,
    "preview": "# ruff: noqa: TC004\nfrom __future__ import annotations\n\nimport shutil\nimport types\nimport warnings\nfrom collections impo"
  },
  {
    "path": "lamindb/models/artifact_set.py",
    "chars": 6015,
    "preview": "from __future__ import annotations\n\nfrom collections.abc import Iterable, Iterator\nfrom typing import TYPE_CHECKING, Lit"
  },
  {
    "path": "lamindb/models/block.py",
    "chars": 17610,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING, Any, Literal, get_args, overload\n\nfrom django.db i"
  },
  {
    "path": "lamindb/models/can_curate.py",
    "chars": 29648,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING, Iterable, Literal, Union\n\nimport numpy as np\nfrom "
  },
  {
    "path": "lamindb/models/collection.py",
    "chars": 26745,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING, Any, Literal, overload\n\nfrom django.db import mode"
  },
  {
    "path": "lamindb/models/feature.py",
    "chars": 64115,
    "preview": "from __future__ import annotations\n\nimport importlib\nimport warnings\nfrom dataclasses import dataclass\nfrom typing impor"
  },
  {
    "path": "lamindb/models/has_parents.py",
    "chars": 21853,
    "preview": "# ruff: noqa: TC004\nfrom __future__ import annotations\n\nimport builtins\nfrom typing import TYPE_CHECKING, Literal\n\nimpor"
  },
  {
    "path": "lamindb/models/project.py",
    "chars": 20221,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING, overload\n\nfrom django.core.validators import Regex"
  },
  {
    "path": "lamindb/models/query_manager.py",
    "chars": 11216,
    "preview": "from __future__ import annotations\n\nimport re\nfrom functools import reduce\nfrom typing import TYPE_CHECKING, Literal, Na"
  },
  {
    "path": "lamindb/models/query_set.py",
    "chars": 67359,
    "preview": "from __future__ import annotations\n\nimport ast\nimport re\nimport warnings\nfrom collections import UserList, defaultdict\nf"
  },
  {
    "path": "lamindb/models/record.py",
    "chars": 37129,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING, Any, overload\n\nimport pgtrigger\nfrom django.conf i"
  },
  {
    "path": "lamindb/models/run.py",
    "chars": 22288,
    "preview": "from __future__ import annotations\n\nimport os\nimport subprocess\nimport sys\nfrom typing import TYPE_CHECKING, overload\n\nf"
  },
  {
    "path": "lamindb/models/save.py",
    "chars": 20915,
    "preview": "# ruff: noqa: TC004\nfrom __future__ import annotations\n\nimport os\nimport shutil\nimport traceback\nfrom collections import"
  },
  {
    "path": "lamindb/models/schema.py",
    "chars": 55222,
    "preview": "from __future__ import annotations\n\nimport warnings\nfrom typing import TYPE_CHECKING, Any, Type, overload\n\nimport numpy "
  },
  {
    "path": "lamindb/models/sqlrecord.py",
    "chars": 98341,
    "preview": "from __future__ import annotations\n\nimport builtins\nimport gzip\nimport inspect\nimport os\nimport re\nimport shutil\nimport "
  },
  {
    "path": "lamindb/models/storage.py",
    "chars": 16193,
    "preview": "from __future__ import annotations\n\nfrom typing import (\n    TYPE_CHECKING,\n    overload,\n)\nfrom uuid import UUID\n\nfrom "
  },
  {
    "path": "lamindb/models/transform.py",
    "chars": 23042,
    "preview": "from __future__ import annotations\n\nimport warnings\nfrom typing import TYPE_CHECKING, overload\n\nfrom django.db import mo"
  },
  {
    "path": "lamindb/models/ulabel.py",
    "chars": 12278,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING, overload\n\nimport pgtrigger\nfrom django.conf import"
  },
  {
    "path": "lamindb/py.typed",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "lamindb/setup/__init__.py",
    "chars": 507,
    "preview": "import lamindb_setup as _lamindb_setup\nfrom lamindb_setup import *  # noqa: F403\nfrom lamindb_setup import (\n    connect"
  },
  {
    "path": "lamindb/setup/_merge.py",
    "chars": 3723,
    "preview": "# Tested in lamin-cli (tests/core/test_create_switch_delete_list_settings.py::test_merge*).\nfrom __future__ import annot"
  },
  {
    "path": "lamindb/setup/_switch.py",
    "chars": 1263,
    "preview": "from __future__ import annotations\n\nfrom typing import TYPE_CHECKING\n\nfrom lamin_utils import logger\nfrom lamindb_setup "
  },
  {
    "path": "lamindb/setup/core/__init__.py",
    "chars": 167,
    "preview": "import lamindb_setup as _lamindb_setup\nfrom lamindb_setup.core import *  # noqa: F403\n\n__doc__ = _lamindb_setup.core.__d"
  },
  {
    "path": "lamindb/setup/errors/__init__.py",
    "chars": 171,
    "preview": "import lamindb_setup as _lamindb_setup\nfrom lamindb_setup.errors import *  # noqa: F403\n\n__doc__ = _lamindb_setup.errors"
  },
  {
    "path": "lamindb/setup/types/__init__.py",
    "chars": 169,
    "preview": "import lamindb_setup as _lamindb_setup\nfrom lamindb_setup.types import *  # noqa: F403\n\n__doc__ = _lamindb_setup.types._"
  },
  {
    "path": "lamindb_full.py",
    "chars": 397,
    "preview": "\"\"\"Full/meta-package module for the `lamindb` distribution.\"\"\"\n\nfrom __future__ import annotations\n\nimport re\nfrom pathl"
  },
  {
    "path": "noxfile.py",
    "chars": 16112,
    "preview": "import os\nimport shutil\nfrom pathlib import Path\n\nimport nox\nfrom laminci import convert_executable_md_files, upload_doc"
  },
  {
    "path": "pyproject.full.toml",
    "chars": 1237,
    "preview": "[build-system]\nrequires = [\"flit_core >=3.2,<4\"]\nbuild-backend = \"flit_core.buildapi\"\n\n[project]\nname = \"lamindb\"\nrequir"
  },
  {
    "path": "pyproject.toml",
    "chars": 7784,
    "preview": "[build-system]\nrequires = [\"flit_core >=3.2,<4\"]\nbuild-backend = \"flit_core.buildapi\"\n\n[project]\nname = \"lamindb-core\"\nr"
  },
  {
    "path": "scripts/migrate_test_instances.py",
    "chars": 911,
    "preview": "#!/usr/bin/env python3\n\"\"\"Migrate all LaminDB instances used in lamindb tests.\n\nFor each instance: connect, run migratio"
  },
  {
    "path": "tests/core/_dataset_fixtures.py",
    "chars": 1977,
    "preview": "from pathlib import Path\nfrom typing import Generator\n\nimport lamindb as ln\nimport numpy as np\nimport pandas as pd\nimpor"
  },
  {
    "path": "tests/core/conftest.py",
    "chars": 6635,
    "preview": "import os\nimport shutil\nfrom pathlib import Path\nfrom subprocess import DEVNULL, run\nfrom time import perf_counter\n\nimpo"
  },
  {
    "path": "tests/core/notebooks/basic-r-notebook.Rmd.cleaned.html",
    "chars": 2771,
    "preview": "<!doctype html>\n<html>\n  <meta charset=\"utf-8\" />\n  \n  \n\n  <!-- rnb-text-begin -->\n  <!-- rnb-text-end -->\n  <!-- rnb-ch"
  },
  {
    "path": "tests/core/notebooks/basic-r-notebook.Rmd.html",
    "chars": 2943,
    "preview": "<!doctype html>\n<html>\n  <meta charset=\"utf-8\" />\n  <title>My exemplary R analysis</title>\n  <h1 class=\"title toc-ignore"
  },
  {
    "path": "tests/core/notebooks/duplicate/with-title-initialized-consecutive-finish.ipynb",
    "chars": 927,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# My duplicated test notebook (cons"
  },
  {
    "path": "tests/core/notebooks/load_schema.ipynb",
    "chars": 1007,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0\",\n   \"metadata\": {},\n   \"outputs\": [],"
  },
  {
    "path": "tests/core/notebooks/no-title.ipynb",
    "chars": 1336,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"0\",\n   \"metadata\": {},\n   \"source\": [\n    \"A notebook without ti"
  },
  {
    "path": "tests/core/notebooks/with-title-initialized-consecutive-finish-not-last-cell.ipynb",
    "chars": 1382,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# My test notebook (consecutive) wi"
  }
]

// ... and 88 more files (download for full content)

About this extraction

This page contains the full source code of the laminlabs/lamindb GitHub repository, extracted and formatted as plain text. The extraction includes 288 files (2.7 MB), approximately 728.5k tokens, and a symbol index with 1902 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
