[
  {
    "path": ".bumpversion.cfg",
    "content": "[bumpversion]\ncurrent_version = 0.5.2\ncommit = True\ntag = False\nparse = (?P<major>\\d+)\\.(?P<minor>\\d+)\\.(?P<patch>\\d+)(\\.(?P<release>[a-z]+)(?P<build>\\d+))?\nserialize = \n\t{major}.{minor}.{patch}.{release}{build}\n\t{major}.{minor}.{patch}\n\n[bumpversion:part:release]\noptional_value = rc\nfirst_value = dev\nvalues = \n\tdev\n\trc\n\n[bumpversion:part:build]\n\n[bumpversion:file:setup.py]\nsearch = version='{current_version}'\nreplace = version='{new_version}'\n\n[bumpversion:file:docs/conf.py]\nsearch = version = release = '{current_version}'\nreplace = version = release = '{new_version}'\n\n[bumpversion:file:src/hangar/__init__.py]\nsearch = __version__ = '{current_version}'\nreplace = __version__ = '{new_version}'\n\n[bumpversion:file:src/hangar/diagnostics/__init__.py]\nsearch = __version__ = '{current_version}'\nreplace = __version__ = '{new_version}'\n"
  },
  {
    "path": ".coveragerc",
    "content": "[paths]\nsource =\n   src\n\n[run]\nbranch = True\nparallel = True\nsource =\n    hangar\n    tests\nomit =\n    */hangar/__main__.py\n    */hangar_service_pb2.py\n    */hangar_service_pb2_grpc.py\n    */hangar_service_pb2.pyi\n\n[report]\nexclude_lines =\n    pragma: no cover\n    def __repr__\n    def _repr_pretty_\n    def _ipython_key_completions_\nshow_missing = True\nprecision = 2\nomit = *migrations*\n"
  },
  {
    "path": ".editorconfig",
    "content": "# see http://editorconfig.org\nroot = true\n\n[*]\nend_of_line = lf\ntrim_trailing_whitespace = true\ninsert_final_newline = true\nindent_style = space\nindent_size = 4\ncharset = utf-8\n\n[*.{bat,cmd,ps1}]\nend_of_line = crlf\n"
  },
  {
    "path": ".gitattributes",
    "content": "* text=auto\n\n*.bat eol=crlf\n*.cmd eol=crlf\n*.ps1 eol=lf\n*.sh eol=lf\n*.rtf -text"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "content": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: \"[BUG REPORT]\"\nlabels: 'Bug: Awaiting Priority Assignment'\nassignees: ''\n\n---\n\n**Describe the bug**\nA clear and concise description of what the bug is.\n\n\n**Severity**\n<!--- fill in the space between `[ ]` with and `x` (ie. `[x]`) --->\nSelect an option:\n- [ ] Data Corruption / Loss of Any Kind\n- [ ] Unexpected Behavior, Exceptions or Error Thrown\n- [ ] Performance Bottleneck\n\n**To Reproduce**\nSteps to reproduce the behavior, minimal example code preferred:\n\n\n**Expected behavior**\nA clear and concise description of what you expected to happen.\n\n\n**Screenshots**\nIf applicable, add screenshots to help explain your problem.\n\n\n**Desktop (please complete the following information):**\n\n - OS:\n - Python:\n - Hangar:\n\n**Additional context**\nAdd any other context about the problem here.\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "content": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: \"[FEATURE REQUEST]\"\nlabels: enhancement\nassignees: ''\n\n---\n\n**Is your feature request related to a problem? Please describe.**\nA clear and concise description of what the problem is. Ex. I'm always frustrated when [...]\n\n**Describe the solution you'd like**\nA clear and concise description of what you want to happen.\n\n**Describe alternatives you've considered**\nA clear and concise description of any alternative solutions or features you've considered.\n\n**Additional context**\nAdd any other context or screenshots about the feature request here.\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/questions_and_documentation.md",
    "content": "---\nname: Questions and Documentation\nabout: Is something confusing? The documentation not clear? We can help\ntitle: \"[QUESTION & DOCS]: \"\nlabels: documentation, question\nassignees: ''\n\n---\n\n**Executive Summary**\nIn one to two sentences, describe your question or issue with the documentation:\n\n\n\n**Additional Context / Explantation**\n(if applicable) provide more info about the question/problem (we love example code & screenshots!)\n\n\n\n**Desktop (If applicable, please complete the following version information):**\n\n - OS:\n - Python:\n - Hangar Version:\n  - _Install Type_\n    <!--- fill in the space between `[ ]` with and `x` (ie. `[x]`) --->\n    <!--- For Source Build, include commit hash if possible --->\n    - [ ] Source Build\n    - [ ] Pip install\n    - [ ] Conda (conda-forge) install\n\n\n**External Links**\n(If applicable) reference other issues, read the docs pages, code docstrings.\n\n  -\n  <!--- insert more `bullets` as needed --->\n"
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.md",
    "content": "## Motivation and Context\n#### _Why is this change required? What problem does it solve?:_\n\n\n\n#### _If it fixes an open issue, please link to the issue here:_\n\n\n## Description\n#### _Describe your changes in detail:_\n\n\n## Screenshots (if appropriate):\n\n## Types of changes\nWhat types of changes does your code introduce? Put an `x` in all the boxes that apply:\n- [ ] Documentation update\n- [ ] Bug fix (non-breaking change which fixes an issue)\n- [ ] New feature (non-breaking change which adds functionality)\n- [ ] Breaking change (fix or feature that would cause existing functionality to change)\n\nIs this PR ready for review, or a work in progress?\n- [ ] Ready for review\n- [ ] Work in progress\n\n## How Has This Been Tested?\nPut an `x` in the boxes that apply:\n- [ ] Current tests cover modifications made\n- [ ] New tests have been added to the test suite\n- [ ] Modifications were made to existing tests to support these changes\n- [ ] Tests may be needed, but they are not included when the PR was proposed\n- [ ] I don't know. Help!\n\n## Checklist:\n- [ ] My code follows the code style of this project.\n- [ ] My change requires a change to the documentation.\n- [ ] I have updated the documentation accordingly.\n- [ ] I have read the **[CONTRIBUTING](../CONTRIBUTING.rst)** document.\n- [ ] I have signed (or will sign when prompted) the tensorwork CLA.\n- [ ] I have added tests to cover my changes.\n- [ ] All new and existing tests passed.\n"
  },
  {
    "path": ".github/workflows/asvbench.yml",
    "content": "name: ASV Benchmarking\n\non:\n  pull_request:\n    branches:\n    - master\n\njobs:\n  run_benchmarks:\n    runs-on: ${{ matrix.os }}\n    strategy:\n      max-parallel: 4\n      fail-fast: false\n      matrix:\n        os: [ubuntu-18.04, macOS-10.14]\n        python-version: [3.6, 3.7]\n    steps:\n    - uses: actions/checkout@v1\n    - name: Set up Python ${{ matrix.python-version }}\n      uses: actions/setup-python@v1\n      with:\n        python-version: ${{ matrix.python-version }}\n    - name: Install dependencies\n      run: |\n        python -m pip install --upgrade pip\n        pip install --upgrade setuptools\n        pip install virtualenv==16.7.9\n        pip install git+https://github.com/airspeed-velocity/asv\n    - name: Run Benchmarks\n      run: |\n        cd asv_bench/\n        asv machine --yes\n        asv continuous --split origin/master HEAD | tee -a asv_continuous.log\n      shell: bash\n      continue-on-error: true\n    - name: Show Comparison\n      run: |\n        cd asv_bench/\n        asv compare --split origin/master HEAD | tee -a asv_compare.log\n        if [[ $(cat asv_continuous.log | grep \"PERFORMANCE DECREASED\") ]]; then\n          echo \"Benchmarks Performance Decreased\"\n          exit 1\n        elif [[ $(cat asv_continuous.log | grep \"PERFORMANCE INCREASED\") ]]; then\n          echo \"Benchmark Performance Increased\"\n        else\n          echo \"Benchmarks Run Without Errors, No Significant Change.\"\n        fi\n      shell: bash\n"
  },
  {
    "path": ".github/workflows/release.yml",
    "content": "name: release\n\non:\n  release:\n    types: [published, prereleased]\n\njobs:\n  build-linux-cp36:\n    runs-on: ubuntu-latest\n    container: quay.io/pypa/manylinux2014_x86_64\n\n    steps:\n    - uses: actions/checkout@v2\n    - name: Install Python package dependencies\n      run: /opt/python/cp36-cp36m/bin/python -m pip install cython wheel setuptools\n    - name: Build binary wheel\n      run: /opt/python/cp36-cp36m/bin/python setup.py bdist_wheel\n    - name: Apply auditwheel\n      run: auditwheel repair -w dist dist/*\n    - name: Remove linux wheel\n      run: rm dist/*-linux_x86_64.whl\n    - name: Archive dist artifacts\n      uses: actions/upload-artifact@v1\n      with:\n        name: dist-linux-3.6\n        path: dist\n\n  build-linux-cp37:\n    runs-on: ubuntu-latest\n    container: quay.io/pypa/manylinux2014_x86_64\n\n    steps:\n    - uses: actions/checkout@v2\n    - name: Install Python package dependencies\n      run: /opt/python/cp37-cp37m/bin/python -m pip install cython wheel setuptools\n    - name: Build binary wheel\n      run: /opt/python/cp37-cp37m/bin/python setup.py bdist_wheel\n    - name: Apply auditwheel\n      run: auditwheel repair -w dist dist/*\n    - name: Remove linux wheel\n      run: rm dist/*-linux_x86_64.whl\n    - name: Archive dist artifacts\n      uses: actions/upload-artifact@v1\n      with:\n        name: dist-linux-3.7\n        path: dist\n\n  build-linux-cp38:\n    runs-on: ubuntu-latest\n    container: quay.io/pypa/manylinux2014_x86_64\n\n    steps:\n    - uses: actions/checkout@v2\n    - name: Install Python package dependencies\n      run: /opt/python/cp38-cp38/bin/python -m pip install cython wheel setuptools\n    - name: Build binary wheel\n      run: /opt/python/cp38-cp38/bin/python setup.py bdist_wheel\n    - name: Apply auditwheel for manylinux wheel\n      run: auditwheel repair -w dist dist/*\n    - name: Remove linux wheel\n      run: rm dist/*-linux_x86_64.whl\n    - name: Archive dist artifacts\n      uses: actions/upload-artifact@v1\n      with:\n        name: dist-linux-3.8\n        path: dist\n\n  build-macos:\n    runs-on: macos-latest\n    strategy:\n      max-parallel: 4\n      matrix:\n        python-version: [3.6, 3.7, 3.8]\n\n    steps:\n    - uses: actions/checkout@v2\n    - name: Set up Python ${{ matrix.python-version }} x64\n      uses: actions/setup-python@v1\n      with:\n        python-version: ${{ matrix.python-version }}\n        architecture: x64\n    - name: Install Python package dependencies\n      run: pip install cython wheel setuptools\n    - name: Build binary wheel\n      run: python setup.py bdist_wheel\n    - name: Archive dist artifacts\n      uses: actions/upload-artifact@v1\n      with:\n        name: dist-macos-${{ matrix.python-version }}\n        path: dist\n\n  build-windows:\n    runs-on: windows-latest\n    strategy:\n      max-parallel: 3\n      matrix:\n        python-version: [3.6, 3.7, 3.8]\n\n    steps:\n    - uses: actions/checkout@v2\n    - name: Download Build Tools for Visual Studio 2019\n      run: Invoke-WebRequest -Uri https://aka.ms/vs/16/release/vs_buildtools.exe -OutFile vs_buildtools.exe\n    - name: Run vs_buildtools.exe install\n      run: ./vs_buildtools.exe --quiet --wait --norestart --nocache --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.VC.v141.x86.x64 --add Microsoft.VisualStudio.Component.VC.140 --includeRecommended\n    - name: Set up Python ${{ matrix.python-version }} x64\n      uses: 
actions/setup-python@v1\n      with:\n        python-version: ${{ matrix.python-version }}\n        architecture: x64\n    - name: Install Python package dependencies\n      run: pip install cython wheel setuptools\n    - name: Build binary wheel\n      run: python setup.py bdist_wheel\n    - name: Archive dist artifacts\n      uses: actions/upload-artifact@v1\n      with:\n        name: dist-windows-${{ matrix.python-version }}\n        path: dist\n\n  upload:\n    needs: [build-linux-cp36, build-linux-cp37, build-linux-cp38, build-macos, build-windows]\n    runs-on: ubuntu-latest\n\n    steps:\n    - uses: actions/checkout@v1\n    - name: Set up Python\n      uses: actions/setup-python@v1\n      with:\n        python-version: 3.8\n    - name: Install dependencies\n      run: |\n        python -m pip install --upgrade pip\n        pip install cython wheel setuptools\n\n    - name: Create source dist\n      run: python setup.py sdist\n\n    # Linux\n    - name: Stage linux 3.6\n      uses: actions/download-artifact@v1\n      with:\n        name: dist-linux-3.6\n    - run: mv -v dist-linux-3.6/* dist/\n\n    - name: Stage linux 3.7\n      uses: actions/download-artifact@v1\n      with:\n        name: dist-linux-3.7\n    - run: mv -v dist-linux-3.7/* dist/\n\n    - name: Stage linux 3.8\n      uses: actions/download-artifact@v1\n      with:\n        name: dist-linux-3.8\n    - run: mv -v dist-linux-3.8/* dist/\n\n    # MacOS\n    - name: Stage macos 3.6\n      uses: actions/download-artifact@v1\n      with:\n        name: dist-macos-3.6\n    - run: mv -v dist-macos-3.6/* dist/\n\n    - name: Stage macos 3.7\n      uses: actions/download-artifact@v1\n      with:\n        name: dist-macos-3.7\n    - run: mv -v dist-macos-3.7/* dist/\n\n    - name: Stage macos 3.8\n      uses: actions/download-artifact@v1\n      with:\n        name: dist-macos-3.8\n    - run: mv -v dist-macos-3.8/* dist/\n\n    # Windows\n    - name: Stage windows 3.6\n      uses: actions/download-artifact@v1\n      with:\n        name: dist-windows-3.6\n    - run: mv -v dist-windows-3.6/* dist/\n\n    - name: Stage windows 3.7\n      uses: actions/download-artifact@v1\n      with:\n        name: dist-windows-3.7\n    - run: mv -v dist-windows-3.7/* dist/\n\n    - name: Stage windows 3.8\n      uses: actions/download-artifact@v1\n      with:\n        name: dist-windows-3.8\n    - run: mv -v dist-windows-3.8/* dist/\n\n    - name: Upload PreRelease to Test PyPi with Twine\n      if: \"github.event.release.prerelease\"\n      env:\n        TWINE_USERNAME: ${{ secrets.TEST_PYPI_USERNAME }}\n        TWINE_PASSWORD: ${{ secrets.TEST_PYPI_PASSWORD }}\n      run: |\n        ls -l dist/*\n        pip install twine\n        twine upload --repository-url https://test.pypi.org/legacy/ dist/*\n\n    - name: Upload Release to PyPi with Twine\n      if: \"!github.event.release.prerelease\"\n      env:\n        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}\n        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}\n      run: |\n        ls -l dist/*\n        pip install twine\n        twine upload dist/*\n"
  },
  {
    "path": ".github/workflows/testsphinx.yml",
    "content": "name: Build Sphinx Docs\n\non:\n  pull_request:\n    branches:\n      - master\n  push:\n    branches:\n      - master\n\njobs:\n  build_docs:\n    runs-on: ubuntu-latest\n    strategy:\n      fail-fast: false\n\n    steps:\n      - uses: actions/checkout@v2\n      - name: Set up Python 3.7\n        uses: actions/setup-python@v1\n        with:\n          python-version: 3.7\n      - name: Install dependencies\n        run: |\n          python -m pip install --upgrade setuptools pip wheel tox\n          sudo apt-get update\n          sudo apt-get install pandoc\n      - name: Run Documentation Generator\n        run: tox -e docs\n        env:\n          GH_ACTIONS_PROC_NR: 1\n"
  },
  {
    "path": ".github/workflows/testsuite.yml",
    "content": "name: Run Test Suite\n\non:\n  pull_request:\n    branches:\n      - master\n  push:\n    branches:\n      - master\n\njobs:\n  run_test_suite:\n    runs-on: ${{ matrix.platform }}\n    strategy:\n      fail-fast: false\n      matrix:\n        # https://help.github.com/articles/virtual-environments-for-github-actions\n        testcover: [yes, no]\n        testml: [no, yes]\n        platform:\n          - windows-latest\n          - macos-latest\n          - ubuntu-latest\n        python-version: [3.6, 3.7, 3.8]\n        exclude:\n          # tensorflow-cpu:latest (2.1.0) is not available for python 3.8 yet.\n          - python-version: 3.8\n            testml: yes\n          # build time with limited macos jobs\n          - platform: macos-latest\n            python-version: 3.7\n          - platform: windows-latest\n            python-version: 3.7\n            testml: yes\n\n    steps:\n    - uses: actions/checkout@v2\n    - name: Set up Python ${{ matrix.python-version }}\n      uses: actions/setup-python@v2\n      with:\n        python-version: ${{ matrix.python-version }}\n    - name: Install dependencies\n      run: |\n        python -m pip install --upgrade setuptools wheel\n        # Use the latest published version for myself :)\n        python -m pip install tox-gh-actions\n    - name: Run Tests Without Coverage Report\n      if: matrix.testcover == 'no'\n      run: tox\n      env:\n        PYTEST_XDIST_PROC_NR: 2\n        TESTCOVER: ${{ matrix.testcover }}\n        TESTML: ${{ matrix.testml }}\n    - name: Run Tests With Coverage Report\n      if: matrix.testcover == 'yes'\n      run: tox -- --cov-report xml\n      env:\n        PYTEST_XDIST_PROC_NR: 2\n        TESTCOVER: ${{ matrix.testcover }}\n        TESTML: ${{ matrix.testml }}\n    - name: Upload Coverage Report to Codecov\n      if: matrix.testcover == 'yes'\n      run: bash <(curl -s https://codecov.io/bash) -n \"${CC_PLAT}-py${CC_PY}-cov${CC_COV}-ml${CC_ML}\"\n      shell: bash\n      env:\n        CC_PLAT: ${{ matrix.platform }}\n        CC_PY: ${{ matrix.python-version }}\n        CC_COV: ${{ matrix.testcover }}\n        CC_ML: ${{ matrix.testml }}\n"
  },
  {
    "path": ".gitignore",
    "content": "*.py[cod]\n\n# C extensions\n*.c\n*.so\ncython_debug/\n\n# cython annotation files\nsrc/hangar/backends/*.html\ndocs/_build\n\n# Packages\n*.egg\n*.egg-info\ndist\nbuild\neggs\n.eggs\nparts\nbin\nvar\nsdist\nwheelhouse\ndevelop-eggs\n.installed.cfg\nlib\nlib64\nvenv*/\npyvenv*/\nMANIFEST\n\n# Installer logs\npip-log.txt\n\n# Unit test / coverage reports\n.coverage\n.tox\n.coverage.*\n.pytest_cache/\nnosetests.xml\ncoverage.xml\nhtmlcov\n.hypothesis\n\n# Performance Testing\nasv_bench/html\nasv_bench/env\nasv_bench/results\n\n# Translations\n*.mo\n\n# Mr Developer\n.mr.developer.cfg\n.project\n.pydevproject\n.idea\n*.iml\n*.komodoproject\n\n# Complexity\noutput/*.html\noutput/*/index.html\n\n# Sphinx\ndocs/_build\n\n.DS_Store\n*~\n.*.sw[po]\n.build\n.ve\n.env\n.cache\n.pytest\n.bootstrap\n.appveyor.token\n*.bak\n\n# Mypy Cache\n.mypy_cache/\n.dmypy.json\nmonkeytype.sqlite3\n\n# IDE Settings\n.vscode/\n.ipynb_checkpoints/\n\n# Testing data\n*.pkl.gz\n\n*.sqlite3\n*.dmypy.json\n"
  },
  {
    "path": ".readthedocs.yml",
    "content": "# .readthedocs.yml\n# Read the Docs configuration file\n# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details\n\n# Required\nversion: 2\n\n# Build documentation in the docs/ directory with Sphinx\nsphinx:\n  configuration: docs/conf.py\n\n# Optionally build your docs in additional formats such as PDF and ePub\nformats: all\n\n# Optionally set the version of Python and requirements required to build your docs\npython:\n  version: 3.7\n  install:\n    - requirements: docs/requirements.txt\n    - method: pip\n      path: .\n    - method: setuptools\n      path: .\n    - requirements: docs/requirements_rtd.txt\n  system_packages: true"
  },
  {
    "path": "AUTHORS.rst",
    "content": "Authors\n=======\n\n* Richard Izzo - rick@tensorwerk.com\n* Luca Antiga - luca@tensorwerk.com\n* Sherin Thomas - sherin@tensorwerk.com\n* Alessia Marcolini - alessia@tensorwerk.com"
  },
  {
    "path": "CHANGELOG.rst",
    "content": "==========\nChange Log\n==========\n\n\n_`In-Progress`\n==============\n\nImprovements\n------------\n\n* New API design for datasets (previously dataloaders) for machine learning libraries.\n  (`#187 <https://github.com/tensorwerk/hangar-py/pull/187>`__) `@hhsecond <<https://github.com/hhsecond>>`__\n\n`v0.5.2`_ (2020-05-08)\n======================\n\nNew Features\n------------\n\n* New column data type supporting arbitrary ``bytes`` data.\n  (`#198 <https://github.com/tensorwerk/hangar-py/pull/198>`__) `@rlizzo <https://github.com/rlizzo>`__\n\nImprovements\n------------\n\n* ``str`` typed columns can now accept data containing any unicode code-point. In prior releases\n  data containing any ``non-ascii`` character could not be written to this column type.\n  (`#198 <https://github.com/tensorwerk/hangar-py/pull/198>`__) `@rlizzo <https://github.com/rlizzo>`__\n\n\nBug Fixes\n---------\n\n* Fixed issue where ``str`` and (newly added) ``bytes`` column data could not be fetched / pushed\n  between a local client repository and remote server.\n  (`#198 <https://github.com/tensorwerk/hangar-py/pull/198>`__) `@rlizzo <https://github.com/rlizzo>`__\n\n\n\n`v0.5.1`_ (2020-04-05)\n======================\n\nBugFixes\n--------\n\n* Fixed issue where importing ``make_torch_dataloader`` or ``make_tf_dataloader`` under python 3.6\n  Would raise a ``NameError`` irrigardless of if the package is installed.\n  (`#196 <https://github.com/tensorwerk/hangar-py/pull/196>`__) `@rlizzo <https://github.com/rlizzo>`__\n\n\n`v0.5.0`_ (2020-04-4)\n=====================\n\nImprovements\n------------\n\n* Python 3.8 is now fully supported.\n  (`#193 <https://github.com/tensorwerk/hangar-py/pull/193>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Major backend overhaul which defines column layouts and data types in the same interchangable\n  / extensable manner as storage backends. 
This will allow rapid development of new layouts and\n  data type support as new use cases are discovered by the community.\n  (`#184 <https://github.com/tensorwerk/hangar-py/pull/184>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Column and backend classes are now fully serializable (pickleable) for ``read-only`` checkouts.\n  (`#180 <https://github.com/tensorwerk/hangar-py/pull/180>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Modularized internal structure of API classes to easily allow new column layouts / data types\n  to be added in the future.\n  (`#180 <https://github.com/tensorwerk/hangar-py/pull/180>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Improved type / value checking of manual specification for column ``backend`` and ``backend_options``.\n  (`#180 <https://github.com/tensorwerk/hangar-py/pull/180>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Standardized column data access API to follow python standard library ``dict`` methods API.\n  (`#180 <https://github.com/tensorwerk/hangar-py/pull/180>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Memory usage of arrayset checkouts has been reduced by ~70% by using C-structs for allocating\n  sample record locating info.\n  (`#179 <https://github.com/tensorwerk/hangar-py/pull/179>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Read times from the ``HDF5_00`` and ``HDF5_01`` backend have been reduced by 33-38% (or more for\n  arraysets with many samples) by eliminating redundant computation of chunked storage B-Tree.\n  (`#179 <https://github.com/tensorwerk/hangar-py/pull/179>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Commit times and checkout times have been reduced by 11-18% by optimizing record parsing and\n  memory allocation.\n  (`#179 <https://github.com/tensorwerk/hangar-py/pull/179>`__) `@rlizzo <https://github.com/rlizzo>`__\n\n\nNew Features\n------------\n\n* Added ``str`` type column with same behavior as ``ndarray`` column (supporting both\n  single-level and nested layouts) to replace functionality of removed ``metadata`` container.\n  (`#184 <https://github.com/tensorwerk/hangar-py/pull/184>`__) `@rlizzo <https://github.com/rlizzo>`__\n* New backend based on ``LMDB`` has been added (specifier of ``lmdb_30``).\n  (`#184 <https://github.com/tensorwerk/hangar-py/pull/184>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added ``.diff()`` method to ``Repository`` class to enable diffing changes between any pair of\n  commits / branches without needing to open the diff base in a checkout.\n  (`#183 <https://github.com/tensorwerk/hangar-py/pull/183>`__) `@rlizzo <https://github.com/rlizzo>`__\n* New CLI command ``hangar diff`` which reports a summary view of changes made between any pair of\n  commits / branches.\n  (`#183 <https://github.com/tensorwerk/hangar-py/pull/183>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added ``.log()`` method to ``Checkout`` objects so graphical commit graph or machine readable\n  commit details / DAG can be queried when operating on a particular commit.\n  (`#183 <https://github.com/tensorwerk/hangar-py/pull/183>`__) `@rlizzo <https://github.com/rlizzo>`__\n* \"string\" type columns now supported alongside \"ndarray\" column type.\n  (`#180 <https://github.com/tensorwerk/hangar-py/pull/180>`__) `@rlizzo <https://github.com/rlizzo>`__\n* New \"column\" API, which replaces \"arrayset\" name.\n  (`#180 <https://github.com/tensorwerk/hangar-py/pull/180>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Arraysets can now contain \"nested subsamples\" under a common sample key.\n  
(`#179 <https://github.com/tensorwerk/hangar-py/pull/179>`__) `@rlizzo <https://github.com/rlizzo>`__\n* New API to add and remove samples from an arrayset.\n  (`#179 <https://github.com/tensorwerk/hangar-py/pull/179>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added ``repo.size_nbytes`` and ``repo.size_human`` to report disk usage of a repository on disk.\n  (`#174 <https://github.com/tensorwerk/hangar-py/pull/174>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added method to traverse the entire repository history and cryptographically verify integrity.\n  (`#173 <https://github.com/tensorwerk/hangar-py/pull/173>`__) `@rlizzo <https://github.com/rlizzo>`__\n\n\nChanges\n-------\n\n* Argument syntax of ``__getitem__()`` and ``get()`` methods of ``ReaderCheckout`` and\n  ``WriterCheckout`` classes. The new format supports handling arbitrary arguments specific\n  to retrieval of data from any column type.\n  (`#183 <https://github.com/tensorwerk/hangar-py/pull/183>`__) `@rlizzo <https://github.com/rlizzo>`__\n\n\nRemoved\n-------\n\n* ``metadata`` container for ``str`` typed data has been completely removed. It is replaced by a highly\n  extensible and much more user-friendly ``str`` typed column.\n  (`#184 <https://github.com/tensorwerk/hangar-py/pull/184>`__) `@rlizzo <https://github.com/rlizzo>`__\n* ``__setitem__()`` method in ``WriterCheckout`` objects. Writing data to columns via a checkout object\n  is no longer supported.\n  (`#183 <https://github.com/tensorwerk/hangar-py/pull/183>`__) `@rlizzo <https://github.com/rlizzo>`__\n\n\nBug Fixes\n---------\n\n* Backend data stores no longer use file symlinks, improving compatibility with some types of file systems.\n  (`#171 <https://github.com/tensorwerk/hangar-py/pull/171>`__) `@rlizzo <https://github.com/rlizzo>`__\n* All arrayset types (\"flat\" and \"nested subsamples\") and backend readers can now be pickled -- for parallel\n  processing -- in a read-only checkout.\n  (`#179 <https://github.com/tensorwerk/hangar-py/pull/179>`__) `@rlizzo <https://github.com/rlizzo>`__\n\n\nBreaking changes\n----------------\n\n* New backend record serialization format is incompatible with repositories written in version 0.4 or earlier.\n* New arrayset API is incompatible with Hangar API in version 0.4 or earlier.\n\n\n`v0.4.0`_ (2019-11-21)\n======================\n\nNew Features\n------------\n\n* Added ability to delete branch names/pointers from a local repository via both API and CLI.\n  (`#128 <https://github.com/tensorwerk/hangar-py/pull/128>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added ``local`` keyword arg to arrayset key/value iterators to return only locally available samples.\n  (`#131 <https://github.com/tensorwerk/hangar-py/pull/131>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Ability to change the backend storage format and options applied to an ``arrayset`` after initialization.\n  (`#133 <https://github.com/tensorwerk/hangar-py/pull/133>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added blosc compression to HDF5 backend by default on PyPi installations.\n  (`#146 <https://github.com/tensorwerk/hangar-py/pull/146>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added Benchmarking Suite to Test for Performance Regressions in PRs.\n  (`#155 <https://github.com/tensorwerk/hangar-py/pull/155>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added new backend optimized to increase speeds for fixed size arrayset access.\n  (`#160 <https://github.com/tensorwerk/hangar-py/pull/160>`__) `@rlizzo 
<https://github.com/rlizzo>`__\n\n\nImprovements\n------------\n\n* Removed ``msgpack`` and ``pyyaml`` dependencies. Cleaned up and improved remote client/server code.\n  (`#130 <https://github.com/tensorwerk/hangar-py/pull/130>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Multiprocess Torch DataLoaders allowed on Linux and MacOS.\n  (`#144 <https://github.com/tensorwerk/hangar-py/pull/144>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added CLI options ``commit``, ``checkout``, ``arrayset create``, & ``arrayset remove``.\n  (`#150 <https://github.com/tensorwerk/hangar-py/pull/150>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Plugin system revamp.\n  (`#134 <https://github.com/tensorwerk/hangar-py/pull/134>`__) `@hhsecond <https://github.com/hhsecond>`__\n* Documentation Improvements and Typo-Fixes.\n  (`#156 <https://github.com/tensorwerk/hangar-py/pull/156>`__) `@alessiamarcolini <https://github.com/alessiamarcolini>`__\n* Removed implicit removal of arrayset schema from checkout if every sample was removed from the arrayset.\n  This could potentially result in dangling accessors which may or may not self-destruct (as expected)\n  in certain edge-cases.\n  (`#159 <https://github.com/tensorwerk/hangar-py/pull/159>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added type codes to hash digests so that the calculation function can be updated in the future without\n  breaking repos written in previous Hangar versions.\n  (`#165 <https://github.com/tensorwerk/hangar-py/pull/165>`__) `@rlizzo <https://github.com/rlizzo>`__\n\n\nBug Fixes\n---------\n\n* Programmatic access to repository log contents now returns branch heads alongside other log info.\n  (`#125 <https://github.com/tensorwerk/hangar-py/pull/125>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Fixed minor bug in types of values allowed for ``Arrayset`` names vs ``Sample`` names.\n  (`#151 <https://github.com/tensorwerk/hangar-py/pull/151>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Fixed issue where using checkout object to access a sample in multiple arraysets would try to create\n  a ``namedtuple`` instance with invalid field names. Now incompatible field names are automatically\n  renamed with their positional index.\n  (`#161 <https://github.com/tensorwerk/hangar-py/pull/161>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Explicitly raise error if ``commit`` argument is set while checking out a repository with ``write=True``.\n  (`#166 <https://github.com/tensorwerk/hangar-py/pull/166>`__) `@rlizzo <https://github.com/rlizzo>`__\n\n\nBreaking changes\n----------------\n\n* New commit reference serialization format is incompatible with repositories written in version 0.3.0 or earlier.\n\n\n`v0.3.0`_ (2019-09-10)\n======================\n\nNew Features\n------------\n\n* API addition allowing reading and writing arrayset data from a checkout object directly.\n  (`#115 <https://github.com/tensorwerk/hangar-py/pull/115>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Data importers, exporters, and viewers via CLI for common file formats. 
Includes plugin system\n  for easy extensibility in the future.\n  (`#103 <https://github.com/tensorwerk/hangar-py/pull/103>`__)\n  (`@rlizzo <https://github.com/rlizzo>`__, `@hhsecond <https://github.com/hhsecond>`__)\n\nImprovements\n------------\n\n* Added tutorial on working with remote data.\n  (`#113 <https://github.com/tensorwerk/hangar-py/pull/113>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Added tutorial on TensorFlow and PyTorch Dataloaders.\n  (`#117 <https://github.com/tensorwerk/hangar-py/pull/117>`__) `@hhsecond <https://github.com/hhsecond>`__\n* Large performance improvement to diff/merge algorithm (~30x previous).\n  (`#112 <https://github.com/tensorwerk/hangar-py/pull/112>`__) `@rlizzo <https://github.com/rlizzo>`__\n* New commit hash algorithm which is much more reproducible in the long term.\n  (`#120 <https://github.com/tensorwerk/hangar-py/pull/120>`__) `@rlizzo <https://github.com/rlizzo>`__\n* HDF5 backend updated to increase speed of reading/writing variable sized dataset compressed chunks.\n  (`#120 <https://github.com/tensorwerk/hangar-py/pull/120>`__) `@rlizzo <https://github.com/rlizzo>`__\n\nBug Fixes\n---------\n\n* Fixed ML Dataloaders errors for a number of edge cases surrounding partial-remote data and non-common keys.\n  (`#110 <https://github.com/tensorwerk/hangar-py/pull/110>`__)\n  (`@hhsecond <https://github.com/hhsecond>`__, `@rlizzo <https://github.com/rlizzo>`__)\n\nBreaking changes\n----------------\n\n* New commit hash algorithm is incompatible with repositories written in version 0.2.0 or earlier.\n\n\n`v0.2.0`_ (2019-08-09)\n======================\n\nNew Features\n------------\n\n* Numpy memory-mapped array file backend added.\n  (`#70 <https://github.com/tensorwerk/hangar-py/pull/70>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Remote server data backend added.\n  (`#70 <https://github.com/tensorwerk/hangar-py/pull/70>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Selection heuristics to determine appropriate backend from arrayset schema.\n  (`#70 <https://github.com/tensorwerk/hangar-py/pull/70>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Partial remote clones and fetch operations now fully supported.\n  (`#85 <https://github.com/tensorwerk/hangar-py/pull/85>`__) `@rlizzo <https://github.com/rlizzo>`__\n* CLI has been placed under test coverage, added interface usage to docs.\n  (`#85 <https://github.com/tensorwerk/hangar-py/pull/85>`__) `@rlizzo <https://github.com/rlizzo>`__\n* TensorFlow and PyTorch Machine Learning Dataloader Methods (*Experimental Release*).\n  (`#91 <https://github.com/tensorwerk/hangar-py/pull/91>`__)\n  lead: `@hhsecond <https://github.com/hhsecond>`__, co-author: `@rlizzo <https://github.com/rlizzo>`__,\n  reviewed by: `@elistevens <https://github.com/elistevens>`__\n\nImprovements\n------------\n\n* Record format versioning and standardization so as not to break backwards compatibility in the future.\n  (`#70 <https://github.com/tensorwerk/hangar-py/pull/70>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Backend addition and update developer protocols and documentation.\n  (`#70 <https://github.com/tensorwerk/hangar-py/pull/70>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Read-only checkout arrayset sample ``get`` methods are now multithread and multiprocess safe.\n  (`#84 <https://github.com/tensorwerk/hangar-py/pull/84>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Read-only checkout metadata sample ``get`` methods are thread safe if used within a context manager.\n  (`#101 
<https://github.com/tensorwerk/hangar-py/pull/101>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Samples can be assigned integer names in addition to ``string`` names.\n  (`#89 <https://github.com/tensorwerk/hangar-py/pull/89>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Forgetting to close a ``write-enabled`` checkout before terminating the python process will close the\n  checkout automatically in many situations.\n  (`#101 <https://github.com/tensorwerk/hangar-py/pull/101>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Repository software version compatibility methods added to ensure upgrade paths in the future.\n  (`#101 <https://github.com/tensorwerk/hangar-py/pull/101>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Many tests added (including support for Mac OSX on Travis-CI).\n  lead: `@rlizzo <https://github.com/rlizzo>`__, co-author: `@hhsecond <https://github.com/hhsecond>`__\n\nBug Fixes\n---------\n\n* Diff results for fast forward merges now return sensible results.\n  (`#77 <https://github.com/tensorwerk/hangar-py/pull/77>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Many type annotations added, and developer documentation improved.\n  `@hhsecond <https://github.com/hhsecond>`__ & `@rlizzo <https://github.com/rlizzo>`__\n\nBreaking changes\n----------------\n\n* Renamed all references to ``datasets`` in the API / world-view to ``arraysets``.\n* These are backwards incompatible changes. For all versions > 0.2, repository upgrade utilities will\n  be provided if breaking changes occur.\n\n\n`v0.1.1`_ (2019-05-24)\n======================\n\nBug Fixes\n---------\n\n* Fixed typo in README which was uploaded to PyPi.\n\n\n`v0.1.0`_ (2019-05-24)\n======================\n\nNew Features\n------------\n\n* Remote client-server config negotiation and administrator permissions.\n  (`#10 <https://github.com/tensorwerk/hangar-py/pull/10>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Allow single python process to access multiple repositories simultaneously.\n  (`#20 <https://github.com/tensorwerk/hangar-py/pull/20>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Fast-Forward and 3-Way Merge and Diff methods now fully supported and behaving as expected.\n  (`#32 <https://github.com/tensorwerk/hangar-py/pull/32>`__) `@rlizzo <https://github.com/rlizzo>`__\n\nImprovements\n------------\n\n* Initial test-case specification.\n  (`#14 <https://github.com/tensorwerk/hangar-py/pull/14>`__) `@hhsecond <https://github.com/hhsecond>`__\n* Checkout test-case work.\n  (`#25 <https://github.com/tensorwerk/hangar-py/pull/25>`__) `@hhsecond <https://github.com/hhsecond>`__\n* Metadata test-case work.\n  (`#27 <https://github.com/tensorwerk/hangar-py/pull/27>`__) `@hhsecond <https://github.com/hhsecond>`__\n* Any potential failure cases raise exceptions instead of silently returning.\n  (`#16 <https://github.com/tensorwerk/hangar-py/pull/16>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Many usability improvements in a variety of commits.\n\n\nBug Fixes\n---------\n\n* Ensure references to checkout arrayset or metadata objects cannot operate after the checkout is closed.\n  (`#41 <https://github.com/tensorwerk/hangar-py/pull/41>`__) `@rlizzo <https://github.com/rlizzo>`__\n* Sensible exception classes and error messages raised on a variety of situations (Many commits).\n  `@hhsecond <https://github.com/hhsecond>`__ & `@rlizzo <https://github.com/rlizzo>`__\n* Many minor issues addressed.\n\nAPI Additions\n-------------\n\n* Refer to API documentation (`#23 
<https://github.com/tensorwerk/hangar-py/pull/23>`__)\n\nBreaking changes\n----------------\n\n* All repositories written with previous versions of Hangar are liable to break when using this version. Please upgrade versions immediately.\n\n\n`v0.0.0`_ (2019-04-15)\n======================\n\n* First Public Release of Hangar!\n\n.. _v0.0.0: https://github.com/tensorwerk/hangar-py/commit/2aff3805c66083a7fbb2ebf701ceaf38ac5165c7\n.. _v0.1.0: https://github.com/tensorwerk/hangar-py/compare/v0.0.0...v0.1.0\n.. _v0.1.1: https://github.com/tensorwerk/hangar-py/compare/v0.1.0...v0.1.1\n.. _v0.2.0: https://github.com/tensorwerk/hangar-py/compare/v0.1.1...v0.2.0\n.. _v0.3.0: https://github.com/tensorwerk/hangar-py/compare/v0.2.0...v0.3.0\n.. _v0.4.0: https://github.com/tensorwerk/hangar-py/compare/v0.3.0...v0.4.0\n.. _v0.5.0: https://github.com/tensorwerk/hangar-py/compare/v0.4.0...v0.5.0\n.. _v0.5.1: https://github.com/tensorwerk/hangar-py/compare/v0.5.0...v0.5.1\n.. _v0.5.2: https://github.com/tensorwerk/hangar-py/compare/v0.5.1...v0.5.2\n.. _In-Progress: https://github.com/tensorwerk/hangar-py/compare/v0.5.2...master\n"
  },
  {
    "path": "CODE_OF_CONDUCT.rst",
    "content": "===========================\nContributor Code of Conduct\n===========================\n\nOur Pledge\n----------\n\nIn the interest of fostering an open and welcoming environment, we as\ncontributors and maintainers pledge to making participation in our project and\nour community a harassment-free experience for everyone, regardless of age, body\nsize, disability, ethnicity, sex characteristics, gender identity and expression,\nlevel of experience, education, socio-economic status, nationality, personal\nappearance, race, religion, or sexual identity and orientation.\n\nOur Standards\n-------------\n\nExamples of behavior that contributes to creating a positive environment\ninclude:\n\n* Using welcoming and inclusive language\n* Being respectful of differing viewpoints and experiences\n* Gracefully accepting constructive criticism\n* Focusing on what is best for the community\n* Showing empathy towards other community members\n\nExamples of unacceptable behavior by participants include:\n\n* The use of sexualized language or imagery and unwelcome sexual attention or\n  advances\n* Trolling, insulting/derogatory comments, and personal or political attacks\n* Public or private harassment\n* Publishing others' private information, such as a physical or electronic\n  address, without explicit permission\n* Other conduct which could reasonably be considered inappropriate in a\n  professional setting\n\nOur Responsibilities\n--------------------\n\nProject maintainers are responsible for clarifying the standards of acceptable\nbehavior and are expected to take appropriate and fair corrective action in\nresponse to any instances of unacceptable behavior.\n\nProject maintainers have the right and responsibility to remove, edit, or\nreject comments, commits, code, wiki edits, issues, and other contributions\nthat are not aligned to this Code of Conduct, or to ban temporarily or\npermanently any contributor for other behaviors that they deem inappropriate,\nthreatening, offensive, or harmful.\n\nScope\n-----\n\nThis Code of Conduct applies both within project spaces and in public spaces\nwhen an individual is representing the project or its community. Examples of\nrepresenting a project or community include using an official project e-mail\naddress, posting via an official social media account, or acting as an appointed\nrepresentative at an online or offline event. Representation of a project may be\nfurther defined and clarified by project maintainers.\n\nEnforcement\n-----------\n\n\nInstances of abusive, harassing, or otherwise unacceptable behavior may be\nreported by contacting the project team at\n`hangar.info@tensorwerk.com <hangar.info@tensorwerk.com>`__. All complaints will\nbe reviewed and investigated and will result in a response that is deemed\nnecessary and appropriate to the circumstances. The project team is obligated to\nmaintain confidentiality with regard to the reporter of an incident. Further\ndetails of specific enforcement policies may be posted separately.\n\nProject maintainers who do not follow or enforce the Code of Conduct in good\nfaith may face temporary or permanent repercussions as determined by other\nmembers of the project's leadership.\n\nAttribution\n-----------\n\nThis Code of Conduct is adapted from the `Contributor Covenant`_ homepage, version 1.4,\navailable at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html\n\n.. 
_Contributor Covenant: https://www.contributor-covenant.org\n\nFor answers to common questions about this code of conduct, see\nhttps://www.contributor-covenant.org/faq\n"
  },
  {
    "path": "CONTRIBUTING.rst",
    "content": "============\nContributing\n============\n\nContributions are welcome, and they are greatly appreciated! Every\nlittle bit helps, and credit will always be given.\n\nAll community members should read and abide by our :ref:`ref-code-of-conduct`.\n\nBug reports\n===========\n\nWhen `reporting a bug <https://github.com/tensorwerk/hangar-py/issues>`_ please include:\n\n    * Your operating system name and version.\n    * Any details about your local setup that might be helpful in\n      troubleshooting.\n    * Detailed steps to reproduce the bug.\n\nDocumentation improvements\n==========================\n\nHangar could always use more documentation, whether as part of the\nofficial Hangar docs, in docstrings, or even on the web in blog posts,\narticles, and such.\n\nFeature requests and feedback\n=============================\n\nThe best way to send feedback is to file an issue at https://github.com/tensorwerk/hangar-py/issues.\n\nIf you are proposing a feature:\n\n* Explain in detail how it would work.\n* Keep the scope as narrow as possible, to make it easier to implement.\n* Remember that this is a volunteer-driven project, and that code contributions\n  are welcome :)\n\nDevelopment\n===========\n\nTo set up `hangar-py` for local development:\n\n1. Fork `hangar-py <https://github.com/tensorwerk/hangar-py>`_\n   (look for the \"Fork\" button).\n2. Clone your fork locally::\n\n    git clone git@github.com:your_name_here/hangar-py.git\n\n3. Create a branch for local development::\n\n    git checkout -b name-of-your-bugfix-or-feature\n\n   Now you can make your changes locally.\n\n4. When you're done making changes, run all the checks, doc builder and spell\n   checker with `tox <http://tox.readthedocs.io/en/latest/install.html>`_ one\n   command::\n\n    tox\n\n5. Commit your changes and push your branch to GitHub::\n\n    git add .\n    git commit -m \"Your detailed description of your changes.\"\n    git push origin name-of-your-bugfix-or-feature\n\n6. Submit a pull request through the GitHub website.\n\nPull Request Guidelines\n-----------------------\n\nIf you need some code review or feedback while you're developing the code just\nmake the pull request.\n\nFor merging, you should:\n\n1. Include passing tests (run ``tox``) [1]_.\n2. Update documentation when there's new API, functionality etc.\n3. Add a note to ``CHANGELOG.rst`` about the changes.\n4. Add yourself to ``AUTHORS.rst``.\n\n.. [1] If you don't have all the necessary python versions available\n       locally you can rely on Travis - it will `run the tests\n       <https://travis-ci.org/tensorwerk/hangar-py/pull_requests>`_ for each change\n       you add in the pull request.\n\n       It will be slower though ...\n\nTips\n----\n\nTo run a subset of tests::\n\n    tox -e envname -- pytest -k test_myfeature\n\nTo run all the test environments in *parallel* (you need to ``pip install detox``)::\n\n    detox\n"
  },
  {
    "path": "LICENSE",
    "content": "\n                                 Apache License\n                           Version 2.0, January 2004\n                        http://www.apache.org/licenses/\n\n   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n   1. Definitions.\n\n      \"License\" shall mean the terms and conditions for use, reproduction,\n      and distribution as defined by Sections 1 through 9 of this document.\n\n      \"Licensor\" shall mean the copyright owner or entity authorized by\n      the copyright owner that is granting the License.\n\n      \"Legal Entity\" shall mean the union of the acting entity and all\n      other entities that control, are controlled by, or are under common\n      control with that entity. For the purposes of this definition,\n      \"control\" means (i) the power, direct or indirect, to cause the\n      direction or management of such entity, whether by contract or\n      otherwise, or (ii) ownership of fifty percent (50%) or more of the\n      outstanding shares, or (iii) beneficial ownership of such entity.\n\n      \"You\" (or \"Your\") shall mean an individual or Legal Entity\n      exercising permissions granted by this License.\n\n      \"Source\" form shall mean the preferred form for making modifications,\n      including but not limited to software source code, documentation\n      source, and configuration files.\n\n      \"Object\" form shall mean any form resulting from mechanical\n      transformation or translation of a Source form, including but\n      not limited to compiled object code, generated documentation,\n      and conversions to other media types.\n\n      \"Work\" shall mean the work of authorship, whether in Source or\n      Object form, made available under the License, as indicated by a\n      copyright notice that is included in or attached to the work\n      (an example is provided in the Appendix below).\n\n      \"Derivative Works\" shall mean any work, whether in Source or Object\n      form, that is based on (or derived from) the Work and for which the\n      editorial revisions, annotations, elaborations, or other modifications\n      represent, as a whole, an original work of authorship. For the purposes\n      of this License, Derivative Works shall not include works that remain\n      separable from, or merely link (or bind by name) to the interfaces of,\n      the Work and Derivative Works thereof.\n\n      \"Contribution\" shall mean any work of authorship, including\n      the original version of the Work and any modifications or additions\n      to that Work or Derivative Works thereof, that is intentionally\n      submitted to Licensor for inclusion in the Work by the copyright owner\n      or by an individual or Legal Entity authorized to submit on behalf of\n      the copyright owner. 
For the purposes of this definition, \"submitted\"\n      means any form of electronic, verbal, or written communication sent\n      to the Licensor or its representatives, including but not limited to\n      communication on electronic mailing lists, source code control systems,\n      and issue tracking systems that are managed by, or on behalf of, the\n      Licensor for the purpose of discussing and improving the Work, but\n      excluding communication that is conspicuously marked or otherwise\n      designated in writing by the copyright owner as \"Not a Contribution.\"\n\n      \"Contributor\" shall mean Licensor and any individual or Legal Entity\n      on behalf of whom a Contribution has been received by Licensor and\n      subsequently incorporated within the Work.\n\n   2. Grant of Copyright License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      copyright license to reproduce, prepare Derivative Works of,\n      publicly display, publicly perform, sublicense, and distribute the\n      Work and such Derivative Works in Source or Object form.\n\n   3. Grant of Patent License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      (except as stated in this section) patent license to make, have made,\n      use, offer to sell, sell, import, and otherwise transfer the Work,\n      where such license applies only to those patent claims licensable\n      by such Contributor that are necessarily infringed by their\n      Contribution(s) alone or by combination of their Contribution(s)\n      with the Work to which such Contribution(s) was submitted. If You\n      institute patent litigation against any entity (including a\n      cross-claim or counterclaim in a lawsuit) alleging that the Work\n      or a Contribution incorporated within the Work constitutes direct\n      or contributory patent infringement, then any patent licenses\n      granted to You under this License for that Work shall terminate\n      as of the date such litigation is filed.\n\n   4. Redistribution. 
You may reproduce and distribute copies of the\n      Work or Derivative Works thereof in any medium, with or without\n      modifications, and in Source or Object form, provided that You\n      meet the following conditions:\n\n      (a) You must give any other recipients of the Work or\n          Derivative Works a copy of this License; and\n\n      (b) You must cause any modified files to carry prominent notices\n          stating that You changed the files; and\n\n      (c) You must retain, in the Source form of any Derivative Works\n          that You distribute, all copyright, patent, trademark, and\n          attribution notices from the Source form of the Work,\n          excluding those notices that do not pertain to any part of\n          the Derivative Works; and\n\n      (d) If the Work includes a \"NOTICE\" text file as part of its\n          distribution, then any Derivative Works that You distribute must\n          include a readable copy of the attribution notices contained\n          within such NOTICE file, excluding those notices that do not\n          pertain to any part of the Derivative Works, in at least one\n          of the following places: within a NOTICE text file distributed\n          as part of the Derivative Works; within the Source form or\n          documentation, if provided along with the Derivative Works; or,\n          within a display generated by the Derivative Works, if and\n          wherever such third-party notices normally appear. The contents\n          of the NOTICE file are for informational purposes only and\n          do not modify the License. You may add Your own attribution\n          notices within Derivative Works that You distribute, alongside\n          or as an addendum to the NOTICE text from the Work, provided\n          that such additional attribution notices cannot be construed\n          as modifying the License.\n\n      You may add Your own copyright statement to Your modifications and\n      may provide additional or different license terms and conditions\n      for use, reproduction, or distribution of Your modifications, or\n      for any such Derivative Works as a whole, provided Your use,\n      reproduction, and distribution of the Work otherwise complies with\n      the conditions stated in this License.\n\n   5. Submission of Contributions. Unless You explicitly state otherwise,\n      any Contribution intentionally submitted for inclusion in the Work\n      by You to the Licensor shall be under the terms and conditions of\n      this License, without any additional terms or conditions.\n      Notwithstanding the above, nothing herein shall supersede or modify\n      the terms of any separate license agreement you may have executed\n      with Licensor regarding such Contributions.\n\n   6. Trademarks. This License does not grant permission to use the trade\n      names, trademarks, service marks, or product names of the Licensor,\n      except as required for reasonable and customary use in describing the\n      origin of the Work and reproducing the content of the NOTICE file.\n\n   7. Disclaimer of Warranty. 
Unless required by applicable law or\n      agreed to in writing, Licensor provides the Work (and each\n      Contributor provides its Contributions) on an \"AS IS\" BASIS,\n      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n      implied, including, without limitation, any warranties or conditions\n      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n      PARTICULAR PURPOSE. You are solely responsible for determining the\n      appropriateness of using or redistributing the Work and assume any\n      risks associated with Your exercise of permissions under this License.\n\n   8. Limitation of Liability. In no event and under no legal theory,\n      whether in tort (including negligence), contract, or otherwise,\n      unless required by applicable law (such as deliberate and grossly\n      negligent acts) or agreed to in writing, shall any Contributor be\n      liable to You for damages, including any direct, indirect, special,\n      incidental, or consequential damages of any character arising as a\n      result of this License or out of the use or inability to use the\n      Work (including but not limited to damages for loss of goodwill,\n      work stoppage, computer failure or malfunction, or any and all\n      other commercial damages or losses), even if such Contributor\n      has been advised of the possibility of such damages.\n\n   9. Accepting Warranty or Additional Liability. While redistributing\n      the Work or Derivative Works thereof, You may choose to offer,\n      and charge a fee for, acceptance of support, warranty, indemnity,\n      or other liability obligations and/or rights consistent with this\n      License. However, in accepting such obligations, You may act only\n      on Your own behalf and on Your sole responsibility, not on behalf\n      of any other Contributor, and only if You agree to indemnify,\n      defend, and hold each Contributor harmless for any liability\n      incurred by, or claims asserted against, such Contributor by reason\n      of your accepting any such warranty or additional liability.\n\n   END OF TERMS AND CONDITIONS\n\n   Copyright 2019 Richard Izzo\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License."
  },
  {
    "path": "MANIFEST.in",
    "content": "graft docs\ngraft src\ngraft tests\n\ninclude .bumpversion.cfg\ninclude .coveragerc\ninclude .editorconfig\n\ninclude AUTHORS.rst\ninclude CHANGELOG.rst\ninclude CONTRIBUTING.rst\ninclude CODE_OF_CONDUCT.rst\ninclude LICENSE\ninclude README.rst\n\ninclude tox.ini\ninclude mypy.ini\ninclude setup.py\n\nglobal-exclude *.py[cod] *.so *.DS_Store\nglobal-exclude __pycache__ .mypy_cache .pytest_cache .hypothesis\n"
  },
  {
    "path": "README.rst",
    "content": "========\nOverview\n========\n\n.. start-badges\n\n.. list-table::\n    :stub-columns: 1\n\n    * - docs\n      - |docs|\n    * - tests\n      - | |gh-build-status| |codecov|\n        | |lgtm|\n    * - package\n      - | |version| |wheel| |conda-forge|\n        | |supported-versions| |supported-implementations|\n        | |license|\n.. |docs| image:: https://readthedocs.org/projects/hangar-py/badge/?style=flat\n    :target: https://readthedocs.org/projects/hangar-py\n    :alt: Documentation Status\n\n.. |gh-build-status| image:: https://github.com/tensorwerk/hangar-py/workflows/Run%20Test%20Suite/badge.svg?branch=master\n    :alt: Build Status\n    :target: https://github.com/tensorwerk/hangar-py/actions?query=workflow%3A%22Run+Test+Suite%22+branch%3Amaster+event%3Apush+is%3Acompleted\n\n.. |codecov| image:: https://codecov.io/gh/tensorwerk/hangar-py/branch/master/graph/badge.svg\n   :alt: Code Coverage\n   :target: https://codecov.io/gh/tensorwerk/hangar-py\n\n.. |lgtm| image:: https://img.shields.io/lgtm/grade/python/g/tensorwerk/hangar-py.svg?logo=lgtm&logoWidth=18\n   :alt: Language grade: Python\n   :target: https://lgtm.com/projects/g/tensorwerk/hangar-py/context:python\n\n.. |version| image:: https://img.shields.io/pypi/v/hangar.svg\n    :alt: PyPI Package latest release\n    :target: https://pypi.org/project/hangar\n\n.. |license| image:: https://img.shields.io/github/license/tensorwerk/hangar-py\n   :alt: GitHub license\n   :target: https://github.com/tensorwerk/hangar-py/blob/master/LICENSE\n\n.. |conda-forge| image:: https://img.shields.io/conda/vn/conda-forge/hangar.svg\n   :alt: Conda-Forge Latest Version\n   :target: https://anaconda.org/conda-forge/hangar\n\n.. |wheel| image:: https://img.shields.io/pypi/wheel/hangar.svg\n    :alt: PyPI Wheel\n    :target: https://pypi.org/project/hangar\n\n.. |supported-versions| image:: https://img.shields.io/pypi/pyversions/hangar.svg\n    :alt: Supported versions\n    :target: https://pypi.org/project/hangar\n\n.. |supported-implementations| image:: https://img.shields.io/pypi/implementation/hangar.svg\n    :alt: Supported implementations\n    :target: https://pypi.org/project/hangar\n\n\n.. end-badges\n\nHangar is version control for tensor data. Commit, branch, merge, revert, and\ncollaborate in the data-defined software era.\n\n* Free software: Apache 2.0 license\n\nWhat is Hangar?\n===============\n\nHangar is based off the belief that too much time is spent collecting, managing,\nand creating home-brewed version control systems for data. At its core Hangar\nis designed to solve many of the same problems faced by traditional code version\ncontrol system (i.e. ``Git``), just adapted for numerical data:\n\n* Time travel through the historical evolution of a dataset\n* Zero-cost Branching to enable exploratory analysis and collaboration\n* Cheap Merging to build datasets over time (with multiple collaborators)\n* Completely abstracted organization and management of data files on disk\n* Ability to only retrieve a small portion of the data (as needed) while still\n  maintaining complete historical record\n* Ability to push and pull changes directly to collaborators or a central server\n  (i.e. a truly distributed version control system)\n\nThe ability of version control systems to perform these tasks for codebases is\nlargely taken for granted by almost every developer today; however, we are\nin-fact standing on the shoulders of giants, with decades of engineering which\nhas resulted in these phenomenally useful tools. 
Now that a new era of\n\"Data-Defined software\" is taking hold, we find there is a strong need for\nanalogous version control systems designed to handle numerical data at\nlarge scale... Welcome to Hangar!\n\n\nThe Hangar Workflow:\n\n::\n\n       Checkout Branch\n              |\n              ▼\n     Create/Access Data\n              |\n              ▼\n    Add/Remove/Update Samples\n              |\n              ▼\n           Commit\n\nLog Style Output:\n\n.. code-block:: text\n\n   *   5254ec (master) : merge commit combining training updates and new validation samples\n   |\\\n   | * 650361 (add-validation-data) : Add validation labels and image data in isolated branch\n   * | 5f15b4 : Add some metadata for later reference and add new training samples received after initial import\n   |/\n   *   baddba : Initial commit adding training images and labels\n\n\nLearn more about what Hangar is all about at https://hangar-py.readthedocs.io/\n\n\nInstallation\n============\n\nHangar is in early alpha development!\n\n::\n\n    pip install hangar\n
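\nAs a quick orientation, a minimal first session might look like the sketch\nbelow (the path, names, and toy arrays here are illustrative placeholders, not\ncanonical values):\n\n.. code-block:: python\n\n    import numpy as np\n    from hangar import Repository\n\n    repo = Repository(path='/path/to/repo/')\n    repo.init(user_name='Your Name', user_email='you@example.com')\n\n    co = repo.checkout(write=True)   # acquire the writer lock on the staging area\n    col = co.add_ndarray_column('images', prototype=np.zeros((28, 28), dtype=np.uint8))\n    col['sample_0'] = np.ones((28, 28), dtype=np.uint8)   # dict-style sample write\n    co.commit('add a first sample')\n    co.close()                       # release the writer lock\n\nDocumentation\n=============\n\nhttps://hangar-py.readthedocs.io/\n\n\nDevelopment\n===========\n\nTo run all the tests run::\n\n    tox\n\nNote, to combine the coverage data from all the tox environments run:\n\n.. list-table::\n    :widths: 10 90\n    :stub-columns: 1\n\n    - - Windows\n      - ::\n\n            set PYTEST_ADDOPTS=--cov-append\n            tox\n\n    - - Other\n      - ::\n\n            PYTEST_ADDOPTS=--cov-append tox\n"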
  },
  {
    "path": "asv_bench/README.rst",
    "content": "Hangar Performance Benchmarking Suite\n=====================================\n\nA set of benchmarking tools are included in order to track the performance of\ncommon hangar operations over the course of time. The benchmark suite is run\nvia the phenomenal `Airspeed Velocity (ASV) <https://asv.readthedocs.io/>`_\nproject.\n\nBenchmarks can be viewed at the following web link, or by examining the raw\ndata files in the separate benchmark results repo.\n\n-  `Benchmark Web View <https://tensorwerk.com/hangar-benchmarks>`_\n-  `Benchmark Results Repo <https://github.com/tensorwerk/hangar-benchmarks>`_\n\n.. figure:: ../docs/img/asv-detailed.png\n   :align: center\n\nPurpose\n*******\n\nIn addition to providing historical metrics and insight into application\nperformance over many releases of Hangar, *the benchmark suite is used as a\ncanary to identify potentially problematic pull requests.* All PRs to the\nHangar repository are automatically benchmarked by our CI system to compare the\nperformance of proposed changes to that of the current ``master`` branch.\n\n*The results of this canary are explicitly NOT to be used as the\n\"be-all-end-all\" decider of whether a PR is suitable to be merged or not.*\n\nInstead, it is meant to serve the following purposes:\n\n1. **Help contributors understand the consequences of some set of changes on the\n   greater system early in the PR process.** Simple code is best; if there's no\n   obvious performance degradation or significant improvement to be had, then\n   there's no need (or really rationale) for using more complex algorithms or\n   data structures. It's more work for the author, project maintainers, and\n   long term health of the codebase.\n\n2. **Not everything can be caught by the capabilities of a traditional test\n   suite.** Hangar is fairly flat/modular in structure, but there are certain\n   hotspots in the codebase where a simple change could drastically degrade\n   performance. It's not always obvious where these hotspots are, and even a\n   change which is functionally identical (introducing no issues/bugs to the\n   end user) can unknowingly cross a line and introduce some large regression\n   completely unnoticed to the authors/reviewers.\n\n3. Sometimes tradeoffs need to be made when introducing something new to a\n   system. Whether this be due to fundamental CS problems (space vs. time) or\n   simple matters of practicality vs. purity, it's always easier to act in\n   environments where relevant information is available before a decision is\n   made. **Identifying and quantifying tradeoffs/regressions/benefits during\n   development is the only way we can make informed decisions.** The only times\n   to be OK with some regression is when knowing about it in advance, it might\n   be the right choice at the time, but if we don't measure we will never know.\n\n\nImportant Notes on Using/Modifying the Benchmark Suite\n******************************************************\n\n1. **Do not commit any of the benchmark results, environment files, or generated\n   visualizations to the repository**. We store benchmark results in a `separate\n   repository <https://github.com/tensorwerk/hangar-benchmarks>`_ so to not\n   clutter the main repo with un-necessary data. The default directories these are\n   generated in are excluded in our ``.gitignore`` config, so baring some unusual\n   git usage patterns, this should not be a day-to-day concern.\n\n2. 
Proposed changes to the benchmark suite should be made to the code in this\n   repository first. The benchmark results repository mirror will be\n   synchronized upon approval/merge of changes to the main Hangar repo.\n\n\nIntroduction to Running Benchmarks\n**********************************\n\nAs ASV sets up and manages its own virtual environments and source\ninstallations, benchmark execution is not run via ``tox``. While a brief\ntutorial is included below, please refer to the `ASV Docs\n<https://asv.readthedocs.io/>`_ for detailed information on how to run,\nunderstand, and write ASV benchmarks.\n\nFirst Time Setup\n----------------\n\n1. Ensure that ``virtualenv``, ``setuptools``, and ``pip`` are updated to the\n   latest version.\n\n2. Install ASV: ``$ pip install asv``.\n\n3. Open a terminal and navigate to the ``hangar-py/asv_bench`` directory.\n\n4. Run ``$ asv machine`` to record details of your machine; it is OK to\n   just use the defaults.\n\n\nRunning Benchmarks\n------------------\n\nRefer to the `using ASV\n<https://asv.readthedocs.io/en/stable/using.html#running-benchmarks>`_ page for\na full tutorial, paying close attention to the `asv run\n<https://asv.readthedocs.io/en/stable/commands.html#asv-run>`_ command.\nGenerally ``asv run`` requires a range of commits to benchmark across\n(specified via either branch name, tags, or commit digests).\n\nTo benchmark every commit between the current master ``HEAD`` and ``v0.3.0``,\nyou would execute::\n\n    $ asv run v0.3.0..master\n\nHowever, this may result in a larger workload than you are willing to wait\naround for. To limit the number of commits, you can specify the ``--steps=N``\noption to only benchmark ``N`` commits at most between ``HEAD`` and ``v0.3.0``,\nas shown in the example below.\n
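\nFor instance, to sample at most 20 commits from that range (the step count\nhere is an arbitrary illustration, not a recommended value)::\n\n    $ asv run --steps=20 v0.3.0..master\n\nThe most useful tool during development is the `asv continuous\n<https://asv.readthedocs.io/en/stable/commands.html#asv-continuous>`_ command.\nUsing the following syntax will benchmark any changes in a local development\nbranch against the base ``master`` commit::\n\n    $ asv continuous origin/master HEAD\n\nRunning `asv compare\n<https://asv.readthedocs.io/en/stable/commands.html#asv-compare>`_ will\ngenerate a quick summary of any performance differences::\n\n    $ asv compare origin/master HEAD\n\nVisualizing Results\n-------------------\n\nAfter generating benchmark data for a number of commits through history, the\nresults can be reviewed in (an automatically generated) local web interface by\nrunning the following commands::\n\n    $ asv publish\n    $ asv preview\n\nNavigating to ``http://127.0.0.1:8080/`` will pull up an interactive webpage\nwhere the full set of benchmark graphs/exploration utilities can be viewed.\nThis will look something like the image below.\n\n.. figure:: ../docs/img/asv-main.png\n   :align: center"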
  },
  {
    "path": "asv_bench/asv.conf.json",
    "content": "{\n    // The version of the config file format.  Do not change, unless\n    // you know what you are doing.\n    \"version\": 1,\n\n    // The name of the project being benchmarked\n    \"project\": \"hangar\",\n\n    // The project's homepage\n    \"project_url\": \"https://hangar-py.readthedocs.io\",\n\n    // The URL or local path of the source code repository for the\n    // project being benchmarked\n    \"repo\": \"..\",\n\n    // The Python project's subdirectory in your repo.  If missing or\n    // the empty string, the project is assumed to be located at the root\n    // of the repository.\n    // \"repo_subdir\": \"\",\n\n    // Customizable commands for building, installing, and\n    // uninstalling the project. See asv.conf.json documentation.\n    //\n    // \"install_command\": [\"in-dir={env_dir} python -mpip install {wheel_file}\"],\n    // \"uninstall_command\": [\"return-code=any python -mpip uninstall -y {project}\"],\n    // \"build_command\": [\n    //     \"python setup.py build\",\n    //     \"PIP_NO_BUILD_ISOLATION=false python -mpip wheel --no-deps --no-index -w {build_cache_dir} {build_dir}\"\n    // ],\n\n    // List of branches to benchmark. If not provided, defaults to \"master\"\n    // (for git) or \"default\" (for mercurial).\n    \"branches\": [\"master\"], // for git\n    // \"branches\": [\"default\"],    // for mercurial\n\n    // The DVCS being used.  If not set, it will be automatically\n    // determined from \"repo\" by looking at the protocol in the URL\n    // (if remote), or by looking for special directories, such as\n    // \".git\" (if local).\n    \"dvcs\": \"git\",\n\n    // The tool to use to create environments.  May be \"conda\",\n    // \"virtualenv\" or other value depending on the plugins in use.\n    // If missing or the empty string, the tool will be automatically\n    // determined by looking for tools on the PATH environment\n    // variable.\n    \"environment_type\": \"virtualenv\",\n\n    // timeout in seconds for installing any dependencies in environment\n    // defaults to 10 min\n    //\"install_timeout\": 600,\n\n    // the base URL to show a commit for the project.\n    \"show_commit_url\": \"http://github.com/tensorwerk/hangar-py/commit/\",\n\n    // The Pythons you'd like to test against.  If not provided, defaults\n    // to the current version of Python used to run `asv`.\n    // \"pythons\": [\"3.7\"],\n\n    // The list of conda channel names to be searched for benchmark\n    // dependency packages in the specified order\n    // \"conda_channels\": [\"conda-forge\", \"defaults\"],\n\n    // The matrix of dependencies to test.  Each key is the name of a\n    // package (in PyPI) and the values are version numbers.  An empty\n    // list or empty string indicates to just test against the default\n    // (latest) version. null indicates that the package is to not be\n    // installed. 
If the package to be tested is only available from\n    // PyPi, and the 'environment_type' is conda, then you can preface\n    // the package name by 'pip+', and the package will be installed via\n    // pip (with all the conda available packages installed first,\n    // followed by the pip installed packages).\n    //\n    // \"matrix\": {\n    //     \"numpy\": [\"1.6\", \"1.7\"],\n    //     \"six\": [\"\", null],        // test with and without six installed\n    //     \"pip+emcee\": [\"\"],   // emcee is only available for install with pip.\n    // },\n    \"matrix\": {\n        \"req\": {\n            \"Cython\": [],  // latest version of Cython\n        },\n    },\n\n    // Combinations of libraries/python versions can be excluded/included\n    // from the set to test. Each entry is a dictionary containing additional\n    // key-value pairs to include/exclude.\n    //\n    // An exclude entry excludes entries where all values match. The\n    // values are regexps that should match the whole string.\n    //\n    // An include entry adds an environment. Only the packages listed\n    // are installed. The 'python' key is required. The exclude rules\n    // do not apply to includes.\n    //\n    // In addition to package names, the following keys are available:\n    //\n    // - python\n    //     Python version, as in the *pythons* variable above.\n    // - environment_type\n    //     Environment type, as above.\n    // - sys_platform\n    //     Platform, as in sys.platform. Possible values for the common\n    //     cases: 'linux2', 'win32', 'cygwin', 'darwin'.\n    //\n    // \"exclude\": [\n    //     {\"python\": \"3.2\", \"sys_platform\": \"win32\"}, // skip py3.2 on windows\n    //     {\"environment_type\": \"conda\", \"six\": null}, // don't run without six on conda\n    // ],\n    //\n    // \"include\": [\n    //     // additional env for python2.7\n    //     {\"python\": \"2.7\", \"numpy\": \"1.8\"},\n    //     // additional env if run on windows+conda\n    //     {\"platform\": \"win32\", \"environment_type\": \"conda\", \"python\": \"2.7\", \"libpython\": \"\"},\n    // ],\n\n    // The directory (relative to the current directory) that benchmarks are\n    // stored in.  If not provided, defaults to \"benchmarks\"\n    \"benchmark_dir\": \"benchmarks\",\n\n    // The directory (relative to the current directory) to cache the Python\n    // environments in.  If not provided, defaults to \"env\"\n    \"env_dir\": \"env\",\n\n    // The directory (relative to the current directory) that raw benchmark\n    // results are stored in.  If not provided, defaults to \"results\".\n    \"results_dir\": \"results\",\n\n    // The directory (relative to the current directory) that the html tree\n    // should be written to.  If not provided, defaults to \"html\".\n    \"html_dir\": \"html\",\n\n    // The number of characters to retain in the commit hashes.\n    \"hash_length\": 8,\n\n    // `asv` will cache results of the recent builds in each\n    // environment, making them faster to install next time.  This is\n    // the number of builds to keep, per environment.\n    \"build_cache_size\": 2\n\n    // The commits after which the regression search in `asv publish`\n    // should start looking for regressions. Dictionary whose keys are\n    // regexps matching to benchmark names, and values corresponding to\n    // the commit (exclusive) after which to start looking for\n    // regressions.  The default is to start from the first commit\n    // with results. 
If the commit is `null`, regression detection is\n    // skipped for the matching benchmark.\n    //\n    // \"regressions_first_commits\": {\n    //    \"some_benchmark\": \"352cdf\",  // Consider regressions only after this commit\n    //    \"another_benchmark\": null,   // Skip regression detection altogether\n    // },\n\n    // The thresholds for relative change in results, after which `asv\n    // publish` starts reporting regressions. Dictionary of the same\n    // form as in ``regressions_first_commits``, with values\n    // indicating the thresholds.  If multiple entries match, the\n    // maximum is taken. If no entry matches, the default is 5%.\n    //\n    // \"regressions_thresholds\": {\n    //    \"some_benchmark\": 0.01,     // Threshold of 1%\n    //    \"another_benchmark\": 0.5,   // Threshold of 50%\n    // },\n}\n"
  },
  {
    "path": "asv_bench/benchmarks/__init__.py",
    "content": "\n"
  },
  {
    "path": "asv_bench/benchmarks/backend_comparisons.py",
    "content": "# Write the benchmarking functions here.\n# See \"Writing benchmarks\" in the asv docs for more information.\nimport numpy as np\nimport os\nfrom hangar import Repository\nfrom tempfile import mkdtemp\nfrom shutil import rmtree\nfrom hangar.utils import folder_size\n\n\n# ------------------------- fixture functions ----------------------------------\n\n\nclass _WriterSuite:\n\n    params = ['hdf5_00', 'hdf5_01', 'numpy_10']\n    param_names = ['backend']\n    processes = 2\n    repeat = (2, 4, 30.0)\n    # repeat == tuple (min_repeat, max_repeat, max_time)\n    number = 2\n    warmup_time = 0\n\n    def setup(self, backend):\n\n        # self.method\n        self.current_iter_number = 0\n        self.backend_code = {\n            'numpy_10': '10',\n            'hdf5_00': '00',\n            'hdf5_01': '01',\n        }\n        # self.num_samples\n\n        self.sample_shape = (50, 50, 20)\n\n        self.tmpdir = mkdtemp()\n        self.repo = Repository(path=self.tmpdir, exists=False)\n        self.repo.init('tester', 'foo@test.bar', remove_old=True)\n        self.co = self.repo.checkout(write=True)\n\n        component_arrays = []\n        ndims = len(self.sample_shape)\n        for idx, shape in enumerate(self.sample_shape):\n            layout = [1 for i in range(ndims)]\n            layout[idx] = shape\n            component = np.hamming(shape).reshape(*layout) * 100\n            component_arrays.append(component.astype(np.float32))\n        self.arr = np.prod(component_arrays).astype(np.float32)\n\n        try:\n            self.aset = self.co.arraysets.init_arrayset(\n                'aset', prototype=self.arr, backend_opts=self.backend_code[backend])\n        except TypeError:\n            try:\n                self.aset = self.co.arraysets.init_arrayset(\n                    'aset', prototype=self.arr, backend=self.backend_code[backend])\n            except ValueError:\n                raise NotImplementedError\n        except ValueError:\n            raise NotImplementedError\n        except AttributeError:\n            self.aset = self.co.add_ndarray_column(\n                'aset', prototype=self.arr, backend=self.backend_code[backend])\n\n    def teardown(self, backend):\n        self.co.close()\n        self.repo._env._close_environments()\n        rmtree(self.tmpdir)\n\n    def write(self, backend):\n        arr = self.arr\n        iter_number = self.current_iter_number\n        with self.aset as cm_aset:\n            for i in range(self.num_samples):\n                arr[iter_number, iter_number, iter_number] += 1\n                cm_aset[i] = arr\n        self.current_iter_number += 1\n\n\n# ----------------------------- Writes ----------------------------------------\n\n\nclass Write_50by50by20_300_samples(_WriterSuite):\n    method = 'write'\n    num_samples = 300\n    time_write = _WriterSuite.write\n\n\n# ----------------------------- Reads -----------------------------------------\n\n\nclass _ReaderSuite:\n\n    params = ['hdf5_00', 'hdf5_01', 'numpy_10']\n    param_names = ['backend']\n    processes = 2\n    repeat = (2, 4, 30.0)\n    # repeat == tuple (min_repeat, max_repeat, max_time)\n    number = 3\n    warmup_time = 0\n    timeout = 60\n\n    def setup_cache(self):\n\n        backend_code = {\n            'numpy_10': '10',\n            'hdf5_00': '00',\n            'hdf5_01': '01',\n        }\n\n        sample_shape = (50, 50, 10)\n        num_samples = 3_000\n\n        repo = Repository(path=os.getcwd(), exists=False)\n        repo.init('tester', 
'foo@test.bar', remove_old=True)\n        co = repo.checkout(write=True)\n\n        component_arrays = []\n        ndims = len(sample_shape)\n        for idx, shape in enumerate(sample_shape):\n            layout = [1 for i in range(ndims)]\n            layout[idx] = shape\n            component = np.hamming(shape).reshape(*layout) * 100\n            component_arrays.append(component.astype(np.float32))\n        arr = np.prod(component_arrays).astype(np.float32)\n\n        for backend, code in backend_code.items():\n            try:\n                co.arraysets.init_arrayset(\n                    backend, prototype=arr, backend_opts=code)\n            except TypeError:\n                try:\n                    co.arraysets.init_arrayset(\n                        backend, prototype=arr, backend=code)\n                except ValueError:\n                    pass\n            except ValueError:\n                pass\n            except AttributeError:\n                co.add_ndarray_column(backend, prototype=arr, backend=code)\n\n        try:\n            col = co.columns\n        except AttributeError:\n            col = co.arraysets\n\n        with col as asets_cm:\n            for aset in asets_cm.values():\n                changer = 0\n                for i in range(num_samples):\n                    arr[changer, changer, changer] += 1\n                    aset[i] = arr\n                changer += 1\n        co.commit('first commit')\n        co.close()\n        repo._env._close_environments()\n\n    def setup(self, backend):\n        self.repo = Repository(path=os.getcwd(), exists=True)\n        self.co = self.repo.checkout(write=False)\n        try:\n            try:\n                self.aset = self.co.columns[backend]\n            except AttributeError:\n                self.aset = self.co.arraysets[backend]\n        except KeyError:\n            raise NotImplementedError\n\n    def teardown(self, backend):\n        self.co.close()\n        self.repo._env._close_environments()\n\n    def read(self, backend):\n        with self.aset as cm_aset:\n            for i in cm_aset.keys():\n                arr = cm_aset[i]\n\n\nclass Read_50by50by10_3000_samples(_ReaderSuite):\n    method = 'read'\n    num_samples = 3000\n    time_read = _ReaderSuite.read\n"
  },
  {
    "path": "asv_bench/benchmarks/backends/__init__.py",
    "content": ""
  },
  {
    "path": "asv_bench/benchmarks/backends/hdf5_00.py",
    "content": "# Write the benchmarking functions here.\n# See \"Writing benchmarks\" in the asv docs for more information.\nimport numpy as np\nfrom hangar import Repository\nfrom tempfile import mkdtemp\nfrom shutil import rmtree\nfrom hangar.utils import folder_size\n\n\nclass _WriterSuite_HDF5_00:\n\n    processes = 2\n    repeat = (2, 4, 20.0)\n    # repeat == tuple (min_repeat, max_repeat, max_time)\n    number = 2\n    warmup_time = 0\n\n    def setup(self):\n\n        # self.method\n        # self.num_samples\n        # self.sample_shape\n        self.current_iter_number = 0\n        self.tmpdir = mkdtemp()\n        self.repo = Repository(path=self.tmpdir, exists=False)\n        self.repo.init('tester', 'foo@test.bar', remove_old=True)\n        self.co = self.repo.checkout(write=True)\n\n        component_arrays = []\n        ndims = len(self.sample_shape)\n        for idx, shape in enumerate(self.sample_shape):\n            layout = [1 for i in range(ndims)]\n            layout[idx] = shape\n            component = np.hamming(shape).reshape(*layout) * 100\n            component_arrays.append(component.astype(np.float32))\n        arr = np.prod(component_arrays).astype(np.float32)\n\n        try:\n            self.aset = self.co.arraysets.init_arrayset('aset', prototype=arr, backend_opts='00')\n        except TypeError:\n            self.aset = self.co.arraysets.init_arrayset('aset', prototype=arr, backend='00')\n        except ValueError:\n            # marks as skipped benchmark for commits which do not have this backend.\n            raise NotImplementedError\n        except AttributeError:\n            self.aset = self.co.add_ndarray_column('aset', prototype=arr, backend='00')\n\n        if self.method == 'read':\n            with self.aset as cm_aset:\n                for i in range(self.num_samples):\n                    arr[0, 0, 0] += 1\n                    cm_aset[i] = arr\n            self.co.commit('first commit')\n            self.co.close()\n            self.co = self.repo.checkout(write=False)\n            try:\n                self.aset = self.co.columns['aset']\n            except AttributeError:\n                self.aset = self.co.arraysets['aset']\n        else:\n            self.arr = arr\n\n    def teardown(self):\n        self.co.close()\n        self.repo._env._close_environments()\n        rmtree(self.tmpdir)\n\n    def read(self):\n        with self.aset as cm_aset:\n            for k in cm_aset.keys():\n                arr = cm_aset[k]\n\n    def write(self):\n        arr = self.arr\n        iter_num = self.current_iter_number\n        with self.aset as cm_aset:\n            for i in range(self.num_samples):\n                arr[iter_num, iter_num, iter_num] += 1\n                cm_aset[i] = arr\n        self.current_iter_number += 1\n\n    def size(self):\n        return folder_size(self.repo._env.repo_path, recurse=True)\n\n\nclass Write_50by50by10_1_samples(_WriterSuite_HDF5_00):\n    method = 'write'\n    sample_shape = (50, 50, 10)\n    num_samples = 1\n\n    time_write = _WriterSuite_HDF5_00.write\n\n\nclass Write_50by50by10_100_samples(_WriterSuite_HDF5_00):\n    method = 'write'\n    sample_shape = (50, 50, 10)\n    num_samples = 100\n\n    time_write = _WriterSuite_HDF5_00.write\n\n\n# ----------------------------- Reads -----------------------------------------\n\nclass Read_50by50by10_1_samples(_WriterSuite_HDF5_00):\n    method = 'read'\n    sample_shape = (50, 50, 10)\n    num_samples = 1\n    time_read = _WriterSuite_HDF5_00.read\n\n\nclass 
Read_50by50by10_100_samples(_WriterSuite_HDF5_00):\n    method = 'read'\n    sample_shape = (50, 50, 10)\n    num_samples = 100\n    time_read = _WriterSuite_HDF5_00.read\n\n\nclass Read_50by50by10_300_samples(_WriterSuite_HDF5_00):\n    method = 'read'\n    sample_shape = (50, 50, 10)\n    num_samples = 300\n\n    time_read = _WriterSuite_HDF5_00.read\n    track_repo_size = _WriterSuite_HDF5_00.size\n    track_repo_size.unit = 'bytes'\n"
  },
  {
    "path": "asv_bench/benchmarks/backends/hdf5_01.py",
    "content": "# Write the benchmarking functions here.\n# See \"Writing benchmarks\" in the asv docs for more information.\nimport numpy as np\nfrom hangar import Repository\nfrom tempfile import mkdtemp\nfrom shutil import rmtree\nfrom hangar.utils import folder_size\n\n\nclass _WriterSuite_HDF5_01:\n\n    processes = 2\n    repeat = (2, 4, 20.0)\n    # repeat == tuple (min_repeat, max_repeat, max_time)\n    number = 2\n    warmup_time = 0\n\n    def setup(self):\n\n        # self.method\n        # self.num_samples\n        # self.sample_shape\n        self.current_iter_number = 0\n        self.tmpdir = mkdtemp()\n        self.repo = Repository(path=self.tmpdir, exists=False)\n        self.repo.init('tester', 'foo@test.bar', remove_old=True)\n        self.co = self.repo.checkout(write=True)\n\n        component_arrays = []\n        ndims = len(self.sample_shape)\n        for idx, shape in enumerate(self.sample_shape):\n            layout = [1 for i in range(ndims)]\n            layout[idx] = shape\n            component = np.hamming(shape).reshape(*layout) * 100\n            component_arrays.append(component.astype(np.float32))\n        arr = np.prod(component_arrays).astype(np.float32)\n\n        try:\n            self.aset = self.co.arraysets.init_arrayset('aset', prototype=arr, backend_opts='01')\n        except TypeError:\n            try:\n                self.aset = self.co.arraysets.init_arrayset('aset', prototype=arr, backend='01')\n            except ValueError:\n                raise NotImplementedError\n        except ValueError:\n            # marks as skipped benchmark for commits which do not have this backend.\n            raise NotImplementedError\n        except AttributeError:\n            self.aset = self.co.add_ndarray_column('aset', prototype=arr, backend='01')\n\n        if self.method == 'read':\n            with self.aset as cm_aset:\n                for i in range(self.num_samples):\n                    arr[0, 0, 0] += 1\n                    cm_aset[i] = arr\n            self.co.commit('first commit')\n            self.co.close()\n            self.co = self.repo.checkout(write=False)\n            try:\n                self.aset = self.co.columns['aset']\n            except AttributeError:\n                self.aset = self.co.arraysets['aset']\n        else:\n            self.arr = arr\n\n    def teardown(self):\n        self.co.close()\n        self.repo._env._close_environments()\n        rmtree(self.tmpdir)\n\n    def read(self):\n        with self.aset as cm_aset:\n            for k in cm_aset.keys():\n                arr = cm_aset[k]\n\n    def write(self):\n        arr = self.arr\n        iter_num = self.current_iter_number\n        with self.aset as cm_aset:\n            for i in range(self.num_samples):\n                arr[iter_num, iter_num, iter_num] += 1\n                cm_aset[i] = arr\n        self.current_iter_number += 1\n\n    def size(self):\n        return folder_size(self.repo._env.repo_path, recurse=True)\n\n\nclass Write_50by50by10_1_samples(_WriterSuite_HDF5_01):\n    method = 'write'\n    sample_shape = (50, 50, 10)\n    num_samples = 1\n\n    time_write = _WriterSuite_HDF5_01.write\n\n\nclass Write_50by50by10_100_samples(_WriterSuite_HDF5_01):\n    method = 'write'\n    sample_shape = (50, 50, 10)\n    num_samples = 100\n\n    time_write = _WriterSuite_HDF5_01.write\n\n\n# ----------------------------- Reads -----------------------------------------\n\nclass Read_50by50by10_1_samples(_WriterSuite_HDF5_01):\n    method = 'read'\n    
sample_shape = (50, 50, 10)\n    num_samples = 1\n    time_read = _WriterSuite_HDF5_01.read\n\n\nclass Read_50by50by10_100_samples(_WriterSuite_HDF5_01):\n    method = 'read'\n    sample_shape = (50, 50, 10)\n    num_samples = 100\n    time_read = _WriterSuite_HDF5_01.read\n\n\nclass Read_50by50by10_300_samples(_WriterSuite_HDF5_01):\n    method = 'read'\n    sample_shape = (50, 50, 10)\n    num_samples = 300\n\n    time_read = _WriterSuite_HDF5_01.read\n    track_repo_size = _WriterSuite_HDF5_01.size\n    track_repo_size.unit = 'bytes'\n"
  },
  {
    "path": "asv_bench/benchmarks/backends/numpy_10.py",
    "content": "# Write the benchmarking functions here.\n# See \"Writing benchmarks\" in the asv docs for more information.\nimport numpy as np\nfrom hangar import Repository\nfrom tempfile import mkdtemp\nfrom shutil import rmtree\nfrom hangar.utils import folder_size\n\n\nclass _WriterSuite_NUMPY_10:\n\n    processes = 2\n    repeat = (2, 4, 20.0)\n    # repeat == tuple (min_repeat, max_repeat, max_time)\n    number = 2\n    warmup_time = 0\n\n    def setup(self):\n\n        # self.method\n        # self.num_samples\n        # self.sample_shape\n        self.current_iter_number = 0\n        self.tmpdir = mkdtemp()\n        self.repo = Repository(path=self.tmpdir, exists=False)\n        self.repo.init('tester', 'foo@test.bar', remove_old=True)\n        self.co = self.repo.checkout(write=True)\n\n        component_arrays = []\n        ndims = len(self.sample_shape)\n        for idx, shape in enumerate(self.sample_shape):\n            layout = [1 for i in range(ndims)]\n            layout[idx] = shape\n            component = np.hamming(shape).reshape(*layout) * 100\n            component_arrays.append(component.astype(np.float32))\n        arr = np.prod(component_arrays).astype(np.float32)\n\n        try:\n            self.aset = self.co.arraysets.init_arrayset('aset', prototype=arr, backend_opts='10')\n        except TypeError:\n            self.aset = self.co.arraysets.init_arrayset('aset', prototype=arr, backend='10')\n        except ValueError:\n            # marks as skipped benchmark for commits which do not have this backend.\n            raise NotImplementedError\n        except AttributeError:\n            self.aset = self.co.add_ndarray_column('aset', prototype=arr, backend='10')\n\n        if self.method == 'read':\n            with self.aset as cm_aset:\n                for i in range(self.num_samples):\n                    arr[0, 0, 0] += 1\n                    cm_aset[i] = arr\n            self.co.commit('first commit')\n            self.co.close()\n            self.co = self.repo.checkout(write=False)\n            try:\n                self.aset = self.co.columns['aset']\n            except AttributeError:\n                self.aset = self.co.arraysets['aset']\n        else:\n            self.arr = arr\n\n    def teardown(self):\n        self.co.close()\n        self.repo._env._close_environments()\n        rmtree(self.tmpdir)\n\n    def read(self):\n        with self.aset as cm_aset:\n            for k in cm_aset.keys():\n                arr = cm_aset[k]\n\n    def write(self):\n        arr = self.arr\n        iter_num = self.current_iter_number\n        with self.aset as cm_aset:\n            for i in range(self.num_samples):\n                arr[iter_num, iter_num, iter_num] += 1\n                cm_aset[i] = arr\n        self.current_iter_number += 1\n\n    def size(self):\n        return folder_size(self.repo._env.repo_path, recurse=True)\n\n\nclass Write_50by50by10_1_samples(_WriterSuite_NUMPY_10):\n    method = 'write'\n    sample_shape = (50, 50, 10)\n    num_samples = 1\n\n    time_write = _WriterSuite_NUMPY_10.write\n\n\nclass Write_50by50by10_100_samples(_WriterSuite_NUMPY_10):\n    method = 'write'\n    sample_shape = (50, 50, 10)\n    num_samples = 100\n\n    time_write = _WriterSuite_NUMPY_10.write\n\n\n# ----------------------------- Reads -----------------------------------------\n\nclass Read_50by50by10_1_samples(_WriterSuite_NUMPY_10):\n    method = 'read'\n    sample_shape = (50, 50, 10)\n    num_samples = 1\n    time_read = 
_WriterSuite_NUMPY_10.read\n\n\nclass Read_50by50by10_100_samples(_WriterSuite_NUMPY_10):\n    method = 'read'\n    sample_shape = (50, 50, 10)\n    num_samples = 100\n    time_read = _WriterSuite_NUMPY_10.read\n\n\nclass Read_50by50by10_300_samples(_WriterSuite_NUMPY_10):\n    method = 'read'\n    sample_shape = (50, 50, 10)\n    num_samples = 300\n\n    time_read = _WriterSuite_NUMPY_10.read\n    track_repo_size = _WriterSuite_NUMPY_10.size\n    track_repo_size.unit = 'bytes'\n"
  },
  {
    "path": "asv_bench/benchmarks/commit_and_checkout.py",
    "content": "from tempfile import mkdtemp\nfrom shutil import rmtree\nimport numpy as np\nfrom hangar import Repository\n\n\nclass MakeCommit(object):\n\n    params = (5_000, 20_000, 50_000)\n    param_names = ['num_samples']\n    processes = 2\n    repeat = (2, 4, 20)\n    number = 1\n    warmup_time = 0\n\n    def setup(self, num_samples):\n        self.tmpdir = mkdtemp()\n        self.repo = Repository(path=self.tmpdir, exists=False)\n        self.repo.init('tester', 'foo@test.bar', remove_old=True)\n        self.co = self.repo.checkout(write=True)\n        arr = np.array([0,], dtype=np.uint8)\n        try:\n            aset = self.co.arraysets.init_arrayset('aset', prototype=arr, backend_opts='10')\n        except TypeError:\n            aset = self.co.arraysets.init_arrayset('aset', prototype=arr, backend='10')\n        except AttributeError:\n            aset = self.co.add_ndarray_column('aset', prototype=arr, backend='10')\n\n        with aset as cm_aset:\n            for i in range(num_samples):\n                arr[:] = i % 255\n                cm_aset[i] = arr\n\n    def teardown(self, num_samples):\n        self.co.close()\n        self.repo._env._close_environments()\n        rmtree(self.tmpdir)\n\n    def time_commit(self, num_samples):\n        self.co.commit('hello')\n\n\nclass CheckoutCommit(object):\n\n    params = (5_000, 20_000, 50_000)\n    param_names = ['num_samples']\n    processes = 2\n    number = 1\n    repeat = (2, 4, 20)\n    warmup_time = 0\n\n    def setup(self, num_samples):\n        self.tmpdir = mkdtemp()\n        self.repo = Repository(path=self.tmpdir, exists=False)\n        self.repo.init('tester', 'foo@test.bar', remove_old=True)\n        self.co = self.repo.checkout(write=True)\n        arr = np.array([0,], dtype=np.uint8)\n        try:\n            aset = self.co.arraysets.init_arrayset('aset', prototype=arr, backend_opts='10')\n        except TypeError:\n            aset = self.co.arraysets.init_arrayset('aset', prototype=arr, backend='10')\n        except AttributeError:\n            aset = self.co.add_ndarray_column('aset', prototype=arr, backend='10')\n\n        with aset as cm_aset:\n            for i in range(num_samples):\n                arr[:] = i % 255\n                cm_aset[i] = arr\n        self.co.commit('first')\n        self.co.close()\n        self.co = None\n\n    def teardown(self, num_samples):\n        try:\n            self.co.close()\n        except PermissionError:\n            pass\n        self.repo._env._close_environments()\n        rmtree(self.tmpdir)\n\n    def time_checkout_read_only(self, num_samples):\n        self.co = self.repo.checkout(write=False)\n\n    def time_checkout_write_enabled(self, num_samples):\n        self.co = self.repo.checkout(write=True)\n        self.co.close()\n"
  },
  {
    "path": "asv_bench/benchmarks/package.py",
    "content": "\n\nclass TimeImport(object):\n\n    processes = 2\n    repeat = (5, 10, 10.0)\n\n    def timeraw_import(self):\n        return \"\"\"\n        from hangar import Repository\n        \"\"\""
  },
  {
    "path": "codecov.yml",
    "content": "comment:\n  layout: \"diff, files\"\n  behavior: default\n  require_changes: false  # if true: only post the comment if coverage changes\n\n\ncoverage:\n  range: 60..100\n  round: nearest\n  precision: 2"
  },
  {
    "path": "docs/Tutorial-001.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 1: Creating A Repository And Working With Data\\n\",\n    \"\\n\",\n    \"This tutorial will review the first steps of working with a hangar repository.\\n\",\n    \"\\n\",\n    \"To fit with the beginner's theme, we will use the MNIST dataset. Later examples will show off how to work with much more complex data.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from hangar import Repository\\n\",\n    \"\\n\",\n    \"import numpy as np\\n\",\n    \"import pickle\\n\",\n    \"import gzip\\n\",\n    \"import matplotlib.pyplot as plt\\n\",\n    \"\\n\",\n    \"from tqdm import tqdm\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Creating & Interacting with a Hangar Repository\\n\",\n    \"\\n\",\n    \"Hangar is designed to “just make sense” in every operation you have to perform.\\n\",\n    \"As such, there is a single interface which all interaction begins with: the\\n\",\n    \" designed to “just make sense” in every operation you have to perform.\\n\",\n    \"As such, there is a single interface which all interaction begins with: the\\n\",\n    \"[Repository](api.rst#hangar.repository.Repository) object.\\n\",\n    \"\\n\",\n    \"Whether a hangar repository exists at the path you specify or not, just tell\\n\",\n    \"hangar where it should live!\\n\",\n    \"\\n\",\n    \"#### Intitializing a repository\\n\",\n    \"\\n\",\n    \"The first time you want to work with a new repository, the repository\\n\",\n    \"[init()](api.rst#hangar.repository.Repository.init) method\\n\",\n    \"must be called. This is where you provide Hangar with your name and email\\n\",\n    \"address (to be used in the commit log), as well as implicitly confirming that\\n\",\n    \"you do want to create the underlying data files hangar uses on disk.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Hangar Repo initialized at: /Users/rick/projects/tensorwerk/hangar/dev/mnist/.hangar\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'/Users/rick/projects/tensorwerk/hangar/dev/mnist/.hangar'\"\n      ]\n     },\n     \"execution_count\": 2,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"repo = Repository(path='/Users/rick/projects/tensorwerk/hangar/dev/mnist/')\\n\",\n    \"\\n\",\n    \"# First time a repository is accessed only!\\n\",\n    \"# Note: if you feed a path to the `Repository` which does not contain a pre-initialized hangar repo,\\n\",\n    \"# when the Repository object is initialized it will let you know that you need to run `init()`\\n\",\n    \"\\n\",\n    \"repo.init(user_name='Rick Izzo', user_email='rick@tensorwerk.com', remove_old=True)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Checking out the repo for writing\\n\",\n    \"\\n\",\n    \"A repository can be checked out in two modes:\\n\",\n    \"\\n\",\n    \"1. [write-enabled](api.rst#hangar.checkout.WriterCheckout): applies all operations to the staging area’s current\\n\",\n    \"   state. 
Only one write-enabled checkout can be active at a time, and it\\n\",\n    \"   must be closed upon last use, or manual intervention will be needed to remove\\n\",\n    \"   the writer lock.\\n\",\n    \"\\n\",\n    \"2. [read-only](api.rst#read-only-checkout): checkout a commit or branch to view repository state as it\\n\",\n    \"   existed at that point in time.\\n\",\n    \"\\n\",\n    \"#### Lots of useful information is in the IPython `__repr__`\\n\",\n    \"\\n\",\n    \"If you're ever in doubt about what the state of the object you're working\\n\",\n    \"on is, just call its repr, and the most relevant information will be\\n\",\n    \"sent to your screen!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar WriterCheckout                \\n\",\n       \"    Writer       : True                \\n\",\n       \"    Base Branch  : master                \\n\",\n       \"    Num Columns  : 0\\n\"\n      ]\n     },\n     \"execution_count\": 3,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co = repo.checkout(write=True)\\n\",\n    \"co\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### A checkout allows access to [columns](api.rst#hangar.columns.column.Columns)\\n\",\n    \"\\n\",\n    \"The [columns](api.rst#hangar.checkout.WriterCheckout.columns) attribute\\n\",\n    \"of a checkout provides the interface to working with all of the data on disk!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 4,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : True                \\n\",\n       \"    Number of Columns : 0                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - \"\n      ]\n     },\n     \"execution_count\": 4,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Before data can be added to a repository, a column must be initialized.\\n\",\n    \"\\n\",\n    \"We're going to first load up the MNIST pickled dataset so it can be added to\\n\",\n    \"the repo!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Load the dataset\\n\",\n    \"with gzip.open('/Users/rick/projects/tensorwerk/hangar/dev/data/mnist.pkl.gz', 'rb') as f:\\n\",\n    \"    train_set, valid_set, test_set = pickle.load(f, encoding='bytes')\\n\",\n    \"\\n\",\n    \"def rescale(array):\\n\",\n    \"    array = array * 256\\n\",\n    \"    rounded = np.round(array)\\n\",\n    \"    return rounded.astype(np.uint8)\\n\",\n    \"\\n\",\n    \"sample_trimg = rescale(train_set[0][0])\\n\",\n    \"sample_trlabel = np.array([train_set[1][0]])\\n\",\n    \"trimgs = rescale(train_set[0])\\n\",\n    \"trlabels = train_set[1]\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Before data can be added to a repository, a column must be initialized.\\n\",\n    \"\\n\",\n    \"A \\\"Column\\\" is a named grouping of data samples where each sample shares a\\n\",\n    \"number 
of similar attributes and array properties.\\n\",\n    \"\\n\",\n    \"See the docstrings below or in [add_ndarray_column()](api.rst#hangar.checkout.WriterCheckout.add_ndarray_column)\\n\",\n    \"\\n\",\n    \".. include:: ./noindexapi/apiinit.rst\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"col = co.add_ndarray_column(name='mnist_training_images', prototype=trimgs[0])\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : mnist_training_images                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint8                \\n\",\n       \"    Shape                    : (784,)                \\n\",\n       \"    Number of Samples        : 0                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 8,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"col\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Interaction\\n\",\n    \"\\n\",\n    \"#### Through columns attribute\\n\",\n    \"\\n\",\n    \"When a column is initialized, a column accessor object will be returned;\\n\",\n    \"however, depending on your use case, this may or may not be the most convenient\\n\",\n    \"way to access an arrayset.\\n\",\n    \"\\n\",\n    \"In general, we have implemented a full `dict` mapping interface on top of all\\n\",\n    \"objects. 
To access the `'mnist_training_images'` arrayset you can just use\\n\",\n    \"dict-style access like the following (note: if operating in IPython/Jupyter, the\\n\",\n    \"arrayset keys will autocomplete for you).\\n\",\n    \"\\n\",\n    \"The column objects returned here contain many useful introspection methods which\\n\",\n    \"we will review over the rest of the tutorial.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 9,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : mnist_training_images                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint8                \\n\",\n       \"    Shape                    : (784,)                \\n\",\n       \"    Number of Samples        : 0                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 9,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns['mnist_training_images']\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 10,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : mnist_training_images                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint8                \\n\",\n       \"    Shape                    : (784,)                \\n\",\n       \"    Number of Samples        : 0                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 10,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"train_aset = co.columns['mnist_training_images']\\n\",\n    \"\\n\",\n    \"# OR an equivalent way using the `.get()` method\\n\",\n    \"\\n\",\n    \"train_aset = co.columns.get('mnist_training_images')\\n\",\n    \"train_aset\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Through the checkout object (arrayset and sample access)\\n\",\n    \"\\n\",\n    \"In addition to the standard `co.columns` access methods, we have implemented a convenience mapping to [columns](api.rst#hangar.columns.column.Columns) and [flat samples](api.rst#hangar.columns.layout_flat.FlatSampleWriter) or [nested samples](api.rst#hangar.columns.layout_nested.NestedSampleWriter) / [nested subsamples](api.rst#hangar.columns.layout_nested.FlatSubsampleWriter) (i.e. 
data) for both reading and writing from the [checkout](api.rst#hangar.checkout.WriterCheckout) object itself.\\n\",\n    \"\\n\",\n    \"To get the same arrayset object from the checkout, simply use:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 11,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : mnist_training_images                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint8                \\n\",\n       \"    Shape                    : (784,)                \\n\",\n       \"    Number of Samples        : 0                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 11,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"train_asets = co['mnist_training_images']\\n\",\n    \"train_asets\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Though that works as expected, most use cases will take advantage of adding and reading data from multiple columns / samples at a time. This is shown in the next section.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Adding Data\\n\",\n    \"\\n\",\n    \"To add data to a named arrayset, we can use dict-style setting\\n\",\n    \"(refer to the `__setitem__`, `__getitem__`, and `__delitem__` methods),\\n\",\n    \"or the `update()` method. 
Sample keys can be either `str` or `int` type.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 12,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"train_aset['0'] = trimgs[0]\\n\",\n    \"\\n\",\n    \"data = {\\n\",\n    \"    '1': trimgs[1],\\n\",\n    \"    '2': trimgs[2],\\n\",\n    \"}\\n\",\n    \"train_aset.update(data)\\n\",\n    \"\\n\",\n    \"train_aset[51] = trimgs[51]\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Using the checkout method\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 13,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co['mnist_training_images', 60] = trimgs[60]\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### How many samples are in the arrayset?\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 14,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"5\"\n      ]\n     },\n     \"execution_count\": 14,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"len(train_aset)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Containment Testing\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 15,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"False\"\n      ]\n     },\n     \"execution_count\": 15,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"'hi' in train_aset\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 16,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"True\"\n      ]\n     },\n     \"execution_count\": 16,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"'0' in train_aset\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 17,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"True\"\n      ]\n     },\n     \"execution_count\": 17,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"60 in train_aset\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Dictionary Style Retrieval for known keys\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 18,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"True\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"<matplotlib.image.AxesImage at 0x3703cc7f0>\"\n      ]\n     },\n     \"execution_count\": 18,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    },\n    {\n     \"data\": {\n      \"image/png\": 
\"iVBORw0KGgoAAAANSUhEUgAAAPsAAAD4CAYAAAAq5pAIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAOYElEQVR4nO3dbYxc5XnG8euKbUwxJvHGseMQFxzjFAg0Jl0ZkBFQoVCCIgGKCLGiiFBapwlOQutKUFoVWtHKrRIiSimSKS6m4iWQgPAHmsSyECRqcFmoAROHN+MS4+0aswIDIfZ6fffDjqsFdp5dZs68eO//T1rNzLnnzLk1cPmcmeeceRwRAjD5faDTDQBoD8IOJEHYgSQIO5AEYQeSmNrOjR3i6XGoZrRzk0Aqv9Fb2ht7PFatqbDbPkfS9ZKmSPrXiFhVev6hmqGTfVYzmwRQsDE21K01fBhve4qkGyV9TtLxkpbZPr7R1wPQWs18Zl8i6fmI2BoReyXdJem8atoCULVmwn6kpF+Nery9tuwdbC+33We7b0h7mtgcgGY0E/axvgR4z7m3EbE6InojoneapjexOQDNaCbs2yXNH/X445J2NNcOgFZpJuyPSlpke4HtQyR9SdK6atoCULWGh94iYp/tFZJ+rJGhtzUR8XRlnQGoVFPj7BHxgKQHKuoFQAtxuiyQBGEHkiDsQBKEHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJNDWLK7qfp5b/E0/5yOyWbv+ZPz+6bm34sP3FdY9auLNYP+wbLtb/97pD6tYe7/1+cd1dw28V6yffs7JYP+bPHinWO6GpsNveJukNScOS9kVEbxVNAaheFXv234+IXRW8DoAW4jM7kESzYQ9JP7H9mO3lYz3B9nLbfbb7hrSnyc0BaFSzh/FLI2KH7TmS1tv+ZUQ8PPoJEbFa0mpJOsI90eT2ADSoqT17ROyo3e6UdJ+kJVU0BaB6DYfd9gzbMw/cl3S2pM1VNQagWs0cxs+VdJ/tA69zR0T8qJKuJpkpxy0q1mP6tGJ9xxkfKtbfPqX+mHDPB8vjxT/9dHm8uZP+49czi/V/+OdzivWNJ95Rt/bi0NvFdVcNfLZY/9hPD75PpA2HPSK2Svp0hb0AaCGG3oAkCDuQBGEHkiDsQBKEHUiCS1wrMHzmZ4r16269sVj/5LT6l2JOZkMxXKz/9Q1fLdanvlUe/jr1nhV1azNf3ldcd/qu8tDcYX0bi/VuxJ4dSIKwA0kQdiAJwg4kQdiBJAg7kARhB5JgnL0C05/ZUaw/9pv5xfonpw1U2U6lVvafUqxvfbP8U9S3LvxB3drr+8vj5HP/6T+L9VY6+C5gHR97diAJwg4kQdiBJAg7kARhB5Ig7EAShB1IwhHtG1E8wj1xss9q2/a6xeAlpxbru88p/9zzlCcPL9af+MYN77unA67d9bvF+qNnlMfRh197vViPU+v/APG2bxVX1YJlT5SfgPfYGBu0OwbHnMuaPTuQBGEHkiDsQBKEHUiCsANJEHYgCcIOJME4exeYMvvDxfrwq4PF+ot31B8rf/r0NcV1l/z9N4v1OTd27ppyvH9NjbPbXmN7p+3No5b12F5v+7na7awqGwZQvYkcxt8q6d2z3l8paUNELJK0ofYYQBcbN+wR8bCkdx9Hnidpbe3+WknnV9wXgIo1+gXd3Ijol6Ta7Zx6T7S93Haf7b4h7WlwcwCa1fJv4yNidUT0RkTvNE1v9eYA1NFo2Adsz5Ok2u3O6loC0AqNhn2dpItr9y+WdH817QBolXF/N972nZLOlDTb9nZJV0taJelu25dKeknSha1scrIb3vVqU+sP7W58fvdPffkXxforN00pv8D+8hzr6B7jhj0iltUpcXYMcBDhdFkgCcIOJEHYgSQIO5AEYQeSYMrmSeC4K56tW7vkxPKgyb8dtaFYP+PCy4r1md9/pFhH92DPDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJMM4+CZSmTX7168cV131p3dvF+pXX3las/8UXLyjW478/WLc2/+9+XlxXbfyZ8wzYswNJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEkzZnNzgH55arN9+9XeK9QVTD21425+6bUWxvujm/mJ939ZtDW97smpqymYAkwNhB5Ig7EAShB1IgrADSRB2IAnCDiTBODuKYuniYv2IVduL9Ts/8eOGt33sg39UrP/O39S/jl+Shp/b2vC2D1ZNjbPbXmN7p+3No5ZdY/tl25tqf+dW2TCA6k3kMP5WSeeMsfx7EbG49vdAtW0BqNq4YY+IhyUNtqEXAC3UzBd0K2w/WTvMn1XvSbaX2+6z3TekPU1sDkAzGg37TZIWSlosqV/Sd+s9MSJWR0RvRPRO0/QGNwegWQ2FPSIGImI4IvZLulnSkmrbAlC1hsJue96ohxdI2lzvuQC6w7jj7LbvlHSmpNmSBiRdXXu8WFJI2ibpaxFRvvhYjLNPRlPmzinWd1x0TN3axiuuL677gXH2RV9+8exi/fXTXi3WJ6PSOPu4k0RExLIxFt/SdFcA2orTZYEkCDuQBGEHkiDsQBKEHUiCS1zRMXdvL0/ZfJgPKdZ/HXuL9c9/8/L6r33fxuK6Byt+ShoAYQeyIOxAEoQdSIKwA0kQdiAJwg4kMe5Vb8ht/2nln5J+4cLylM0nLN5WtzbeOPp4bhg8qVg/7P6+pl5/smHPDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJMM4+ybn3hGL92W+Vx7pvXrq2WD/90PI15c3YE0PF+iODC8ovsH/cXzdPhT07kARhB5Ig7EAShB1IgrADSRB2IAnCDiTBOPtBYOqCo4r1Fy75WN3aNRfdVVz3C4fvaqinKlw10FusP3T9KcX6rLXl353HO427Z7c93/aDtrfYftr2t2vLe2yvt/1c7XZW69sF0KiJHMbvk7QyIo6TdIqky2wfL+lKSRsiYpGkDbXHALrUuGGPiP6IeLx2/w1JWyQdKek8SQfOpVwr6fxWNQmgee/rCzrbR0s6SdJGSXMjol8a+QdB0pw66yy33We7b0h7musWQMMmHHbbh0v6oaTLI2L3RNeLiNUR0RsRvdM0vZEeAVRgQmG3PU0jQb89Iu6tLR6wPa9WnydpZ2taBFCFcYfebFvSLZK2RMR1o0rrJF0saVXt9v6WdDgJTD36t4v1139vXrF+0d/+qFj/kw/dW6y30sr+8vDYz/+l/vBaz63/VVx31n6G1qo0kXH2pZK+Iukp25tqy67SSMjvtn2ppJckXdiaFgFUYdywR8TPJI05ubuks6ptB0CrcLoskARhB5Ig7EAShB1IgrADSXCJ6wRNnffRurXBNTOK6359wUPF+rKZAw31VIUVL59WrD9+U3nK5tk/2Fys97zBWHm3YM8OJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0mkGWff+
wflny3e+6eDxfpVxzxQt3b2b73VUE9VGRh+u27t9HUri+se+1e/LNZ7XiuPk+8vVtFN2LMDSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBJpxtm3nV/+d+3ZE+9p2bZvfG1hsX79Q2cX6x6u9+O+I4699sW6tUUDG4vrDhermEzYswNJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEo6I8hPs+ZJuk/RRjVy+vDoirrd9jaQ/lvRK7alXRUT9i74lHeGeONlM/Aq0ysbYoN0xOOaJGRM5qWafpJUR8bjtmZIes72+VvteRHynqkYBtM5E5mfvl9Rfu/+G7S2Sjmx1YwCq9b4+s9s+WtJJkg6cg7nC9pO219ieVWed5bb7bPcNaU9TzQJo3ITDbvtwST+UdHlE7JZ0k6SFkhZrZM//3bHWi4jVEdEbEb3TNL2ClgE0YkJhtz1NI0G/PSLulaSIGIiI4YjYL+lmSUta1yaAZo0bdtuWdIukLRFx3ajl80Y97QJJ5ek8AXTURL6NXyrpK5Kesr2ptuwqSctsL5YUkrZJ+lpLOgRQiYl8G/8zSWON2xXH1AF0F86gA5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJEHYgCcIOJDHuT0lXujH7FUn/M2rRbEm72tbA+9OtvXVrXxK9NarK3o6KiI+MVWhr2N+zcbsvIno71kBBt/bWrX1J9NaodvXGYTyQBGEHkuh02Fd3ePsl3dpbt/Yl0Vuj2tJbRz+zA2ifTu/ZAbQJYQeS6EjYbZ9j+xnbz9u+shM91GN7m+2nbG+y3dfhXtbY3ml786hlPbbX236udjvmHHsd6u0a2y/X3rtNts/tUG/zbT9oe4vtp21/u7a8o+9doa+2vG9t/8xue4qkZyV9VtJ2SY9KWhYRv2hrI3XY3iapNyI6fgKG7dMlvSnptog4obbsHyUNRsSq2j+UsyLiii7p7RpJb3Z6Gu/abEXzRk8zLul8SV9VB9+7Ql9fVBvet07s2ZdIej4itkbEXkl3STqvA310vYh4WNLguxafJ2lt7f5ajfzP0nZ1eusKEdEfEY/X7r8h6cA04x197wp9tUUnwn6kpF+Nerxd3TXfe0j6ie3HbC/vdDNjmBsR/dLI/zyS5nS4n3cbdxrvdnrXNONd8941Mv15szoR9rGmkuqm8b+lEfEZSZ+TdFntcBUTM6FpvNtljGnGu0Kj0583qxNh3y5p/qjHH5e0owN9jCkidtRud0q6T903FfXAgRl0a7c7O9zP/+umabzHmmZcXfDedXL6806E/VFJi2wvsH2IpC9JWteBPt7D9ozaFyeyPUPS2eq+qajXSbq4dv9iSfd3sJd36JZpvOtNM64Ov3cdn/48Itr+J+lcjXwj/4Kkv+xED3X6+oSkJ2p/T3e6N0l3auSwbkgjR0SXSvqwpA2Snqvd9nRRb/8u6SlJT2okWPM61NtpGvlo+KSkTbW/czv93hX6asv7xumyQBKcQQckQdiBJAg7kARhB5Ig7EAShB1IgrADSfwfs4RxaLJFjqkAAAAASUVORK5CYII=\\n\",\n      \"text/plain\": [\n       \"<Figure size 432x288 with 1 Axes>\"\n      ]\n     },\n     \"metadata\": {\n      \"needs_background\": \"light\"\n     },\n     \"output_type\": \"display_data\"\n    }\n   ],\n   \"source\": [\n    \"out1 = train_aset['0']\\n\",\n    \"# OR\\n\",\n    \"out2 = co['mnist_training_images', '0']\\n\",\n    \"\\n\",\n    \"print(np.allclose(out1, out2))\\n\",\n    \"\\n\",\n    \"plt.imshow(out1.reshape(28, 28))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Dict style iteration supported out of the box\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 19,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"0\\n\",\n      \"1\\n\",\n      \"2\\n\",\n      \"51\\n\",\n      \"60\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"image/png\": 
\"iVBORw0KGgoAAAANSUhEUgAAAlAAAACBCAYAAAAPH4TmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAZWUlEQVR4nO3deZgV1ZkG8Pf0AnSzN9BsIg1Cs4kBaRSQJQYRNa4ji8QIIThmNC4oKkicSaIYMZNHRgVUVECNwd2IjqLCdAwisquADYIsgiCbIMja3ffMH7Tn1HfTRd+6a93q9/c8Pv2d/ureOvbXdftQdeqU0lqDiIiIiCKXkeoOEBEREaUbDqCIiIiIPOIAioiIiMgjDqCIiIiIPOIAioiIiMgjDqCIiIiIPIppAKWUukgptV4ptVEpNSFenaLUYD2Dg7UMFtYzOFjL4FDRrgOllMoE8CWAQQC2A1gGYITW+ov4dY+ShfUMDtYyWFjP4GAtgyUrhteeA2Cj1noTACilXgRwBQDXX4Qaqqauhdox7JJicQyHcUIfVy5pT/VkLVPvEPbv1Vo3qSTFYzPN8NgMFh6bwXGqYzOWAVRLANsc7e0Azg3fSCl1A4AbAKAWcnGuGhjDLikWS/SCU6WrrCdr6S/z9atbXVI8NtMMj81g4bEZHKc6NmOZA1XZiOxfrgdqrWdorYu01kXZqBnD7ijBqqwna5k2eGwGC4/N4OCxGSCxDKC2A2jlaJ8GYEds3aEUYj2Dg7UMFtYzOFjLAIllALUMQHulVBulVA0A1wCYG59uUQqwnsHBWgYL6xkcrGWARD0HSmtdppS6GcB7ADIBzNRar41bzyipWM/gYC2DhfUMDtYyWGKZRA6t9TsA3olTXyjFWM/gYC2DhfUMDtYyOLgSOREREZFHHEARERERecQBFBEREZFHHEARERERecQBFBEREZFHHEAREREReRTTMgZEQVH2sx4m3nnTcZH7rPezJv7J4lEi12JaDRNnFq9MUO+IiMhveAaKiIiIyCMOoIiIiIg84iW8Sqgs+2PJbNI4otesv7NAtMtzQyZufcZukcu9yT6Q+9uHa4jcyqKXTLy3/LDInfvKOBO3u+OTiPpFlQsN6C7aj86cauJ22fKwCDniVb1nidz6onIT31XQK34dpJQ7PORc0X7oz4+b+P5hI0VOL1+TlD4FVahvNxPv6Jcrcp/fPDV884hkKnt+oOTEEZEbO+w3trF0dVTvXx2UXtBDtLPnr0hRT4B9/97bxE3nbRO5sm3bk90dADwDRUREROQZB1BEREREHnEARURERORRoOdAZXZqb2JdM1vkdgxoYOKjveRco7z6tr3wJy8hVu8eqSvaD029yMRLuv5N5DaXHjXx5F2DRK7FQh1zX6qz0guLTHz39OdFrjDbzkULiVlPwKbSUhN/H6opct0dzeMX9xS5nGI7tyJ07Jj3DqeBo1ecY+NGmSKXN3NxsrsTV7uL5L8v799yWYp6EgyqexfR3jS0noknDbGfg1fX3i+2CyG6z72QtvMT22XL43bwzEUmfvuOn4lc9vvLo9pfUBy6xs7lnP7gIyK36Gg7E7/VvYXI6eNy+ZdY7bmxt2gX/+5hEw++6jqRq39JXHcdMZ6BIiIiIvKIAygiIiIijwJ1Ca/8p2eL9sOzp5nYeYkmGUodp4//67FfiVzWYXtKuvcrN4tc3W/KTFxz71GRy12+JI49DKbMevVE+3D/jia+fYq9THB+zg9hr3T/t8Ts/X1MvGC6PK286A+PmviDp58Quc5/tbVtOz69L2e52dHf/txyzzggkzOT3Jl4yLCXIfXp8vgbmL/OxAtUH1DVMhs3MnGnp0tE7q1my1xepVy+Hz+3NNxg4if7Dxa5gvcTvntfOThCLr8y9U/2M61rDTn1pWuNrSZ+W7UUuXhPMMkMuyJYqu3UilmdnxO5YW9cb+IWV30R55644xkoIiIiIo84gCIiIiLyiAMoIiIiIo8CNQeq5vodor3iWCsTF2bvivn9x+2U14o3/WAf8zL7jFdF7vuQvSLc9NGPo9ofFy3wbvtz8rr8sp7TXLaM3H35dq7GvDpy7svoLRea+NmC+SJXr/O+mPftd3+89BUTP1Ry4Sm2TA+ZZ7Q28boBchJXt6W/NHGLZXz8x48ym+abeOv0JiL3wtn2Z9ilRvz/3OwP2eVBVp+Q8x/71zoR9/0FRWaD+ibuf7d8LFg3R53KUC5ynRbYR+C0P/FZgnp3UvgyKPPHn2bioXXkZ+u4Tvaz9+VGZ4pc+b7vEtC7k3gGioiIiMgjDqCIiIiIPArUJbyynd+K9mMPDTXxAxfJ1cYzP69j4s9uesz1PSftPcvEGy+QTwkvP7DTxL/ofZPIbbnVxm2Q2FOd1V3Zz+wTw+d0k09uz0Dly1eM3jpQtJfP72Ti1WPkexQfrWXi/OXy1vaN++0yCdl/Kpb7Tvzd2CmXrcqq3iiNZD19xDV39Kt6rrnqbNt1dnXqT3uFf5Ym9k/MSwc7m3jGrJ+L3Mrb3T/Xq7stT9vpLW/lF7tu1+3jX4t2+5ErE9anWFxX1/7tnzJyiMg1mxLdFJpI8AwUERERkUccQBERERF5VOUASik1Uym1Wym1xvG9PKXUB0qpDRVfGya2mxQvrGegFLCWwcFjM1B4bFYDkVygng1gKgDn2ukTACzQWk9WSk2oaI+Pf/dikzfL3gbZ5K1GIue8tbHLmfI679r+9tbbuTMGmDj/gPu1VLVYznNq498nd8xGmtbzR6EB3UX70Zl2zlK7bPkrHYJd/v/ydVeZOHOInBPX4Od20YjOz8vH6xRO22bijG2rRK7hQhuXPiBv+X3tLPt79OvzbxW5zOK4zCXYC+AXSGItQ327iXa/Wh/F6619oaC2+9ITreaXu+biZDbS4NjMalsg2r2Gxj7Hs8Nrdg5pnS2ZIldr4B4TL+r2osg986Rj3lNyn9ZVlaQfm6dSfr58zNnzPZyPnZKfmatPlJq45XT5KBeSqjwDpbX+J4DwhRSuAPBsRfwsgCvj3C9KENYzUH4AaxkYPDYDhcdmNRDtHKimWuudAFDxNd9tQ6XUDUqp5Uqp5aU47rYZpVZE9WQt0wKPzWDhsRkcPDYDJuHLGGitZwCYAQD1VF7KFtcu3+t+ar70oPu53y7X2ic773lcnlpGKOGn9H0llbVUPbqYeO8dcimBwmxbvxVhnzX/94O9zXnfi/bW3Ub75TXW+n+1q/HWhxTtjfpNM2vafY+Vt8ef4s7hpImmnlsvzRHt/Mxcly3TQ1bB6aI9JG+u67Y5m/eb2G9HfjKPzXPe+FK0JzZ2X5W9VNuf1Ocn5OfntX//rYk7/H6tiUOHDontsuY0M/FlLUaKXLPPlpo4o6GcUtR/4DAT//Osl1376EfxqKdzeZfHZz0qcmdk5YRvbox5cKyJGxf7Zy7KpLWXmHjouc+7bvfIzU+I9oNTznLZMnbRnoHapZRqDgAVX3fHr0uUAqxncLCWwcJ6BgdrGTDRDqDmAhhVEY8C8GZ8ukMpwnoGB2sZLKxnc
LCWARPJMgZzACwG0EEptV0pNQbAZACDlFIbAAyqaFMaYD0DpQ1Yy8DgsRkoPDargSrnQGmtR7ikBrp8P+10Gi+v6Y/uav/XZrVeYOIBQ38rtqv7knyKdTpIl3pm5Mq5NWV/PmjiTzq+LnKby+xT1++YOE7kGi782sT5te0Z82TPYTmn+VbR3hKft92stS6q5PsJq2VWu0OuuWPrGiRqtwmz7X9qi/Z5Ne2yF88cPE1ufOAgEsnPx+aJwfbXbESDR8KyteDGOe/p9217iFw72M/PENyJR3SFPa5LaCLnQLWsE34TXFIl/dgMt+0COzf0VHOe7tvbVbTz55ilq05Zl2RrNcp+lk/66EyRu7ex7XMtVYpk4UrkRERERB5xAEVERETkUcKXMUgH5Qe+F+19N3Yy8ddz7S3zEyY9J7a7Z5hd2Vqvkje/t3rAcfunTtnqDWnr6IAuov1ex+mu215/2+0mrvt3eVk12iUIyLv85f454Z/Z2D55YNfVhSKXN2y7iT8sfCbslfZy1OPT5DqH+bsS91R337vTrgbeJsv9kl0451IFzkt2ibDtksaivbLtnITuz+9mDH/SNbfihJ3E8MGf+olc3UP+nJriXN7iYJn772D9DLmWTWaXDiYuX7s+rn3iGSgiIiIijziAIiIiIvKIl/AqEfqsxMTX/PEuE7/w+7+I7T7t5bik10u+R5fa9oG07Z/aKXJlm7bE3smAO+v+T0U7wzHWH71V3siS8/el8INsJVdaLnVcuc1Uwb+MezRP/nustst24UL95MOhdaYy8bYLaorciRb2DpuMGvYyxPv9HhPbZdu3wLfl8j3+c5O99P5dSF52zM2w79l0ibzjMPgVjJ3zocBA2Arjye5MNffTHPsTLw/75f3dpn8zsZ/uJs9q09rEx1s3ct2uZc1/uOYKs+Xlvatf/dDEL3dqFr55THgGioiIiMgjDqCIiIiIPOIAioiIiMgjzoGqQt5MuxzBzevlSuT1Jtvboee0fU/k1o6cauKOra4XuQ5/tOPW8g2b4tLPIDhwXW8T39tUzjcLwa6qu+L9ziJ3Ovxxe7nzyfMAEHLM+phXIvvcHiuT0qd4O34sW7RDjplBsyZOEbm5N3eL6D3HN3patDNgJzAd1SdEbke5/RlP3fNTE18wf6zYrsEq+/vS/P1dIqe22uN2T4lcoblppp1jpZetrqrrgbX5wd6iXdJ5mqOlRO4Tx13j+UtlznnreaK1+Iv8HDjr3F+ZeE2fZ91fqNxTQXVX63km/s3jo0Wu05R9Ub3nvl75Ji4dEt0q8MPb2M/FO/Pis+RAnxz7N/ZlcA4UERERUUpxAEVERETkES/heaAWyVvrjwyxpyx7Dr9F5JaMtw/cXHe+vERxbcGFJv6+bzx7mN7KHFdT6mfUELnFx+yt6G2f2yFfl9BeSeEPOV73F+dDLVeI3LWbLjZxx9s2i1yyH2YcL+1+uUq0uzxol+to1fObqN6zeLdcKXzPu/Yhvo3WygeD1pi3zNGyuUIsd33/8J/1N+P7mLhnzcUi9+IPLavobTURdtt76BSLOIxe8msTt/mrf26JD4XstblT9T+o61P0X22X6yg+8zWRG5hjr7tuvPwJ+cLLE9qthPu67Iho3zL6VhNnxnnqBM9AEREREXnEARQRERGRRxxAEREREXnEOVAxKN+128RNH90tcsfutjNzcpWcz/NUwdsmvvQqeft17htL4tnFwNhXXsfEyX4UjnPe0/rJXUVu3RV2uYp3j9QXuR3T2pm47n7/zA2Jpzb3LK56I4+a4+u4v6dTbv89rrl7i682cSH88YggomjkDLbzLs956xqRW3r2i8nuTkRu2NbfxMVLznTd7omfPyPazjldV382RuSaFCduyRiegSIiIiLyiAMoIiIiIo94Cc+DUF+5svJXQ+1Tn8/stkXkwi/bOT32nX36fO6b7rdfk3XnoqEmLgxbLiDeQgO6i/buO46auKRoqsgNXD3cxLUvkqvK10UwL9sFWes3A3pPewScn2+ThvwthT2JnPPy+qaJPxG5hX2cTzOoJXKXrbf36rd9aI3IhRA8TYZsFe3L6lxg4o3jOohcqPWxiN6z7mK5in/dbXbayu6z7dCi7aPrIu6nPmr33f6I++fne33lVIqBOfbvaHlxo4j3FyuegSIiIiLyiAMoIiIiIo84gCIiIiLyiHOgKqGK7O2TX95q5zI9dZ58onf/WvJJ8W6Oa/k4ik++a2MboZ1R9DCgHE9Fzwgb2z/Sd46Jp0E++iMett5nnz7/2siHRa4w2/4OnL10lMi1uOqLuPeFKBUyPrKPqrr31V+I3FWjpoZv7gvOeU9rRof3sRbcHCm1x3TOoUPx7pbv6OPHRbvc0W4zMf5LkbR+07GvOL2nPs/O0buswXNxetfY8AwUERERkUdVDqCUUq2UUsVKqRKl1Fql1G0V389TSn2glNpQ8bVh4rtLsQghBNYyULJZz2DgsRk4PDargUgu4ZUBGKe1XqmUqgtghVLqAwC/ArBAaz1ZKTUBwAQA4xPX1fjKatPaxF+NbiFyfxhuV2m9us7eqN5/4q4iE3/4SC+Ra/hs/E+ZeuDfWjruIA+F3Uw8IGeficfO7iFyZ8yy22Z/K0/H7xrQxMR5w7eb+JbTF4jtLs61SyPMPdxU5EauvsjEjZ+s7dr9FPFvPdNAppL/htxfmG3iZu8muzesZVW+/q8+or1w1H87Wu6X7DaXyVvzQ0/lO7Nx6FmlWM84UovsJea3Dsglhfo1S81yQFWegdJa79Rar6yIDwEoAdASwBUAfpwU9CyAKxPVSYqPDGSAtQyUUtYzGHhsBg6PzWrA0xwopVQBgO4AlgBoqrXeCZwcZAHId3nNDUqp5Uqp5aU4XtkmlAKsZbCwnsHBWgYL6xlcEQ+glFJ1ALwGYKzW+mCkr9Naz9BaF2mti7JRM5o+UpyxlsHCegYHaxksrGewRbSMgVIqGyd/CV7QWr9e8e1dSqnmWuudSqnmAHYnqpPRyio43cTf92gucsPvm2fi/2jwOqIxbqed27R4epHI5c22T3JvGErpnCchXWtZS9lf1ZJBT4jcR/3s3IcNx5uJ3Oj6WyJ6/9t29DPxvI/l9fX2t/n3kSzpWk+/KNdhD+5I4X3J6VrLN3rb43HhF+1E7tWbBpu45sZdEb3f9+eeJtq/vP9tEw+q/WeRa5hhHyeyt/yoyG0ts7m7xt0hcrXfWBJRX2KRrvVMB58faCnaEzPscdzyf+XvWbyWUahMJHfhKQDPACjRWjsXyJkL4MdFcUYBeDP8teQv+uQsbdYyWFjPAOCxGUisZ8BFcgbqPADXAVitlPpxGvxEAJMBvKyUGgPgawBDXV5PPlF+cizOWgZHHbCegcBjM3B4bFYDVQ6gtNYfQawRLQyMb3e8y2puL9l8N1PeYn5jmw9NPKJuZKePw938TV8Tr3xcXtpp/Kp9infeIf9cpnOThSxorX1by6b/sGezx/+mt8g91Mz95+tcEb5vrS2u2606bk+4jvjwBpErHG2XMWgP/16yC/ODn+uZjo70PJKS/frt2Kz3
lWz/85hduTv8CQzOlfoL638tcmNeeMrzvjPC/tyEnOubIEfknMsTXDnjbpFr9cDHJs5F4i/ZheGxmUA1rpcXz1Znn2Hi8i+/Ct88YbgSOREREZFHHEARERERecQBFBEREZFHES1jkGonBtslAk7c/p3ITWz3jokvzDkc1fvvCrv9tf/ccSbueO86E+cdkPNwwm6Aphg5r11vGFogcp1vucXEXwx7LOL37PjOTSbuMN3ObylctaKyzamaCX+UC53U6Bn5WTfpmktN/H7n6JZ9SYQrn7rLxM45TxRsZZu3proLAHgGioiIiMgzDqCIiIiIPEqLS3hbrrTjvC+7vhLx66YdsLc2PvLhhSKnyu0dph0nyadxt99lb3lN5Cqm5K5s0xbRbne7bV9+e8+I36cQy0ysT7EdVR/H5zcxcXk3XoiPRM2J9Uy88RX5bLZ22Yl91Ej3JSNN3PipXJE7ff5yE/P4pmTjGSgiIiIijziAIiIiIvKIAygiIiIij9JiDlThjUtNfOmNPaJ7Dyx1zXGeE1H10WyKvd39kilni1xbfBq+OQHQy1abeGxBn6TuuyXWuuY474lSiWegiIiIiDziAIqIiIjIIw6giIiIiDziAIqIiIjIIw6giIiIiDziAIqIiIjIIw6giIiIiDziAIqIiIjIIw6giIiIiDxSWidvLVel1B4AWwE0BrA3aTt2V9360Vpr3aTqzarGWp4S6xm76tYP1jI50rWeh1H9foZVSXktkzqAMjtVarnWuijpO2Y/4s4vffdLPwB/9cUrv/Sd/YidX/rul34A/uqLF37qt1/64od+8BIeERERkUccQBERERF5lKoB1IwU7Tcc+xE7v/TdL/0A/NUXr/zSd/Yjdn7pu1/6AfirL174qd9+6UvK+5GSOVBERERE6YyX8IiIiIg84gCKiIiIyKOkDqCUUhcppdYrpTYqpSYked8zlVK7lVJrHN/LU0p9oJTaUPG1YRL60UopVayUKlFKrVVK3ZaqvsQqVfVkLeOPx2Zw6slaBqeWAOtZsU9f1jNpAyilVCaAaQAuBtAZwAilVOdk7R/AbAAXhX1vAoAFWuv2ABZUtBOtDMA4rXUnAL0A/Lbi55CKvkQtxfWcDdYybnhsGmlfT9bSSPtaAqyngz/rqbVOyn8AegN4z9G+B8A9ydp/xT4LAKxxtNcDaF4RNwewPpn9qdjvmwAG+aEv6VRP1jI4tWQ9WUvWkvVMx3om8xJeSwDbHO3tFd9LpaZa650AUPE1P5k7V0oVAOgOYEmq+xIFv9WTtYye32oJsJ7RYi3DpHEtAdbzX/ipnskcQKlKvldt11BQStUB8BqAsVrrg6nuTxRYzwqsZbCkeT1ZS4c0ryXAegp+q2cyB1DbAbRytE8DsCOJ+6/MLqVUcwCo+Lo7GTtVSmXj5C/BC1rr11PZlxj4rZ6sZfT8VkuA9YwWa1khALUEWE/Dj/VM5gBqGYD2Sqk2SqkaAK4BMDeJ+6/MXACjKuJROHldNaGUUgrAMwBKtNYPp7IvMfJbPVnL6PmtlgDrGS3WEoGpJcB6AvBxPZM88esSAF8C+ArA75K87zkAdgIoxclR/RgAjXBy5v6Giq95SehHX5w8Bfs5gE8r/rskFX1J13qylsGpJevJWrKWrGe61pOPciEiIiLyiCuRExEREXnEARQRERGRRxxAEREREXnEARQRERGRRxxAEREREXnEARQRERGRRxxAEREREXn0/6qK5FZQqcBNAAAAAElFTkSuQmCC\\n\",\n      \"text/plain\": [\n       \"<Figure size 720x720 with 5 Axes>\"\n      ]\n     },\n     \"metadata\": {\n      \"needs_background\": \"light\"\n     },\n     \"output_type\": \"display_data\"\n    }\n   ],\n   \"source\": [\n    \"# iterate normally over keys\\n\",\n    \"\\n\",\n    \"for k in train_aset:\\n\",\n    \"    # equivalent method: for k in train_aset.keys():\\n\",\n    \"    print(k)\\n\",\n    \"\\n\",\n    \"# iterate over items (plot results)\\n\",\n    \"\\n\",\n    \"fig, axs = plt.subplots(nrows=1, ncols=5, figsize=(10, 10))\\n\",\n    \"\\n\",\n    \"for idx, v in enumerate(train_aset.values()):\\n\",\n    \"    axs[idx].imshow(v.reshape(28, 28))\\n\",\n    \"plt.show()\\n\",\n    \"\\n\",\n    \"# iterate over items, store k, v in dict\\n\",\n    \"\\n\",\n    \"myDict = {}\\n\",\n    \"for k, v in train_aset.items():\\n\",\n    \"    myDict[k] = v\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Performance\\n\",\n    \"\\n\",\n    \"Once you’ve completed an interactive exploration, be sure to use the context\\n\",\n    \"manager form of the `update()` and `get()` methods!\\n\",\n    \"\\n\",\n    \"In order to make sure that all your data is always safe in Hangar, the backend\\n\",\n    \"diligently ensures that all contexts (operations which can somehow interact\\n\",\n    \"with the record structures) are opened and closed appropriately. 
When you use the\\n\",\n    \"context manager form of an arrayset object, we can offload a significant amount of\\n\",\n    \"work to the python runtime, and dramatically increase read and write speeds.\\n\",\n    \"\\n\",\n    \"Most columns we’ve tested see a throughput increase of 250% -\\n\",\n    \"500% for writes and 300% - 600% for reads when comparing the context\\n\",\n    \"manager form against the naked form!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 23,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Beginning non-context manager form\\n\",\n      \"----------------------------------\\n\",\n      \"Finished non-context manager form in: 78.54769086837769 seconds\\n\",\n      \"Hard reset requested with writer_lock: 8910b50e-1f9d-4cb1-986c-b99ea84c8a54\\n\",\n      \"\\n\",\n      \"Beginning context manager form\\n\",\n      \"--------------------------------\\n\",\n      \"Finished context manager form in: 11.608536720275879 seconds\\n\",\n      \"Hard reset requested with writer_lock: ad4a2ef9-8494-49f8-84ef-40c3990b1e9b\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"import time\\n\",\n    \"\\n\",\n    \"# ----------------- Non Context Manager Form ----------------------\\n\",\n    \"\\n\",\n    \"co = repo.checkout(write=True)\\n\",\n    \"aset_trimgs = co.add_ndarray_column(name='train_images', prototype=sample_trimg)\\n\",\n    \"aset_trlabels = co.add_ndarray_column(name='train_labels', prototype=sample_trlabel)\\n\",\n    \"\\n\",\n    \"print(f'Beginning non-context manager form')\\n\",\n    \"print('----------------------------------')\\n\",\n    \"start_time = time.time()\\n\",\n    \"\\n\",\n    \"for idx, img in enumerate(trimgs):\\n\",\n    \"    aset_trimgs[idx] = img\\n\",\n    \"    aset_trlabels[idx] = np.array([trlabels[idx]])\\n\",\n    \"\\n\",\n    \"print(f'Finished non-context manager form in: {time.time() - start_time} seconds')\\n\",\n    \"\\n\",\n    \"co.reset_staging_area()\\n\",\n    \"co.close()\\n\",\n    \"\\n\",\n    \"# ----------------- Context Manager Form --------------------------\\n\",\n    \"\\n\",\n    \"co = repo.checkout(write=True)\\n\",\n    \"aset_trimgs = co.add_ndarray_column(name='train_images', prototype=sample_trimg)\\n\",\n    \"aset_trlabels = co.add_ndarray_column(name='train_labels', prototype=sample_trlabel)\\n\",\n    \"\\n\",\n    \"print(f'\\\\nBeginning context manager form')\\n\",\n    \"print('--------------------------------')\\n\",\n    \"start_time = time.time()\\n\",\n    \"\\n\",\n    \"with aset_trimgs, aset_trlabels:\\n\",\n    \"    for idx, img in enumerate(trimgs):\\n\",\n    \"        aset_trimgs[idx] = img\\n\",\n    \"        aset_trlabels[idx] = np.array([trlabels[idx]])\\n\",\n    \"\\n\",\n    \"print(f'Finished context manager form in: {time.time() - start_time} seconds')\\n\",\n    \"\\n\",\n    \"co.reset_staging_area()\\n\",\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Clearly, the context manager form is far and away superior; however, we feel that\\n\",\n    \"for the purposes of interactive use, the \\\"naked\\\" form is valuable to the\\n\",\n    \"average user!\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 
Committing Changes\\n\",\n    \"\\n\",\n    \"Once you have made a set of changes you want to commit, simply call the [commit()](api.rst#hangar.checkout.WriterCheckout.commit) method (and pass in a message)!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 25,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=8eb01eaf0c657f8526dbf9a8ffab0a4606ebfd3b'\"\n      ]\n     },\n     \"execution_count\": 25,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.commit('hello world, this is my first hangar commit')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"The returned value (`'a=8eb01eaf0c657f8526dbf9a8ffab0a4606ebfd3b'`) is the commit hash of this commit. It\\n\",\n    \"may be useful to assign this to a variable and follow this up by creating a\\n\",\n    \"branch from this commit!\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Don't Forget to Close the Write-Enabled Checkout to Release the Lock!\\n\",\n    \"\\n\",\n    \"We mentioned in `Checking out the repo for writing` that when a\\n\",\n    \"`write-enabled` checkout is created, it places a lock on writers until it is\\n\",\n    \"closed. If for whatever reason the program terminates via a non-Python `SIGKILL` or fatal\\n\",\n    \"interpreter error without closing the\\n\",\n    \"write-enabled checkout, this lock will persist (forever technically, but\\n\",\n    \"realistically until it is manually freed).\\n\",\n    \"\\n\",\n    \"Luckily, preventing this issue from occurring is as simple as calling\\n\",\n    \"[close()](api.rst#hangar.checkout.WriterCheckout.close)!\\n\",\n    \"\\n\",\n    \"If you forget, normal interpreter shutdown should trigger an `atexit` hook automatically;\\n\",\n    \"however, this behavior should not be relied upon. It is better to just call\\n\",\n    \"[close()](api.rst#hangar.checkout.WriterCheckout.close).\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 26,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co.close()\"\n   ]\n  },\n  
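{\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"A defensive pattern for scripts is to tie the `close()` call to the block of code\\n\",\n    \"which uses the checkout. The following is a minimal sketch using only plain python\\n\",\n    \"`try`/`finally` (nothing here is Hangar-specific, and the repository path is a placeholder):\\n\",\n    \"\\n\",\n    \"```python\\n\",\n    \"from hangar import Repository\\n\",\n    \"\\n\",\n    \"repo = Repository(path='/path/to/repo')  # placeholder path\\n\",\n    \"co = repo.checkout(write=True)\\n\",\n    \"try:\\n\",\n    \"    # ... add / update / delete samples, then commit ...\\n\",\n    \"    co.commit('describe the changes here')\\n\",\n    \"finally:\\n\",\n    \"    # runs even if an exception was raised above,\\n\",\n    \"    # so the writer lock is always released\\n\",\n    \"    co.close()\\n\",\n    \"```\"\n   ]\n  },\n  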
{\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### But if you did forget, and you receive a `PermissionError` next time you open a checkout\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"PermissionError: Cannot acquire the writer lock. Only one instance of\\n\",\n    \"a writer checkout can be active at a time. If the last checkout of this\\n\",\n    \"repository did not properly close, or a crash occured, the lock must be\\n\",\n    \"manually freed before another writer can be instantiated.\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"You can manually free the lock with the following method. However!\\n\",\n    \"\\n\",\n    \"This is a dangerous operation, and it's one of the only ways where a user can put\\n\",\n    \"data in their repository at risk! If another python process is still holding the\\n\",\n    \"lock, do NOT force the release. Kill the process (that's totally fine to do at\\n\",\n    \"any time), then force the lock release.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 27,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"True\"\n      ]\n     },\n     \"execution_count\": 27,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"repo.force_release_writer_lock()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Reading Data\\n\",\n    \"\\n\",\n    \"Two different styles of access are considered below. In general, the context manager form\\n\",\n    \"is recommended (though at best only marginal performance improvements are to be expected).\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 29,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"\\n\",\n      \" Neither BRANCH or COMMIT specified.\\n\",\n      \" * Checking out writing HEAD BRANCH: master\\n\",\n      \"\\n\",\n      \"Beginning Key Iteration\\n\",\n      \"-----------------------\\n\",\n      \"completed in 5.838773965835571 sec\\n\",\n      \"\\n\",\n      \"Beginning Items Iteration with Context Manager\\n\",\n      \"---------------------------------------------\\n\",\n      \"completed in 5.516948938369751 sec\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"co = repo.checkout()\\n\",\n    \"\\n\",\n    \"trlabel_col = co['train_labels']\\n\",\n    \"trimg_col = co['train_images']\\n\",\n    \"\\n\",\n    \"print(f'\\\\nBeginning Key Iteration')\\n\",\n    \"print('-----------------------')\\n\",\n    \"start = time.time()\\n\",\n    \"\\n\",\n    \"for idx in trimg_col.keys():\\n\",\n    \"    image_data = trimg_col[idx]\\n\",\n    \"    label_data = trlabel_col[idx]\\n\",\n    \"\\n\",\n    \"print(f'completed in {time.time() - start} sec')\\n\",\n    \"\\n\",\n    \"print(f'\\\\nBeginning Items Iteration with Context Manager')\\n\",\n    \"print('---------------------------------------------')\\n\",\n    \"start = time.time()\\n\",\n    \"\\n\",\n    \"with trlabel_col, trimg_col:\\n\",\n    \"    for index, image_data in trimg_col.items():\\n\",\n    \"        label_data = trlabel_col[index]\\n\",\n    \"\\n\",\n    \"print(f'completed in {time.time() - start} sec')\\n\",\n    \"\\n\",\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Inspecting state from the top!\\n\",\n    \"\\n\",\n    \"After your first commit, the summary and log methods will begin to work, and you can either print the stream to the console (as shown below), or you can\\n\",\n    \"dig deep into the internals of how hangar thinks about your data! 
(To be covered in an advanced tutorial later on).\\n\",\n    \"\\n\",\n    \"The point is, regardless of your level of interaction with a live hangar repository, all levels of state are accessible from the top, which in general has been built to be the only way to directly access them!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 30,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Summary of Contents Contained in Data Repository \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| Repository Info \\n\",\n      \"|----------------- \\n\",\n      \"|  Base Directory: /Users/rick/projects/tensorwerk/hangar/dev/mnist \\n\",\n      \"|  Disk Usage: 57.29 MB \\n\",\n      \" \\n\",\n      \"=================== \\n\",\n      \"| Commit Details \\n\",\n      \"------------------- \\n\",\n      \"|  Commit: a=8eb01eaf0c657f8526dbf9a8ffab0a4606ebfd3b \\n\",\n      \"|  Created: Tue Feb 25 19:03:06 2020 \\n\",\n      \"|  By: Rick Izzo \\n\",\n      \"|  Email: rick@tensorwerk.com \\n\",\n      \"|  Message: hello world, this is my first hangar commit \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| DataSets \\n\",\n      \"|----------------- \\n\",\n      \"|  Number of Named Columns: 2 \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"train_images\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 50000 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: ndarray \\n\",\n      \"|    - schema_type: fixed_shape \\n\",\n      \"|    - shape: (784,) \\n\",\n      \"|    - dtype: uint8 \\n\",\n      \"|    - backend: 00 \\n\",\n      \"|    - backend_options: {'complib': 'blosc:lz4hc', 'complevel': 5, 'shuffle': 'byte'} \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"train_labels\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 50000 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: ndarray \\n\",\n      \"|    - schema_type: fixed_shape \\n\",\n      \"|    - shape: (1,) \\n\",\n      \"|    - dtype: int64 \\n\",\n      \"|    - backend: 10 \\n\",\n      \"|    - backend_options: {} \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| Metadata: \\n\",\n      \"|----------------- \\n\",\n      \"|  Number of Keys: 0 \\n\",\n      \"\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.summary()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 31,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=8eb01eaf0c657f8526dbf9a8ffab0a4606ebfd3b (\\u001B[1;31mmaster\\u001B[m) : hello world, this is my first hangar commit\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.7.3\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}"
  },
  {
    "path": "docs/Tutorial-002.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 2: Checkouts, Branching, & Merging\\n\",\n    \"\\n\",\n    \"This section deals with navigating repository history, creating & merging\\n\",\n    \"branches, and understanding conflicts.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### The Hangar Workflow\\n\",\n    \"\\n\",\n    \"The hangar workflow is intended to mimic common ``git`` workflows in which small\\n\",\n    \"incremental changes are made and committed on dedicated ``topic`` branches.\\n\",\n    \"After the ``topic`` has been adequatly set, ``topic`` branch is merged into\\n\",\n    \"a separate branch (commonly referred to as ``master``, though it need not to be the\\n\",\n    \"actual branch named ``\\\"master\\\"``), where well vetted and more permanent changes\\n\",\n    \"are kept.\\n\",\n    \"\\n\",\n    \"    Create Branch -> Checkout Branch -> Make Changes -> Commit\\n\",\n    \"\\n\",\n    \"#### Making the Initial Commit\\n\",\n    \"\\n\",\n    \"Let's initialize a new repository and see how branching works in Hangar:\\n\",\n    \"\\n\",\n    \"<!-- However, unlike GIT, remember that it is not possible to make changes in a DETACHED HEAD state. Hangar enforces the requirement that all work is performed at the tip of a branch. -->\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from hangar import Repository\\n\",\n    \"import numpy as np\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"repo = Repository(path='/Users/rick/projects/tensorwerk/hangar/dev/mnist/')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Hangar Repo initialized at: /Users/rick/projects/tensorwerk/hangar/dev/mnist/.hangar\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo_pth = repo.init(user_name='Test User', user_email='test@foo.com', remove_old=True)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"When a repository is first initialized, it has no history, no commits.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 4,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"repo.log() # -> returns None\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Though the repository is essentially empty at this point in time, there is one\\n\",\n    \"thing which is present: a branch with the name: ``\\\"master\\\"``.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 5,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['master']\"\n      ]\n     },\n     \"execution_count\": 5,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"repo.list_branches()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"This ``\\\"master\\\"`` is the branch we make our first commit on; until we do, the\\n\",\n    \"repository is in a semi-unstable state; with no history or contents, most of the\\n\",\n    \"functionality of a 
repository (to store, retrieve, and work with versions of\\n\",\n    \"data across time) just isn't possible. A significant portion of otherwise\\n\",\n    \"standard operations will generally flat out refuse to execute (e.g. read-only\\n\",\n    \"checkouts, log, push, etc.) until the first commit is made.\\n\",\n    \"\\n\",\n    \"One of the only options available at this point is to create a\\n\",\n    \"write-enabled checkout on the ``\\\"master\\\"`` branch and to begin to add data so we\\n\",\n    \"can make a commit. Let’s do that now:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co = repo.checkout(write=True)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"As expected, there are no columns or metadata samples recorded in the checkout.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"number of metadata keys: 0\\n\",\n      \"number of columns: 0\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"print(f'number of metadata keys: {len(co.metadata)}')\\n\",\n    \"print(f'number of columns: {len(co.columns)}')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Let’s add a dummy array just to put something in the repository history to\\n\",\n    \"commit. We'll then close the checkout so we can explore some useful tools which\\n\",\n    \"depend on having at least one historical record (commit) in the repo.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"dummy = np.arange(10, dtype=np.uint16)\\n\",\n    \"col = co.add_ndarray_column('dummy_column', prototype=dummy)\\n\",\n    \"col['0'] = dummy\\n\",\n    \"initialCommitHash = co.commit('first commit with a single sample added to a dummy column')\\n\",\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"If we check the history now, we can see our first commit hash, and that it is labeled with the branch name `\\\"master\\\"`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 9,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e (\\u001B[1;31mmaster\\u001B[m) : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"So now our repository contains:\\n\",\n    \"- [A commit](api.rst#hangar.checkout.WriterCheckout.commit_hash): a fully\\n\",\n    \"  independent description of the entire repository state as\\n\",\n    \"  it existed at some point in time. A commit is identified by a `commit_hash`.\\n\",\n    \"- [A branch](api.rst#hangar.checkout.WriterCheckout.branch_name): a label\\n\",\n    \"  pointing to a particular `commit` / `commit_hash`.\\n\",\n    \"\\n\",\n    \"Once committed, it is not possible to remove, modify, or otherwise tamper with\\n\",\n    \"the contents of a commit in any way. 
It is a permanent record, which Hangar has\\n\",\n    \"no method to change once written to disk.\\n\",\n    \"\\n\",\n    \"In addition, as a `commit_hash` is not only calculated from the `commit`’s\\n\",\n    \"contents, but from the `commit_hash` of its parents (more on this to follow),\\n\",\n    \"knowing a single top-level `commit_hash` allows us to verify the integrity of\\n\",\n    \"the entire repository history. This fundamental behavior holds even in cases of\\n\",\n    \"disk-corruption or malicious use.\\n\",\n    \"\\n\",\n    
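\"To build intuition for why this works, here is a conceptual sketch of such a hash\\n\",\n    \"chain in plain python. This is illustrative only - it is **not** Hangar's actual\\n\",\n    \"hashing scheme:\\n\",\n    \"\\n\",\n    \"```python\\n\",\n    \"import hashlib\\n\",\n    \"\\n\",\n    \"def commit_digest(parent_digest: str, content: bytes) -> str:\\n\",\n    \"    # each digest covers the parent digest as well as the contents, so\\n\",\n    \"    # changing any historical commit changes every descendant digest\\n\",\n    \"    return hashlib.sha1(parent_digest.encode() + content).hexdigest()\\n\",\n    \"\\n\",\n    \"c1 = commit_digest('', b'first commit contents')\\n\",\n    \"c2 = commit_digest(c1, b'second commit contents')\\n\",\n    \"# checking c2 implicitly verifies c1 and the contents of both commits\\n\",\n    \"```\\n\",\n    \"\\n\",\n    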
\"### Working with Checkouts & Branches\\n\",\n    \"\\n\",\n    \"As mentioned in the first tutorial, we work with the data in a repository through\\n\",\n    \"a [checkout](api.rst#hangar.repository.Repository.checkout). There are two types\\n\",\n    \"of checkouts (each of which has different uses and abilities):\\n\",\n    \"\\n\",\n    \"**[Checking out a branch / commit for reading:](api.rst#read-only-checkout)** is\\n\",\n    \"the process of retrieving records describing repository state at some point in\\n\",\n    \"time, and setting up access to the referenced data.\\n\",\n    \"\\n\",\n    \"-  Any number of read checkout processes can operate on a repository (on\\n\",\n    \"   any number of commits) at the same time.\\n\",\n    \"\\n\",\n    \"**[Checking out a branch for writing:](api.rst#write-enabled-checkout)** is the\\n\",\n    \"process of setting up a (mutable) ``staging area`` to temporarily gather\\n\",\n    \"record references / data before all changes have been made and staging area\\n\",\n    \"contents are committed in a new permanent record of history (a `commit`).\\n\",\n    \"\\n\",\n    \"-  Only one write-enabled checkout can ever be operating in a repository\\n\",\n    \"   at a time.\\n\",\n    \"-  When initially creating the checkout, the `staging area` is not\\n\",\n    \"   actually “empty”. Instead, it has the full contents of the last `commit`\\n\",\n    \"   referenced by a branch’s `HEAD`. These records can be removed / mutated / added\\n\",\n    \"   to in any way to form the next `commit`. The new `commit` retains a\\n\",\n    \"   permanent reference identifying that the previous ``HEAD`` ``commit`` was used as\\n\",\n    \"   its base `staging area`.\\n\",\n    \"-  On commit, the branch which was checked out has its ``HEAD`` pointer\\n\",\n    \"   value updated to the new `commit`’s `commit_hash`. A write-enabled\\n\",\n    \"   checkout starting from the same branch will now use that `commit`’s\\n\",\n    \"   record content as the base for its `staging area`.\\n\",\n    \"\\n\",\n    \"#### Creating a branch\\n\",\n    \"\\n\",\n    \"A branch is an individual series of changes / commits which diverge from the main\\n\",\n    \"history of the repository at some point in time. All changes made along a branch\\n\",\n    \"are completely isolated from those on other branches. After some point in time,\\n\",\n    \"changes made in disparate branches can be unified through an automatic\\n\",\n    \"`merge` process (described in detail later in this tutorial). In general, the\\n\",\n    \"`Hangar` branching model is semantically identical to the `Git` one; the one exception\\n\",\n    \"is that in Hangar, a branch must always have a `name` and a `base_commit`. (No\\n\",\n    \"\\\"Detached HEAD state\\\" is possible for a `write-enabled` checkout). If no `base_commit` is\\n\",\n    \"specified, the current writer branch `HEAD` `commit` is used as the `base_commit`\\n\",\n    \"hash for the branch automatically.\\n\",\n    \"\\n\",\n    \"Hangar branches have the same lightweight and performant properties which\\n\",\n    \"make working with `Git` branches so appealing - they are cheap and easy to use,\\n\",\n    \"create, and discard (if necessary).\\n\",\n    \"\\n\",\n    \"To create a branch, use the [create_branch()](api.rst#hangar.repository.Repository.create_branch)\\n\",\n    \"method.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 10,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"branch_1 = repo.create_branch(name='testbranch')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 11,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"BranchHead(name='testbranch', digest='a=eaee002ed9c6e949c3657bd50e3949d6a459d50e')\"\n      ]\n     },\n     \"execution_count\": 11,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"branch_1\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We use the [list_branches()](api.rst#hangar.repository.Repository.list_branches) and [log()](api.rst#hangar.repository.Repository.log) methods to see that a new branch named `testbranch` has been created and is indeed pointing to our initial commit.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 12,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"branch names: ['master', 'testbranch'] \\n\",\n      \"\\n\",\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e (\\u001B[1;31mmaster\\u001B[m) (\\u001B[1;31mtestbranch\\u001B[m) : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"print(f'branch names: {repo.list_branches()} \\\\n')\\n\",\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"If instead, we actually specify the base commit (with a different branch\\n\",\n    \"name), we see we do actually get a third branch, 
pointing to the same commit as\\n\",\n    \"`master` and `testbranch`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 13,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"branch_2 = repo.create_branch(name='new', base_commit=initialCommitHash)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 14,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"BranchHead(name='new', digest='a=eaee002ed9c6e949c3657bd50e3949d6a459d50e')\"\n      ]\n     },\n     \"execution_count\": 14,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"branch_2\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 15,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e (\\u001B[1;31mmaster\\u001B[m) (\\u001B[1;31mnew\\u001B[m) (\\u001B[1;31mtestbranch\\u001B[m) : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Making changes on a branch\\n\",\n    \"\\n\",\n    \"Let’s make some changes on the `new` branch to see how things work.\\n\",\n    \"\\n\",\n    \"We can see that the data we added previously is still here (the `dummy_column` column containing\\n\",\n    \"one sample labeled `0`).\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 16,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co = repo.checkout(write=True, branch='new')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 17,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : True                \\n\",\n       \"    Number of Columns : 1                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - dummy_column / False\"\n      ]\n     },\n     \"execution_count\": 17,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 18,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : dummy_column                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint16                \\n\",\n       \"    Shape                    : (10,)                \\n\",\n       \"    Number of Samples        : 1                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 18,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns['dummy_column']\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 19,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      
\"text/plain\": [\n       \"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint16)\"\n      ]\n     },\n     \"execution_count\": 19,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns['dummy_column']['0']\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Let's add another sample to the `dummy_arrayset` called `1`\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 20,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"arr = np.arange(10, dtype=np.uint16)\\n\",\n    \"# let's increment values so that `0` and `1` aren't set to the same thing\\n\",\n    \"arr += 1\\n\",\n    \"\\n\",\n    \"co['dummy_column', '1'] = arr\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We can see that in this checkout, there are indeed two samples in the `dummy_arrayset`:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 21,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"2\"\n      ]\n     },\n     \"execution_count\": 21,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"len(co.columns['dummy_column'])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"That's all, let's commit this and be done with this branch.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 22,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co.commit('commit on `new` branch adding a sample to dummy_arrayset')\\n\",\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### How do changes appear when made on a branch?\\n\",\n    \"\\n\",\n    \"If we look at the log, we see that the branch we were on (`new`) is a commit ahead of `master` and `testbranch`\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 23,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=c1cf1bd6863ed0b95239d2c9e1a6c6cc65569e94 (\\u001B[1;31mnew\\u001B[m) : commit on `new` branch adding a sample to dummy_arrayset\\n\",\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e (\\u001B[1;31mmaster\\u001B[m) (\\u001B[1;31mtestbranch\\u001B[m) : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"The meaning is exactly what one would intuit. We made some changes, they were\\n\",\n    \"reflected on the `new` branch, but the `master` and `testbranch` branches\\n\",\n    \"were not impacted at all, nor were any of the commits!\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Merging (Part 1) Fast-Forward Merges\\n\",\n    \"\\n\",\n    \"Say we like the changes we made on the ``new`` branch so much that we want them\\n\",\n    \"to be included into our ``master`` branch! How do we make this happen for this\\n\",\n    \"scenario??\\n\",\n    \"\\n\",\n    \"Well, the history between the ``HEAD`` of the ``new`` and the ``HEAD`` of the\\n\",\n    \"``master`` branch is perfectly linear. 
In fact, when we began making changes\\n\",\n    \"on ``new``, our staging area was *identical* to what the ``master`` ``HEAD``\\n\",\n    \"commit references right now!\\n\",\n    \"\\n\",\n    \"If you’ll remember that a branch is just a pointer which assigns some ``name``\\n\",\n    \"to a ``commit_hash``, it becomes apparent that a merge in this case really\\n\",\n    \"doesn’t involve any work at all. With a linear history between ``master`` and\\n\",\n    \"``new``, any ``commits`` existing along the path between the ``HEAD`` of\\n\",\n    \"``new`` and ``master`` are the only changes which are introduced, and we can\\n\",\n    \"be sure that this is the only view of the data records which can exist!\\n\",\n    \"\\n\",\n    \"What this means in practice is that for this type of merge, we can just update\\n\",\n    \"the ``HEAD`` of ``master`` to point to the ``HEAD`` of ``\\\"new\\\"``, and the\\n\",\n    \"merge is complete.\\n\",\n    \"\\n\",\n    \"This situation is referred to as a **Fast Forward (FF) Merge**. An FF merge is\\n\",\n    \"safe to perform any time a linear history lies between the ``HEAD`` of some\\n\",\n    \"``topic`` and ``base`` branch, regardless of how many commits or changes\\n\",\n    \"were introduced.\\n\",\n    \"\\n\",\n    \"For other situations, a more complicated **Three Way Merge** is required. This\\n\",\n    \"merge method will be explained a bit more later in this tutorial.\"\n   ]\n  },\n  
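{\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Conceptually, an FF merge is nothing more than a pointer update. As a rough sketch\\n\",\n    \"in plain python (illustrative only - not Hangar's internals; digests shortened for\\n\",\n    \"readability):\\n\",\n    \"\\n\",\n    \"```python\\n\",\n    \"# branch name -> HEAD commit_hash\\n\",\n    \"branches = {'master': 'a=eaee002', 'new': 'a=c1cf1bd'}\\n\",\n    \"\\n\",\n    \"# with a linear history, merging `new` into `master` just moves the pointer\\n\",\n    \"branches['master'] = branches['new']\\n\",\n    \"```\"\n   ]\n  },\n  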
{\n   \"cell_type\": \"code\",\n   \"execution_count\": 24,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co = repo.checkout(write=True, branch='master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Performing the Merge\\n\",\n    \"\\n\",\n    \"In practice, you’ll never need to know the details of the merge theory explained\\n\",\n    \"above (or even remember it exists). Hangar automatically figures out which merge\\n\",\n    \"algorithm should be used and then performs whatever calculations are needed to\\n\",\n    \"compute the results.\\n\",\n    \"\\n\",\n    \"As a user, merging in Hangar is a one-liner! Just use the [merge()](api.rst#hangar.checkout.WriterCheckout.merge)\\n\",\n    \"method from a `write-enabled` checkout (shown below), or the analogous method\\n\",\n    \"on the Repository object, [repo.merge()](api.rst#hangar.repository.Repository.merge)\\n\",\n    \"(if not already working with a `write-enabled` checkout object).\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 25,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Selected Fast-Forward Merge Strategy\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=c1cf1bd6863ed0b95239d2c9e1a6c6cc65569e94'\"\n      ]\n     },\n     \"execution_count\": 25,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.merge(message='message for commit (not used for FF merge)', dev_branch='new')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Let's check the log!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 26,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=c1cf1bd6863ed0b95239d2c9e1a6c6cc65569e94 (\\u001B[1;31mmaster\\u001B[m) (\\u001B[1;31mnew\\u001B[m) : commit on `new` branch adding a sample to dummy_arrayset\\n\",\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e (\\u001B[1;31mtestbranch\\u001B[m) : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 27,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'master'\"\n      ]\n     },\n     \"execution_count\": 27,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.branch_name\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 28,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=c1cf1bd6863ed0b95239d2c9e1a6c6cc65569e94'\"\n      ]\n     },\n     \"execution_count\": 28,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.commit_hash\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 29,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : dummy_column                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint16                \\n\",\n       \"    Shape                    : (10,)                \\n\",\n       \"    Number of Samples        : 2                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 29,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns['dummy_column']\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"As you 
can see, everything is as it should be!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 30,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Making changes to introduce diverged histories\\n\",\n    \"\\n\",\n    \"Let’s now go back to our `testbranch` branch and make some changes there so\\n\",\n    \"we can see what happens when changes don’t follow a linear history.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 31,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co = repo.checkout(write=True, branch='testbranch')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 32,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : True                \\n\",\n       \"    Number of Columns : 1                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - dummy_column / False\"\n      ]\n     },\n     \"execution_count\": 32,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 33,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : dummy_column                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint16                \\n\",\n       \"    Shape                    : (10,)                \\n\",\n       \"    Number of Samples        : 1                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 33,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns['dummy_column']\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We will start by mutating sample `0` in `dummy_column` to a different value.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 34,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"array([50, 51, 52, 53, 54, 55, 56, 57, 58, 59], dtype=uint16)\"\n      ]\n     },\n     \"execution_count\": 34,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"old_arr = co['dummy_column', '0']\\n\",\n    \"new_arr = old_arr + 50\\n\",\n    \"new_arr\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 35,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co['dummy_column', '0'] = new_arr\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Let’s make a commit here, then add some metadata and make a new commit (all on\\n\",\n    \"the `testbranch` branch).\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 36,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n   
  \"data\": {\n      \"text/plain\": [\n       \"'a=fcd82f86e39b19c3e5351dda063884b5d2fda67b'\"\n      ]\n     },\n     \"execution_count\": 36,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.commit('mutated sample `0` of `dummy_column` to new value')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 37,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=fcd82f86e39b19c3e5351dda063884b5d2fda67b (\\u001B[1;31mtestbranch\\u001B[m) : mutated sample `0` of `dummy_column` to new value\\n\",\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 38,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co.metadata['hello'] = 'world'\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 39,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=69a08ca41ca1f5577fb0ffcf59d4d1585f614c4d'\"\n      ]\n     },\n     \"execution_count\": 39,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.commit('added hellow world metadata')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 40,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Looking at our history how, we see that none of the original branches reference\\n\",\n    \"our first commit anymore.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 41,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=69a08ca41ca1f5577fb0ffcf59d4d1585f614c4d (\\u001B[1;31mtestbranch\\u001B[m) : added hellow world metadata\\n\",\n      \"* a=fcd82f86e39b19c3e5351dda063884b5d2fda67b : mutated sample `0` of `dummy_column` to new value\\n\",\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We can check the history of the `master` branch by specifying it as an argument to the `log()` method.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 42,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=c1cf1bd6863ed0b95239d2c9e1a6c6cc65569e94 (\\u001B[1;31mmaster\\u001B[m) (\\u001B[1;31mnew\\u001B[m) : commit on `new` branch adding a sample to dummy_arrayset\\n\",\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log('master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Merging (Part 2) Three Way Merge\\n\",\n    \"\\n\",\n    \"If we now want to merge the changes on `testbranch` into `master`, we can't just follow a simple linear history; **the branches have diverged**.\\n\",\n    \"\\n\",\n    \"For this case, 
Hangar implements a **Three-Way Merge** algorithm which does the following:\\n\",\n    \"- Find the most recent common ancestor `commit` present in both the `testbranch` and `master` branches\\n\",\n    \"- Compute what changed between the common ancestor and each branch's `HEAD` commit\\n\",\n    \"- Check if any of the changes conflict with each other (more on this in a later tutorial)\\n\",\n    \"- If no conflicts are present, compute the results of the merge between the two sets of changes\\n\",\n    \"- Create a new `commit` containing the merge results, referencing both branch `HEAD`s as parents of the new `commit`, and update the `base` branch `HEAD` to that new `commit`'s `commit_hash`\\n\",\n    \"\\n\",\n    \"(A minimal conceptual sketch of this decision rule follows the merge example below.)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 43,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co = repo.checkout(write=True, branch='master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Once again, the details are handled for you as a user, and the operation\\n\",\n    \"occurs through the same one-liner call we used before for the FF Merge.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 44,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Selected 3-Way Merge Strategy\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=002041fe8d8846b06f33842964904b627de55214'\"\n      ]\n     },\n     \"execution_count\": 44,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.merge(message='merge of testbranch into master', dev_branch='testbranch')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"If we now look at the log, we see that this has a much different look than\\n\",\n    \"before. The three way merge results in a history which references changes made\\n\",\n    \"in both diverged branches, and unifies them in a single ``commit``.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 45,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"*   a=002041fe8d8846b06f33842964904b627de55214 (\\u001B[1;31mmaster\\u001B[m) : merge of testbranch into master\\n\",\n      \"\\u001B[1;31m|\\u001B[m\\u001B[1;32m\\\\\\u001B[m  \\n\",\n      \"\\u001B[1;31m|\\u001B[m * a=69a08ca41ca1f5577fb0ffcf59d4d1585f614c4d (\\u001B[1;31mtestbranch\\u001B[m) : added hello world metadata\\n\",\n      \"\\u001B[1;31m|\\u001B[m * a=fcd82f86e39b19c3e5351dda063884b5d2fda67b : mutated sample `0` of `dummy_column` to new value\\n\",\n      \"* \\u001B[1;32m|\\u001B[m a=c1cf1bd6863ed0b95239d2c9e1a6c6cc65569e94 (\\u001B[1;31mnew\\u001B[m) : commit on `new` branch adding a sample to dummy_arrayset\\n\",\n      \"\\u001B[1;32m|\\u001B[m\\u001B[1;32m/\\u001B[m  \\n\",\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Manually inspecting the merge result to verify it matches our expectations\\n\",\n    \"\\n\",\n    \"`dummy_column` should contain two arrays: key `1` was set in the previous\\n\",\n    \"commit originally made in `new` and merged into `master`. 
Key `0` was\\n\",\n    \"mutated in `testbranch` and unchanged in `master`, so the update from\\n\",\n    \"`testbranch` is kept.\\n\",\n    \"\\n\",\n    \"There should be one metadata sample with the key `hello` and the value\\n\",\n    \"``\\\"world\\\"``.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 46,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : True                \\n\",\n       \"    Number of Columns : 1                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - dummy_column / False\"\n      ]\n     },\n     \"execution_count\": 46,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 47,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : dummy_column                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint16                \\n\",\n       \"    Shape                    : (10,)                \\n\",\n       \"    Number of Samples        : 2                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 47,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns['dummy_column']\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 49,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[array([50, 51, 52, 53, 54, 55, 56, 57, 58, 59], dtype=uint16),\\n\",\n       \" array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=uint16)]\"\n      ]\n     },\n     \"execution_count\": 49,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co['dummy_column', ['0', '1']]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 50,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Metadata                \\n\",\n       \"    Writeable: True                \\n\",\n       \"    Number of Keys: 1\\n\"\n      ]\n     },\n     \"execution_count\": 50,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.metadata\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 51,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'world'\"\n      ]\n     },\n     \"execution_count\": 51,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.metadata['hello']\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**The Merge was a success!**\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 52,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co.close()\"\n   ]\n  },\n  {\n   
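\"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Before moving on, here is the promised minimal sketch of the three-way merge decision rule, written over plain Python dictionaries. The function and the sample values are made up for illustration; this is *not* Hangar's actual merge implementation.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def three_way_merge(ancestor, master, dev):\\n\",\n    \"    # Decide each key using the common ancestor as the reference point.\\n\",\n    \"    merged, conflicts = dict(ancestor), []\\n\",\n    \"    for k in set(ancestor) | set(master) | set(dev):\\n\",\n    \"        a, m, d = ancestor.get(k), master.get(k), dev.get(k)\\n\",\n    \"        if m == d:                 # both sides agree (or neither changed it)\\n\",\n    \"            merged[k] = m\\n\",\n    \"        elif m == a:               # only dev changed the key -> keep dev\\n\",\n    \"            merged[k] = d\\n\",\n    \"        elif d == a:               # only master changed the key -> keep master\\n\",\n    \"            merged[k] = m\\n\",\n    \"        else:                      # changed differently on both sides\\n\",\n    \"            conflicts.append(k)\\n\",\n    \"        if merged.get(k) is None:  # treat None as a deletion\\n\",\n    \"            merged.pop(k, None)\\n\",\n    \"    return merged, conflicts\\n\",\n    \"\\n\",\n    \"base = {'0': 'v1'}\\n\",\n    \"master = {'0': 'v1', '1': 'v2'}              # added key `1` (like branch `new`)\\n\",\n    \"dev = {'0': 'v1-mutated', 'hello': 'world'}  # mutated `0`, added metadata\\n\",\n    \"merged, conflicts = three_way_merge(base, master, dev)\\n\",\n    \"print(sorted(merged.items()), conflicts)\\n\",\n    \"# [('0', 'v1-mutated'), ('1', 'v2'), ('hello', 'world')] []\"\n   ]\n  },\n  {\n   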
\"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Conflicts\\n\",\n    \"\\n\",\n    \"Now that we've seen merging in action, the next step is to talk about conflicts.\\n\",\n    \"\\n\",\n    \"#### How Are Conflicts Detected?\\n\",\n    \"\\n\",\n    \"Any merge conflicts can be identified and addressed ahead of running a `merge`\\n\",\n    \"command by using the built in [diff](api.rst#hangar.diff.WriterUserDiff) tools.\\n\",\n    \"When diffing commits, Hangar will provide a list of conflicts which it identifies.\\n\",\n    \"In general these fall into 4 categories:\\n\",\n    \"\\n\",\n    \"1. **Additions** in both branches which created new keys (samples /\\n\",\n    \"   columns / metadata) with non-compatible values. For samples &\\n\",\n    \"   metadata, the hash of the data is compared, for columns, the schema\\n\",\n    \"   specification is checked for compatibility in a method custom to the\\n\",\n    \"   internal workings of Hangar.\\n\",\n    \"2. **Removal** in `Master Commit/Branch` **& Mutation** in `Dev Commit / Branch`. Applies for samples, columns, and metadata identically.\\n\",\n    \"3. **Mutation** in `Dev Commit/Branch` **& Removal** in `Master Commit / Branch`. Applies for samples, columns, and metadata identically.\\n\",\n    \"4. **Mutations** on keys of both branches to non-compatible values. For\\n\",\n    \"   samples & metadata, the hash of the data is compared; for columns, the\\n\",\n    \"   schema specification is checked for compatibility in a method custom to the\\n\",\n    \"   internal workings of Hangar.\\n\",\n    \"\\n\",\n    \"#### Let's make a merge conflict\\n\",\n    \"\\n\",\n    \"To force a conflict, we are going to checkout the `new` branch and set the\\n\",\n    \"metadata key `hello` to the value `foo conflict... BOO!`. Then if we try\\n\",\n    \"to merge this into the `testbranch` branch (which set `hello` to a value\\n\",\n    \"of `world`) we see how hangar will identify the conflict and halt without\\n\",\n    \"making any changes.\\n\",\n    \"\\n\",\n    \"Automated conflict resolution will be introduced in a future version of Hangar,\\n\",\n    \"for now it is up to the user to manually resolve conflicts by making any\\n\",\n    \"necessary changes in each branch before reattempting a merge operation.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 53,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co = repo.checkout(write=True, branch='new')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 54,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co.metadata['hello'] = 'foo conflict... 
BOO!'\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 55,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=95896880b33fc06a3c2359a03408f07c87bcc8c0'\"\n      ]\n     },\n     \"execution_count\": 55,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.commit ('commit on new branch to hello metadata key so we can demonstrate a conflict')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 56,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=95896880b33fc06a3c2359a03408f07c87bcc8c0 (\\u001B[1;31mnew\\u001B[m) : commit on new branch to hello metadata key so we can demonstrate a conflict\\n\",\n      \"* a=c1cf1bd6863ed0b95239d2c9e1a6c6cc65569e94 : commit on `new` branch adding a sample to dummy_arrayset\\n\",\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**When we attempt the merge, an exception is thrown telling us there is a conflict!**\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 57,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Selected 3-Way Merge Strategy\\n\"\n     ]\n    },\n    {\n     \"ename\": \"ValueError\",\n     \"evalue\": \"HANGAR VALUE ERROR:: Merge ABORTED with conflict: Conflicts(t1=[(b'l:hello', b'2=d8fa6800caf496e637d965faac1a033e4636c2e6')], t21=[], t22=[], t3=[], conflict=True)\",\n     \"output_type\": \"error\",\n     \"traceback\": [\n      \"\\u001B[0;31m---------------------------------------------------------------------------\\u001B[0m\",\n      \"\\u001B[0;31mValueError\\u001B[0m                                Traceback (most recent call last)\",\n      \"\\u001B[0;32m<ipython-input-57-1a98dce1852b>\\u001B[0m in \\u001B[0;36m<module>\\u001B[0;34m\\u001B[0m\\n\\u001B[0;32m----> 1\\u001B[0;31m \\u001B[0mco\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0mmerge\\u001B[0m\\u001B[0;34m(\\u001B[0m\\u001B[0mmessage\\u001B[0m\\u001B[0;34m=\\u001B[0m\\u001B[0;34m'this merge should not happen'\\u001B[0m\\u001B[0;34m,\\u001B[0m \\u001B[0mdev_branch\\u001B[0m\\u001B[0;34m=\\u001B[0m\\u001B[0;34m'testbranch'\\u001B[0m\\u001B[0;34m)\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0m\",\n      \"\\u001B[0;32m~/projects/tensorwerk/hangar/hangar-py/src/hangar/checkout.py\\u001B[0m in \\u001B[0;36mmerge\\u001B[0;34m(self, message, dev_branch)\\u001B[0m\\n\\u001B[1;32m   1027\\u001B[0m             \\u001B[0mdev_branch\\u001B[0m\\u001B[0;34m=\\u001B[0m\\u001B[0mdev_branch\\u001B[0m\\u001B[0;34m,\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m   1028\\u001B[0m             \\u001B[0mrepo_path\\u001B[0m\\u001B[0;34m=\\u001B[0m\\u001B[0mself\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0m_repo_path\\u001B[0m\\u001B[0;34m,\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0;32m-> 1029\\u001B[0;31m             writer_uuid=self._writer_lock)\\n\\u001B[0m\\u001B[1;32m   1030\\u001B[0m \\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m   1031\\u001B[0m         \\u001B[0;32mfor\\u001B[0m \\u001B[0masetHandle\\u001B[0m 
\\u001B[0;32min\\u001B[0m \\u001B[0mself\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0m_columns\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0mvalues\\u001B[0m\\u001B[0;34m(\\u001B[0m\\u001B[0;34m)\\u001B[0m\\u001B[0;34m:\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\",\n      \"\\u001B[0;32m~/projects/tensorwerk/hangar/hangar-py/src/hangar/merger.py\\u001B[0m in \\u001B[0;36mselect_merge_algorithm\\u001B[0;34m(message, branchenv, stageenv, refenv, stagehashenv, master_branch, dev_branch, repo_path, writer_uuid)\\u001B[0m\\n\\u001B[1;32m    136\\u001B[0m \\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m    137\\u001B[0m     \\u001B[0;32mexcept\\u001B[0m \\u001B[0mValueError\\u001B[0m \\u001B[0;32mas\\u001B[0m \\u001B[0me\\u001B[0m\\u001B[0;34m:\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0;32m--> 138\\u001B[0;31m         \\u001B[0;32mraise\\u001B[0m \\u001B[0me\\u001B[0m \\u001B[0;32mfrom\\u001B[0m \\u001B[0;32mNone\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0m\\u001B[1;32m    139\\u001B[0m \\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m    140\\u001B[0m     \\u001B[0;32mfinally\\u001B[0m\\u001B[0;34m:\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\",\n      \"\\u001B[0;32m~/projects/tensorwerk/hangar/hangar-py/src/hangar/merger.py\\u001B[0m in \\u001B[0;36mselect_merge_algorithm\\u001B[0;34m(message, branchenv, stageenv, refenv, stagehashenv, master_branch, dev_branch, repo_path, writer_uuid)\\u001B[0m\\n\\u001B[1;32m    133\\u001B[0m                 \\u001B[0mrefenv\\u001B[0m\\u001B[0;34m=\\u001B[0m\\u001B[0mrefenv\\u001B[0m\\u001B[0;34m,\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m    134\\u001B[0m                 \\u001B[0mstagehashenv\\u001B[0m\\u001B[0;34m=\\u001B[0m\\u001B[0mstagehashenv\\u001B[0m\\u001B[0;34m,\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0;32m--> 135\\u001B[0;31m                 repo_path=repo_path)\\n\\u001B[0m\\u001B[1;32m    136\\u001B[0m \\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m    137\\u001B[0m     \\u001B[0;32mexcept\\u001B[0m \\u001B[0mValueError\\u001B[0m \\u001B[0;32mas\\u001B[0m \\u001B[0me\\u001B[0m\\u001B[0;34m:\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\",\n      \"\\u001B[0;32m~/projects/tensorwerk/hangar/hangar-py/src/hangar/merger.py\\u001B[0m in \\u001B[0;36m_three_way_merge\\u001B[0;34m(message, master_branch, masterHEAD, dev_branch, devHEAD, ancestorHEAD, branchenv, stageenv, refenv, stagehashenv, repo_path)\\u001B[0m\\n\\u001B[1;32m    260\\u001B[0m         \\u001B[0;32mif\\u001B[0m \\u001B[0mconflict\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0mconflict\\u001B[0m \\u001B[0;32mis\\u001B[0m \\u001B[0;32mTrue\\u001B[0m\\u001B[0;34m:\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m    261\\u001B[0m             \\u001B[0mmsg\\u001B[0m \\u001B[0;34m=\\u001B[0m \\u001B[0;34mf'HANGAR VALUE ERROR:: Merge ABORTED with conflict: {conflict}'\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0;32m--> 262\\u001B[0;31m             \\u001B[0;32mraise\\u001B[0m \\u001B[0mValueError\\u001B[0m\\u001B[0;34m(\\u001B[0m\\u001B[0mmsg\\u001B[0m\\u001B[0;34m)\\u001B[0m \\u001B[0;32mfrom\\u001B[0m \\u001B[0;32mNone\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0m\\u001B[1;32m    263\\u001B[0m \\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m    264\\u001B[0m         
\\u001B[0;32mwith\\u001B[0m \\u001B[0mmEnv\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0mbegin\\u001B[0m\\u001B[0;34m(\\u001B[0m\\u001B[0mwrite\\u001B[0m\\u001B[0;34m=\\u001B[0m\\u001B[0;32mTrue\\u001B[0m\\u001B[0;34m)\\u001B[0m \\u001B[0;32mas\\u001B[0m \\u001B[0mtxn\\u001B[0m\\u001B[0;34m:\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\",\n      \"\\u001B[0;31mValueError\\u001B[0m: HANGAR VALUE ERROR:: Merge ABORTED with conflict: Conflicts(t1=[(b'l:hello', b'2=d8fa6800caf496e637d965faac1a033e4636c2e6')], t21=[], t22=[], t3=[], conflict=True)\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"co.merge(message='this merge should not happen', dev_branch='testbranch')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Checking for Conflicts\\n\",\n    \"\\n\",\n    \"Alternatively, use the diff methods on a checkout to test for conflicts before attempting a merge.\\n\",\n    \"\\n\",\n    \"It is possible to diff between a checkout object and:\\n\",\n    \"\\n\",\n    \"1. Another branch ([diff.branch()](api.rst#hangar.diff.WriterUserDiff.branch))\\n\",\n    \"2. A specified commit ([diff.commit()](api.rst#hangar.diff.WriterUserDiff.commit))\\n\",\n    \"3. Changes made in the staging area before a commit is made\\n\",\n    \"   ([diff.staged()](api.rst#hangar.diff.WriterUserDiff.staged))\\n\",\n    \"   (for `write-enabled` checkouts only).\\n\",\n    \"\\n\",\n    \"Or via the [CLI status tool](cli.rst#hangar-status) between the staging area and any branch/commit\\n\",\n    \"(only a human-readable summary is produced).\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 58,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"merge_results, conflicts_found = co.diff.branch('testbranch')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 59,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Conflicts(t1=Changes(schema={}, samples=(), metadata=(MetadataRecordKey(key='hello'),)), t21=Changes(schema={}, samples=(), metadata=()), t22=Changes(schema={}, samples=(), metadata=()), t3=Changes(schema={}, samples=(), metadata=()), conflict=True)\"\n      ]\n     },\n     \"execution_count\": 59,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"conflicts_found\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 60,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"(MetadataRecordKey(key='hello'),)\"\n      ]\n     },\n     \"execution_count\": 60,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"conflicts_found.t1.metadata\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"The type codes for a `Conflicts` `namedtuple` such as the one we saw:\\n\",\n    \"\\n\",\n    \"    Conflicts(t1=('hello',), t21=(), t22=(), t3=(), conflict=True)\\n\",\n    \"\\n\",\n    \"are as follows:\\n\",\n    \"\\n\",\n    \"-  ``t1``: Addition of key in master AND dev with different values.\\n\",\n    \"-  ``t21``: Removed key in master, mutated value in dev.\\n\",\n    \"-  ``t22``: Removed key in dev, mutated value in master.\\n\",\n    \"-  ``t3``: Mutated key in both master AND dev to different values.\\n\",\n    \"-  ``conflict``: Bool indicating if any type of conflict is present.\"\n   ]\n  },\n  {\n   
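\"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Putting these pieces together, a merge can be guarded behind a conflict check. The helper below is a small sketch built only from the calls shown above (`diff.branch()` and `merge()`); the helper name and its printout are ours, not part of Hangar's API, and for brevity it only reports the `t1` metadata conflicts.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def merge_if_clean(co, dev_branch, message):\\n\",\n    \"    # diff the write-enabled checkout against the other branch first\\n\",\n    \"    merge_results, conflicts_found = co.diff.branch(dev_branch)\\n\",\n    \"    if conflicts_found.conflict:\\n\",\n    \"        print(f'aborting, conflicts with {dev_branch}:', conflicts_found.t1.metadata)\\n\",\n    \"        return None\\n\",\n    \"    return co.merge(message=message, dev_branch=dev_branch)\\n\",\n    \"\\n\",\n    \"# e.g. merge_if_clean(co, 'testbranch', 'guarded merge attempt')\"\n   ]\n  },\n  {\n   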
\"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### To resolve, remove the conflict\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 61,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=e69ba8aeffc130c57d2ae0a8131c8ea59083cb62'\"\n      ]\n     },\n     \"execution_count\": 61,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"del co.metadata['hello']\\n\",\n    \"# resolved conflict by removing hello key\\n\",\n    \"co.commit('commit which removes conflicting metadata key')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 62,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Selected 3-Way Merge Strategy\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=ef7ddf4a4a216315d929bd905e78866e3ad6e4fd'\"\n      ]\n     },\n     \"execution_count\": 62,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.merge(message='this merge succeeds as it no longer has a conflict', dev_branch='testbranch')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We can verify that history looks as we would expect via the log!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 63,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"*   a=ef7ddf4a4a216315d929bd905e78866e3ad6e4fd (\\u001B[1;31mnew\\u001B[m) : this merge succeeds as it no longer has a conflict\\n\",\n      \"\\u001B[1;31m|\\u001B[m\\u001B[1;32m\\\\\\u001B[m  \\n\",\n      \"* \\u001B[1;32m|\\u001B[m a=e69ba8aeffc130c57d2ae0a8131c8ea59083cb62 : commit which removes conflicting metadata key\\n\",\n      \"* \\u001B[1;32m|\\u001B[m a=95896880b33fc06a3c2359a03408f07c87bcc8c0 : commit on new branch to hello metadata key so we can demonstrate a conflict\\n\",\n      \"\\u001B[1;32m|\\u001B[m * a=69a08ca41ca1f5577fb0ffcf59d4d1585f614c4d (\\u001B[1;31mtestbranch\\u001B[m) : added hellow world metadata\\n\",\n      \"\\u001B[1;32m|\\u001B[m * a=fcd82f86e39b19c3e5351dda063884b5d2fda67b : mutated sample `0` of `dummy_column` to new value\\n\",\n      \"* \\u001B[1;32m|\\u001B[m a=c1cf1bd6863ed0b95239d2c9e1a6c6cc65569e94 : commit on `new` branch adding a sample to dummy_arrayset\\n\",\n      \"\\u001B[1;32m|\\u001B[m\\u001B[1;32m/\\u001B[m  \\n\",\n      \"* a=eaee002ed9c6e949c3657bd50e3949d6a459d50e : first commit with a single sample added to a dummy column\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.7.3\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}"
  },
  {
    "path": "docs/Tutorial-003.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 3: Working With Remote Servers\\n\",\n    \"\\n\",\n    \"This tutorial will introduce how to start a remote Hangar server, and how to work with [remotes](api.rst#hangar.repository.Remotes) from the client side.\\n\",\n    \"\\n\",\n    \"Particular attention is paid to the concept of a ***partially fetch* / *partial clone*** operations. This is a key component of the Hangar design which provides the ability to quickly and efficiently work with data contained in remote repositories whose full size would be significatly prohibitive to local use under most circumstances.\\n\",\n    \"\\n\",\n    \"*Note:*\\n\",\n    \"\\n\",\n    \"> At the time of writing, the API, user-facing functionality, client-server negotiation protocols, and test coverage of the remotes implementation is generally adqequate for this to serve as an \\\"alpha\\\" quality preview. However, please be warned that significantly less time has been spent in this module to optimize speed, refactor for simplicity, and assure stability under heavy loads than the rest of the Hangar core. While we can guarantee that your data is secure on disk, you may experience crashes from time to time when working with remotes. In addition, sending data over the wire should NOT be considered secure in ANY way. No in-transit encryption, user authentication, or secure access limitations are implemented at this moment. We realize the importance of these types of protections, and they are on our radar for the next release cycle. If you are interested in making a contribution to Hangar, this module contains a lot of low hanging fruit which would would provide drastic improvements and act as a good intro the the internal Hangar data model. Please get in touch with us to discuss!\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Starting a Hangar Server\\n\",\n    \"\\n\",\n    \"To start a Hangar server, navigate to the command line and simply execute:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"$ hangar server\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"This will get a local server instance running at `localhost:50051`. The IP and port can be configured by setting the `--ip` and `--port` flags to the desired values in the command line.\\n\",\n    \"\\n\",\n    \"A blocking process will begin in that terminal session. Leave it running while you experiment with connecting from a client repo.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Using Remotes with a Local Repository\\n\",\n    \"\\n\",\n    \"The [CLI](cli.rst#hangar-cli-documentation) is the easiest way to interact with the remote server from a local repository (though all functioanlity is mirrorred via the [repository API](api.rst#hangar.repository.Remotes) (more on that later).\\n\",\n    \"\\n\",\n    \"Before we begin we will set up a repository with some data, a few commits, two branches, and a merge.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Setup a Test Repo\\n\",\n    \"\\n\",\n    \"As normal, we shall begin with creating a repository and adding some data. 
This should be familiar to you from previous tutorials.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from hangar import Repository\\n\",\n    \"import numpy as np\\n\",\n    \"from tqdm import tqdm\\n\",\n    \"\\n\",\n    \"testData = np.loadtxt('/Users/rick/projects/tensorwerk/hangar/dev/data/dota2Dataset/dota2Test.csv', delimiter=',', dtype=np.uint8)\\n\",\n    \"trainData = np.loadtxt('/Users/rick/projects/tensorwerk/hangar/dev/data/dota2Dataset/dota2Train.csv', delimiter=',', dtype=np.uint16)\\n\",\n    \"\\n\",\n    \"testName = 'test'\\n\",\n    \"testPrototype = testData[0]\\n\",\n    \"trainName = 'train'\\n\",\n    \"trainPrototype = trainData[0]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Hangar Repo initialized at: /Users/rick/projects/tensorwerk/hangar/dev/intro/.hangar\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/Users/rick/projects/tensorwerk/hangar/hangar-py/src/hangar/context.py:94: UserWarning: No repository exists at /Users/rick/projects/tensorwerk/hangar/dev/intro/.hangar, please use `repo.init()` method\\n\",\n      \"  warnings.warn(msg, UserWarning)\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo = Repository('/Users/rick/projects/tensorwerk/hangar/dev/intro/')\\n\",\n    \"repo.init(user_name='Rick Izzo', user_email='rick@tensorwerk.com', remove_old=True)\\n\",\n    \"co = repo.checkout(write=True)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"10500it [00:02, 4286.17it/s]                           \\n\"\n     ]\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 (\\u001B[1;31madd-train\\u001B[m) (\\u001B[1;31mmaster\\u001B[m) : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"co.add_ndarray_column(testName, prototype=testPrototype)\\n\",\n    \"testcol = co.columns[testName]\\n\",\n    \"\\n\",\n    \"pbar = tqdm(total=testData.shape[0])\\n\",\n    \"with testcol as tcol:\\n\",\n    \"    for gameIdx, gameData in enumerate(testData):\\n\",\n    \"        if (gameIdx % 500 == 0):\\n\",\n    \"            pbar.update(500)\\n\",\n    \"        tcol.append(gameData)\\n\",\n    \"pbar.close()\\n\",\n    \"\\n\",\n    \"co.commit('initial commit on master with test data')\\n\",\n    \"\\n\",\n    \"repo.create_branch('add-train')\\n\",\n    \"co.close()\\n\",\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"93000it [00:22, 4078.73it/s]                           \\n\"\n     ]\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=957d20e4b921f41975591cc8ee51a4a6912cb919 (\\u001B[1;31madd-train\\u001B[m) : added training data on another branch\\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 (\\u001B[1;31mmaster\\u001B[m) : initial commit on master with test data\\n\"\n     ]\n    
}\n   ],\n   \"source\": [\n    \"co = repo.checkout(write=True, branch='add-train')\\n\",\n    \"\\n\",\n    \"co.add_ndarray_column(trainName, prototype=trainPrototype)\\n\",\n    \"traincol = co.columns[trainName]\\n\",\n    \"\\n\",\n    \"pbar = tqdm(total=trainData.shape[0])\\n\",\n    \"with traincol as trcol:\\n\",\n    \"    for gameIdx, gameData in enumerate(trainData):\\n\",\n    \"        if (gameIdx % 500 == 0):\\n\",\n    \"            pbar.update(500)\\n\",\n    \"        trcol.append(gameData)\\n\",\n    \"pbar.close()\\n\",\n    \"\\n\",\n    \"co.commit('added training data on another branch')\\n\",\n    \"co.close()\\n\",\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (\\u001B[1;31mmaster\\u001B[m) : more changes here\\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"co = repo.checkout(write=True, branch='master')\\n\",\n    \"co.metadata['earaea'] = 'eara'\\n\",\n    \"co.commit('more changes here')\\n\",\n    \"co.close()\\n\",\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Pushing to a Remote\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We will use the [API remote add()](api.rst#hangar.repository.Remotes.add) method to add a remote; however, this can also be done with the [CLI command](cli.rst#hangar-remote-add):\\n\",\n    \"\\n\",\n    \"    $ hangar remote add origin localhost:50051\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"RemoteInfo(name='origin', address='localhost:50051')\"\n      ]\n     },\n     \"execution_count\": 8,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"repo.remote.add('origin', 'localhost:50051')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Pushing is as simple as running the [push()](api.rst#hangar.repository.Remotes.push) method\\n\",\n    \"from the [API](api.rst#hangar.repository.Remotes.push) or [CLI](cli.rst#hangar-push):\\n\",\n    \"\\n\",\n    \"    $ hangar push origin master\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Push the `master` branch:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 9,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"counting objects: 100%|██████████| 2/2 [00:00<00:00,  5.47it/s]\\n\",\n      \"pushing schemas: 100%|██████████| 1/1 [00:00<00:00, 133.74it/s]\\n\",\n      \"pushing data:  97%|█████████▋| 10001/10294 [00:01<00:00, 7676.23it/s]\\n\",\n      \"pushing metadata: 100%|██████████| 1/1 [00:00<00:00, 328.50it/s]\\n\",\n      \"pushing commit refs: 100%|██████████| 2/2 [00:00<00:00, 140.73it/s]\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'master'\"\n      ]\n     },\n     \"execution_count\": 9,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    
\"repo.remote.push('origin', 'master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Push the `add-train` branch:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 10,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"counting objects: 100%|██████████| 1/1 [00:01<00:00,  1.44s/it]\\n\",\n      \"pushing schemas: 100%|██████████| 1/1 [00:00<00:00, 126.05it/s]\\n\",\n      \"pushing data:  99%|█████████▉| 92001/92650 [00:12<00:00, 7107.60it/s] \\n\",\n      \"pushing metadata: 0it [00:00, ?it/s]\\n\",\n      \"pushing commit refs: 100%|██████████| 1/1 [00:00<00:00, 17.05it/s]\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'add-train'\"\n      ]\n     },\n     \"execution_count\": 10,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"repo.remote.push('origin', 'add-train')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Details of the Negotiation Processs\\n\",\n    \"\\n\",\n    \"> The following details are not necessary to use the system, but may be of interest to some readers\\n\",\n    \"\\n\",\n    \"When we push data, **we perform a negotation with the server** which basically occurs like this:\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"- Hi, I would like to push this branch, do you have it?\\n\",\n    \"\\n\",\n    \"  - If yes, what is the latest commit you record on it?\\n\",\n    \"\\n\",\n    \"    - Is that the same commit I'm trying to push? If yes, abort.\\n\",\n    \"\\n\",\n    \"    - Is that a commit I don't have? If yes, someone else has updated that branch, abort.\\n\",\n    \"\\n\",\n    \"- Here's the commit digests which are parents of my branches head, which commits are you missing?\\n\",\n    \"\\n\",\n    \"- Ok great, I'm going to scan through each of those commits to find the data hashes they contain. Tell me which ones you are missing.\\n\",\n    \"\\n\",\n    \"- Thanks, now I'll send you all of the data corresponding to those hashes. It might be a lot of data, so we'll handle this in batches so that if my connection cuts out, we can resume this later\\n\",\n    \"\\n\",\n    \"- Now that you have the data, I'm going to send the actual commit references for you to store, this isn't that much information, but you'll be sure to verify that I'm not trying to pull any funny buisness and send you incorrect data.\\n\",\n    \"\\n\",\n    \"- Now that you've received everything, and have verified it matches what I told you it is, go ahead and make those commits I've pushed `available` as the `HEAD` of the branch I just sent. It's some good work that others will want!\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"When we want to fetch updates to a branch, essentially the exact same thing happens in reverse. 
Instead of asking the server what it doesn't have, we ask it what it does have, and then request the stuff that we are missing!\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Partial Fetching and Clones\\n\",\n    \"\\n\",\n    \"**Now we will introduce one of the most important and unique features of Hangar remotes: Partial fetch/clone of data!**\\n\",\n    \"\\n\",\n    \"*There is a very real problem with keeping the full history of data - **it's huge**!* The size of data can very easily exceed what can fit on (most) contributors' laptops or personal workstations. This section explains how Hangar can handle working with columns which are prohibitively large to download or store on a single machine.\\n\",\n    \"\\n\",\n    \"As mentioned in High Performance From Simplicity, under the hood Hangar deals with “Data” and “Bookkeeping” completely separately. We’ve previously covered what exactly we mean by Data in How Hangar Thinks About Data, so we’ll briefly cover the second major component of Hangar here.\\n\",\n    \"In short, “Bookkeeping” describes everything about the repository. By everything, we do mean that the Bookkeeping records describe everything: all commits, parents, branches, columns, samples, data descriptors, schemas, commit messages, etc. Though complete, these records are fairly small (tens of MB in size for decently sized repositories with decent history), and are highly compressed for fast transfer between a Hangar client/server.\\n\",\n    \"\\n\",\n    \"**A brief technical interlude**\\n\",\n    \"\\n\",\n    \"> There is one very important (and rather complex) property which gives Hangar Bookkeeping massive power: the existence of some data piece is always known to Hangar and stored immutably once committed. However, the access pattern, backend, and locating information for this data piece may (and over time, will) be unique in every Hangar repository instance.\\n\",\n    \">\\n\",\n    \"> Though the details of how this works are well beyond the scope of this document, the following example may provide some insight into the implications of this property:\\n\",\n    \">\\n\",\n    \"> If you clone some Hangar repository, Bookkeeping says that “some number of data pieces exist” and that they should be retrieved from the server. However, the bookkeeping records transferred in a fetch / push / clone operation do not include information about where that piece of data existed on the client (or server) computer. Two synced repositories can use completely different backends to store the data, in completely different locations, and it does not matter - Hangar only guarantees that when collaborators ask for a data sample in some checkout, they will be provided with identical arrays, not that they will come from the same place or be stored in the same way. Only when data is actually retrieved is the “locating information” set for that repository instance.\\n\",\n    \"\\n\",\n    \"Because Hangar makes no assumptions about how/where it should retrieve some piece of data, or even an assumption that it exists on the local machine, and because records are small and completely describe history, once a machine has the Bookkeeping, it can decide what data it actually wants to materialize on its local disk! These partial fetch / partial clone operations can materialize any desired data, whether it be a few records at the head of a branch, all data in a commit, or the entire historical data. 
A future release will even include the ability to stream data directly to a Hangar checkout and materialize the data in memory without having to save it to disk at all!\\n\",\n    \"\\n\",\n    \"More importantly: since Bookkeeping describes all history, merging can be performed between branches which may contain partial (or even no) actual data. In other words, **you don’t need data on disk to merge changes into it.** It’s an odd concept which will be demonstrated later in this tutorial.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Cloning a Remote Repo\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"    $ hangar clone localhost:50051\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 11,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/Users/rick/projects/tensorwerk/hangar/hangar-py/src/hangar/context.py:94: UserWarning: No repository exists at /Users/rick/projects/tensorwerk/hangar/dev/dota-clone/.hangar, please use `repo.init()` method\\n\",\n      \"  warnings.warn(msg, UserWarning)\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"cloneRepo = Repository('/Users/rick/projects/tensorwerk/hangar/dev/dota-clone/')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"When we perform the initial clone, we will only receive the `master` branch by default.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 12,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"fetching commit data refs:   0%|          | 0/2 [00:00<?, ?it/s]\"\n     ]\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Hangar Repo initialized at: /Users/rick/projects/tensorwerk/hangar/dev/dota-clone/.hangar\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"fetching commit data refs: 100%|██████████| 2/2 [00:00<00:00,  5.73it/s]\\n\",\n      \"fetching commit spec: 100%|██████████| 2/2 [00:00<00:00, 273.30it/s]\\n\"\n     ]\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Hard reset requested with writer_lock: 27634b20-3c5b-4ee0-aac3-b5ce6cb7daf0\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'master'\"\n      ]\n     },\n     \"execution_count\": 12,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.clone('rick izzo', 'rick@tensorwerk.com', 'localhost:50051', remove_old=True)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 13,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (\\u001B[1;31mmaster\\u001B[m) (\\u001B[1;31morigin/master\\u001B[m) : more changes here\\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 14,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['master', 'origin/master']\"\n      ]\n     },\n   
  \"execution_count\": 14,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.list_branches()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To get the `add-train` branch, we [fetch](api.rst#hangar.repository.Remotes.fetch) it from the remote:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 15,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"fetching commit data refs: 100%|██████████| 1/1 [00:01<00:00,  1.51s/it]\\n\",\n      \"fetching commit spec: 100%|██████████| 1/1 [00:00<00:00, 35.85it/s]\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'origin/add-train'\"\n      ]\n     },\n     \"execution_count\": 15,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.remote.fetch('origin', 'add-train')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 16,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['master', 'origin/add-train', 'origin/master']\"\n      ]\n     },\n     \"execution_count\": 16,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.list_branches()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 17,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=957d20e4b921f41975591cc8ee51a4a6912cb919 (\\u001B[1;31morigin/add-train\\u001B[m) : added training data on another branch\\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.log(branch='origin/add-train')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We will create a local branch from the `origin/add-train` branch, just like in `Git`\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 18,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"BranchHead(name='add-train', digest='a=957d20e4b921f41975591cc8ee51a4a6912cb919')\"\n      ]\n     },\n     \"execution_count\": 18,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.create_branch('add-train', 'a=957d20e4b921f41975591cc8ee51a4a6912cb919')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 19,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['add-train', 'master', 'origin/add-train', 'origin/master']\"\n      ]\n     },\n     \"execution_count\": 19,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.list_branches()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 20,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=957d20e4b921f41975591cc8ee51a4a6912cb919 (\\u001B[1;31madd-train\\u001B[m) (\\u001B[1;31morigin/add-train\\u001B[m) : added training data on another branch\\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test 
data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.log(branch='add-train')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Checking out a Parial Clone/Fetch\\n\",\n    \"\\n\",\n    \"When we [fetch](api.rst#hangar.repository.Remotes.fetch)/[clone](api.rst#hangar.repository.Repository.clone), the transfers are very quick, because only the commit records/history were retrieved. The data was not sent, because it may be very large to get the entire data across all of history.\\n\",\n    \"\\n\",\n    \"When you check out a commit with partial data, you will be shown a warning indicating that some data is not available locally. An error is raised if you try to access that particular sample data. Otherwise, everything will appear as normal.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 21,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \" * Checking out BRANCH: master with current HEAD: a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/Users/rick/projects/tensorwerk/hangar/hangar-py/src/hangar/columns/constructors.py:45: UserWarning: Column: test contains `reference-only` samples, with actual data residing on a remote server. A `fetch-data` operation is required to access these samples.\\n\",\n      \"  f'operation is required to access these samples.', UserWarning)\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"co = cloneRepo.checkout(branch='master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 22,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar ReaderCheckout                \\n\",\n       \"    Writer       : False                \\n\",\n       \"    Commit Hash  : a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d                \\n\",\n       \"    Num Columns  : 1                \\n\",\n       \"    Num Metadata : 1\\n\"\n      ]\n     },\n     \"execution_count\": 22,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"we can see from the `repr` that the columns contain partial remote references\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 23,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : False                \\n\",\n       \"    Number of Columns : 1                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - test / True\"\n      ]\n     },\n     \"execution_count\": 23,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 24,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleReader                 \\n\",\n       \"    Column Name              : test                \\n\",\n       \"    Writeable                : False                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : 
flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint8                \\n\",\n       \"    Shape                    : (117,)                \\n\",\n       \"    Number of Samples        : 10294                \\n\",\n       \"    Partial Remote Data Refs : True\\n\"\n      ]\n     },\n     \"execution_count\": 24,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns['test']\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 25,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"testKey = next(co.columns['test'].keys())\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 26,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"ename\": \"FileNotFoundError\",\n     \"evalue\": \"data hash spec: <class 'hangar.backends.specs.REMOTE_50_DataHashSpec'> does not exist on this machine. Perform a `data-fetch` operation to retrieve it from the remote server.\",\n     \"output_type\": \"error\",\n     \"traceback\": [\n      \"\\u001B[0;31m---------------------------------------------------------------------------\\u001B[0m\",\n      \"\\u001B[0;31mFileNotFoundError\\u001B[0m                         Traceback (most recent call last)\",\n      \"\\u001B[0;32m<ipython-input-26-cb069e761eb3>\\u001B[0m in \\u001B[0;36m<module>\\u001B[0;34m\\u001B[0m\\n\\u001B[0;32m----> 1\\u001B[0;31m \\u001B[0mco\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0mcolumns\\u001B[0m\\u001B[0;34m[\\u001B[0m\\u001B[0;34m'test'\\u001B[0m\\u001B[0;34m]\\u001B[0m\\u001B[0;34m[\\u001B[0m\\u001B[0mtestKey\\u001B[0m\\u001B[0;34m]\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0m\",\n      \"\\u001B[0;32m~/projects/tensorwerk/hangar/hangar-py/src/hangar/columns/layout_flat.py\\u001B[0m in \\u001B[0;36m__getitem__\\u001B[0;34m(self, key)\\u001B[0m\\n\\u001B[1;32m    222\\u001B[0m         \\\"\\\"\\\"\\n\\u001B[1;32m    223\\u001B[0m         \\u001B[0mspec\\u001B[0m \\u001B[0;34m=\\u001B[0m \\u001B[0mself\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0m_samples\\u001B[0m\\u001B[0;34m[\\u001B[0m\\u001B[0mkey\\u001B[0m\\u001B[0;34m]\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0;32m--> 224\\u001B[0;31m         \\u001B[0;32mreturn\\u001B[0m \\u001B[0mself\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0m_be_fs\\u001B[0m\\u001B[0;34m[\\u001B[0m\\u001B[0mspec\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0mbackend\\u001B[0m\\u001B[0;34m]\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0mread_data\\u001B[0m\\u001B[0;34m(\\u001B[0m\\u001B[0mspec\\u001B[0m\\u001B[0;34m)\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0m\\u001B[1;32m    225\\u001B[0m \\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m    226\\u001B[0m     \\u001B[0;32mdef\\u001B[0m \\u001B[0mget\\u001B[0m\\u001B[0;34m(\\u001B[0m\\u001B[0mself\\u001B[0m\\u001B[0;34m,\\u001B[0m \\u001B[0mkey\\u001B[0m\\u001B[0;34m:\\u001B[0m \\u001B[0mKeyType\\u001B[0m\\u001B[0;34m,\\u001B[0m \\u001B[0mdefault\\u001B[0m\\u001B[0;34m=\\u001B[0m\\u001B[0;32mNone\\u001B[0m\\u001B[0;34m)\\u001B[0m\\u001B[0;34m:\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\",\n      \"\\u001B[0;32m~/projects/tensorwerk/hangar/hangar-py/src/hangar/backends/remote_50.py\\u001B[0m in \\u001B[0;36mread_data\\u001B[0;34m(self, hashVal)\\u001B[0m\\n\\u001B[1;32m    172\\u001B[0m     \\u001B[0;32mdef\\u001B[0m 
\\u001B[0mread_data\\u001B[0m\\u001B[0;34m(\\u001B[0m\\u001B[0mself\\u001B[0m\\u001B[0;34m,\\u001B[0m \\u001B[0mhashVal\\u001B[0m\\u001B[0;34m:\\u001B[0m \\u001B[0mREMOTE_50_DataHashSpec\\u001B[0m\\u001B[0;34m)\\u001B[0m \\u001B[0;34m->\\u001B[0m \\u001B[0;32mNone\\u001B[0m\\u001B[0;34m:\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m    173\\u001B[0m         raise FileNotFoundError(\\n\\u001B[0;32m--> 174\\u001B[0;31m             \\u001B[0;34mf'data hash spec: {REMOTE_50_DataHashSpec} does not exist on this machine. '\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0m\\u001B[1;32m    175\\u001B[0m             f'Perform a `data-fetch` operation to retrieve it from the remote server.')\\n\\u001B[1;32m    176\\u001B[0m \\u001B[0;34m\\u001B[0m\\u001B[0m\\n\",\n      \"\\u001B[0;31mFileNotFoundError\\u001B[0m: data hash spec: <class 'hangar.backends.specs.REMOTE_50_DataHashSpec'> does not exist on this machine. Perform a `data-fetch` operation to retrieve it from the remote server.\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"co.columns['test'][testKey]\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Fetching Data from a Remote\\n\",\n    \"\\n\",\n    \"To retrieve the data, we use the [fetch_data()](api.rst#hangar.repository.Remotes.fetch_data)\\n\",\n    \"method (accessible via the [API](api.rst#hangar.repository.Remotes.fetch_data) or\\n\",\n    \"[fetch-data](cli.rst#hangar-fetch-data) via the CLI).\\n\",\n    \"\\n\",\n    \"The amount / type of data to retrieve is extremely configurable via the following options:\\n\",\n    \"\\n\",\n    \".. include:: ./noindexapi/apiremotefetchdata.rst\\n\",\n    \"\\n\",\n    \"This will retrieve all the data on the `master` branch, but not on the `add-train` branch. (A short sketch of a more selective call follows at the end of this subsection.)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 29,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"counting objects: 100%|██████████| 1/1 [00:00<00:00, 27.45it/s]\\n\",\n      \"fetching data: 100%|██████████| 10294/10294 [00:01<00:00, 6664.60it/s]\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d']\"\n      ]\n     },\n     \"execution_count\": 29,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.remote.fetch_data('origin', branch='master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 30,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \" * Checking out BRANCH: master with current HEAD: a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"co = cloneRepo.checkout(branch='master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 31,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar ReaderCheckout                \\n\",\n       \"    Writer       : False                \\n\",\n       \"    Commit Hash  : a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d                \\n\",\n       \"    Num Columns  : 1                \\n\",\n       \"    Num Metadata : 1\\n\"\n      ]\n     },\n     \"execution_count\": 31,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   
],\n   \"source\": [\n    \"co\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Unlike before, we see that there is no partial references from the `repr`\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 32,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : False                \\n\",\n       \"    Number of Columns : 1                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - test / False\"\n      ]\n     },\n     \"execution_count\": 32,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 33,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleReader                 \\n\",\n       \"    Column Name              : test                \\n\",\n       \"    Writeable                : False                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint8                \\n\",\n       \"    Shape                    : (117,)                \\n\",\n       \"    Number of Samples        : 10294                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 33,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns['test']\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"***When we access the data this time, it is available and retrieved as requested!***\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 34,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"array([255, 223,   8,   2,   0, 255,   0,   0,   0,   0,   0,   0,   1,\\n\",\n       \"         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,\\n\",\n       \"         0,   0,   0,   1,   0,   0,   0,   0,   0,   0,   0,   0,   0,\\n\",\n       \"         0,   0,   0,   0,   1,   0,   0,   0, 255,   0,   0,   0,   0,\\n\",\n       \"         0,   0,   0,   0,   0,   0,   1,   0,   0,   0, 255,   0,   0,\\n\",\n       \"         0,   0,   0,   0,   0, 255,   0,   0,   0,   0,   1,   0,   0,\\n\",\n       \"         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,\\n\",\n       \"         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,\\n\",\n       \"         0,   0,   0, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0],\\n\",\n       \"      dtype=uint8)\"\n      ]\n     },\n     \"execution_count\": 34,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co['test', testKey]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 35,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Working with mixed local / remote checkout Data\\n\",\n    \"\\n\",\n    \"If we were to checkout the `add-train` branch 
\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Working with mixed local / remote checkout Data\\n\",\n    \"\\n\",\n    \"If we were to check out the `add-train` branch now, we would see that there is no `column \\\"train\\\"` data, but data common to the ancestor that `master` and `add-train` share will be available.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 36,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=957d20e4b921f41975591cc8ee51a4a6912cb919 (\\u001B[1;31madd-train\\u001B[m) (\\u001B[1;31morigin/add-train\\u001B[m) : added training data on another branch\\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.log('add-train')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"In this case, the common ancestor is commit: `b98f6b65c0036489e53ddaf2b30bf797ddc40da0`\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To show that there is no data on the `add-train` branch:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 37,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \" * Checking out BRANCH: add-train with current HEAD: a=957d20e4b921f41975591cc8ee51a4a6912cb919\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/Users/rick/projects/tensorwerk/hangar/hangar-py/src/hangar/columns/constructors.py:45: UserWarning: Column: train contains `reference-only` samples, with actual data residing on a remote server. A `fetch-data` operation is required to access these samples.\\n\",\n      \"  f'operation is required to access these samples.', UserWarning)\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"co = cloneRepo.checkout(branch='add-train')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 38,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar ReaderCheckout                \\n\",\n       \"    Writer       : False                \\n\",\n       \"    Commit Hash  : a=957d20e4b921f41975591cc8ee51a4a6912cb919                \\n\",\n       \"    Num Columns  : 2                \\n\",\n       \"    Num Metadata : 0\\n\"\n      ]\n     },\n     \"execution_count\": 38,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 39,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : False                \\n\",\n       \"    Number of Columns : 2                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - test / False\\n\",\n       \"      - train / True\"\n      ]\n     },\n     \"execution_count\": 39,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 40,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"array([255, 223,   8,   2,   0, 255,   0,   0,   0,   0,   0,   0,   1,\\n\",\n       \"         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,\\n\",\n       \"         0,   0,   0, 
  1,   0,   0,   0,   0,   0,   0,   0,   0,   0,\\n\",\n       \"         0,   0,   0,   0,   1,   0,   0,   0, 255,   0,   0,   0,   0,\\n\",\n       \"         0,   0,   0,   0,   0,   0,   1,   0,   0,   0, 255,   0,   0,\\n\",\n       \"         0,   0,   0,   0,   0, 255,   0,   0,   0,   0,   1,   0,   0,\\n\",\n       \"         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,\\n\",\n       \"         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,\\n\",\n       \"         0,   0,   0, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0],\\n\",\n       \"      dtype=uint8)\"\n      ]\n     },\n     \"execution_count\": 40,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"co['test', testKey]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 41,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"trainKey = next(co.columns['train'].keys())\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 42,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"ename\": \"FileNotFoundError\",\n     \"evalue\": \"data hash spec: <class 'hangar.backends.specs.REMOTE_50_DataHashSpec'> does not exist on this machine. Perform a `data-fetch` operation to retrieve it from the remote server.\",\n     \"output_type\": \"error\",\n     \"traceback\": [\n      \"\\u001B[0;31m---------------------------------------------------------------------------\\u001B[0m\",\n      \"\\u001B[0;31mFileNotFoundError\\u001B[0m                         Traceback (most recent call last)\",\n      \"\\u001B[0;32m<ipython-input-42-549d3e1dc7a1>\\u001B[0m in \\u001B[0;36m<module>\\u001B[0;34m\\u001B[0m\\n\\u001B[0;32m----> 1\\u001B[0;31m \\u001B[0mco\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0mcolumns\\u001B[0m\\u001B[0;34m[\\u001B[0m\\u001B[0;34m'train'\\u001B[0m\\u001B[0;34m]\\u001B[0m\\u001B[0;34m[\\u001B[0m\\u001B[0mtrainKey\\u001B[0m\\u001B[0;34m]\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0m\",\n      \"\\u001B[0;32m~/projects/tensorwerk/hangar/hangar-py/src/hangar/columns/layout_flat.py\\u001B[0m in \\u001B[0;36m__getitem__\\u001B[0;34m(self, key)\\u001B[0m\\n\\u001B[1;32m    222\\u001B[0m         \\\"\\\"\\\"\\n\\u001B[1;32m    223\\u001B[0m         \\u001B[0mspec\\u001B[0m \\u001B[0;34m=\\u001B[0m \\u001B[0mself\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0m_samples\\u001B[0m\\u001B[0;34m[\\u001B[0m\\u001B[0mkey\\u001B[0m\\u001B[0;34m]\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0;32m--> 224\\u001B[0;31m         \\u001B[0;32mreturn\\u001B[0m \\u001B[0mself\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0m_be_fs\\u001B[0m\\u001B[0;34m[\\u001B[0m\\u001B[0mspec\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0mbackend\\u001B[0m\\u001B[0;34m]\\u001B[0m\\u001B[0;34m.\\u001B[0m\\u001B[0mread_data\\u001B[0m\\u001B[0;34m(\\u001B[0m\\u001B[0mspec\\u001B[0m\\u001B[0;34m)\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0m\\u001B[1;32m    225\\u001B[0m \\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m    226\\u001B[0m     \\u001B[0;32mdef\\u001B[0m \\u001B[0mget\\u001B[0m\\u001B[0;34m(\\u001B[0m\\u001B[0mself\\u001B[0m\\u001B[0;34m,\\u001B[0m \\u001B[0mkey\\u001B[0m\\u001B[0;34m:\\u001B[0m \\u001B[0mKeyType\\u001B[0m\\u001B[0;34m,\\u001B[0m 
\\u001B[0mdefault\\u001B[0m\\u001B[0;34m=\\u001B[0m\\u001B[0;32mNone\\u001B[0m\\u001B[0;34m)\\u001B[0m\\u001B[0;34m:\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\",\n      \"\\u001B[0;32m~/projects/tensorwerk/hangar/hangar-py/src/hangar/backends/remote_50.py\\u001B[0m in \\u001B[0;36mread_data\\u001B[0;34m(self, hashVal)\\u001B[0m\\n\\u001B[1;32m    172\\u001B[0m     \\u001B[0;32mdef\\u001B[0m \\u001B[0mread_data\\u001B[0m\\u001B[0;34m(\\u001B[0m\\u001B[0mself\\u001B[0m\\u001B[0;34m,\\u001B[0m \\u001B[0mhashVal\\u001B[0m\\u001B[0;34m:\\u001B[0m \\u001B[0mREMOTE_50_DataHashSpec\\u001B[0m\\u001B[0;34m)\\u001B[0m \\u001B[0;34m->\\u001B[0m \\u001B[0;32mNone\\u001B[0m\\u001B[0;34m:\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[1;32m    173\\u001B[0m         raise FileNotFoundError(\\n\\u001B[0;32m--> 174\\u001B[0;31m             \\u001B[0;34mf'data hash spec: {REMOTE_50_DataHashSpec} does not exist on this machine. '\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0;34m\\u001B[0m\\u001B[0m\\n\\u001B[0m\\u001B[1;32m    175\\u001B[0m             f'Perform a `data-fetch` operation to retrieve it from the remote server.')\\n\\u001B[1;32m    176\\u001B[0m \\u001B[0;34m\\u001B[0m\\u001B[0m\\n\",\n      \"\\u001B[0;31mFileNotFoundError\\u001B[0m: data hash spec: <class 'hangar.backends.specs.REMOTE_50_DataHashSpec'> does not exist on this machine. Perform a `data-fetch` operation to retrieve it from the remote server.\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"co.columns['train'][trainKey]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 43,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"co.close()\"\n   ]\n  },
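\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"When working with such mixed checkouts it can help to know up front which sample keys are remote-only instead of catching `FileNotFoundError`. The unexecuted sketch below assumes a `remote_reference_keys` property on the column accessor (the same information summarized by the `Partial Remote Data Refs` line in the repr); treat the property name as an assumption.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Sketch only (unexecuted) -- assumes a `remote_reference_keys`\\n\",\n    \"# property listing the sample keys whose data still lives on the\\n\",\n    \"# remote, so we can restrict ourselves to locally available keys.\\n\",\n    \"co = cloneRepo.checkout(branch='add-train')\\n\",\n    \"remote_keys = set(co.columns['train'].remote_reference_keys)\\n\",\n    \"local_keys = [k for k in co.columns['train'].keys() if k not in remote_keys]\\n\",\n    \"co.close()\"\n   ]\n  },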
\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Merging Branches with Partial Data\\n\",\n    \"\\n\",\n    \"Even though we don't have the actual data for the references in the `add-train` branch, it is still possible to merge the two branches!\\n\",\n    \"\\n\",\n    \"This is possible because Hangar doesn't use the data contents in its internal model of checkouts / commits, but instead thinks of a checkout as a sequence of columns / metadata / keys & their associated data hashes (which are very small text records; i.e. \\\"bookkeeping\\\"). To show this in action, let's merge the two branches `master` (containing all data locally) and `add-train` (containing partial remote references for the `train` column) together and push the result to the Remote!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 44,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (\\u001B[1;31mmaster\\u001B[m) (\\u001B[1;31morigin/master\\u001B[m) : more changes here\\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.log('master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 45,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=957d20e4b921f41975591cc8ee51a4a6912cb919 (\\u001B[1;31madd-train\\u001B[m) (\\u001B[1;31morigin/add-train\\u001B[m) : added training data on another branch\\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.log('add-train')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**Perform the Merge**\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 46,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Selected 3-Way Merge Strategy\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=ace3dacbd94f475664ee136dcf05430a2895aca3'\"\n      ]\n     },\n     \"execution_count\": 46,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.merge('merge commit here', 'master', 'add-train')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**IT WORKED!**\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 47,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"*   a=ace3dacbd94f475664ee136dcf05430a2895aca3 (\\u001B[1;31mmaster\\u001B[m) : merge commit here\\n\",\n      \"\\u001B[1;31m|\\u001B[m\\u001B[1;32m\\\\\\u001B[m  \\n\",\n      \"* \\u001B[1;32m|\\u001B[m a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (\\u001B[1;31morigin/master\\u001B[m) : more changes here\\n\",\n      \"\\u001B[1;32m|\\u001B[m * a=957d20e4b921f41975591cc8ee51a4a6912cb919 (\\u001B[1;31madd-train\\u001B[m) (\\u001B[1;31morigin/add-train\\u001B[m) : added training data on another branch\\n\",\n      \"\\u001B[1;32m|\\u001B[m\\u001B[1;32m/\\u001B[m  \\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We can inspect the summary of the `master` commit to verify that the contents are what we expect (containing both `test` and `train` columns)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 48,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Summary of Contents Contained in 
Data Repository \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| Repository Info \\n\",\n      \"|----------------- \\n\",\n      \"|  Base Directory: /Users/rick/projects/tensorwerk/hangar/dev/dota-clone \\n\",\n      \"|  Disk Usage: 42.03 MB \\n\",\n      \" \\n\",\n      \"=================== \\n\",\n      \"| Commit Details \\n\",\n      \"------------------- \\n\",\n      \"|  Commit: a=ace3dacbd94f475664ee136dcf05430a2895aca3 \\n\",\n      \"|  Created: Tue Feb 25 19:18:30 2020 \\n\",\n      \"|  By: rick izzo \\n\",\n      \"|  Email: rick@tensorwerk.com \\n\",\n      \"|  Message: merge commit here \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| DataSets \\n\",\n      \"|----------------- \\n\",\n      \"|  Number of Named Columns: 2 \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"test\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 10294 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: ndarray \\n\",\n      \"|    - schema_type: fixed_shape \\n\",\n      \"|    - shape: (117,) \\n\",\n      \"|    - dtype: uint8 \\n\",\n      \"|    - backend: 10 \\n\",\n      \"|    - backend_options: {} \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"train\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 92650 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: ndarray \\n\",\n      \"|    - schema_type: fixed_shape \\n\",\n      \"|    - shape: (117,) \\n\",\n      \"|    - dtype: uint16 \\n\",\n      \"|    - backend: 10 \\n\",\n      \"|    - backend_options: {} \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| Metadata: \\n\",\n      \"|----------------- \\n\",\n      \"|  Number of Keys: 1 \\n\",\n      \"\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.summary()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Pushing the Merge back to the Remote\\n\",\n    \"\\n\",\n    \"To push this merge back to our original copy of the Repository (`repo`), we just push the `master` branch back to the remote via the API or CLI.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 49,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"counting objects: 100%|██████████| 1/1 [00:00<00:00,  1.02it/s]\\n\",\n      \"pushing schemas: 0it [00:00, ?it/s]\\n\",\n      \"pushing data: 0it [00:00, ?it/s]\\n\",\n      \"pushing metadata: 0it [00:00, ?it/s]\\n\",\n      \"pushing commit refs: 100%|██████████| 1/1 [00:00<00:00, 34.26it/s]\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'master'\"\n      ]\n     },\n     \"execution_count\": 49,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"cloneRepo.remote.push('origin', 'master')\"\n   ]\n  },
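\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"For reference, the same push can be run from the command line. Assuming the CLI verb mirrors the [fetch-data](cli.rst#hangar-fetch-data) command referenced earlier, it would look roughly like this (treat the exact form as an assumption; the CLI reference is canonical):\\n\",\n    \"\\n\",\n    \"```bash\\n\",\n    \"$ hangar push origin master\\n\",\n    \"```\"\n   ]\n  },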
   \"text\": [\n      \"* a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (\\u001B[1;31mmaster\\u001B[m) (\\u001B[1;31morigin/master\\u001B[m) : more changes here\\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To fetch the merged changes, just [fetch()](api.rst#hangar.repository.Remotes.fetch) the branch as normal. Like all fetches, this will be a fast operation, as it will be a `partial fetch` operation, not actually transfering the data.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 51,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"fetching commit data refs: 100%|██████████| 1/1 [00:01<00:00,  1.33s/it]\\n\",\n      \"fetching commit spec: 100%|██████████| 1/1 [00:00<00:00, 37.61it/s]\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'origin/master'\"\n      ]\n     },\n     \"execution_count\": 51,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"repo.remote.fetch('origin', 'master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 52,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"*   a=ace3dacbd94f475664ee136dcf05430a2895aca3 (\\u001B[1;31morigin/master\\u001B[m) : merge commit here\\n\",\n      \"\\u001B[1;31m|\\u001B[m\\u001B[1;32m\\\\\\u001B[m  \\n\",\n      \"* \\u001B[1;32m|\\u001B[m a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (\\u001B[1;31mmaster\\u001B[m) : more changes here\\n\",\n      \"\\u001B[1;32m|\\u001B[m * a=957d20e4b921f41975591cc8ee51a4a6912cb919 (\\u001B[1;31madd-train\\u001B[m) (\\u001B[1;31morigin/add-train\\u001B[m) : added training data on another branch\\n\",\n      \"\\u001B[1;32m|\\u001B[m\\u001B[1;32m/\\u001B[m  \\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log('origin/master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To bring our `master` branch up to date is a simple fast-forward merge.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 53,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Selected Fast-Forward Merge Strategy\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=ace3dacbd94f475664ee136dcf05430a2895aca3'\"\n      ]\n     },\n     \"execution_count\": 53,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"repo.merge('ff-merge', 'master', 'origin/master')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 54,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"*   a=ace3dacbd94f475664ee136dcf05430a2895aca3 (\\u001B[1;31mmaster\\u001B[m) (\\u001B[1;31morigin/master\\u001B[m) : merge commit here\\n\",\n      \"\\u001B[1;31m|\\u001B[m\\u001B[1;32m\\\\\\u001B[m  \\n\",\n      \"* \\u001B[1;32m|\\u001B[m a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d : more changes 
here\\n\",\n      \"\\u001B[1;32m|\\u001B[m * a=957d20e4b921f41975591cc8ee51a4a6912cb919 (\\u001B[1;31madd-train\\u001B[m) (\\u001B[1;31morigin/add-train\\u001B[m) : added training data on another branch\\n\",\n      \"\\u001B[1;32m|\\u001B[m\\u001B[1;32m/\\u001B[m  \\n\",\n      \"* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**Everything is as it should be!** Now, try it out for yourself!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 55,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Summary of Contents Contained in Data Repository \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| Repository Info \\n\",\n      \"|----------------- \\n\",\n      \"|  Base Directory: /Users/rick/projects/tensorwerk/hangar/dev/intro \\n\",\n      \"|  Disk Usage: 77.43 MB \\n\",\n      \" \\n\",\n      \"=================== \\n\",\n      \"| Commit Details \\n\",\n      \"------------------- \\n\",\n      \"|  Commit: a=ace3dacbd94f475664ee136dcf05430a2895aca3 \\n\",\n      \"|  Created: Tue Feb 25 19:18:30 2020 \\n\",\n      \"|  By: rick izzo \\n\",\n      \"|  Email: rick@tensorwerk.com \\n\",\n      \"|  Message: merge commit here \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| DataSets \\n\",\n      \"|----------------- \\n\",\n      \"|  Number of Named Columns: 2 \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"test\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 10294 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: ndarray \\n\",\n      \"|    - schema_type: fixed_shape \\n\",\n      \"|    - shape: (117,) \\n\",\n      \"|    - dtype: uint8 \\n\",\n      \"|    - backend: 10 \\n\",\n      \"|    - backend_options: {} \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"train\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 92650 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: ndarray \\n\",\n      \"|    - schema_type: fixed_shape \\n\",\n      \"|    - shape: (117,) \\n\",\n      \"|    - dtype: uint16 \\n\",\n      \"|    - backend: 10 \\n\",\n      \"|    - backend_options: {} \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| Metadata: \\n\",\n      \"|----------------- \\n\",\n      \"|  Number of Keys: 1 \\n\",\n      \"\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.summary()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.7.3\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}"
  },
  {
    "path": "docs/Tutorial-Dataset.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"CQhd0TTQCMeh\"\n   },\n   \"source\": [\n    \"## Dataloaders for Machine Learning (Tensorflow & PyTorch)\\n\",\n    \"\\n\",\n    \"This tutorial acts as a step by step guide for fetching, preprocessing, storing and loading the [MS-COCO](http://cocodataset.org/#home) dataset for image captioning using deep learning. We have chosen **image captioning** for this tutorial not by accident. For such an application, the dataset required will have both fixed shape (image) and variably shaped (caption because it's sequence of natural language) data. This diversity should help the user to get a mental model about how flexible and easy is to plug Hangar to the existing workflow.\\n\",\n    \"\\n\",\n    \"You will use the MS-COCO dataset to train our model. The dataset contains over 82,000 images, each of which has at least 5 different caption annotations.\\n\",\n    \"\\n\",\n    \"This tutorial assumes you have downloaded and extracted the [MS-COCO dataset](http://cocodataset.org/#home) in the current directory. If you haven't yet, shell commands below should help you do it (beware, it's about 14 GB data). If you are on Windows, please find the equivalent commands to get the dataset downloaded.\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"```bash\\n\",\n    \"wget http://images.cocodataset.org/zips/train2014.zip\\n\",\n    \"unzip train2014.zip\\n\",\n    \"rm train2014.zip\\n\",\n    \"wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip\\n\",\n    \"unzip annotations_trainval2014.zip\\n\",\n    \"rm annotations_trainval2014.zip\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"Let's install the required packages in our environment. We will be using Tensorflow 1.14 in this tutorial but it should work in all the Tensorflow versions starting from 1.12. But do let us know if you face any hiccups. Install below-given packages before continue. Apart from Tensorflow and Hangar, we use [SpaCy](https://spacy.io/) for pre-processing the captions. SpaCy is probably the most widely used natural language toolkit now.\\n\",\n    \"\\n\",\n    \"```bash\\n\",\n    \"tensorflow==1.14.0\\n\",\n    \"hangar\\n\",\n    \"spacy==2.1.8\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"One more thing before jumping into the tutorial: we need to download the SpaCy English model `en_core_web_md` which cannot be dynamically loaded. Which means that it must be downloaded with the below command outside this runtime and should reload this runtime.\\n\",\n    \"\\n\",\n    \"```bash\\n\",\n    \"python -m spacy download en_core_web_md\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"Once all the dependencies are installed and loaded, we can start building our hangar repository.\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"### Hangar Repository creation and column init\\n\",\n    \"We will create a repository and initialize one column named `images` now for a quick demo of how Tensorflow dataloader work. 
Then we wipe the current repository and create new columns for later portions.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"HGXOwLJ3IWPq\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"repo_path = 'hangar_repo'\\n\",\n    \"username = 'hhsecond'\\n\",\n    \"email = 'sherin@tensorwerk.com'\\n\",\n    \"img_shape = (299, 299, 3)\\n\",\n    \"image_dir = '/content/drive/My Drive/train2014'\\n\",\n    \"annotation_file = ''\\n\",\n    \"import logging\\n\",\n    \"logging.getLogger(\\\"tensorflow\\\").setLevel(logging.ERROR)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 34\n    },\n    \"colab_type\": \"code\",\n    \"id\": \"fHehOEhwCMej\",\n    \"outputId\": \"210f9b87-9c59-49ea-fd31-92ba18d140b3\"\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Hangar Repo initialized at: hangar_repo/.hangar\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"import os\\n\",\n    \"from hangar import Repository\\n\",\n    \"import tensorflow as tf\\n\",\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"tf.compat.v1.enable_eager_execution()\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"if not os.path.isdir(repo_path):\\n\",\n    \"    os.mkdir(repo_path)\\n\",\n    \"\\n\",\n    \"repo = Repository(repo_path)\\n\",\n    \"repo.init(user_name=username, user_email=email, remove_old=True)\\n\",\n    \"co = repo.checkout(write=True)\\n\",\n    \"\\n\",\n    \"images_column = co.add_ndarray_column('images', shape=img_shape, dtype=np.uint8,)\\n\",\n    \"co.commit('column init')\\n\",\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"QENDY8LvGGhb\"\n   },\n   \"source\": [\n    \"### Add sample images\\n\",\n    \"Here we add a few images to the repository and show how we can load this data as a Tensorflow dataloader. 
We will use the ideas we learn here in the later portions to build a fully-fledged training loop.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"g61tY81hHr8c\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"from PIL import Image\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"co = repo.checkout(write=True)\\n\",\n    \"images_column = co.columns['images']\\n\",\n    \"try:\\n\",\n    \"    for i, file in enumerate(os.listdir(image_dir)):\\n\",\n    \"        pil_img = Image.open(os.path.join(image_dir, file))\\n\",\n    \"        if pil_img.mode == 'L':\\n\",\n    \"            pil_img = pil_img.convert('RGB')\\n\",\n    \"        img = pil_img.resize(img_shape[:-1])\\n\",\n    \"        img = np.array(img)\\n\",\n    \"        images_column[i] = img\\n\",\n    \"        if i != 0 and i % 2 == 0:  # stop after the 3rd image (index 2)\\n\",\n    \"            break\\n\",\n    \"except Exception as e:\\n\",\n    \"    print('Exception', e)\\n\",\n    \"    co.close()\\n\",\n    \"    raise e\\n\",\n    \"co.commit('added image')\\n\",\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"dvFci5P8Lm7C\"\n   },\n   \"source\": [\n    \"### Let's make a Tensorflow dataloader\\n\",\n    \"Hangar provides `make_numpy_dataset`, `make_tensorflow_dataset` & `make_torch_dataset` for creating Numpy, Tensorflow & PyTorch datasets from Hangar columns. You can read more about it in the [documentation](https://hangar-py.readthedocs.io/en/latest/api.html#ml-framework-dataloaders). Next we'll make a Tensorflow dataset and loop over it to verify that we've got a proper Tensorflow dataset.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"Sc7XGXMVLuDO\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"from hangar.dataset import make_tensorflow_dataset\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 5,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 357\n    },\n    \"colab_type\": \"code\",\n    \"id\": \"tb5g_JrJVbqT\",\n    \"outputId\": \"a8fe4e7d-243d-4dae-dc94-66364342a913\"\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \" * Checking out BRANCH: master with current HEAD: b769f6d49a7dbb3dcd4f7c6e1c2a32696fd4128f\\n\",\n      \"<class 'hangar.columns.arrayset.ArraysetDataReader'>(repo_pth=hangar_repo/.hangar, aset_name=images, default_schema_hash=b6edf0320f20, isVar=False, varMaxShape=(299, 299, 3), varDtypeNum=2, mode=r)\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/usr/local/lib/python3.6/dist-packages/hangar/dataloaders/tfloader.py:88: UserWarning: Dataloaders are experimental in the current release.\\n\",\n      \"  warnings.warn(\\\"Dataloaders are experimental in the current release.\\\", UserWarning)\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"image/png\": 
\"iVBORw0KGgoAAAANSUhEUgAAAQUAAAD8CAYAAAB+fLH0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzsvGmQJFd57/3LrKx937eu3rfp7pme\\nTSPNIo1GGgkhWUiAAGFWs5nF4Gtsv8Zm032xMS9gVoMBGxAYLEAgJCEb7RrNII00+9I93T3dPb13\\nVXXte1ZVLvfD6BLXEe8N60ZYcXHE/L5lxsk85zyZzz+f85yTR9B1nStc4QpX+J+I/7cbcIUrXOF3\\niyuicIUrXOHfcUUUrnCFK/w7rojCFa5whX/HFVG4whWu8O+4IgpXuMIV/h2vmCgIgnCLIAgzgiDM\\nCYLwsVeqnitc4Qr/uQivxDoFQRAMwEXgJmAVOA68Wdf1C//plV3hClf4T+WVihR2AXO6rl/Sdb0F\\n/AS44xWq6wpXuMJ/ItIrdN84sPK/HK8CV//vCjtsFj0ScAKgKCq6rqHrOoIooCgqJpOZdquFIIqY\\njEYEQUDXdVRVQ9N0DAYR0Gm3FcwmM7V6HZfLgdyUMRnNoGvogCCItFtNTGYjrVaLdruNZDACOqIo\\nIhpEJMmILDcRRQNGoxFFaWMymahWa9hsNhSliarqgIhBlBAMOu12G4MoIRkN6JpOq9XCbLEiCgKS\\nwUCxVMTpdKJqbQTdQKvdxGCQMEgGVFVDEAQMBgPtdguTyYyIgChKNOTaS2VULGYz7XYbo0FCF0Uq\\nxQJOhwMNAUEU0TQdAWgrCibT5T5dvq8RVdPQNQ1dAMlgAAR0VaUhNzAYDBiky6+BZJAQBBB1kVRq\\nFUHT8IRDSCYbqAoIICKCeNn+mqaiaRq6pmMymzAYDMiyjFGS0AH0y21AAHQBQeByH4xGGnIDk9mM\\nKBgwGAxoukq71UKSTJfbIhmoVauYTGYsZivLyyuUqjX6+zuRDCYUpY0kGQFotZpYrdbfPlNN0ZAk\\nCdEgIYgGjEYQMbxkB5nV1TTxrhi6BqJgQGm3kYwGAFT18rUgoLSbNJtNLBYHgqih6yCI6uUyBglU\\ngUw2i9PtwmIxo2kaICK+1Od2u4UkSfzPaFwUL9u32ZQxGk0IgoCqKAiiADqomvbbPl1+frx07eW6\\nLxsSBAFURb38zigKJpMJTdPQNA1JMiEKIqqmIggaqqaiaxqVSpmVVCWr63rwP3LeV0oU/kMEQXgf\\n8D4Aj9PC33zkdiqVEpIksbywitlqIxiNMX9plo5YGKWtUa3WcLvd2O12pqYuYDCYuP32W/nM//sp\\n3vWu97C0tEIoGGdpaZ5Go4HH60TVFUaGN6EoGtVag2x6DaPRSH/f4EsW0HjooYfo7u6mJ9HHt751\\nL5/5yqeZnLhAOp3huv27KRRy+P1BctkycrOErguEwj6qlRZNWcFsNpIrFLGYvPgDDowWM8nkGmq7\\nTSTg5/zFJTriYbS2Rmb2LIV8g46BUbo3deBweUmn0zgsZk6dOMHpE+cwqSJvevMbCXeHkdUmiga1\\nhozfebk/Ykulpmvkk2keu/8BbrjlIN5IkEqzQaupoIlGOiJRAoEA7XaTZrNNrVaiXK7idnuJRyO0\\n5TImm5NCoYDT5+DE8VPs2bkdo9HIT771j7gjO9h8zVZ+8MWvs//mLWSLBm55/R1MTp1n55YttIBz\\n58+wc8suCuUSkxfOcN3+G1heXGJkZBMzU1PYbDZUtc3p0yfp6enD7w3Q1dXFgw8+yGvfcBeVSgXR\\nABPnL+BwuRAFiUQiQWYjhaqrBII+jGaRoe1v4guf/APe9+nP8dcfeDcOW5DBER/btm0jl5Eplkr0\\n9XZjNBqZuzhF0O1mZX2e3pEB2oqA39PNn3/oY1xaneajf/QeLHYLs+urZFeT/P5b309Xb4yNTJJc\\nuoLFYqPWqGMxGTHbdSSrkdZahUDEwyOPPc3pYyfZvf9abjx4PWanhKgbQBU5fPgwbU1l+/YtaJpG\\nq9UC0YDdbqeQy2ISBQqVKh2JGIV8CY/HRzabJxhyUyyUsdkcpNMZAgEfitJCVVUi0SDlSpHcRoZE\\nZz9ra2na7SbRaBS1VcPhdKLrOtOzFwkEglQqFUKhIOvJVfK5Itu3byeXy+H2OGk0GvTued/Sy/HN\\nV2r4sAYk/pfjjpfO/RZd17+j6/pOXdd3OmwmlpZWkOUW2WweXVex2a1ks1li0Q7QRVwuN9FonFKp\\ngslkwuPxoGkaU9OTHDhwIzPTc9RqNWq1CkaTBIKO1WrF5/NRKBRot9ts3boVu92OyWihVKogmS2U\\nijXuev3dtFsqPp+PW24+QDK5RiwWY3Cwn2q1iihKJJNpNFXEbLbidLopl2QETFjNEucnzqEoCugK\\ntVKRarlIo9qgp6Mbv8NPOBKn2mgRCAfYSGZoNmWcLjOSRSSZSdPWdboGBrj2wH7e/Pa72bt3L/98\\n7485e+okrWYdm0UivbqOrus4HA4mpy5QLhTxeFxs3baZp554lIDHjd3mxGixYrfbOfLccyiKgqZp\\n+Hw+isUifX09KK0mVrORSqVCq9Wi1VZBFLjuwD5WVhewmU2szyxy5+vuYmBolPf/+Z+wY88efvmL\\nXxDyB+hNdFNr1Dl3/gwel5PJyUl8Pg9btowxMTGBw2FnYvIcmqaQz2ep1Spce91efD4P8XicCxcu\\nEIlEmDh3jpYsk0mlGRsbQ9d1ItEQL7xwmGKxwOzMLKm1NBbFTFg08tZ3voND3/8B1123jze+6XVs\\nHd9FLtuk1qgSjgQAkGWZcDiK0+mkr3sU2lZEwYQn0cfy8gXu+8H3KWbT7D6wl3e+9Z3cfPMN/Nuv\\nHuHi1Dwuu49GrYbf7ycUCeO021HbGrFglJ8+8HPuuP1dfP3r3+Mzf/0xXn37zWSLKXKFHBfnZkmn\\nM9x4400EA2EWly6RSq/TbDYxm600Gk28Li/JZJKBgSFy2QJGo5FCIYfL5SCdymE2W6lWy4iihs3m\\nQJZbWK1W6vU6RulyhFipVEgkEjidTkCjVm9y/vwElUqVRCyOJIgEfX5MEvg9broScWamJjFJImqr\\nSa1ceNnO+0olGiUuJxpv5LIYHAd+X9f1yf+/8t0xr/7e27fjdDqp1+tUKyk8viDBaIJ2S8DvdZLN\\nZjGbrKyuruJ0Omm2GkSjIeqNMi57AFluEYtFOHXmML193WgqVCp1Ep2dHHr6Wfr7B9F0gdTaAsFA\\nnPVUmgM3HiCd3kCWWyhtmXg4QC6Xw2A2oaoqTqeX1ZU1cvkUb3jj67g0t0E4HObkiXMMb+pBURSM\\nYpH5xVVC4R5sJgMbhQoSArVaDV8oyIlzZ7hq21XIlSY2pUTZYCHWGaSUXGPq/AXc/iE6+4fweFws\
\nrszRquXxRH2IZoGIOc6n/9tfIjcUPvf3X6BUy5Mr5xke3ITf7eLEiRdJr13CoMFGusANd76ZuaUF\\nevp6ELQ2WlvB4fAwMzXN0EgvhVyero4uzpw9wdaxLZw4ewHRYCLeE6fVahFz2Lg4d4lf/sPf8+Gv\\nfxmn5iCn5nj0Z7/kjz/6J/zxu/+Y/+cT91DUKnR2dtCSG9QbGk25hskM7baEprZxOu2YDBKFYg7Q\\nsNttrK8naVYVTEYLTqeTRG838/OzBAIByrUqPn+QdDpNR9RDS24hNwQcDjuSw8zJI4cJhHsJRSI0\\nlSxWsw9dqOHxxFEVA81mAwEFWW7QkQixeHEGm8WKy+fF43Vz/b7X8s2vf5FqI89vDr/IzXfcic0s\\nYjE7UdpNTKY6jUaNp546xY5rdjKyZYTUUop7/uLzZHNlfnXyB2SXNliYm8cT7cZmMlEuZmnKKi67\\nh+9/59u8/R3vQBcFjOY2ug5GyUpT0SmXCgz39VEpZpGsPhS1iSCoyLKMpisoLYlQ2Icg6ChqE7ku\\noOkKyeQa1VqRSCRMdmOd8W27XhIQI1abGb1toFotk06n8Xhd+P2XP5KlcoZAIECj0WZ1OYPL7cBs\\nlkill9n7xr85qev6zv/If1+RSEHXdQX4I+AxYAr42f9OEODyWCsa62DT6Ag9fb0MbtpGIjGE0+qn\\nXi2ycGmNQqFAo9FEEAwoioIgCNhdLtRmgwsT50itL3Hk0LMU8y0k0c6FqTkkg4mgL87evdciGtro\\nuozDYcfj8bJvzx7m56bxe324XXZqlSyNZg2L00JPVxcOWxhZ1ojG3cSjUc6dmqJSlZFl6Or0k0yt\\nogsa5ZqR/sFttHUdg8mO0WCi2VYJhhJ4XSFG+odZW04jGS14+sZweXzUqyqbd+ylY2CEzr4YDqeR\\nQ089ik2S6O3qZ6RnjHpaoa42ufP9d/OeT34QzS6QTa/z8H33c+jBB3nuiccopLMoeAl1D7P/lps4\\n+fwLbN80hkWwEo91E48n8Pu9DI8MITcUeru6OXfuDF5/mI1Cle1bx1hdmqNZlUlEYigGC5VSnt7B\\nQZR6i/VclrXZJB/68Af50he+yPv+5MMk+jtZnbqEUteYnbpIuVik1VSQZQG320M0EkcXoFGvYxAk\\n4rEu0skcEX+cnp4uOrpjBCNBVlYWiMUi2GwWQn4fQZ+DaMhFNieTLykEO0J0DXZx/OgJbr3pIE6/\\nke9+66u4PSEEg0C7aURptWk1ywQCPgTquH0SK8k1bA4Hjz/zOF/58r20WhJ/99W/xhGwMjQySjjh\\npqcvRktXyeQWaDQzZHMNesYPkC81+eLffpF7v/UPdPWGueXOq3jNm64hP6dQ2JAJ+CNYkdFbFQSt\\njdNqx+d1c/fdb+DZQ8/gd7sRjWY8Xj+IOiZJIeB30lSa2Lwu5i6d459/9F1EUcTpdGIx26jJZUwW\\nK+vrG6CbSafTNOsN+nq68Lq8lIp1dNGO3Lj80VMUBVUR0DCgqA18ASs2h4aADVmW0QUzHm+cekMn\\nU1zH4bFispkYGdnysv33FYkU/k9JRLz65/70TRiNRvL5LJ2JMEefP0Ys1kEk5uKpJw+x+5prEQUr\\nui5Qr5cJhjw0ag2aSoVmXWcjuYzb7eXY8VO86tZb0HUVVWtiMzlotNoEQy5MBiOqqjM5cRaX243F\\n5qanr5fkepqAL0SzUWN+8RJ2ixmXx0+5WiOzscJg3zCNRo1wvIOFhUWS62vs3befUKSD9ZV1BINE\\nW1WQizlcXgeNRhOH3QPAwqUpduzeR2ojgyAItOQGaqtNu90mHAuzvJrEZnUiahp2s4lcZoNAIMB6\\nMoPD62agv5O15DoWm4tqqYjdZqNeyfKNr34NuQrf/dGPOXbqKLVyhuzKOjomovFOPF0xOjs7+N63\\nvsOffPTP0DSNWjnDs88eZt+1B5mZmWFkdJBquYjTE8FgNpFaT/LkAz9nfGiQ0Gg/WhO2bt3KenqV\\npx9/nN3XXMfw2GYefvBB9u25FlVqM3lhHq/Xhy/ooVSs4PN40TSN2ZkpNm8e5aGHHuKNr3s9+XyW\\nTWNbmZqeYmpqhv6hXlRVxe/xk81mCYfDOBwOnnzmaW6+6RbUpkwmm+TGgx/kzz54gP0H7yDRGaeh\\niShqi+XFRQ4fOcT+/fuJhoIYTEYMRhPhUBSt1WbT4E386t/+AZfTSyAcYvL8KYYHNzEze4p4Ty+S\\n6ABRoF4p8/rb3s3WreP8048+yeLSHHarj0ymhaZWCfojTFx4HgQJjy9KJDiA0SLgcTuZmjqPomh0\\nRrtZXl7i0qU5duzbBYDVbEGSJMrlIq22TCTcgdpWqNdl/L4gP/nJT7nllltYSy0RjycI+gNMTEwQ\\ni0VwWB3Iskw+nyUYCVIuVzFLRsrlMoFAAIvFwtmJ5wiGutBUCZMZrJKFaqOIw+0iub6BxWxD09tE\\no1GKxRK1Wo3NN3/k/16k8H+KUTJSKuTIbqyzvHiJudklXE4fVoubfK7CyKZxMpk8VpuBWj1LvVEm\\nky6QWl+no6ubixfnMJuNKIrMnt07KReKoIIkiAiiTrlQQm1pGAwGjr9wio5ECK/Xxuz0LOvrq4gG\\n+OQnP4MuGPD6/Dg8NnK5HEajkVAwzupKmnZLRJbbuFwebrnpIOfOnuAH934Tt92E02YkubIEBhEQ\\nWVq+xMXZCSYmz+D1+3n83/6VXDKJ2JSRtBaNaoFGvU56Pc2+a65iZXGKsc39PPDQz9hx1XbOnT9F\\nX0+USiGPKIo0alVaDRmbw4vN5cYXCvNHf/JR3vKut/DBt7+FuRdeRMvXKcsK3/veU5x78TTDvd3k\\n0uu8+fffytLSEsVikTOnJ9k0vBnJqKHSwu8Ls7acQhKaaGqTYMBJo1Zj/Kpd9Pb0UG+UmZ69iCia\\n0IC63OSH3/0h267ayd9/7atMTExgNoi4XJcTXhaT+beZ+56ePpxON+98xztIra1z+vRZzp47zXo6\\nRTASxGJ10dM9SLUuMzY2xvHjJ/n4xz/Jrh27SK4s8/RDv+Yzf/YZNvfbefu7PkDv4CCZXA6jGQyi\\nzpaxEf7g7e9g25ZxVldXaTcbGPQW81NT3HD9rSysPovfb6bSLHB+8izBiJ/1VJItm3fgsAV48MEn\\neNXBNzA7P8ezL/6Cz33lL1lda2C1RDEZnVhtXuI9IeZXphjoGSER6SYWjKJR+20uplorY7UZUbQW\\nkViQldUFDIJEb98giqKg62aCoQQuV5BSucGpk+ewmG184xvf4M7X3o5Oi1i0C1mWWVtbweuzs7i4\\niNwqkExfpKMzSD5fxGzVOHzkKbw+N41Gg2QyicvppVgsEo2GMZo0WkqJUjFLqVjF7XYTDIdoNGXM\\nVgvNdhOXx/Wy/fF3QhREUaClKLjdboaGBrHbrYTCAWq1Cn5fiEg0SK1eRNc16k0ZQRIpNXI4vR6y\
\n62U6O2JYrU6GBjfTqLc5dfJFCvkNDj3zJA63nWcPPUex1KAsa2zaPE4w0IPN5uPW19xJvtAkn6vw\\n8Y//KZlMGpvFjq4ZcTq99PcOIBp0wrEwXX29SJJEpVSk0VTYdfUeOhM96IKRcqmKx+2kUMojiBL1\\nukYs3ofB5KAsK3R3DxOOdJDOprHbXGQ2inT3xGnIZZ595jCdHQnS2Qyf+ORfcvToURqyApLARnqZ\\ndDpNOBzHarYQCvhBBb0NnZ1djI5v5ff/8PcZ2b2VruEBTj9+lDt2x7HpWbr9Qc4enaJQTWFzWtC1\\nFkajkUhHHMlopV6tcfjIIYY3DeL3BclkclgEC5FIkP5d17C2kEQwuYl0JDh75gJ33vE6bDaNt77j\\nbh554AHe/5GP8OzjRxjfNIraqmOz2SiWNqhVM+QLaWw2G9VqHbnRwukO4A3HCEU6iMcSjI+P05Rr\\nVKsVNE2jWMxz08F93Hn7QZRCA63U4ImjkxydzXDfL+6nbTSTKedp0mbp0hpWi535+XlsNgfLy8uo\\nkoDdIrC2sMHdb/kIh55+iEvLy5w5O4HV4qKvP4jX5eWBn/6Ia8Zei65UuenAVfzqkX/EJBlZXFwl\\nl8sgN+tIkolkMkk+s0glq7B5aAe6aEeQrOiCitPpRKDFRnqN3v4xLBYvxUYZk9NGZ1cPtWaLZHID\\nu9WHqjTZSKWoloqocpGbf+9GLs6c4S1vez2pzAomixGDpGGSjLjdbmxmC8GAg0pZJhYZIJ3K4nEb\\n8PrC7D9wM063F7MDgnE7RquTTZsHOXrs15w/sYDZLBEMhJFEFafDgCQ26Ip1kN3IEvAFEISXPyL4\\nnRCF5kvhdCAQ4NjxF2jIVVS1TSQSoViocmlujbHRbaAbETAhGazUq20adZ3pqYs4HHaazQaHjzzN\\n8MgQ27buQpZlRjYPMn9xjrvf+hZ0ScQgmTA7LMzMzODzhrgwMUUsEgJdYXZumlg8zEYmidvjoacn\\nwZmzxxGQ8HoiLCwssby8SDAYpFgscmFikuTaOmazmUajgclkwul0UiqV6O7sQW0rOO12OqMJ1tZX\\nWVhYIBLtolgss2PHDmoVhe6uTdTlMnannUszCxx7/iSxDj8DQx1cWlwi0t3HP3/726zNzpDMbrCW\\nTtFWVXSDxHoqi8loY3RsG5ValZWNFRJbO3H2hnFFE3z8Yx/j2j3bsEteWi0Fo9GIz+sERWUjlSGR\\nSDAy3EO1VuDM6QtoqkCrXiMaiZBNrZNOpfB47Whai76BHmr1CkefP06pkWPTSD+dkQhXX72DL3zp\\nczisDgyaxsBwP9F4Ao/HR71colUt8/yRZyjWigwODlEul7FYTRz9zXO4bFZajQoP/uw+Hvr5/eQ2\\ncmzaNEJJzzG5NsU/PnqEe7/zeUxuC3MXZ7Fb7WzZMkYiFuXwoUPUGk0aLZVYohe3zYfV0UummOXr\\nX/8Y1RZEPQl2b92HpDfJXpS5ftvrGB7t5bmZ+9BabWQ5j6CqjI8OYrOIxCIBgn4fpVIJm82GP+Al\\nvZHk2LHnWViY59KlS6ytrKG2asg1GaPRDHobq11A0DVEBHp6egj6/LjcTtbXF3E4HCiKitPhI5et\\n8emP/yUtXaUhtyjmShhEXpoybpJOZ3A4HICC1+tFEAyUSkV0ocXUhRl02lTLeTbWs5w7OYXDbmJ6\\n4iI37r+FXbsGyGxU0XUDqbVF6uUc6dVLbCSX8LlsaK06kvby/fF3QhQMBhGj0cjs7CzX7ruezs5O\\n/H4f09MXmJufxmgCQdCYm5/G4bBhsUoMDffTkQhgdaqkMutsGd9BV1cPiyvL6MY2mzaP4ff1E470\\nYpB0wn4vhfU0CAbMdifFcg2jSSG5vnR5Hr23HwGJSDjK5LnjrK6t4PMGCQRCPPfcYcxmIwaDgXQm\\niyy3GOgdoLuzk/VUklKlTLlSIRIOkE2vYpTatOQqhUwKj8OMqrYxWU2kNnIMDA+RL6SpVCqkUusM\\n9I8hYGG4vwdFbZHNNimVNIYHB0jOTvHRT3yC+MAQM2dP0hHw8MLhp5ibmiAc9DI/P8ul5SW2bttB\\nwB/h3X/4UW59/dsxBzroGRrg2JGnMGtNRuLd6PU2Y1vGLtsSmaG+flaWswQ8HVjtAomOALrWIhzy\\nUShm6Bvoxmq1Ui6XiccTZPN5LFYPjz30PE5vkCce/xXxWIgPfOAPOfbCUaxGCVGQmJqcRGy3MRng\\nzOnjbN+xg96eATaSKfp6eqiVa8RiHXz9S3+PWbSQTmd469veQSZT4Af33kduo8Ktr7qDF3/xJXZc\\nM0JydhKMTTShjq5InD31IvGwj+GhbhrVHOXiOpHeAMNbD1LOpIkF/VyYOol1Uw8//fEDfO+7P2L4\\nYA//dN+nuXrXXs68OI2qC+iahN1up1ZrMj+3xNLSCqdOHMNqNuL3e2lrbbq6uti5cydyPcvIcCfN\\nRhWXN0C0I44s11HaDXRdZyOTIpVcwWQUMJlUKqU0SquOqlZQ1Qp/89lP0dfXxfv/8MOMDm0n5I8z\\nMjKCrtTIZDJoWot8foMnn3yaUrFFW6lTLK9hMBtotSwE/F4ikQhNuYHP6yUejWOVPHQnOllfTVIq\\nZ/H6fBgkiUCoh4kLayyulKnUGiwuLzE5dYGV9MtaogD8jiQae+IB/Y/esI/u7k6SqRUEUSMa6aBc\\nlZm7tIDb4QG9jtsTxe8PsJFJIqgKzVYVr9+B0WAjn10nFOnAarUjCgZUVcfrc7CwMI8oGXG5HAia\\nmckL0zTkCl6vh6t2baOQryCZrKylJ7EZOpAbNdwOJy63mWIpz/j2XZw9e45MPoeoQyLRRaOqUK8W\\ncXvsmKxuSqUSdruVibMn6evr5dL8LPuv3YdgkJidu4RmcLBp0yZ0tc2hJ5/g4MEb0ESVSzNztDHg\\ndftwW0SMbjeptQzdHV1cnJvBICqcO3uCgYEhxsZ28sPvfIuBkSH27r+OyekFApEYIwMdTM5Mky+W\\nuX7H1cytrqO3Feq1Er/42T9jVdtMTZZ553vuomvrEA5nEJvNgUk0UCjliXVFyWwkKWZyZPIpRoc2\\nkyk1aVTKjG0eRG60yBZr2K02Ah43Dzx4P/uvuw5vyM3D9z3A1fv2c/9DD/H+d72ParMCZtCLNU6f\\nP0vPwACaYGV4dJxmqcz9P/8J4UCIG/Zfz9Ejz3DmzDk2bd6C3Cxitpi4/sbradXbGIwGbEEfG6tL\\n1Opl3P4wlUKVoC/G9ORpIsEENUXH7DXhtFqYfP44Fc3M7a/ZzYXJc2wb38emrhv4/s++SKI7Siad\\npiPqQ1VE5LYBxBqRcILpmbP4ggESnX0szF/CZDLRrF92dLs1QEMuk8un0IQK9VqL/p5BTEYnuWwJ\\nDCqbNu8glykhSG18XjcXzpzC6vbT093D1OmzJHo6efSxf+OuN93F6sISIg0kswtEncnJ87idPvr6\
\nh5CVGlq7xcpimkRfnGZDw2Sy0dbqWKwCzWqLdkvFbrfTbLYoFEoEAl6yuQ1URScS9bGRLOLx+PD7\\njSwtLONyusnksnh8Abp7+1heT9K35z3/dRKNOhqdXTEq1RLbtu1AQmf24hQWk8hAfycdcS/j49uw\\n2Wzk8zlkuY7T6cRsdiKZHMgNlUyqSCwcv/wFTidpt5skk0lMJgvr6+tspNdxOy3sunobPp+NmekJ\\n5qfnWFvOEPLGqBU1LFaRp55+FFWTmZycxOPxMDN9gXw+z/DwMJJkQlEUiqUsvX2dlEp5LCYzZpNE\\nuyVz22230ZQVto5vJ5nKUinXiMTi9PV0sbayxKmTx9l59S5yuRyFfBGz2YzRaMThtFGTK8hyjcWl\\ni8xfmiQWD9BsNuns7qLRaCAIAre/4bWkc1n0tsLQ9jEG+/tYmZzBaTIT8Pn4+a8eIBoLY7dbOXHq\\nJG9569uxudzccPMOvvD5+9AVI5fmLzA9dYrDzz3O/KUJcvl1JKOG03V5OqxaLaPrKlvGNvPorx9H\\nkIx0dHZiMBhYWlpi06ZRAv4QqdQGtUadbG6Dmw7egNNpp91uorYVFF1l77X7iMfj9Pb2snhpgYpc\\n5X3vfy81uc4vf/UAkqhxYeIMPo+LUqnC/usPsLqexuoK8pvnz1PINXBabQR8HWgtHaPBBJpGz6bN\\nlMUqosnKYKIXuVLiZw88yZbRMLIsMzZyHbt33MJTz32fru4oxXKGcNjJpflFmu02mUzypdWxVWKR\\nPpJreaYn5+mI9TM/s4TZbMXhcKGobcxmMzabnXCwmy1jV+F2BXA5ffgCXhwOB5nMKopWIZVapFhK\\nIxl1VtZnWVyewmhSeO43v+FRhYgaAAAgAElEQVTpJ58iu7bO2soKJqOZaqWGquhs377z8tJ3XaeY\\nrxHwRylXS8i5JmvzSR5/+HEqqRqTJ+dwu92sra3TarUJhUIsLy/TaLRIdHSjqQZqVQWnz01LaSNK\\nJpweP2aLg1AwRrFYZXkpiVFyvGx/NNxzzz2vnLe/TL72lS/cs3ush1w2j6apPP/8EUZGNiPLbZLJ\\nDKFgiFq1QbVew2QysLQ8h81iwm4yYTSKNJttAhEXjVaNSrmGy+XAZrfTaMgIGMhnNshuJOnr7kfR\\nIBSOceDAQXKFJCur53E5wWRwcuLUETq7OojGEjhddorlAlbJxDNPPknI52V+dga13WZsyyjVmkwo\\n1ImsaHR297CwvExHRwIdDZfHS6VSJt6V4NnDT2HQNLwuB36/h3whh9fvpVorIYoi/kgYm8XG8eeO\\nkcymuOnAftqtBiaTke6uHmbmpvEFfKyurNLZ3U3EH6RplDj666eYPnue+ECczmiYH337+7zxTW/k\\nm1/5Mk25gslsIp3cIF8t8aEP/ynrmXmuumqQT/3FFzh99BTv/eD7UBSdZDqLroHfEySZWsNuNjM7\\nu4DX4yGTSbO6vIxZEnG7LSRTy4wMD/HlL36eLZvHGb9mG81GiVa1wcljp7j+wI0cO/Qcozt2oCEh\\nN2TymTXUZp5YZw+nT57hzMnjvPudbyMSDiJKBlIb60RCPi6cmyCzkeGLf/cDHnjkcV516zitJpgs\\nXpx2E416neT6MgaDkd7ODmjX0LQaf/7hv+aHv7yXt9zxRhZSWfr7A7z1D+7CYrNhlAx4TA4OPfMM\\no1t3Uq3VsTtsCBipFGtYbRba7QZ+j4tUcgWH04SmqhgkA6IBnE4HBoOExeymraksry1RlxsIkhWL\\nxU2zJbO2skJnRy/JlSTPP3cMk8OHy+7jFz/5BVdfu4fde/didztwej20hBaPP/oUo6NbUDQNXzCK\\npurYLEbm5mbZunUbc0tZxneO0zvYRTqXYsvWrchykaXFFfr7+5GbNSwWIx6Pk0ajhs8XoN4o47Db\\nAAOtRoV6rU65XKJvaIBoLILTaWV64gzf+/nh5D333POd/8gffyciBUVVWFpcp9lUqdVqmEwmVlZX\\nUXWdeDxKJBaj3ijj9bpxOp3s33897XabbD6H0+VhemqOWrUJmoFWq0W1WufI4d8gICJJElarnXqt\\nQSqdIZvNMj+3xOzsPOFwmGgkRq1WJ5VKM751Mz6fh2w+g6JotFsqM7PTRKNhlpcXsVrtVKs1Usk0\\noihiMpnI5bJMTU2RSMSZvHAep9PJ4uIiiqKQSSUZ3TSELMssLi+haCLVcpl6tYIoSvh8PpYXL+Gw\\nGTn6/BES8TA2qxFJ5PKwRxS57vrrESUTt912O21ZZsvYGOVKkYO33kwkFuTE87/h2SPP8PHP/neO\\n/eY53vG2t7CwOE84HCQYCbGpv59HHv4lu6++inZL4S8/8UFUVWVjLY2IEZfdy9JCCo87SG93H1fv\\n2k0s1sHaWpLhTaOcOXseh8OFw+6mWCnTbDYYGOzB7/VRLpcZHd3Mi0dfYN+e3Xz7q19jbWWZc+fP\\nMDFxDlmWX1pqHmB9ZZWHf/kIb7zrTbRaChMTE7jdLsKxCIFgkFwux2OPPcbZ01O8at8WQqEgDqtE\\nu1Gn2WywMLfEQP8wwUCAmqJTqbZ57vCLPHZykUM/+hV3v+F2/uZzn6LRsGB1isjNKu2WxokTJ2g2\\nZFotmUKpwvTFS/T2dVMslrG77GSzGeqNGqtrKxgMArIsUylV8bgDbKSznDl9DofLiyia6Ih3Ybf5\\ncDrcKGoTk0liZGQYuVnG43VgNYvc+nu38fOf/5zhoRFcrstrVRqN+uUftTSdgYEBCvkMqfUVSpU8\\nmWySer2G1+OhVKrgd4vo7SIep8TS3CSV7DoA/oCXSrV0OVlrsSAKEnabC6NkJRKO02w2MRpEWmob\\nq92O2WpnI5enXK7SbLQYGBh62f74OxEpfPHzn73nbW+4jnqtxML8PNdffzsul4/fHDlMT3cfhVyR\\naEeMVCqJqiq0mm3sLgeBSIhqqYjPbaNZa9KoNGnUqmwa28TIyAjptTQLc9Ns27GDSDgOZhGn2Uom\\nu4zTaedf7r2fd77z3URjY7hcJi6cn6KrZwBRkOiId+L2eOlMRLFYXKwnN9i+7SpsVjterwtdU5Fr\\nDRLxMHK9wOryLEGfm410ic54J2fPnMDtdnD1rquYmJwh0dGF3Wqhp7uLldVlzCYLF+fn6OiIk06l\\niMZ6GEr0UpFl3A4XXoeXzMYGxWKZzo4eyuUys5dmeOLQE4wNDZNcXad3oIvp6QnuvOtunjvyHE8+\\n+jAX52f5zOf+lq6uPr7xzW8wMjzG5MRJxrcPMrbtahRJAqnN4vwye3eOUSiWOHP8OTYPbefRh+6n\\nZ3gbLbnK1Mw0W3fuoLO7C6vRzko2j0HVyKxvsH3vbo4fPc62zTuZWVigu7OXoc3D/PIn/8L1NxzA\\n4LOzffMYU+fPo6oCmO006yVGtwzRFQrww699hd6hPkKDfex71e+RTq8xNDjGrr3bePi+p7nhqi4u\
\nbmS5cPo85469yPJKhvWNFEd/c5ijh45w/MUjSIKXz/7Vl9magD/4izehm3yU83X6hxKcPDlFxB+n\\nkq8wvnWU1bV1YokOrFYvPX2dzJw/RygcZ25xGblWp6tjmM6uAWZmp+js7qCtaFycXmBkdIxSqYTJ\\nDrViiZlzM5RbDYJRH+nVFCYLVGsSansVp93E0888zr89/K/8909/HE1oIRmtBLxePC47qdQaDouL\\nYCBIoVjAbDLgtBlwOlyoDZFCLofb68RsgaYMK4tLjI4N0lbs2GwWLGYjRslAsVCkVq1gMZuZODdF\\nS2kSj8fYSCURBB2XM4Suifh8QWrVKkajRLGUB0XgS997+GVFCr8TovDVL33xnjsO7GYjlyPWESef\\nT3Jh6iwHb76eeq2N1+un2W6ysLDI9u1XUa1WsVrszF08j8NpJ5vfYOu2XZw6dwy330ujUiW5so7f\\n50UQNdbWU0gGCYfbydKlJVSlTaIjit9vIptNc+rUETwuN36/j3qtgs/lZyOZxm61MXVhEq8nwN49\\ne5hfuIjDYWVoaJjFxUU8Xhe5XJbHH/81N9xwAF0zY3e6eeHoMW677dVYLDaqlQbBgB+rzcjK0jyt\\nVotQMIyqasQi3XjdbgwGga5EiL/61D3cdvAG1jJZVMFIRyJOu93GZrOi6W02kln27b2W1ZVVpi/O\\nsLS6ysimHditDv714Yf5q0/cw9pGnqC3k2ImzwM/e4J3f/BDBD0hYr0DfPNLX+ehX/ya4bEurr9x\\nL088dojPfuZH2K1FwrEAXYkQ6XyWSlNmx86dlAtpDILKk488xNDoKB5fCLvNSqlUAARMJiO6ptBu\\nyvzyoZ+zdWycp59+loOveg2ptQxbNo/j8/kv28pu5+iTT9BSWshKi517b6Stw9LiJYxGD/lKhauv\\nv4Htewbp3buX4aiNJ/71QfzhbnbdsIUDB15F3+AW9t62i1tevYdX3/pRJlvTvPrWgzhsHfz4X77H\\nwsIiPR1bcNskRKGN2+tCEzV8Pj8qVoLBCKn0CsFICIvZgqDUCEc7kHWZSr1AIh6jmGvgcrlQ9BKt\\nVhOz1UGis59UdpVw1IVRt2C3OrHYDVglO1MTJ4iE+jl1/ByPPPIon/n83zIxOUmuUEDQFFRBpVyt\\nYhCMVBsydqcTu8NFWxUJh/tZX13EaFEpVnIoBj+x7iGmpk8TDPoxmZ2UWjkqhTKpVIpYLEGt2qS7\\nqw+ny8ns7CyZ7AYms0AoEMVmtdNWmjQaFVaW56mWizhsFlpyA0WR+foPH/+vIwrf+Nrf3XPDrmGs\\nNhvlcoXuzgF0XaRer6NpYDRKTE1PMj6+hUuX5shmMyhqE6NgBt3EyOgo66t1AiE33d09qG0VVdN4\\n8JcPsmXLZqamZjCbzKyl1vB7PXi9LtZWl5GMdhRVZ6BviEq9SkttUipn8Hg9ZHI5+vp7WV5PIYo2\\natUG41tHOXHiBBcvzrF161ZMJiNuVwCH3U2pWKWjo4NGo8ATTz7K5i2bSSVT1Gstunr6kJsyqqpz\\n9PmjZHM5HE4bzWaZZCqN0haw29y89z3vYvr8JJFolAceeRiXx4fXH2QjmwVdBAScTjeVSpnX3PF7\\n1OoNLkxNEgh6aGtNTp85x5133omiaPzoR9/lxhv38NP7f0qqmkYQ2ly7bRdPP3SI4b4ert6/h0S8\\ni/d95HU4zRILs0U0tUH/2Bb6+/pYXlymp7eXck3GZXfj9HppNDXsFivFYp7rbrmFQ4/+mtGRTfx/\\nn/1bXvO617Gpb5BapcHg2DAb+Sxmi4VnHnua44ePUi0VGNg0RH//AC+8cIzNW8dYmp8nGglRKufY\\nfe01HH70CL3dLqSmxPJCm9ff9V62bh1DKcYIhPy022VmXkjymhs+xPjoCMP9fgobBXTJQDAQ49Zb\\nXo2oy3z5i1+mt2+ApqwwdeEcie5OVB3y+eLlX5M1Hcmgs7SwgC/oxWD00pRFVK1Nu1FEEiVs1gCl\\n6uX81PT0CuNb9mA0erDZdarVDbyBAHPzMyQ6/Xzj69/mxeMv8tE//SgmmxOv18/gwCAuuxs0HZ/P\\nS0tt4/OF0TQNWW6yvLyM2+NG1zXsFj92m5NqMQeagbXli5iNJurVNpIBSqUK3b1dpNMZRNGE0WSm\\nVCzgcrnYs3cPVpsVva3SUtrkcmnC4RAer5dqvYiKgGQyY7MH+NI/PfBfJ6fQbissLc/RbtawmCTy\\n5SQunxW314vFbkFuyzjdPlIbGep1mVisk4A/xKXFebo6Y2iqisNRQ1fqTJ45TaPVwOZwccudB/EF\\nY+zYtZWN3Coeqwu/L0Qg7EJWZAZHxilUymiSCYvdj6a5cNhj2P0BRAP8/P6fYrdYiUW9lMopRNHI\\ntu3jDI6MMjkxQy5d5PS5k5isEh3dcQIxD7lClfe+972gK3R0xlhanWUlmWLywjzBWIih0T72XncN\\nzaaOIDhx2F1cddVVPPDIIxx98RjJXJovfOozvPtdH8Buc2IzGYkmOjBaHXhDcVq6RCQ+wOHDxzh8\\n+AjxeJzk+ga9vSNcd90BVpaW+eEPv8PIpkE2NjZIzi5w7bZdbN08yjNPPMxr7r4GyS6CwUU5W+fo\\nmQXWMiqf/eYDDA5dw+LsGpVaDUGyk8mXWVlOsvfAfn764x/yr794mOXFVbq7e1mcmMHpcvDU07/m\\nmmvGabYMfPfH32XX3s2ceP4FbBaJE889x74bruMPPvwOCtkC6/Or3PvdH/KWd72btqIzumUnFy5M\\nE/H4eezBX3HLOz/NsdPL6JoBi8PAamGVUqNB93Y3hlaBL/zt5+hK+JlA4V/u+yyl5Vl88Q4ERWN8\\ndCuzFxfxh8K8/yMfRH5pOu/Xv3ocSXJQypcoVwq4vAHMJgelcpvR8euwmN1MTh4nEnVezn9EYtj9\\nTjQjRBNDGExhOrvDLC6dJpNbxB2IohotLM4uMDi0ma997V5iYQ8f/NAfsnnHDlwOJ4uLy9TrLdZW\\nL2I1KhSySVpNhY2N9cv7hIQjRKNhJEMDu0OirVcwWAx09XWzMH+Wjo4YdqcNi82I3Rpi685x6jWZ\\nWCRKojsBBrBbrXQk4kxPTVLKFjEYBARNxWKx0WopSCYz0VgPfl8UXRVp1rMv2x9/JyKFL//d5+65\\n48bdLC4uo2otjGYTZrMFs8WEgBmP24/ZbCCdWqero5N2q0mlXMBqMFIplVhanEfT20xOTvDa195F\\nJpdlYuI80ViYtdUskUgQp8uF3+djI58hHAyTTmUwSy5URUbVmjTqLYJ+H2ajkeR6HotZ5IaD1yKZ\\nBZKpDaqVGoVSmURnAslgxWK1YDabicTixONx1tbWKBVr2KwebHY7kxMTL+3/UCbkD1As5NBUje6u\\nPmxWNyASj3egKC2OnzrBHXe9jvnZBXx+Pz2dnWzbsYVMpsRKag2Py8Py8hrhkIffHDnE7mu2Ewz5\
\nmVOnHmPm2GGi4V3URh07u7ssPFgCpZIOXwe1WpGNpXkezt2hVm2g02lx9w7i6Rmn1pJyd/YSO4El\\nKqUC2UyFpqyJw2GmWSpTSFXo7OuhmM/i7xlAq9VRKZdJpjKAlHw2zzMvPs0TF84T2AuzsrHNyOQ4\\nof09Kk3I5dI0C0Usai0XnnmaaquBUaUhny1STAcoZdJMDE1Spoa8DV5/D81Wk4PVbdQ6G3duXKF3\\nYIztnQCSZpOt3V08Hh+hcIxcOk5gZ4l8OMrh41Mgl1Eu5tAp1MhoUCqnGDp2GIvBAtUcYqNJIllg\\nZHiQgWE/6UyCfLHI2KEp7J5Ofv/PfjLy0k9FUvjSF3/vJZeyzq1bt3nizGMszt2m2aiTSKbx+fwY\\nTWYy2Rxd3V5SyeijCyuzc5h1FsrlGvVGi0AwgN6go94qEgrF6O8fRa0y4fV0c+3Gu3g8ZuQyOa//\\n4MfMzJylWGwS3g8RDK3T2++lUmqxtxsguBvEYtYwvzjP4uJDLjxxnkg4iFoukMmm0Wkf7To0KlXc\\nLhd3782iVeuQSbSoTRrWHu7RbudJpw6oVauceuxJ9jbXuH3zKr5OM9du3KUplSJtqlhaXoSWFK1e\\nz7kzT7G1vcxuIMj04SPMHB7j4fwCCqWCZrVGf+8Ab7x+lVK1zPMvXmBlfZN0poDH10GH24LHY8Vi\\nsSOVylGrtYSjAeqNKq0G7O6F6erxc/HS28RicZQqNUuLKxQKJWqVGr5OL8FgCLXWQKlSwN/lw+30\\n8errb3PhuY8RCuzSFnUIyjYdDhtf/94dbDqBQ1YpM4NlvvTKOqePTvNXv/pZfHqBd669TzybQqNW\\nMTI8hKfDg0RUIuZyVPMZrFYbrs4eVFYLCoWK1fk59DI5Oo2CTDpDqVhDrtcwOj2BXq1jaf4OhVIa\\nnVpDq6LE5+tGVMo4feZxLBYnm/sRhnpHOP/hj6BTq1GqNNg8PVitZgrZLcZGjzA8eQqns4N8Mo7Z\\nYCMaToBSyz9962scnT6CtNEikUgjEWVEwyFmjk+wubGE22UjmSzgsLvZ39tHKshwO700Wk0c7i7y\\npTZ3b76FTKlFpbXSEOqsLDykt8uPyeGkqVDgmzlCNVaiXIZccJFitsnYybNcvfoGp06cIB47wO9z\\noZC0iCazzEyOY7RoeeP7r/ORn/s0HUPdxHbC5CoC5USI2UtvkRVFDJYONIKMfCKEoDZhMrpoSGXs\\nBPeRtAVWF3e48InPcOv6XVw2K3/0lz8Z4v2noqcgk8kYn+xhYrybWCzA+MQUp0+fRmw3WV1b4NqN\\nS+xurbG9vkYxl6NWKnDy2BEqtSwSocqZs8fJ5vPUm23qTTlGi5NAOEymkECubPOJT3wOicSM3e5H\\nqZVwEFthaeUWx09MMTw4RLvRpt1uk89nCYX3UMiUTE1MopTJ+Z9+/d9z+dK7vPra96iU82i1aoKB\\nTaZmxhCEBuvLFdbW0mjMOvL5Ij2DHqYmD7G1uk+1XOdHr32Xxx47x7/5d/+Brt5hHDYTCim0Wm3i\\nsQQSKURDEdLZMA/nljGbrQjSNm/9+G1cHhsLD+/TbtVw2W1kinnawPL8Q8rZJvFomcX5GLfv7HHn\\n3h61ukhnVzedvm4GB8Zw2H2YjDbOnTuLSqXg/NknkUkVpJMZhob7gCqhg23+5E/+mFqtQTgcZmbm\\nCG6Xi1u3b/Or//bfEdjdwWKzo1Er0GgVBA6SKCQtOu1ynAYtW0shQpEaqmKMp547QbtZIl0qcWRm\\nBp3RgElvoFapINcpqQuwGw6j0ai4d+MW89dvIzaaDExMY+jsZCcaRWPQsrW1RLezg437S2xvrWK0\\neTAorJRLNdZWrxKLrNDvc9MsV5l7cIdDY0Mkc0l++I2/JxraYmXuHi9/+Q+Jbyzw4Poit28+QK1Q\\nEtzZwGyWc/PW+6wsP0QmqaNTOVldW2B5bRmpsolMUcXps7C9nWR8+ix7kRR6vYGbN26wvDiHTitj\\nYfEBaqWKxeUlOrv7OHHmSbRaPTcuX0LeUnLo8DGqNLCoHYhleOVLf0J2ZwGnCYYPn6AsFvi7P/8D\\nPvD8h8mnM5RzJaqFyiOHs0QKTHYWNqJ87te/wJ/94W/xG5/8OTLVEgf5IoefOY/MaabH7aWaKxIJ\\nhYlFs8SjOXL5FCa9gj5/D97Obs4cn+E7X/5zguvrZIs/azTn3/uPL3Xo6ng8Ph7MPkSrF4hEo8gF\\nCYODo8jkShqtAg6bE6/Xi9lk5uH8Q1ptKJerJJM5JAjks0UOHZpBIVdwEAkxONDN2so6JqOd4EGE\\nQjHPqWOPcxCJ4fN6iSXieDp8zEyPg0SFyWRgaLCf61cXKZUafPTnPo5GreDKlVl8/g70Ri0PH85x\\nEAqhlAt0d/nJ5sIUakVCB4/KN7lMRn+fn1g8y9zcIoenTvK9V/6Z/Z0tOpyd6LRqnnv2PDu7u+gt\\nKjbWD3C5LGyuzaNWOukfcaJRGhjsG2H2wS2UCiVatRZBruPdazcxmfUE98OMjfoYHR1ne3OL6clB\\nPveZF6mWKrzx+qvcvXMbi9HM8uIyj58+gUYpJ5fOoFFraDUa5LIZFuYf0mrU6ersRCEXWF5aQJDC\\n6vICsUgcu81Ou1ogfrDD/t42UkGKVGzyxg/vsxDIMe3r4PiMjmLayGvbUc47lSytrWIw97K8uMbo\\n+DAKQYpBb0WpUCGjhdhoMD48SCR8gMlmod6qo9Vo0akVyNpNNlZX0RidvPCRj7E1f5OhsT5q1Tqd\\nfj87u3t0dnbROzCOw9LBxavX6OnpRZTJUcnkmA0WnC4n7VaDwYF+YskETqudi+9e4ezpwyzMPcBh\\nsSG09TTqEpKpGFMzY1x8/xq1cokLF57GaDWRS6RRK1TksiUiB0GcVieZXIJOj4tGrU5XXz9KpZqu\\noTH++eX/A4tOT7GQ5MzJs/SMjbAfDGA3mFCJah7cu8bEkdNYHA5uLq0SS0VpSeTYDRo6XBa0tLly\\n+zKWjk7cLg8ys5FaYpdWvYJZr8VmsXPs+Gl6hwf5y5d+h1/97OcIB3YI7YQYOHyYYGib1dmHHDvz\\nBIHgBtl0GaPFSi6R4SAUxeAyMzQygsFkoFAr8NVv/OhnRz589a++/NKvffIseoOSLr8PQSLl8cfO\\nUCyWKZVrJJNZ+nr62dncIho5IJNKIJPKH3lKms1EE0mUGj3HHzvFQeiAh/fv0unrwO3xIBXb3Llz\\ni3Bom0Q8gEzeQioq2VjdwuPtQq5U82B2nnSmyPDgGM1Gk/GxcXr6+ojFYricTsrlHNPTh5BJ5Ths\\nTsYnxtjZ2iaeiNMWFSjkSpLJNK2WCrHdpF7NolIJHD9xnB+++iOevvA4CoWE/f0dLFYTqXSMrp5O\\nllfn+fRnP8ra6jqxaJJ6rY0ga3Pi2Bni8X021jYZHhrA4bCwsLDFXjhKo9GiXm4ik1UpF8pk0nV+\
\n6zc/T6VYRq83sb6xyMbqDvVmhYmpSZRyBaXSo9XjbDaL2+tBrlDg6nCj1mqZm31AW2xSKRfx+Fzo\\ndAaS8RQer5dSKU8hnySZSKDXmTFbZLz8/ctUi1Ka+Tyfe8bLD2a3qcXadNkKLK0UGDwyxpX353nq\\nyVNoTHoEhYJmvYFMraFnaIRwJIS7w0oun8Hn9RBPRElG41RrVSamDpFJx9lY2aAskbA8u8LwxBTZ\\nQgmZQoHJYiGZCVGvVTn+7AvkEwmKlRwqiUguk+Hm9asY9XouvvUup06dYWNth6efPker3WB/M4RW\\nrWJ1dQmjyYJUlHD78mVUUpH9nT1SqSCDvTNojFo0GjWRWITh0UEK+TydPi8qtYZquYJcrqV/YoKW\\nIOPpx07g7fZittiIZvNEd0L0HDpMqSlSlchpSqtUSlEy2QqPTQ0zduIx8vE8XYOD7AZDbK2tc+jw\\nGQKb2/RNDrO3vorT5yabe8TYqIkCMq0Zs9FEb083rVYVs6+LdquO3+eiv7+XVrPJwu1b+Px+Jo8d\\nJZutoLQYsVnMvPbtfyIS2OcgtI/X4+avvvH6z05S+PKf/sFLfdYG7WYDs8nA3Ox9ioUy8Xgcr8+L\\n1+vl5a9/k5MnTvLg3l0uPPkkpWKFpaU1avU2ZouDoeGJR25P2QxDA4PkcgXkghJBKsXusOPxuOnr\\n87OzvU5Pdy/1Zg2r3UKzVaVULJNO53G73Lz6yvfp7OxEpI3ZYkShVDA0NIDb7aBWq6NWqwmHwkwd\\nOoQEsNpcKNUqlAolKytxhoY8HJ4eQqMUMJvMbG1tUy5X0ek11Gp1Wu02xWKJp59+lrWNFRwWN889\\n/xSpZIa52XVcbgMb6xsoFRJabSmNep22WGd5LUAyXUUiFVHK2wz0+Wg3JaytxVAoq7z44otsbe9Q\\nrVXQqAwo1BKarRbDg8NIBTnz8wvU6zWUCgXFQoGHCw/JZDJ4fZ3YrTbq9RqDQ4PodAbi8SThYBid\\n3kAmkyKdTmEwWFDKm7xxaYNKsYFVZeDFE13805U1rDIt/b42m3ttOvqszC0FGO53ozLq0Rr0lIoF\\nZCottVYTsd2gkMvQbjRwOOw4nC4aLdgLhnjv8vs8duIw3/jW13n+xU9SyCUBCUaThXg8wc7OLj5b\\nB/VGm8jeHkuLs/R6PSwsznJwEMHpcHDr+g1eePYF9kMhxsYmKJSzpFNppsYPk8/nMNsshMJRhkeG\\ncLvdfOsb30GjlZPNp1EpLGgtGox6LVub65SrFdQaLalMAp1Wg0qpYm1lnYXVJSaOHmF9YR69wUi1\\nVebu+xcpZpLUW036ZiaoZWNoUZHPxYkE94ntRajXa3h9HWw+vI9GJiGbi2B3dROPRcim06gFBS53\\nJ/VSnVazRVts0iiXMHe4yFVLpBIFfH0jGHRG1u7d5+HCHIVsCpddz7Wb90lGo1jNDpz+DlaW5zl1\\n6hQ6m5nBkSGUKhl/+pWfoctLf/z7/+Gl81Nd7O8dsLK0js/npdPXzdrKNgM9fr7+ta8x1NfJE2eO\\n4XLayReqNKR1RkYHsNgcNJpN1lZnyaaSdLh8VFvQ1dUN1AiGd2g1m+h0WtqtNiMjh7hy6RoatZxm\\nrUmrUsXvdlMp5wgEN5BIGqRSaXa3dvmHl7+L1arj/t377O9HMOjl5LIpBKmG5eUVLjx5nu4eD6Hg\\nLpVKBkGoEo3EqUkVdLo9mPUGAsEgY6MDqBQtpG2Rew9uopBpUKkUuJ0uHtx5iEamBlkZn9uKyWBk\\ncmwaf5+Tl7/5Q7Y2D6i14Mdv7TE44EChFNnaKHPysQGOHjlGuhDhox/5ELFEiA53N8NDQyi1WjRa\\nFS5XBz6vh4sX36HdarG0cA+f102pVMLhcGI2WOnq7MbtcqGUqbDZ7JRLNexWGw6HnWI+T7VaoVwq\\nIJVLufneEg/nY/yrsy50Bg1by3He2ynzB//6CF5dF9+8u02fzczsVhqHpILOZsbn9lBv5ZFLZeRz\\nRXK5Mj989S2MWgsH8QxWpxer3c7w+Bhjk2Nsbgfo6e5Dr1RitlrJpUsEdncYH+lHKhV5/bXv09Pd\\nxZ0bN3nqqacJ7IYxGe10WDowaPT4R/sQpUp8HhcHyTB2pwOtzohGp0Jn1JGr5Jk+cZjt7U3S+TKH\\nJob5pS/8GkdPnsHb7UEik9BAyvLdOabGJtgL7JHPF1haWkSllKE36fC6XZSSKbrGhqmWa+TTbewu\\nHzNPP4ndYGDhnUuU0ll84zPUC3Bv+T4//z//Nna9ht3gFiNHThENZejonaCzy0etUqevr59IZAuV\\nWY1voB+ZIGdtdZeB/jEO9law2y08XF5D1Uyj1FT46t9+mYGBCRwdvUwfu4DRYWd0aJj7d67QSB0Q\\n3lph7OQ5TFYzuxvb1PKZn3gk+VPRaJTLlZSKZc6dO4fP5yeRSKBSKzGZDBiMWj70wjNIpCJiW065\\nUGV7Z4PBviF6u8e4c+cWSqWSnm4/EkmdVqvE1fcvsbj0kEq5RC6Xw2q1k4wlabcaXLr4HnanAaVK\\nglQQqVTKIHlEferu7mZiYhqlus304UGefOYI3d3dzBwZwW6zYTZb0ev12GwWZmZmuHLlCqvLa5w6\\n8Ri0RbQqNbmWgoWVEJffn+WVV9/B4LSxF9zD4/eyvLJGswFSqZRCtoJCrqVaa7C4ssjEyAhyQUGl\\nnMfr9VIolDFadZTLbeQKGY02+HoUKDV1EAVkMjkGvZn+vm52dtcJBw/QmZSodVo0mkdWeyqViuB+\\nlOGhMer1OgNDg+zs7RJPRKlXa3R0uDAajWQyGQqlIisrKzidj9bRZTIZcsUj6VEulxAECdF4jgIi\\nYkuFoi0hVi2QFVu0UKDQmhHkctQ6HfV2HUQp1VwBJE0aIhRLOU6ePs65p89z9tzjDI32Mj01xve+\\n8w+E9oKsLa6hNdk5fPg4yViU+/fuEcuUiCcO8Hq6uH7tHlq1hRdf/AgSBOwOK/FokOGRHqCNWq9i\\nbWMVu81GqVzmypUrCMo229tbtNtSsvkyMrUBh81PO1vG1+ngyY8/R6WQJRKLgFqgUi+gUqkwmUwc\\nP32KwEEIm81Gd3c3Q0NDSCQS7HYncrkSlUHH2toGOosJ/9gQdq2Kaz/8DoVSkYZMRN/vIrJ6h1R6\\nmaeOz/DqX3+Rm9du4FIbySYjGI06OidGSYR3CR4E0NttyHU24ok0777+NsuLK7icVsrlFAq5jla7\\nzosffo5iTcRi7uI//uk3ufDzn2f63LOshg/QaY3sBULolGq6u7ro7xsmEthk7tZ1GtUKconiJ47H\\nn4pK4Utf/P2XPnKuh929DXRaBRKJHKPJzODYEAsPb9Osl0EikI5nmJudo1avkc3leOvHFzl//jTl\\nSpnoQRinzUkmW6ardwi/z8+PX3udFz/0Cd594wrXr91kbLSXeqWK22PFZDLSrNTI5/PU61WWFpcp\
\nlitk0lnKhRiCXM7i0ioKWY297T2SiTCCVE4+m8dg0iLIwO12IRPU1OtNfF4XN67ex+kwsr1XZj+Q\\nYXjcRWgrgt3uolQokI2nGeobQSrWSWWilIoNwuEoMpmccDCAKCpZWV0ml4/hcjhZmN/k8dPTGI0q\\nlheiuJwqZIKUZr0JjSqBwB7bgTWS0TzDQyOks3mkIlhMVurNMjqtmcuX30Eml2Cy6Gm1BJ5+6nmc\\nzg6ajQa9PT0szM9jNpuRSCRIBAGDwYTdYWdtZQWVQUu1XCZXTKPTafj2D5dotlsoZQ3iiRzJspxY\\npcYRq4ZiPcXV1SgOuYxavUquWqBcrzJ2bBhZTUq13sDZ4WF3Zw+z2YpaqSMWTxAO7DI63MOX/+KL\\nbK4uItTreNxeXBYDldQBOoOW3e0d9Got2Uyadlvk9u1bfPCF52g0ylRqBdZWdzDYLQwN95NNJXD1\\nD9Dj96K3O7lz4xZdPi/VaprtjQX0BjnvXbtEh9PLD/7xH3F19XP4sQvsbgfweFysL69j0RuotKoU\\nyxXUSjUqjRKpAJl0gnpD5JUfvE7f6Bi/+9JLLC+vMDnUyY13L/FgbomnfvlXaeQLJDa2iberPLx+\\nhyMnf4HJ54+S31pkdnaLfn83m8vzBJcWSaSSDA76+ebf/CUffPEjRJJNpqaGUCpF1lfXcNp7sbhs\\nvH/1Ov19I2gcdi5dfp9qNoXZoGFzaQ6nTs7eQYi+0VEuv/MjlAYlequFeCxJh7cTtc6O3GjlS3/z\\nrZ8d+fDVv/7SSw5lFXeHH7vDg9ZiQVDIEcUmu2urNKtVxqbHyaRTNFot7A4XY5PjqNRSQgdxhkdH\\nGBwc4etf/y5LK8tIqJFOhvG4rSjlVQLhOxw9MsDbb77PzMwMtaZItd5GJZPTarWoVGuoVFrWVjcZ\\nGR1FLVfx4P4iKrWGXLKMQoBOjwMJIulEHIW8idlsZf7BMqFImHK5yt079/nwh56mlM8ROYgzMWYm\\ntBOnLK2wuRml1ZSjNtZpS7T0j/VTLRTo9Hnp67PgcuqRyrSkiylCoSL1ep1GrUI+n2Cw10ezHmdz\\nPcPERA9IpLg6zJjNLqwOPY1qlqeffI5U4oC9wC6NWoXQwRaFXA2ptI5SpqS3Z4BarY3b00Gr1UBs\\n1whsb9HT7cNmMVOstHjz7bd54fkPIpXLaTYaHEQOKJfTyKUqkoU0NpWZ7723gl4F2Uobs8FJsVDg\\nE0cd7GXK7ESalIQmiUgcj0tHS2zQlAg8/+wZUrksyCR85c++yNLcXbp9HYQiBygFKVa9EbvTxeOP\\nncFmsmI0G6jXK7RFGBgZQy2XYTLJkCpbZLMxPL1jnHvqPKH9Hcw2H0ZTB5FEnMHBIZRaBWqFQCR4\\ngMPqpJhL4fe7iUZCfPNbf0s0maR/YIzhrkEUOiXThw+RCQV5cPc2CoUUtVxgYW4Ru82KTGhiMRqp\\nFku4vZ3EY0msZitGow1PhwelSuDTn/4kA91dyFHg6PKi0isxKaTIZNDf10diZ5+nL5zgb/729+hw\\ndvHD1y/yK1/4Za5eex+5VEENAU9PL0aNAbffT3gvwFCnhUtvXWTq0GG+8c2v0hCjXHznx5w9NsnO\\nygI/+udXOXnkJJFEDKXcSFdfP3P3ruE1WYhFtlHqFSgxMjp2EplY4Y9+7z8x0N1NpR7mb19+9/+i\\n7r2CJDuvO8/fTe99ZmVWZmVmedNV1dXd1VXVDt0NNADCkiBIkATEoRElUSPNaCe0oxlNzM5gOFpp\\nJIpOlFmKRiDoQBIg0DCEB7qBbrQ31eW9Te+9vXn3Adx9mNgdYjdiIsQTcSPud89n7ss5ccz/O+c3\\nRyn81Z//l8cfu+8wba52VlbWaPN6yGXS6LQKSrk8xXKZnsFRNndWsZhM+H1etnZD6HVm2pxO5hcW\\nyBQzjIwNYLdpqeRK+L3t7N83xvyNTUZH+ylXWxw6fIjQ9jx63fspy4A/QCYTo6OjE5+vg4nJKUZH\\nx5AJElJLTl//HrLpONlsmmw6SzyRIJfJkMpWeO70BbTWGp997Hc489YZ1tbmcbh0tGp1spU0qbjE\\no4+eQllNI5PJaHM66Ou2kEm3uHnlKh0d7fzkp6+i1+hoiRqunL9Oo1xmdi6D3qjHqNcSS0YZGx0n\\nloji8zuJRotUihIPPXwXhw4fQ61SszA7x8LyPHfedRf9fT0UyznefOMMjz76KR31Bw0AACAASURB\\nVKLhOEqFgWg0hFIlw+3toFSqEdoN4Q8GmJm7hlavQaXU0hnws7q5ya1bt9DIBKLJGKjUaFVNiuUK\\nYr7I9ekQHp+exWiNUq6GzyzjU3eOsLgeo1Bo0dVpJxNvMBA0UG22GBrsp2tPF3JJoJarc/dDD/GR\\n+x4gnkrT09+Ly2LFbDYh15lJpjMk43EWFufI596vBVFvFLlx+RpjB4+QyOXoHhqhnMlz4/p1nB4X\\napWEVqvg5oUL2JxmlILE6uIyDYUCeatBdCuJp93MwvwSB8cnUMoMHD31AE888W08The5bAJaGvqH\\ne2nJRaoFiYC/A4VWRXgzTKtRx2K1EU3FiIRjqGRqZhbm8HhdqHV65pYWcDsdvPrOWZ75yZOM7dmH\\nKIiYdSp2565i7RxAptNycM8B4rEaH//EYzz38x+hN9npG9yLO+jDuXeYS2+dZ2xiCrs/QKlcpmd0\\nlJZCzp33fhynu59DJz6JUq3E0ebC6PVQyZdYX5lhcf4mKqWCof5RRFGP1uamr6+TUq1AqpCgkM8x\\n3NeLAglJqeD/+MGrvzlK4e+/+ZXH+20qvO0eNjZWyOdLyCWJpflFbjtyAr3eTDgcps1hZWN9BZ83\\nQKFcwNcRBAGcVic3r11GKYi89epVTt5+iEQ8RTyUwOwQkQtqMqkM0d1ttIKGWkHG6EAf0XgeZGbq\\ntRKetjYyqST1aoWtrQgtQSDY2YlabaTD34nb66JYqFIVayhlVu644wj5dJO/+5sn6Oxq5/4H7iQW\\nS3PvvcfY2Yqxup1mZyeEzShjYuIgZ9++RjSZR6xHOHzoOLFQlGyxRnvAx+LyOquRMiaXjrWNAsjV\\nhOIhdGoV5VKDXLmKXFMDBNLZMvd++G5MBgdirUUkssu9992LTm9gZWGRWCRKV9CPyWQmlU4hUqW7\\nb4DJyaNEoymy2QytpohMaNEdDLK0skLv4D7qzRaxcJRquYpcJqCwatHQwqzREN3Y5daNaSTJCK0G\\n6bzIh4bbsHe50OQiFBoKYsUKoiSREfMc8Du481OPsmegH61cQBRkKNVq3n3jLHNrq1yZm0EhkzNy\\n6BDT125SESV2QmHsLicdwSDjBw9RLRcwGnWY7XbUDi9Wi51yPo3dYUQhk5NI1ShWqyRiEdz+bqwu\\nN8jlCBJ42ryoNTLeO3+dy5cvc+G9a7zxyhmuXb/JhXMX+dznf4vl1RU6B4ZZ34xgd/nQaU3sbM2R\
\nSmcZ6O/B7m0nlcujUOgQlBKdwQA6jQqlSk2rCU/+7GkeeuijKFpy9u7fyz2PfAKF0U7n8H6kmsh2\\nKEStAuuzGwzddQd6g5btpTm0qgZyQSSVimI0qVg5+x7Dg/1UMznefPFFHK4OJJmAQa/jnTdexB/w\\nUojtEI9usxMOMzI0RiVX4MT9x1mauYG33Y3VpGH+xjvMLM0grzVpb2/DZnOit9kpFmvcujnPwJ59\\nfP3bv0HZh2985S8f/+idLja2ZpApWiwtr3PnqZMUC3muXrjGwvwy/rY2Lp6/Tm9PJ+ffvYzN6sBq\\ntbC1tYZMVGJUy1ELLZTyFtubu+g0SjY3bqFXK4iHQ8xOTxNLlIlnEzz30jKvvzWDXmsmk1kmGolR\\nyGVoNovshteoVRpIjRoCNQShBDQwGi3sGRlkfi5CPpejb8CDWi7QP+yjVK0gCRIdvnZmZ1bYDS+S\\nysLySgW5Tsn+Pb30dndyc2WdXEnFK68t8q//4AjxRI6D44OYzCZeP7vE2mYeg05BPluiUBa480OH\\nQKHn5Zem6exxoVGaiIRjWKw6Ojv9VIp5sskkVy9d4tb0LQ4fncRk1FEqljFaPKRzJUwmK6VSkVsz\\n04hihWa9iL+jnXg0xezCIpWayM5uCKVMxqFDE9h9HmQKBQ6jiZe+/R2WLi7i1shIJ6tU8mXIiyia\\ncvqlPKOdLt49t8VRbwc1s57fPvlxBrr3Y/QPgaeXUl1HRR8gXtxFWatx5NRhPO1upvaOUatm0Aoi\\nZpOG0OoyjWIGoVFmaX6WdDyM2WrnqZ8/g16p4ZmnfoLVbqHHH+DypZtcvXiRwT4vVpOWdCTK5QsX\\naZQLGDQKNBoZZ99+i7/4s6/z9qtXCce2Wd9Iky00qVckYrspenp973c4j8bo7Q3SaClwejoxB/rQ\\n6FXshiLkC2XcTgvZdJJMKkelWMNsMXL54hWUGi33f/QBVubmUes0bG0nSadyLNy4yrWzryJWyyhF\\nLdnUDkvz19HWRd745et4Ot30dXfynW99i717hpHqIlemZxHkGjbWN0imthGFBh6Pm1azQm9HBzen\\nl2hztWFyuFhbXcKpVSJTtigVyty8PoPD3kYuVyAaT+H1d9HV1U2z1WJhYYFstcjMzDQjo0O8/vwz\\nvHxh9TdHKXz1y//58bvGx1lejPPwQ58hHEtx+cpNTt5+B3qVEqXQolgsYzK3k8vn6e3u58b1GVrK\\nGp29Aa7dnMPn8yEpZNBoEdmJUCvLiURFItEiizMRBHT4Ak42N+PsneojnE7RZlUQ3YkRTuTo6+7G\\n0xGgVBVJhAvIlRIb62ECAT9iq4YotqhVFExNHaDeqJLOZgjHIrz84iIKpcjszDzedju5dB6DUU/A\\na8HrNjG7FCOa2WVlZZ1GSSKSaGEyCdy6ssZnv/BxEuFdKsU6i2vbCHI9MlmLlijQqKtY29hhcz1C\\nOt3A3SYjtJVEJq9x/32nOP3c8yyvLFEpptBrNHjc7QgKE7Ozq4yMTfHTZ04zODiM0WjC2+bF296G\\nSiWj1ZB4/oVnsTlcdPd0I1MqaXc7KeTzxGNhXj19mpHufmqSyLVrS+RqIDrbyW7pEaMNRj77JcYe\\n+SM2Vq9SSKgJNiU6K1t0uydp32Mn/urLxJaX6bjjU0RzVTL5CgFrgHq1wMbmGkqNgbnrCxiUesql\\nOpsb26i1Jm7MzYFaxf79Y9jMBurVGuMT4/QNj1CvpvBabbxz7iqirMnk4XFeeu5ZFm4uMrhnDwqN\\nQF9nkKW5WZaXFjl39l0OT+6nf8BNKBqnUmvSEMBoUVHKi9x771Emj40j1UQEgxW93crCrWuU49sE\\nO/34gl3YDQ7UGjmlWhW7uQ2X00MoGqNUKKHWmXC1e6gXimgsRqrZNJlQkkqtzL2PPoxOp2V7bo1j\\nj3yKyQ/dTzyao93fwZ7RfjbX1pjcN4E30IkkgqsnQK7S4NTDD/L6S8+j1mkQkFOrllldX8Hh8LMT\\n3eDGzRuMjgxjM3uoKvUohBa70ThGo4ODR29DbbTT0e4jlozy3nuXkTVaVLI57r/vPjLpBGMHevj6\\ndz6Y+/CB+j78z6Zer1P6z787TLVaZqB3kERGIptO4m13YVYrWFlaJl8o4bR70GgFdtaXKWdE4rtx\\nbE4dF25FcbuURLMSfUN7cOokOjqcvPLqm0RzBmpiFbNRwz13dGEzmAitzmPQ+giliujMJRRNgXxF\\nzvB4H4sLa/T0dVMt17h58yZ2u53u3h6a9QZvvnmBvXv3kslEGBgc5rlfvI6nw4/JrEallZOMxun2\\n9xBP7BLscKFQanni6ctIDZHeHgPdnXYu30jj7bCxG0miqIv0+l3IlTV0FhlnLxbJFquYzFqS6QZm\\nXYvhbh1BnxuL08qPfnSJ++8f4+SRY1y6fpXXX7nCo48dJpVOo9fZkalU6LUO5AqYXV5iYuIgjaYS\\nt7uds2+/QbmU45FHPonQauDv7OHChQuUC1n8bW6WlxdRWWwILRGdQkdZqPLS90+TzZWwSkYurMXZ\\n6xAotXsJZLc5LxxkrCtIz6UX0XV4aVeraTpcuLQGUoUo2WgKu91OPp7BPjWI78AoC+UtejoDpFNh\\nrO4uUskMbo+LeDyK1mgCBZi1WvoHR/jej3+ATiVn78gYbpeHmljBbDYRDSdBkuMJBEjEIzRyeaxt\\ndi6++zZHTpwCUcHM7E1q9TxBb5BQNkktlyUejWE0m+gd6EUQBDxeJ2889wYNscipD3+Yl0+/zORt\\npzAqJBYWFjh64h4KYoXVlQ26nG5AQC6XMz17HbPBzG4sgi/Yj9PZxvnzr6Ns1NEYzIzf/mF0RgvZ\\nTIyF2UtoFdDj96MzulDr378fI0dOPp/HaNQiKZVMv3MGb3c33YP7kBAplErIhRZSI8/y9CIygxFf\\nwEMimcbp8iMIAuVMDLurA7lMSaWSw2AwsLy6hNWqxm42kYjGCPTsRdDp2N5Y4/XTP+QLX3rxAxVu\\n/WdhKfzln3/p8Wp0A5XChNsdZGHmOvVKiXKxyPrKOul0lnvveZDX3nyNzY0wNmM7sfAatUaBrVSR\\n3t6TvHJuDbm8RSWTZHc9SSadRqKFQd1Cp1RTSBdJ7sYoZjPML+RQKCtk4mWkQpX4co62TitGk5mr\\nF29hMCjp6u4lnc7S2TuCUqkhEU8jVwt093ZQK9SJJjaZmNzDHXccQqOGrdUIjYZEPJnA5WynUWtS\\nqBQIJ6sUq3W0ahVL8wnaO+xEk3FWl6v07rHR7rOztZ1A3tKyvJGkKbVIRGu0hBYOq5aTR6ZYXg5j\\ntSsJR5McOTrCofETfO+7T5KIlbnzrkNk0jnKlSoDg6PMzy8xcfgwzUqFhblpytkc1VKWlaVZ7jh+\\ngheeP825c+cYnzhGe3sHNbHKjZs32D95kEYpyfzMZeRy6O70cPmXZ6iWWrgUeVwKgRJybBWRhYyK\
\n8d/+IvLn/gZLuxybwYJjagqZXcfm8lWMK3E6/9MfYv7k/ey+8Sb6dj+ZapbgQIDlzRWCfh/Vap2g\\nv52dzVW8/iD1ZgNRFJHT5Bt/9Zf8m3/3p4z0DHLuzCvMLC8T6OhAIZNRLJX42S+eRimv0NveQ7KQ\\nolKX2HP0CGsz6xQkifGpI5isRs6ePcPkwWP0jezB62kj0NmDQmEln6vg6wgyePIONhaX6Q0M4bDZ\\nUaner6bU4fdQbUElk0XM5shXa+g0GlZXlujfswdBrkShVDC69wAKtRq9RUsyso3UquN3e1BILV59\\n/mfc9qGHsXu62d6KsLE4z8VzV9k/sZdiNYdOo4KajCtXz6Mz2dlaXEOlkVHOJshnk6g1CmLhKG12\\nJ12BALFIBJfTQjy0Tjq6xUBvL7MLt9Bo5Tgdeq5ffIuh/m6KxQKRnQ3y2QR//Vf/DYvBSKNS49Tt\\nx/mzb/7sN8d9+IdvfvXxf/u7H8PjaScWTWAzGjAbTTRqDbztbajVKhI7CeSSjAcfeIC3z79JtqjA\\nqFLzsU8/xJGj4yQLST7+kTup13cZGuxmY3OLqaNDhHYTZHNV9AYZRpOOSKxITSZnfaNFKlHGZtFi\\nNLeoiTraOkzY7FYOHx4nly+hVqsxWsx4fV6cdivFconh4WEW568zsmeYVkPg6WdPU682cDscHDy4\\nD40OLl2+iUatZ3M7yU64RgMRo0lNJinR06MhHKqQLLZoNFrUslUyiTxKHZSbasoVBZKkwOIwYFRV\\ncThNbO5kkSOjWqni82sQRCsrq7PojQJqhQy73Uk2l+P4iTvxB4PE4xkksY5Wo0Fs1EklI1TKRUaH\\nhxk/OI7L7WZxYZOu7h5KpTwdgXaq9Qob68tkMml0BisdHQG2pi8jCjWMLagIEoWSilqzQalcZcDq\\nJRjZQqdS0h6LkFVCn9FBbi2CZnKI+PmbaFQaSqkKht4RarEYyl4T7UEfr738MscOH0cmNWg26py/\\ncpXJw5NsbW1i1ssYHu5jZnaBbDaNx+0kV6lTLRURWi06u7vwBXxYjEZsZjsuv5/tlQ1yiQxSS8Cg\\nUZMuJlEKMDo6ytrqKp6An2wqirPNg9nVjs1s4rlnn6WwnSIQ6EJrs+Ee7CeXSrC5tkE4HCJXqGK2\\nGpAEiZvTC5TKRY4cniQcS9KoNwkE/axs7HJrZp6RkWEK6QxLC4ts7oaYPHoYhUqOoLagNljpaHNR\\nTUUxtwcoFxJsLs3hbvOwuBnBM9DJgYkxDBolncfGSawucunyZTLpDONTh3nh5Vc5c+Esgc5u/AEf\\nckFEhsTy2gZTRw6xurpMJBRCJodwLI2/oxu7y4XeaKR/cB91sUlLJrG4ssMPn3vnN0cp/O3X//rx\\ngLGBUa9neWkBhVqgt78Pj7edy9ducPT4SZK5BLVGje8+8XM+8eiHGd3fjbffzvKtFV56/gwPfnQM\\ns1lNOZfjyMkpPvaJj5KMw+e+8FGkWh4aeTLJGrW6gUq5SjCgw9tjJV1rkshVkQQFQ0MB1AYtK7NR\\nLl2+iNvto1AqMD19i7EDh6hV66ysrtHp81Mo1SmUK/T2deB02NhdT2G0KlBKah788G3U6hIul57e\\ngByFIJLMFDEbBAwqNfFImf4eBStrNdKVGi15C5VaweJGkWodRGoURQmZHG5e2yYZreBpb0BdRi6b\\nYvrWRVxtNhQyFR+6+y5+/tQvicYiuN02vvrlr9JsNpmdm+XVVy9zYGqA4dE9fOhD9xBPJJhfmEGh\\nkHHoyGF+8pMneOfM6yhqRQJeL62mnD2Dw3QPdPPCD39MNZJAgYBckKOXyVFqRORyNXajF/PcGUJy\\nJVWtg0Ijj9waoL2pIZWOY9zeJqOoIBZStGIxik4fxok+sqk1lC3Y2Y1x9NTdJDMxrl2/zNFTd1Or\\nVhFrdQqFEqOHjxFeXmc7GkapN7B3YpLAwBDNhkQhFKFareLpHmNrd5WV+WX6x8awui0YtXJ0FjW5\\nZBizXs3i8gIWsxl1q4XL7SGXzfLcz59m7PYTFJNhfF0BlhamkSsavPbSC/R39yMWc1h1esw6K/FQ\\nmtXlVY4cP4bH7aZer9KQlDjsJiLhbeQaK0dP3kuyWKTN18WJ+z/MQF8fc0tzPP29Jxg5cAKXP0A9\\nl0esN1CIBSSDEo3MRLlQo9ws42lzsTW/Qlewk+nzF2gpDOzbO4lOaaStrx+Nwkj34BgqAer1KrRU\\nVMotZGoFmWQcvd6Gy9OFyaKmXq5RrVSwOztoyVVsLExjUqnZf2QSqZLnm09+sJjCPwuYsyATUKla\\nrC7PYNQpqVaz7O6sc+XKZQaGhkmkcvT09GC22jl6ZAydWockSSwtbvDWW3MM7fMjSDpC4R0mDg7y\\n/C+e5t1zb2K3NnntlV8SiWaot8pUik062gRkMtAbFdjNNra28wR6HDSbeWTo8Ni6WVi8xZ133EVX\\n0M/w8BD333sfN65e4tDUBB9/+GOUqyJiS47d1o5KriGVzPDRj79fot5utzN/axalIBLwugh2+tg7\\nEqSjzYhOoyaaSiJTtlAIKqxWGQq1gnIN0pEG/Z0qtIYGKq0MqSKyE61Rq8nYO+pDrCmIJvIEuzrY\\nu3cEnU7H5kYIj6+dT376IWw2G0/97Bc89MhdBINu2v1ejhzfi8WoIxNPkk4kaTTrpDJprFY7i3OL\\naLV6ms0We/fuQ2w02FnfRq1VYTLb0GkUlFvQUsgoNbRsbDSgIdEo1EhHN2jqwKbQYrSaEDU6NIkd\\nQukQBnmL3XITo9ePq7efDo+N4vUr1FY3uff+++joDPKv/pc/QGyU8Xd3cvj2kyxMTxPw+9EaTaQz\\neQq1Og6bncHBfnoGe0iuriGILYxWK7FcFp8/SGJrjWg4zODICCa9md31bUKxJCqFHqfDzc7yDqVs\\nld6R/czOLLGyMI/RqEctSOR2dygUClSlKrVGBafdhowWoqxJQ6PE4uvg5XNvMjC5n+10lkwqTaPR\\nYHV9A51eSTSRZmjvOE//5Pu8/NxPUDXqtHJlqpkCG6EYY6fu5ot/8r9i00nUM7sodUqswTY0ThsC\\nKowOP0NTxwkODnPxzfNIMi1vnrtKh78HvUyJ2mpCZzezdO0SalWZUnwDu1lHW5sbpc5MQ6Yk0DFA\\nT9c+Au1dNGtFLl28jK+rl914nGSyiEHnZGBkPzK9id3NEF09Ax9YHv9ZWAp//l//0+P7glrkChmi\\n1EIQWuSyRcYPTJEtpFldWUFQiigVCvoG+tnZ3aFQztHZ1cncXJiuXidqtRqtysPszAonTt3Ge+dm\\nkZVTyMUWgrKFs81HX38fuXSa5fUyWq2JNo+Mb/zT94hEsmTScS5ffL+e4vJahLNvX2d1ZZlAZwd2\\nix2bxUwqmeLWzRk0ajn5QgGbw4XFZOXg+ATPPvsT1BoTfX0+kskkraaeS1cu4Q/uweuzo9HCxkYc\
\npUpErzcSjRbp7G2nJTaRteqcmPBy5OAQ12/FEaUmWmS0JDBqWrhtTdztZlpIuJx2Bnv7KFcqbG5v\\nYzYbuXL5Ots7YexOC5tby2SSOTq7upk8uJdUeItaDXp6+vH3BXnrzFv09PRw5u1X0atVdHcGsVv0\\nbG8uMzw4xgsvv8Drr51DzDVoZGoolQ1uLdTxdyrYqluZGHYwdPhuknUlk9oWhcIGFsmC/sQpwqIS\\n8nkihzuY+si9tGIVrkW26H3so2QcStbn5xkaHqMhiYR2trh+c5aJQ0eZvnIBk9mEUq0iEY+wvbGK\\nRqMhHE9wYHKK62enqTYVKFtKhkYCfPPrX0evVWPW6akV85x+7mmGBvvRa7WsbO4Q24lSazU5et+9\\nlLIpwskdhkcP8qMnn+RTX/w9stE0gyN7KWbrTN3xYcSaFofdh1auxWiwkUokGB7cw+r8Ldw2Pe1e\\nH/l8AY+nnWQ8gVYj5wdPfJ8v/sHvY7ebCIfXefGXP6Un4MZkctMsNVlZjfDWCz+gWslQTudxu+zs\\nLM3REeygUK5y88YVujutOEx6jHYnIwfG2ImESeWqdHjauXT+PXp7utFbzKgULb75zX/g9uOnWFuZ\\nZW31FvsO7ePc5bOUK2l0GiU+jxeVTIXdoGB7YxqtTsbpl19HEMBqt1Mu1/nad37xm+M+fOubX3v8\\nDz/3IAqVCbXWiNhsoNWZSMQyhENruJ1u5FoDcrkcvU6HwWKmXGtx5swMd903ytZymKE9e3jnvfM8\\n8omHWd4O4233sT47Q7pQ4cGHH2Zs/AANhcihOyaRail6+uxMHJngx09+h4kjQYxmCwcmRognwjh9\\nLhLZEG6fE6VQ49LFS8zOLeJ0+6hU6xyeGCWdyaJV61lZm2N9YwFJVKJTC1htFrRaG8hKeL1+Dk1O\\n8df/+4/o6/KQKZSZW2wgkwmUSk0KlRr5TAWbSUV/j5ubC+ssbxSRK0Eug5ZMjlIGKq0Sn2eA8NoW\\nXr9ENh+lVmsR3syj1jSYu7XMwICXkZFBGlU5SHq0jTL1TJ6mVMbT7qJYKvHUj58mGc+jVZu5/0P3\\nEmj3EA/tMrtwha5uH+9euEE9H2dowMvFC/OYjQok6mxEYTSo4MTHP0f/iZM88Q/fwHdgH1dtAaQ9\\nB3DuD5J8/g0m7zlOaOkCGpcbrbefks+EfeQohYKIWBWZGu6gmc/ywlM/4sRd91CpVrly5TIf/cRn\\n+fGTP+alF07zhU9/nj3jB9ALcjbDO6RiZVqNGLemr2Jus+Nq72Z4+CBqHXj7ehGVYLeZmLtxEwkB\\nf2cQpUKg0xsgsRGhqVFiMzhIb0Yxm81cv3COtmAPtWKRerOMVtkiGd6gUcthdunZ2drC5XZQqlXR\\nag3YnS7mpq+SiIURm018jjbEBjREkVxDzvnz1+gJDDI+dZxbN2fRG9So1DIsOoGBg8cIBrvJlwrU\\nKg06e0aoUcfp9eNu8/Hu22/Q1z9EOrKLEpFMNkVofYveOw/gMRmZX5tHp9XSqtUZHttHLpfn+s2r\\nGC1mrEYb2XwJg70NpVqOKDVAUKBx+ekeuwutyobdocFoUHP16lV6uwf562/9BgUav/znjz8+5FOy\\ntLyCRqVGrVJw7epNTpw8Sb1exWZ1EPC7efO1V7FbLQiSHKvJidWiw2JRIpc78fg9iC05uUSMUi6E\\nUqlGYbbS5u9gdmaB69ev4HJaEGQ6NMoU9UYLpcXIwpVb3Hfqw4R20sjUdbwBGzSLDPaOIKOFTNZA\\nkkT0BjMeTzv5fJx6tQ5SC4NZgcPqw91u5a67jlKtlinlmzSkBivrc/R09vH33/ouf/Znf4TRIqdV\\nFzl/OYTRCoICKuUG6I0kMgLXb4aRFJAtNlHrBVoNOfmiiFYjoURGPL5DsMuBSlDjdraxE4mQymRx\\nu2wMDvuYmJjgW39/mnCoxmC/E5MGXG0ujt5xnFItCwjcujHP7PU4K2uLGBxabl6/ioTEgdH9BDp7\\niId2cflsRMMi67NhLFoZqXCTgSEHxUSTn790mbdeeI2JgJ6E0sFbL7+MphRh4PA+grePkFKIFDYS\\ntBsc1PvtyAQVC7tpREGFxWamd8TLzOIcPcN7+f73f8jqxhYnT54iXyowPjVJW1sbVaUGg9ZMKV94\\nP8U70Eln1xhTU8dALlGIr7K8Pk17m4/wZhin2UapXsWot7F/7wGuXrvCwMAQly6cpXO0k1ZdoL1v\\ngNmLZxgeGyASi9K/fx8bM7cwqWRcvTXPvvH9CDJIZ3OUFQrmF1YZmzxKfDuCLdiBzWjG7e3AFwjy\\ngye/h91lp9EScbg7OHb0CEvzM/h6OvB3+tEKAltrYVwd/dTT2+hUIqlkjnaXh8W5W2xsrLA4PYNZ\\np8fn91Gv1HF5fWxsbDB37SaTU6MIzSYLi3Pvu1bnz9HmsCPXqGlUakzddozOYDdymQ23x08xnaLN\\naoKyRLFYw6ATSCe2UCkkavkqO9sh0skkwd4+vvqtD4ZoVPy6CYIgfA+4H4hLkjT8q2+PA78DJH41\\n7T9IkvTLX/H+FPhtQAT+tSRJr/66MzQaDR0eF13+AIX8+6CNw0emOHfuHdRKFd2dQ7zxynn2jkwR\\nCado98p5842XMBht6PVGxvfv48l//BkjY+0M77+NHz15AZkixuDYOMloAZe3jXPvJjl0zEulVKam\\nciA3lnHr7OybHGdpa4VsPoau1aRR0dMoaTlz/nUOHBxErXTSprPg9gRZmF8mlQzx6gtnOHRbgNlf\\nJrjn7jtYWprhpz94jo4eG4cmJlE3RF6/Fmbu3Gk0bTLeu7LJ5PExEs/N8+lHhvn5szOIeiUqhYhQ\\nElDJJIoqOaubVfRWE+ViEZdVQGfSkQ6VCOxxYtDC0tIux3/nYexWEwsbReIBowAAIABJREFUeYql\\nCN3dXrQaC2+8dRmNXsbRyQGcRg25Sgq1WcPPfvJT6vUai4vbqLUafF4lHZ06NAo5ExMT7G7vIKFk\\n+uYCTreX9u4geyfkrK/PsLpa4pFH7yWWLPD92XeZvLOT+++8HZfLwb/8k3/k0IEefD47BoOSlsWP\\noZXF/PnPYJPX2Wk20OgM9HRI1Mo11K0WZrGFUG4QLcX4/d//Y8wuO0szN5mbvYXT6aRQLGO2pGmW\\ntXj2DLIT3UUll5OvxDl39jrh7R0GR/fQMzqCRm/C1KgRDe9SrVUw2CxcXLjM6vQNYpE4+yemmJ9d\\npDsQYPvSRdo796C2d2BwbyNvNOgc3Eciso7ZbufV06cZ7OlhO57AYDXT6XNTT6ex2C3Imk0SySQa\\njQ4EOcNDg0TDIWQKJdV8jgvvvo1EHavZRTFT5dLCNfYdGGdpdZ5UZJWeQA8KJHKVKlq3nfhqksNH\\nJ4nHk5TqMpw2J1qNEYfdRVd3B75gB8trWwz1jnDrwi06Bg+gNluoZZK4u9qo5qooNFrUGjnVco3u\
\nnn7CG4u0VHKaqhb1fIlSqYFer6KuNRCKphkY7KOSTP86Mfy/6YMEGp/g/S7S/z19TZKksV89/5dC\\nGAI+Cez51Zq/FwRB/usOkJCo1htYrFaMZgPj48MoFSJqtUgkts1rb7zA4HA3Wj0Eu9vJ5TKMj0/h\\ndlo4fvQgf/u1J3n0sY9y4MABzr33LlOHb+PkqSOEN9fJpDZpVAt88uHbKRfiREO71LIl8pkm//Hx\\n71NIpXj37JscO3qSZKLE5UsLqHV6fuf3HqPd200uW0Wp0GMyK9HqarhsXsb2BVDLHAwNeqhX3g+K\\nuWx+gs5uFufWWdoJM3n/RyjKW/ze730GlaHBZz/5p/zRf/wCNouDjz0yRrPRopyXU22WaNWrnNyn\\n58iYgXIij1QFsdokEavQkkGx0iKWqGF1yFmcXaZSrlMo54hF5cg0ChY3N7l+Y5WBrmF0BgGzy8XG\\ndoKvfe0pItE8DUmDo93AkWPH6B+14na78LS1o1UqCW2u89xzz2EwmensHsNhbyOfbzEw3ol3UM+F\\nmwsshVZwWAw43Q609iroauQSORKhCHPT1+npdFNthlA2KqhkKejowCgvoG6k6DSJtFmqCPIUb0zf\\n4OgnHuXwPXdz4fwrPPPEP/DO6y8Q3dnCbjZht1kIeP3UajWSW2vIpQbpQgm50EBu0vKxL/w2V2/M\\nI5eUxNYX0MirXLtxlv0HprCozByaOMKjn/k0B48epJ7PcPT2O1hZuMozz/6QjvEeVEh4VFr+6e++\\nzOrVC+i0Cg7sGWLi6BRtXR1MTe1npK+TjjYjzz31bTxeK4ndNZwOC3aHmVariVajo7N3gKGRfZjN\\nVg4enKSY2eXaW8+zuTXD8ZO3s7a0iFenIegfpCxK9B6cYDeyhdtpYXzfOIVKHafXi0ymoCRBrlbB\\n09PJ4Ohe3nnjPD5/D9lciUBXJy6Tle2tFewmE7mdEAa9kko6SqWWJhJb5/w7b1CvVhHqcmSSjGR0\\nC5dJT6OQp5VLcPzYPtpcViTNBwcpfpCu0+8AH1TNfBh4SpKkmiRJG8AqMPHrFlWrVXZ3dwmFQmxs\\nbBCJZBge2U+50qQuyvAFelidWycVC6NXy8hkcvT3j1CslTh77hLD+/y89OIzFPN51DotMrWSWDSF\\nwy7HbGggFxsUskkKuRC7mzuEY7uUS0k++sAAFpuaTLLC41/6Cwb69nLPvXeRSYZYXV0mn42j14us\\nr83z3M9Pc9fdd7MV20alMnHt+jTZvMj69gqpfIme4U4OHtrPxKET7N07SWfQxDe//r/RyFbZWFjm\\n9Et/x/PPvsGekT2YtVqCdhGxKUNrNNCSSXQHfUiSQF+/lcF+Gx1OLXZzi96gQK0ZY2UrwakTdzO2\\nvxeosrISoVwVSeUEfvLUNNGIjGO3dxNNJnj2pTfINcuMHOjisc88wonbb2dw4DCSUkEyBSJqHE4r\\noUiUrUiCBx95hCoiorKM1eFFqdbR5gmiMsoJ9BsJdjkoilVUkgmz3oWyJuFxm0kXStz3kVPUC3mU\\n6TQ1g4jWJWNrboNaK4nBLFCrlxDLJZxaJYsXL1Lb2aSWSvKhBz/O8Yfu4+P/4jEQ82TjOxRiMTbX\\ntrHb3OhUcjLZJCvT72GyBXA5vTz9ox/wyG89SrFWwuFwsLC0Q3fvOJlCnhdefoXt5S1i6TzJzV3c\\n/b1ENsMcmLyNBx56kEeO38PnP/U5ypKS8anDuHr72NzcZmdrk3/87rdJ7MSoCTC9vI1M7eTUqbv4\\nb1/6r+TyeUKrWywvL6PXaJk4dRdSJc+tS+9gNunYDIcZGJ5kJ5pEi5zNpTkmj9+Gsb8Hn9dKs1Fh\\ndz3M1KFD5GIRzp55ne2FFd46/Qq+djdCMoZRJefGxfPk4klsPjvLyzNYTSoKiRA/e/r7hHaWiUR3\\niKdTpLIJZFqBC2++RYfHxYF9o+RyuyjUZXRKEYPdi9FmIl+M0Go1iMcyNJsK1mZufEAR/oAwZ0EQ\\ngsCL/5378FkgD1wF/liSpIwgCH8LXJQk6Ye/mvdd4GVJkp7+H+3f3W6W/v3H+sgWsgwN70Fv9GAy\\n61ldWoamhMNp591z51Ep1BgMBlpSjf49AdRKC9cuT7O7vcbevXvJ51JUmwIyQY7NYiYejRHo7qVY\\niGGxGtley5BKr2A1WzCZrSws7tCs1VEq9AyNutAqnaxvbJIsxRkd3EsqmUFo1dFoVNCSSCbTJBMF\\nWgoZGq2eUrlFKicikwlk0zm+8uV/xfe//QSjo3s4edeD5AoFQjsLVMtFUpkCe0dvw2TTsrY5x9vv\\nXOPVNyI0FQ2qJdBoWzTqoNYoKZcbaBSgM6lRtWqoNJDPKnnouA+LRU4mVebstTAWixyfV89DjzxE\\nvVagXi1RrghEw7t88ff/Dbl0jrnFOXZDa4iiyIX3FhEEiXg8yTf+6kvk6iVmZm+htbQx2DmA1e0m\\nmUhhM+u4eO41zp+/RldngFwmSksu57ap2xE1VZq1Kv/h3/2Yzz92G0ePHaSl1BBJhFAqjWjVGpQ6\\nG4VcHotJi0oFiXgag1bDgduPkU5nadVr1BpqdHYHSzPTWPQyMqks/mAHcoWBWCKKWC5RzOYo1ot4\\nvV72jo5Qr1SZnlnm4rVrBLu6Gez2kk1GUShleN0eGhJcm77J3NUrfOQzn8fj8NCo1cmnk+TyKd58\\n9TW++Mf/np//+Gn2799Dm0WLe89x3n3+pzi8TvwdnSSjETQ6FxqnBYUISlmLSjVLqVTEYjQhtgRy\\nuQwrq8t4PJ1oNQZWFtY4ccedxHM5rG12NBoNyXSK9OYCt6Znue/+h9jZXEPvsrJyY4XgvhEEOcQW\\n15m4/17EfBaxVkSUgVAXWVqdY+yuO/iLP/wTHv7Q/RSVKmwOKxaLhUbz/XtAnX4Pm6sr+IMBtrc2\\ncThsqDU6lFoz2xs7zN6cJtjtxt5mR6PVo1BbMARPfiCY8/9fnMI/AN3AGBABvvL/dQNBEH5XEISr\\ngiBcTecrVBoqDh25g/mlLULbIW5cvU4g2IHWbEauVfOpTz3G1maY69fmyObKJOMl5lZWsLlsHD56\\ngMvvXqFVVdFmsTFzc4aN9WU6e3zIhSbTNxdJJytkimmmbrufQl1Gvtri0c9+jK7ePgb6R9CbbVTF\\nCgcnxnjongfp7e3G027n0G1TdA904+v04PXbGRnr5tDUbRya2Mv4aBd2sxyhVae3z8aXv/IUS1sF\\nfvqLi/zZn3+Dn/38F6yvbXP46L34fD3oLRIXzl3gnVevcnJyHIVUQyF/Pw2rUbx/LbdZbaBVgMms\\np1iqISFHrbJjc0q8+84Gu7s5zC4LQyNtWGxa7rv3AaK7W+yGtnC1DdJsqTGZfEQzeaaX1nG4daTS\\nCYwmLZeuJEhmW9jsTjY31zG0+/jUl/4L+48ewmyzIigEsqUUv3jxZ2j0AhqtDEFQ05CJVGplWqoa\
\nQkNOLlNEJ8nRyGRUyhJSq4lR70AhNsmldnj7lWcwKnUoFQqkBpgNehRik3yySj4rkkpX2drcJJ2I\\nM3XkKFK1wp5uP4V4iEwihqzVxKRW8t6ZM2irYDQaSWXzXJqep7dziIce/hhiJo0gUzA4dpBYpsLM\\nrQWsBgt2q4MDIyM0c2Vef/lVTO42jDYL8qYCuUKHRqnjkcc+haBWUlDouH9ykkIxSYerHUEQ2I3H\\n2dyeY/PKm7SyIZ756Xe5fmuaZDxJqVQmG48T3tri5Inb8Qe7OXPuPGPHJ9lM7+LpamP22gVim0tM\\nX3ibWgWOHTuBTAkmg55MOs/BySE2VubxtjkwGOD1Z54jur3NO2+e5dqZC6yvbDKyf4KVi7M89jtf\\nwOTVIrQKyAUFxVKdldU5zBaB1956DafPw+LqGlqdnnK5TDIRIxTaIpWPceD4OO0eD8VMDo1MRmzt\\nf7Kl8P/G+1WQEUmS/uJXvFeBxyVJuvA/2r+r3Sx9/0ufx2Y3I9Hg9DMv0O51EU+EOH7kbs688zoy\\nRCrlJmq1GoVahtcXIJuuoFKokYQkgtTCZDDS2zPA3PwyhXKJweERmhUZPr8DlVJLoZhlZ2OX6zcu\\n4W/3YG9zY7Yb2Vib5+SxB5ArbZx55yX87UaMOjNqtZrN7S1cLhcr62vIRIGV5S0c3i6KhS00cjUK\\nrYm33r5MIi6iNWrJ5yso5SBTgtmixu00UKs16OnxoTcaaOTK9AaD2H1OLl5f5OnnLqPVKYil6qSL\\nEgpAqQSdXka10ULdkqNUi5SzAr/1UICjU0d49vnTqM0OFIoGRw4dRGoJlCp1tGYzerWFoT0BPCNH\\niW9t8p2v/RUqpcT0rSWWtyTUmjo6Nfzu7z5CJlPA5XJhNjnoH9lHaGedubkZenvaWVu8Sngrg0av\\no9pokM2lOH7sLtRqNdH4Fj/99jm++MWPYQtqaTW1KDRq4rEYFp2KhYUFxvYfQ6mVIaCgWsshoabD\\n10YTAZfHTWwnxHp8i75AH5fOvc2BsTGu35whGBxClEHA5WTx1iJd/YO88PLz9PUO4OvrIxeJILcZ\\n6e3rZmt1E5PNSluHn2tvnQGZjODgCGIxQ75cRYZE0OulWsuzuTxL39AIZ8+8x4GpAzSaIogiWr2d\\nWjGLILVoyUoks3nMmjau37rE4akJitkMdbmGYjKHxWEnnUvS3x2kWCwj1/lQKdU88f3v8nv/8vcJ\\nhUJYLG5MdivFaoXo2i5dPUFytTRnf/kag6P7kUlhAv5Rzp65wO0nxtiNpOnbN0KzXCUZiuL2B3nx\\nly/Q0zfMQEcnX/nyv2V0+BCj+w+RyOWw2rXotCpkgoLt7R38HUEMKhmh3W2a9Qa0Wjz/4os89plP\\n8/Y7F2hztjMysofnT/8Tn3v8lx/IUvi12Yf/JxIEwSNJUuRXw4eA2V+9Pw/8WBCErwLtQC9w+df+\\nhFzG3Mx7VKolJElk8vBR5HI5sViM1Y0L9PUGsDr9lEol0uk0DoeLSCRGuZhj7PhBIjEdYk1GtVIg\\nlSywuRHisX/xabZCO9yaf4+B4UcQG02uXbrIqVOn2A5v4PS1421rJ59O0O3t5oXnn8IXGECl0mFv\\n70Is5VhYXaZQbiEoyiwtbuDp8KM2Ociky6gUNuQqHbnkNmqZyH33DgAy6tUi12fCiLIWuWwNk05L\\nOFRha2MJh1ONRlUlmtzhD/b/IdXMGcZHnSyvpbHrJWoNUOtUVCoNcoUWSgVorWqkpkhnZ41IPEQk\\nvsFmuIgiUaKUl3jgwYc4f+Fdevp76B6YQKJOVYQf/c3f4na3sbW1hFFnwGyQoZFXUcpVlPJ1csks\\nJqOGaHiTbKmEr6cDwaSmv8tLOBwnm62ityoRmyImo4Ht3TBqtRaF0EJoyhBaBVAVySRbGM0WqtUa\\nOkvb/0ndewdbkl/3fZ9Ot7tvzu+Gl/O8mTdhdyfszg6wu8BisQBBAgRJMIi0JdOkKLpki5arRNMm\\nIVG0ZJdoizSDQYUyBcEgABIgMnaxGbs7GybthDfvTXg53Ptujp27/cfQLrvKJeIPygWc//pWV1fX\\nrT7fc37nfM/3MDTrdDsmnjBEcDVE0SUU0hgaFj0B7rx3g+TqPeaOL1FKjlIYm2e8sMr+1joPH18i\\nMzaH69j0Gw3iuRj1fp0f+/FnqdQb5NI6iZFl3vneKxQ0BbPVZmZmio0blxk0duh1DXRFJZkLk4iE\\ncQMI5afg8D4TsxPcvn6F5sZtjGOTZBeW0EMpbj73Fabm5tjYb9FvW1hBk1DW4Pj8cURBB9VH6dsc\\nX36Im3dXSEUTDNp9qq02py48xuH2Lj//yR/h3SvfI51Os7NRp3nT4t6t+zz1iQ/T7VeJRFWOLM3R\\nbTU5ffIYu5VD8sU8nhcF3eO7f/EcqXyWdLHE7t1Ndu5tcWTuCLutGr/yj/4ZbmeAI1p4tS5hOU9r\\nt8btjet85KM/ye72Pd65eYni2CwL89O8fekGP/8rv4IsCTz11JM06010XWVkdOr79++/LlMQBOHz\\nwBNAFqgCv/VX1yeBANgEfvn/AglBEH4D+DuAC/xXQRB8+697iVImFPzS0yMUi0VmZ2ep1AcIAQx6\\nHY4tHcU0fCzfJZ/PEwQB42MTXL9+HcvpIkoqzXqNxblJ3rt6jcW5edrtLnokSm/YIyQriLJEo9Fg\\ndmKK5174LufPf5CrV25w5FiZRCxKJJzEGPQZnyhxsF/j6vX36DTqxCNRGvUKyVSO/tCiPDFOdb/K\\nwsw0O7ubNGotnn76PEPD4LP//otkMined/5x1jbuE02NEAQBr718ha7pUm8N6Xfgg09Os3lvk4ju\\n8+SF87z4ykV2az59ScHoeMQ0EVkOUesMESQBCQnP8ZmbUOh1LbJ5lZ2KRVSFn/zUU5TLReqtIWfP\\nPc63vvNtZmeOUi5lWbn+DtPTk9xducN3XnyVoRlgWRZj5SzNVp1cNsX4aJ7jx5f50DM/zgsvvcjE\\nSIlUMsTFS29zeNgGRCLhEI7bZ/tgj2c/8EkcweL2rTu8+NU3+M/+3k9jBw6l/ARhVeKwb6ILNvfu\\n7rGwPINlO8SiOv1hD0dSOLJ0gvsbWzx64iFef+MttJDC0tEFItksjWaDja11GjvbzB9d5OJrbzA7\\nPcfEzATpiMatG9cezGBkCuxtbXD82ALv3dlktFhCVGQ826RT75ApTdDqHTI2NkemNMrh9gbddpVE\\nIk0mleTG1YsUshla9QGvvneFZ95/nma9RbVa47GnLjDoiCRzJRTVpFmpMOgNeWdlE6d+wM/+6t/m\\nysU3WZye5WBnG02PI6oRDFdhdmGa7337m5THxhl6IsV0jmpjm/L0BNXtA3L5AmHBZWjZ9C2Lbr9H\\nuTSDIPU53N+jVCoRTRW4ce0qEUUiGo2iRpNcvb1KRBoyOb6EFch8+5tf4cQjp4gmi2TyOTLJNEar\
\ny9b6Lfp2n5CcwBh0KI2VuHXzNkLgoUV0FufmGT33n/7NZApBEPzM/8fP/+Y/cP/vAL/z1z33/2ma\\nFmX++AdIxFPUOl12d9eYmZ5AQKfV7rOytsLk1BwpN3jASrt2hZsrlznYafNLv/gpElqMlbv3+fAn\\nforP/OEfcXRhFkcIiMWSVA52sWyb8dFxvvP8d3ns8fO4vs3j7z9Do97nt//pZ5iaGyEV0xA8n1Qy\\ngxqTmJyeoNVoY5sekbBGIpEgkU5gmD43795nbKyIGs7w+T/7ImPjBc6efZS9gwOu3b3P6FiZdDKG\\n7/vMTue4vrJFNikRFjysfp2l+QwDR+C9ldtEoxFmYiq393s4uoArunR6QwQJEtEY9sAikYbDigcy\\ntLsSpg2pkMitq1e58e4a05Nlvnjz9/lvfvOf0e13CAKXvXWF9Xt3+PMvvchgGCDpHoEis7HVpteU\\n2VhvceNGi05PpDg5QWm8jCMr1I0m3f4BQkjHMFpoQZpYXGNGO0NI19AFjedeeJXRqWlGymME2Iii\\nzthTH2UxfIpm9Qr7tT/C9n18REzLIazHafYMIm5AWhAxLYtyKcfqvT0GWhxzr4KnhlicmONzzz1P\\nTBSZHBlnaf4I165cZHxujrGxSXq9Hq1eh/HJCaqVJicXj1KtVcnl89y/dxddDSG6A8JKQFj2+O6X\\nPsvpc+fo1mrocoKa02F07ijdVptEIckvX/gQ2zeu4UoDlh85y8vffIORmQKCZGB7MJZN0LUGzOaS\\nRGYnkGyJfCRLrdNm/+CQy+98hac+/CzFuZOsvfs24XiU3tBlenQUKxQg+gG+6+GYFtduXOMLn/kj\\nfvJnf5qFxeOM5gtcv/UWF578EWp7La5eXmVsXiEWjlKv15g+coytjV2sXo9wXEUOR2hV94lEwoQF\\nmalCls997t/zoQ99kGQyzezcDC++9hIf/ZFnsHs13n73NXbur3Hq5HESyTThkPJ9++MPBKPxf/4X\\n/8Onf/XnP4KqKriuy/zcGJ1Ok6vvXcI2LcanSoyNT+F5HrIscbC/y+OPP0alske700AJSZRHp2nW\\nmzzxxJNUDmpEY2nmZmYwDJtTp05QqdY4fuIYI4UStuXgew5j42PMzJYpl2Lk81ls1+dDz3wECZtk\\nPInrWpw79ziVgxpbWxvousax40tksqM0mm3ubdxncXEOx/UZmgMq1S75/AjhVIyDvQMCJ2ByZpL5\\nmTGWjswyO5MmpAZs7ezR7giYvkenN8TxfRzTpl53CUVkTNMHRCzbZHRUx/RdOpaIr3g0uw6yCJl8\\nnN2DDs1Ol5XVPfbqHUaLEp5jcu/uGlcuXWHl1ibmUELVBWzLAyTi8RhSyGJqQkYOfNZu1Lh66QZT\\n0zlCUsD6vbtokRRBICIEMqZhMDRFEuk8mqZy584dBl2TYj5Leaz4YNdlNIY5tNBVCdlps7W5Sq9r\\nsLO9x0h+BMcHAgFRDrj63hV63Q7rW1ucO3+ajKahZ2IM9ncZ1Js89bGPsLOzy+j4BLV6nYWjc7QP\\nt3j+G39JUpe5d3eVdqPK2TMPsb76HhHZx+s0ubO6SiYe4+7Na8iKQjiexncDDisHTC6M8+ZbbxGP\\nRQmHdZqNCoVckt17d9mt7GE5Fs1mg7HRApoeIqRo5NJFQvEUdmvI/fUVJkanqB5UGJ8ax3NdFpeO\\nkkomyGfTZPMJ8qNlSsUy8aiCJ5i8+b0XWV4+jh5SiSXiiGLAj/zMz+IFIpFUjkCSUEQPWdbZ3rjD\\n1PQkiWQW2xPIZTM0Gw1EWaA8Nc1oaRRdl5F9j7AeZeHIMaqtPk8++yOEtSjdQZNbN24yPjZKdXML\\ny2oRzcUYSSVxHJsTJ49zcHDA7/+77/zw0Jz/+H/9Xz69UJTIZtOsra4yUswwUhwlmcjTGzSIRCLs\\n7FRYX79PgM/YeJn+wGRg9ZmZnkJVH7QgQ4rMlavvoaoRGq0WtmdRGMlgWA6aHuONN17joF4hJOn4\\nQYBp9Xj30luIoo8sqxw7uozreNimQb8/IBzViUSiCKJIsTzCsDfAMm3u3LvL3btrlMsF4qkEfiAw\\nO7+Aa4HlGZiuSTqRRJQkJiZniISThEIa6XyKaCKM6QzZ3DTY3BsQCAK+5zKwAgYWSHKA54HjBAiB\\ngicEdNoCsxMJhv0h8UiUXBaazSGBLxONK4RjCtGIQqmUZnpuFsFXWVu9RbXS5bDaI/AdYhGFdFJC\\nFQIU38e1Nbp9i1gixK9/+r8mEk3zyNmHuX71XWRZpd3uEYsm8AMgkIjEEkQjOulUkpF8CT0cJl/M\\nEVI1PA/swYBh+4Dm4T6HhxUc26bb6zA69mBeJBRSSWfyzM4tMjpWZmdzk1QmRTyTZdBqs3P3Dv1m\\nnWQ8zltvXmRqbIxELIJjDanu7fHImYfpdrro4TCe6yJLEqosc2d1DUVW0TWNXDZFJKywubnOydNn\\n8JwATwiAgEg0RiIep1AsoaWibNy+jdXvoSgJzpx/FNd3mCiWMX2FRL7MwfYaoaiAKAnooTDXrt5m\\nbHySamWDSrXFrdurZNNJ9qsVwuEYl9+9hGmYBL5Np9PEdR3W769TLhbY2d2kUC6hKCFGCkUi8TSy\\nEsIY9CiOTWIZfW5cv0aAQ7lYAt+h3+kSi0ZwLINWq40S8tFCEroex7I9UoUCg1YbKQTOYEB5tEhI\\n0ciUMgSOgSDC6s3bTE5NE4klWd3Y5n//0it/MzTn/z8sABxHIZEocPnK58jkR4gnAiQtii/m0CMT\\nnDg1z1f+8i+YmV8mGo3Sbfc4dSxLd9BjdGKSL3/5y3z0ox9hZ/8NHjs/ycCxOHb8PG9873WOHJvi\\n1qWrIKuEozk29ncZKxexkTnx0OPsbVd4+KGzZNMRVtduIeoypcIUiqzzyqsvoesqh4eHjJUm6Zge\\nsXSSn//A00SjSSxrQDgc5hvf+AYPP3qWRr2K5zvsHexTq9UwDYveoE+tUeMjH/4I/SYcm1kkqe9z\\n9foe97Z7NNvgCiEM18bpBAQ+JOIqvmuheDBeCNFuthEEAXM4ZGq8hEwDc2jQrbukUnFsR+DNV9/m\\n5W+9jCYpGAIMDIjEdbzApda0UYcwGD74zxUxQFQhloSZsRFkTeON198hmiriOAaJdAIBhRBR4uEk\\nuUIeXdZwXY98XmdkJI+AwnDgACayLDPotQmFQpRzeQgkRksTBMgM+l16fQvH9bnwvvO89trLLB+f\\np7Gzwc133qF6sE6+mOTsiRP87u/8Y86dPUu7ukaj3aF62OXE6YfZ3a8Tj+fwe22MocWgNyQcDpMt\\nlDEDgdHREp7n0W4NiYd0Nt99g8rOAdmxNNF4Bk9PM7p8lOrqOpXdLVq1LsW5BZLRCJWdLTRR5Nf/\\n21/j7/4Xv4rgyLTqTbK5PN6gz/31FU48PIsoNlldvcvkVJELj57k3u01zOEQVdE4snicXGmUt958\
\njUGnySsvv8Cv//bv8vxz32RufIxkbgwsmzdf/w6SkmLp2Cmm5hboHe4Q00Q+8OTj6PE4u7fvs7O7\\nztTcBO++9RwH1UMeOX0WOTLKO9euoqgyS0tL3HvzHSxPRlEUjj6yxkG8AAAgAElEQVTyEOt377G/\\ns8vM7CxG2yYUDzOSy7O3vUU2k+Ls8dnv2x9/IDKFz/zh73367/zUB/F8F2NoMDk1jSgq3L1zn6Wl\\nIyCI3Lp5iwuPv48rl68gCCK27ZAvFOgP2vi+xMTEJJub6+h6mKNHlxEEhV5nwP7uJrX6HrF4jNur\\ntymOz1LM5dk/2GdhYZmQqqJrYVZWb6OqIe7cuUO+PIZjizRbbSrVNj4SkUiM0fI48XgKSRHY2zug\\n2WwiCFBvNtB0jdn5JRzHYn19EzkUIp6MMjKSY3enyvz8PC+/+iqyItFqNxibGmPzfpX9XQNRCGM7\\nAo7rIosioVBARA+RTIrYPQ/PF9CzGrqsoGoeiujTrAzIZaJYjo3t2rS6Ju9//xSFVIl0NElhqsjq\\n3X1838VzHrQ5dU3HNl3iUQ1J8wGfcinDsRPzmI5C/fCQbq9Nb9AlHA5jmDYEENYiIAjoIR0CEVGU\\nsGwTURJxnQBflBABTxDwfQFZEvCRkGQZQZQYDAcMDItoNESzcUC5OEJI8DB7bTxcHjn7CLbp0Kr0\\n+NGf+HEOK1WarSanzz6OFlI4ODgAx6Zl9LAck++9/j2effYZKgdVLMvC9Vy+8+2vk0lnKZTHqFYr\\ntCoPNm+1BlWCICCWGcfsDpBDCTLpDAo+omBQr3aRdQ1JligXxkimMtxcvcejZx/j9VdeIJGME08k\\n8DwLc2gSjiXY3dsnmxthZ2eHQqHEzv4Ot26tMVIskh/JcWRpCUHwCASFx88/xt7mHVzbpT/oszA9\\nytrKHVKpDC+98jwTxQJ3bl7HNUxkSeK7332ZY0cX2dncYGZ6gonxCfZrNZqHFY4sznHxjTeIahHW\\n793n1NlzGIMBt+6sEsZHVXQqBxtsbK6zfOoY1y+/hWubhMMqhjHkX/7pCz88wq2TpVTw7/75r2JZ\\nFiFFo9nuo4VkBr0+0WiUqakp9g52+fznP8+Pf/LjOI7De1ev8amf/gnu3d1AFB8gpmPZnHvsDF/7\\nxjfQVZWJ8RJ/+m//FeFIhOnZBU6dPEfXNBH8gH6vQzqZIJPJUC6XWbl9g1g4whtvvEEsGcNxAiQp\\nIJ0qkk5nUUSBen2XQqHAwBiSzeSRJIWQGvD1r3+Dp576INt7VaJqiMN6FU0LIQowMTpGqVjm4sWL\\nFItFkqkYW1tbrNy4xrVLa9TaXVpdBycQ8SSBXsdDCCCZiqDgoskCtuNhugGOGTA5qdNt9Immwhxs\\nD4kkIaTIuL7Ax55aYqyQpdne480b+xxUujTboEohDMNGUUCRBSQpRCatIEh9SqMjPP3RTzEc9jHM\\nIaLoY1kemhbCNAboWoRkMkM8nMS2BPADPM9BCUmIkoLvA/iYjo2maQiCAB4EYoCqRTGtPrKiYhgG\\n5tAhlUogCh5HplMMOk0808USRERRpLJXYWGuyM7BPpXNPVx8kjGdnd0aH/2xj7Ozu0+xkEeLxjjY\\n3SOaimEODWr7h0xNTZPM5Hj37bc5dfoYO5t3abYb9BsGE2PjXLtxnfL4BLfv73DmwvtIp/Kk01la\\n/T7RRJjLb79FRJIpjo7gGEOatQ6JTIaN3Q3e9/hjfPHPvsT41DyV/V3OnDmN50Eml6NTraIk4tRq\\nVY6dPMv/8YV/zQcvPM3+5jae7OP7LoHtEwrLrNy8TyKuoSoaKysrPPvs02xt7TAzOYXlOmxtbTFS\\nLFCp7JLLZmkd1slm8wwDkXwqgSYLRFIpGu0W9YMaqUyane19lo8dodNqkEgk6BmHNGqHuE7AkcVF\\n3rt6DVkK4QseT/zdz/7wCLf+8R/83qc/eHaRdDpNrX7I5sZNNjfvcHvtGhOzMxiOSb3R5rFHz7F/\\nsEdxZISpqSnanQHJRJwbN97DcSwy2TSNZo/KQR1RkvG8gGQqw+TMNMdPHCcRT3LlyjXS6TQzszM0\\nmw1UTaPb6+IHHtt7u5RGx7l/dwNZkZmZmeP06VPE42F832R1bY2Z2VlKxXGazTZf/cuvMzE5RiKe\\nJhKJs7OzS7XRYmnpGL1On2w2T39o0u0PUfUo7e6ASvUA13E4fe4hjh1fJJXRSaRl7t1t02l6xMIK\\nmgqC4NHsOdQ6Pn3TY3o6Qq9l0h/aaLEMPaOLKMscHoIghRkYHpmMyJHFGUanlnjttZu0mxbhkIIe\\ngnjERw2B5wqEIzpDT+Ef/MNPc3T5NNXaFq5vooZCdDp9REHCc30yqRE0LYEkRCDwCQIXRJdAAFGU\\ncT0bWRHwAx9ZEhECAde0keUARRFwXQdZEQn8AAEZJaTR6nVxBRGra1KpthBDEp7TYiQbRZPDbG7e\\nQ5UkoqkciUSWymGL8ZkFRqenSabSPPftb5PJ5TFtGx+LZusQJXC4du0S5rDL7u4Bvc6QxYWjDHo2\\n+3sbeJLCyeWj7G7v8PGPf4LvfuvLzI5lsfoVAsfm1soKyWSS8ZkZvva1v+S17z7Pp37hP+GrX/4y\\nTzz+BCt33yEm6ZSKZc4++QF2dva5vbVFSICRsVG+9MWvcPntF7C6Dd7/oZ/g1ZdeYePeBkvLi3ie\\nj4BLoThCKqJTa7UpjBZ56OGHGB8tY5sDMsVRUpkM1d09itkiiVSKcrmMLEFuJI0mi2D3ETwby3DY\\nXt9kZCRDSJaQJQHL6HHp2ru8+vLLPHTsKWxTJZbIk0yGsTybeqtPpjjJn3zx+9No/MHIFIrp4G99\\nYJb3ve99ZDIZBj0PxzbJ5VNcuvIumdwIt29c5+zZM7i+h+3Z1GoPCpBz07Nsrq/T67fodttkMhkQ\\nFFqtFgI+D588SyB6VA8qDAcWR5aXcG0HyzS5s3aXer1JPB5nfHKMSCRMEEDgwdAcUC6X2dy4h2EM\\nEHmwiXlx4Rirq2tMTU2i6zpDo0sinqLRaOB6NoO+hWk7zM/P8yd/8ic8/YEPkMlkSKVSVOuHmIMH\\nZ+FWq8EjDz/Mb/3j3+T22gGdnk8gQDwi4jg+jheiZ7r4yMiBTLs/pJhPYvk29fqQh5YyDHtDpkez\\nDAdVfN9lcWaCDzxxASSfb774AvVKl+FwiO+DJAn0BiJaRKZvWPyT3/kNDmqHOI6FZ7i4nocohtB1\\nHSEAwxiQjKXx/YBYOI7rWUiShCQL+L4PgYjneYiihCRJOJ6LGtIJggBJFjDNAeFoAj/wEH2JQFCw\\nXQfLsXE8D1WSURSVkCKQiRpYQ4Neb8jU1Dh7G1scDg181yObz5CMp4gnEwx6fVKRCBu7BzhuwJnz\\n53AMky9+9k85c+4hGu0WI8US7XYbCYnjj5ykvbfJzn4dMXAe7FNwPQr5JLdvrVAojhCJpNCTaXxJ\
\nIZUr0ul3UZWArdsbHP/EJ7j+/KsMa/u0ej1GCuMM2l3iuSxTR+aJp3TeeeF1jj32KIF5wNf+7HN8\\n7Gf/IdF4mM271zisNSiM5Hj7zVd59Nz7cG0HPxAZnShjmiZf/fOv8+QT50hkspi2S7vR5L1rV5ie\\nnULXdVRJZHX1NlPTE2TTGd544w0ef+w8W1tbHD2xTKvegUCm2aoTjccol8Z4+ZWvc/ToEVrtLkdP\\nvp/NzQ1yhTg793Y58mO/+cOTKfzh7//up//zT32UkXyBTruL6Q54/oXvUCyUKBWyqKpIKCSQyWZ5\\n5613UGSdE8unuLFyCdd+8DFXKnVcLyCiJwmFIhw9cpx0Ks97199DDSUpl8dYWJxjdWWdfr9Ju9vm\\n1EOnicV0RFFkfHwMwzQpFgtIkko4rJHJJuk0bVKpDLblPFg4g0tIUfFdj0hEw/clLl26RG84pFGr\\nYzoOihLCMIecP/8YrgduEBAIIpGwzurdVRaX5gmHdbZ2dvn8F76DYQXEEyH6wwBFFAnEEIcNi5Am\\nko4GxMMyuuJgdE1U0WGsUOT+dp3mwKFS7dLoetRb8PBD0yRTGS6+fY3jJ5cYKeQZG5vBFxSS2Syd\\n4YDjJ0/xc3/rZ6hW99FVGRDwXJFwOIrneXRaXRzTRpYFYjGdaCSKIAgEnoMiKfiBgO8H+L5PEAQI\\nAgiCiKpp2LaDJIiIkoymaYhI4IMv+ODbRDQdTdMISRIID6ZjXQ+i0TCO46HpEUTRZXxyCgnIpuJs\\nbO8gyx6Vg11iEY3s2BjrW9scWZpn9cZb7Gyv88jDp1m9f5diqYhp2cSjSfrGEGNgI3ki02ce4vIb\\nb2AMuwSBSbc7ZGpiHsOwSOSKmIMB8bBKq1sjFo8Si6bIFIv84W//U/LxOKcefgjLNJmbmSQ7Vsa0\\nB5iDDna9Tb/XJxZLc1ip8uKLrzIxs8TAdJiYPUpjb5dCfpRYIkejto9lQW/Y5+DgAFmQWJiZZf/w\\nkEQiiuP6JDM5knGZpeUTbG3vMjk5h2naHH/4HP1en4P9feaWT7B1cEjgudiuz+jUJL4Q4PkWX/nq\\nl3n6g8/i+xLrW7tk03Fqh3vcv3uL3fV1vvDS2g9PS/Jf/W9/8Omnzy3RH3RxXAuBgHNnz+A5NkpI\\nJZnM4LkBiUSUWFzHMDtEogqxSIZut8fU9CT9QZderwu4bGzdIZdLE4tFH4h1qhIBLpcvX+KJJz6I\\nYfWo1+vEIimarQaSqBCJRMjlc+zv75NIpjGMPi+99CJnz53DMHrEkwk2tjaoN5pMTU4gyRL1Zo14\\nKsleZR9ZkXj09Bl8D1RFwzINUok4oiDQanWYnprkzt07hLUokqhSqeyzcuM2V6/cR1VDWAMbURUQ\\nEXEcgVRUZDynUcpqJCIihuWgKCJBAJ1Wj1RUQwdkfGKqSEwPSGZCfPyTH+fJpz9ANn2Uw8MGTz/z\\nDOfOXGB2ZpZiKYEeVhAF8P2A3mCIMTRRQyrdboduo41tDrDsLqlkHE2NIIkCw+EAVdNwPR8pBOAT\\nkh4UEUOhEJIiI6shNFUhFJJRZJmQIiNLIqIgPZgr8DxEWUTAJ8BHFlxCmoIXeAwMC1nW0CMx0tkk\\na3fukcrkiEQT2A5UDqqMTs6RSI1wf6dCt2+RSCaRRYWwHmHoOSSTRaZnjmG7Pg4my2dP4/kereoh\\n63dX8R2PhSPHyBUmMYYDev19onGdysEBvgu7O3tEEwksw0aUVe5ubHNq+RjZRIrOwMBxAxrNDtVa\\nE1lSEEUFx/V4+9LbiILBjWvXKORGePSxR/Adk9rhIY45BN/DGPRZWJih0+6QLxTI5gooisrt1VXm\\nl+dwLRMRGTdwqVX2Wbm5wvbmFrlcnkw+T+WwRTQW4+Kbb6GrOiePn8J3TGzLotdp0W426A+HfPRj\\nP4rR79Nut/FdD4QBkYiM2XdJpZP8yZff+eEBhX/xP/7Op3/hJ59iZnqabDaN5ZosLC7iOC62bRJS\\nJeKJLLbjksmMsL1VpdMx2NvfJZ/PI0kilmUTjUbRtDinH3mMqak5Dms1QopCEAhUK3WefPJDvP76\\n67z19qssLh4hEk4wOztFrV4jlx/h1q3bjI1P0er0KJcLnDr9CPVGg1wxiRSS8V0BPRzDcmyqtTqm\\n4ZAvlLl86SqiqBAEIo7rPIhWyRi2a5LNp3BtA0kMyOSSJJMporEwuVSaTDrDtffexDV9AiHAtQJc\\nz6dvBviORCQcIsDFNBw0JYprmaQSIoEIouiTSOn0BjZeEOAGEueffIrlU2eQ0OkM14mnYrx58Rqu\\nP+B7r79MMh0mGtUxDQPH9Qh4MPbdadbpNht0u22M4SHxmEKxNInkq4QUhVhUQwvr9AdtRBEkARRZ\\nQQnpgEBEj2F7IoSSaIkCvqARyAqCoiH4Lo7VQZJ8ev0Ojm0Q+DYIPiFZQpIEPE/Csmx8IeB7L78K\\ngoCqqw+AO51keXkexx7S6jRRfZfjS7Os3ryKik+zcUg+XyCqC7h2l3ajTiKSpXnYxjD6GIMBkXQK\\nHQFBFtitV3n+O6/wyNmn8FHxvADTdIgmUtiuRa9nE4+kCHyfgTmgZ/bpDQzubqxz6qGHQJKQpDDl\\n8gSRZIy5hXkEAU6eepTFxWV+73/653z1y19if+8un/iJn+LenetEVJ/XX7nK+HSZ7zz/EqNjU9iW\\nSyIR4/KlawwNCdeV2dk+4OyZ07z71psUcjlq1UNu3LhNv1dH0pMsnzyDNWjhWn0iiThB4CIHIbrt\\nDpNTU7z+2utcvXyFcrlEKpnk5RdfJhZJY5k+ggT/+ivv/vDUFI7NTwSf+Sd/G0mSMPoD1EiMWCxC\\nJKyzt71DMpHh5p3raKEEtmOSSsVRVQ3BlUkmo/i+z16lyki+SDKto8ohyuUSb737Jgc7uzx06gwH\\nBxXSqTzhaISVlSuUiyU0Pc733niZZz/2DGsra6TSJXxPBt+kXB5ja3eHbKGIYQxoNKtMTz8YEjX7\\nfXzXIRmPs7GzgjnsMzk+Qd92EYMQCA8cWiRg/7CKLmmIskChXMQNZEDE6A84rNzj137tHxO4oKky\\ngeMSjQkEBEyWi1hWB8dXESSRfq+JZ0fwAwvH9pEVAQEXHFBDEATwi3//71EcLREEENU0tjZ3GZ2a\\nw7JMVlauYgwqANiWi+2IOJ6HJEk0DnbxEXEMk6nJGY4dO47r+YTDYRzXxnEswuEoBBKiEMK0+mgR\\njSAICIfDDIc2khwiO3USIZojWT6FSYe4rLKz9jbGzgqB06ZZr+H7PqqqEvigqioEAU4g4gcitusT\\njYYpFoskkll2t/eYPzrHvZs3uHHrJvFkisUjyxRKI+zsbnJ3dY2xUpFAkrm/tkp/0OXJJ9/Pu1cu\\nc2p5md1Gk4l0ln/72c8QF2SGtsXM0ZNMlucYmykhBR4DVYeOiRqJ0XEHlNM5Ov0evm/j9trsrG9y\
\n+/46eiRG4Hk8cuEc5dIY91ZWKY2X2KtWyMXy9DoN2u3mX3XLJqk1mw9o+wcVlpaO8talS5iDOlPT\\nC9y5t87a2hpnHznFoNNnc/c+ihYwv3iWcDxFt7bF4uw0B7UWXVvE67dp1+u87/HzDGwbQZS49NYL\\nXHjiMQLPp3rYJFsawzQdsokM1cM9FEXGcRyisSTdzpD9w20e+7l/+R9vSvJv2h6QlxwikQgbGxuM\\nqRpbG5sE+BQLJd67eYOxsVEsM2DYtxjJjZDKxti8X6HVbZFOxpidLuN5Ht1Oj2Zjl3Z7D8EX+MCH\\nfxTDMDgxOoZlGTRbNSZmZolEIgh+wLGjC8i+hxqKEIlEMAwLx3Kx7D7l0TyOa9PvNrnw+ONcefci\\nnudRKhTYOthie3NIWI8jCzq721Vmjs5z7eoNHn749IPI225TLhTY3t6hUW8hCjITs5M4joOvSbz+\\n+uvEwjK9rouquqgRmSAIkGWZw2obLRah1uii6zqZRJFhb0C7BZIq4DoBmq5heTaKGEJTJRLxHP2u\\nz9zMJJ41ZKQwSkiRsW0b17VwHIt+f0gkHMOyDHq9AblcDi0cpVAo4Psi8zOzCKJEr90hU8jj9fsM\\nugNi0RCe52CabZLpFEPTod0aIIlhLNvDtYckjBaaYLN/a490YoRDp013c4VOZYdAFtHCYTzHRdFC\\nWKaN53kIwoM9HIHvIysBiWQcTVfpG00KYyme/9a3CUk2ne4hm1trPPbEE6zdW2NqYhSjP2Bhbpb3\\n7qxx5vyj+JaDCCQjCTIjEyTSE9iBw8//wt9H9l1u3LpNvpyjUt1DrmnUDYvpbBE3sBFMC8O0qHtV\\n4vE4ricQzYyiaWnOnnmC/doBputw8aVXmJicJZZOYvQHSJZLJKeSiI2ydOQY2zvreJ7HzOQUm9sV\\nZE1nc3udyclxjH6M7EieVq/P7Ows0bAKcojT73sW0+jRb+9y99YaI6UM9XoLz/MYL+YxhjqKENBo\\nNPjm898hl8uxtPQwd++12dvdYnd3k3Aiwvr9bRRBRJIkFhcXEWWJfD7PjRs3cD3j+/bHHwhQIAgY\\nHx+n1+uxvLzMcNgnlY7iui7V6gOCiu0IIHjMzo2D71A7qGBYQwojWQLfw7IccrkchmES1kcJfJ/R\\n0QJvv30R0zQ5ceIEnXaDXC5Ds2ujZ6KsrKwwVizheSobG5uMFEeRJAEhJPP22xd55plnQZSIRaI0\\nDg+RUEjE49y8foNsNs1es46YU7l14xapZIJQWObRc6dZXb3D7OQErjVA00IsLy/TbLSxbZvqQYVo\\nLIIgumxv7jAcuMRjIaJRBQmBdrtPJBJDkWT6Zo9SKU9YU9nf2yCfLhH4Eu1eF8P0aHY9ZAUIfAYD\\nk+HQYOHIUbrdLjIBkUiUfDaJ5/SpVfdQ1YBIJIIYQL/bo1QeJZvOEYtEmZiYQJYfdB8UUcJxbe7f\\nv09xpEQslqDb7ZJJJ7FMA8uykOUQgiQyMIaIoogsqtjDHoNWA3vgYOv3MBFJhMLsOwKaAIEvYdoG\\ntuthGUMikQiaEkKSBBAhcD06zQblQonACbG9tc/pc4/iOD2OnjrF1auXabTqpDMpTNMEfOr1OmE1\\nxPXr1xkrlygVCrz0yss0ej2mJ2YZXZwnXxxlu1rliY99kjt3b5KTJHoDkXJ+jJbVJ6crZCfHcQ8a\\niF6fVqPN/Y17HFtaJpFMsbJ6m7V7dwmCgI9+9Mf46je/RePyFZ599mnC4Si3bl4nnowRCSfQYyrP\\nPf9tIuE4Tz3zMWzb4oVvf414Mk0mFcewA+bnlrhy+V2KIxkGhkNI7yFJEtNjk7z83PeIxKI0u31U\\nVaVRv838kQXOPH6Br37pz/npn/k5rl+/TjQVI+jBxMw05y+cw/FEjKGFZQywDJNkOoMbuFSrFS5c\\nOM+ly/9BSZP/l/1AHB+Ozo8Hn/u9f4CuRRAEgdWbt+j1eiwsLNBoHjI+PoppeAyGbTRN4/69TWLx\\nMLFwjFQyDn7AYDCg2+0ST6TQoyFkJcLe3gGnThyjXm+g6jqWPaTXb3F86WF2d3cZ9Dtksnnu3l8n\\nHk8Si+j4vkMqkabdadLr9SiVyxiGQTQap9PpICLQajYZDAZIosxe9YB0Oo2uqpTKBSoHh8zNzXF4\\nsIXnucwtLnLz1hr5fB5ZeuBIvuDztS99lue/9TqeHDyIlIGAHEh0uzbxuEw8rCKIYBoP+Be9ThtJ\\nUGg2TeIJFd+z8YIQmwc+qmoykpP4jf/ut/D9CAvHF6jv7FNpdFkYT/DCy99GUXWSuTL1WpXDyh6Z\\nRJyJ0Rk8X6Tda+P7PuVyEcOwkEMKnuMSUUMggu3ZFIplEBUiahjXd3BdF1mWEUURX5ToN5tYtovt\\nueSyBfZrW4Q8FUd0CCsCgSPi8qAwCT7DXvOvjg8CBAGe72M7Fru1LjOLy8TTWVzbpjReQnYDbNfB\\nEQQ8x8M3DW5cf5dzTzxBo1Ynn85QqVQIhzU6jQcpfNc0yU5OsXHpJolUlPWtdS6cf5R7d+4zPrNI\\np12nfGyGL/zhv+GxRx8lUSzRqB0SUUQOq3Wm52a5ef0qIVkhGtLw8VAiEfrdProiIasykzMLrK2s\\n0K/VGR0f46BW5+7GTWYmZ5gaXyTApNfvYroWCw+dobp+H1nRsR2PRCKGFhJ59+3rlMojWEaXG+/d\\n4MLTHwTrQWH28LDC0omjfOMv/hxdkxEkkZHROXL5EuGIzEG1QkgRMXpdRkoj9HoDDg+7ZLJ5Dg/r\\n5HIZUsk4uqbQrK8z9uR//x9Vju1v1DzXpdcdEA5HGQwMBAmmZ2dod3ocPXr0ATtQCjANh3gsw8LC\\nETxXxCMgABqtJuWxUVKZNHJIQRJVbNNE13V6vT7D4QDPsTEHFp4l8uZb76CG9AdAMxgSi8XQNI2d\\n/V0GgwGe51M/bDDoG+xsrxP4Nol4mL2dXW7deABY3d4QQdQo5PNoisTiwgKhUIhA4EFrr2eAqLK7\\nd4BtDzH7HfzARg6pZNIFIrEEohbCMQNUUcPqBSiayvhUgUw2zsC0MIYmuhrGc0UCJBzXRA+H8DyP\\nvu3hEDA6ESYWhZAMzUaPaFyg02/gEXDkoXmUSJR2u0ssopOMJxAEH1HyEJUwkhoiEtGZnhwjqocJ\\nAoFsKk21UmN8ZpSRsTFy+QSRcBxbCOM5JoYxwHdEhMBHUESGjovrBcjhEPFYjpFMCXtQo5wrMlJI\\noUbCiL6KrygkIyE0yUMWXPRIFFUPI6sh9HiEUFjFF0BHISRrxFJp1HCIZnXAne0NGsMBLhBJJdAz\\nGcqTi1iGSyyRxHYdRFlCkjX05AhKOEYgKLg9i2wxQy6XoTRa5q2Ll0jEc+xWKuxWWnzzC3/Oj37i\\n42ysr2HU91ElmZ4xZGZxlkHvgYDLsVNHSaUTFItFfM9htFAgn0sxksvhega5kSzhZJxbq7cJfJdT\
\nJ8+wub3HxtZ9PMcmlUpx8eI77K1tENXDNGqH6JrMwe4O3VabeEzl8LBCeWKK1EiR3fu76HoUJaQz\\nM3uEyu4hiWya6YVZTNslk0rgugM8zyMTS5CMJKhWGphWQCY9wkNnHmVydp7yxASV2iGtTpvhcIjr\\n/LWi6v+3/UBkCkuzo8E/+sVnSKUyTE5OUqvusrZ2jzOnz3H/3ipHjy1ycFBlbHSS3d1dMpkcyWSS\\nnZ0NUok48ViEnd1dMpkMnhfQ7/cB0LUouq4zGHYfRHZJwvMcpibnaTQaBKJEYaSE4zhUqtt4nsfO\\nzg7HTxwlm8ji+h6Bb7G+vo5pW8iSTqPeYrRcZv+giq5HyKQjuI6Dpmns7u88WFwzMcFwOKRWq9Fs\\nN1g+Os/N69dYXFqmWusS+AKeb/Lm6xd57hsvIUoQDisoWohuZ0BYkxAAHJV4zEdRFAJHodHroWpg\\nmT6xuIYcBqMXxTAa4MHHP/ljnHr4IQJJYbyY4/rde2QVkede/BrZTIxmvYocWAiuhSBEcAObABEt\\nnUWUVAYdg8D1EWUNy4eJdAxXNlAiKgl5lsjIJmpoBFlIkshMgBSgDNpIkRByVKffEdA0jba5QzKU\\nQJQiKKqMJ6oIzgDBk7EshyB4cIyx7QcgaQkepmPi+h6Hu00CRef4wxcIbBc97KBFC/iBTbfXY3P9\\nkOXlZba371HKpxn0uyiqzLA/IEDGDUIkk0nqtR0UMUDXYmjJJI2Bx9j4GE67jhCYGEYbTBklLLK1\\nvsPMzBx79R00LYwoilx66y2Ozc8RDusMBkPub2wxOTXD6Km9k+AAACAASURBVEieS9cvE9Z1MhGV\\nW7duMbe4gNE3yKRziJpGPBrD9xz2d3YpFkeJpKJUKtvY/T4bO3uEFI2zp09zWDlAUjUarTaaGqbd\\nbjM3O8n+foX4XwkPy7JIoZijVavQbLaZXzpGf2AgS9BpNBkZybG2tkYkEkNVFRxkXnjlVY4cXaJc\\nHMV3bDLpOMNOg+kP//r3lSn8QIDCRCkb/MFv/hLRsIaiKKTTWTzPIZ4IU6tWSCaz7B3s0uv1mJ2Z\\nZ29vD1mW8QWRQiZHZX+XU488TKVaRQhEAhwEIQAhwDQs8iM5GvUmjXqPpaUlrl5+m3gqSSaXpt1u\\nEQ6HEUWFdCqLoujsV+6iSCGymRzW0EIQJPRIjEa7guM4bN/fQVEUUqkUK7ff48L7n+D69ZvMTk+Q\\ny+W5efMmJ04cJyQrrK6tkE5kOTg4oDUYcPr0WZq1Ovlinncvfpc//v3PEI8nkUMSB3t1wmEZXYvh\\nORa262MNTUKKjKbJjBTjDNt9BEUkEsTwVZ+dukSvs09MVfjF//KXCWkJZo8t8O4b1zh59hyVe2/x\\n9uvfIpmMo8oZfLOJ0a6ghsJYvoOohSiN/5/UvWmwJOld3vvLPStr36tOnf2c3rdZuqdn00gWkrUA\\nlkDAlYwxQmDg+prt2uz4YrYAEVi+EtcyyMZCQmiEAGGBJNAyo9k0mp7pZaan9z6nz177XpWVlfv9\\ncJoIriOM5kbwQbwRFfnGm5lv1of8P/kuz/P8TxEzsjjTMUEw4Yuf+zL/+sd/FjVTwBl4ZDNdtjd6\\nyM5NCoUHOP/En2G6Lmff81OM+lMisoVv9TGSERrtIf40Rij7GHGFYNxEDHyEWJJkvIwq7wucNMnD\\ntMb4voMi79Om7YmFFE+SLs0TycwheMH+QqztkUwWEQSBm7fOk0qlSGWLCJKG63j4nsS1Vy+wuLiI\\nGk2wW6tyYHGVnfo2uZkShXSeYWuPSCxKo1Ejl0ri+yGuB53tKtHKArJmkNN1xuM2vu+jRFSm0ymW\\nOaE0U6bZqKGqMoIE5ghSmRzxVIRxv4cgyViWhSRJJDNlLHNMr9tk1N4ml00hhmAOJgxHHcqVGWq1\\nGqVynmG3Qb68iiLvLwYHkkZ7YDEzM0MQBPu7ceYEKRpFFkUcs4cqhrh3tTDmaMjcXIVWu0E6HkES\\nQi6//AqKpu9LsEMP17XRdZ0AgfyZH/rHs/tgGAblmTyqLFLM51jfWCcIBACiRppOp0G322U8HtPr\\nd8hkU5w7d47l1YMkU1FsK0m1vke32+XA6iFu3rzJgQOHqFbrKIpKEEJ/2Kc4k0dSPOaXFrFtm2a9\\nQzSWACFCrbpHsVDBMAzSqSy7Oy2i0YBkPAkEdAcdbt64TiIRQxBCAt8l8G0OHz3G+YsXyOUKaJrG\\n1evXmZmdpdlscv3aFe655yRGJEFkYHLi9IPISkB/CHIoUd+rI0kStuNhmiMMTSSd0jFHFqlEEphi\\nquCHIbqhsVkb4Q4tMuUUg16NqSjz1eseh+dUZCkgaqQozszhTQOKBQM/bDPp+UxaIkndYPG+gwz6\\nTRqhSK5yBEFW8AOH5bmD/OLP/jqf+syHWdvepdr9Mj/z63/C/MECTmPMaekF8r6EkNV42fofLNz3\\nVoqLB7l08Qmy+TzthsnnPvZXlA/NMHfgCH1PpzRbINJvksgkEQyBmCjhOQOsiUPICGIGsVia4cAk\\nb9iIXp9a6wbR2Bk8y8QNIV9I444tovEUtU6NeDxKZe4QkiDSbnRQZJFWp83S8gFKuRSGKjI2B5QL\\nRSw/YGnpGM6ow9qNV5jJHcYae0SjRQJP4MmvPsl9950mMb+MhI/sTxj7IoKm4Q7HJCIGrWaX+dk5\\n6q1NdE1m2OugprLMLM/QqbdoXNtEkERmy3N0Gg1m52bodZrgucR0ldjcYaKxCD4BSnpKPFhi6oSs\\nnj6OqqoEkT0C2cfxbQJVQzYyzERDVHV/itjv95EEEXfQIZ4pIMcS9No9PNsnVyyRzuTxA5d8pcKg\\nY+J7U5ZPnUWL6FiWTVQF15miKyL9Vu3vC8H/T/mmWFMIwwDPnxKKAYPRAEXZR+krV64gqwrDscni\\n4iJv/Ja3c+z4/eiGwdmHHmJhaRnL92mNTRZXD5AtFmh020STWaaOhx7VabfbjEYjEokk0XiaOzsN\\nND3G2LRZPXQMy/YYj01WDy3T7Y+49MorOLZEJBLl3AvnubO9w3BiIasKc7OLKLJBpVIhCAU8JPrD\\nEY88/Hpy+VlQEuRLKxixGdpdn4UD9xPNLnJjc5ubG1sEXsh4NMV1oN6rs75dxUhm8AUB2wmJxtOM\\nhy6OM6Vn9hhZIlokRTyRoteZkFQM8oUY86U4K0cqxOJxXn88i9dz8EyfmzfXmSnPM7dykFQ8RTjy\\nqG5uMxnbrF1Z5+pzF0kKMvZkxNFKis6rm/zK//lRrr70BX7qZ97OG+95H/OzCSZjkdiRxxByZ1Du\\n/yGq/kkstcgXz6ucfudP8+fnvs5f/9Wf8NDqIW49/zzls9/Lv/zNx/muH/8wj7zvV5DXniVm3qYo\\nqchpj6w1QXdF7NaryFMTZxLS2Khhrk1JeBk6X/kQ3Wsv4Lse5s5V8ASKxTySlKY78ZnYDjO5Gdyx\
\ngxiNgxFBkjUMI8fq6kkikQiuIONKBkaysJ/klZDt29eJRDRmK0tY/hhNV7AnfWw/5HUPfwvlUgbf\\n6pBOxNEicezRYF8Or6oEnk8qGaM/6lIqr5DKLeGKCWJ6jlFvQjKVYyYfJyq6mFafdCZOJKKRMWAy\\nbLC9cZPptE5t+yb9+jZ64OP3dnB7W9iDHezRLp7TxQk07MAHwWfY3MAcjeh2mjQbuxRn8kzMIYEg\\nIwY+k+Yek+YawWATs72D2d7FH3e48vXnUYM+om9hdncxBA9dnOJZDuO+yXho4vjqa47HbwpQ8H2P\\nZLzE1kaLQX9KJBJlpjzL6uoy169fJZPOISLR6zSwpiaSorKwtIjtjUgkYqRTOap7LZKJAun0vhRa\\nNwxcL+Dk6bP0egPi0Rg3Ll9i6/Z1NF0in8+yfvs6qhyyOF+m12kSeDb5bBpZlkmkY5x95AEajRay\\nLEMQousqvV6Hre07uN6Ea1cuUa/WuHLlCoZh4DgORlxF00WWDyzgY6NqAqVyhdf9kzeALHHzxh0q\\n5QVGQxvbcrDtIYJgoxoCtjfBC2HqgCrHcYKARqfBjVsNQkFG0fe5Bq1aH9O06TZ7OKMh2SToSsj8\\n8ixqNAKOS6/TIAwcLM/CmvTwJhOGzSovfO0puq1dPvIfP8Uf/sEXKUrw/t9+lp0re/zEL7yR/+t7\\nf41iKiSazuBkcoQJi/jyW2lf2OHIA2dANFlOVUg6FdrT47znRz7As7/7m9idL/GXf/CTeDcvsmMX\\nOHvPQ9jOGOfpc2wMXMrLca6f28SyNQonjnDi9CmMUzqL98/QNwNmyhkKygRBkMjkS9y5uc64NyJf\\nKCAEIXe21lEiGpFQJiIaqIk003DCXvUOU3PEkcOHKWQz1PZ2wXPY29nYBxZVoz8wSaazaHEDEQEC\\nG8cdsFer4nkOnUGbZqtKrb5DLpcjnkhQa7WRZQXP8TCHHfqdXeKGQOj3cIZVbl3+GtWNm7RrW0wG\\nLWQ8rr5ygcGgSzqdQJLAcwR8H4r5Aq12nWkQIOsGkqgxaA0pxJNU188jTgNEW6aQL2GN2wiBiTls\\nsX7lIvGIjO8OEEKTdDrFyuETaNkinfYWRlQixGF+aZHe0KZQnEMG/uoz/4P2bgPfs8lm4mxvb5PN\\nJV5zPH5TgIIgiIjylOMnl+l0a+xs79Hv9xkMBliWRT6fR9NjOMGUemMbEHn22a8TWiLOxKKUT3Dj\\n6kWee/pLdJs16rubSKKP7zlUd7Y4fPgA63duUSqVmKtUqO1VsSYTTp24h0w6zaWL55mYI+zphG6n\\ngSKDa5tMxl3uu+8kWxvbtFodzp+/SBgKaJEYqhJhrjLHAw+c5sTxo1x48XmiOkz6XV469zUGvQ5H\\nVg/S3KuhSDKlQonx2KQyWyKRjNFuNbi9dpNEPIWERCqeYjS0mNoe8WRyn9gTuJRyKQ4uG6SSPqOh\\nSTZTYuJ4bFe7SIpMiEs2pzE3N0syFuPKqxcQNIV4KoeqRamvrRFYLnPFJGnDopKJciA/S2erz5wO\\nb34swZwW5+O//zx/8dsvkY/0+I6HTsDNz1Lur7HiTJgaUFJFdGmBq5sTNgYqqw8/SDUx5ssv/CUz\\nh+9ldeExYpbO0x/7TX7ydz7CeO8ZJvXLzD/2Zh47eZQvfPDnedN7/xXrV67zqV/8AA8/8GO87rH/\\nwH/54F+y+tA/5WbLYdzsI+Lh2xPmDhzESObY6gyo15tkM3ls2+bmzavs7e0SeD6TyZRcocxwbLO5\\nscHa+i1WV5YYDscUCgUGgwHVvQaqHsEXHNbWXmY4HNDq7GIHbYxIgkKhhDkcQeAyX8njTMaMhn1S\\nmSSCJNJudRkPR7SbLeJRA2FqEo/HOHHvSUJNI1cqUr1zizs3XmUmn0JVJHrdNpl0mlw6RiYdo91t\\no+oKk7FNzMhhGBliiQy31tY5cuQEyaRGgAmhSyyqsn3nNt1mjdB1mI7H2JZFu77HlctXkNQIYSCi\\nxlNcv30bH4CAeDxOc6+KG0p867vezdzBe2j0W9hBwKGj9+KF/8jIS2Eo0Gr3EIWQlZVlbl5f3xdH\\nuTb33nuMwLW4desm+UIWXVexzR4zhRT5QhLbdmg2quSyKWZOHKXTbBGLxblx+SKCJNNqtZCFFeZn\\ni/i+jzucIsoymUyC3qCOMx0wP1vEdqHT6bC8uoRlj0gmk0DI1LKYKeWIRXXGwy7zc4sMzCn1ao1G\\ndY98oURE00gno+xtbdHudjh06AACFq36BkIQEFOTXL14iYHZ4eChFdqdXWp7m3iOS6892FcLugGq\\nEiEa0/FdD0FwkMIAAhXPD7AsF13N4boDFMEiFYuxuTEmllHoD23qtV1evnCR7/v+93Lh2We49/S9\\n1DbukJ/VAIH6Tp9KzkZ3O3iTkPbAAVXnTk3mzKkpjzwSpTXqkxeTxNQrnPHG5NY3uO+xFJ9+ZZb6\\nD/0yWudrvO7kG4lKY5SRxyli1IYWwvQ5bj3zRR79Z7/Mxp3neP6v/huFmbewq72CNjxO/87nmBkF\\n/Mp3v4/Pb8JHPvmLfPcvxMirGt29F/HGHrEEuI6JIEsEokC/sYeacFnOx7HjcWQlZO3ObYqVWXTd\\nQBBdGvXGvtcAIrNLB3BdF8uy8EOQxBDfs8jlU/S7Tc49c5m3vvWt1Ks1cukcg/GI4XCIqpdI6Aad\\nTothKDC7NL/v9OSZRNQYlZkSmuKgIGD2WkxEg4W5LDcvnOdQuczla9cpVma5ePE8U2eKruvkCwV0\\nQ8UcDUhlcrxw7iUOrSwTjyVQDRXTmRAKPqIo0mx1SaeTpLIlTNNkYrkcOHYvg/4ITdtX8K4UC2xv\\n3OHoyVP0eyMkMUIykSCRynJr8xbHD5/Gw8FlTClaZDoc4vkSBw8cpdWsUmvV2dm59Zrj8ZtipDCd\\nWljmBE1WMCIa0ZiIEVVIJhNY1ghF9VlZqXD40CqyJBI1DDzHY2d3k9FoQK1aJ6LvLwwlEjE0PWR2\\ndhZZ1jl69DiaFkEQZMyxTak0Q21vlwsvvcDu1jZhGGI7FplMinw+y2RsYk+nXLt2lbXbN7HNAV9/\\n/lmefu5J5mbLvHL5EhNriGl18cMJ1doOw+GAlZVlFpYWWVxcYmuziqoZuG7A7MI8n/jj/04qo3Pw\\n4Cpbm9tsb+2wtDi/736sBOTzWQaDKYmUwnDQw3ZHeK6AYUQZT0YIskIQimhGD00NiMV0NrfHeBJI\\nioqqRZBVEU2HSy9fI51Oc/3qq9xeu8J42CKfjuM5I7yJiTt06TVcYgZMnCnnbvRYrUzodSREUaBb\\nl/iT3+sgvBQQT2t88s82KHoXKV3+KI8sKOxMXkEJpvQ3/xrz6q+Sja7x1FWf+NL3kZQsXv3zT5EU\\n58kun6T9F09w86feSev8x/FzMj/xzxQ+8r4UC5GA0eP/nkt/8jPk88ewrjxFbDwmGT2IJKjs1JpE\
\nVXHpOU7BSkG0IsT6lKqqcOSW4qtjRmPBZZV9W7Eqxqr4AJks0AWrdPRNa6XIwZxp19UUuU1xJBOr\\nnsTjuwbNbNhelrbyURa78D7Gm9QyYn3bIc7OZ+5KVlbXWVxcIm+F3/7qCY7cfZQLr17go0fvBjLq\\nxrMwGDCuxxBzGQB8NzSnFhOh8mKBaXxAvOWF+NriC77B5TkaHBoKMuenuRptc+OBxhuPPmwrMk0C\\nsamCXXXNUQiT+x4A0+HixCFMglVF4ajij72qSvLcWVs1V8ZeDM6Cf/iYmebj38p0iFXk1uugLAfT\\nTryAFVyJJZfYMR1FUZE7K5gSsXXuroOm7uh8Rtsqvs3QcDnBpOs6xuOGlZWVeLIczlXTKUgmlpDV\\nth3r6yO6mKiTZRlltTDVm7mAaoviqSqLVrtcUW1p29quWEGQGMBSVep6RJbnMzn9lgfRhYaiBB9W\\nWVxcjE1iMjIXAMdo1MagXof39fS7AFYDoUrrvS1PSsaePXuo6xFvXXqTejTm7YuXkMaz/u4KEjra\\npib4zrocBUU0Y1BWoDGeMy1cykEsMzTLrOeFc46yLGOAraYo3DTuY9+LGFe4nMwUgkXkRSYVjG2c\\nh1vHrS6MALsq13Udl8Itwm9L4zodnQHTzt1NU08d6zjeU8GOaT/gzIlVyWoWYyi7EJczWNjL4aOf\\n4fCnPk2TX05bVpW4NGkjg85PmgAFxGXWmk48HSO8DuP0qYuJUUoX2mkDoMtZvDrNh0H+30YKiv2A\\nZvrwWfVXSzapGFQbPql2dDqOc3ZH29qqgg9KluU0XRd7Jjh8Ay5XG3VkxJiDiznrlugRCOQ5ONmN\\nypAs96A5IUyWBlvG9dDWm7MixgSYXlFCN4pzwpzcZRCvGN53CKZp4uCCNpQV7Nu/SCzmIC+ULGvj\\n3HDS+AWr9gwdQWMfwW5MJ9ZtuKoK8pgqPRqvg2a4bIC4yVzTo8FRZAOC78izYFOYYL0eQ2f3wxgN\\n7QeSZzbvLkrIEVq/ztpqTZ4txo5WHper3akrc+Qup5ABa2vvsbi4m3oYKJaUfQcK1IH3jtNPPcPS\\n0hK33XqI4UjogkdW32O4bi3XEas4LYqCxb17Y83LiJv276Xz0DZDW9rD4X3HpFJ1kvfvXMGkWQoC\\nF//9Oif/8jRfuP+LrA5XqapYU1IobTu20mVnU5vxuEWkoW6gKBzaCb7rYjbmmKZtrXVbGMeR2IgQ\\nBgSUYdNSZSVFUVl3JgLOQdAxVsNjdSxdGOJjPY/3MR8mDFkTx0eWl3mn8Xxs+YhNoIJlTaq01tND\\nLV6QZTnej2nCGs5V09hVnhfUtY+5Khm52w0oRbka4xBCpys0tcQUb0XcjWc0zkWgUUQuAevAm31r\\nmWE/Sc/1mDdNSc+1uVVVD1zvQ3PhFABE5NkbiYzuFEnP9Zk3TUnP1jAnMYVEIjEvJKeQSCQ2ME9O\\n4cd9C7iCpOf6zJumpGcLmJuYQiKRmA/maaSQSCTmgN6dgoh8XkTOiMhZETnRk4bXROR5ETklIs/G\\nfftE5A8i8kp8/uA2a3hERC6KyAsz+66qQYwfRJudFpHlHdLzsIici3Y6JSLHZt77dtRzRkQ+tw16\\nDonIUyLyDxF5UUS+Eff3aaPNNPVmpy1hkhTSxwNwwD+BO4ASeA64qwcdrwH7r9j3XeBE3D4BfGeb\\nNdwLLAMvXE8DcAz4HSDAEeDkDul5GPjWVT57Vzx3FXB7PKdui/UcBJbj9hLwcjxunzbaTFNvdtqK\\nR98jhcPAWVV9VVUb4DHgeM+aJhwHHo3bjwL3b+fBVPXPwNs3qOE48DM1ngb2isjBHdCzGceBx1S1\\nVtV/AWexc7uVes6r6t/j9irwEnAL/dpoM02bse122gr6dgq3AP+Zef061zbqdqHA70XkbyLy1bjv\\nZlU9H7cvADf3oGszDX3a7etxOP7IzJRqR/WIyG3AJ4CTzImNrtAEc2Cn/5W+ncK8cI+qLgP3AV8T\\nkXtn31Qb+/W6TDMPGoAfAR8GPg6cB7630wJEZDfwOPBNVV2Zfa8vG11FU+92ej/07RTOAYdmXn8o\\n7ttRVPVcfL4IPIEN6d6YDDfj88Wd1nUNDb3YTVXfUNVOrTfaT7g89N0RPWKdRh4HfqGqv467e7XR\\n1TT1baf3S99O4RngThG5XURK4AHgyZ0UICKLIrI02QY+C7wQdTwYP/Yg8Jud1BXZTMOTwJdjhP0I\\n8N7MEHrbuGJO/iXMThM9D4hIJSK3A3cCf93iYwvwU+AlVf3+zFu92WgzTX3aaUvoO9KJRYlfxiKx\\nD/Vw/DuwiPBzwIsTDcBNwJ+AV4A/Avu2WccvsaFmi801v7KZBiyi/sNos+eBT+6Qnp/H453G/sEP\\nznz+oajnDHDfNui5B5sanAZOxcexnm20mabe7LQVj5TRmEgkNtD39CGRSMwZySkkEokNJKeQSCQ2\\nkJxCIpHYQHIKiURiA8kpJBKJDSSnkEgkNpCcQiKR2MB/AUsvfSqnmTG0AAAAAElFTkSuQmCC\\n\",\n      \"text/plain\": [\n       \"<Figure size 432x288 with 1 Axes>\"\n      ]\n     },\n     \"metadata\": {\n      \"tags\": []\n     },\n     \"output_type\": \"display_data\"\n    }\n   ],\n   \"source\": [\n    \"from matplotlib.pyplot import imshow\\n\",\n    \"co = repo.checkout()\\n\",\n    \"image_column = co.columns['images']\\n\",\n    \"dataset = make_tensorflow_dataset(image_column)\\n\",\n    \"for image in dataset:\\n\",\n    \"    imshow(image[0].numpy())\\n\",\n    \"    break\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"FTArZhtZfg7S\"\n   },\n   \"source\": [\n    \"### New columns\\n\",\n    \"\\n\",\n    \"For our example, we would need two columns. One for the image and another one for captions. 
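Since captions vary in length, the captions column will be declared as variable-shape, with `(60,)` as its per-dimension maximum size (see the cell below). 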
Let's wipe our existing repository (`remove_old` argument in `repo.init` does this) and create these columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 34\n    },\n    \"colab_type\": \"code\",\n    \"id\": \"ISMdkXtYHg2c\",\n    \"outputId\": \"0e18d9d3-1a4f-4f75-d388-c8c0c316f69b\"\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Hangar Repo initialized at: hangar_repo/.hangar\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo = Repository(repo_path)\\n\",\n    \"repo.init(user_name=username, user_email=email, remove_old=True)\\n\",\n    \"co = repo.checkout(write=True)\\n\",\n    \"\\n\",\n    \"images_column = co.add_ndarray_column('images', shape=img_shape, dtype=np.uint8)\\n\",\n    \"captions_column = co.add_ndarray_column('captions', shape=(60,), dtype=np.float64, variable_shape=True)\\n\",\n    \"co.commit('column init')\\n\",\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"z_fUUIpKCMen\"\n   },\n   \"source\": [\n    \"### Store images and captions in the Hangar repo\\n\",\n    \"Each image will be converted to RGB channels with dtype `uint8`. Each caption will be prepended with a start token (`sos`) and appended with an end token (`eos`) before being converted to an array of floats. We have another preprocessing stage for images later.\\n\",\n    \"\\n\",\n    \"We'll start by loading the caption file:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"VlX-su-gCMep\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import json\\n\",\n    \"annotation_file = 'annotations/captions_train2014.json'\\n\",\n    \"with open(annotation_file, 'r') as f:\\n\",\n    \"    annotations = json.load(f)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"UMcYzkWgCMes\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import spacy\\n\",\n    \"# if you installed spacy and the model in this same notebook session, you might need to restart the runtime to bring them into scope\\n\",\n    \"nlp = spacy.load('en_core_web_md')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"wxpbxEvmCMev\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"def sent2index(sent):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Convert a sentence to an array of vocabulary indices using spaCy\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    ids = []\\n\",\n    \"    doc = nlp(sent)\\n\",\n    \"    for token in doc:\\n\",\n    \"        if token.has_vector:\\n\",\n    \"            idx = nlp.vocab.vectors.key2row[token.norm]\\n\",\n    \"        else:\\n\",\n    \"            idx = sent2index('UNK')[0]  # fall back to the index of the UNK token\\n\",\n    \"        ids.append(idx)\\n\",\n    \"    return ids\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"RIvqFIHUCMey\"\n   },\n   \"source\": [\n    \"### Save the data to Hangar\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 10,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     
\"height\": 34\n    },\n    \"colab_type\": \"code\",\n    \"id\": \"__I8ntp3CMez\",\n    \"outputId\": \"287685d1-2e7c-4d3f-94b7-87db73f966e3\"\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"100%|██████████| 414113/414113 [00:03<00:00, 122039.19it/s]\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"import os\\n\",\n    \"from tqdm import tqdm\\n\",\n    \"\\n\",\n    \"all_captions = []\\n\",\n    \"all_img_name_vector = []\\n\",\n    \"limit = 100  # if you are not planning to save the whole dataset to Hangar. Zero means whole dataset\\n\",\n    \"\\n\",\n    \"co = repo.checkout(write=True)\\n\",\n    \"images_column = co.columns['images']\\n\",\n    \"captions_column = co.columns['captions']\\n\",\n    \"all_files = set(os.listdir(image_dir))\\n\",\n    \"i = 0\\n\",\n    \"with images_column, captions_column:\\n\",\n    \"    for annot in tqdm(annotations['annotations']):\\n\",\n    \"        if limit and i > limit:\\n\",\n    \"            continue\\n\",\n    \"        image_id = annot['image_id']\\n\",\n    \"        assumed_image_paths = 'COCO_train2014_' + '%012d.jpg' % (image_id)\\n\",\n    \"        if assumed_image_paths not in all_files:\\n\",\n    \"            continue\\n\",\n    \"        img_path = os.path.join(image_dir, assumed_image_paths)\\n\",\n    \"        img = Image.open(img_path)\\n\",\n    \"        if img.mode == 'L':\\n\",\n    \"            img = img.convert('RGB')\\n\",\n    \"        img = img.resize(img_shape[:-1])\\n\",\n    \"        img = np.array(img)\\n\",\n    \"        cap = sent2index('sos ' + annot['caption'] + ' eos')\\n\",\n    \"        cap = np.array(cap, dtype=np.float)\\n\",\n    \"        key = images_column.append(img)\\n\",\n    \"        captions_column[key] = cap\\n\",\n    \"        if i % 1000 == 0 and i != 0:\\n\",\n    \"            if co.diff.status() == 'DIRTY':\\n\",\n    \"                co.commit(f'Added batch {i}')\\n\",\n    \"        i += 1\\n\",\n    \"co.commit('Added full data')\\n\",\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"gXvSa2iCCMe2\"\n   },\n   \"source\": [\n    \"### Preprocess Images\\n\",\n    \"\\n\",\n    \"Our image captioning network requires a pre-processed input. We use transfer learning for this with a pretrained InceptionV3 network which is available in Keras. But we have a problem. Preprocessing is costly and we don't want to do it all the time. Since Hangar is flexible enough to create multiple columns and let you call the group of column as a `dataset`, it is quite easy to do make a new column for the processed image and we don't have to do the preprocessing online but keep a preprocessed image in the new column in the same repository with the same key. 
This means we have three columns in our repository (all three hold different data under the same sample names):\\n\",\n    \"- `images`\\n\",\n    \"- `captions`\\n\",\n    \"- `processed_images`\\n\",\n    \"\\n\",\n    \"Although we need only the `processed_images` for the network, we still keep the bare image in the repository in case we need to look into it later or if we decide to do some other preprocessing instead of InceptionV3 (it is always advisable to keep the source of truth with you).\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"QBGCS_ceCMe2\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import tensorflow as tf\\n\",\n    \"tf.compat.v1.enable_eager_execution()\\n\",\n    \"image_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')\\n\",\n    \"new_input = image_model.input\\n\",\n    \"hidden_layer = image_model.layers[-1].output\\n\",\n    \"image_features_extract_model = tf.keras.Model(new_input, hidden_layer)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def process_image(img):\\n\",\n    \"    img = tf.keras.applications.inception_v3.preprocess_input(img)\\n\",\n    \"    img = np.expand_dims(img, axis=0)\\n\",\n    \"    img = image_features_extract_model(img)\\n\",\n    \"    return tf.reshape(img, (-1, img.shape[3]))\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"ANFPvYByCMe5\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"from hangar import Repository\\n\",\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"repo_path = 'hangar_repo'\\n\",\n    \"\\n\",\n    \"repo = Repository(repo_path)\\n\",\n    \"co = repo.checkout(write=True)\\n\",\n    \"images = co.columns['images']\\n\",\n    \"sample_name = list(images.keys())[0]\\n\",\n    \"prototype = process_image(images[sample_name]).numpy()\\n\",\n    \"pimages = co.add_ndarray_column('processed_images', prototype=prototype)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"jWN6AxiHCMe7\"\n   },\n   \"source\": [\n    \"#### Saving the pre-processed images to the new column\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 34\n    },\n    \"colab_type\": \"code\",\n    \"id\": \"HdFxmi5ECMe8\",\n    \"outputId\": \"38dddea0-64f8-47cf-fc9d-6b14a6140135\"\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"100%|██████████| 101/101 [00:11<00:00,  8.44it/s]\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"from tqdm import tqdm\\n\",\n    \"\\n\",\n    \"with pimages:\\n\",\n    \"    for key in tqdm(images):\\n\",\n    \"        pimages[key] = process_image(images[key]).numpy()\\n\",\n    \"\\n\",\n    \"co.commit('processed image saved')\\n\",\n    \"co.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"zacZutpTCMe_\"\n   },\n   \"source\": [\n    \"### Dataloaders for training\\n\",\n    \"We are using TensorFlow to build the network, but how do we load this data from the Hangar repository into TensorFlow?\\n\",\n    \"\\n\",\n    \"A naive option would be to run through the samples and load the numpy 
arrays and pass them to TensorFlow's `sess.run`. But that would be quite inefficient. TensorFlow uses multiple threads to load the data into memory, and its dataloaders can prefetch the data beforehand so that your training loop doesn't get blocked while loading the data. Also, TensorFlow dataloaders bring batching, shuffling, etc. to the table prebuilt. That's cool, but how do we load data from Hangar into a TF dataset? Well, we have `make_tensorflow_dataset`, which accepts a list of columns as a parameter and returns a TF dataset object.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 34\n    },\n    \"colab_type\": \"code\",\n    \"id\": \"gcKsE3d4CMfA\",\n    \"outputId\": \"a42c5c84-e62f-4178-cc3a-175dac08aa7c\"\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \" * Checking out BRANCH: master with current HEAD: 3cbb3fbe7eb0e056ff97e75f41d26303916ef686\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"from hangar.dataset import make_tensorflow_dataset\\n\",\n    \"co = repo.checkout()  # we don't need a write checkout here\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 105\n    },\n    \"colab_type\": \"code\",\n    \"id\": \"TybRGUGaCMfC\",\n    \"outputId\": \"8e75b46d-f8da-4dd3-c607-1174b23a15a0\"\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"<class 'hangar.columns.arrayset.ArraysetDataReader'>(repo_pth=hangar_repo/.hangar, aset_name=processed_images, default_schema_hash=f230548212ab, isVar=False, varMaxShape=(64, 2048), varDtypeNum=11, mode=r)\\n\",\n      \"<class 'hangar.columns.arrayset.ArraysetDataReader'>(repo_pth=hangar_repo/.hangar, aset_name=captions, default_schema_hash=4d60751421d5, isVar=True, varMaxShape=(60,), varDtypeNum=12, mode=r)\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/usr/local/lib/python3.6/dist-packages/hangar/dataloaders/tfloader.py:88: UserWarning: Dataloaders are experimental in the current release.\\n\",\n      \"  warnings.warn(\\\"Dataloaders are experimental in the current release.\\\", UserWarning)\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"BATCH_SIZE = 1\\n\",\n    \"EPOCHS = 2\\n\",\n    \"embedding_dim = 256\\n\",\n    \"units = 512\\n\",\n    \"vocab_size = len(nlp.vocab.vectors.key2row)\\n\",\n    \"num_steps = 50\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"captions_dset = co.columns['captions']\\n\",\n    \"pimages_dset = co.columns['processed_images']\\n\",\n    \"\\n\",\n    \"dataset = make_tensorflow_dataset([pimages_dset, captions_dset], shuffle=True)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"27mQc673CMfF\"\n   },\n   \"source\": [\n    \"### Padded Batching\\n\",\n    \"\\n\",\n    \"Batching needs a bit more explanation here, since the dataset does not consist only of fixed-shape data. We have two columns in the dataset, one of which holds the captions. As you know, captions are sequences, which can be variably shaped. 
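For instance, caption index arrays of lengths 3 and 5 cannot be stacked into a single tensor as-is; the shorter one must first be zero-padded to length 5. A toy illustration in plain TF (a hypothetical sketch, separate from our Hangar data):\n\",\n    \"\\n\",\n    \"```python\\n\",\n    \"toy = tf.data.Dataset.from_generator(\\n\",\n    \"    lambda: iter([[1, 2, 3], [1, 2, 3, 4, 5]]),\\n\",\n    \"    output_types=tf.int64, output_shapes=tf.TensorShape([None]))\\n\",\n    \"for batch in toy.padded_batch(2, padded_shapes=tf.TensorShape([None])):\\n\",\n    \"    print(batch.numpy())  # [[1 2 3 0 0]\\n\",\n    \"                          #  [1 2 3 4 5]]\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"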
So instead of using `dataset.batch` we need to use `dataset.padded_batch`, which takes care of padding the tensors to the longest length in each dimension within each batch. `padded_batch` needs the shapes to which the batch should be padded. Unless you need customization, you can use the shapes stored in the `dataset` object by the `make_tensorflow_dataset` function.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 9,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 34\n    },\n    \"colab_type\": \"code\",\n    \"id\": \"8tpHg3w2CMfF\",\n    \"outputId\": \"e2145382-c73b-4acf-9076-40ff64554ade\"\n   },\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"(TensorShape([Dimension(64), Dimension(2048)]), TensorShape([Dimension(None)]))\"\n      ]\n     },\n     \"execution_count\": 9,\n     \"metadata\": {\n      \"tags\": []\n     },\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"output_shapes = tf.compat.v1.data.get_output_shapes(dataset)\\n\",\n    \"output_shapes\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"imMQrtn7CMfI\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"dataset = dataset.padded_batch(BATCH_SIZE, padded_shapes=output_shapes)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"tY6Z7y8TCMfO\"\n   },\n   \"source\": [\n    \"### Build the network\\n\",\n    \"\\n\",\n    \"Since we have the dataloaders ready, we can now build the network for image captioning and start training. The rest of this tutorial is a copy of an official TensorFlow tutorial, available at https://tensorflow.org/beta/tutorials/text/image_captioning. The content of the TensorFlow tutorial page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.\\n\",\n    \"Access date: Aug 20 2019\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"In this example, you extract the features from the lower convolutional layer of InceptionV3, giving a vector of shape (8, 8, 2048), and squash that to a shape of (64, 2048). We have already stored the result of this in our Hangar repo. This vector is then passed through the CNN Encoder (which consists of a single fully connected layer). 
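(The squashing is the `tf.reshape(img, (-1, img.shape[3]))` call in `process_image` above: the 8 × 8 spatial grid becomes 64 locations, each holding a 2048-dimensional feature vector.) 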
The RNN (here GRU) attends over the image to predict the next word.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"6Kc-yZ0iCMfO\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"class BahdanauAttention(tf.keras.Model):\\n\",\n    \"    def __init__(self, units):\\n\",\n    \"        super(BahdanauAttention, self).__init__()\\n\",\n    \"        self.W1 = tf.keras.layers.Dense(units)\\n\",\n    \"        self.W2 = tf.keras.layers.Dense(units)\\n\",\n    \"        self.V = tf.keras.layers.Dense(1)\\n\",\n    \"\\n\",\n    \"    def call(self, features, hidden):\\n\",\n    \"        # features(CNN_encoder output) shape == (batch_size, 64, embedding_dim)\\n\",\n    \"        # hidden shape == (batch_size, hidden_size)\\n\",\n    \"        # hidden_with_time_axis shape == (batch_size, 1, hidden_size)\\n\",\n    \"        hidden_with_time_axis = tf.expand_dims(hidden, 1)\\n\",\n    \"        # score shape == (batch_size, 64, hidden_size)\\n\",\n    \"        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))\\n\",\n    \"        # attention_weights shape == (batch_size, 64, 1)\\n\",\n    \"        # you get 1 at the last axis because you are applying score to self.V\\n\",\n    \"        attention_weights = tf.nn.softmax(self.V(score), axis=1)\\n\",\n    \"        # context_vector shape after sum == (batch_size, hidden_size)\\n\",\n    \"        context_vector = attention_weights * features\\n\",\n    \"        context_vector = tf.reduce_sum(context_vector, axis=1)\\n\",\n    \"\\n\",\n    \"        return context_vector, attention_weights\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"up0nVnIZO2_c\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"class CNN_Encoder(tf.keras.Model):\\n\",\n    \"    # The features were already extracted (and stored in the Hangar repo above),\\n\",\n    \"    # so this encoder only passes those features through a fully connected layer\\n\",\n    \"    def __init__(self, embedding_dim):\\n\",\n    \"        super(CNN_Encoder, self).__init__()\\n\",\n    \"        # shape after fc == (batch_size, 64, embedding_dim)\\n\",\n    \"        self.fc = tf.keras.layers.Dense(embedding_dim)\\n\",\n    \"\\n\",\n    \"    def call(self, x):\\n\",\n    \"        x = self.fc(x)\\n\",\n    \"        x = tf.nn.relu(x)\\n\",\n    \"        return x\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"4qAEbanRO77k\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"class RNN_Decoder(tf.keras.Model):\\n\",\n    \"    def __init__(self, embedding_dim, units, vocab_size):\\n\",\n    \"        super(RNN_Decoder, self).__init__()\\n\",\n    \"        self.units = units\\n\",\n    \"        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)\\n\",\n    \"        self.gru = tf.keras.layers.GRU(self.units,\\n\",\n    \"                                       return_sequences=True,\\n\",\n    \"                                       return_state=True,\\n\",\n    \"                                       recurrent_initializer='glorot_uniform')\\n\",\n    \"        self.fc1 = tf.keras.layers.Dense(self.units)\\n\",\n    \"        self.fc2 = tf.keras.layers.Dense(vocab_size)\\n\",\n    \"        self.attention = 
BahdanauAttention(self.units)\\n\",\n    \"\\n\",\n    \"    def call(self, x, features, hidden):\\n\",\n    \"        # defining attention as a separate model\\n\",\n    \"        context_vector, attention_weights = self.attention(features, hidden)\\n\",\n    \"        # x shape after passing through embedding == (batch_size, 1, embedding_dim)\\n\",\n    \"        x = self.embedding(x)\\n\",\n    \"        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)\\n\",\n    \"        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)\\n\",\n    \"        # passing the concatenated vector to the GRU\\n\",\n    \"        output, state = self.gru(x)\\n\",\n    \"        # shape == (batch_size, max_length, hidden_size)\\n\",\n    \"        x = self.fc1(output)\\n\",\n    \"        # x shape == (batch_size * max_length, hidden_size)\\n\",\n    \"        x = tf.reshape(x, (-1, x.shape[2]))\\n\",\n    \"        # output shape == (batch_size * max_length, vocab)\\n\",\n    \"        x = self.fc2(x)\\n\",\n    \"        return x, state, attention_weights\\n\",\n    \"\\n\",\n    \"    def reset_state(self, batch_size):\\n\",\n    \"        return tf.zeros((batch_size, self.units))\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"9ZlfcS5VO_yA\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"def loss_function(real, pred):\\n\",\n    \"    mask = tf.math.logical_not(tf.math.equal(real, 0))\\n\",\n    \"    loss_ = loss_object(real, pred)\\n\",\n    \"    mask = tf.cast(mask, dtype=loss_.dtype)\\n\",\n    \"    loss_ *= mask\\n\",\n    \"    return tf.reduce_mean(loss_)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"s5kEPFlZCMfR\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"@tf.function\\n\",\n    \"def train_step(img_tensor, target):\\n\",\n    \"    loss = 0\\n\",\n    \"    # initializing the hidden state for each batch\\n\",\n    \"    # because the captions are not related from image to image\\n\",\n    \"    hidden = decoder.reset_state(batch_size=target.shape[0])\\n\",\n    \"    # TODO: do this dynamically: '<start>' == 2\\n\",\n    \"    dec_input = tf.expand_dims([2] * BATCH_SIZE, 1)\\n\",\n    \"\\n\",\n    \"    with tf.GradientTape() as tape:\\n\",\n    \"        features = encoder(img_tensor)\\n\",\n    \"        for i in range(1, target.shape[1]):\\n\",\n    \"            # passing the features through the decoder\\n\",\n    \"            predictions, hidden, _ = decoder(dec_input, features, hidden)\\n\",\n    \"            loss += loss_function(target[:, i], predictions)\\n\",\n    \"            # using teacher forcing\\n\",\n    \"            dec_input = tf.expand_dims(target[:, i], 1)\\n\",\n    \"    total_loss = (loss / int(target.shape[1]))\\n\",\n    \"    trainable_variables = encoder.trainable_variables + decoder.trainable_variables\\n\",\n    \"\\n\",\n    \"    gradients = tape.gradient(loss, trainable_variables)\\n\",\n    \"    optimizer.apply_gradients(zip(gradients, trainable_variables))\\n\",\n    \"    return loss, total_loss\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"cQeg3v4KCMfU\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"encoder = 
CNN_Encoder(embedding_dim)\\n\",\n    \"decoder = RNN_Decoder(embedding_dim, units, vocab_size)\\n\",\n    \"optimizer = tf.keras.optimizers.Adam()\\n\",\n    \"loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"pyYHHBHVCMfW\"\n   },\n   \"source\": [\n    \"### Training\\n\",\n    \"\\n\",\n    \"Here we consume the dataset we made earlier by looping over it. The dataset returns the image tensor and the target tensor (captions), which we pass to `train_step` for training the network.\\n\",\n    \"\\n\",\n    \"The encoder output, the hidden state (initialized to 0) and the decoder input (which is the start token) are passed to the decoder. The decoder returns the predictions and the decoder hidden state. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss. We use teacher forcing to decide the next input to the decoder: teacher forcing is the technique where the target word is passed as the next input to the decoder. The final step is to calculate the gradients and apply them to the optimizer to backpropagate.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"colab\": {},\n    \"colab_type\": \"code\",\n    \"id\": \"l4gg61xSCMfX\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import time\\n\",\n    \"\\n\",\n    \"loss_plot = []\\n\",\n    \"\\n\",\n    \"for epoch in range(0, EPOCHS):\\n\",\n    \"    start = time.time()\\n\",\n    \"    total_loss = 0\\n\",\n    \"    for (batch, (img_tensor, target)) in enumerate(dataset):\\n\",\n    \"        batch_loss, t_loss = train_step(img_tensor, target)\\n\",\n    \"        total_loss += t_loss\\n\",\n    \"        if batch % 1 == 0:\\n\",\n    \"            print('Epoch {} Batch {} Loss {:.4f}'.format(\\n\",\n    \"                epoch + 1, batch, batch_loss.numpy() / int(target.shape[1])))\\n\",\n    \"    # storing the epoch and loss value to plot later\\n\",\n    \"    loss_plot.append(total_loss / num_steps)\\n\",\n    \"\\n\",\n    \"    print('Epoch {} Loss {:.6f}'.format(epoch + 1,\\n\",\n    \"                                        total_loss / num_steps))\\n\",\n    \"    print('Time taken for 1 epoch {} sec\\\\n'.format(time.time() - start))\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"J7JPiJjtCMfb\"\n   },\n   \"source\": [\n    \"#### Visualize the loss\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 23,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 295\n    },\n    \"colab_type\": \"code\",\n    \"id\": \"M0icezYgCMfd\",\n    \"outputId\": \"5c2bf016-120c-4ca9-f7d2-cef69eb216a0\"\n   },\n   \"outputs\": [\n    {\n     \"data\": {\n      \"image/png\": 
\"iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAApA0lEQVR4nO3deXxddZ3/8dcna9s0\\ne9ItS9OWlm5AWkoXqmXHwjgUxRE6iIJLdRQZnRGXn7+Z8YfOKDqDIwqjqIigwyIoorIItOy0NHSj\\ne9M9oW3SpG3ahjbb5/fHPY3XkK1tbk6S+34+HvfRe8/53ns+Pb3Ju+f7Ped7zN0REREBSAi7ABER\\n6TsUCiIi0kqhICIirRQKIiLSSqEgIiKtFAoiItJKoSASEjO70Mwqwq5DJJpCQeKCme0ws0tD2O6N\\nZtZsZkfMrM7MVpnZ+0/hc+4zs2/FokaRaAoFkdh73d2HAlnAz4FHzCw73JJE2qdQkLhnZp8ys3Iz\\nqzWzJ8xsVLDczOz7ZlYV/C//LTObGqy70szWm9lhM6s0sy91tR13bwHuBQYD49qpY5KZvWBmB81s\\nnZldFSxfBFwPfDk44vhDD/71Rf6KQkHimpldDHwb+DAwEtgJPBSsvhyYB0wAMoM2NcG6nwOfdvd0\\nYCqwuBvbSgI+CRwBtrRZlwz8AfgzMAz4PPBrMzvT3e8Bfg18192HuvvfnvJfWKQLCgWJd9cD97r7\\nCnc/DnwNmGNmJUAjkA5MBMzdN7j7nuB9jcBkM8tw9wPuvqKTbcw2s4PAXmAh8AF3P9S2DTAU+I67\\nN7j7YuCPQXuRXqNQkHg3isjRAQDufoTI0UBB8Iv5R8BdQJWZ3WNmGUHTa4ArgZ1m9qKZzelkG0vd\\nPcvd89x9trs/10Edu4MuphN2AgWn/lcTOXkKBYl3bwOjT7wwszQgF6gEcPc73f1cYDKRbqRbg+XL\\n3X0Bka6ex4FHeqCOIjOL/pksPlEHoOmMpVcoFCSeJJvZoKhHEvAgcJOZlZpZKvAfwDJ332Fm55nZ\\nrKC//yhwDGgxsxQzu97MMt29EagDWjrcavcsA+qJDCYnm9mFwN/yl/GNfcDY09yGSJcUChJPngTe\\niXp8I+jK+RfgMWAPkbOCrgvaZwA/BQ4Q6cqpAb4XrLsB2GFmdcBniIxNnDJ3byASAlcA+4G7gY+6\\n+8agyc+JjGEcNLPHT2dbIp0x3WRHRERO0JGCiIi0UiiIiEgrhYKIiLRSKIiISKuksAs4WXl5eV5S\\nUhJ2GSIi/cqbb765393zu2oXs1Aws3uB9wNV7j61nfUTgV8A04Gvu/t/dudzS0pKKCsr69FaRUQG\\nOjPb2XWr2HYf3QfM72R9LXAL0K0wEBGR2ItZKLj7S0R+8Xe0vsrdlxOZWExERPoADTSLiEirfhEK\\nZrbIzMrMrKy6ujrsckREBqx+EQrufo+7z3D3Gfn5XQ6ei4jIKeoXoSAiIr0jlqekPghcCOSZWQXw\\nb0AygLv/2MxGAGVEZqJsMbMvAJPdvS5WNYmISOdiFgru3ultBN19L1AYq+23tWnvYR59czdfvGwC\\nQ1L63TV7IiK9Im66jyoO1PPTl7eztlIHIiIiHYmbUCgtygJg1e4D4RYiItKHxU0o5A5NpShnMKt2\\nHwy7FBGRPituQgGgtCibVbsOhl2GiEifFWehkMXbh46xr+5Y2KWIiPRJcRcKACt1tCAi0q64CoUp\\nozJITjSNK4iIdCCuQmFQciKTRmboDCQRkQ7EVShApAvprYpDNLd42KWIiPQ5cRcK04qzONrQzJaq\\nw2GXIiLS58RdKJQWZQPo1FQRkXbEXSiU5A4ha0iyBptFRNoRd6FgZpxTmKVQEBFpR9yFAkQGmzfv\\nO8zR401hlyIi0qfEZygUZ9HisKbiUNiliIj0KfEZCoVZAOpCEhFpIy5DITsthZLcIbqITUSkjbgM\\nBYiMK6zcdRB3XcQmInJCXIdC1eHj7DmkGVNFRE6I31AoDi5i07iCiEirmIWCmd1rZlVmtraD9WZm\\nd5pZuZmtMbPpsaqlPZNGppOSmKBQEBGJEssjhfuA+Z2svwIYHzwWAf8Tw1reJTUpkcmjMjTdhYhI\\nlJiFgru/BNR20mQBcL9HLAWyzGxkrOppT2lRFm9VHqKpuaU3Nysi0meFOaZQAOyOel0RLHsXM1tk\\nZmVmVlZdXd1jBUwrzuKdxmY27dOMqSIi0E8Gmt39Hnef4e4z8vPze+xzpxVpsFlEJFqYoVAJFEW9\\nLgyW9ZqinMHkpKVoXEFEJBBmKDwBfDQ4C2k2cMjd9/RmAWZGaZFmTBUROSEpVh9sZg8CFwJ5ZlYB\\n/BuQDODuPwaeBK4EyoF64KZY1dKZ0qIslmyq4vCxRtIHJYdRgohInxGzUHD3hV2sd+Bzsdp+d5UW\\nZeHBjKlzz8gLuxwRkVD1i4HmWDqnKAvQYLOICCgUyByczNj8NFbu0oypIiJxHwpA62CzZkwVkXin\\nUACmFWWx/0gDFQfeCbsUEZFQKRSAUl3EJiICKBQAmDgyndQkzZgqIqJQAJITE5hakKlQEJG4p1AI\\nlBZlsbbyEI2aMVVE4phCITCtOIvjTS1s3KMZU0UkfikUAqWtF7HpegURiV8KhUBB1mDyhqayUuMK\\nIhLHFAoBzZgqIqJQ+CvTirPYVn2UQ/WNYZciIhIKhUKUE+MKqysOhlqHiEhYFApRzi7MxAxW6k5s\\nIhKnFApR0gclc0b+UJ2BJCJxS6HQhmZMFZF4plBoo7Q4iwP1jeyqrQ+7FBGRXqdQaKNUd2ITkTgW\\n01Aws/lmtsnMys3sq+2sH21mz5vZGjN7wcwKY1lPd5w5PJ3ByYkabBaRuBSzUDCzROAu4ApgMrDQ\\nzCa3afafwP3ufjZwG/DtWNXTXUmJCZylGVNFJE7F8khhJlDu7tvcvQF4CFjQps1kYHHwfEk760NR\\nWpzF+rfrON7UHHYpIiK9KpahUADsjnpdESyLthr4YPD8A0C6meW2/SAzW2RmZWZWVl1dHZNio00r\\nyqKhuYUNmjFVROJM2APNXwIuMLOVwAVAJfCu/567+z3uPsPdZ+Tn58e8qNLiLABW7dL1CiISX5Ji\\n+NmVQFHU68JgWSt3f5vgSMHMhgLXuPvBGNbULSMzBzM8I1XjCiISd2J5pLAcGG9mY8wsBbgOeCK6\\ngZnlmdmJGr4G3BvDek6KZkwVkXgUs1Bw9ybgZuAZYAPwiLuvM7PbzOyqoNmFwCYz2wwMB/49VvWc\\nrNKibHbU1FN7tCHsUkREek0su49w9yeBJ9ss+9eo548Cj8ayhlPVOmPq7oNcNHFYuMWIiPSSsAea\\n+6yzCzNJMHQnNhGJKwqFDqSlJjFheLrGFUQkrigUOlFalMVqzZgqInFEodCJ0qIsDr3TyPb9R8Mu\\nRUSkVygUOtF6EZu6kEQkTigUOjF+WDppKYkKBRGJGwqFTiQmGGcVasZUEYkfCoUuTCvOZsOeOo41\\nasZUERn4FApdKC3KorHZWfd2XdiliIjEnEKhC9N0e04RiSMKhS4MyxjEqMxBCgURiQsKhW4oLc5i\
\npe6tICJxQKHQDaVFWVQceIf9R46HXYqISEwpFLqhtCgbgFW7DoZbiIhIjCkUuuGsgkwSE0zjCiIy\\n4CkUumFwSiJnasZUEYkDCoVuKi2OzJja0qIZU0Vk4FIodNO0oiwOH29i6faasEsREYkZhUI3XXHW\\nSIpzhnDrb9Zw6J3GsMsREYmJmIaCmc03s01mVm5mX21nfbGZLTGzlWa2xsyujGU9p2NoahI/uK6U\\nfXXH+Prv3tKNd0RkQIpZKJhZInAXcAUwGVhoZpPbNPu/wCPuPg24Drg7VvX0hGnF2Xzxsgn8cc0e\\nHltRGXY5IiI9LpZHCjOBcnff5u4NwEPAgjZtHMgInmcCb8ewnh7xmQvGMXtsDv/6+7Xs0B3ZRGSA\\niWUoFAC7o15XBMuifQP4iJlVAE8Cn2/vg8xskZmVmVlZdXV1LGrttsQE4/vXlpKcmMAtD62koakl\\n1HpERHpS2APNC4H73L0QuBJ4wMzeVZO73+PuM9x9Rn5+fq8X2dbIzMHcfs1ZrKk4xB3Pbg67HBGR\\nHhPLUKgEiqJeFwbLon0CeATA3V8HBgF5Maypx8yfOpKFM4v5yUtbebV8f9jliIj0iFiGwnJgvJmN\\nMbMUIgPJT7Rpswu4BMDMJhEJhXD7h07Cv7x/EmPz0vinR1ZRe7Qh7HJERE5bzELB3ZuAm4FngA1E\\nzjJaZ2a3mdlVQbN/Bj5lZquBB4EbvR+d6zkkJYkfXDeNA0cb+cpja3Saqoj0e9bffpHNmDHDy8rK\\nwi7jr/zs5W18608b+NbVU/nI7NFhlyMi8i5m9qa7z+iqXdgDzQPCx+eO4YIJ+Xzzj+vZvO9w2OWI\\niJwyhUIPSEgw/vPvziF9UBK3PLiSY43NYZckInJKFAo9JD89le996Bw27j3Md57aGHY5IiKnRKHQ\\ngy6aOIyb5pZw32s7WLxxX9jliIicNIVCD/vK/IlMHJHOrb9ZQ9XhY2GXIyJyUhQKPWxQciI/XDiN\\nI8eb+OdHVuumPCLSr3QrFMxsnJmlBs8vNLNbzCwrppX1Y+OHp/Mv75/My1v2c++r28MuR0Sk27p7\\npPAY0GxmZwD3EJm+4n9jVtUAcP2sYi6bPJzbn97I2spDYZcjItIt3Q2FluAK5Q8AP3T3W4GRsSur\\n/zMzbr/mbHLSUrjloZXUNzSFXZKISJe6GwqNZrYQ+Bjwx2BZcmxKGjhy0lL4/odL2b7/KN/84/qw\\nyxER6VJ3Q+EmYA7w7+6+3czGAA/ErqyB4/wz8vjMBeN48I3dPPXWnrDLERHpVFJ3Grn7euAWADPL\\nBtLd/fZYFjaQ/NNlE3itfD9feWwNo3PTmDwqo+s3iYiEoLtnH71gZhlmlgOsAH5qZnfEtrSBIzkx\\ngTsXTmNIShIf+vFrPLteF7aJSN/U3e6jTHevAz4I3O/us4BLY1fWwDM6N43f3zyXM4YNZdEDZfzk\\nxa2aaltE+pzuhkKSmY0EPsxfBprlJA3PGMTDi+Zw5dSRfPupjXz50TW6x7OI9CndGlMAbiNys5xX\\n3X25mY0FtsSurIFrcErkiudx+WncubicnbX1/Pgj55KTlhJ2aSIiuslOmH6/qpJbH13DiIxB3Hvj\\nDM4Ylh52SSIyQPXoTXbMrNDMfmdmVcHjMTMrPP0y49uC0gIe/NRs6hua+MDdr/HS5n5ze2oRGaC6\\nO6bwC+AJYFTw+EOwTE7TuaOzefxzcynIGsxN9y3nl6/tCLskEYlj3Q2FfHf/hbs3BY/7gPwY1hVX\\nCrOH8Og/nM+FE/L5tyfW8S+Pr6WpWQPQItL7uhsKNWb2ETNLDB4fAWq6epOZzTezTWZWbmZfbWf9\\n981sVfDYbGYHT7L+AWNoahL3fHQGi+aN5YGlO7npvuUceqcx7LJEJM50NxQ+TuR01L3AHuBDwI2d\\nvcHMEoG7gCuAycBCM5sc3cbdv+jupe5eCvwQ+O3JFD/QJCYY/+fKSdx+zVm8vrWGD979Kjv2Hw27\\nLBGJI90KBXff6e5XuXu+uw9z96uBa7p420yg3N23uXsD8BCwoJP2C4EHu1PPQHftecX86pOzqDna\\nwNV3v8rSbV0elImI9IjTufPaP3WxvgDYHfW6Ilj2LmY2GhgDLO5g/SIzKzOzsurq+DhDZ/bYXB7/\\n7Fxy01K44efLeGT57q7fJCJymk4nFKzHqoDrgEfdvbm9le5+j7vPcPcZ+fnxM75dkpfGbz87l9lj\\nc/nyY2v4jyc3aABaRGLqdEKhq6veKoncoe2EwmBZe65DXUftyhyczC9uPI8bZo/mnpe28YG7X2P9\\n23VhlyUiA1SnoWBmh82srp3HYSLXK3RmOTDezMaYWQqRX/xPtLONiUA28Pop/h0GvKTEBL559VTu\\nvn46ew69w1U/eoX/+vMmjje1e2AlInLKOg0Fd09394x2Hunu3um8ScHtO28mMmfSBuARd19nZreZ\\n2VVRTa8DHvL+Nt9GCK48ayTPfvECriodxQ8Xl/P+O19hxa4DYZclIgOI5j7qp5ZsquLrv32LPXXH\\n+PjcMfzz5RMYktLd+Q1FJN706NxH0vdcdOYwnvniPK6fVczPX9nO/P9+mde27g+7LBHp5xQK/Vj6\\noGS+dfVZPLxoNgkGf//TZXztt29Rd0xXQovIqVEoDACzxuby9Bfm8el5Y3l4+S4uv+Mlnt+gW36K\\nyMlTKAwQg5IT+dqVk/jdZ+eSNSSZT/yyjH98aCW1RxvCLk1E+hGFwgBzTlEWT9z8Hr5w6XiefGsP\\nl97xIk+sflv3gxaRblEoDEApSQl84dIJ/PHz76UoezC3PLiST93/JvvqjoVdmoj0cQqFAezMEen8\\n9rNz+fqVk3h5SzWX3vEiP3lxK8caddGbiLRPoTDAJSYYn5o3lme+MI9zR2fz7ac2cvF/vsBvynbT\\n3KIuJRH5awqFOFGSl8Z9N83kwU/NJj9jELc+uoYrfvASz63fp/EGEWmlUIgzc8bl8vhnz+fu66fT\\n2Ox88v4yPvyT13lzZ23YpYlIH6BQiENmxpVnjeTPX5zHt66eyvb99VzzP6+z6P4yyqsOh12eiIRI\\ncx8JR483ce8r2/nJS9uob2jiwzOK+MKlExiROSjs0kSkh3R37iOFgrSqOXKcHy0p51dLd5KYYHx8\\n7hg+fcE4Mgcnh12aiJwmhYKcsl019dzx7CYeX/U2WUOSufmiM/jI7NEMSk4MuzQROUUKBTltaysP\\ncfvTG3l5y34KsgbzxcsmcHXpKJISNRQl0t9o6mw5bVMLMnngE7P41SdmkZOWwpd+s5pL73iRh5fv\\noqFJ94oWGYh0pCDd0tLi/Hn9Xn60pJy1lXWMyhzEonljuW5msbqVRPoBdR9JTLg7L2yu5q7F5ZTt\\nPEDe0BQ++d6xfGT2aIam6s5vIn2VQkFiyt1Ztr2Wu5aU8/KW/WQOTubG80u4aW4JWUNSwi5PRNro\
\nE2MKZjbfzDaZWbmZfbWDNh82s/Vmts7M/jeW9UjPMTNmj83lgU/M4vefm8vMMTn84PktzP3OYr79\\n5AaqDmtGVpH+KGZHCmaWCGwGLgMqgOXAQndfH9VmPPAIcLG7HzCzYe5e1dnn6kih79q4t467l2zl\\nj2veJjkxgevOK2LRBeMoyBocdmkica8vHCnMBMrdfZu7NwAPAQvatPkUcJe7HwDoKhCkb5s4IoM7\\nF07j+X++kAWlo/j1sl1c8N0lfPnR1WzffzTs8kSkG2IZCgXA7qjXFcGyaBOACWb2qpktNbP57X2Q\\nmS0yszIzK6uuro5RudJTxuSl8d0PncOLX76I62cV8/tVb3PJf73A5x9cyardBzUrq0gfFvbpIknA\\neOBCoBB4yczOcveD0Y3c/R7gHoh0H/VyjXKKCrIG8/8WTOXmi8fzs1e28eulu/jD6rc5qyCTG+aM\\n5qpzRul0VpE+JpZHCpVAUdTrwmBZtArgCXdvdPftRMYgxsewJglBfnoqX7tiEkv/zyV8c8EUjjc1\\n8+VH1zDrP57n3/+0nh3qWhLpM2I50JxE5Jf8JUTCYDnw9+6+LqrNfCKDzx8zszxgJVDq7jUdfa4G\\nmvu/E6ezPrB0J8+s3UtTi3PBhHxumD2aiyYOIzHBwi5RZMDp7kBzzLqP3L3JzG4GngESgXvdfZ2Z\\n3QaUufsTwbrLzWw90Azc2lkgyMBw4nTW2WNz2Vd3jIfe2M3/vrGTT95fRkHWYK6fXcy1M4rIHZoa\\ndqkicUcXr0mf0NjcwnPr93H/6zt5fVsNKYkJ/M3ZI7lhzmimFWVhpqMHkdOhK5ql39qy7zC/WrqT\\nx1ZUcuR4E1NGZfDROaO56pwCBqdoYFrkVCgUpN87cryJx1dW8sDrO9m07zAZg5K4eloBH55RxNSC\\nzLDLE+lXFAoyYLg7y3cciAxMr9tLQ1MLU0ZlcO15RSw4p4DMIboznEhXFAoyIB2sb+D3q97m4eW7\\nWb+njtSkBOZPHcG1M4qYPTaXBJ25JNIuhYIMeGsrD/Hw8t08vqqSw8eaKM4Zwt+dW8iHZhQyMlPz\\nLYlEUyhI3DjW2MzTa/fy8PLdvL6thgSDeRPyuXZGEZdMGk5Kkm4wKKJQkLi0s+Yovymr4NE3K9hb\\nd4zctBQ+MK2Aa88rYvzw9LDLEwmNQkHiWnOL89Lmah5evpvnNuyjqcWZVpzFh84t5P1njyJzsAan\\nJb4oFEQC+48c5/GVlTy8fDdbqo6QkpTAZZOGc825Bcwbn09SorqXZOBTKIi04e6srazjsRUV/H5V\\nJQfqG8kbmsrVpaP44PRCJo/KCLtEkZhRKIh0oqGphRc2VfHYigoWb6yisdmZNDKDa6YXsKC0gPx0\\nzbskA4tCQaSbao828IfVb/PbFRWsrjhEYoJxwYR8rpleyCWThumeDzIgKBRETsGWfYd5bEUlj6+s\\nZG/dMTIGJfH+c0ZxzfRCphdrYj7pvxQKIqehucV5bet+HnuzgqfX7eVYYwtj8tL4m7NGMn/qCKaM\\nylBASL+iUBDpIUeON/HkW3v43YpKlm2vocWhKGcw86eMYP7UEUwrytb0GtLnKRREYqDmyHGe27CP\\np9bu5dXy/TQ2O8MzUnnflBHMnzKCmWNydIqr9EkKBZEYqzvWyOINVTy1dg8vbq7mWGMLOWkpXDZp\\nOPOnjuD8M3JJTdIgtfQNCgWRXlTf0MSLm6p5et1ent9QxZHjTaSnJnHxpGFcMXUEF0wYphsESagU\\nCiIhOd7UzGvlNTy1dg/Prt/HgfpGBiUncOGEYVw6eTgXnplPnu4/Lb2su6GQFOMi5gM/ABKBn7n7\\nd9qsvxH4HlAZLPqRu/8sljWJxFpqUiIXTRzGRROH0dTcwhvba3lq7V6eWbeXp9ftxQzOLszikonD\\nuHjiMJ3JJH1KzI4UzCwR2AxcBlQAy4GF7r4+qs2NwAx3v7m7n6sjBemv3J11b9exeGMVz2+sYk3F\\nQdxheEYqF50ZCYi5Z+SRlhrT/6tJnOoLRwozgXJ33xYU9BCwAFjf6btEBigzY2pBJlMLMrnlkvFU\\nHz7OC5uqWLKpij+t2cNDy3eTkpjArLE5wVHEcIpzh4RdtsSZWB4pfAiY7+6fDF7fAMyKPioIjhS+\\nDVQTOar4orvvbuezFgGLAIqLi8/duXNnTGoWCUtDUwtlO2pZvLGKxRur2Lb/KABnDBvKxROHcdGZ\\nw5hRkk2yTneVUxT6QHM3QyEXOOLux83s08C17n5xZ5+r7iOJBzv2H20NiGXba2hsdtIHJTFrTC6z\\nx+Ywa0wuk0dlkKiL5qSb+kL3USVQFPW6kL8MKAPg7jVRL38GfDeG9Yj0GyV5aXz8PWP4+HvGcOR4\\nE69sqeaFTdUs3VbDcxv2AZCemsSMkmxmjc1l1pgcphZk6khCTlssQ2E5MN7MxhAJg+uAv49uYGYj\\n3X1P8PIqYEMM6xHpl4amJjF/6kjmTx0JwN5Dx1i2vYZl22tZtq2GJZuqARiSksi5o7OZHYTE2YVZ\\nuj+1nLSYhYK7N5nZzcAzRE5Jvdfd15nZbUCZuz8B3GJmVwFNQC1wY6zqERkoRmQOYkFp5L4PANWH\\nj/PG9lqWbqth2fYavvfMJgAGJScwvTibWWNymTU2h9KiLE0DLl3SxWsiA0zt0Qbe2F7D0m21LNte\\ny8a9dbhDSlIC55VkM298PvMm5DNxRLquj4gjoQ80x4pCQeTkHKpv5I0dkSOJl7dUs3nfEQDy01N5\\n7/g8LpiQz3vOyCNXV1kPaAoFEWnX3kPHeGlLNS9truaV8v0crG/EDKaOyuS94/OYNyGf6cXZGo8Y\\nYBQKItKl5hbnrcpDvLy5mpe2VLNi10GaW5y0lETmjMtl3oR85o3PpyQvLexS5TQpFETkpNUda+T1\\nrTW8FITE7tp3ACjOGcJ7x+fx3vF5zB6bS9aQlJArlZOlUBCR0+Lu7Kip5+Wgq+m1rTXUNzS3djWd\\nf0Yuc8flcV5JjqYF7wcUCiLSoxqaWlhdcZBXy/fzWnkNK3cfoLHZSUlMYFpxFnPPyGPuGbmcXZil\\ni+j6IIWCiMRUfUMTb2yv5bWtNbxavp/1eyKnvg5NTWLmmBzOH5fL3DPydOprH9EXprkQkQFsSEoS\\nF545jAvPHAbAgaMNvL4tEhCvba1h8cYqAPKGpjBnXB5zx+Vy0cRhDM8YFGbZ0gUdKYhITFQefCfo\\natrPq1trqD58HIBpxVnMnzKC900ZobOaepG6j0Skz3B3Nu87wrPrI3efW1tZB8DEEelcPmUE86eM\\nYNJIdTPFkkJBRPqs3bX1/Hn9Pp5Zt5flO2pxh6KcwcyfMoL5U0cwrSibBE0L3qMUCiLSL+w/cpzn\\n1u/j6XV7ebV8P43NTn56KpdPHs77poxg9thcXV3dAxQKItLv1B1rZMnGKp5Zt5cXNlVT39BMxqAk\
\nLpk0nPdNGc6ccXlkDk4Ou8x+SaEgIv3ascZmXt6yn2fW7eW5Dfta52iaMCyd6aOzmTE6mxkl2RTn\\nDNFYRDcoFERkwGhqbmH5jgMs31HLmzsPsGLXAQ4fawIgb2gq547O4tzR2Zw7OoepBRmkJukK67Z0\\nnYKIDBhJiQnMGZfLnHG5ALS0OJurDvPmzgO8ueMAb+46wDPrIrcpTUlK4OyCzCAkspk+Ops8TQve\\nbTpSEJEBoerwMVbsPMibOyNHE2sr62hobgFgTF4a04uzec/4XC6YMIyctPib0E/dRyIS1441NrO2\\n8hBlOw9Ejih2HqD2aANmUFqUxcVnDuOiicOYMiojLsYkFAoiIlFagntHLN5YxQubqlhdcQiAYemp\\nXHTmMC6amM97xuczNHVg9qr3iVAws/nAD4BE4Gfu/p0O2l0DPAqc5+6d/sZXKIhIT6g+fJwXN1ez\\nZGMVL22u5vDxJpITjZljcoKQGMbYvLQBcxQReiiYWSKwGbgMqACWAwvdfX2bdunAn4AU4GaFgoj0\\ntsbmFt7ceYAlG6tYsqmq9T7Wo3OHtAbErDE5DEruv2c19YWzj2YC5e6+LSjoIWABsL5Nu28CtwO3\\nxrAWEZEOJScmMHtsLrPH5vK1Kyexu7aeFzZVsWRTNQ++sYv7XtvB4OREZpRkc15JDjPH5FBalNWv\\nQ6IjsQyFAmB31OsKYFZ0AzObDhS5+5/MTKEgIn1CUc4QbphTwg1zSjjW2Mzr22p4YWMVy7bX8v3n\\nNuMOyYnG2YVZnFeSw3kl2cwYnUPmkP5/tXVoIypmlgDcAdzYjbaLgEUAxcXFsS1MRCTKoOTESBdS\\ncN+IQ/WNlO2s5Y0dtSzfXsvPX9nGj190zODM4emRkBiTw8ySHEZk9r97R8RyTGEO8A13f1/w+msA\\n7v7t4HUmsBU4ErxlBFALXNXZuILGFESkL3mnoZlVuw+yfEdt6xXX9Q3NABTnDGFGSTYzg6AYk5sW\\n2uyvfWGgOYnIQPMlQCWRgea/d/d1HbR/AfiSBppFpD9ram5h/Z463theGwRF5PoIgNSkBIpzhjA6\\nN42S3CGMzo08H507hIKswSTF8N7WoQ80u3uTmd0MPEPklNR73X2dmd0GlLn7E7HatohIWJISEzi7\\nMIuzC7P45HvH4u5srT5C2Y4DbK0+ws6aenbW1PNKeTXHGlv+8r4EozB7MMWtgZHG6JwhlOQNoTB7\\nSK8NauviNRGRELg7VYePs2P/UXbW1rOz5ig7aurZVVPPjpqjrRP+AZjByIxB3DR3DJ+aN/aUthf6\\nkYKIiHTMzBieMYjhGYOYNTb3r9a5OwfrG9lRc7T1yGJnzVGGZcR+Yj+FgohIH2NmZKelkJ2WwrTi\\n7F7dtu5xJyIirRQKIiLSSqEgIiKtFAoiItJKoSAiIq0UCiIi0kqhICIirRQKIiLSqt9Nc2Fm1cDO\\nU3x7HrC/B8vpaX29Puj7Naq+06P6Tk9frm+0u+d31ajfhcLpMLOy7sz9EZa+Xh/0/RpV3+lRfaen\\nr9fXHeo+EhGRVgoFERFpFW+hcE/YBXShr9cHfb9G1Xd6VN/p6ev1dSmuxhRERKRz8XakICIinVAo\\niIhIqwEZCmY238w2mVm5mX21nfWpZvZwsH6ZmZX0Ym1FZrbEzNab2Toz+8d22lxoZofMbFXw+Nfe\\nqi/Y/g4zeyvY9rvufWoRdwb7b42ZTe/F2s6M2i+rzKzOzL7Qpk2v7z8zu9fMqsxsbdSyHDN71sy2\\nBH+2e7cUM/tY0GaLmX2sF+v7npltDP4Nf2dmWR28t9PvQwzr+4aZVUb9O17ZwXs7/XmPYX0PR9W2\\nw8xWdfDemO+/HuXuA+oBJAJbgbFACrAamNymzWeBHwfPrwMe7sX6RgLTg+fpwOZ26rsQ+GOI+3AH\\nkNfJ+iuBpwADZgPLQvy33kvkopxQ9x8wD5gOrI1a9l3gq8HzrwK3t/O+HGBb8Gd28Dy7l+q7HEgK\\nnt/eXn3d+T7EsL5vAF/qxneg05/3WNXXZv1/Af8a1v7rycdAPFKYCZS7+zZ3bwAeAha0abMA+GXw\\n/FHgEjOz3ijO3fe4+4rg+WFgA1DQG9vuQQuA+z1iKZBlZiNDqOMSYKu7n+oV7j3G3V8Catssjv6e\\n/RK4up23vg941t1r3f0A8Cwwvzfqc/c/u/uJu8MvBQp7ervd1cH+647u/Lyfts7qC353fBh4sKe3\\nG4aBGAoFwO6o1xW8+5dua5vgh+IQkEsvC7qtpgHL2lk9x8xWm9lTZjaldyvDgT+b2Ztmtqid9d3Z\\nx73hOjr+QQxz/50w3N33BM/3AsPbadNX9uXHiRz9taer70Ms3Rx0b93bQfdbX9h/7wX2ufuWDtaH\\nuf9O2kAMhX7BzIYCjwFfcPe6NqtXEOkSOQf4IfB4L5f3HnefDlwBfM7M5vXy9rtkZinAVcBv2lkd\\n9v57F4/0I/TJ87/N7OtAE/DrDpqE9X34H2AcUArsIdJF0xctpPOjhD7/8xRtIIZCJVAU9bowWNZu\\nGzNLAjKBml6pLrLNZCKB8Gt3/23b9e5e5+5HgudPAslmltdb9bl7ZfBnFfA7Iofo0bqzj2PtCmCF\\nu+9ruyLs/Rdl34luteDPqnbahLovzexG4P3A9UFwvUs3vg8x4e773L3Z3VuAn3aw3bD3XxLwQeDh\\njtqEtf9O1UAMheXAeDMbE/xv8jrgiTZtngBOnOXxIWBxRz8QPS3of/w5sMHd7+igzYgTYxxmNpPI\\nv1OvhJaZpZlZ+onnRAYj17Zp9gTw0eAspNnAoahukt7S4f/Owtx/bUR/zz4G/L6dNs8Al5tZdtA9\\ncnmwLObMbD7wZeAqd6/voE13vg+xqi96nOoDHWy3Oz/vsXQpsNHdK9pbGeb+O2Vhj3TH4kHk7JjN\\nRM5K+Hqw7DYiX36AQUS6HcqBN4CxvVjbe4h0I6wBVgWPK4HPAJ8J2twMrCNyJsVS4PxerG9ssN3V\\nQQ0n9l90fQbcFezft4AZvfzvm0bkl3xm1LJQ9x+RgNoDNBLp1/4EkXGq54EtwHNATtB2BvCzqPd+\\nPPgulgM39WJ95UT64098D0+ckTcKeLKz70Mv1fdA8P1aQ+QX/ci29QWv3/Xz3hv1BcvvO/G9i2rb\\n6/uvJx+a5kJERFoNxO4jERE5RQoFERFppVAQEZFWCgUREWmlUBARkVYKBZGAmTXbX8/A2mMzbppZ\\nSfQMmyJ9VVLYBYj0Ie+4e2nYRYiESUcKIl0I5sP/bjAn/htmdkawvMTMFgcTtj1vZsXB8uHB/QlW\\nB4/zg49KNLOfWuQ+Gn82s8FB+1sscn+NNWb2UEh/TRFAoSASbXCb7qNro9YdcvezgB8B/x0s+yHw\\nS3c/m8hkcncGy+8EXvTIhHzTiVzJCjAeuMvdpwAHgWuC5V8FpgWf85nY/NVEukdXNIsEzOyIuw9t\
\nZ/kO4GJ33xZMZrjX3XPNbD+RqRcag+V73D3PzKqBQnc/HvUZJUTumzA+eP0VINndv2VmTwNHiMzm\\n+rgHk/mJhEFHCiLd4x08PxnHo54385cxvb8hMpfUdGB5MPOmSCgUCiLdc23Un68Hz18jMisnwPXA\\ny8Hz54F/ADCzRDPL7OhDzSwBKHL3JcBXiEzj/q6jFZHeov+RiPzF4DY3X3/a3U+clpptZmuI/G9/\\nYbDs88AvzOxWoBq4KVj+j8A9ZvYJIkcE/0Bkhs32JAK/CoLDgDvd/WAP/X1ETprGFES6EIwpzHD3\\n/WHXIhJr6j4SEZFWOlIQEZFWOlIQEZFWCgUREWmlUBARkVYKBRERaaVQEBGRVv8f850UGBZWDxQA\\nAAAASUVORK5CYII=\\n\",\n      \"text/plain\": [\n       \"<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=389x278 at 0x7FD0E5D3FF60>\"\n      ]\n     },\n     \"execution_count\": 23,\n     \"metadata\": {\n      \"tags\": []\n     },\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"import matplotlib.pyplot as plt\\n\",\n    \"# Below loss curve is not the actual loss image we have got\\n\",\n    \"# while training and kept it here only as a reference\\n\",\n    \"plt.plot(loss_plot)\\n\",\n    \"plt.xlabel('Epochs')\\n\",\n    \"plt.ylabel('Loss')\\n\",\n    \"plt.title('Loss Plot')\\n\",\n    \"plt.show()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"accelerator\": \"GPU\",\n  \"colab\": {\n   \"collapsed_sections\": [],\n   \"name\": \"dataloaders.ipynb\",\n   \"provenance\": [],\n   \"toc_visible\": true,\n   \"version\": \"0.3.2\"\n  },\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.7.7\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}"
  },
  {
    "path": "docs/Tutorial-QuickStart.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Quick Start Tutorial\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"A simple step-by-step guide that will quickly get you started with Hangar basics, including initializing a repository, adding and committing data to a repository.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Installation\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can install Hangar via `pip`:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"$ pip install hangar\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"or via `conda`:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"$ conda install -c conda-forge hangar\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"Please refer to the [Installation](installation.rst) page for more information.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Quick Start for the Impatient\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"The only import statement you'll ever need:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from hangar import Repository\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Create and initialize a new Hangar `Repository` at the given path:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Hangar Repo initialized at: /Volumes/Archivio/tensorwerk/hangar/quick-start/.hangar\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"//anaconda/envs/hangar-tutorial/lib/python3.8/site-packages/hangar/context.py:92: UserWarning: No repository exists at /Volumes/Archivio/tensorwerk/hangar/quick-start/.hangar, please use `repo.init()` method\\n\",\n      \"  warnings.warn(msg, UserWarning)\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'/Volumes/Archivio/tensorwerk/hangar/quick-start/.hangar'\"\n      ]\n     },\n     \"execution_count\": 2,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"!mkdir /Volumes/Archivio/tensorwerk/hangar/quick-start\\n\",\n    \"\\n\",\n    \"repo = Repository(path=\\\"/Volumes/Archivio/tensorwerk/hangar/quick-start\\\")\\n\",\n    \"\\n\",\n    \"repo.init(\\n\",\n    \"    user_name=\\\"Alessia Marcolini\\\", user_email=\\\"alessia@tensorwerk.com\\\", remove_old=True\\n\",\n    \")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Checkout the `Repository` in write mode:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar WriterCheckout                \\n\",\n       \"    Writer       : True                \\n\",\n       \"    Base Branch  : master                \\n\",\n       \"    Num Columns  : 0\\n\"\n      ]\n     },\n     \"execution_count\": 3,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n  
  \"master_checkout = repo.checkout(write=True)\\n\",\n    \"master_checkout\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Inspect the `columns` we have (we just started, none so far):\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 4,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : True                \\n\",\n       \"    Number of Columns : 0                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - \"\n      ]\n     },\n     \"execution_count\": 4,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Prepare some random data to play with:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 5,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"array([[0.17961852, 0.31945355],\\n\",\n       \"       [0.10929027, 0.2681622 ],\\n\",\n       \"       [0.29397449, 0.02659856]])\"\n      ]\n     },\n     \"execution_count\": 5,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"dummy = np.random.rand(3,2)\\n\",\n    \"dummy\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Create a new column named `dummy_column`:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : dummy_column                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : float64                \\n\",\n       \"    Shape                    : (3, 2)                \\n\",\n       \"    Number of Samples        : 0                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 6,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"dummy_col = master_checkout.add_ndarray_column(name=\\\"dummy_column\\\", prototype=dummy)\\n\",\n    \"dummy_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Add data to `dummy_column`, treating it as a normal Python dictionary:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"dummy_col[0] = dummy\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"dummy_col[1] = np.random.rand(3,2)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Commit your changes providing a message:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 9,\n   \"metadata\": {},\n   
\"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=c104ef7e2cfe87318e78addd6033028488050cea'\"\n      ]\n     },\n     \"execution_count\": 9,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.commit(\\\"Add dummy_column with 2 samples\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Add more data and commit again:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 10,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : dummy_column                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : float64                \\n\",\n       \"    Shape                    : (3, 2)                \\n\",\n       \"    Number of Samples        : 3                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 10,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"dummy_col[2] = np.random.rand(3,2)\\n\",\n    \"dummy_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 11,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=099557d48edebb7607fa3ec648eafa2a1af5e652'\"\n      ]\n     },\n     \"execution_count\": 11,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.commit(\\\"Add one more sample to dummy_column\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"See the master branch history:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 12,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=099557d48edebb7607fa3ec648eafa2a1af5e652 (\\u001B[1;31mmaster\\u001B[m) : Add one more sample to dummy_column\\n\",\n      \"* a=c104ef7e2cfe87318e78addd6033028488050cea : Add dummy_column with 2 samples\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"master_checkout.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Close the write-enabled checkout:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 13,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"master_checkout.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Inspect the status of the `Repository`:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 14,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Summary of Contents Contained in Data Repository \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| Repository Info \\n\",\n      \"|----------------- \\n\",\n      \"|  Base Directory: /Volumes/Archivio/tensorwerk/hangar/quick-start \\n\",\n      \"|  Disk Usage: 237.53 kB 
\\n\",\n      \" \\n\",\n      \"=================== \\n\",\n      \"| Commit Details \\n\",\n      \"------------------- \\n\",\n      \"|  Commit: a=099557d48edebb7607fa3ec648eafa2a1af5e652 \\n\",\n      \"|  Created: Mon May  4 13:00:43 2020 \\n\",\n      \"|  By: Alessia Marcolini \\n\",\n      \"|  Email: alessia@tensorwerk.com \\n\",\n      \"|  Message: Add one more sample to dummy_column \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| DataSets \\n\",\n      \"|----------------- \\n\",\n      \"|  Number of Named Columns: 1 \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"dummy_column\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 3 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: ndarray \\n\",\n      \"|    - schema_hasher_tcode: 1 \\n\",\n      \"|    - data_hasher_tcode: 0 \\n\",\n      \"|    - schema_type: fixed_shape \\n\",\n      \"|    - shape: (3, 2) \\n\",\n      \"|    - dtype: float64 \\n\",\n      \"|    - backend: 01 \\n\",\n      \"|    - backend_options: {'complib': 'blosc:lz4hc', 'complevel': 5, 'shuffle': 'byte'} \\n\",\n      \"\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.summary()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\n\",\n    \"### Quick Start - with explanations\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### 1. Create and initialize a \\\"Repository\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Central to Hangar is the concept of [Repository](api.rst#hangar.repository.Repository).\\n\",\n    \"\\n\",\n    \"A `Repository` consists of an **historically ordered mapping** of **Commits** over time by various **Committers** across any number of **Branches**. Though there are many conceptual similarities in what a Git repo and a Hangar repository achieve, Hangar is designed with the express purpose of dealing with **numeric data**.\\n\",\n    \"\\n\",\n    \"To start using Hangar programmatically, simply begin with this import statement:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from hangar import Repository\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Create the folder where you want to store the `Repository`:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!mkdir /Volumes/Archivio/tensorwerk/hangar/quick-start\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Initialize the `Repository` object by saying where your repository should live.\\n\",\n    \"\\n\",\n    \".. 
note:: If you point the `Repository` at a path that does not contain a pre-initialized Hangar repo, Python shows you a warning saying that you will need to initialize the repo before you can start working on it.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"//anaconda/envs/hangar-tutorial/lib/python3.8/site-packages/hangar/context.py:92: UserWarning: No repository exists at /Volumes/Archivio/tensorwerk/hangar/quick-start/.hangar, please use `repo.init()` method\\n\",\n      \"  warnings.warn(msg, UserWarning)\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo = Repository(path=\\\"/Volumes/Archivio/tensorwerk/hangar/quick-start\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Initialize the `Repository` providing your name and your email.\\n\",\n    \"\\n\",\n    \".. warning:: Please be aware that the `remove_old` parameter set to `True` **removes and reinitializes** a Hangar repository at the given path.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 4,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Hangar Repo initialized at: /Volumes/Archivio/tensorwerk/hangar/quick-start/.hangar\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'/Volumes/Archivio/tensorwerk/hangar/quick-start/.hangar'\"\n      ]\n     },\n     \"execution_count\": 4,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"repo.init(\\n\",\n    \"    user_name=\\\"Alessia Marcolini\\\", user_email=\\\"alessia@tensorwerk.com\\\", remove_old=True\\n\",\n    \")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### 2. Open the Staging Area for Writing\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To start interacting with Hangar, first you need to check out the `Repository` you want to work on.\\n\",\n    \"\\n\",\n    \"A repo can be checked out in two modes:\\n\",\n    \"\\n\",\n    \"* [write-enabled](api.rst#hangar.checkout.WriterCheckout)\\n\",\n    \"* [read-only](api.rst#hangar.checkout.ReaderCheckout)\\n\",\n    \"\\n\",\n    \"We need to check out the repo in **write mode** in order to initialize the columns and write into them.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 5,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar WriterCheckout                \\n\",\n       \"    Writer       : True                \\n\",\n       \"    Base Branch  : master                \\n\",\n       \"    Num Columns  : 0\\n\"\n      ]\n     },\n     \"execution_count\": 5,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout = repo.checkout(write=True)\\n\",\n    \"master_checkout\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"A checkout allows access to `columns`. 
The `columns` attribute of a checkout provides the interface to working with all of the data on disk!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : True                \\n\",\n       \"    Number of Columns : 0                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - \"\n      ]\n     },\n     \"execution_count\": 6,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### 3. Create some random data to play with\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Let's create a random array to be used as a dummy example:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"array([[0.54631485, 0.26578857],\\n\",\n       \"       [0.74990074, 0.41764666],\\n\",\n       \"       [0.75884524, 0.05547267]])\"\n      ]\n     },\n     \"execution_count\": 7,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"dummy = np.random.rand(3,2)\\n\",\n    \"dummy\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### 4. Initialize a column\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"With the checkout write-enabled, we can now initialize a new column of the repository using the method [add_ndarray_column()](api.rst#hangar.checkout.WriterCheckout.add_ndarray_column).\\n\",\n    \"\\n\",\n    \"All samples within a column have the same data type and number of dimensions. The size of each dimension can be either fixed (the default behavior) or variable per sample.\\n\",\n    \"\\n\",\n    \"You will need to provide a column name and a prototype, so Hangar can infer the shape of the elements contained in the array. 
`dummy_col` will become a column accessor object.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : dummy_column                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : float64                \\n\",\n       \"    Shape                    : (3, 2)                \\n\",\n       \"    Number of Samples        : 0                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 8,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"dummy_col = master_checkout.add_ndarray_column(name=\\\"dummy_column\\\", prototype=dummy)\\n\",\n    \"dummy_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Verify we successfully added the new column:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 9,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : True                \\n\",\n       \"    Number of Columns : 1                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - dummy_column / False\"\n      ]\n     },\n     \"execution_count\": 9,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### 5. 
Add data\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To add data to a named column, we can use **dict-style mode** as follows.\\n\",\n    \"Sample keys can be either str or int type.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 10,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"dummy_col[0] = dummy\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"As we can see, `Number of Samples` is equal to 1 now!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 11,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : dummy_column                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : float64                \\n\",\n       \"    Shape                    : (3, 2)                \\n\",\n       \"    Number of Samples        : 1                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 11,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"dummy_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 12,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"dummy_col[1] = np.random.rand(3,2)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 13,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : dummy_column                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : float64                \\n\",\n       \"    Shape                    : (3, 2)                \\n\",\n       \"    Number of Samples        : 2                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 13,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"dummy_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 14,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"array([[0.17590758, 0.26950355],\\n\",\n       \"       [0.88036219, 0.7839301 ],\\n\",\n       \"       [0.87321484, 0.04316646]])\"\n      ]\n     },\n     \"execution_count\": 14,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"dummy_col[1]\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can also iterate over your column, as you would do with a regular Python dictionary:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 15,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": 
\"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Key: 0\\n\",\n      \"Value: [[0.54631485 0.26578857]\\n\",\n      \" [0.74990074 0.41764666]\\n\",\n      \" [0.75884524 0.05547267]]\\n\",\n      \"\\n\",\n      \"Key: 1\\n\",\n      \"Value: [[0.17590758 0.26950355]\\n\",\n      \" [0.88036219 0.7839301 ]\\n\",\n      \" [0.87321484 0.04316646]]\\n\",\n      \"\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"for key, value in dummy_col.items():\\n\",\n    \"    print('Key:', key)\\n\",\n    \"    print('Value:', value)\\n\",\n    \"    print()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**How many samples are in the column?**\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 16,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"2\"\n      ]\n     },\n     \"execution_count\": 16,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"len(dummy_col)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**Does the column contain that key?**\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 17,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"True\"\n      ]\n     },\n     \"execution_count\": 17,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"0 in dummy_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 18,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"False\"\n      ]\n     },\n     \"execution_count\": 18,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"5 in dummy_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### 6. 
Commit changes\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Once you have made a set of changes you want to **commit**, just simply call the [commit()](api.rst#hangar.checkout.WriterCheckout.commit) method (and pass in a message)!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 19,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=4f42fce2b66476271f149e3cd2eb4c6ba66daeee'\"\n      ]\n     },\n     \"execution_count\": 19,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.commit(\\\"Add dummy_column with 2 samples\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Let's add another sample in the column:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 20,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : dummy_column                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : float64                \\n\",\n       \"    Shape                    : (3, 2)                \\n\",\n       \"    Number of Samples        : 3                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 20,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"dummy_col[2] = np.random.rand(3,2)\\n\",\n    \"dummy_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"`Number of Samples` is equal to 3 now and we want to keep track of the change with another commit:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 21,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=753e28e27d4b23a0dca0633f90b4513538a98c40'\"\n      ]\n     },\n     \"execution_count\": 21,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.commit(\\\"Add one more sample to dummy_column\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To view the **history** of your commits:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 22,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=753e28e27d4b23a0dca0633f90b4513538a98c40 (\\u001B[1;31mmaster\\u001B[m) : Add one more sample to dummy_column\\n\",\n      \"* a=4f42fce2b66476271f149e3cd2eb4c6ba66daeee : Add dummy_column with 2 samples\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"master_checkout.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**Do not forget to close the write-enabled checkout!**\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 23,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"master_checkout.close()\"\n   ]\n  },\n  {\n   
\"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Check the **state of the repository** and get useful information about disk usage, the columns you have and the last commit:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 24,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Summary of Contents Contained in Data Repository \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| Repository Info \\n\",\n      \"|----------------- \\n\",\n      \"|  Base Directory: /Volumes/Archivio/tensorwerk/hangar/quick-start \\n\",\n      \"|  Disk Usage: 237.53 kB \\n\",\n      \" \\n\",\n      \"=================== \\n\",\n      \"| Commit Details \\n\",\n      \"------------------- \\n\",\n      \"|  Commit: a=753e28e27d4b23a0dca0633f90b4513538a98c40 \\n\",\n      \"|  Created: Tue Apr 21 21:50:15 2020 \\n\",\n      \"|  By: Alessia Marcolini \\n\",\n      \"|  Email: alessia@tensorwerk.com \\n\",\n      \"|  Message: Add one more sample to dummy_column \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| DataSets \\n\",\n      \"|----------------- \\n\",\n      \"|  Number of Named Columns: 1 \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"dummy_column\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 3 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: ndarray \\n\",\n      \"|    - schema_hasher_tcode: 1 \\n\",\n      \"|    - data_hasher_tcode: 0 \\n\",\n      \"|    - schema_type: fixed_shape \\n\",\n      \"|    - shape: (3, 2) \\n\",\n      \"|    - dtype: float64 \\n\",\n      \"|    - backend: 01 \\n\",\n      \"|    - backend_options: {'complib': 'blosc:lz4hc', 'complevel': 5, 'shuffle': 'byte'} \\n\",\n      \"\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.summary()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"hide_input\": false,\n  \"kernelspec\": {\n   \"display_name\": \"Python (hangar-tutorial)\",\n   \"language\": \"python\",\n   \"name\": \"hangar-tutorial\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.2\"\n  },\n  \"varInspector\": {\n   \"cols\": {\n    \"lenName\": 16,\n    \"lenType\": 16,\n    \"lenVar\": 40\n   },\n   \"kernels_config\": {\n    \"python\": {\n     \"delete_cmd_postfix\": \"\",\n     \"delete_cmd_prefix\": \"del \",\n     \"library\": \"var_list.py\",\n     \"varRefreshCmd\": \"print(var_dic_list())\"\n    },\n    \"r\": {\n     \"delete_cmd_postfix\": \") \",\n     \"delete_cmd_prefix\": \"rm(\",\n     \"library\": \"var_list.r\",\n     \"varRefreshCmd\": \"cat(var_dic_list()) \"\n    }\n   },\n   \"types_to_exclude\": [\n    \"module\",\n    \"function\",\n    \"builtin_function_or_method\",\n    \"instance\",\n    \"_Feature\"\n   ],\n   \"window_display\": false\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}"
  },
  {
    "path": "docs/Tutorial-RealQuickStart.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## \\\"Real World\\\" Quick Start Tutorial\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"This tutorial will guide you on working with the basics of Hangar, while playing with some \\\"real world\\\" data:\\n\",\n    \"\\n\",\n    \"* adding data to a repository\\n\",\n    \"* commiting changes\\n\",\n    \"* reading data from a commit\\n\",\n    \"* inspecting contents of a commit\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Setup\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can install Hangar via `pip`:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"$ pip install hangar\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"or via `conda`:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"$ conda install -c conda-forge hangar\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Other requirements for this tutorial are:\\n\",\n    \"\\n\",\n    \"* pillow - the python imaging library\\n\",\n    \"* tqdm - a simple tool to display progress bars (this is installed automatically as it is a requirement for `Hangar`)\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"$ pip install pillow\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 1. Create and Initialize a \\\"Repository\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"When working with Hangar programatically (the CLI is covered in later tutorials), we always start with the following import:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from hangar import Repository\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Create the folder where you want to store the Hangar `Repository`:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!mkdir /Volumes/Archivio/tensorwerk/hangar/imagenette\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"and create the `Repository` object. Note that when you specify a new folder for a Hangar repository, Python shows you a warning saying that you will need to initialize the repo before starting working on it.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"//anaconda/envs/hangar-nested/lib/python3.7/site-packages/hangar-0.5.0.dev1-py3.7-macosx-10.9-x86_64.egg/hangar/context.py:94: UserWarning: No repository exists at /Volumes/Archivio/tensorwerk/hangar/imagenette/.hangar, please use `repo.init()` method\\n\",\n      \"  warnings.warn(msg, UserWarning)\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo = Repository(path=\\\"/Volumes/Archivio/tensorwerk/hangar/imagenette\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Initialize the `Repository` providing your name and your email.\\n\",\n    \"\\n\",\n    \".. 
warning:: Please be aware that the `remove_old` parameter set to `True` **removes and reinitializes** a Hangar repository at the given path.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 4,\n   \"metadata\": {\n    \"pycharm\": {\n     \"name\": \"#%%\\n\"\n    }\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Hangar Repo initialized at: /Volumes/Archivio/tensorwerk/hangar/imagenette/.hangar\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'/Volumes/Archivio/tensorwerk/hangar/imagenette/.hangar'\"\n      ]\n     },\n     \"execution_count\": 4,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"repo.init(\\n\",\n    \"    user_name=\\\"Alessia Marcolini\\\", user_email=\\\"alessia@tensorwerk.com\\\", remove_old=True\\n\",\n    \")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 2. Open the Staging Area for Writing\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"A `Repository` can be checked out in two modes: write-enabled and read-only. We need to check out the repo in write mode in order to initialize the columns and write into them.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 5,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"master_checkout = repo.checkout(write=True)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"A checkout allows access to `columns`. The `columns` attribute of a checkout provides the interface to working with all of the data on disk!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : True                \\n\",\n       \"    Number of Columns : 0                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - \"\n      ]\n     },\n     \"execution_count\": 6,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 3. Download and Prepare Some Conventionally Stored Data\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To start playing with Hangar, let's get some data to work on. We'll be using the [Imagenette dataset](https://github.com/fastai/imagenette).\\n\",\n    \"\\n\",\n    \"The following commands will download ~96 MB of data and decompress the tarball containing ~9,200 `.jpeg` images into the folder `data` in the current working directory.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"--2020-04-04 13:25:37--  https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz\\n\",\n      \"Resolving s3.amazonaws.com... 52.216.238.197\\n\",\n      \"Connecting to s3.amazonaws.com|52.216.238.197|:443... connected.\\n\",\n      \"HTTP request sent, awaiting response... 
200 OK\\n\",\n      \"Length: 98948031 (94M) [application/x-tar]\\n\",\n      \"Saving to: ‘data/imagenette2-160.tgz’\\n\",\n      \"\\n\",\n      \"imagenette2-160.tgz 100%[===================>]  94.36M  4.52MB/s    in 22s     \\n\",\n      \"\\n\",\n      \"2020-04-04 13:26:00 (4.31 MB/s) - ‘data/imagenette2-160.tgz’ saved [98948031/98948031]\\n\",\n      \"\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"!wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz -P data\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!tar -xzf data/imagenette2-160.tgz -C data\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 9,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"--2020-04-04 13:26:24--  http://image-net.org/archive/words.txt\\n\",\n      \"Resolving image-net.org... 171.64.68.16\\n\",\n      \"Connecting to image-net.org|171.64.68.16|:80... connected.\\n\",\n      \"HTTP request sent, awaiting response... 200 OK\\n\",\n      \"Length: 2655750 (2.5M) [text/plain]\\n\",\n      \"Saving to: ‘data/imagenette2-160/words.txt’\\n\",\n      \"\\n\",\n      \"words.txt           100%[===================>]   2.53M   884KB/s    in 2.9s    \\n\",\n      \"\\n\",\n      \"2020-04-04 13:26:27 (884 KB/s) - ‘data/imagenette2-160/words.txt’ saved [2655750/2655750]\\n\",\n      \"\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"!wget http://image-net.org/archive/words.txt -P data/imagenette2-160\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### The dataset directory structure on disk is as follows:\\n\",\n    \"\\n\",\n    \"Each subdirectory in the `train` / `val` folders (named starting with `\\\"n0\\\"`) contains a few hundred images which feature objects/elements of a common classification  (tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, parachute, etc.). The image file names follow a convention specific to the ImageNet project, but can be thought of as essentially random (so long as they are unique).\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"imagenette2-160\\n\",\n    \"├── train\\n\",\n    \"│   ├── n01440764\\n\",\n    \"│   ├── n02102040\\n\",\n    \"│   ├── n02979186\\n\",\n    \"│   ├── n03000684\\n\",\n    \"│   ├── n03028079\\n\",\n    \"│   ├── n03394916\\n\",\n    \"│   ├── n03417042\\n\",\n    \"│   ├── n03425413\\n\",\n    \"│   ├── n03445777\\n\",\n    \"│   └── n03888257\\n\",\n    \"└── val\\n\",\n    \"    ├── n01440764\\n\",\n    \"    ├── n02102040\\n\",\n    \"    ├── n02979186\\n\",\n    \"    ├── n03000684\\n\",\n    \"    ├── n03028079\\n\",\n    \"    ├── n03394916\\n\",\n    \"    ├── n03417042\\n\",\n    \"    ├── n03425413\\n\",\n    \"    ├── n03445777\\n\",\n    \"    └── n03888257\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Classification/Label Data\\n\",\n    \"\\n\",\n    \"The labels associated with each image are contained in a seperate `.txt` file, we download the `words.txt` to the directory the images are extracted into.\\n\",\n    \"\\n\",\n    \"Reviewing the contents of this file, we will find a mapping of classification codes (subdirectory names starting with `\\\"n0\\\"`) to human readable descriptions of the contents. 
A small selection of the file is provided below as an illustration.\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"n01635343\\tRhyacotriton, genus Rhyacotriton\\n\",\n    \"n01635480\\tolympic salamander, Rhyacotriton olympicus\\n\",\n    \"n01635659\\tPlethodontidae, family Plethodontidae\\n\",\n    \"n01635964\\tPlethodon, genus Plethodon\\n\",\n    \"n01636127\\tlungless salamander, plethodont\\n\",\n    \"n01636352\\teastern red-backed salamander, Plethodon cinereus\\n\",\n    \"n01636510\\twestern red-backed salamander, Plethodon vehiculum\\n\",\n    \"n01636675\\tDesmograthus, genus Desmograthus\\n\",\n    \"n01636829\\tdusky salamander\\n\",\n    \"n01636984\\tAneides, genus Aneides\\n\",\n    \"n01637112\\tclimbing salamander\\n\",\n    \"n01637338\\tarboreal salamander, Aneides lugubris\\n\",\n    \"n01637478\\tBatrachoseps, genus Batrachoseps\\n\",\n    \"n01637615\\tslender salamander, worm salamander\\n\",\n    \"n01637796\\tHydromantes, genus Hydromantes\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Mapping Classification Codes to Meaningful Descriptors\\n\",\n    \"\\n\",\n    \"We begin by reading each line of this file and creating a dictionary to store the correspondence between each ImageNet synset name and a human-readable label.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 10,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from pathlib import Path\\n\",\n    \"\\n\",\n    \"dataset_dir = Path(\\\"./data/imagenette2-160\\\")\\n\",\n    \"\\n\",\n    \"synset_label = {}\\n\",\n    \"with open(dataset_dir / \\\"words.txt\\\", \\\"r\\\") as f:\\n\",\n    \"    for line in f.readlines():\\n\",\n    \"        synset, label = line.split(\\\"\\\\t\\\")\\n\",\n    \"        synset_label[synset] = label.rstrip()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Read training data (images and labels) from disk and store them in NumPy arrays.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 11,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"from tqdm import tqdm\\n\",\n    \"\\n\",\n    \"import numpy as np\\n\",\n    \"from PIL import Image\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 12,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"100%|██████████| 10/10 [00:31<00:00,  3.12s/it]\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"train_images = []\\n\",\n    \"train_labels = []\\n\",\n    \"\\n\",\n    \"for synset in tqdm(os.listdir(dataset_dir / \\\"train\\\")):\\n\",\n    \"    label = synset_label[synset]\\n\",\n    \"\\n\",\n    \"    for image_filename in os.listdir(dataset_dir / \\\"train\\\" / synset):\\n\",\n    \"        image = Image.open(dataset_dir / \\\"train\\\" / synset / image_filename)\\n\",\n    \"        image = image.resize((163, 160))\\n\",\n    \"        data = np.asarray(image)\\n\",\n    \"\\n\",\n    \"        if len(data.shape) == 2:  # discard B&W images\\n\",\n    \"            continue\\n\",\n    \"\\n\",\n    \"        train_images.append(data)\\n\",\n    \"        train_labels.append(label)\\n\",\n    \"\\n\",\n    \"train_images = np.array(train_images)\"\n   ]\n  },\n  {\n   
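\"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"As a quick sanity check of the mapping and the loop, print one known synset and the collected counts; a minimal sketch (`n01440764` is one of the ten Imagenette synsets, output not shown):\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"print(synset_label[\\\"n01440764\\\"])  # the tench entry\\n\",\n    \"print(len(train_images), len(train_labels))  # the counts must match\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 13,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      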
\"text/plain\": [\n       \"(9296, 160, 163, 3)\"\n      ]\n     },\n     \"execution_count\": 13,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"train_images.shape\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \".. note:: Here we are reading the images from disk and storing them in a big Python list, and then converting it to a NumPy array. Note that it could be impractical for larger datasets. You might want to consider the idea of reading files in batch.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Read validation data (images and labels) from disk and store them in NumPy arrays, same as before.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 14,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"100%|██████████| 10/10 [00:12<00:00,  1.22s/it]\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"val_images = []\\n\",\n    \"val_labels = []\\n\",\n    \"\\n\",\n    \"for synset in tqdm(os.listdir(dataset_dir / \\\"val\\\")):\\n\",\n    \"    label = synset_label[synset]\\n\",\n    \"\\n\",\n    \"    for image_filename in os.listdir(dataset_dir / \\\"val\\\" / synset):\\n\",\n    \"        image = Image.open(dataset_dir / \\\"val\\\" / synset / image_filename)\\n\",\n    \"        image = image.resize((163, 160))\\n\",\n    \"        data = np.asarray(image)\\n\",\n    \"\\n\",\n    \"        if len(data.shape) == 2:  # discard B&W images\\n\",\n    \"            continue\\n\",\n    \"\\n\",\n    \"        val_images.append(data)\\n\",\n    \"        val_labels.append(label)\\n\",\n    \"\\n\",\n    \"val_images = np.array(val_images)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 15,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"(3856, 160, 163, 3)\"\n      ]\n     },\n     \"execution_count\": 15,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"val_images.shape\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 4. Column initialization\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"With checkout write-enabled, we can now initialize a new column of the repository using the method `add_ndarray_column()`.\\n\",\n    \"\\n\",\n    \"All samples within a column have the same data type, and number of dimensions. 
The size of each dimension can be either fixed (the default behavior) or variable per sample.\\n\",\n    \"\\n\",\n    \"You will need to provide a column `name` and a `prototype`, so Hangar can infer the shape of the elements contained in the array.\\n\",\n    \"`train_im_col` will become a column accessor object.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 16,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"train_im_col = master_checkout.add_ndarray_column(\\n\",\n    \"    name=\\\"training_images\\\", prototype=train_images[0]\\n\",\n    \")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Verify we successfully added the new column:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 17,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar Columns                \\n\",\n       \"    Writeable         : True                \\n\",\n       \"    Number of Columns : 1                \\n\",\n       \"    Column Names / Partial Remote References:                \\n\",\n       \"      - training_images / False\"\n      ]\n     },\n     \"execution_count\": 17,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.columns\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Get useful information about the new column simply by inspecting `train_im_col` ...\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 18,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : training_images                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint8                \\n\",\n       \"    Shape                    : (160, 163, 3)                \\n\",\n       \"    Number of Samples        : 0                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 18,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"train_im_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"... or by leveraging the dict-style columns access through the `checkout` object. 
They provide the same information.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 19,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : training_images                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint8                \\n\",\n       \"    Shape                    : (160, 163, 3)                \\n\",\n       \"    Number of Samples        : 0                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 19,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.columns[\\\"training_images\\\"]\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Since Hangar 0.5, it's possible to have a column with string datatype, and we will be using it to store the labels of our dataset.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 20,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"train_lab_col = master_checkout.add_str_column(name=\\\"training_labels\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 21,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : training_labels                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : str                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : variable_shape                \\n\",\n       \"    DType                    : <class 'str'>                \\n\",\n       \"    Shape                    : None                \\n\",\n       \"    Number of Samples        : 0                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 21,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"train_lab_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 5. Adding data\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To add data to a named column, we can use dict-style mode (refer to the `__setitem__`, `__getitem__`, and `__delitem__` methods) or the `update()` method. 
Sample keys can be either `str` or `int` type.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 22,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"train_im_col[0] = train_images[0]\\n\",\n    \"train_lab_col[0] = train_labels[0]\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"As we can see, `Number of Samples` is equal to 1 now.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 23,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : training_labels                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : str                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : variable_shape                \\n\",\n       \"    DType                    : <class 'str'>                \\n\",\n       \"    Shape                    : None                \\n\",\n       \"    Number of Samples        : 1                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 23,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.columns[\\\"training_labels\\\"]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 24,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"data = {1: train_images[1], 2: train_images[2]}\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 25,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"train_im_col.update(data)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 26,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : training_images                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : ndarray                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : fixed_shape                \\n\",\n       \"    DType                    : uint8                \\n\",\n       \"    Shape                    : (160, 163, 3)                \\n\",\n       \"    Number of Samples        : 3                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 26,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"train_im_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Let's add the remaining training images:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 27,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"100%|██████████| 9296/9296 [00:36<00:00, 257.92it/s]\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"with train_im_col:\\n\",\n    \"    for i, img in tqdm(enumerate(train_images), total=train_images.shape[0]):\\n\",\n    \"        if i not in [0, 1, 2]:\\n\",\n    \"            train_im_col[i] = img\"\n   
]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 28,\n   \"metadata\": {\n    \"code_folding\": []\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"100%|██████████| 9296/9296 [00:01<00:00, 5513.23it/s] \\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"with train_lab_col:\\n\",\n    \"    for i, label in tqdm(enumerate(train_labels), total=len(train_labels)):\\n\",\n    \"        if i != 0:\\n\",\n    \"            train_lab_col[i] = label\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 29,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Hangar FlatSampleWriter                 \\n\",\n       \"    Column Name              : training_labels                \\n\",\n       \"    Writeable                : True                \\n\",\n       \"    Column Type              : str                \\n\",\n       \"    Column Layout            : flat                \\n\",\n       \"    Schema Type              : variable_shape                \\n\",\n       \"    DType                    : <class 'str'>                \\n\",\n       \"    Shape                    : None                \\n\",\n       \"    Number of Samples        : 9296                \\n\",\n       \"    Partial Remote Data Refs : False\\n\"\n      ]\n     },\n     \"execution_count\": 29,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"train_lab_col\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Both the `training_images` and `training_labels` columns have 9296 samples. Great!\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \".. note:: To get an overview of the different ways you could add data to a Hangar repository (also from a performance point of view), please refer to the Performance section of the Hangar Tutorial Part 1.\"\n   ]\n  },\n  {\n   
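\"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Before committing, it is worth spot-checking that images and labels are still paired; a minimal sketch with an arbitrary key (output not shown):\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"idx = 100  # any key between 0 and 9295\\n\",\n    \"print(train_lab_col[idx])       # the human-readable label\\n\",\n    \"print(train_im_col[idx].shape)  # expected: (160, 163, 3)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 6. 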
Committing changes\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Once you have made a set of changes you want to commit, simply call the `commit()` method and specify a message.\\n\",\n    \"\\n\",\n    \"The returned value (`a=ecc943c89b9b09e41574c9849f11937828fece28`) is the commit hash of this commit.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 30,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=ecc943c89b9b09e41574c9849f11937828fece28'\"\n      ]\n     },\n     \"execution_count\": 30,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.commit(\\\"Add Imagenette training images and labels\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Let's add the validation data to the repository ...\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 31,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"val_im_col = master_checkout.add_ndarray_column(\\n\",\n    \"    name=\\\"validation_images\\\", prototype=val_images[0]\\n\",\n    \")\\n\",\n    \"val_lab_col = master_checkout.add_str_column(name=\\\"validation_labels\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 32,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"100%|██████████| 3856/3856 [00:08<00:00, 474.25it/s]\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"with val_im_col, val_lab_col:\\n\",\n    \"    for img, label in tqdm(zip(val_images, val_labels), total=len(val_labels)):\\n\",\n    \"        val_im_col[i] = img\\n\",\n    \"        val_lab_col[i] = label\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"... and commit!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 33,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a=e31ef9a06c8d1a4cefeb52c336b2c33d1dca3fba'\"\n      ]\n     },\n     \"execution_count\": 33,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"master_checkout.commit(\\\"Add Imagenette validation images and labels\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To view the **history** of your commits:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 34,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"* a=e31ef9a06c8d1a4cefeb52c336b2c33d1dca3fba (\\u001B[1;31mmaster\\u001B[m) : Add Imagenette validation images and labels\\n\",\n      \"* a=ecc943c89b9b09e41574c9849f11937828fece28 : Add Imagenette training images and labels\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"master_checkout.log()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Do not forget to close the write-enabled checkout!\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 35,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"master_checkout.close()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Let's inspect the repository state! 
This will show disk usage information, the details of the last commit, and all the information about the dataset columns.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 36,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Summary of Contents Contained in Data Repository \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| Repository Info \\n\",\n      \"|----------------- \\n\",\n      \"|  Base Directory: /Volumes/Archivio/tensorwerk/hangar/imagenette \\n\",\n      \"|  Disk Usage: 862.09 MB \\n\",\n      \" \\n\",\n      \"=================== \\n\",\n      \"| Commit Details \\n\",\n      \"------------------- \\n\",\n      \"|  Commit: a=e31ef9a06c8d1a4cefeb52c336b2c33d1dca3fba \\n\",\n      \"|  Created: Sat Apr  4 11:29:12 2020 \\n\",\n      \"|  By: Alessia Marcolini \\n\",\n      \"|  Email: alessia@tensorwerk.com \\n\",\n      \"|  Message: Add Imagenette validation images and labels \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| DataSets \\n\",\n      \"|----------------- \\n\",\n      \"|  Number of Named Columns: 4 \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"training_images\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 9296 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: ndarray \\n\",\n      \"|    - schema_type: fixed_shape \\n\",\n      \"|    - shape: (160, 163, 3) \\n\",\n      \"|    - dtype: uint8 \\n\",\n      \"|    - backend: 01 \\n\",\n      \"|    - backend_options: {'complib': 'blosc:lz4hc', 'complevel': 5, 'shuffle': 'byte'} \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"training_labels\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 9296 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: str \\n\",\n      \"|    - schema_type: variable_shape \\n\",\n      \"|    - dtype: <class 'str'> \\n\",\n      \"|    - backend: 30 \\n\",\n      \"|    - backend_options: {} \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"validation_images\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 3856 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: ndarray \\n\",\n      \"|    - schema_type: fixed_shape \\n\",\n      \"|    - shape: (160, 163, 3) \\n\",\n      \"|    - dtype: uint8 \\n\",\n      \"|    - backend: 01 \\n\",\n      \"|    - backend_options: {'complib': 'blosc:lz4hc', 'complevel': 5, 'shuffle': 'byte'} \\n\",\n      \"|\\n\",\n      \"|  * Column Name: ColumnSchemaKey(column=\\\"validation_labels\\\", layout=\\\"flat\\\") \\n\",\n      \"|    Num Data Pieces: 3856 \\n\",\n      \"|    Details: \\n\",\n      \"|    - column_layout: flat \\n\",\n      \"|    - column_type: str \\n\",\n      \"|    - schema_type: variable_shape \\n\",\n      \"|    - dtype: <class 'str'> \\n\",\n      \"|    - backend: 30 \\n\",\n      \"|    - backend_options: {} \\n\",\n      \" \\n\",\n      \"================== \\n\",\n      \"| Metadata: \\n\",\n      \"|----------------- \\n\",\n      \"|  Number of Keys: 0 \\n\",\n      \"\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"repo.summary()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Great! 
You've made it to the end of the \\\"Real World\\\" Quick Start Tutorial!! 👏🏼\\n\",\n    \"\\n\",\n    \"Please check out the other tutorials for more advanced stuff such as branching & merging, conflict resolution and data loaders for TensorFlow and PyTorch!\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"hide_input\": false,\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.2\"\n  },\n  \"varInspector\": {\n   \"cols\": {\n    \"lenName\": 16,\n    \"lenType\": 16,\n    \"lenVar\": 40\n   },\n   \"kernels_config\": {\n    \"python\": {\n     \"delete_cmd_postfix\": \"\",\n     \"delete_cmd_prefix\": \"del \",\n     \"library\": \"var_list.py\",\n     \"varRefreshCmd\": \"print(var_dic_list())\"\n    },\n    \"r\": {\n     \"delete_cmd_postfix\": \") \",\n     \"delete_cmd_prefix\": \"rm(\",\n     \"library\": \"var_list.r\",\n     \"varRefreshCmd\": \"cat(var_dic_list()) \"\n    }\n   },\n   \"types_to_exclude\": [\n    \"module\",\n    \"function\",\n    \"builtin_function_or_method\",\n    \"instance\",\n    \"_Feature\"\n   ],\n   \"window_display\": false\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}"
  },
  {
    "path": "docs/api.rst",
    "content": ".. _ref-api:\n\n==========\nPython API\n==========\n\nThis is the python API for the Hangar project.\n\n\nRepository\n==========\n\n.. automodule:: hangar.repository\n   :members:\n\nRemotes\n=======\n\n.. autoclass:: Remotes()\n   :members:\n   :exclude-members: __init__\n\n\nWrite Enabled Checkout\n======================\n\nCheckout\n--------\n\n.. autoclass:: hangar.checkout.WriterCheckout()\n   :members:\n   :inherited-members:\n   :special-members: __getitem__, __setitem__, __len__, __contains__, __iter__\n   :exclude-members: __init__\n\nColumns\n-------\n\n.. autoclass:: hangar.columns.column.Columns()\n   :members:\n   :special-members: __getitem__, __setitem__, __delitem__, __contains__, __len__, __iter__\n   :exclude-members: __init__\n\nFlat Column Layout Container\n----------------------------\n\n.. autoclass:: hangar.columns.layout_flat.FlatSampleWriter()\n   :members:\n   :inherited-members:\n   :special-members: __getitem__, __setitem__, __delitem__, __contains__, __len__, __iter__\n   :exclude-members: __init__\n\nNested Column Layout Container\n------------------------------\n\n.. autoclass:: hangar.columns.layout_nested.NestedSampleWriter()\n   :members:\n   :inherited-members:\n   :special-members: __getitem__, __setitem__, __delitem__, __contains__, __len__, __iter__\n   :exclude-members: __init__\n\n.. autoclass:: hangar.columns.layout_nested.FlatSubsampleWriter()\n   :members:\n   :inherited-members:\n   :special-members: __getitem__, __setitem__, __delitem__, __contains__, __len__, __iter__\n   :exclude-members: __init__\n\nDiffer\n------\n\n.. autoclass:: hangar.diff.WriterUserDiff()\n   :members:\n   :exclude-members: __init__\n\nBulk Importer\n-------------\n\n.. automodule:: hangar.bulk_importer\n   :members:\n\n\nRead Only Checkout\n==================\n\nCheckout\n--------\n\n.. autoclass:: hangar.checkout.ReaderCheckout()\n   :members:\n   :inherited-members:\n   :special-members: __getitem__, __len__, __contains__, __iter__\n   :exclude-members: __init__\n\n\nFlat Column Layout Container\n----------------------------\n\n.. autoclass:: hangar.columns.layout_flat.FlatSampleReader()\n   :members:\n   :inherited-members:\n   :special-members: __getitem__, __setitem__, __contains__, __len__, __iter__\n   :exclude-members: __init__\n\n\nNested Column Layout Container\n------------------------------\n\n.. autoclass:: hangar.columns.layout_nested.NestedSampleReader()\n   :members:\n   :inherited-members:\n   :special-members: __getitem__, __contains__, __len__, __iter__\n   :exclude-members: __init__\n\n.. autoclass:: hangar.columns.layout_nested.FlatSubsampleReader()\n   :members:\n   :inherited-members:\n   :special-members: __getitem__,, __contains__, __len__, __iter__\n   :exclude-members: __init__\n\n\nDiffer\n------\n\n.. autoclass:: hangar.diff.ReaderUserDiff()\n   :members:\n   :exclude-members: __init__\n\n\nML Framework Dataloaders\n========================\n\nTensorflow\n----------\n\n.. autofunction:: hangar.dataset.make_tensorflow_dataset\n\nPytorch\n-------\n\n.. autofunction:: hangar.dataset.make_torch_dataset\n\nNumpy\n-----\n\n.. autofunction:: hangar.dataset.make_numpy_dataset\n"
  },
  {
    "path": "docs/authors.rst",
    "content": ".. include:: ../AUTHORS.rst\n"
  },
  {
    "path": "docs/backends/hdf5_00.rst",
    "content": "Local HDF5 Backend\n==================\n\n.. automodule:: hangar.backends.hdf5_00\n"
  },
  {
    "path": "docs/backends/hdf5_01.rst",
    "content": "Fixed Shape Optimized Local HDF5\n================================\n\n.. automodule:: hangar.backends.hdf5_01"
  },
  {
    "path": "docs/backends/lmdb_30.rst",
    "content": "Variable Shape LMDB String Data Store\n=====================================\n\n.. automodule:: hangar.backends.lmdb_30\n"
  },
  {
    "path": "docs/backends/numpy_10.rst",
    "content": "Local NP Memmap Backend\n=======================\n\n.. automodule:: hangar.backends.numpy_10\n"
  },
  {
    "path": "docs/backends/remote_50.rst",
    "content": "Remote Server Unknown Backend\n=============================\n\n.. automodule:: hangar.backends.remote_50\n"
  },
  {
    "path": "docs/backends.rst",
    "content": ".. _ref-backends:\n\n.. note::\n\n   The following documentation contains highly technical descriptions of the\n   data writing and loading backends of the Hangar core. It is intended for\n   developer use only, with the functionality described herein being completely\n   hidden from regular users.\n\n   Any questions or comments can be directed to the `Hangar Github Issues Page\n   <https://github.com/tensorwerk/hangar-py/issues>`_\n\n=================\nBackend selection\n=================\n\n.. automodule:: hangar.backends.__init__\n\n\nBackend Specifications\n======================\n\n.. toctree::\n   :maxdepth: 2\n   :titlesonly:\n\n   ./backends/hdf5_00\n   ./backends/hdf5_01\n   ./backends/numpy_10\n   ./backends/lmdb_30\n   ./backends/remote_50\n"
  },
  {
    "path": "docs/benchmarking.rst",
    "content": ".. include:: ../asv_bench/README.rst"
  },
  {
    "path": "docs/changelog.rst",
    "content": ".. include:: ../CHANGELOG.rst\n"
  },
  {
    "path": "docs/cli.rst",
    "content": "Hangar CLI Documentation\n========================\n\nThe CLI described below is automatically available after the Hangar Python\npackage has been installed (either through a package manager or via source\nbuilds). In general, the commands require the terminals ``cwd`` to be at the\nsame level the repository was initially created in.\n\nSimply start by typing ``$ hangar --help`` in your terminal to get started!\n\n.. click:: hangar.cli:main\n   :prog: hangar\n   :show-nested:\n"
  },
  {
    "path": "docs/codeofconduct.rst",
    "content": ".. _ref-code-of-conduct:\n\n.. include:: ../CODE_OF_CONDUCT.rst"
  },
  {
    "path": "docs/concepts.rst",
    "content": ".. _ref-concepts:\n\n####################\nHangar Core Concepts\n####################\n\n.. warning::\n\n  The usage info displayed in the ``latest`` build of the project\n  documentation do not reflect recent changes to the API and internal\n  structure of the project. They should not be relied on at the current\n  moment; they will be updated over the next weeks, and will be in line before\n  the next release.\n\nThis document provides a high level overview of the problems Hangar is designed\nto solve and introduces the core concepts for beginning to use Hangar.\n\n***************\nWhat Is Hangar?\n***************\n\nAt its core Hangar is designed to solve many of the same problems faced by\ntraditional code version control system (ie. ``Git``), just adapted for\nnumerical data:\n\n* Time travel through the historical evolution of a dataset\n* Zero-cost Branching to enable exploratory analysis and collaboration\n* Cheap Merging to build datasets over time (with multiple collaborators)\n* Completely abstracted organization and management of data files on disk\n* Ability to only retrieve a small portion of the data (as needed) while still\n  maintaining complete historical record\n* Ability to push and pull changes directly to collaborators or a central\n  server (ie. a truly distributed version control system)\n\nThe ability of version control systems to perform these tasks for codebases is\nlargely taken for granted by almost every developer today; however, we are\nin-fact standing on the shoulders of giants, with decades of engineering which\nhas resulted in these phenomenally useful tools. Now that a new era of\n\"Data-Defined software\" is taking hold, we find there is a strong need for\nanalogous version control systems which are designed to handle numerical data\nat large scale... Welcome to Hangar!\n\n***********\nInspiration\n***********\n\nThe design of Hangar was heavily influenced by the `Git <https://git-scm.org>`_\nsource-code version control system. As a Hangar user, many of the fundamental\nbuilding blocks and commands can be thought of as interchangeable:\n\n* checkout\n* commit\n* branch\n* merge\n* diff\n* push\n* pull/fetch\n* log\n\nEmulating the high level the git syntax has allowed us to create a user\nexperience which should be familiar in many ways to Hangar users; a goal of the\nproject is to enable many of the same VCS workflows developers use for code\nwhile working with their data!\n\nThere are, however, many fundamental differences in how humans/programs\ninterpret and use text in source files vs. 
numerical data which raise many\nquestions Hangar needs to uniquely solve:\n\n* How do we connect some piece of \"Data\" with a meaning in the real world?\n* How do we diff and merge large collections of data samples?\n* How can we resolve conflicts?\n* How do we make data access (reading and writing) convenient for both\n  user-driven exploratory analyses and high performance production systems\n  operating without supervision?\n* How can we enable people to work on huge datasets in a local (laptop grade)\n  development environment?\n\nWe will show how Hangar solves these questions in a high-level guide below.\nFor a deep dive into the Hangar internals, we invite you to check out the\n:ref:`ref-hangar-under-the-hood` page.\n\n****************************\nHow Hangar Thinks About Data\n****************************\n\nAbstraction 0: What is a Repository?\n====================================\n\nA \"Repository\" consists of an historically ordered mapping of \"Commits\" over\ntime by various \"Committers\" across any number of \"Branches\". Though there are\nmany conceptual similarities in what a Git repo and a Hangar Repository\nachieve, Hangar is designed with the express purpose of dealing with numeric\ndata. As such, when you read/write to/from a Repository, the main way of\ninteraction with information will be through (an arbitrary number of) Columns\nin each Commit. A simple key/value store is also included to store metadata,\nbut as it is a minor point it will largely be ignored for the rest of this\npost.\n\nHistory exists at the Repository level, Information exists at the Commit level.\n\nAbstraction 1: What is a Dataset?\n=================================\n\nLet's get philosophical and talk about what a \"Dataset\" is. The word \"Dataset\"\ninvokes some meaning for humans; a dataset may have a canonical name (like\n\"MNIST\" or \"COCO\"), it will have a source where it comes from, (ideally) it has\na purpose for some real-world task, it will have people who build, aggregate,\nand nurture it, and most importantly a Dataset always contains pieces of some\ntype of information which describe \"something\".\n\nIt's an abstract definition, but it is only us, the humans behind the machine,\nwho associate \"Data\" with some meaning in the real world; it is in the same\nvein that we associate a group of Data in a \"Dataset\" with some real world\nmeaning.\n\nOur first abstraction is therefore the \"Dataset\": a collection of (potentially\ngroups of) data pieces observing a common form among instances which act to\ndescribe something meaningful. *To describe some phenomenon, a dataset may\nrequire multiple pieces of information, each of a particular format, for each\ninstance/sample recorded in the dataset.*\n\n   **For Example**\n\n   A hospital will typically have a *Dataset* containing all of the CT scans\n   performed over some period of time. A single CT scan is an instance, a\n   single sample; however, once many are grouped together they form a\n   *Dataset*. 
To expand on this simple view we realize that each CT scan\n   consists of hundreds of pieces of information:\n\n      * Some large ``numeric array`` (the image data).\n      * Some smaller ``numeric tuples`` (describing image spacing, dimension\n        scale, capture time, machine parameters, etc).\n      * Many pieces of ``string`` data (the patient name, doctor name, scan\n        type, results found, etc).\n\nWhen thinking about the group of CT scans in aggregate, we realize that\nthough a single scan contains many disparate pieces of information stuck\ntogether, when thinking about the aggregation of every scan in the group,\nmost (if not all) of the same information fields are duplicated within\neach sample.\n\n*A single scan is a bunch of disparate information stuck together, many of\nthose put together make a Dataset, but looking down from the top, we identify\npatterns of common fields across all items. We call these groupings of similar\ntyped information:* **Columns**.\n\nAbstraction 2: What Makes up a Column?\n======================================\n\nA ``Dataset`` is made of one or more ``Columns`` (and optionally some\n``Metadata``), with each item placed in some ``Column`` belonging to and\nmaking up an individual ``Sample``. It is important to remember that all data\nneeded to fully describe a single ``sample`` in a ``Dataset`` may consist of\ninformation spread across any number of ``Columns``. To define a ``Column``\nin Hangar, we only need to provide:\n\n* a name\n* a type\n* a shape\n\nThe individual pieces of information (``Data``) which fully describe some\nphenomenon via aggregate mapping access across any number of \"Columns\" are\nboth individually and collectively referred to as ``Samples`` in the Hangar\nvernacular. According to the specification above, all samples contained in a\n``Column`` must be numeric arrays with each having:\n\n1) Same data type (standard ``numpy`` data types are supported).\n2) A shape with each dimension size <= the shape (``max shape``) set in the\n   ``column`` specification (more on this later).\n\nAdditionally, samples in a ``column`` can either be named, or unnamed\n(depending on how you interpret what the information contained in the\n``column`` actually represents).\n\nEffective use of Hangar relies on having an understanding of what exactly a\n``\"Sample\"`` is in a particular ``Column``. The most effective way to find\nout is to ask: \"What is the smallest piece of data which has a useful meaning\nto 'me' (or 'my' downstream processes)?\" In the MNIST ``column``, this would\nbe a single digit image (a 28x28 array); for a medical ``column`` it might be\nan entire (512x320x320) MRI volume scan for a particular patient; while for the\nNASDAQ Stock Ticker it might be an hour's worth of price data points (or less,\nor more!). The point is that **when you think about what a ``sample`` is, it\nshould typically be the smallest atomic unit of useful information.**\n\nAbstraction 3: What is Data?\n============================\n\nFrom this point forward, **when we talk about \"Data\" we are actually talking\nabout n-dimensional arrays of numeric information. To Hangar, \"Data\" is just a\ncollection of numbers being passed into and out of it.** Data does not have a\nfile type, it does not have a file-extension, it does not mean anything to\nHangar itself - it is just numbers. 
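To make this concrete, here is a minimal sketch (reusing the column-creation calls from the quick-start tutorial; the repository path and key name are illustrative) showing that the array you put in is exactly the array you get back:\n\n.. code:: python\n\n   >>> import numpy as np\n   >>> from hangar import Repository\n\n   >>> repo = Repository(path='/foo/bar/path/')\n   >>> co = repo.checkout(write=True)\n   >>> col = co.add_ndarray_column(name='scans', prototype=np.zeros((28, 28), dtype=np.uint8))\n   >>> arr = np.random.randint(0, 255, size=(28, 28), dtype=np.uint8)\n   >>> col['sample_1'] = arr                 # data in: just numbers\n   >>> np.array_equal(col['sample_1'], arr)  # data out: the same numbers\n   True\n   >>> co.commit('add one scan')\n   >>> co.close()\n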
This theory of \"Data\" is nearly as simple\nas it gets, and this simplicity is what enables us to be unconstrained as we\nbuild abstractions and utilities to operate on it.\n\nSummary\n=======\n\n.. code-block:: text\n\n   A Dataset is thought of as containing Samples, but is actually defined by\n   Columns, which store parts of fully defined Samples in structures common\n   across the full aggregation of Dataset Samples.\n\n   This can essentially be represented as a key -> tensor mapping, which can\n   (optionally) be Sparse depending on usage patterns\n\n                         Dataset\n                            |\n         -----------------------------------------\n         |            |            |             |\n      Column 1     Column 2     Column 3      Column 4\n         |            |            |             |\n   ------------------------------------------------------\n       image    |  filename  |   label    |  annotation |\n   ------------------------------------------------------\n         S1     |     S1     |            |      S1     |\n         S2     |     S2     |     S2     |      S2     |\n         S3     |     S3     |     S3     |             |\n         S4     |     S4     |            |             |\n\n   More techincally, a Dataset is just a view over the columns that gives you\n   sample tuples based on the cross product of keys and columns. Hangar doesn't\n   store or track the data set, just the underlying columns.\n\n    S1 = (image[S1], filename[S1], annotation[S1])\n    S2 = (image[S2], filename[S2], label[S2], annotation[S2])\n    S3 = (image[S3], filename[S3], label[S3])\n    S4 = (image[S4], filename[S4])\n\n\n.. note::\n\n   The technical crowd among the readers should note:\n\n      * Hangar preserves all sample data bit-exactly.\n      * Dense arrays are fully supported, Sparse array support is currently\n        under development and will be released soon.\n      * Integrity checks are built in by default (explained in more detail in\n        :ref:`ref-hangar-under-the-hood`.) using cryptographically secure\n        algorithms.\n      * Hangar is very much a young project, until penetration tests and\n        security reviews are performed, we will refrain from stating that Hangar\n        is fully \"cryptographically secure\". Security experts are welcome to\n        contact us privately at `hangar.info@tensorwerk.com\n        <hangar.info@tensorwerk.com>`__ to disclose any security issues.\n\n\n******************************************\nImplications of the Hangar Data Philosophy\n******************************************\n\nThe Domain-Specific File Format Problem\n=======================================\n\nThough it may seem counterintuitive at first, there is an incredible\namount of freedom (and power) that is gained when \"you\" (the user) start to\ndecouple some information container from the data which it actually holds. At\nthe end of the day, the algorithms and systems you use to produce insight from\ndata are just mathematical operations; math does not operate on a specific file\ntype, math operates on numbers.\n\nHuman & Computational Cost\n--------------------------\n\nIt seems strange that organizations & projects commonly rely on storing data on\ndisk in some domain-specific - or custom built - binary format (ie. 
a ``.jpg``\nimage, ``.nii`` neuroimaging informatics study, ``.csv`` tabular data, etc.),\nand just deal with the hassle of maintaining all the infrastructure around\nreading, writing, transforming, and preprocessing these files into useable\nnumerical data every time they want to interact with their Columns. Even\ndisregarding the computational cost/overhead of preprocessing & transforming\nthe data on every read/write, these schemes require significant amounts of\nhuman capital (developer time) to be spent on building, testing, and\nupkeep/maintenance; all while adding significant complexity for users. Oh, and\nthey also have a strangely high inclination to degenerate into horrible\ncomplexity which essentially becomes \"magic\" after the original creators move\non.\n\nThe Hangar system is quite different in this regard. First, **we trust that\nyou know what your data is and what it should be best represented as**. When\nwriting to a Hangar repository, you process the data into n-dimensional arrays\nonce. Then when you retrieve it you are provided with the same array, in the\nsame shape and datatype (unless you ask for a particular subarray-slice),\nalready initialized in memory and ready to compute on instantly.\n\nHigh Performance From Simplicity\n--------------------------------\n\nBecause Hangar is designed to deal (almost exclusively) with numerical arrays,\nwe are able to \"stand on the shoulders of giants\" once again by utilizing many\nof the well-tested, highly optimized, and community-validated numerical\narray data management utilities developed by the High Performance Computing\ncommunity over the past few decades.\n\nIn a sense, the backend of Hangar serves two functions:\n\n1) Bookkeeping: recording information about columns, samples, commits,\n   etc.\n2) Data Storage: highly optimized interfaces which store and retrieve data\n   from disk through its backend utility.\n\nThe details are explained much more thoroughly in\n:ref:`ref-hangar-under-the-hood`.\n\nBecause Hangar only considers data to be numbers, the choice of backend to\nstore data is (in a sense) completely arbitrary so long as ``Data In == Data\nOut``. **This fact has massive implications for the system**; instead of being\ntied to a single backend (each of which will have significant performance\ntradeoffs for arrays of particular datatypes, shapes, and access patterns), we\nsimultaneously store different data pieces in the backend which is most suited\nto it. A great deal of care has been taken to optimize parameters in the\nbackend interface which affect performance and compression of data samples.\n\nThe choice of backend to store a piece of data is selected automatically from\nheuristics based on the column specification, system details, and context of\nthe storage service internal to Hangar. **As a user, this is completely\ntransparent to you** in all steps of interacting with the repository. 
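For instance, reusing the column-creation call from the tutorials (a sketch; the checkout variable and names are illustrative), notice that no backend is ever named - only a name and a prototype describing the dtype/shape are supplied:\n\n.. code:: python\n\n   >>> import numpy as np\n   >>> co = repo.checkout(write=True)\n   >>> # only a name and a prototype are supplied; the storage backend\n   >>> # (e.g. '01' for fixed-shape HDF5, '30' for LMDB string data) is\n   >>> # chosen behind the scenes and only surfaces in introspection\n   >>> # output such as the 'backend: 01' lines printed by repo.summary()\n   >>> col = co.add_ndarray_column(\n   ...     name='images', prototype=np.zeros((160, 163, 3), dtype=np.uint8)\n   ... )\n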
It does\nnot require (or even accept) user-specified configuration.\n\nAt the time of writing, Hangar has the following backends implemented (with\nplans to potentially support more as needs arise):\n\n1) `HDF5 <https://www.hdfgroup.org/solutions/hdf5/>`_\n2) `Memmapped Arrays <https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html>`_\n3) `TileDb <https://tiledb.io/>`_ (in development)\n\n\nOpen Source Software Style Collaboration in Dataset Curation\n=============================================================\n\nSpecialized Domain Knowledge Is a Scarce Resource\n-------------------------------------------------\n\nA common side effect of `The Domain-Specific File Format Problem`_ is that\nanyone who wants to work with an organization's/project's data needs to not\nonly have some domain expertise (so they can do useful things with the data),\nbut they also need to have a non-trivial understanding of the project's\ndataset, file format, and access conventions / transformation pipelines. *In a\nworld where highly specialized talent is already scarce, this phenomenon\nshrinks the pool of available collaborators dramatically.*\n\nGiven this situation, it's understandable why when most organizations spend\nmassive amounts of money and time to build a team, collect & annotate data, and\nbuild an infrastructure around that information, they hold it for their private\nuse with little regard for how the world could use it together. Businesses\nrely on proprietary information to stay ahead of their competitors, and because\nthis information is so difficult (and expensive) to generate, it's completely\nreasonable that they should be the ones to benefit from all that work.\n\n    **A Thought Experiment**\n\n    Imagine that ``Git`` and ``GitHub`` didn't take over the world. Imagine\n    that the ``Diff`` and ``Patch`` Unix tools never existed. Instead, imagine\n    we were to live in a world where every software project had very different\n    version control systems (largely homemade by non-VCS experts, and not\n    validated by a community over many years of use). Even worse, most of these\n    tools don't allow users to easily branch, make changes, and automatically\n    merge them back. It shouldn't be difficult to imagine how dramatically such\n    a world would contrast with ours today. Open source software as we know it\n    would hardly exist, and any efforts would probably be massively fragmented\n    across the web (if there would even be a 'web' that we would recognize in\n    this strange world).\n\n    Without a way to collaborate in the open, open source software would\n    largely not exist, and we would all be worse off for it.\n\n    Doesn't this hypothetical sound quite a bit like the state of open source\n    data collaboration in today's world?\n\nThe impetus for developing a tool like Hangar is the belief that *if it is\nsimple for anyone with domain knowledge to collaboratively curate columns\ncontaining information they care about, then they will.* Open source software\ndevelopment benefits everyone; we believe open source column curation can do\nthe same.\n\nHow To Overcome The \"Size\" Problem\n----------------------------------\n\nEven if the greatest tool imaginable existed to version, branch, and merge\ncolumns, it would face one massive problem which, if left unsolved, would\nkill the project: *The size of data can very easily exceed what can fit on\n(most) contributors' laptops or personal workstations*. 
This section explains\nhow Hangar can handle working with columns which are prohibitively large to\ndownload or store on a single machine.\n\nAs mentioned in `High Performance From Simplicity`_, under the hood Hangar\ndeals with \"Data\" and \"Bookkeeping\" completely separately. We've previously\ncovered what exactly we mean by Data in `How Hangar Thinks About Data`_, so\nwe'll briefly cover the second major component of Hangar here. In short,\n\"Bookkeeping\" describes everything about the repository. By everything, we do\nmean that the Bookkeeping records describe everything: all commits, parents,\nbranches, columns, samples, data descriptors, schemas, commit messages, etc.\nThough complete, these records are fairly small (tens of MB in size for\ndecently sized repositories with decent history), and are highly compressed for\nfast transfer between a Hangar client/server.\n\n    **A brief technical interlude**\n\n    There is one very important (and rather complex) property which gives\n    Hangar Bookkeeping massive power: **Existence of some data piece is always\n    known to Hangar and stored immutably once committed. However, the access\n    pattern, backend, and locating information for this data piece may (and\n    over time, will) be unique in every hangar repository instance**.\n\n    Though the details of how this works are well beyond the scope of this\n    document, the following example may provide some insight into the\n    implications of this property:\n\n        If you ``clone`` some hangar repository, Bookkeeping says that \"some\n        number of data pieces exist\" and they should be retrieved from the server.\n        However, the bookkeeping records transferred in a ``fetch`` / ``push`` /\n        ``clone`` operation do not include information about where that piece\n        of data existed on the client (or server) computer. Two synced\n        repositories can use completely different backends to store the data, in\n        completely different locations, and it does not matter - Hangar only\n        guarantees that when collaborators ask for a data sample in some\n        checkout, they will be provided with identical arrays, not that\n        they will come from the same place or be stored in the same way. Only\n        when data is actually retrieved is the \"locating information\" set for\n        that repository instance.\n\nBecause Hangar makes no assumptions about how/where it should retrieve some\npiece of data, or even an assumption that it exists on the local machine, and\nbecause records are small and completely describe history, once a machine has\nthe Bookkeeping, it can decide what data it actually wants to materialize on\nits local disk! These ``partial fetch`` / ``partial clone`` operations can\nmaterialize any desired data, whether it be for a few records at the head\nbranch, for all data in a commit, or for the entire historical data. 
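In rough strokes, such a partial workflow could look like the following sketch (the ``Remotes`` method names follow the API referenced in :ref:`ref-api`; treat the exact signatures and the server address as illustrative assumptions):\n\n.. code:: python\n\n   >>> from hangar import Repository\n\n   >>> repo = Repository(path='/foo/bar/path/')\n   >>> repo.remote.add('origin', 'localhost:50051')\n   >>> # transfer only the (small) bookkeeping records ...\n   >>> repo.remote.fetch('origin', branch='master')\n   >>> # ... then materialize just the data pieces we actually want\n   >>> repo.remote.fetch_data('origin', branch='master')\n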
A future\nrelease will even include the ability to stream data directly to a Hangar\ncheckout and materialize the data in memory without having to save it to disk\nat all!\n\nMore importantly: **Since Bookkeeping describes all history, merging can be\nperformed between branches which may contain partial (or even no) actual\ndata.** Aka **you don't need data on disk to merge changes into it.** It's an odd\nconcept which will be explained more in depth in the future.\n\n.. note::\n\n   To try this out for yourself, please refer to the API Docs\n   (:ref:`ref-api`) on working with Remotes, especially the ``fetch()`` and\n   ``fetch_data()`` methods. Otherwise look through our tutorials &\n   examples for more practical info!\n\nWhat Does it Mean to \"Merge\" Data?\n----------------------------------\n\nWe'll start this section, once again, with a comparison to source code version\ncontrol systems. When dealing with source code text, merging is performed in\norder to take a set of changes made to a document, and logically insert the\nchanges into some other version of the document. The goal is to generate a new\nversion of the document with all changes made to it in a fashion which conforms\nto the \"change author's\" intentions. Simply put: the new version is valid and\nwhat is expected by the authors.\n\nThis concept of what it means to merge text does not generally map well to\nchanges made in a column. We'll explore why through this section, but look\nback to the philosophy of Data outlined in `How Hangar Thinks About Data`_ for\ninspiration as we begin. Remember, in the Hangar design a Sample is the\nsmallest array which contains useful information. As any smaller selection of\nthe sample array is meaningless, Hangar does not support subarray-slicing or\nper-index updates *when writing* data. (Subarray-slice queries are permitted\nfor read operations, though regular use is discouraged and may indicate that\nyour samples are larger than they should be.)\n\nDiffing Hangar Checkouts\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo understand merge logic, we first need to understand diffing, and the classes\nof operations which can occur.\n\n:Addition:\n\n    An operation which creates a column, sample, or some metadata which\n    did not previously exist in the relevant branch history.\n\n:Removal:\n\n    An operation which removes some column, a sample, or some metadata which\n    existed in the parent of the commit under consideration. (Note: removing a\n    column also removes all samples contained in it).\n\n:Mutation:\n\n    An operation which sets the data of a sample, the value of some metadata\n    key, or a column schema to a different value than what it had previously\n    been created with (Note: a column schema mutation is observed when a column\n    is removed, and a new column with the same name is created with a\n    different dtype/shape, all in the same commit).\n\nMerging Changes\n^^^^^^^^^^^^^^^\n\nMerging diffs solely consisting of additions and removals between branches is\ntrivial, and performs exactly as one would expect from a text diff. Where\nthings diverge from text is when we consider how we will merge diffs containing\nmutations.\n\nSay we have some sample in commit A; a branch is created, the sample is\nupdated, and commit C is created. 
At the same time, someone else checks out a\nbranch whose HEAD is at commit A, and commits a change to the sample as well\n(producing commit B).\nIf these changes are identical, they are compatible, but what if they are not?\nIn the following example, we diff and merge each element of the sample array\nlike we would text:\n\n::\n\n                                                   Merge ??\n      commit A          commit B            Does combining mean anything?\n\n    [[0, 1, 2],        [[0, 1, 2],               [[1, 1, 1],\n     [0, 1, 2], ----->  [2, 2, 2], ------------>  [2, 2, 2],\n     [0, 1, 2]]         [3, 3, 3]]      /         [3, 3, 3]]\n          \\                            /\n           \\            commit C      /\n            \\                        /\n             \\          [[1, 1, 1], /\n              ------->   [0, 1, 2],\n                         [0, 1, 2]]\n\nWe see that a result can be generated, and we can agree that if this were a\npiece of text, the result would be correct. Don't be fooled; this is an\nabomination and utterly wrong/meaningless. Remember, we said earlier ``\"the\nresult of a merge should conform to the intentions of each author\"``. This\nmerge result conforms to neither author's intention. The value of an array\nelement is not isolated; every value affects how the entire sample is\nunderstood. The values at commit B or commit C may be fine on their own, but\nif a sample is mutated independently in two branches with non-identical\nupdates, it is a conflict that needs to be handled by the authors.\n\nThis is the actual behavior of Hangar.\n\n::\n\n      commit A          commit B\n\n    [[0, 1, 2],        [[0, 1, 2],\n     [0, 1, 2], ----->  [2, 2, 2], ----- MERGE CONFLICT\n     [0, 1, 2]]         [3, 3, 3]]      /\n          \\                            /\n           \\            commit C      /\n            \\                        /\n             \\          [[1, 1, 1], /\n              ------->   [0, 1, 2],\n                         [0, 1, 2]]\n\nWhen a conflict is detected, the merge author must either pick a sample from\none of the commits or make changes in one of the branches such that the\nconflicting sample values are resolved.\n\nHow Are Conflicts Detected?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAny merge conflicts can be identified and addressed ahead of running a\n``merge`` command by using the built-in ``diff`` tools. When diffing commits,\nHangar will provide a list of conflicts which it identifies. In general these\nfall into 4 categories:\n\n1) **Additions** in both branches which created new keys (samples / columns /\n   metadata) with non-compatible values. For samples & metadata, the hash of\n   the data is compared; for columns, the schema specification is checked for\n   compatibility in a method custom to the internal workings of Hangar.\n2) **Removal** in ``Master Commit / Branch`` **& Mutation** in ``Dev Commit /\n   Branch``. Applies for samples, columns, and metadata identically.\n3) **Mutation** in ``Dev Commit / Branch`` **& Removal** in ``Master Commit /\n   Branch``. Applies for samples, columns, and metadata identically.\n4) **Mutations** on keys in both branches to non-compatible values. 
For samples &\n   metadata, the hash of the data is compared, for columns, the schema\n   specification is checked for compatibility in a method custom to the\n   internal workings of Hangar.\n\n************\nWhat's Next?\n************\n\n* Get started using Hangar today: :ref:`ref_installation`.\n* Read the tutorials: :ref:`ref-tutorial`.\n* Dive into the details: :ref:`ref-hangar-under-the-hood`.\n"
  },
  {
    "path": "docs/conf.py",
    "content": "# -*- coding: utf-8 -*-\nfrom __future__ import unicode_literals\n\nimport os\n\n\nextensions = [\n    'sphinx.ext.autodoc',\n    'sphinx.ext.autosummary',\n    'sphinx.ext.coverage',\n    'sphinx.ext.doctest',\n    'sphinx.ext.extlinks',\n    'sphinx.ext.ifconfig',\n    'sphinx.ext.napoleon',\n    'sphinx.ext.todo',\n    'sphinx.ext.intersphinx',\n    'sphinx_click.ext',\n    'nbsphinx',\n    'sphinx_copybutton',\n    'sphinx.ext.mathjax',\n    'recommonmark',\n    'IPython.sphinxext.ipython_console_highlighting',\n]\n\n\nif os.getenv('SPELLCHECK'):\n    extensions += 'sphinxcontrib.spelling',\n    spelling_show_suggestions = True\n    spelling_lang = 'en_US'\n\n# to exclude traditional Python prompts from your copied code\ncopybutton_prompt_text = \">>> \"\n# All lines of the code blocks will be copied after the prompts are stripped.\ncopybutton_only_copy_prompt_lines = False\n\n\nnbsphinx_execute = 'never'\n\nautodoc_mock_imports = ['torch', 'tensorflow']\nautosummary_generate = True\n\nsource_suffix = {\n    '.rst': 'restructuredtext',\n    '.txt': 'markdown',\n    '.md': 'markdown',\n}\nmaster_doc = 'index'\nproject = 'Hangar'\nyear = '2019-2020'\nauthor = 'Richard Izzo'\ncopyright = '{0}, {1}'.format(year, author)\nversion = release = '0.5.2'\n\npygments_style = 'default'\npygments_lexer = 'PythonConsoleLexer'\nhighlight_options = {\n    'python3': True\n}\ntemplates_path = ['.']\nexclude_patterns = ['_build', '**.ipynb_checkpoints']\nextlinks = {\n    'issue': ('https://github.com/tensorwerk/hangar-py/issues/%s', '#'),\n    'pr': ('https://github.com/tensorwerk/hangar-py/pull/%s', 'PR #'),\n}\nintersphinx_mapping = {\n    'python': ('https://docs.python.org/3', None),\n    'torch': ('https://pytorch.org/docs/master', None),\n    'numpy': ('http://docs.scipy.org/doc/numpy', None),\n}\n\n# Regular expressions that match URIs that should not be checked\n# when doing a linkcheck build\nlinkcheck_ignore = [\n    r'http://localhost:\\d+/?', 'http://localhost/',\n    'https://github.com/tensorwerk/hangar-py',\n    r'https://github.com/tensorwerk/hangar-py/.*',\n    r'http://tensorwerk.com/hangar-benchmarks/',\n    r'https://tensorwerk.com/hangar-benchmarks',\n]\nlinkcheck_retries = 3\n\n# on_rtd is whether we are on readthedocs.org\non_rtd = os.environ.get('READTHEDOCS', None) == 'True'\n\n# if not on_rtd:  # only set the theme if we're building docs locally\n# html_theme = 'sphinx_rtd_theme'\nhtml_theme = 'sphinx_material'\n\nhtml_sidebars = {\n    \"**\": [\"logo-text.html\", \"globaltoc.html\", \"localtoc.html\", \"searchbox.html\"]\n}\n\nhtml_short_title = '%s-%s' % (project, version)\n\nnapoleon_use_ivar = True\nnapoleon_use_rtype = True\nnapoleon_use_param = True\nnapoleon_include_init_with_doc = True\n\nadd_module_names = False\ndoctest_test_doctest_blocks = None\nautoclass_content = 'class'\n\n# Material theme options (see theme.conf for more information)\nhtml_theme_options = {\n\n    # Set the name of the project to appear in the navigation.\n    'nav_title': 'Hangar',\n\n    # Set the color and the accent color\n    'color_primary': 'deep-purple',\n    'color_accent': 'blue',\n\n    # Set the repo location to get a badge with stats\n    'repo_url': 'https://github.com/tensorwerk/hangar-py/',\n    'repo_name': 'Hangar',\n    'repo_type': 'github',\n\n\n    # Visible levels of the global TOC; -1 means unlimited\n    'globaltoc_depth': -1,\n    # If False, expand all TOC entries\n    'globaltoc_collapse': True,\n    # If True, show hidden TOC entries\n    
'globaltoc_includehidden': True,\n}\n"
  },
  {
    "path": "docs/contributing.rst",
    "content": ".. include:: ../CONTRIBUTING.rst"
  },
  {
    "path": "docs/contributingindex.rst",
    "content": ".. _ref-contributing:\n\n######################\nContributing to Hangar\n######################\n\n.. toctree::\n   :maxdepth: 2\n\n   contributing\n   codeofconduct\n   benchmarking\n"
  },
  {
    "path": "docs/design.rst",
    "content": ".. _ref-hangar-under-the-hood:\n\n=====================\nHangar Under The Hood\n=====================\n\nAt its core, Hangar is a content addressable data store whose design\nrequirements were inspired by the Git version control system.\n\n\nThings In Life Change, Your Data Shouldn't\n==========================================\n\nWhen designing a high performance data version control system, achieving\nperformance goals while ensuring consistency is incredibly difficult. Memory is\nfast, disk is slow; not much we can do about it. But since Hangar should\ndeal with any numeric data in an array of any size (with an enforced limit of\n31 dimensions in a sample...) we have to find ways to work *with* the disk,\nnot against it.\n\nUpon coming to terms with this face, we are actually presented with a problem\nonce we realize that we live in the real world, and real world is ugly.\nComputers crash, processes get killed, and people do * *interesting* * things.\nBecause of this, It is a foundational design principle for us to **guarantee\nthat once Hangar says data has been successfully added to the repository, it is\nactually persisted.** This essentially means that any process which interacts\nwith data records on disk must be stateless. If (for example) we were to keep a\nrecord of all data added to the staging area in an in-memory list, and the\nprocess gets killed, we may have just lost references to all of the array data,\nand may not even be sure that the arrays were flushed to disk properly. These\nsituations are a NO-GO from the start, and will always remain so.\n\nSo, we come to the first design choice: **read and write actions are atomic**.\nOnce data is added to a Hangar repository, the numeric array along with the\nnecessary book-keeping records will *always* occur transactionally, ensuring\nthat when something unexpected happens, the data and records are committed to\ndisk.\n\n.. note::\n\n  The atomicity of interactions is completely hidden from a normal user; they\n  shouldn't have to care about this or even know this exists. However, this\n  is also why using the context-manager style column interaction scheme can\n  result in ~2x times speedup on writes/reads. We can just pass on most of the\n  work to the Python ``contextlib`` package instead of having to begin and\n  commit/abort (depending on interaction mode) transactions with every call to\n  an `add` or `get` method.\n\n\nData Is Large, We Don't Waste Space\n===================================\n\nFrom the very beginning we knew that while it would be easy to just store all\ndata in every commit as independent arrays on disk, such a naive implementation\nwould just absolutely eat up disk space for any repository with a non-trivial\nhistory. Hangar commits should be fast and use minimal disk space, duplicating\ndata just doesn't make sense for such a system. And so we decided on\nimplementing a content addressable data store backend.\n\nWhen a user requests to add data to a Hangar repository, one of the first\noperations which occur is to generate a hash of the array contents. If the hash\ndoes not match a piece of data already placed in the Hangar repository, the\ndata is sent to the appropriate storage backend methods. On success, the\nbackend sends back some arbitrary specification which can be used to retrieve\nthat same piece of data from that particular backend. The record backend then\nstores a key/value pair of (`hash`, `backend_specification`).\n\n.. 
note::\n\n  The record backend stores hash information in a separate location from the\n  commit references (which associate a `(columnname, sample name/id)` to a\n  `sample_hash`). This lets us separate the historical repository\n  information from a particular computer's location of a data piece. All we need in\n  the public history is to know that some data with a particular hash is\n  associated with a commit. No one but the system which actually needs to access\n  the data needs to know where it can be found.\n\nOn the other hand, if a data sample is added to a repository which already has\na record of some hash, we don't even involve the storage backend. All we need\nto do is just record that a new sample in a column was added with that hash.\nIt makes no sense to write the same data twice.\n\nThis method can actually result in massive space savings for some common use\ncases. For the MNIST column, the training label data is typically a 1D-array\nof size 50,000. Because there are only 10 labels, we only need to store 10 ints\non disk, and just keep references to the rest.\n\n\nThe Basics of Collaboration: Branching and Merging\n==================================================\n\nUp to this point, we haven't actually discussed much about how data and records\nare treated on disk. We'll leave an entire walkthrough of the backend record\nstructure for another tutorial, but let's introduce the basics here, and see\nhow we enable the types of branching and merging operations you might be used\nto with source code (at largely the same speed!).\n\nHere are a few core principles to keep in mind:\n\nNumbers == Numbers\n------------------\n\nHangar has no concept of what a piece of data is outside of a string of bytes /\nnumerical array, and most importantly, *hangar does not care*; Hangar is a\ntool, and we leave it up to you to know what your data actually means!\n\nAt the end of the day when the data is placed into *some* collection on disk,\nthe storage backend we use won't care either. In fact, this is the entire\nreason why Hangar can do what it can; we don't attempt to treat data as\nanything other than a series of bytes on disk!\n\nThe fact that *Hangar does not care about what your data represents* is a\nfundamental underpinning of how the system works under the hood. It is the\n*designed and intended behavior* of Hangar to dump arrays to disk in what would\nseem like completely arbitrary buffers/locations to an outside observer. And\nfor the most part, they would be essentially correct in their observation that\ndata samples on disk are in strange locations.\n\nWhile there is almost no organization or hierarchy for the actual data samples\nwhen they are stored on disk, that is not to say that they are stored without\ncare! We may not care about global trends, but we do care a great deal about\nthe byte order/layout, sequentiality, chunking/compression, and validation\noperations which are applied across the bytes which make up a data sample.\n\nIn other words, we optimize for utility and performance on the backend, not so\nthat a human can understand the file format without a computer! After the array\nhas been saved to disk, all we care about is that the bookkeeper can record some\nunique information about where some piece of content is, and how we can read\nit. 
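In pseudo-Python, the write path sketched in `Data Is Large, We Don't Waste Space`_ boils down to something like this (a simplified illustration, not the actual implementation):\n\n.. code:: python\n\n   import hashlib\n\n   def add_sample(column, name, array, hash_db, staging_records, backend):\n       # content address first: identical arrays always share one digest\n       digest = hashlib.blake2b(array.tobytes()).hexdigest()\n       if digest not in hash_db:\n           # never-before-seen content: let the backend place it on disk,\n           # remembering only the spec needed to read it back later\n           hash_db[digest] = backend.write(array)\n       # the permanent history records just (column, sample name) -> digest\n       staging_records[(column, name)] = digest\n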
**None of that information is stored alongside the data itself - Remember:\nnumbers are just numbers - they don't have any concept of what they are**.\n\n\nRecords != Numbers\n------------------\n\n*The form numerical data takes once dumped on disk is completely irrelevant to\nthe specifications of records in the repository history.*\n\nNow, let's unpack this for a bit. We know from `Numbers == Numbers`_ that data\nis saved to disk in some arbitrary locations with some arbitrary backend. We\nalso know from `Data Is Large, We Don't Waste Space`_ that the permanent\nrepository information only contains a record which links a sample name to a\nhash. We also assert that there is a mapping of hash to storage backend\nspecification kept somewhere (doesn't matter what that mapping is for the\nmoment). With those 3 pieces of information, it's obvious that once data is\nplaced in the repository, we don't actually need to interact with it to\nunderstand the accounting of what was added when!\n\nIn order to make a commit, we just pack up all the records which existed in the\nstaging area, create a hash of the records (including the hash of any parent\ncommits), and then store the commit hash mapping alongside details such as the\ncommit user/email and commit message, and a compressed version of the full\ncommit records as they existed at that point in time.\n\n.. note::\n\n  That last point, \"storing a compressed version of the full commit records\", is\n  semi-inefficient, and will be changed in the future so that unchanged records\n  are not duplicated across commits.\n\nAn example is given below of the keys -> values mapping which stores each of\nthe staged records, and which are packed up / compressed on commit (and\nsubsequently unpacked on checkout!).\n\n::\n\n    Num asets                      'a.'               
-> '2'\n    ---------------------------------------------------------------------------\n    Name of aset -> num samples || 'a.train_images'   -> '10'\n    Name of data -> hash        || 'a.train_images.0' -> 'BAR_HASH_1'\n    Name of data -> hash        || 'a.train_images.1' -> 'BAR_HASH_2'\n    Name of data -> hash        || 'a.train_images.2' -> 'BAR_HASH_3'\n    Name of data -> hash        || 'a.train_images.3' -> 'BAR_HASH_4'\n    Name of data -> hash        || 'a.train_images.4' -> 'BAR_HASH_5'\n    Name of data -> hash        || 'a.train_images.5' -> 'BAR_HASH_6'\n    Name of data -> hash        || 'a.train_images.6' -> 'BAR_HASH_7'\n    Name of data -> hash        || 'a.train_images.7' -> 'BAR_HASH_8'\n    Name of data -> hash        || 'a.train_images.8' -> 'BAR_HASH_9'\n    Name of data -> hash        || 'a.train_images.9' -> 'BAR_HASH_0'\n    ---------------------------------------------------------------------------\n    Name of aset -> num samples || 'a.train_labels'   -> '10'\n    Name of data -> hash        || 'a.train_labels.0' -> 'BAR_HASH_11'\n    Name of data -> hash        || 'a.train_labels.1' -> 'BAR_HASH_12'\n    Name of data -> hash        || 'a.train_labels.2' -> 'BAR_HASH_13'\n    Name of data -> hash        || 'a.train_labels.3' -> 'BAR_HASH_14'\n    Name of data -> hash        || 'a.train_labels.4' -> 'BAR_HASH_15'\n    Name of data -> hash        || 'a.train_labels.5' -> 'BAR_HASH_16'\n    Name of data -> hash        || 'a.train_labels.6' -> 'BAR_HASH_17'\n    Name of data -> hash        || 'a.train_labels.7' -> 'BAR_HASH_18'\n    Name of data -> hash        || 'a.train_labels.8' -> 'BAR_HASH_19'\n    Name of data -> hash        || 'a.train_labels.9' -> 'BAR_HASH_10'\n    ---------------------------------------------------------------------------\n    's.train_images'   -> '{\"schema_hash\": \"RM4DefFsjRs=\",\n                            \"schema_dtype\": 2,\n                            \"schema_is_var\": false,\n                            \"schema_max_shape\": [784],\n                            \"schema_is_named\": true}'\n    's.train_labels'   -> '{\"schema_hash\":\n                            \"ncbHqE6Xldg=\",\n                            \"schema_dtype\": 7,\n                            \"schema_is_var\": false,\n                            \"schema_max_shape\": [1],\n                            \"schema_is_named\": true}'\n\nHistory is Relative\n-------------------\n\nThough it may be a bit obvious to state, it is of critical importance to\nrealize that it is only because we store the full contents of the repository\nstaging area as it existed in the instant just prior to a commit, that the\nintegrity of full repository history can be verified from a single commit's\ncontents and expected hash value. More so, any single commit has only a\ntopological relationship to a commit at any other point in time. It is only our\nimposition of a commit's ancestry tree which actualizes any subsequent insights\nor interactivity.\n\nWhile the general process of topological ordering (create branch, checkout\nbranch, commit a few times, merge) follows the `git` model fairly well at a\nconceptual level, there are some important differences we want to highlight:\n\n1) Multiple commits can be simultaneously checked out in \"read-only\" mode on a\n   single machine. 
Checking out a commit for reading does not touch the staging\n   area status.\n2) Only one process can interact with a write-enabled checkout at a time.\n3) A detached head CANNOT exist for write-enabled checkouts. A staging area must\n   begin with an identical state to the most recent commit of a/any branch.\n4) A staging area which has had changes made in it cannot switch base branch\n   without either a commit, hard-reset, or (soon to be developed) stash\n   operation.\n\nWhen a repository is initialized, a record is created which indicates the\nstaging area's `HEAD` branch. In addition, a branch is created with the name\n`master`; its first commit will be the only commit in the entire repository\nwith no parent. The record key/value pairs resemble the following:\n\n::\n\n  'branch.master' -> ''                # No parent commit.\n  'head'          -> 'branch.master'   # Staging area head branch\n\n  # Commit Hash  |  Parent Commit\n  -------------------------------------\n\n\n.. warning::\n\n  Much like git, odd things can happen before the `'initial commit'` is made. We\n  recommend creating the initial commit as quickly as possible to prevent\n  undefined behavior during repository setup. In the future, we may decide to\n  create the \"initial commit\" automatically upon repository initialization.\n\n\nOnce the initial commit is made, a permanent commit record is made which\nspecifies the records (not shown below) and the parent commit. The branch head\npointer is then updated to point to that commit as its base.\n\n::\n\n    'branch.master' -> '479b4cfff6219e3d'\n    'head'          -> 'branch.master'\n\n    # Commit Hash       |  Parent Commit\n    -------------------------------------\n    '479b4cfff6219e3d' ->  ''\n\nBranches can be created as cheaply as a single line of text can be written, and\nthey simply require a \"root\" commit hash (or a branch name, in which case the\nbranch's current HEAD commit will be used as the root HEAD). Likewise a branch\ncan be merged with just a single write operation (once the merge logic has\ncompleted - a process which is explained separately from this section; just\ntrust that it happens for now).\n\nA more complex example which creates 4 different branches and merges them in a\ncomplicated order can be seen below. Please note that the `` << `` symbol is\nused to indicate a merge commit where `X << Y` reads: ``'merging dev branch Y\ninto master branch X'``.\n\n::\n\n    'branch.large_branch' -> '8eabd22a51c5818c'\n    'branch.master'       -> '2cd30b98d34f28f0'\n    'branch.test_branch'  -> '1241a36e89201f88'\n    'branch.trydelete'    -> '51bec9f355627596'\n    'head'                -> 'branch.master'\n\n     # Commit Hash       |  Parent Commit\n     -------------------------------------\n    '1241a36e89201f88'  -> '8a6004f205fd7169'\n    '2cd30b98d34f28f0'  -> '9ec29571d67fa95f << 51bec9f355627596'\n    '51bec9f355627596'  -> 'd683cbeded0c8a89'\n    '69a09d87ea946f43'  -> 'd683cbeded0c8a89'\n    '8a6004f205fd7169'  -> 'a320ae935fc3b91b'\n    '8eabd22a51c5818c'  -> 'c1d596ed78f95f8f'\n    '9ec29571d67fa95f'  -> '69a09d87ea946f43 << 8eabd22a51c5818c'\n    'a320ae935fc3b91b'  -> 'e3e79dd897c3b120'\n    'c1d596ed78f95f8f'  -> ''\n    'd683cbeded0c8a89'  -> 'fe0bcc6a427d5950 << 1241a36e89201f88'\n    'e3e79dd897c3b120'  -> 'c1d596ed78f95f8f'\n    'fe0bcc6a427d5950'  -> 'e3e79dd897c3b120'\n\n\nBecause the raw commit hash logs can be quite dense to parse, a graphical\nlogging utility is included as part of the repository. 
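A history like the one above is produced with nothing more exotic than repeated branch/commit/merge calls; roughly (a sketch - ``create_branch`` and ``merge`` are documented in :ref:`ref-api`; treat the exact arguments as illustrative):\n\n.. code:: python\n\n  >>> repo.create_branch('test_branch')\n  >>> # ... open write-enabled checkouts and commit on each branch ...\n  >>> repo.merge('merge test_branch into master', 'master', 'test_branch')\n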
Running the\n``Repository.log()`` method will pretty print a graph representation of the\ncommit history:\n\n.. code:: python\n\n  >>> from hangar import Repository\n  >>> repo = Repository(path='/foo/bar/path/')\n\n  ... # make some commits\n\n  >>> repo.log()\n\n.. image:: ./img/repo_graph_log.png\n"
  },
  {
    "path": "docs/externals.rst",
    "content": ".. _ref-external:\n\n===============\nHangar External\n===============\n\nHigh level interaction interface between hangar and everything external.\n\nHigh Level Methods\n==================\n.. automodule:: hangar.external._external\n   :members:\n\nPlugin System\n=============\n.. automodule:: hangar.external.base_plugin\n   :members:\n"
  },
  {
    "path": "docs/faq.rst",
    "content": ".. _ref-faq:\n\n==========================\nFrequently Asked Questions\n==========================\n\nThe following documentation are taken from questions and comments on the\n`Hangar User Group Slack Channel <https://hangarusergroup.slack.com>`_\nand over various Github issues.\n\n\nHow can I get an Invite to the Hangar User Group?\n==================================================\n\nJust click on `This Signup Link\n<https://join.slack.com/t/hangarusergroup/shared_invite/enQtNjQ0NzM5ODQ1NjY1LWZlYmIzNTQ0ODZmOTAwMmNmOTgzZTAzM2NhMWE2MTNlMTRhMzNhN2Y3YmJmMjcwZDgxNDIyMDM1MzVhYzk4MjU>`_\nto get started.\n\n\nData Integrity\n==============\n\n   Being a young project did you encounter some situations where the disaster\n   was not a compilation error but dataset corruption? This is the most fearing\n   aspect of using young projects but every project will start from a phase\n   before becoming mature and production ready.\n\nAn absolute requirement of a system right this is to protect user data at all\ncosts (I’ll refer to this as preserving data \"integrity\" from here). During our\ninitial design of the system, we made the decision that preserving integrity\ncomes above all other system parameters: including performance, disk size,\ncomplexity of the Hangar core, and even features should we not be able to make\nthem absolutely safe for the user. And to be honest, the very first versions of\nHangar were quite slow and difficult to use as a result of this.\n\nThe initial versions of Hangar (which we put together in ~2 weeks) had\nessentially most of the features we have today. We’ve improved the API, made\nthings clearer, and added some visualization/reporting utilities, but not much\nhas changed. Essentially the entire development effort has been addressing\nissues stemming from a fundamental need to protect user data at all costs. That\nwork has been very successful, and performance is extremely promising (and\nimproving all the time).\n\nTo get into the details here: There have been only 3 instances in the entire\ntime I’ve developed Hangar where we lost data irrecoverably:\n\n1. We used to move data around between folders with some regularity (as a\n   convenient way to mark some files as containing data which have been\n   “committed”, and can no longer be opened in anything but read-only mode).\n   There was a bug (which never made it past a local dev version) at one point\n   where I accidentally called ``shutil.rmtree(path)`` with a directory one\n   level too high… that wasn’t great.\n\n   Just to be clear, we don’t do this anymore (since disk IO costs are way too\n   high), but remnants of it’s intention are still very much alive and well.\n   Once data has been added to the repository, and is “committed”, the file\n   containing that data will never be opened in anything but read-only mode\n   again. This reduces the chance of disk corruption massively from the start.\n\n----\n\n2. When I was implementing the numpy memmap array storage backend, I was\n   totally surprised during an early test when I:\n\n   .. 
   However, the nice part is that this was a real-world proof that our system\n   design worked (and not just in tests). When you add data to a Hangar\n   checkout (or receive it on a fetch/clone operation) we calculate a hash\n   digest of the data via ``blake2b`` (a cryptographically secure algorithm in\n   the python standard library). While this allows us to cryptographically\n   verify full data integrity and history immutability, cryptographic hashes\n   are slow by design. When we want to read local data (which we’ve already\n   ensured was correct when it was placed on disk) it would be prohibitively\n   slow to do a full cryptographic verification on every read. However, since\n   it’s NOT acceptable to provide no integrity verification (even for local\n   writes) we compromise with a much faster (though non-cryptographic) hash\n   digest/checksum. This operation occurs on EVERY read of data from disk.\n\n   The theory here is that even though Hangar makes every effort to guarantee\n   safe operations itself, in the real world we have to deal with systems which\n   break. We’ve planned for cases where some OS-induced disk corruption occurs,\n   or where some malicious actor modifies the file contents manually; we can’t\n   stop that from happening, but Hangar can make sure that you will know about\n   it when it happens!\n\n
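   As a rough illustration of that trade-off (plain ``hashlib`` / ``xxhash``\n   calls on a hypothetical array, not Hangar’s exact internals):\n\n   .. code:: python\n\n      import hashlib\n      import numpy as np\n      import xxhash\n\n      arr = np.arange(1_000_000, dtype=np.uint64)\n\n      # content digest computed when data is added: cryptographic, slower\n      digest = hashlib.blake2b(arr.tobytes(), digest_size=20).hexdigest()\n\n      # checksum verified on every local read: non-cryptographic, much faster\n      cksum = xxhash.xxh64_hexdigest(arr.tobytes())\n\n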
----\n\n3. Before we got smart with the HDF5 backend low-level details, it was an\n   issue for us to have a write-enabled checkout attempt to write an array to\n   disk and immediately read it back in. I’ll gloss over the details for the\n   sake of simplicity here, but basically I was presented with a CRC32\n   Checksum Verification Failed error in some edge cases. The interesting bit\n   was that if I closed the checkout and reopened it, the data was secure and\n   intact on disk, but for immediate reads after writes, we weren’t\n   propagating changes to the HDF5 chunk metadata cache to ``rw`` operations\n   appropriately.\n\n   This was fixed very early on by taking advantage of a new feature in HDF5\n   1.10.4 referred to as Single Writer Multiple Reader (SWMR). The long and\n   short is that by being careful to handle the order in which a new HDF5 file\n   is created on disk and opened in ``w`` and ``r`` mode with SWMR enabled, the\n   HDF5 core guarantees the integrity of the metadata chunk cache at all times.\n   Even if a fatal system crash occurs in the middle of a write, the data will\n   be preserved. This solved the issue completely for us.\n\n   There are many, many more details which I could cover here, but the long\n   and short of it is that in order to ensure data integrity, Hangar is\n   designed to not let the user do anything they aren’t allowed to at any\n   time:\n\n      -  Read checkouts have no ability to modify contents on disk via any\n         method. It’s not possible for them to actually delete or overwrite\n         anything in any way.\n      -  Write checkouts can only ever write data. The only way to remove the\n         actual contents of written data from disk is if changes have been made\n         in the staging area (but not committed) and the\n         ``reset_staging_area()`` method is called. And even this has no\n         ability to remove any data which had previously existed in some commit\n         in the repo’s history.\n\n   In addition, a Hangar checkout object is not what it appears to be (at first\n   glance, use, or even during common introspection operations). If you try to\n   operate on it after closing the checkout, or hold it while another checkout\n   is started, you won’t be able to (there’s a whole lot of invisible “magic”\n   going on with ``weakrefs``, ``objectproxies``, and instance attributes). I\n   would encourage you to do the following:\n\n   .. code:: pycon\n\n      >>> co = repo.checkout(write=True)\n      >>> co.metadata['hello'] = 'world'\n      >>> # try to hold a reference to the metadata object:\n      >>> mRef = co.metadata\n      >>> mRef['hello']\n      'world'\n      >>> co.commit('first commit')\n      >>> co.close()\n      >>> # what happens when you try to access the `co` or `mRef` object?\n      >>> mRef['hello']\n      ReferenceError: weakly-referenced object no longer exists\n      >>> print(co)  # or any other operation\n      PermissionError: Unable to operate on past checkout objects which have been closed. No operation occurred. Please use a new checkout.\n\n   The last bit I’ll leave you with is a note on context managers and\n   performance (how we handle record data safely and effectively):\n\n   .. seealso::\n\n      - :ref:`ref-tutorial` (Part 1, In section: \"performance\")\n      - :ref:`ref-hangar-under-the-hood`\n\n\nHow Can a Hangar Repository be Backed Up?\n=========================================\n\nTwo strategies exist:\n\n1. Use a remote server and Hangar’s built-in ability to just push data to a\n   remote! (Tutorial coming soon; see :ref:`ref-api` for more details.)\n\n2. A Hangar repository is self-contained in its ``.hangar`` directory. To back\n   up the data, just copy/paste or rsync it to another machine!\n\n\nOn Determining ``Column`` Schema Sizes\n=======================================\n\n   Say I have a data group that specifies a data array with one dimension,\n   three elements (say height, width, num channels) and later on I want to add\n   bit depth. Can I do that, or do I need to make a new data group? Should it\n   have been three scalar data groups from the start?\n\nSo right now it’s not possible to change the schema (shape, dtype) of a\ncolumn.\n\n
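For context, a column’s shape and dtype are declared once when the column is\ncreated. A minimal sketch (keyword names follow the ``add_ndarray_column`` API\nreferenced elsewhere in these docs; the column name and values here are\nhypothetical):\n\n.. code:: python\n\n   >>> import numpy as np\n   >>> co = repo.checkout(write=True)\n   >>> # the schema is fixed at creation time; every sample must conform to it\n   >>> col = co.add_ndarray_column('image_meta', shape=(4,), dtype=np.uint32)\n   >>> col[0] = np.array([480, 640, 3, 8], dtype=np.uint32)\n   >>> co.commit('add image_meta column')\n   >>> co.close()\n\n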
I’ve thought about allowing schema changes for a while now, and while it will\nrequire a new user-facing API option, it’s (almost) trivial to make it work in\nthe core. It just hasn’t seemed like a priority yet...\n\nAnd no, I wouldn’t specify each of those as scalar data groups; they are\nrelated pieces of information and would generally be accessed together.\n\nAccess patterns should generally dictate how much info is placed in a column.\n\n\nIs there a performance/space penalty for having lots of small data groups?\n--------------------------------------------------------------------------\n\nAs far as a performance / space penalty, this is where it gets good :)\n\n- Using fewer columns means that there are fewer records (the internal\n  locating info, kind of like a git tree) to store, since each record points to\n  a sample containing more information.\n\n- Using more columns means that the likelihood of samples having the same\n  value increases, meaning fewer pieces of data are actually stored on disk\n  (remember it’s a content-addressable file store)\n\nHowever, since the size of a record (40 bytes or so before compression, and we\ngenerally see compression ratios around 15-30% of the original size once the\nrecords are committed) is generally negligible compared to the size of data on\ndisk, optimizing for the number of records is just overkill. For this case, it\nreally doesn’t matter. **Optimize for ease of use**\n"
  },
  {
    "path": "docs/index.rst",
    "content": ".. include:: ../README.rst\n\n.. toctree::\n   :maxdepth: 3\n\n   readme\n   quickstart\n   installation\n   concepts\n   api\n   tutorial\n   design\n   cli\n   externals\n   faq\n   backends\n   contributingindex\n   authors\n   changelog\n\n\nIndices and tables\n==================\n\n* :ref:`genindex`\n* :ref:`modindex`\n* :ref:`search`\n"
  },
  {
    "path": "docs/installation.rst",
    "content": ".. _ref_installation:\n\n============\nInstallation\n============\n\nFor general usage it is recommended that you use a pre-built version of Hangar,\neither from a Python Distribution, or a pre-built wheel from PyPi.\n\n\nPre-Built Installation\n======================\n\n\nPython Distributions\n--------------------\n\nIf you do not already use a Python Distribution, we recommend the `Anaconda\n<https://www.anaconda.com/distribution/>`_ (or `Miniconda\n<https://docs.conda.io/en/latest/miniconda.html>`_) distribution, which supports\nall major operating systems (Windows, MacOSX, & the typical Linux variations).\nDetailed usage instructions are available `on the anaconda website\n<https://docs.anaconda.com/anaconda/>`_.\n\nTo install Hangar via the Anaconda Distribution (from the `conda-forge conda\nchannel <https://anaconda.org/conda-forge/hangar>`_)::\n\n    conda install -c conda-forge hangar\n\n\nWheels (PyPi)\n-------------\n\nIf you have an existing python installation on your computer, pre-built Hangar Wheels\ncan be installed via pip from the Python Package Index (PyPi)::\n\n    pip install hangar\n\n\nSource Installation\n===================\n\n\nTo install Hangar from source, clone the repository from `Github\n<https://github.com/tensorwerk/hangar-py>`_::\n\n    git clone https://github.com/tensorwerk/hangar-py.git\n    cd hangar-py\n    python setup.py install\n\nOr use pip on the local package if you want to install all dependencies\nautomatically in a development environment::\n\n    pip install -e .\n\n\nSource installation in Google colab\n-----------------------------------\nGoogle colab comes with an older version of ``h5py`` pre-installed which is not\ncompatible with hangar. If you need to install hangar from the source in \ngoogle colab, make sure to uninstall the existing ``h5py`` ::\n\n    !pip uninstall h5py\n\nThen follow the Source Installation steps given above.\n"
  },
  {
    "path": "docs/noindexapi/apiinit.rst",
    "content": ".. automethod:: hangar.checkout.WriterCheckout.add_ndarray_column\n   :noindex:\n\n.. automethod:: hangar.checkout.WriterCheckout.add_str_column\n   :noindex:\n\n.. automethod:: hangar.checkout.WriterCheckout.add_bytes_column\n   :noindex:\n"
  },
  {
    "path": "docs/noindexapi/apiremotefetchdata.rst",
    "content": ".. automethod:: hangar.repository.Remotes.fetch_data\n   :noindex:"
  },
  {
    "path": "docs/quickstart.rst",
    "content": "=====\nUsage\n=====\n\nTo use Hangar in a project::\n\n\tfrom hangar import Repository\n\n\nPlease refer to the :ref:`ref-tutorial` for examples, or :ref:`ref-concepts` to\nreview the core concepts of the Hangar system."
  },
  {
    "path": "docs/readme.rst",
    "content": ".. include:: ../README.rst\n"
  },
  {
    "path": "docs/requirements.txt",
    "content": "sphinx>=2.0\nsphinx-material\nsphinx-click\nnbsphinx\nsphinx-copybutton\nrecommonmark\nIPython\nCython\n"
  },
  {
    "path": "docs/requirements_rtd.txt",
    "content": "https://files.pythonhosted.org/packages/84/ad/ee890cbea43dd97cbb05aa30b9b08ff908efa8407f514e9d447dd365ef15/tensorflow_cpu-2.1.0-cp37-cp37m-manylinux2010_x86_64.whl\nhttps://download.pytorch.org/whl/cpu/torch-1.3.1%2Bcpu-cp37-cp37m-linux_x86_64.whl\nCython\n"
  },
  {
    "path": "docs/spelling_wordlist.txt",
    "content": "builtin\nbuiltins\nclassmethod\nstaticmethod\nclassmethods\nstaticmethods\nargs\nkwargs\ncallstack\nChangelog\nIndices\n"
  },
  {
    "path": "docs/tutorial.rst",
    "content": ".. _ref-tutorial:\n\n###############\nHangar Tutorial\n###############\n\n.. toctree::\n   :maxdepth: 2\n   :titlesonly:\n\n   Tutorial-QuickStart\n   Tutorial-001\n   Tutorial-002\n   Tutorial-003\n   Tutorial-Dataloader\n   Tutorial-RealQuickStart\n"
  },
  {
    "path": "hangar.yml",
    "content": "# Metadata file for Zenoodo source code upload\n# This is part of the Escape 2020 project and was originally\n# requested by Filippo Quarenghi (Orobix).\n#\n# Metadata version - do not change\nmetadata-version: 0.2\n\n# Mandatory entries\ntitle: hangar\nauthors:\n  - Rick Izzo\n  - Luca Antiga\ncontact:\n  - name: Rick Izzo\n  - email: rick@tensorwerk.com\n  - name: Luca Antiga\n  - email: luca.antiga@orobix.com\nlicense: Apache License 2.0\nurl: https://github.com/tensorwerk/hangar-py\ndescription: Hangar is version control for tensor data. Commit, branch, merge, revert, and collaborate in the data-defined software era.\n\n#Optional entries\ndoi: null\nkeywords: Data versioning\ntype: source\ngrant: Tensorwerk Inc.\nlanguage: python\nhardware:\n  - machine: [local, server, hpc]\n  - CPU: null\n  - RAM: 2GB\n  - drive:\n    - type: [SSD, HDD]\n    - volume: 500MB\n  - GPU: null\n\ndependencies:\n  - python>=3.6\n  - HDF5\n  - cython>=0.27\n  - setuptools>=40.0\n  - wheel>=0.30\n  - blosc>=1.8\n  - click\n  - protobuf\n  - h5py>=2.9\n  - hdf5plugin>=2.0\n  - lmdb>=0.94\n  - tqdm\n  - wrapt\n  - xxhash\n  - numpy\n  - grpcio\n\nos:\n  - 'win-64'\n  - 'linux'\n  - 'osx-64'\n\ncompiler:\n  - gcc>=4.7\n  - manylinux2014\n\nmulti-thread: true\ncontainer:\n  - null\n"
  },
  {
    "path": "mypy.ini",
    "content": "# ------------------------- Global Options ------------------------------------\n\n[mypy]\nwarn_unused_configs = True\n\n# ------------------------- Per Module Configuration --------------------------\n\n[mypy-lmdb]\nignore_missing_imports = True\n\n[mypy-numpy]\nignore_missing_imports = True\n\n[mypy-grpc]\nignore_missing_imports = True"
  },
  {
    "path": "scripts/run_proto_codegen.py",
    "content": "import os\nfrom shutil import move\n\nfrom grpc_tools import protoc\n\n\n# ------------------------- output locations ----------------------------------\n\n\ntoolsPath = os.path.dirname(__file__)\nsrcPath = os.path.normpath(os.path.join(toolsPath, os.path.pardir, 'src'))\n\nhangarProtoDir = os.path.join(srcPath, 'hangar', 'remote')\nhangarProtoPath = os.path.join(hangarProtoDir, 'hangar_service.proto')\nif not os.path.isfile(hangarProtoPath):\n    raise FileNotFoundError(f'Cannot access hangar_service.proto at: {hangarProtoPath}')\n\n# ------------------------ hangar service -------------------------------------\n\nos.environ.putenv('PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION', 'cpp')\n# generates hangar service protobuf for python\nprotoc.main((\n    '',\n    f'-I{hangarProtoDir}',\n    f'--python_out={hangarProtoDir}',\n    f'--grpc_python_out={hangarProtoDir}',\n    f'--mypy_out={hangarProtoDir}',\n    hangarProtoPath,\n))\n\n\"\"\"\nBecause python3 requires explicit relative imports (which is not yet supported\nin the Google protoc compiler), we have to replace the 'import foo_grpc' line\nwith the 'from . import foo' line in the generated grpc code.\n\"\"\"\n\nhangar_service_grpc_path_orig = os.path.join(hangarProtoDir, 'hangar_service_pb2_grpc.py')\nhangar_service_grpc_path_old = os.path.join(hangarProtoDir, 'hangar_service_pb2_grpc.py.old')\nmove(hangar_service_grpc_path_orig, hangar_service_grpc_path_old)\nwith open(hangar_service_grpc_path_orig, 'w') as new_file:\n    with open(hangar_service_grpc_path_old, 'r+') as old_file:\n        for old_line in old_file:\n            if old_line == 'import hangar_service_pb2 as hangar__service__pb2\\n':\n                newline = old_line.replace('import', 'from . import')\n            else:\n                newline = old_line\n            new_file.writelines(newline)\nos.remove(hangar_service_grpc_path_old)"
  },
  {
    "path": "setup.cfg",
    "content": "[bdist_wheel]\nuniversal = 0\n\n\n[flake8]\nmax-line-length = 150\nexclude = */migrations/*\n\n[tool:pytest]\nnorecursedirs =\n    .git\n    .tox\n    .env\n    dist\n    build\n    migrations\npython_files =\n    test_*.py\n    *_test.py\n    tests.py\naddopts =\n    -ra\n    --strict\n    --ignore=docs/conf.py\n    --ignore=setup.py\n    --ignore=.eggs\n    --tb=auto\n\n[isort]\nforce_single_line = True\nline_length = 120\nknown_first_party = hangar\ndefault_section = THIRDPARTY\nforced_separate = test_hangar\nnot_skip = __init__.py\nskip = migrations\n"
  },
  {
    "path": "setup.py",
    "content": "#!/usr/bin/env python\n# -*- encoding: utf-8 -*-\nimport os\nimport platform\nimport sys\nfrom os.path import join\nfrom distutils.sysconfig import get_config_var\nfrom distutils.version import LooseVersion\nfrom setuptools import setup, Extension, find_packages\n\n\n# Use `setup.py [] --debug` for a debug build of hangar\nHANGAR_DEBUG_BUILD = False\n\n\n# Set deployment target for mac\n#\n# Need to ensure that extensions are built for macos 10.9 when compiling on a\n# 10.9 system or above, overriding distutils behavior which is to target\n# the version used to build the current python binary.\n#\n# TO OVERRIDE:\n#   set MACOSX_DEPLOYMENT_TARGET before calling setup.py\n#\n# From https://github.com/pandas-dev/pandas/pull/24274\n# 3-Clause BSD License: https://github.com/pandas-dev/pandas/blob/master/LICENSE\nif sys.platform == 'darwin':\n    if 'MACOSX_DEPLOYMENT_TARGET' not in os.environ:\n        current_system = LooseVersion(platform.mac_ver()[0])\n        python_target = LooseVersion(get_config_var('MACOSX_DEPLOYMENT_TARGET'))\n        if python_target < '10.9' and current_system >= '10.9':\n            os.environ['MACOSX_DEPLOYMENT_TARGET'] = '10.9'\n\n\nclass LazyCommandClass(dict):\n    \"\"\"\n    Lazy command class that defers operations requiring Cython and numpy until\n    they've actually been downloaded and installed by setup_requires.\n    \"\"\"\n\n    def __contains__(self, key):\n        return key in ['build_ext', 'bdist_wheel', 'sdist'] or super().__contains__(key)\n\n    def __setitem__(self, key, value):\n        if key == 'build_ext':\n            raise AssertionError(\"build_ext overridden!\")\n        super().__setitem__(key, value)\n\n    def __getitem__(self, key):\n        if key == 'build_ext':\n            return self.make_build_ext_cmd()\n        elif key == 'bdist_wheel':\n            return self.make_bdist_wheel_cmd()\n        elif key  == 'sdist':\n            return self.make_sdist_cmd()\n        else:\n            return super().__getitem__(key)\n\n    def make_build_ext_cmd(self):\n        \"\"\"Returns a command class implementing 'build_ext'.\n        \"\"\"\n        from Cython.Distutils.build_ext import new_build_ext as cython_build_ext\n        from Cython.Compiler.Main import default_options\n\n        default_options['language_level'] = 3\n        default_options['compiler_directives']['embedsignature'] = True\n        default_options['compiler_directives']['emit_code_comments'] = True\n        if HANGAR_DEBUG_BUILD is True:\n            default_options['annotate'] = True\n            default_options['emit_linenums'] = True\n            default_options['gdb_debug'] = True\n\n        class build_ext(cython_build_ext):\n            def build_extensions(self):\n                cython_build_ext.build_extensions(self)\n\n        return build_ext\n\n    def make_bdist_wheel_cmd(self):\n        \"\"\"Returns a command class implementing 'bdist_wheel'.\n        \"\"\"\n        from wheel.bdist_wheel import bdist_wheel\n\n        class bdist_wheel_cmd(bdist_wheel):\n            def run(self):\n                # This may modify package_data:\n                bdist_wheel.run(self)\n\n        return bdist_wheel_cmd\n\n    def make_sdist_cmd(self):\n        \"\"\"A command class implementing 'sdist'.\n        \"\"\"\n        from distutils.command.sdist import sdist as _sdist\n\n        class sdist(_sdist):\n            def run(self):\n                # Make sure the compiled Cython files in the distribution are up-to-date\n                # so 
we generate .c files correctly (.so will be removed)\n                _sdist.run(self)\n\n        return sdist\n\n# Pass command line flags to setup.py script\n# handle the `--debug` flag\nargs = sys.argv[:]\nfor arg in args:\n    if arg.find('--debug') == 0:\n        HANGAR_DEBUG_BUILD = True\n        sys.argv.remove(arg)\n\n# Source files for build\nCYTHON_SOURCES = [\n    join('src', 'hangar', 'optimized_utils.pyx'),\n    join('src', 'hangar', 'backends', 'specs.pyx'),\n    join('src', 'hangar', 'backends', 'specparse.pyx'),\n    join('src', 'hangar', 'records', 'recordstructs.pyx'),\n    join('src', 'hangar', 'records', 'column_parsers.pyx'),\n    join('src', 'hangar', 'records', 'hashmachine.pyx'),\n]\nCYTHON_HEADERS = [\n    join('src', 'hangar', 'external_cpython.pxd'),\n    join('src', 'hangar', 'optimized_utils.pxd'),\n    join('src', 'hangar', 'backends', 'specs.pxd'),\n    join('src', 'hangar', 'records', 'recordstructs.pxd'),\n]\n\n__extensions = []\nfor source in CYTHON_SOURCES:\n    module_name = os.path.splitext(source)[0]\n    if module_name + '.pxd' in CYTHON_HEADERS:\n        deps = module_name + '.pxd'\n    else:\n        deps = None\n    if module_name.startswith(f'src{os.sep}'):\n        # slice off the 'src/' prefix; str.lstrip would strip a character\n        # *set* rather than a prefix, which is not what we want here\n        module_name = module_name[len(f'src{os.sep}'):]\n    module_name = module_name.replace(os.sep, '.')\n    ext = Extension(module_name,\n                    include_dirs=[],\n                    define_macros=[],\n                    sources=[source],\n                    depends=[deps] if deps else [],\n                    library_dirs=[],\n                    libraries=[],\n                    extra_link_args=[],\n                    extra_compile_args=[],\n                    language=\"c\")\n    __extensions.append(ext)\n\nwith open('README.rst') as f:\n    README_RST = f.read()\n\nSHORT_DESCRIPTION = (\n    'Hangar is version control for tensor data. 
Commit, branch, merge, '\n    'revert, and collaborate in the data-defined software era.'\n)\n\nSETUP_REQUIRES = [\n    'cython>=0.27',\n    'setuptools>=40.0',\n    'wheel>=0.30',\n]\n\nINSTALL_REQUIRES = [\n    'blosc>=1.8',\n    'cloudpickle>=1.4',\n    'click',\n    'grpcio',\n    'protobuf',\n    'h5py>=2.9',\n    'hdf5plugin>=2.0',\n    'lmdb>=0.94',\n    'numpy',\n    'tqdm',\n    'wrapt',\n    'xxhash',\n]\n\nsetup(\n    name='hangar',\n    version='0.5.2',\n    license='Apache 2.0',\n    # Package Meta Info (for PyPi)\n    description=SHORT_DESCRIPTION,\n    long_description=README_RST,\n    long_description_content_type='text/x-rst',\n    author='Richard Izzo',\n    author_email='rick@tensorwerk.com',\n    maintainer='Richard Izzo',\n    maintainer_email='rick@tensorwerk.com',\n    url='https://github.com/tensorwerk/hangar-py',\n    project_urls={\n        'Documentation': 'https://hangar-py.readthedocs.io/',\n        'Changelog': 'https://hangar-py.readthedocs.io/en/latest/changelog.html',\n        'Issue Tracker': 'https://github.com/tensorwerk/hangar-py/issues',\n    },\n    platforms=['any'],\n    # Module Source Files\n    ext_modules=__extensions,\n    packages=find_packages('src'),\n    package_dir={'': 'src'},\n    package_data={'': ['*.ini', '*.proto']},\n    include_package_data=True,\n    zip_safe=False,\n    entry_points={\n        'console_scripts': ['hangar = hangar.cli:main']\n    },\n    # Requirements\n    python_requires='>= 3.6.0',\n    install_requires=INSTALL_REQUIRES,\n    setup_requires=SETUP_REQUIRES,\n    # hooks into `sdist`, `bdist_wheel`, `bdist_ext` commands.\n    cmdclass=LazyCommandClass(),\n    # PyPi classifiers\n    # http://pypi.python.org/pypi?%3Aaction=list_classifiers\n    classifiers=[\n        'Development Status :: 4 - Beta',\n        'Intended Audience :: Developers',\n        'License :: OSI Approved',\n        'Operating System :: MacOS',\n        'Operating System :: Microsoft :: Windows',\n        'Operating System :: POSIX :: Linux',\n        'Operating System :: Unix',\n        'Programming Language :: Cython',\n        'Programming Language :: Python :: 3 :: Only',\n        'Programming Language :: Python :: 3.6',\n        'Programming Language :: Python :: 3.7',\n        'Programming Language :: Python :: 3.8',\n        'Topic :: Database',\n        'Topic :: Scientific/Engineering',\n        'Topic :: Software Development :: Libraries',\n        'Topic :: Software Development :: Version Control',\n        'Topic :: Utilities',\n    ],\n)\n"
  },
  {
    "path": "src/hangar/__init__.py",
    "content": "__version__ = '0.5.2'\n__all__ = ('Repository',)\n\nfrom .repository import Repository\n"
  },
  {
    "path": "src/hangar/__main__.py",
    "content": "\"\"\"\nEntrypoint module, in case you use `python -m hangar`.\n\n\nWhy does this file exist, and why __main__? For more info, read:\n\n- https://www.python.org/dev/peps/pep-0338/\n- https://docs.python.org/2/using/cmdline.html#cmdoption-m\n- https://docs.python.org/3/using/cmdline.html#cmdoption-m\n\"\"\"\nfrom hangar.cli import main\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "src/hangar/_version.py",
    "content": "# -*- coding: utf-8 -*-\n\"\"\"\nPortions of this code have been taken and modified from the \"packaging\" project.\n\nURL:      https://github.com/pypa/packaging\nFiles:    packaging/_structures.py\n          packaging/version.py\nCommit:   6a09d4015b54f80762ff3ef1597a8b6740563c19\nAccessed: 11 DEC 2019\n\npackaging License\n-------------------------------------------------------------------------------\nLicense: Dual licensed under the terms of the Apache License, Version 2.0, and the BSD License.\nURL:     https://github.com/pypa/packaging/blob/6a09d4015b/LICENSE\n         https://github.com/pypa/packaging/blob/6a09d4015b/LICENSE.APACHE\n         https://github.com/pypa/packaging/blob/6a09d4015b/LICENSE.BSD\n\"\"\"\nimport re\nimport typing\nfrom collections import namedtuple\nfrom itertools import dropwhile\nfrom typing import Callable, Optional, SupportsInt, Tuple, Union\nfrom operator import lt, le, eq, ge, gt, ne\n\n\n__all__ = [\"parse\", \"Version\", \"InvalidVersion\", \"VERSION_PATTERN\"]\n\n\n_Version = namedtuple(\n    \"_Version\", [\"epoch\", \"release\", \"dev\", \"pre\", \"post\", \"local\"]\n)\n\n\nclass InfinityType(object):\n    __slots__ = ()\n\n    def __repr__(self) -> str:\n        return \"Infinity\"\n\n    def __hash__(self) -> int:\n        return hash(repr(self))\n\n    def __lt__(self, other: object) -> bool:\n        return False\n\n    def __le__(self, other: object) -> bool:\n        return False\n\n    def __eq__(self, other: object) -> bool:\n        return isinstance(other, self.__class__)\n\n    def __ne__(self, other: object) -> bool:\n        return not isinstance(other, self.__class__)\n\n    def __gt__(self, other: object) -> bool:\n        return True\n\n    def __ge__(self, other: object) -> bool:\n        return True\n\n    def __neg__(self) -> 'NegativeInfinityType':\n        return NegativeInfinity\n\n\nInfinity = InfinityType()\n\n\nclass NegativeInfinityType(object):\n    __slots__ = ()\n\n    def __repr__(self) -> str:\n        return \"-Infinity\"\n\n    def __hash__(self) -> int:\n        return hash(repr(self))\n\n    def __lt__(self, other: object) -> bool:\n        return True\n\n    def __le__(self, other: object) -> bool:\n        return True\n\n    def __eq__(self, other: object) -> bool:\n        return isinstance(other, self.__class__)\n\n    def __ne__(self, other: object) -> bool:\n        return not isinstance(other, self.__class__)\n\n    def __gt__(self, other: object) -> bool:\n        return False\n\n    def __ge__(self, other: object) -> bool:\n        return False\n\n    def __neg__(self) -> InfinityType:\n        return Infinity\n\n\nNegativeInfinity = NegativeInfinityType()\n\n# -------------------- Type Definitions ---------------------------------------\n\nif typing.TYPE_CHECKING:\n    InfiniteTypes = Union[InfinityType, NegativeInfinityType]\n    PrePostDevType = Union[InfiniteTypes, Tuple[str, int]]\n    SubLocalType = Union[InfiniteTypes, int, str]\n    LocalType = Union[\n        NegativeInfinityType,\n        Tuple[\n            Union[\n                SubLocalType,\n                Tuple[SubLocalType, str],\n                Tuple[NegativeInfinityType, SubLocalType],\n            ],\n            ...,\n        ],\n    ]\n    CmpKey = Tuple[\n        int, Tuple[int, ...], PrePostDevType, PrePostDevType, PrePostDevType, LocalType\n    ]\n    VersionComparisonMethod = Callable[[CmpKey, CmpKey], bool]\n\n\n# ---------------------------- Version Parsing --------------------------------\n\n\ndef 
parse(version: str) -> 'Version':\n    \"\"\"\n    Parse the given version string and return a :class:`Version` object if the\n    given version is a valid PEP 440 version, else raise :class:`InvalidVersion`.\n    \"\"\"\n    return Version(version)\n\n\nclass InvalidVersion(ValueError):\n    \"\"\"\n    An invalid version was found, users should refer to PEP 440.\n    \"\"\"\n    __slots__ = ()\n\n\nclass _BaseVersion(object):\n\n    __slots__ = ('_key',)\n\n    def __init__(self):\n        self._key: 'CmpKey' = None\n\n    def __hash__(self) -> int:\n        return hash(self._key)\n\n    def __lt__(self, other: '_BaseVersion') -> bool:\n        return self._compare(other, lt)\n\n    def __le__(self, other: '_BaseVersion') -> bool:\n        return self._compare(other, le)\n\n    def __eq__(self, other: object) -> bool:\n        return self._compare(other, eq)\n\n    def __ge__(self, other: '_BaseVersion') -> bool:\n        return self._compare(other, ge)\n\n    def __gt__(self, other: '_BaseVersion') -> bool:\n        return self._compare(other, gt)\n\n    def __ne__(self, other: object) -> bool:\n        return self._compare(other, ne)\n\n    def _compare(self, other: object, method: 'VersionComparisonMethod'\n                 ) -> Union[bool, type(NotImplemented)]:\n        if isinstance(other, _BaseVersion):\n            return method(self._key, other._key)\n        return NotImplemented\n\n\n# Deliberately not anchored to the start and end of the string, to make it\n# easier for 3rd party code to reuse\nVERSION_PATTERN = r\"\"\"\n    v?\n    (?:\n        (?:(?P<epoch>[0-9]+)!)?                           # epoch\n        (?P<release>[0-9]+(?:\\.[0-9]+)*)                  # release segment\n        (?P<pre>                                          # pre-release\n            [-_\\.]?\n            (?P<pre_l>(a|b|c|rc|alpha|beta|pre|preview))\n            [-_\\.]?\n            (?P<pre_n>[0-9]+)?\n        )?\n        (?P<post>                                         # post release\n            (?:-(?P<post_n1>[0-9]+))\n            |\n            (?:\n                [-_\\.]?\n                (?P<post_l>post|rev|r)\n                [-_\\.]?\n                (?P<post_n2>[0-9]+)?\n            )\n        )?\n        (?P<dev>                                          # dev release\n            [-_\\.]?\n            (?P<dev_l>dev)\n            [-_\\.]?\n            (?P<dev_n>[0-9]+)?\n        )?\n    )\n    (?:\\+(?P<local>[a-z0-9]+(?:[-_\\.][a-z0-9]+)*))?       
# local version\n\"\"\"\n\n_REGEX = re.compile(r\"^\\s*\" + VERSION_PATTERN + r\"\\s*$\", re.VERBOSE | re.IGNORECASE)\n\n\nclass Version(_BaseVersion):  # lgtm [py/missing-equals]\n\n    __slots__ = ('_version',)\n\n    def __init__(self, version: str) -> None:\n        super().__init__()\n\n        # Validate the version and parse it into pieces\n        match = _REGEX.search(version)\n        if not match:\n            raise InvalidVersion(f\"Invalid version: '{version}'\")\n\n        # Store the parsed out pieces of the version\n        self._version = _Version(\n            epoch=int(match.group(\"epoch\")) if match.group(\"epoch\") else 0,\n            release=tuple(int(i) for i in match.group(\"release\").split(\".\")),\n            pre=_parse_letter_version(match.group(\"pre_l\"), match.group(\"pre_n\")),\n            post=_parse_letter_version(\n                match.group(\"post_l\"), match.group(\"post_n1\") or match.group(\"post_n2\")\n            ),\n            dev=_parse_letter_version(match.group(\"dev_l\"), match.group(\"dev_n\")),\n            local=_parse_local_version(match.group(\"local\")),\n        )\n\n        # Generate a key which will be used for sorting\n        self._key = _cmpkey(\n            self._version.epoch,\n            self._version.release,\n            self._version.pre,\n            self._version.post,\n            self._version.dev,\n            self._version.local,\n        )\n\n    def __repr__(self) -> str:\n        return f\"<Version({repr(str(self))})>\"\n\n    def __str__(self) -> str:\n        parts = []\n\n        # Epoch\n        if self.epoch != 0:\n            parts.append(f\"{self.epoch}!\")\n\n        # Release segment\n        parts.append(\".\".join(str(x) for x in self.release))\n\n        # Pre-release\n        if self.pre is not None:\n            parts.append(\"\".join(str(x) for x in self.pre))\n\n        # Post-release\n        if self.post is not None:\n            parts.append(f\".post{self.post}\")\n\n        # Development release\n        if self.dev is not None:\n            parts.append(f\".dev{self.dev}\")\n\n        # Local version segment\n        if self.local is not None:\n            parts.append(f\"+{self.local}\")\n\n        return \"\".join(parts)\n\n    @property\n    def epoch(self) -> int:\n        _epoch: int = self._version.epoch\n        return _epoch\n\n    @property\n    def release(self) -> Tuple[int, ...]:\n        _release: Tuple[int, ...] 
= self._version.release\n        return _release\n\n    @property\n    def pre(self) -> Optional[Tuple[str, int]]:\n        _pre: Optional[Tuple[str, int]] = self._version.pre\n        return _pre\n\n    @property\n    def post(self) -> Optional[int]:\n        return self._version.post[1] if self._version.post else None\n\n    @property\n    def dev(self) -> Optional[int]:\n        return self._version.dev[1] if self._version.dev else None\n\n    @property\n    def local(self) -> Optional[str]:\n        if self._version.local:\n            return \".\".join(str(x) for x in self._version.local)\n        else:\n            return None\n\n    @property\n    def public(self) -> str:\n        return str(self).split(\"+\", 1)[0]\n\n    @property\n    def base_version(self) -> str:\n        parts = []\n\n        # Epoch\n        if self.epoch != 0:\n            parts.append(f\"{self.epoch}!\")\n\n        # Release segment\n        parts.append(\".\".join(str(x) for x in self.release))\n\n        return \"\".join(parts)\n\n    @property\n    def is_prerelease(self) -> bool:\n        return self.dev is not None or self.pre is not None\n\n    @property\n    def is_postrelease(self) -> bool:\n        return self.post is not None\n\n    @property\n    def is_devrelease(self) -> bool:\n        return self.dev is not None\n\n    @property\n    def major(self) -> int:\n        return self.release[0] if len(self.release) >= 1 else 0\n\n    @property\n    def minor(self) -> int:\n        return self.release[1] if len(self.release) >= 2 else 0\n\n    @property\n    def micro(self) -> int:\n        return self.release[2] if len(self.release) >= 3 else 0\n\n\ndef _parse_letter_version(\n    letter: str,\n    number: Union[str, bytes, SupportsInt],\n) -> Optional[Tuple[str, int]]:\n\n    if letter:\n        # We consider there to be an implicit 0 in a pre-release if there is\n        # not a numeral associated with it.\n        if number is None:\n            number = 0\n\n        # We normalize any letters to their lower case form\n        letter = letter.lower()\n\n        # We consider some words to be alternate spellings of other words and\n        # in those cases we want to normalize the spellings to our preferred\n        # spelling.\n        if letter == \"alpha\":\n            letter = \"a\"\n        elif letter == \"beta\":\n            letter = \"b\"\n        elif letter in [\"c\", \"pre\", \"preview\"]:\n            letter = \"rc\"\n        elif letter in [\"rev\", \"r\"]:\n            letter = \"post\"\n\n        return letter, int(number)\n    if not letter and number:\n        # We assume if we are given a number, but we are not given a letter\n        # then this is using the implicit post release syntax (e.g. 
1.0-1)\n        letter = \"post\"\n\n        return letter, int(number)\n\n    return None\n\n\n_local_version_separators = re.compile(r\"[\\._-]\")\n\n\ndef _parse_local_version(local: str) -> Optional['LocalType']:\n    \"\"\"\n    Takes a string like abc.1.twelve and turns it into (\"abc\", 1, \"twelve\").\n    \"\"\"\n    if local is not None:\n        return tuple(\n            part.lower() if not part.isdigit() else int(part)\n            for part in _local_version_separators.split(local)\n        )\n    return None\n\n\ndef _cmpkey(\n        epoch: int,\n        release: Tuple[int, ...],\n        pre: Optional[Tuple[str, int]],\n        post: Optional[Tuple[str, int]],\n        dev: Optional[Tuple[str, int]],\n        local: Optional[Tuple['SubLocalType']],\n) -> 'CmpKey':\n\n    # When we compare a release version, we want to compare it with all of the\n    # trailing zeros removed. So we'll reverse the list, drop all the\n    # now-leading zeros until we come to something non-zero, then re-reverse\n    # the rest back into the correct order, make it a tuple, and use that for\n    # our sorting key.\n    _release = tuple(\n        reversed(list(dropwhile(lambda x: x == 0, reversed(release))))\n    )\n\n    # We need to \"trick\" the sorting algorithm to put 1.0.dev0 before 1.0a0.\n    # We'll do this by abusing the pre segment, but we _only_ want to do this\n    # if there is not a pre or a post segment. If we have one of those then\n    # the normal sorting rules will handle this case correctly.\n    if pre is None and post is None and dev is not None:\n        _pre: PrePostDevType = NegativeInfinity\n    # Versions without a pre-release (except as noted above) should sort after\n    # those with one.\n    elif pre is None:\n        _pre = Infinity\n    else:\n        _pre = pre\n\n    # Versions without a post segment should sort before those with one.\n    if post is None:\n        _post: PrePostDevType = NegativeInfinity\n\n    else:\n        _post = post\n\n    # Versions without a development segment should sort after those with one.\n    if dev is None:\n        _dev: PrePostDevType = Infinity\n\n    else:\n        _dev = dev\n\n    if local is None:\n        # Versions without a local segment should sort before those with one.\n        _local: LocalType = NegativeInfinity\n    else:\n        # Versions with a local segment need that segment parsed to implement\n        # the sorting rules in PEP440.\n        # - Alphanumeric segments sort before numeric segments\n        # - Alphanumeric segments sort lexicographically\n        # - Numeric segments sort numerically\n        # - Shorter versions sort before longer versions when the prefixes\n        #   match exactly\n        _local = tuple(\n            (i, \"\") if isinstance(i, int) else (NegativeInfinity, i) for i in local\n        )\n\n    return epoch, _release, _pre, _post, _dev, _local\n"
  },
  {
    "path": "src/hangar/backends/__init__.py",
    "content": "\"\"\"Definition and dynamic routing to Hangar backend implementations.\n\nThis module defines the available backends for a Hangar installation & provides\ndynamic routing of method calls to the appropriate backend from a stored record\nspecification.\n\nIdentification\n--------------\n\nA two character ascii code identifies which backend/version some record belongs\nto. Valid characters are the union of ``ascii_lowercase``, ``ascii_uppercase``,\nand ``ascii_digits``:\n\n.. centered:: ``abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789``\n\nThough stored as bytes in the backend, we use human readable characters (and not\nunprintable bytes) to aid in human tasks like developer database dumps and\ndebugging. The characters making up the two digit code have the following\nsymantic meanings:\n\n   *  First Character (element 0) indicates the ``backend type`` used.\n\n   *  Second character (element 1) indicates the ``version`` of the backend type\n      which should be used to parse the specification & accesss data (more on\n      this later)\n\nThe number of codes possible (a 2-choice permutation with repetition) is: 3844\nwhich we anticipate to be more then sufficient long into the future. As a\nconvention, the range of values in which the first digit of the code falls into\ncan be used to identify the storage medium location:\n\n   *  Lowercase ``ascii_letters`` & digits ``[0, 1, 2, 3, 4]`` -> reserved for\n      backends handling data on the local disk.\n\n   *  Uppercase ``ascii_letters`` & digits ``[5, 6, 7, 8, 9]`` -> reserved for\n      backends referring to data residing on a remote server.\n\nThis is not a hard and fast rule though, and can be changed in the future if the\nneed arises.\n\nProcess & Guarantees\n--------------------\n\nIn order to maintain backwards compatibility across versions of Hangar into the\nfuture the following ruleset is specified and MUST BE HONORED:\n\n*  When a new backend is proposed, the contributor(s) provide the class with a\n   meaningful name (``HDF5``, ``NUMPY``, ``TILEDB``, etc) identifying the\n   backend to Hangar developers. The review team will provide:\n\n   -  ``backend type`` code\n   -  ``version`` code\n\n   which all records related to that implementation identify themselves with. In\n   addition, Externally facing classes / methods go by a canonical name which is\n   the concatenation of the ``meaningful name`` and assigned ``\"format code\"``\n   ie. for ``backend name: 'NUMPY'`` assigned ``type code: '1'`` and ``version\n   code: '0'`` must start external method/class names with: ``NUMPY_10_foo``\n\n*  Once a new backend is accepted, the code assigned to it is PERMANENT &\n   UNCHANGING. The same code cannot be used in the future for other backends.\n\n*  Each backend independently determines the information it needs to log/store\n   to uniquely identify and retrieve a sample stored by it. There is no standard\n   format, each is free to define whatever fields they find most convenient.\n   Unique encode/decode methods are defined in order to serialize this\n   information to bytes and then reconstruct the information later. These bytes\n   are what are passed in when a retrieval request is made, and returned when a\n   storage request for some piece of data is performed.\n\n*  Once accepted, The record format specified (ie. the byte representation\n   described above) cannot be modified in any way. 
This must remain permanent!\n\n*  Backend (internal) methods can be updated, optimized, and/or changed at any\n   time so long as:\n\n   *  No changes to the record format specification are introduced\n\n   *  Data stored via any previous iteration of the backend's accessor methods\n      can be retrieved bitwise exactly by the \"updated\" version.\n\nBefore proposing a new backend or making changes to this file, please consider\nreaching out to the Hangar core development team so we can guide you through the\nprocess.\n\"\"\"\nimport string\nfrom typing import Dict\n\nfrom .specs import (\n    HDF5_00_DataHashSpec,\n    HDF5_01_DataHashSpec,\n    NUMPY_10_DataHashSpec,\n    LMDB_30_DataHashSpec,\n    LMDB_31_DataHashSpec,\n    REMOTE_50_DataHashSpec,\n)\nfrom .specparse import backend_decoder\n\nfrom .hdf5_00 import HDF5_00_FileHandles, HDF5_00_Options\nfrom .hdf5_01 import HDF5_01_FileHandles, HDF5_01_Options\nfrom .lmdb_30 import LMDB_30_FileHandles, LMDB_30_Options\nfrom .lmdb_31 import LMDB_31_FileHandles, LMDB_31_Options\nfrom .numpy_10 import NUMPY_10_FileHandles, NUMPY_10_Options\nfrom .remote_50 import REMOTE_50_Handler, REMOTE_50_Options\n\n\nBACKEND_ACCESSOR_MAP = {\n    # LOCALS -> [00:50] + ['aa':'zz']\n    '00': HDF5_00_FileHandles,\n    '01': HDF5_01_FileHandles,\n    '10': NUMPY_10_FileHandles,\n    '30': LMDB_30_FileHandles,\n    '31': LMDB_31_FileHandles,\n    # REMOTES -> [50:99] + ['AA':'ZZ']\n    '50': REMOTE_50_Handler,\n}\n\nBACKEND_OPTIONS_MAP = {\n    '00': HDF5_00_Options,\n    '01': HDF5_01_Options,\n    '10': NUMPY_10_Options,\n    '30': LMDB_30_Options,\n    '31': LMDB_31_Options,\n    '50': REMOTE_50_Options,\n}\n\n_local_prefixes = string.digits[0:5] + string.ascii_lowercase\n\nBACKEND_IS_LOCAL_MAP: Dict[str, bool] = {\n    k: bool(k[0] in _local_prefixes) for k in BACKEND_ACCESSOR_MAP.keys()\n}\n\n__all__ = [\n    'backend_decoder', 'HDF5_00_DataHashSpec', 'HDF5_01_DataHashSpec',\n    'NUMPY_10_DataHashSpec', 'LMDB_30_DataHashSpec', 'REMOTE_50_DataHashSpec',\n    'LMDB_31_DataHashSpec', 'BACKEND_OPTIONS_MAP', 'BACKEND_ACCESSOR_MAP',\n    'BACKEND_IS_LOCAL_MAP',\n]\n"
  },
  {
    "path": "src/hangar/backends/chunk.py",
    "content": "\"\"\"\nPortions of this code have been taken and modified from the \"PyTables\" project.\n\nURL:      https://github.com/PyTables/PyTables\nFile:     tables/leaf.py\nCommit:   1e7b14e87507c2392265321fe18b2f1f5920ea7f\nAccessed: 23 JAN 2020\n\nPyTables License\n-------------------------------------------------------------------------------\nLicense: BSD\nURL:     https://github.com/PyTables/PyTables/blob/1e7b14e875/LICENSE.txt\n\"\"\"\nimport numpy as np\nimport math\n\n\nSizeType = np.int64\n\n\ndef _csformula(expected_mb):\n    \"\"\"Return the fitted chunksize for expected_mb.\"\"\"\n\n    # For a basesize of 8 KB, this will return:\n    # 8 KB for datasets <= 1 MB\n    # 1 MB for datasets >= 10 TB\n    basesize = 8 * 1024  # 8 KB is a good minimum\n    return basesize * int(2 ** math.log10(expected_mb))\n\n\ndef _limit_es(expected_mb):\n    \"\"\"Protection against creating too small or too large chunks.\"\"\"\n\n    if expected_mb < 1:  # < 1 MB\n        expected_mb = 1\n    elif expected_mb > 10 ** 7:  # > 10 TB\n        expected_mb = 10 ** 7\n    return expected_mb\n\n\ndef _calc_chunksize(expected_mb):\n    \"\"\"Compute the optimum HDF5 chunksize for I/O purposes.\n\n    Rational: HDF5 takes the data in bunches of chunksize length to write the\n    on disk. A BTree in memory is used to map structures on disk. The more\n    chunks that are allocated for a dataset the larger the B-tree. Large\n    B-trees take memory and causes file storage overhead as well as more disk\n    I/O and higher contention for the meta data cache.  You have to balance\n    between memory and I/O overhead (small B-trees) and time to access to data\n    (big B-trees). The tuning of the chunksize parameter affects the\n    performance and the memory consumed. This is based on my own experiments\n    and, as always, your mileage may vary.\n    \"\"\"\n\n    expected_mb = _limit_es(expected_mb)\n    zone = int(math.log10(expected_mb))\n    expected_mb = 10 ** zone\n    chunksize = _csformula(expected_mb)\n    # XXX: Multiply by 8 seems optimal for sequential access\n    return chunksize * 24\n\n\ndef _rowsize(shape, maindim, itemsize):\n    \"\"\"\"The size of the rows in bytes in dimensions orthogonal to *maindim*.\"\n\n    shape:\n        Shape of the sample to fit in the row\n\n    maindim:\n        The dimension along which iterators work. Its value is 0 (i.e. the first\n        dimension) when the dataset is not extendable, and self.extdim (where\n        available) for extendable ones.\n\n    itemsize:\n        nbytes of each element\n\n    The meaning of *atomic* is that individual elements of a cell can not be\n    extracted directly by indexing (i.e.  __getitem__()) the dataset; e.g. if a\n    dataset has shape (2, 2) and its atoms have shape (3,), to get the third\n    element of the cell at (1, 0) one should use dataset[1,0][2] instead of\n    dataset[1,0,2].\n    \"\"\"\n    rowsize = itemsize\n    for i, dim in enumerate(shape):\n        if i != maindim:\n            rowsize *= dim\n    return rowsize\n\n\ndef calc_chunkshape(shape, expectedrows, itemsize, maindim):\n    \"\"\"Calculate the shape for the HDF5 chunk.\n\n    shape:\n        Shape of the sample to fit in the row\n\n    expectedrows:\n        how many samples will fit into the file container\n\n    itemsize:\n        nbytes of each element\n\n    maindim:\n        The dimension along which iterators work. Its value is 0 (i.e. 
the first\n        dimension) when the dataset is not extendable, and self.extdim (where\n        available) for extendable ones.\n\n        may want to set to shape.index(max(shape))\n    \"\"\"\n\n    # In case of a scalar shape, return the unit chunksize\n    if shape == ():\n        return (SizeType(1),)\n\n    MB = 1024 * 1024\n    # if shape is sufficiently small, no need to further chunk\n    # At time of writing, set to be less than 1MB since that is\n    # the limit of the hdf5 chunk cache.\n    if ((np.prod(shape) * itemsize) < MB) and (shape != ()):\n        return shape\n\n    # Compute the chunksize\n    rsize = _rowsize(shape, maindim, itemsize)\n    expected_mb = (expectedrows * rsize) // MB\n    chunksize = _calc_chunksize(expected_mb)\n\n    # Compute the chunknitems\n    chunknitems = chunksize // itemsize\n    # Safeguard against itemsizes being extremely large\n    if chunknitems == 0:\n        chunknitems = 1\n    chunkshape = list(shape)\n    # Check whether trimming the main dimension is enough\n    chunkshape[maindim] = 1\n    newchunknitems = np.prod(chunkshape, dtype=SizeType)\n    if newchunknitems <= chunknitems:\n        chunkshape[maindim] = chunknitems // newchunknitems\n    else:\n        # No, so start trimming other dimensions as well\n        for j in range(len(chunkshape)):\n            # Check whether trimming this dimension is enough\n            chunkshape[j] = 1\n            newchunknitems = np.prod(chunkshape, dtype=SizeType)\n            if newchunknitems <= chunknitems:\n                chunkshape[j] = chunknitems // newchunknitems\n                break\n        else:\n            # Oops, we ran out of the loop without a break\n            # Set the last dimension to chunknitems\n            chunkshape[-1] = chunknitems\n\n    # safeguard against outputting chunks which are larger than shape\n    if chunkshape[maindim] > shape[maindim]:\n        chunkshape[maindim] = shape[maindim]\n\n    return tuple(SizeType(s) for s in chunkshape)\n"
  },
  {
    "path": "src/hangar/backends/hdf5_00.py",
    "content": "\"\"\"Local HDF5 Backend Implementation, Identifier: ``HDF5_00``\n\nBackend Identifiers\n===================\n\n*  Backend: ``0``\n*  Version: ``0``\n*  Format Code: ``00``\n*  Canonical Name: ``HDF5_00``\n\nStorage Method\n==============\n\n*  Data is written to specific subarray indexes inside an HDF5 \"dataset\" in a\n   single HDF5 File.\n\n*  In each HDF5 File there are ``COLLECTION_COUNT`` \"datasets\" (named ``[\"0\" :\n   \"{COLLECTION_COUNT}\"]``). These are referred to as ``\"dataset number\"``\n\n*  Each dataset is a zero-initialized array of:\n\n   *  ``dtype: {schema_dtype}``; ie ``np.float32`` or ``np.uint8``\n\n   *  ``shape: (COLLECTION_SIZE, *{schema_shape.size})``; ie ``(500, 10)`` or\n      ``(500, 300)``. The first index in the dataset is referred to as a\n      ``collection index``. See technical note below for detailed explanation\n      on why the flatten operaiton is performed.\n\n*  Compression Filters, Chunking Configuration/Options are applied globally for\n   all ``datasets`` in a file at dataset creation time.\n\n*  On read and write of all samples the xxhash64_hexdigest is calculated for\n   the raw array bytes. This is to ensure that all data in == data out of the\n   hdf5 files. That way even if a file is manually edited (bypassing fletcher32\n   filter check) we have a quick way to tell that things are not as they should\n   be.\n\nCompression Options\n===================\n\nAccepts dictionary containing keys\n\n*  ``backend`` == ``\"00\"``\n*  ``complib``\n*  ``complevel``\n*  ``shuffle``\n\nBlosc-HDF5\n\n*  ``complib`` valid values:\n\n   *  ``'blosc:blosclz'``,\n   *  ``'blosc:lz4'``,\n   *  ``'blosc:lz4hc'``,\n   *  ``'blosc:zlib'``,\n   *  ``'blosc:zstd'``\n\n*  ``complevel`` valid values: [0, 9] where 0 is \"no compression\" and 9 is\n   \"most compression\"\n\n*  ``shuffle`` valid values:\n\n   *  ``None``\n   *  ``'none'``\n   *  ``'byte'``\n   *  ``'bit'``\n\n\nLZF Filter\n\n*  ``'complib' == 'lzf'``\n*  ``'shuffle'`` one of ``[False, None, 'none', True, 'byte']``\n*  ``'complevel'`` one of ``[False, None, 'none']``\n\nGZip Filter\n\n*  ``'complib' == 'gzip'``\n*  ``'shuffle'`` one of ``[False, None, 'none', True, 'byte']``\n*  ``complevel`` valid values: [0, 9] where 0 is \"no compression\" and 9 is\n   \"most compression\"\n\n\nRecord Format\n=============\n\nFields Recorded for Each Array\n------------------------------\n\n*  Format Code\n*  File UID\n*  xxhash64_hexdigest (ie. checksum)\n*  Dataset Number (``0:COLLECTION_COUNT`` dataset selection)\n*  Dataset Index (``0:COLLECTION_SIZE`` dataset subarray selection)\n*  Subarray Shape\n\n\nExamples\n--------\n\n1)  Adding the first piece of data to a file:\n\n    *  Array shape (Subarray Shape): (10, 10)\n    *  File UID: \"rlUK3C\"\n    *  xxhash64_hexdigest: 8067007c0f05c359\n    *  Dataset Number: 16\n    *  Collection Index: 105\n\n    ``Record Data => \"00:rlUK3C:8067007c0f05c359:16:105:10 10\"``\n\n1)  Adding to a piece of data to a the middle of a file:\n\n    *  Array shape (Subarray Shape): (20, 2, 3)\n    *  File UID: \"rlUK3C\"\n    *  xxhash64_hexdigest: b89f873d3d153a9c\n    *  Dataset Number: \"3\"\n    *  Collection Index: 199\n\n    ``Record Data => \"00:rlUK3C:b89f873d3d153a9c:8:199:20 2 3\"``\n\n\nTechnical Notes\n===============\n\n*  Files are read only after initial creation/writes. 
Only a write-enabled\n   checkout can open an HDF5 file in ``\"w\"`` or ``\"a\"`` mode, and writer\n   checkouts create new files on every checkout, and make no attempt to fill in\n   unset locations in previous files. This is not an issue as no disk space is\n   used until data is written to the initially created \"zero-initialized\"\n   collection datasets.\n\n*  On write: Single Writer Multiple Reader (``SWMR``) mode is set to ensure that\n   improper closing (not calling the ``.close()`` method) does not corrupt any\n   data which had been previously flushed to the file.\n\n*  On read: SWMR is set to allow multiple readers (in different threads /\n   processes) to read from the same file. File handle serialization is handled\n   via custom python ``pickle`` serialization/reduction logic which is\n   implemented by the high level ``pickle`` reduction ``__setstate__()``,\n   ``__getstate__()`` class methods.\n\n*  An optimization is performed in order to increase the read / write\n   performance of variable-shaped datasets. Due to the way that we initialize\n   an entire HDF5 file with all datasets pre-created (to the size of the max\n   subarray shape), we need to ensure that storing smaller-sized arrays (in a\n   variable-sized Hangar Column) would be effective. Because we use chunked\n   storage, certain dimensions which are incomplete could have potentially\n   required writes to chunks which are primarily empty (worst case \"C\" index\n   ordering), degrading read / write speeds significantly.\n\n   To overcome this, we create HDF5 datasets which have ``COLLECTION_SIZE``\n   first dimension size, and only ONE second dimension of size\n   ``schema_shape.size()`` (ie. product of all dimensions). For example, an\n   array schema with shape (10, 10, 3) would be stored in an HDF5 dataset of\n   shape (COLLECTION_SIZE, 300). Chunk sizes are chosen to align on the first\n   dimension with a second dimension of size which fits the total data into L2\n   CPU Cache (< 256 KB). On write, we use the ``np.ravel`` function to\n   construct a \"view\" (not copy) of the array as a 1D array, and then on read\n   we reshape the array to the recorded size (a copyless \"view-only\"\n   operation). This is part of the reason that we only accept C-ordered arrays\n   as input to Hangar.\n\"\"\"\nimport logging\nimport os\nfrom collections import ChainMap\nfrom contextlib import suppress\nfrom functools import partial\nfrom pathlib import Path\nfrom typing import MutableMapping, Tuple, Optional, Union, Callable\n\nimport h5py\nimport numpy as np\n\ntry:\n    # hdf5plugin warns if a filter is already loaded.\n    _logger = logging.getLogger('hdf5plugin')\n    _initialLevel = _logger.getEffectiveLevel()\n    _logger.setLevel(logging.ERROR)\n    import hdf5plugin\n    if 'blosc' not in hdf5plugin.FILTERS:\n        raise ImportError(f'BLOSC unavailable via hdf5plugin: {hdf5plugin.FILTERS}')\nfinally:\n    _logger.setLevel(_initialLevel)\nfrom xxhash import xxh64_hexdigest\n\nfrom .specs import HDF5_00_DataHashSpec\nfrom .. 
\n\"\"\"\nimport logging\nimport os\nfrom collections import ChainMap\nfrom contextlib import suppress\nfrom functools import partial\nfrom pathlib import Path\nfrom typing import MutableMapping, Tuple, Optional, Union, Callable\n\nimport h5py\nimport numpy as np\n\ntry:\n    # hdf5plugin warns if a filter is already loaded.\n    _logger = logging.getLogger('hdf5plugin')\n    _initialLevel = _logger.getEffectiveLevel()\n    _logger.setLevel(logging.ERROR)\n    import hdf5plugin\n    if 'blosc' not in hdf5plugin.FILTERS:\n        raise ImportError(f'BLOSC unavailable via hdf5plugin: {hdf5plugin.FILTERS}')\nfinally:\n    _logger.setLevel(_initialLevel)\nfrom xxhash import xxh64_hexdigest\n\nfrom .specs import HDF5_00_DataHashSpec\nfrom .. import __version__\nfrom ..optimized_utils import SizedDict\nfrom ..constants import DIR_DATA_REMOTE, DIR_DATA_STAGE, DIR_DATA_STORE, DIR_DATA\nfrom ..utils import random_string, set_blosc_nthreads\nfrom ..optimized_utils import find_next_prime\nfrom ..op_state import reader_checkout_only, writer_checkout_only\nfrom ..typesystem import Descriptor, OneOf, DictItems, SizedIntegerTuple, checkedmeta\n\nset_blosc_nthreads()\n\n# ----------------------------- Configuration ---------------------------------\n\n_FmtCode = '00'\n\n# contents of a single hdf5 file\nCOLLECTION_SIZE = 250\nCOLLECTION_COUNT = 100\n\n# chunking options for compression schemes\nCHUNK_MAX_NBYTES = 255_000  # < 256 KB to fit in L2 CPU Cache\nCHUNK_MAX_RDCC_NBYTES = 100_000_000\nCHUNK_RDCC_W0 = 0.75\n\n# -------------------------------- Parser Implementation ----------------------\n\n\ndef hdf5_00_encode(uid: str, cksum: str, dset: int, dset_idx: int, shape: Tuple[int, ...]) -> bytes:\n    \"\"\"converts the hdf5 data hash spec to an appropriate db value\n\n    Parameters\n    ----------\n    uid : str\n        the file name prefix which the data is written to.\n    cksum : str\n        xxhash_64.hex_digest checksum of the data bytes in numpy array form.\n    dset : int\n        collection (ie. hdf5 dataset) name to find this data piece.\n    dset_idx : int\n        collection first axis index in which this data piece resides.\n    shape : Tuple[int, ...]\n        shape of the data sample written to the collection idx. ie:\n        what subslices of the hdf5 dataset should be read to retrieve\n        the sample as recorded.\n\n    Returns\n    -------\n    bytes\n        hash data db value recording all input specifications.\n    \"\"\"\n    shape_str = \" \".join([str(i) for i in shape])\n    return f'00:{uid}:{cksum}:{dset}:{dset_idx}:{shape_str}'.encode()
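\n\n\n# A hypothetical sanity check of the encoding above; the values mirror the\n# module docstring example (illustrative only, not executed anywhere):\n#\n#   >>> hdf5_00_encode('rlUK3C', '8067007c0f05c359', 16, 105, (10, 10))\n#   b'00:rlUK3C:8067007c0f05c359:16:105:10 10'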
\n\n\n# ------------------------- Accessor Object -----------------------------------\n\n\n@DictItems(\n    expected_keys_required={'complib': True, 'complevel': True, 'shuffle': True},\n    expected_values={\n        'complib': ['blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:zlib', 'blosc:zstd'],\n        'complevel': [i for i in range(10)],\n        'shuffle': [None, 'none', 'byte', 'bit']})\nclass BloscCompressionOptions(Descriptor):\n    pass\n\n\n@DictItems(\n    expected_keys_required={'complib': True, 'complevel': True, 'shuffle': True},\n    expected_values={\n        'complib': ['gzip'], 'complevel': [i for i in range(10)], 'shuffle': [True, False]})\nclass GzipCompressionOptions(Descriptor):\n    pass\n\n\n@DictItems(\n    expected_keys_required={'complib': True, 'complevel': False, 'shuffle': True},\n    expected_values={\n        'complib': ['lzf'], 'complevel': ['none', None], 'shuffle': [True, False]})\nclass LzfCompressionOptions(Descriptor):\n    pass\n\n\n@OneOf(list(map(lambda x: np.dtype(x).name, [\n        np.bool, np.uint8, np.uint16, np.uint32, np.uint64, np.int8, np.int16,\n        np.int32, np.int64, np.float16, np.float32, np.float64, np.longdouble])))\nclass AllowedDtypes(Descriptor):\n    \"\"\"\n    Note. np.longdouble since np.float128 is not guaranteed to be available on\n    all systems; this is a particular issue with some Windows numpy builds.\n    \"\"\"\n    pass\n\n\nclass HDF5_00_Options(metaclass=checkedmeta):\n    _shape = SizedIntegerTuple(size=32)\n    _dtype = AllowedDtypes()\n    _lzf = LzfCompressionOptions()\n    _gzip = GzipCompressionOptions()\n    _blosc = BloscCompressionOptions()\n    _avail_filters = ('_lzf', '_gzip', '_blosc')\n\n    def __init__(self, backend_options, dtype, shape, *args, **kwargs):\n        self._shape = shape\n        self._dtype = dtype\n        self._selected_filter = None\n        if backend_options is None:\n            backend_options = self.default_options\n\n        for filter_attr in self._avail_filters:\n            with suppress(KeyError, ValueError):\n                setattr(self, filter_attr, backend_options)\n                self._selected_filter = filter_attr\n                break\n        else:  # N.B. for-else loop (ie. \"no-break\")\n            raise ValueError(f'Invalid backend_options {backend_options}')\n        self._verify_data_nbytes_larger_than_clib_min()\n\n    def _verify_data_nbytes_larger_than_clib_min(self):\n        \"\"\"blosc clib should not be used if data buffer size < 16 bytes.\n\n        Raises\n        ------\n        ValueError:\n            if the data size is not valid for the clib\n        \"\"\"\n        if self._selected_filter in ['_blosc', None]:\n            num_items = np.prod(self._shape)\n            itemsize = np.dtype(self._dtype).itemsize\n            nbytes = itemsize * num_items\n            if nbytes <= 16:\n                raise ValueError('blosc clib requires data buffer size > 16 bytes')\n\n    @property\n    def default_options(self):\n        if 'blosc' in hdf5plugin.FILTERS:\n            try:\n                self._verify_data_nbytes_larger_than_clib_min()\n                return {'complib': 'blosc:lz4hc', 'complevel': 5, 'shuffle': 'byte'}\n            except ValueError:\n                pass\n        return {'complib': 'lzf', 'complevel': None, 'shuffle': True}\n\n    @property\n    def backend_options(self):\n        return getattr(self, self._selected_filter)\n\n    @property\n    def init_requires(self):\n        return ('repo_path', 'schema_shape', 'schema_dtype')\n\n\nHDF5_00_MapTypes = MutableMapping[str, Union[h5py.File, Callable[[], h5py.File]]]\n\n\nclass HDF5_00_FileHandles(object):\n    \"\"\"Manage HDF5 file handles.\n\n    When in SWMR-write mode, no more than a single file handle can be in the\n    \"writeable\" state. 
This is an issue when multiple columns need to\n    write to the same column schema.\n    \"\"\"\n\n    def __init__(self, repo_path: Path, schema_shape: tuple, schema_dtype: np.dtype):\n        self.path: Path = repo_path\n        self.schema_shape: tuple = schema_shape\n        self.schema_dtype: np.dtype = schema_dtype\n        self._dflt_backend_opts: Optional[dict] = None\n\n        self.rFp: HDF5_00_MapTypes = {}\n        self.wFp: HDF5_00_MapTypes = {}\n        self.Fp: HDF5_00_MapTypes = ChainMap(self.rFp, self.wFp)\n        self.rDatasets = SizedDict(maxsize=100)\n        self.wdset: Optional[h5py.Dataset] = None\n\n        self.mode: Optional[str] = None\n        self.hIdx: Optional[int] = None\n        self.w_uid: Optional[str] = None\n        self.hMaxSize: Optional[int] = None\n        self.hNextPath: Optional[int] = None\n        self.hColsRemain: Optional[int] = None\n\n        self.STAGEDIR: Path = Path(self.path, DIR_DATA_STAGE, _FmtCode)\n        self.REMOTEDIR: Path = Path(self.path, DIR_DATA_REMOTE, _FmtCode)\n        self.STOREDIR: Path = Path(self.path, DIR_DATA_STORE, _FmtCode)\n        self.DATADIR: Path = Path(self.path, DIR_DATA, _FmtCode)\n        self.DATADIR.mkdir(exist_ok=True)\n\n    def __enter__(self):\n        return self\n\n    def __exit__(self, *exc):\n        if self.w_uid in self.wFp:\n            self.wFp[self.w_uid]['/'].attrs.modify('next_location', (self.hNextPath, self.hIdx))\n            self.wFp[self.w_uid]['/'].attrs.modify('collections_remaining', self.hColsRemain)\n            self.wFp[self.w_uid].flush()\n\n    @reader_checkout_only\n    def __getstate__(self) -> dict:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.close()\n        state = self.__dict__.copy()\n        del state['rFp']\n        del state['wFp']\n        del state['Fp']\n        del state['rDatasets']\n        del state['wdset']\n        return state\n\n    def __setstate__(self, state: dict) -> None:  # pragma: no cover\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.__dict__.update(state)\n        self.rFp = {}\n        self.wFp = {}\n        self.Fp = ChainMap(self.rFp, self.wFp)\n        self.rDatasets = {}\n        self.wdset = None\n        self.open(mode=self.mode)\n\n    @property\n    def backend_opts(self):\n        return self._dflt_backend_opts\n\n    @writer_checkout_only\n    def _backend_opts_set(self, val):\n        \"\"\"Nonstandard descriptor method. See notes in ``backend_opts.setter``.\n        \"\"\"\n        self._dflt_backend_opts = val\n        return\n\n    @backend_opts.setter\n    def backend_opts(self, value):\n        \"\"\"\n        Using a separate setter method (with the ``@writer_checkout_only``\n        decorator applied) due to a bug in python <3.8.\n\n        From: https://bugs.python.org/issue19072\n            > The classmethod decorator when applied to a function of a class,\n            > does not honour the descriptor binding protocol for whatever it\n            > wraps. 
This means it will fail when applied around a function which\n            > has a decorator already applied to it and where that decorator\n            > expects that the descriptor binding protocol is executed in order\n            > to properly bind the function to the class.\n        \"\"\"\n        return self._backend_opts_set(value)\n\n    def open(self, mode: str, *, remote_operation: bool = False):\n        \"\"\"Open an hdf5 file handle in the Handler Singleton\n\n        Parameters\n        ----------\n        mode : str\n            one of `r` or `a` for read only / read-write.\n        remote_operation : optional, kwarg only, bool\n            if this hdf5 data is being created from a remote fetch operation, then\n            we don't open any files for reading, and only open files for writing\n            which exist in the remote data dir. (default is false, which means that\n            write operations use the stage data dir and read operations use data store\n            dir)\n        \"\"\"\n        self.mode = mode\n        if self.mode == 'a':\n            process_dir = self.REMOTEDIR if remote_operation else self.STAGEDIR\n            process_dir.mkdir(exist_ok=True)\n            for uidpth in process_dir.iterdir():\n                if uidpth.suffix == '.hdf5':\n                    file_pth = self.DATADIR.joinpath(uidpth.name)\n                    self.rFp[uidpth.stem] = partial(\n                        h5py.File, file_pth, 'r', swmr=True, libver='latest')\n\n        if not remote_operation:\n            if not self.STOREDIR.is_dir():\n                return\n            for uidpth in self.STOREDIR.iterdir():\n                if uidpth.suffix == '.hdf5':\n                    file_pth = self.DATADIR.joinpath(uidpth.name)\n                    self.rFp[uidpth.stem] = partial(\n                        h5py.File, file_pth, 'r', swmr=True, libver='latest')\n\n    def close(self):\n        \"\"\"Close a file handle after writes have been completed\n\n        behavior changes depending on write-enable or read-only file\n\n        Returns\n        -------\n        bool\n            True if success, otherwise False.\n        \"\"\"\n        if self.mode == 'a':\n            if self.w_uid in self.wFp:\n                self.wFp[self.w_uid]['/'].attrs.modify('next_location', (self.hNextPath, self.hIdx))\n                self.wFp[self.w_uid]['/'].attrs.modify('collections_remaining', self.hColsRemain)\n                self.wFp[self.w_uid].flush()\n            for uid in list(self.wFp.keys()):\n                with suppress(AttributeError):\n                    self.wFp[uid].close()\n                del self.wFp[uid]\n            self.wdset = None\n            self.hMaxSize = None\n            self.hNextPath = None\n            self.hIdx = None\n            self.hColsRemain = None\n            self.w_uid = None\n\n        for uid in list(self.rFp.keys()):\n            with suppress(AttributeError):\n                self.rFp[uid].close()\n            del self.rFp[uid]\n        self.rDatasets = {}\n\n    @staticmethod\n    def delete_in_process_data(repo_path: Path, *, remote_operation=False) -> None:\n        \"\"\"Removes some set of files entirely from the stage/remote directory.\n\n        DANGER ZONE. 
This should essentially only be used to perform hard resets\n        of the repository state.\n\n        Parameters\n        ----------\n        repo_path : Path\n            path to the repository on disk\n        remote_operation : optional, kwarg only, bool\n            If true, modify contents of the remote_dir, if false (default) modify\n            contents of the staging directory.\n        \"\"\"\n        data_dir = Path(repo_path, DIR_DATA, _FmtCode)\n        PDIR = DIR_DATA_STAGE if not remote_operation else DIR_DATA_REMOTE\n        process_dir = Path(repo_path, PDIR, _FmtCode)\n        if not process_dir.is_dir():\n            return\n\n        for uidpth in process_dir.iterdir():\n            if uidpth.suffix == '.hdf5':\n                os.remove(process_dir.joinpath(uidpth.name))\n                os.remove(data_dir.joinpath(uidpth.name))\n        os.rmdir(process_dir)\n\n    @staticmethod\n    def _dataset_opts(complib: str, complevel: int, shuffle: Union[bool, str]) -> dict:\n        \"\"\"specify compression options for the hdf5 dataset.\n\n        .. seealso:: :func:`_blosc_opts`\n\n        To enable blosc compression, use the conda-forge `blosc-hdf5-plugin` package.\n\n        .. seealso::\n\n        * https://github.com/conda-forge/staged-recipes/pull/7650\n        * https://github.com/h5py/h5py/issues/611\n\n        Parameters\n        ----------\n        complib : str\n            the compression lib to use, one of ['lzf', 'gzip', 'blosc:blosclz',\n            'blosc:lz4', 'blosc:lz4hc', 'blosc:zlib', 'blosc:zstd']\n        complevel : int\n            compression level to specify (accepts values [0, 9] for all except 'lzf'\n            where no complevel is accepted)\n        shuffle : bool\n            if True or 'byte', enable the byte shuffle filter; if using blosc\n            compression, 'bit' shuffle is accepted as well. False or None\n            indicates no shuffle should be applied.
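\n\n        Examples\n        --------\n        Illustrative only; the exact values follow the mapping tables in the\n        function body below::\n\n            >>> HDF5_00_FileHandles._dataset_opts('blosc:lz4hc', 5, 'byte')\n            {'compression': 32001, 'compression_opts': (0, 0, 0, 0, 5, 1, 2), 'shuffle': False}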
\n        \"\"\"\n        # ---- blosc hdf5 plugin filters ----\n        _blosc_compression = {\n            'blosc:blosclz': 0,\n            'blosc:lz4': 1,\n            'blosc:lz4hc': 2,\n            # Not built 'snappy': 3,\n            'blosc:zlib': 4,\n            'blosc:zstd': 5}\n        _blosc_shuffle = {None: 0, 'none': 0, 'byte': 1, 'bit': 2}\n        _blosc_complevel = {**{i: i for i in range(10)}, None: 9, 'none': 9}\n\n        # ---- h5py built in filters ----\n        _lzf_gzip_shuffle = {None: False, False: False, 'none': False, True: True, 'byte': True}\n        _lzf_complevel = {False: None, None: None, 'none': None}\n        _gzip_complevel = {**{i: i for i in range(10)}, None: 4, 'none': 4}\n\n        if complib.startswith('blosc'):\n            args = {\n                'compression': 32001,\n                'compression_opts': (\n                    0, 0, 0, 0,\n                    _blosc_complevel[complevel],\n                    _blosc_shuffle[shuffle],\n                    _blosc_compression[complib]),\n                'shuffle': False}\n        elif complib == 'lzf':\n            args = {\n                'shuffle': _lzf_gzip_shuffle[shuffle],\n                'compression': complib,\n                'compression_opts': _lzf_complevel[complevel]}\n        elif complib == 'gzip':\n            args = {\n                'shuffle': _lzf_gzip_shuffle[shuffle],\n                'compression': complib,\n                'compression_opts': _gzip_complevel[complevel]}\n        elif complib in (None, False, 'none'):\n            args = {\n                'shuffle': False,\n                'compression': None,\n                'compression_opts': None}\n        else:\n            raise ValueError(f'unknown value for opt arg `complib`: {complib}')\n        return args\n\n    @staticmethod\n    def _chunk_opts(sample_array: np.ndarray, max_chunk_nbytes: int) -> Tuple[list, int]:\n        \"\"\"Determine the chunk shape so each array chunk fits into configured nbytes.\n\n        Currently the chunk nbytes are not user configurable. Instead the constant\n        ``CHUNK_MAX_NBYTES`` is used to determine when to split.\n\n        Parameters\n        ----------\n        sample_array : `np.array`\n            Sample array whose shape and dtype should be used as the basis of the\n            chunk shape determination\n        max_chunk_nbytes : int\n            how many bytes the array chunks should be limited to.\n\n        Returns\n        -------\n        list\n            single element list specifying the chunk size along the flattened\n            sample dimension of the dataset\n        int\n            nbytes which the chunk will fit in. Will be <= ``CHUNK_MAX_NBYTES``\n        \"\"\"\n        chunk_size = int(np.floor(max_chunk_nbytes / sample_array.itemsize))\n        if chunk_size > sample_array.size:\n            chunk_size = sample_array.size\n        chunk_shape = [chunk_size]\n        chunk_nbytes = np.zeros(shape=chunk_shape, dtype=sample_array.dtype).nbytes\n\n        return (chunk_shape, chunk_nbytes)
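\n\n    # Worked example for ``_chunk_opts`` (illustrative numbers only): for a\n    # float32 schema of shape (10, 10, 3) with CHUNK_MAX_NBYTES = 255_000,\n    #   chunk_size = floor(255_000 / 4) = 63_750, capped at sample_array.size == 300\n    #   -> chunk_shape == [300], chunk_nbytes == 300 * 4 == 1_200 bytes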
\n\n    def _create_schema(self, *, remote_operation: bool = False):\n        \"\"\"stores the shape and dtype as the schema of a column.\n\n        Parameters\n        ----------\n        remote_operation : optional, kwarg only, bool\n            if this schema is being created from a remote fetch operation, then do not\n            place the file symlink in the staging directory. Instead symlink it\n            to a special remote staging directory. (default is False, which places the\n            symlink in the stage data directory.)\n\n        Notes\n        -----\n\n        Parameters set for raw-data-chunk-cache (rdcc) values:\n\n        * rdcc_nbytes: sets the total size (measured in bytes) of the raw data chunk\n          cache for each dataset. This should be set to the size of each chunk times\n          the number of chunks that are likely to be needed in cache.\n        * rdcc_w0: sets the policy for chunks to be removed from the cache when more\n          space is needed. If set to 0, always evict the least recently used chunk in\n          cache. If set to 1, always evict the least recently used chunk which has\n          been fully read or written. If the value is between 0 and 1, the behavior\n          will be a blend of the two.\n        * rdcc_nslots: The number of chunk slots in the cache for this entire file.\n          To allow for quick lookup, a hash map is used for each chunk value. For\n          maximum performance, this value should be set to approximately 100 times\n          the number of chunks.\n\n        .. seealso::\n\n            http://docs.h5py.org/en/stable/high/file.html#chunk-cache\n\n        \"\"\"\n        # -------------------- Chunk & RDCC Vals ------------------------------\n\n        sample_array = np.zeros(self.schema_shape, dtype=self.schema_dtype)\n        chunk_shape, chunk_nbytes = self._chunk_opts(\n            sample_array=sample_array, max_chunk_nbytes=CHUNK_MAX_NBYTES)\n\n        rdcc_nbytes_val = sample_array.nbytes * COLLECTION_SIZE\n        if rdcc_nbytes_val < CHUNK_MAX_NBYTES:\n            rdcc_nbytes_val = CHUNK_MAX_NBYTES\n        elif rdcc_nbytes_val > CHUNK_MAX_RDCC_NBYTES:\n            rdcc_nbytes_val = CHUNK_MAX_RDCC_NBYTES\n\n        rdcc_nslots_guess = np.math.ceil(rdcc_nbytes_val / chunk_nbytes) * 100\n        rdcc_nslots_prime_val = find_next_prime(rdcc_nslots_guess)\n\n        # ---------------------------- File Creation --------------------------\n\n        uid = random_string()\n        file_path = self.DATADIR.joinpath(f'{uid}.hdf5')\n        self.wFp[uid] = h5py.File(file_path,\n                                  mode='w',\n                                  libver='latest',\n                                  rdcc_nbytes=rdcc_nbytes_val,\n                                  rdcc_w0=CHUNK_RDCC_W0,\n                                  rdcc_nslots=rdcc_nslots_prime_val)\n        self.w_uid = uid\n        self.wdset = None\n        self.hNextPath = 0\n        self.hIdx = 0\n        self.hColsRemain = COLLECTION_COUNT\n        self.hMaxSize = COLLECTION_SIZE\n\n        process_dir = self.REMOTEDIR if remote_operation else self.STAGEDIR\n        Path(process_dir, f'{uid}.hdf5').touch()\n\n        # ----------------------- Dataset Creation ----------------------------\n\n        optKwargs = self._dataset_opts(**self._dflt_backend_opts)\n        for dset_num in range(COLLECTION_COUNT):\n            self.wFp[uid].create_dataset(\n                f'/{dset_num}',\n                shape=(COLLECTION_SIZE, sample_array.size),\n                dtype=sample_array.dtype,\n                maxshape=(COLLECTION_SIZE, sample_array.size),\n                chunks=(1, *chunk_shape),\n                **optKwargs)\n\n        # ---------------------- Attribute Config Vals ------------------------\n\n        self.wFp[self.w_uid]['/'].attrs['HANGAR_VERSION'] = __version__\n        self.wFp[self.w_uid]['/'].attrs['schema_shape'] = 
sample_array.shape\n        self.wFp[self.w_uid]['/'].attrs['schema_dtype_num'] = sample_array.dtype.num\n        self.wFp[self.w_uid]['/'].attrs['next_location'] = (0, 0)\n        self.wFp[self.w_uid]['/'].attrs['collection_max_size'] = COLLECTION_SIZE\n        self.wFp[self.w_uid]['/'].attrs['collection_total'] = COLLECTION_COUNT\n        self.wFp[self.w_uid]['/'].attrs['collections_remaining'] = COLLECTION_COUNT\n        self.wFp[self.w_uid]['/'].attrs['rdcc_nbytes'] = rdcc_nbytes_val\n        self.wFp[self.w_uid]['/'].attrs['rdcc_w0'] = CHUNK_RDCC_W0\n        self.wFp[self.w_uid]['/'].attrs['rdcc_nslots'] = rdcc_nslots_prime_val\n        self.wFp[self.w_uid]['/'].attrs['chunk_shape'] = chunk_shape\n        if optKwargs['compression_opts'] is not None:\n            self.wFp[self.w_uid]['/'].attrs['compression_opts'] = optKwargs['compression_opts']\n        else:\n            self.wFp[self.w_uid]['/'].attrs['compression_opts'] = False\n\n        self.wFp[self.w_uid].flush()\n        try:\n            self.wFp[self.w_uid].swmr_mode = True\n        except ValueError:\n            assert self.wFp[self.w_uid].swmr_mode is True\n        self.wdset = self.wFp[self.w_uid][f'/{self.hNextPath}']\n\n    def read_data(self, hashVal: HDF5_00_DataHashSpec) -> np.ndarray:\n        \"\"\"Read data from an hdf5 file handle at the specified locations\n\n        Parameters\n        ----------\n        hashVal : HDF5_00_DataHashSpec\n            record specification parsed from its serialized store val in lmdb.\n\n        Returns\n        -------\n        np.array\n            requested data.\n        \"\"\"\n        arrSize = 1\n        for dim in hashVal.shape:\n            arrSize *= dim\n        srcSlc = (hashVal.dataset_idx, slice(0, arrSize))\n        dsetCol = f'/{hashVal.dataset}'\n        rdictkey = f'{hashVal.uid}{dsetCol}'\n\n        if self.schema_dtype:  # if is not None\n            destArr = np.empty((arrSize,), self.schema_dtype)\n            if rdictkey in self.rDatasets:\n                self.rDatasets[rdictkey].read_direct(destArr, srcSlc, None)\n            else:\n                try:\n                    self.Fp[hashVal.uid][dsetCol].read_direct(destArr, srcSlc, None)\n                    self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                except TypeError:\n                    self.Fp[hashVal.uid] = self.Fp[hashVal.uid]()\n                    self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                    self.rDatasets[rdictkey].read_direct(destArr, srcSlc, None)\n                except KeyError:\n                    process_dir = self.STAGEDIR if self.mode == 'a' else self.STOREDIR\n                    if Path(process_dir, f'{hashVal.uid}.hdf5').is_file():\n                        file_pth = self.DATADIR.joinpath(f'{hashVal.uid}.hdf5')\n                        self.rFp[hashVal.uid] = h5py.File(file_pth, 'r', swmr=True, libver='latest')\n                        self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                        self.rDatasets[rdictkey].read_direct(destArr, srcSlc, None)\n                    else:\n                        raise\n        else:\n            if rdictkey in self.rDatasets:\n                destArr = self.rDatasets[rdictkey][srcSlc]\n            else:\n                try:\n                    destArr = self.Fp[hashVal.uid][dsetCol][srcSlc]\n                    self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                except TypeError:\n                    self.Fp[hashVal.uid] = 
self.Fp[hashVal.uid]()\n                    destArr = self.Fp[hashVal.uid][dsetCol][srcSlc]\n                    self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                except KeyError:\n                    process_dir = self.STAGEDIR if self.mode == 'a' else self.STOREDIR\n                    if Path(process_dir, f'{hashVal.uid}.hdf5').is_file():\n                        file_pth = self.DATADIR.joinpath(f'{hashVal.uid}.hdf5')\n                        self.rFp[hashVal.uid] = h5py.File(file_pth, 'r', swmr=True, libver='latest')\n                        destArr = self.Fp[hashVal.uid][dsetCol][srcSlc]\n                        self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                    else:\n                        raise\n\n        out = destArr.reshape(hashVal.shape)\n        if xxh64_hexdigest(out) != hashVal.checksum:\n            # try casting to check if dtype does not match for all zeros case\n            out = out.astype(np.typeDict[self.Fp[hashVal.uid]['/'].attrs['schema_dtype_num']])\n            if xxh64_hexdigest(out) != hashVal.checksum:\n                raise RuntimeError(\n                    f'DATA CORRUPTION Checksum {xxh64_hexdigest(out)} != recorded {hashVal}')\n        return out\n\n    def write_data(self, array: np.ndarray, *, remote_operation: bool = False) -> bytes:\n        \"\"\"verifies correctness of array data and performs write operation.\n\n        Parameters\n        ----------\n        array : np.ndarray\n            tensor to write to group.\n        remote_operation : optional, kwarg only, bool\n            If this is a remote process which is adding data, any necessary\n            hdf5 dataset files will be created in the remote data dir instead\n            of the stage directory. (default is False, which is for a regular\n            access process)\n\n        Returns\n        -------\n        bytes\n            string identifying the collection dataset and collection dim-0 index\n            which the array can be accessed at.\n        \"\"\"\n        checksum = xxh64_hexdigest(array)\n        if self.w_uid in self.wFp:\n            self.hIdx += 1\n            if self.hIdx >= self.hMaxSize:\n                self.wdset.flush()\n                self.hIdx = 0\n                self.hNextPath += 1\n                self.hColsRemain -= 1\n                self.wdset = self.wFp[self.w_uid][f'/{self.hNextPath}']\n                if self.hColsRemain <= 1:\n                    self.wFp[self.w_uid]['/'].attrs.modify('next_location', (self.hNextPath, self.hIdx))\n                    self.wFp[self.w_uid]['/'].attrs.modify('collections_remaining', self.hColsRemain)\n                    self.wFp[self.w_uid].flush()\n                    self._create_schema(remote_operation=remote_operation)\n        else:\n            self._create_schema(remote_operation=remote_operation)\n\n        destSlc = (self.hIdx, slice(0, array.size))\n        flat_arr = np.ravel(array)\n        self.wdset.write_direct(flat_arr, None, destSlc)\n        self.wdset.flush()\n        return hdf5_00_encode(self.w_uid, checksum, self.hNextPath, self.hIdx, array.shape)\n"
  },
  {
    "path": "src/hangar/backends/hdf5_01.py",
    "content": "\"\"\"Local HDF5 Backend Implementation, Identifier: ``HDF5_01``\n\nBackend Identifiers\n===================\n\n*  Backend: ``0``\n*  Version: ``1``\n*  Format Code: ``01``\n*  Canonical Name: ``HDF5_01``\n\nStorage Method\n==============\n\n*  This module is meant to handle larger datasets which are of fixed size. IO\n   and significant compression optimization is achieved by storing arrays at\n   their appropriate top level index in the same shape they naturally assume\n   and chunking over the entire subarray domain making up a sample (rather than\n   having to subdivide chunks when the sample could be variably shaped.)\n\n*  Data is written to specific subarray indexes inside an HDF5 \"dataset\" in a\n   single HDF5 File.\n\n*  In each HDF5 File there are ``COLLECTION_COUNT`` \"datasets\" (named ``[\"0\" :\n   \"{COLLECTION_COUNT}\"]``). These are referred to as ``\"dataset number\"``\n\n*  Each dataset is a zero-initialized array of:\n\n   *  ``dtype: {schema_dtype}``; ie ``np.float32`` or ``np.uint8``\n\n   *  ``shape: (COLLECTION_SIZE, *{schema_shape})``; ie ``(500, 10, 10)`` or\n      ``(500, 512, 512, 320)``. The first index in the dataset is referred to as a\n      ``collection index``.\n\n*  Compression Filters, Chunking Configuration/Options are applied globally for\n   all ``datasets`` in a file at dataset creation time.\n\n*  On read and write of all samples the xxhash64_hexdigest is calculated for\n   the raw array bytes. This is to ensure that all data in == data out of the\n   hdf5 files. That way even if a file is manually edited (bypassing fletcher32\n   filter check) we have a quick way to tell that things are not as they should\n   be.\n\nCompression Options\n===================\n\nAccepts dictionary containing keys\n\n*  ``backend`` == ``\"01\"``\n*  ``complib``\n*  ``complevel``\n*  ``shuffle``\n\nBlosc-HDF5\n\n*  ``complib`` valid values:\n\n   *  ``'blosc:blosclz'``,\n   *  ``'blosc:lz4'``,\n   *  ``'blosc:lz4hc'``,\n   *  ``'blosc:zlib'``,\n   *  ``'blosc:zstd'``\n\n*  ``complevel`` valid values: [0, 9] where 0 is \"no compression\" and 9 is\n   \"most compression\"\n\n*  ``shuffle`` valid values:\n\n   *  ``None``\n   *  ``'none'``\n   *  ``'byte'``\n   *  ``'bit'``\n\n\nLZF Filter\n\n*  ``'complib' == 'lzf'``\n*  ``'shuffle'`` one of ``[False, None, 'none', True, 'byte']``\n*  ``'complevel'`` one of ``[False, None, 'none']``\n\nGZip Filter\n\n*  ``'complib' == 'gzip'``\n*  ``'shuffle'`` one of ``[False, None, 'none', True, 'byte']``\n*  ``complevel`` valid values: [0, 9] where 0 is \"no compression\" and 9 is\n   \"most compression\"\n\n\nRecord Format\n=============\n\nFields Recorded for Each Array\n------------------------------\n\n*  Format Code\n*  File UID\n*  xxhash64_hexdigest (ie. 
checksum)\n*  Dataset Number (``0:COLLECTION_COUNT`` dataset selection)\n*  Dataset Index (``0:COLLECTION_SIZE`` dataset subarray selection)\n*  Subarray Shape\n\nExamples\n--------\n\n1)  Adding the first piece of data to a file:\n\n    *  Array shape (Subarray Shape): (10, 10)\n    *  File UID: \"rlUK3C\"\n    *  xxhash64_hexdigest: 8067007c0f05c359\n    *  Dataset Number: 16\n    *  Collection Index: 105\n\n    ``Record Data => \"01:rlUK3C:8067007c0f05c359:16:105:10 10\"``\n\n2)  Adding a piece of data to the middle of a file:\n\n    *  Array shape (Subarray Shape): (20, 2, 3)\n    *  File UID: \"rlUK3C\"\n    *  xxhash64_hexdigest: b89f873d3d153a9c\n    *  Dataset Number: \"3\"\n    *  Collection Index: 199\n\n    ``Record Data => \"01:rlUK3C:b89f873d3d153a9c:3:199:20 2 3\"``\n\n\nTechnical Notes\n===============\n\n*  The majority of methods not directly related to \"chunking\" and the \"raw data\n   chunk cache\" are either identical to HDF5_00, or only slightly modified.\n\n*  Files are read only after initial creation/writes. Only a write-enabled\n   checkout can open a HDF5 file in ``\"w\"`` or ``\"a\"`` mode; writer\n   checkouts create new files on every checkout, and make no attempt to fill in\n   unset locations in previous files. This is not an issue as no disk space is\n   used until data is written to the initially created \"zero-initialized\"\n   collection datasets.\n\n*  On write: Single Writer Multiple Reader (``SWMR``) mode is set to ensure that\n   improper closing (not calling the ``.close()`` method) does not corrupt any\n   data which had been previously flushed to the file.\n\n*  On read: SWMR is set to allow multiple readers (in different threads /\n   processes) to read from the same file. File handle serialization is handled\n   via custom python ``pickle`` serialization/reduction logic which is\n   implemented by the high level ``pickle`` reduction ``__setstate__()``,\n   ``__getstate__()`` class methods.\n\n*  An optimization is performed in order to increase the read / write\n   performance of fixed size datasets. Due to the way that we initialize an\n   entire HDF5 file with all datasets pre-created (to the size of the fixed\n   subarray shape), and the fact we absolutely know the size / shape /\n   access-pattern of the arrays, inefficient IO due to wasted chunk processing\n   is not a concern. It is far more efficient for us to completely bypass the\n   metadata chunk cache, and chunk each subarray as a single large item.\n\n   This method of processing tends to have a number of significant effects as\n   compared to chunked storage methods (see the layout sketch after this\n   list):\n\n      1. **Compression ratios improve** (by a non-trivial factor). This is\n         simply due to the fact that a larger amount of raw data is being passed\n         into the compressor at a time. While the exact improvement seen is\n         highly dependent on both the data size and compressor used, there\n         should be no case where compressing the full tensor uses more disk\n         space than chunking the tensor, compressing each chunk individually,\n         and then saving each chunk to disk.\n\n      2. **Read performance improves** (so long as a suitable compressor /\n         option set was chosen). Instead of issuing (potentially) many read\n         requests - one for each chunk - to the storage hardware, significantly\n         fewer IOPS are used to retrieve the entire set of compressed raw data\n         from disk. Fewer IOPS means much less time waiting on the hard disk.\n         Moreover, only a single decompression step is needed to reconstruct\n         the numeric array, completely decoupling performance from HDF5's\n         ability to parallelize internal filter pipeline operations.\n\n         Additionally, since the entire requested chunk is retrieved in a\n         single decompression pipeline run, there is no need for the HDF5 core\n         to initialize an intermediate buffer which holds data chunks as each\n         decompression operation completes. Further, by preinitializing an empty\n         ``numpy.ndarray`` container and using the low level HDF5\n         ``read_direct`` method, the decompressed data buffer is passed\n         directly into the returned ``ndarray.__array_interface__.data``\n         field with no intermediate copy or processing steps.\n\n      3. **Shuffle filters are favored.** With much more data to work with in\n         a single compression operation, the use of \"byte shuffle\" filters in\n         the compressor spec has been seen to both markedly decrease read time\n         and increase compression ratios. Shuffling can significantly reduce\n         disk space required to store some piece of data on disk, further\n         reducing the time spent waiting on hard disk IO while incurring a\n         negligible cost to decompression speed.\n\n   Taking all of these effects into account, there can be up to an order of\n   magnitude increase in read performance as compared to the subarray chunking\n   strategy employed by the ``HDF5_00`` backend.
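\n\n   As a rough sketch of the difference between the two layouts (shapes here\n   are illustrative; each backend uses its own ``COLLECTION_SIZE``), a\n   ``(10, 10, 3)`` ``float32`` schema would be laid out approximately as::\n\n      HDF5_00: dataset shape (250, 300)        # samples flattened to 1D\n      HDF5_01: dataset shape (100, 10, 10, 3)  # samples kept in natural shape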
\n\n*  Like all other backends at the time of writing, only 'C' ordered arrays\n   are accepted by this method.\n\"\"\"\nimport logging\nimport math\nimport os\nfrom collections import ChainMap\nfrom contextlib import suppress\nfrom functools import partial\nfrom pathlib import Path\nfrom typing import MutableMapping, Tuple, Optional, Union, Callable\n\nimport h5py\nimport numpy as np\n\ntry:\n    # hdf5plugin warns if a filter is already loaded.\n    _logger = logging.getLogger('hdf5plugin')\n    _initialLevel = _logger.getEffectiveLevel()\n    _logger.setLevel(logging.ERROR)\n    import hdf5plugin\n    if 'blosc' not in hdf5plugin.FILTERS:\n        raise ImportError(f'BLOSC unavailable via hdf5plugin: {hdf5plugin.FILTERS}')\nfinally:\n    _logger.setLevel(_initialLevel)\nfrom xxhash import xxh64_hexdigest\n\n\nfrom .chunk import calc_chunkshape\nfrom .specs import HDF5_01_DataHashSpec\nfrom .. 
import __version__\nfrom ..optimized_utils import SizedDict\nfrom ..constants import DIR_DATA_REMOTE, DIR_DATA_STAGE, DIR_DATA_STORE, DIR_DATA\nfrom ..op_state import writer_checkout_only, reader_checkout_only\nfrom ..utils import random_string, set_blosc_nthreads\nfrom ..optimized_utils import find_next_prime\nfrom ..typesystem import Descriptor, OneOf, DictItems, SizedIntegerTuple, checkedmeta\n\nset_blosc_nthreads()\n\n# ----------------------------- Configuration ---------------------------------\n\n_FmtCode = '01'\n\n# contents of a single hdf5 file\nCOLLECTION_SIZE = 100\nCOLLECTION_COUNT = 100\nCHUNK_MAX_RDCC_NBYTES = 250_000_000\nCHUNK_RDCC_W0 = 0.75\n\n# -------------------------------- Parser Implementation ----------------------\n\n\ndef hdf5_01_encode(uid: str, cksum: str, dset: int, dset_idx: int,\n                   shape: Tuple[int, ...]) -> bytes:\n    \"\"\"converts the hdf5 data hash spec to an appropriate db value\n\n    Parameters\n    ----------\n    uid : str\n        the file name prefix which the data is written to.\n    cksum : str\n        xxhash_64.hex_digest checksum of the data bytes in numpy array form.\n    dset : int\n        collection (ie. hdf5 dataset) name to find this data piece.\n    dset_idx : int\n        collection first axis index in which this data piece resides.\n    shape : Tuple[int, ...]\n        shape of the data sample written to the collection idx. ie:\n        what subslices of the hdf5 dataset should be read to retrieve\n        the sample as recorded.\n\n    Returns\n    -------\n    bytes\n        hash data db value recording all input specifications.\n    \"\"\"\n    shape_str = \" \".join([str(i) for i in shape])\n    return f'01:{uid}:{cksum}:{dset}:{dset_idx}:{shape_str}'.encode()\n\n\n# ------------------------- Accessor Object -----------------------------------\n\n\n@DictItems(\n    expected_keys_required={'complib': True, 'complevel': True, 'shuffle': True},\n    expected_values={\n        'complib': ['blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:zlib', 'blosc:zstd'],\n        'complevel': [i for i in range(10)],\n        'shuffle': [None, 'none', 'byte', 'bit']})\nclass BloscCompressionOptions(Descriptor):\n    pass\n\n\n@DictItems(\n    expected_keys_required={'complib': True, 'complevel': True, 'shuffle': True},\n    expected_values={\n        'complib': ['gzip'], 'complevel': [i for i in range(10)], 'shuffle': [True, False]})\nclass GzipCompressionOptions(Descriptor):\n    pass\n\n\n@DictItems(\n    expected_keys_required={'complib': True, 'complevel': False, 'shuffle': True},\n    expected_values={\n        'complib': ['lzf'], 'complevel': ['none', None], 'shuffle': [True, False]})\nclass LzfCompressionOptions(Descriptor):\n    pass\n\n\n@OneOf(list(map(lambda x: np.dtype(x).name, [\n        np.bool, np.uint8, np.uint16, np.uint32, np.uint64, np.int8, np.int16,\n        np.int32, np.int64, np.float16, np.float32, np.float64, np.longdouble])))\nclass AllowedDtypes(Descriptor):\n    # Note. np.longdouble since np.float128 is not guaranteed to be available on\n    # all systems; 
this is a particular issue with some windows numpy builds.\n    pass\n\n\nclass HDF5_01_Options(metaclass=checkedmeta):\n    _shape = SizedIntegerTuple(size=32)\n    _dtype = AllowedDtypes()\n    _lzf = LzfCompressionOptions()\n    _gzip = GzipCompressionOptions()\n    _blosc = BloscCompressionOptions()\n    _avail_filters = ('_lzf', '_gzip', '_blosc')\n\n    def __init__(self, backend_options, dtype, shape, *args, **kwargs):\n        self._shape = shape\n        self._dtype = dtype\n        self._selected_filter = None\n        if backend_options is None:\n            backend_options = self.default_options\n\n        for filter_attr in self._avail_filters:\n            with suppress((KeyError, ValueError)):\n                setattr(self, filter_attr, backend_options)\n                self._selected_filter = filter_attr\n                break\n        else:  # N.B. for-else loop (ie. \"no-break\")\n            raise ValueError(f'Invalid backend_options {backend_options}')\n        self._verify_data_nbytes_larger_than_clib_min()\n\n    def _verify_data_nbytes_larger_than_clib_min(self):\n        \"\"\"blosc clib should not be used if data buffer size < 16 bytes.\n\n        Raises\n        ------\n        ValueError:\n            if the data size is not valid for the clib\n        \"\"\"\n        if self._selected_filter in ['_blosc', None]:\n            num_items = np.prod(self._shape)\n            itemsize = np.dtype(self._dtype).itemsize\n            nbytes = itemsize * num_items\n            if nbytes <= 16:\n                raise ValueError(f'blosc clib requires data buffer size > 16 bytes')\n\n    @property\n    def default_options(self):\n        if 'blosc' in hdf5plugin.FILTERS:\n            try:\n                self._verify_data_nbytes_larger_than_clib_min()\n                return {'complib': 'blosc:lz4hc', 'complevel': 5, 'shuffle': 'byte'}\n            except ValueError:\n                pass\n        return {'complib': 'lzf', 'complevel': None, 'shuffle': True}\n\n    @property\n    def backend_options(self):\n        return getattr(self, self._selected_filter)\n\n    @property\n    def init_requires(self):\n        return ('repo_path', 'schema_shape', 'schema_dtype')\n\n\nHDF5_01_MapTypes = MutableMapping[str, Union[h5py.File, Callable[[], h5py.File]]]\n\n\nclass HDF5_01_FileHandles(object):\n    \"\"\"Manage HDF5 file handles.\n\n    When in SWMR-write mode, no more than a single file handle can be in the\n    \"writeable\" state. 
This is an issue when multiple columns need to\n    write to the same column schema.\n    \"\"\"\n\n    def __init__(self, repo_path: Path, schema_shape: tuple, schema_dtype: np.dtype):\n        self.path: Path = repo_path\n        self.schema_shape: tuple = schema_shape\n        self.schema_dtype: np.dtype = schema_dtype\n        self._dflt_backend_opts: Optional[dict] = None\n\n        self.rFp: HDF5_01_MapTypes = {}\n        self.wFp: HDF5_01_MapTypes = {}\n        self.Fp: HDF5_01_MapTypes = ChainMap(self.rFp, self.wFp)\n        self.rDatasets = SizedDict(maxsize=100)\n        self.wdset: Optional[h5py.Dataset] = None\n\n        self.mode: Optional[str] = None\n        self.hIdx: Optional[int] = None\n        self.w_uid: Optional[str] = None\n        self.hMaxSize: Optional[int] = None\n        self.hNextPath: Optional[int] = None\n        self.hColsRemain: Optional[int] = None\n\n        self.STAGEDIR: Path = Path(self.path, DIR_DATA_STAGE, _FmtCode)\n        self.REMOTEDIR: Path = Path(self.path, DIR_DATA_REMOTE, _FmtCode)\n        self.DATADIR: Path = Path(self.path, DIR_DATA, _FmtCode)\n        self.STOREDIR: Path = Path(self.path, DIR_DATA_STORE, _FmtCode)\n        self.DATADIR.mkdir(exist_ok=True)\n\n    def __enter__(self):\n        return self\n\n    def __exit__(self, *exc):\n        if self.w_uid in self.wFp:\n            self.wFp[self.w_uid]['/'].attrs.modify('next_location', (self.hNextPath, self.hIdx))\n            self.wFp[self.w_uid]['/'].attrs.modify('collections_remaining', self.hColsRemain)\n            self.wFp[self.w_uid].flush()\n\n    @reader_checkout_only\n    def __getstate__(self) -> dict:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.close()\n        state = self.__dict__.copy()\n        del state['rFp']\n        del state['wFp']\n        del state['Fp']\n        del state['rDatasets']\n        del state['wdset']\n        return state\n\n    def __setstate__(self, state: dict) -> None:  # pragma: no cover\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.__dict__.update(state)\n        self.rFp = {}\n        self.wFp = {}\n        self.Fp = ChainMap(self.rFp, self.wFp)\n        self.rDatasets = {}\n        self.wdset = None\n        self.open(mode=self.mode)\n\n    @property\n    def backend_opts(self):\n        return self._dflt_backend_opts\n\n    @writer_checkout_only\n    def _backend_opts_set(self, val):\n        \"\"\"Nonstandard descriptor method. See notes in ``backend_opts.setter``.\n        \"\"\"\n        self._dflt_backend_opts = val\n        return\n\n    @backend_opts.setter\n    def backend_opts(self, value):\n        \"\"\"\n        Using a separate setter method (with the ``@writer_checkout_only``\n        decorator applied) due to a bug in python <3.8.\n\n        From: https://bugs.python.org/issue19072\n            > The classmethod decorator when applied to a function of a class,\n            > does not honour the descriptor binding protocol for whatever it\n            > wraps. 
This means it will fail when applied around a function which\n            > has a decorator already applied to it and where that decorator\n            > expects that the descriptor binding protocol is executed in order\n            > to properly bind the function to the class.\n        \"\"\"\n        return self._backend_opts_set(value)\n\n    def open(self, mode: str, *, remote_operation: bool = False):\n        \"\"\"Open an hdf5 file handle in the Handler Singleton\n\n        Parameters\n        ----------\n        mode : str\n            one of `r` or `a` for read only / read-write.\n        remote_operation : optional, kwarg only, bool\n            if this hdf5 data is being created from a remote fetch operation, then\n            we don't open any files for reading, and only open files for writing\n            which exist in the remote data dir. (default is false, which means that\n            write operations use the stage data dir and read operations use data store\n            dir)\n        \"\"\"\n        self.mode = mode\n        if self.mode == 'a':\n            process_dir = self.REMOTEDIR if remote_operation else self.STAGEDIR\n            process_dir.mkdir(exist_ok=True)\n            for uidpth in process_dir.iterdir():\n                if uidpth.suffix == '.hdf5':\n                    file_pth = self.DATADIR.joinpath(uidpth.name)\n                    self.rFp[uidpth.stem] = partial(\n                        h5py.File, file_pth, 'r', swmr=True, libver='latest')\n\n        if not remote_operation:\n            if not self.STOREDIR.is_dir():\n                return\n            for uidpth in self.STOREDIR.iterdir():\n                if uidpth.suffix == '.hdf5':\n                    file_pth = self.DATADIR.joinpath(uidpth.name)\n                    self.rFp[uidpth.stem] = partial(\n                        h5py.File, file_pth, 'r', swmr=True, libver='latest')\n\n    def close(self):\n        \"\"\"Close a file handle after writes have been completed\n\n        behavior changes depending on write-enable or read-only file\n\n        Returns\n        -------\n        bool\n            True if success, otherwise False.\n        \"\"\"\n        if self.mode == 'a':\n            if self.w_uid in self.wFp:\n                self.wFp[self.w_uid]['/'].attrs.modify('next_location', (self.hNextPath, self.hIdx))\n                self.wFp[self.w_uid]['/'].attrs.modify('collections_remaining', self.hColsRemain)\n                self.wFp[self.w_uid].flush()\n            for uid in list(self.wFp.keys()):\n                with suppress(AttributeError):\n                    self.wFp[uid].close()\n                del self.wFp[uid]\n            self.wdset = None\n            self.hMaxSize = None\n            self.hNextPath = None\n            self.hIdx = None\n            self.hColsRemain = None\n            self.w_uid = None\n\n        for uid in list(self.rFp.keys()):\n            with suppress(AttributeError):\n                self.rFp[uid].close()\n            del self.rFp[uid]\n        self.rDatasets = {}\n\n    @staticmethod\n    def delete_in_process_data(repo_path: Path, *, remote_operation=False) -> None:\n        \"\"\"Removes some set of files entirely from the stage/remote directory.\n\n        DANGER ZONE. 
This should essentially only be used to perform hard resets\n        of the repository state.\n\n        Parameters\n        ----------\n        repo_path : Path\n            path to the repository on disk\n        remote_operation : optional, kwarg only, bool\n            If true, modify contents of the remote_dir, if false (default) modify\n            contents of the staging directory.\n        \"\"\"\n        data_dir = Path(repo_path, DIR_DATA, _FmtCode)\n        PDIR = DIR_DATA_STAGE if not remote_operation else DIR_DATA_REMOTE\n        process_dir = Path(repo_path, PDIR, _FmtCode)\n        if not process_dir.is_dir():\n            return\n\n        for uidpth in process_dir.iterdir():\n            if uidpth.suffix == '.hdf5':\n                os.remove(process_dir.joinpath(uidpth.name))\n                os.remove(data_dir.joinpath(uidpth.name))\n        os.rmdir(process_dir)\n\n    @staticmethod\n    def _dataset_opts(complib: str, complevel: int, shuffle: Union[bool, str]) -> dict:\n        \"\"\"specify compression options for the hdf5 dataset.\n\n        .. seealso:: :func:`_blosc_opts`\n\n        To enable blosc compression, use the conda-forge `blosc-hdf5-plugin` package.\n\n        .. seealso::\n\n        * https://github.com/conda-forge/staged-recipes/pull/7650\n        * https://github.com/h5py/h5py/issues/611\n\n        Parameters\n        ----------\n        complib : str\n            the compression lib to use, one of ['lzf', 'gzip', 'blosc:blosclz',\n            'blosc:lz4', 'blosc:lz4hc', 'blosc:zlib', 'blosc:zstd']\n        complevel : int\n            compression level to specify (accepts values [0, 9] for all except 'lzf'\n            where no complevel is accepted)\n        shuffle : bool\n            if True or 'byte', enable the byte shuffle filter; if using blosc\n            compression, 'bit' shuffle is accepted as well. 
False, or\n            None indicates no shuffle should be applied.\n        \"\"\"\n        # ---- blosc hdf5 plugin filters ----\n        _blosc_compression = {\n            'blosc:blosclz': 0,\n            'blosc:lz4': 1,\n            'blosc:lz4hc': 2,\n            # Not built 'snappy': 3,\n            'blosc:zlib': 4,\n            'blosc:zstd': 5}\n        _blosc_shuffle = {None: 0, 'none': 0, 'byte': 1, 'bit': 2}\n        _blosc_complevel = {**{i: i for i in range(10)}, None: 9, 'none': 9}\n\n        # ---- h5py built in filters ----\n        _lzf_gzip_shuffle = {None: False, False: False, 'none': False, True: True, 'byte': True}\n        _lzf_complevel = {False: None, None: None, 'none': None}\n        _gzip_complevel = {**{i: i for i in range(10)}, None: 4, 'none': 4}\n\n        if complib.startswith('blosc'):\n            args = {\n                'compression': 32001,\n                'compression_opts': (\n                    0, 0, 0, 0,\n                    _blosc_complevel[complevel],\n                    _blosc_shuffle[shuffle],\n                    _blosc_compression[complib]),\n                'shuffle': False}\n        elif complib == 'lzf':\n            args = {\n                'shuffle': _lzf_gzip_shuffle[shuffle],\n                'compression': complib,\n                'compression_opts': _lzf_complevel[complevel]}\n        elif complib == 'gzip':\n            args = {\n                'shuffle': _lzf_gzip_shuffle[shuffle],\n                'compression': complib,\n                'compression_opts': _gzip_complevel[complevel]}\n        elif complib in (None, False, 'none'):\n            args = {\n                'shuffle': False,\n                'compression': None,\n                'compression_opts': None}\n        else:\n            raise ValueError(f'unknown value for opt arg `complib`: {complib}')\n        return args\n\n    def _create_schema(self, *, remote_operation: bool = False):\n        \"\"\"stores the shape and dtype as the schema of a column.\n\n        Parameters\n        ----------\n        remote_operation : optional, kwarg only, bool\n            if this schema is being created from a remote fetch operation, then do not\n            place the file symlink in the staging directory. Instead symlink it\n            to a special remote staging directory. (default is False, which places the\n            symlink in the stage data directory.)\n\n        Notes\n        -----\n\n        Parameters set for raw-data-chunk-cache (rdcc) values:\n\n        * rdcc_nbytes: sets the total size (measured in bytes) of the raw data chunk\n          cache for each dataset. This should be set to the size of each chunk times\n          the number of chunks that are likely to be needed in cache.\n        * rdcc_w0: sets the policy for chunks to be removed from the cache when more\n          space is needed. If set to 0, always evict the least recently used chunk in\n          cache. If set to 1, always evict the least recently used chunk which has\n          been fully read or written. If the value is between 0 and 1, the behavior\n          will be a blend of the two.\n        * rdcc_nslots: The number of chunk slots in the cache for this entire file.\n          In order for quick lookup, a hash map is used for each chunk value. For\n          maximum performance, this value should be set approximately 100 times that\n          number of chunks.\n\n        .. 
seealso::\n\n            http://docs.h5py.org/en/stable/high/file.html#chunk-cache\n\n        \"\"\"\n\n        # -------------------- Chunk & RDCC Vals ------------------------------\n        schema_shape = self.schema_shape\n        itemsize = np.dtype(self.schema_dtype).itemsize\n        expectedrows = COLLECTION_SIZE * COLLECTION_COUNT\n        maindim = 0\n\n        chunk_shape = calc_chunkshape(schema_shape, expectedrows, itemsize, maindim)\n        if chunk_shape == (1,) and schema_shape == ():\n            schema_shape = (1,)\n        req_chunks_per_dim = [math.ceil(i / j) for i, j in zip(schema_shape, chunk_shape)]\n        req_shape = [i * j for i, j in zip(req_chunks_per_dim, chunk_shape)]\n        chunk_nbytes = np.prod(chunk_shape) * itemsize\n        nchunks = np.prod(req_chunks_per_dim)\n\n        rdcc_nbytes_val = chunk_nbytes * nchunks * COLLECTION_SIZE\n        if rdcc_nbytes_val >= CHUNK_MAX_RDCC_NBYTES:\n            rdcc_nbytes_val = CHUNK_MAX_RDCC_NBYTES\n        rdcc_nslots_guess = nchunks * expectedrows * 100\n        rdcc_nslots_prime_val = find_next_prime(rdcc_nslots_guess)\n\n        # ---------------------------- File Creation --------------------------\n\n        uid = random_string()\n        file_path = Path(self.DATADIR, f'{uid}.hdf5')\n        self.wFp[uid] = h5py.File(file_path,\n                                  mode='w',\n                                  libver='latest',\n                                  rdcc_nbytes=rdcc_nbytes_val,\n                                  rdcc_w0=CHUNK_RDCC_W0,\n                                  rdcc_nslots=rdcc_nslots_prime_val)\n        self.w_uid = uid\n        self.wdset = None\n        self.hNextPath = 0\n        self.hIdx = 0\n        self.hColsRemain = COLLECTION_COUNT\n        self.hMaxSize = COLLECTION_SIZE\n\n        process_dir = self.REMOTEDIR if remote_operation else self.STAGEDIR\n        Path(process_dir, f'{uid}.hdf5').touch()\n\n        # ----------------------- Dataset Creation ----------------------------\n\n        optKwargs = self._dataset_opts(**self._dflt_backend_opts)\n        for dset_num in range(COLLECTION_COUNT):\n            self.wFp[uid].create_dataset(\n                f'/{dset_num}',\n                shape=(COLLECTION_SIZE, *req_shape),\n                dtype=self.schema_dtype,\n                chunks=(1, *chunk_shape),\n                **optKwargs)\n\n        # ---------------------- Attribute Config Vals ------------------------\n\n        self.wFp[self.w_uid]['/'].attrs['HANGAR_VERSION'] = __version__\n        self.wFp[self.w_uid]['/'].attrs['schema_shape'] = self.schema_shape\n        self.wFp[self.w_uid]['/'].attrs['schema_dtype_num'] = np.dtype(self.schema_dtype).num\n        self.wFp[self.w_uid]['/'].attrs['next_location'] = (0, 0)\n        self.wFp[self.w_uid]['/'].attrs['collection_max_size'] = COLLECTION_SIZE\n        self.wFp[self.w_uid]['/'].attrs['collection_total'] = COLLECTION_COUNT\n        self.wFp[self.w_uid]['/'].attrs['collections_remaining'] = COLLECTION_COUNT\n        self.wFp[self.w_uid]['/'].attrs['rdcc_nbytes'] = rdcc_nbytes_val\n        self.wFp[self.w_uid]['/'].attrs['rdcc_w0'] = CHUNK_RDCC_W0\n        self.wFp[self.w_uid]['/'].attrs['rdcc_nslots'] = rdcc_nslots_prime_val\n        self.wFp[self.w_uid]['/'].attrs['chunk_shape'] = chunk_shape\n        if optKwargs['compression_opts'] is not None:\n            self.wFp[self.w_uid]['/'].attrs['compression_opts'] = optKwargs['compression_opts']\n        else:\n            
self.wFp[self.w_uid]['/'].attrs['compression_opts'] = False\n\n        self.wFp[self.w_uid].flush()\n        try:\n            self.wFp[self.w_uid].swmr_mode = True\n        except ValueError:\n            assert self.wFp[self.w_uid].swmr_mode is True\n        self.wdset = self.wFp[self.w_uid][f'/{self.hNextPath}']\n\n    def read_data(self, hashVal: HDF5_01_DataHashSpec) -> np.ndarray:\n        \"\"\"Read data from an hdf5 file handle at the specified locations\n\n        Parameters\n        ----------\n        hashVal : HDF5_01_DataHashSpec\n            record specification parsed from its serialized store val in lmdb.\n\n        Returns\n        -------\n        np.array\n            requested data\n        \"\"\"\n        dsetCol = f'/{hashVal.dataset}'\n        srcSlc = (hashVal.dataset_idx, *[slice(0, dim) for dim in hashVal.shape])\n        rdictkey = f'{hashVal.uid}{dsetCol}'\n\n        if self.schema_dtype:  # if is not None\n            destArr = np.empty(hashVal.shape, self.schema_dtype)\n            if rdictkey in self.rDatasets:\n                self.rDatasets[rdictkey].read_direct(destArr, srcSlc, None)\n            else:\n                try:\n                    self.Fp[hashVal.uid][dsetCol].read_direct(destArr, srcSlc, None)\n                    self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                except TypeError:\n                    self.Fp[hashVal.uid] = self.Fp[hashVal.uid]()\n                    self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                    self.rDatasets[rdictkey].read_direct(destArr, srcSlc, None)\n                except KeyError:\n                    process_dir = self.STAGEDIR if self.mode == 'a' else self.STOREDIR\n                    if Path(process_dir, f'{hashVal.uid}.hdf5').is_file():\n                        file_pth = self.DATADIR.joinpath(f'{hashVal.uid}.hdf5')\n                        self.rFp[hashVal.uid] = h5py.File(file_pth, 'r', swmr=True, libver='latest')\n                        self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                        self.rDatasets[rdictkey].read_direct(destArr, srcSlc, None)\n                    else:\n                        raise\n        else:\n            if rdictkey in self.rDatasets:\n                destArr = self.rDatasets[rdictkey][srcSlc]\n            else:\n                try:\n                    destArr = self.Fp[hashVal.uid][dsetCol][srcSlc]\n                    self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                except TypeError:\n                    self.Fp[hashVal.uid] = self.Fp[hashVal.uid]()\n                    destArr = self.Fp[hashVal.uid][dsetCol][srcSlc]\n                    self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                except KeyError:\n                    process_dir = self.STAGEDIR if self.mode == 'a' else self.STOREDIR\n                    if Path(process_dir, f'{hashVal.uid}.hdf5').is_file():\n                        file_pth = self.DATADIR.joinpath(f'{hashVal.uid}.hdf5')\n                        self.rFp[hashVal.uid] = h5py.File(file_pth, 'r', swmr=True, libver='latest')\n                        destArr = self.Fp[hashVal.uid][dsetCol][srcSlc]\n                        self.rDatasets[rdictkey] = self.Fp[hashVal.uid][dsetCol]\n                    else:\n                        raise\n\n        if xxh64_hexdigest(destArr) != hashVal.checksum:\n            # try casting to check if dtype does not match for all zeros case\n            destArr = 
destArr.astype(np.typeDict[self.Fp[hashVal.uid]['/'].attrs['schema_dtype_num']])\n            if xxh64_hexdigest(destArr) != hashVal.checksum:\n                raise RuntimeError(\n                    f'DATA CORRUPTION Checksum {xxh64_hexdigest(destArr)} != recorded {hashVal}')\n        return destArr\n\n    def write_data(self, array: np.ndarray, *, remote_operation: bool = False) -> bytes:\n        \"\"\"verifies correctness of array data and performs write operation.\n\n        Parameters\n        ----------\n        array : np.ndarray\n            tensor to write to group.\n        remote_operation : optional, kwarg only, bool\n            If this is a remote process which is adding data, any necessary\n            hdf5 dataset files will be created in the remote data dir instead\n            of the stage directory. (default is False, which is for a regular\n            access process)\n\n        Returns\n        -------\n        bytes\n            string identifying the collection dataset and collection dim-0 index\n            which the array can be accessed at.\n        \"\"\"\n        checksum = xxh64_hexdigest(array)\n        if self.w_uid in self.wFp:\n            self.hIdx += 1\n            if self.hIdx >= self.hMaxSize:\n                self.wdset.flush()\n                self.hIdx = 0\n                self.hNextPath += 1\n                self.hColsRemain -= 1\n                self.wdset = self.wFp[self.w_uid][f'/{self.hNextPath}']\n                if self.hColsRemain <= 1:\n                    self.wFp[self.w_uid]['/'].attrs.modify('next_location', (self.hNextPath, self.hIdx))\n                    self.wFp[self.w_uid]['/'].attrs.modify('collections_remaining', self.hColsRemain)\n                    self.wFp[self.w_uid].flush()\n                    self._create_schema(remote_operation=remote_operation)\n        else:\n            self._create_schema(remote_operation=remote_operation)\n\n        destSlc = (self.hIdx, *[slice(0, dim) for dim in array.shape])\n        self.wdset.write_direct(array, None, destSlc)\n        self.wdset.flush()\n        res = hdf5_01_encode(self.w_uid, checksum, self.hNextPath, self.hIdx, array.shape)\n        return res\n"
  },
  {
    "path": "src/hangar/backends/lmdb_30.py",
    "content": "\"\"\"Local LMDB Backend Implementation, Identifier: ``LMDB_30``\n\nBackend Identifiers\n===================\n\n*  Backend: ``3``\n*  Version: ``0``\n*  Format Code: ``30``\n*  Canonical Name: ``LMDB_30``\n\nStorage Method\n==============\n\n*  This module is meant to handle string typed data which is of any size. IO\n   is performed via the LMDB storage system.\n\n*  This module does not compress values upon writing, the full (uncompressed)\n   value of the text is written to the DB for each key.\n\n*  For each LMDB file generated, data is indexed by keys which are generated\n   in lexicographically sorted order of key length 4. Keys consist of 4 characters\n   chosen from an alphabet consisting of ASCII digits, lowercase letters, and\n   upercase letters. Within a single write instance (when an LMDB file is created\n   and written to), lexicographically sorted permutations of the chosen characters\n   are used as key indexes.\n\n   This means that for each LMDB file written in a repo, the sequence of generated\n   index keys will be identical, even though two databases with the same key will\n   store different values. As such, the File UID is crucial in order to identify\n   a unique db/index key combo to access a particular value by.\n\n*  There is no limit to the size which each record can occupy. Data is stored\n   \"as-is\" and is uncompressed. Reading the data back will return the exact\n   data stored (regardless of how large the data record is).\n\n*  On read and write of all samples the xxhash64_hexdigest is calculated for\n   the raw data bytes. This is to ensure that all data in == data out of the\n   lmdb files. That way even if a file is manually edited we have a quick way\n   to tell that things are not as they should be. 
(full data hash digests may\n   not be calculated every time a read is performed).\n\nCompression Options\n===================\n\nNone\n\nRecord Format\n=============\n\nFields Recorded for Each Array\n------------------------------\n\n*  Format Code\n*  File UID\n*  Row Index\n\nExamples\n--------\n\n1)  Adding the first piece of data to a file:\n\n    *  File UID: \"rlUK3C\"\n    *  Row Index: \"0123\"\n    *  xxhash64_hexdigest: 8067007c0f05c359\n\n    ``Record Data => \"30:rlUK3C:0123:8067007c0f05c359\"``\n\n2)  Adding a second piece of data:\n\n    *  File UID: \"rlUK3C\"\n    *  Row Index: \"0124\"\n    *  xxhash64_hexdigest: b89f873d3d153a9c\n\n    ``Record Data => \"30:rlUK3C:0124:b89f873d3d153a9c\"``\n\n3)  Adding the 500th piece of data:\n\n    *  File UID: \"rlUK3C\"\n    *  Row Index: \"01AU\"\n    *  xxhash64_hexdigest: cf3fc53cad153a5a\n\n    ``Record Data => \"30:rlUK3C:01AU:cf3fc53cad153a5a\"``\n\"\"\"\nimport os\nimport shutil\nimport string\nfrom collections import ChainMap\nfrom contextlib import suppress\nfrom functools import partial\nfrom itertools import permutations\nfrom pathlib import Path\nfrom typing import Optional\n\nimport lmdb\nfrom xxhash import xxh64_hexdigest\n\nfrom .specs import LMDB_30_DataHashSpec\nfrom ..constants import DIR_DATA_REMOTE, DIR_DATA_STAGE, DIR_DATA_STORE, DIR_DATA\nfrom ..op_state import reader_checkout_only, writer_checkout_only\nfrom ..utils import random_string\nfrom ..typesystem import Descriptor, OneOf, EmptyDict, checkedmeta\n\n\nLMDB_SETTINGS = {\n    'map_size': 300_000_000,\n    'meminit': False,\n    'subdir': True,\n    'lock': False,\n    'max_spare_txns': 4,\n}\n_FmtCode = '30'\n\n\ndef _lexicographic_keys():\n    lexicographic_ids = ''.join([\n        string.digits,\n        string.ascii_uppercase,\n        string.ascii_lowercase,\n    ])\n    # permutations generates results in lexicographic order; a total of\n    # 13_388_280 distinct ids (62P4, no repeated characters) can be\n    # generated with a row_id consisting of 4 characters. 
This is more keys than\n    # we will ever allow in a single LMDB database\n    p = permutations(lexicographic_ids, 4)\n\n    for perm in p:\n        res = ''.join(perm)\n        yield res\n\n\ndef lmdb_30_encode(uid: str, row_idx: str, checksum: str) -> bytes:\n    res = f'30:{uid}:{row_idx}:{checksum}'\n    return res.encode()\n\n\n@OneOf(['<class\\'str\\'>', str])\nclass AllowedDtypes(Descriptor):\n    pass\n\n\nclass LMDB_30_Options(metaclass=checkedmeta):\n    _dtype = AllowedDtypes()\n    _backend_options = EmptyDict()\n\n    def __init__(self, backend_options, dtype, *args, **kwargs):\n        if backend_options is None:\n            backend_options = self.default_options\n        self._backend_options = backend_options\n        self._dtype = dtype\n\n    @property\n    def default_options(self):\n        return {}\n\n    @property\n    def backend_options(self):\n        return self._backend_options\n\n    @property\n    def init_requires(self):\n        return ('repo_path',)\n\n\nclass LMDB_30_FileHandles:\n\n    def __init__(self, repo_path: Path, *args, **kwargs):\n\n        self.path: Path = repo_path\n\n        self.rFp = {}\n        self.wFp = {}\n        self.Fp = ChainMap(self.rFp, self.wFp)\n\n        self.mode: Optional[str] = None\n        self.w_uid: Optional[str] = None\n        self.row_idx: Optional[str] = None\n        self._dflt_backend_opts: Optional[dict] = None\n\n        self.STAGEDIR: Path = Path(self.path, DIR_DATA_STAGE, _FmtCode)\n        self.REMOTEDIR: Path = Path(self.path, DIR_DATA_REMOTE, _FmtCode)\n        self.STOREDIR: Path = Path(self.path, DIR_DATA_STORE, _FmtCode)\n        self.DATADIR: Path = Path(self.path, DIR_DATA, _FmtCode)\n        self.DATADIR.mkdir(exist_ok=True)\n\n    def __enter__(self):\n\n        return self\n\n    def __exit__(self, *exc):\n        return\n\n    @reader_checkout_only\n    def __getstate__(self) -> dict:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.close()\n        state = self.__dict__.copy()\n        del state['rFp']\n        del state['wFp']\n        del state['Fp']\n        return state\n\n    def __setstate__(self, state: dict) -> None:  # pragma: no cover\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.__dict__.update(state)\n        self.rFp = {}\n        self.wFp = {}\n        self.Fp = ChainMap(self.rFp, self.wFp)\n\n    @property\n    def backend_opts(self):\n        return self._dflt_backend_opts\n\n    @writer_checkout_only\n    def _backend_opts_set(self, val):\n        \"\"\"Nonstandard descriptor method. See notes in ``backend_opts.setter``.\n        \"\"\"\n        self._dflt_backend_opts = val\n        return\n\n    @backend_opts.setter\n    def backend_opts(self, value):\n        \"\"\"\n        Using a separate setter method (with ``@writer_checkout_only`` decorator\n        applied) due to a bug in python <3.8.\n\n        From: https://bugs.python.org/issue19072\n            > The classmethod decorator when applied to a function of a class,\n            > does not honour the descriptor binding protocol for whatever it\n            > wraps. 
This means it will fail when applied around a function which\n            > has a decorator already applied to it and where that decorator\n            > expects that the descriptor binding protocol is executed in order\n            > to properly bind the function to the class.\n        \"\"\"\n        return self._backend_opts_set(value)\n\n    def open(self, mode: str, *, remote_operation: bool = False):\n        \"\"\"Open an lmdb file handle.\n\n        Parameters\n        ----------\n        mode : str\n            one of `r` or `a` for read only / read-write.\n        remote_operation : optional, kwarg only, bool\n            if this lmdb data is being created from a remote fetch operation, then\n            we don't open any files for reading, and only open files for writing\n            which exist in the remote data dir. (default is False, which means that\n            write operations use the stage data dir and read operations use the\n            data store dir)\n        \"\"\"\n        self.mode = mode\n        if self.mode == 'a':\n            process_dir = self.REMOTEDIR if remote_operation else self.STAGEDIR\n            process_dir.mkdir(exist_ok=True)\n            for uidpth in process_dir.iterdir():\n                if uidpth.suffix == '.lmdbdir':\n                    file_pth = self.DATADIR.joinpath(uidpth.stem)\n                    self.rFp[uidpth.stem] = partial(lmdb.open, str(file_pth), readonly=True,\n                                                    **LMDB_SETTINGS)\n\n        if not remote_operation:\n            if not self.STOREDIR.is_dir():\n                return\n            for uidpth in self.STOREDIR.iterdir():\n                if uidpth.suffix == '.lmdbdir':\n                    file_pth = self.DATADIR.joinpath(uidpth.stem)\n                    self.rFp[uidpth.stem] = partial(lmdb.open, str(file_pth), readonly=True,\n                                                    **LMDB_SETTINGS)\n\n    def close(self):\n        \"\"\"Close all open lmdb file handles.\n\n        Behavior changes depending on write-enabled or read-only mode; for\n        write-enabled checkouts the writer uid and row index generator are\n        also reset.\n        \"\"\"\n        if self.mode == 'a':\n            for uid in list(self.wFp.keys()):\n                with suppress(AttributeError):\n                    self.wFp[uid].close()\n                del self.wFp[uid]\n            self.w_uid = None\n            self.row_idx = None\n\n        for uid in list(self.rFp.keys()):\n            with suppress(AttributeError):\n                self.rFp[uid].close()\n            del self.rFp[uid]\n\n    @staticmethod\n    def delete_in_process_data(repo_path: Path, *, remote_operation=False) -> None:\n        \"\"\"Removes some set of files entirely from the stage/remote directory.\n\n        DANGER ZONE. 
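Both the stage/remote marker files and the backing lmdb\n        database directories are removed from disk. 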
This should essentially only be used to perform hard resets\n        of the repository state.\n\n        Parameters\n        ----------\n        repo_path : Path\n            path to the repository on disk\n        remote_operation : optional, kwarg only, bool\n            If true, modify contents of the remote_dir, if false (default) modify\n            contents of the staging directory.\n        \"\"\"\n        data_dir = Path(repo_path, DIR_DATA, _FmtCode)\n        PDIR = DIR_DATA_STAGE if not remote_operation else DIR_DATA_REMOTE\n        process_dir = Path(repo_path, PDIR, _FmtCode)\n        if not process_dir.is_dir():\n            return\n\n        for uidpth in process_dir.iterdir():\n            if uidpth.suffix == '.lmdbdir':\n                os.remove(process_dir.joinpath(uidpth.name))\n                db_dir = data_dir.joinpath(uidpth.stem)\n                shutil.rmtree(str(db_dir))\n        os.rmdir(process_dir)\n\n    def _create_schema(self, *, remote_operation: bool = False):\n        \"\"\"create a new write-enabled lmdb db file and mark its existence in the\n        stage (or remote) directory.\n        \"\"\"\n        uid = random_string()\n        db_dir_path = self.DATADIR.joinpath(f'{uid}')\n        self.wFp[uid] = lmdb.open(str(db_dir_path), **LMDB_SETTINGS)\n\n        self.w_uid = uid\n        self.row_idx = _lexicographic_keys()\n\n        process_dir = self.REMOTEDIR if remote_operation else self.STAGEDIR\n        Path(process_dir, f'{uid}.lmdbdir').touch()\n\n    def read_data(self, hashVal: LMDB_30_DataHashSpec) -> str:\n        \"\"\"Read data from an lmdb file handle at the specified location\n\n        Parameters\n        ----------\n        hashVal : LMDB_30_DataHashSpec\n            record specification parsed from its serialized store val in lmdb.\n\n        Returns\n        -------\n        str\n            requested data.\n        \"\"\"\n        try:\n            with self.Fp[hashVal.uid].begin(write=False) as txn:\n                res = txn.get(hashVal.row_idx.encode(), default=False)\n                if res is False:\n                    raise RuntimeError(hashVal)\n        except AttributeError:\n            # handle was stored as an unopened ``partial``; open it and retry.\n            self.Fp[hashVal.uid] = self.Fp[hashVal.uid]()\n            return self.read_data(hashVal)\n        except KeyError:\n            # db not yet tracked; open it for reading if it exists in this\n            # checkout's scope, otherwise re-raise.\n            process_dir = self.STAGEDIR if self.mode == 'a' else self.STOREDIR\n            if Path(process_dir, f'{hashVal.uid}.lmdbdir').is_file():\n                file_pth = self.DATADIR.joinpath(hashVal.uid)\n                self.rFp[hashVal.uid] = lmdb.open(str(file_pth), readonly=True, **LMDB_SETTINGS)\n                return self.read_data(hashVal)\n            else:\n                raise\n\n        out = res.decode()\n        if xxh64_hexdigest(res) != hashVal.checksum:\n            raise RuntimeError(\n                f'DATA CORRUPTION Checksum {xxh64_hexdigest(res)} != recorded {hashVal}')\n        return out\n\n    def write_data(self, data: str, *, remote_operation: bool = False) -> bytes:\n        \"\"\"verifies correctness of data and performs write operation.\n\n        Parameters\n        ----------\n        data: str\n            data to write to the db.\n        remote_operation : optional, kwarg only, bool\n            If this is a remote process which is adding data, any necessary\n            lmdb db files will be created in the remote data dir instead\n            of the stage directory. (default is False, which is for a regular\n            access process)\n\n        Returns\n        -------\n        bytes\n            record specifying the db file uid, row index, and checksum under\n            which the data can be accessed.\n        \"\"\"\n        
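# each write db has a finite key space and a fixed map_size; when either is\n        # exhausted, a fresh db file is created and the write is retried there.\n        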
encoded_data = data.encode()\n        checksum = xxh64_hexdigest(encoded_data)\n\n        if self.w_uid in self.wFp:\n            try:\n                row_idx = next(self.row_idx)\n            except StopIteration:\n                self._create_schema(remote_operation=remote_operation)\n                return self.write_data(data, remote_operation=remote_operation)\n        else:\n            self._create_schema(remote_operation=remote_operation)\n            return self.write_data(data, remote_operation=remote_operation)\n\n        encoded_row_idx = row_idx.encode()\n        try:\n            with self.wFp[self.w_uid].begin(write=True) as txn:\n                txn.put(encoded_row_idx, encoded_data, append=True)\n        except lmdb.MapFullError:\n            self._create_schema(remote_operation=remote_operation)\n            return self.write_data(data, remote_operation=remote_operation)\n\n        return lmdb_30_encode(self.w_uid, row_idx, checksum)\n"
  },
  {
    "path": "src/hangar/backends/lmdb_31.py",
    "content": "\"\"\"Local LMDB Backend Implementation, Identifier: ``LMDB_30``\n\nBackend Identifiers\n===================\n\n*  Backend: ``3``\n*  Version: ``1``\n*  Format Code: ``31``\n*  Canonical Name: ``LMDB_31``\n\nStorage Method\n==============\n\n*  This module is meant to handle bbytes typed data which is of any size.\n   less than 2MB per value. IO is performed via the LMDB storage system.\n\n*  This module does not compress values upon writing, the full (uncompressed)\n   value of the text is written to the DB for each key.\n\n*  For each LMDB file generated, data is indexed by keys which are generated\n   in lexicographically sorted order of key length 4. Keys consist of 4 characters\n   chosen from an alphabet consisting of ASCII digits, lowercase letters, and\n   upercase letters. Within a single write instance (when an LMDB file is created\n   and written to), lexicographically sorted permutations of the chosen characters\n   are used as key indexes.\n\n   This means that for each LMDB file written in a repo, the sequence of generated\n   index keys will be identical, even though two databases with the same key will\n   store different values. As such, the File UID is crucial in order to identify\n   a unique db/index key combo to access a particular value by.\n\n*  There is no limit to the size which each record can occupy. Data is stored\n   \"as-is\" and is uncompressed. Reading the data back will return the exact\n   data stored (regardless of how large the data record is).\n\n*  On read and write of all samples the xxhash64_hexdigest is calculated for\n   the raw data bytes. This is to ensure that all data in == data out of the\n   lmdb files. That way even if a file is manually edited we have a quick way\n   to tell that things are not as they should be. 
(full data hash digests may\n   not be calculated every time a read is performed).\n\nCompression Options\n===================\n\nNone\n\nRecord Format\n=============\n\nFields Recorded for Each Array\n------------------------------\n\n*  Format Code\n*  File UID\n*  Row Index\n\nExamples\n--------\n\n1)  Adding the first piece of data to a file:\n\n    *  File UID: \"rlUK3C\"\n    *  Row Index: \"0123\"\n    *  xxhash64_hexdigest: 8067007c0f05c359\n\n    ``Record Data => \"31:rlUK3C:0123:8067007c0f05c359\"``\n\n2)  Adding a second piece of data:\n\n    *  File UID: \"rlUK3C\"\n    *  Row Index: \"0124\"\n    *  xxhash64_hexdigest: b89f873d3d153a9c\n\n    ``Record Data => \"31:rlUK3C:0124:b89f873d3d153a9c\"``\n\n3)  Adding the 500th piece of data:\n\n    *  File UID: \"rlUK3C\"\n    *  Row Index: \"01AU\"\n    *  xxhash64_hexdigest: cf3fc53cad153a5a\n\n    ``Record Data => \"31:rlUK3C:01AU:cf3fc53cad153a5a\"``\n\"\"\"\nimport os\nimport shutil\nimport string\nfrom collections import ChainMap\nfrom contextlib import suppress\nfrom functools import partial\nfrom itertools import permutations\nfrom pathlib import Path\nfrom typing import Optional\n\nimport lmdb\nfrom xxhash import xxh64_hexdigest\n\nfrom .specs import LMDB_31_DataHashSpec\nfrom ..constants import DIR_DATA_REMOTE, DIR_DATA_STAGE, DIR_DATA_STORE, DIR_DATA\nfrom ..op_state import reader_checkout_only, writer_checkout_only\nfrom ..utils import random_string\nfrom ..typesystem import Descriptor, OneOf, EmptyDict, checkedmeta\n\n\nLMDB_SETTINGS = {\n    'map_size': 300_000_000,\n    'meminit': False,\n    'subdir': True,\n    'lock': False,\n    'max_spare_txns': 4,\n}\n_FmtCode = '31'\n\n\ndef _lexicographic_keys():\n    lexicographic_ids = ''.join([\n        string.digits,\n        string.ascii_uppercase,\n        string.ascii_lowercase,\n    ])\n    # permutations generates results in lexicographic order; a total of\n    # 13_388_280 distinct ids (62P4, no repeated characters) can be\n    # generated with a row_id consisting of 4 characters. 
This is more keys than\n    # we will ever allow in a single LMDB database\n    p = permutations(lexicographic_ids, 4)\n\n    for perm in p:\n        res = ''.join(perm)\n        yield res\n\n\ndef lmdb_31_encode(uid: str, row_idx: str, checksum: str) -> bytes:\n    res = f'31:{uid}:{row_idx}:{checksum}'\n    return res.encode()\n\n\n@OneOf(['<class\\'bytes\\'>', bytes])\nclass AllowedDtypes(Descriptor):\n    pass\n\n\nclass LMDB_31_Options(metaclass=checkedmeta):\n    _dtype = AllowedDtypes()\n    _backend_options = EmptyDict()\n\n    def __init__(self, backend_options, dtype, *args, **kwargs):\n        if backend_options is None:\n            backend_options = self.default_options\n        self._backend_options = backend_options\n        self._dtype = dtype\n\n    @property\n    def default_options(self):\n        return {}\n\n    @property\n    def backend_options(self):\n        return self._backend_options\n\n    @property\n    def init_requires(self):\n        return ('repo_path',)\n\n\nclass LMDB_31_FileHandles:\n\n    def __init__(self, repo_path: Path, *args, **kwargs):\n\n        self.path: Path = repo_path\n\n        self.rFp = {}\n        self.wFp = {}\n        self.Fp = ChainMap(self.rFp, self.wFp)\n\n        self.mode: Optional[str] = None\n        self.w_uid: Optional[str] = None\n        self.row_idx: Optional[str] = None\n        self._dflt_backend_opts: Optional[dict] = None\n\n        self.STAGEDIR: Path = Path(self.path, DIR_DATA_STAGE, _FmtCode)\n        self.REMOTEDIR: Path = Path(self.path, DIR_DATA_REMOTE, _FmtCode)\n        self.STOREDIR: Path = Path(self.path, DIR_DATA_STORE, _FmtCode)\n        self.DATADIR: Path = Path(self.path, DIR_DATA, _FmtCode)\n        self.DATADIR.mkdir(exist_ok=True)\n\n    def __enter__(self):\n\n        return self\n\n    def __exit__(self, *exc):\n        return\n\n    @reader_checkout_only\n    def __getstate__(self) -> dict:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.close()\n        state = self.__dict__.copy()\n        del state['rFp']\n        del state['wFp']\n        del state['Fp']\n        return state\n\n    def __setstate__(self, state: dict) -> None:  # pragma: no cover\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.__dict__.update(state)\n        self.rFp = {}\n        self.wFp = {}\n        self.Fp = ChainMap(self.rFp, self.wFp)\n\n    @property\n    def backend_opts(self):\n        return self._dflt_backend_opts\n\n    @writer_checkout_only\n    def _backend_opts_set(self, val):\n        \"\"\"Nonstandard descriptor method. See notes in ``backend_opts.setter``.\n        \"\"\"\n        self._dflt_backend_opts = val\n        return\n\n    @backend_opts.setter\n    def backend_opts(self, value):\n        \"\"\"\n        Using a separate setter method (with ``@writer_checkout_only`` decorator\n        applied) due to a bug in python <3.8.\n\n        From: https://bugs.python.org/issue19072\n            > The classmethod decorator when applied to a function of a class,\n            > does not honour the descriptor binding protocol for whatever it\n            > wraps. 
This means it will fail when applied around a function which\n            > has a decorator already applied to it and where that decorator\n            > expects that the descriptor binding protocol is executed in order\n            > to properly bind the function to the class.\n        \"\"\"\n        return self._backend_opts_set(value)\n\n    def open(self, mode: str, *, remote_operation: bool = False):\n        \"\"\"Open an lmdb file handle.\n\n        Parameters\n        ----------\n        mode : str\n            one of `r` or `a` for read only / read-write.\n        remote_operation : optional, kwarg only, bool\n            if this lmdb data is being created from a remote fetch operation, then\n            we don't open any files for reading, and only open files for writing\n            which exist in the remote data dir. (default is False, which means that\n            write operations use the stage data dir and read operations use the\n            data store dir)\n        \"\"\"\n        self.mode = mode\n        if self.mode == 'a':\n            process_dir = self.REMOTEDIR if remote_operation else self.STAGEDIR\n            process_dir.mkdir(exist_ok=True)\n            for uidpth in process_dir.iterdir():\n                if uidpth.suffix == '.lmdbdir':\n                    file_pth = self.DATADIR.joinpath(uidpth.stem)\n                    self.rFp[uidpth.stem] = partial(\n                        lmdb.open, str(file_pth), readonly=True, **LMDB_SETTINGS)\n\n        if not remote_operation:\n            if not self.STOREDIR.is_dir():\n                return\n            for uidpth in self.STOREDIR.iterdir():\n                if uidpth.suffix == '.lmdbdir':\n                    file_pth = self.DATADIR.joinpath(uidpth.stem)\n                    self.rFp[uidpth.stem] = partial(\n                        lmdb.open, str(file_pth), readonly=True, **LMDB_SETTINGS)\n\n    def close(self):\n        \"\"\"Close all open lmdb file handles.\n\n        Behavior changes depending on write-enabled or read-only mode; for\n        write-enabled checkouts the writer uid and row index generator are\n        also reset.\n        \"\"\"\n        if self.mode == 'a':\n            for uid in list(self.wFp.keys()):\n                with suppress(AttributeError):\n                    self.wFp[uid].close()\n                del self.wFp[uid]\n            self.w_uid = None\n            self.row_idx = None\n\n        for uid in list(self.rFp.keys()):\n            with suppress(AttributeError):\n                self.rFp[uid].close()\n            del self.rFp[uid]\n\n    @staticmethod\n    def delete_in_process_data(repo_path: Path, *, remote_operation=False) -> None:\n        \"\"\"Removes some set of files entirely from the stage/remote directory.\n\n        DANGER ZONE. 
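Both the stage/remote marker files and the backing lmdb\n        database directories are removed from disk. 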
This should essentially only be used to perform hard resets\n        of the repository state.\n\n        Parameters\n        ----------\n        repo_path : Path\n            path to the repository on disk\n        remote_operation : optional, kwarg only, bool\n            If true, modify contents of the remote_dir, if false (default) modify\n            contents of the staging directory.\n        \"\"\"\n        data_dir = Path(repo_path, DIR_DATA, _FmtCode)\n        PDIR = DIR_DATA_STAGE if not remote_operation else DIR_DATA_REMOTE\n        process_dir = Path(repo_path, PDIR, _FmtCode)\n        if not process_dir.is_dir():\n            return\n\n        for uidpth in process_dir.iterdir():\n            if uidpth.suffix == '.lmdbdir':\n                os.remove(process_dir.joinpath(uidpth.name))\n                db_dir = data_dir.joinpath(uidpth.stem)\n                shutil.rmtree(str(db_dir))\n        os.rmdir(process_dir)\n\n    def _create_schema(self, *, remote_operation: bool = False):\n        \"\"\"create a new write-enabled lmdb db file and mark its existence in the\n        stage (or remote) directory.\n        \"\"\"\n        uid = random_string()\n        db_dir_path = self.DATADIR.joinpath(f'{uid}')\n        self.wFp[uid] = lmdb.open(str(db_dir_path), **LMDB_SETTINGS)\n\n        self.w_uid = uid\n        self.row_idx = _lexicographic_keys()\n\n        process_dir = self.REMOTEDIR if remote_operation else self.STAGEDIR\n        Path(process_dir, f'{uid}.lmdbdir').touch()\n\n    def read_data(self, hashVal: LMDB_31_DataHashSpec) -> bytes:\n        \"\"\"Read data from an lmdb file handle at the specified location\n\n        Parameters\n        ----------\n        hashVal : LMDB_31_DataHashSpec\n            record specification parsed from its serialized store val in lmdb.\n\n        Returns\n        -------\n        bytes\n            requested data.\n        \"\"\"\n        try:\n            with self.Fp[hashVal.uid].begin(write=False) as txn:\n                res = txn.get(hashVal.row_idx.encode(), default=False)\n                if res is False:\n                    raise RuntimeError(hashVal)\n        except AttributeError:\n            # handle was stored as an unopened ``partial``; open it and retry.\n            self.Fp[hashVal.uid] = self.Fp[hashVal.uid]()\n            return self.read_data(hashVal)\n        except KeyError:\n            # db not yet tracked; open it for reading if it exists in this\n            # checkout's scope, otherwise re-raise.\n            process_dir = self.STAGEDIR if self.mode == 'a' else self.STOREDIR\n            if Path(process_dir, f'{hashVal.uid}.lmdbdir').is_file():\n                file_pth = self.DATADIR.joinpath(hashVal.uid)\n                self.rFp[hashVal.uid] = lmdb.open(str(file_pth), readonly=True, **LMDB_SETTINGS)\n                return self.read_data(hashVal)\n            else:\n                raise\n\n        if xxh64_hexdigest(res) != hashVal.checksum:\n            raise RuntimeError(\n                f'DATA CORRUPTION Checksum {xxh64_hexdigest(res)} != recorded {hashVal}')\n        return res\n\n    def write_data(self, data: bytes, *, remote_operation: bool = False) -> bytes:\n        \"\"\"verifies correctness of data and performs write operation.\n\n        Parameters\n        ----------\n        data: bytes\n            data to write to the db.\n        remote_operation : optional, kwarg only, bool\n            If this is a remote process which is adding data, any necessary\n            lmdb db files will be created in the remote data dir instead\n            of the stage directory. (default is False, which is for a regular\n            access process)\n\n        Returns\n        -------\n        bytes\n            record specifying the db file uid, row index, and checksum under\n            which the data can be accessed.\n        \"\"\"\n        
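# each write db has a finite key space and a fixed map_size; when either is\n        # exhausted, a fresh db file is created and the write is retried there.\n        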
checksum = xxh64_hexdigest(data)\n        if self.w_uid in self.wFp:\n            try:\n                row_idx = next(self.row_idx)\n            except StopIteration:\n                self._create_schema(remote_operation=remote_operation)\n                return self.write_data(data, remote_operation=remote_operation)\n        else:\n            self._create_schema(remote_operation=remote_operation)\n            return self.write_data(data, remote_operation=remote_operation)\n\n        encoded_row_idx = row_idx.encode()\n        try:\n            with self.wFp[self.w_uid].begin(write=True) as txn:\n                txn.put(encoded_row_idx, data, append=True)\n        except lmdb.MapFullError:\n            self._create_schema(remote_operation=remote_operation)\n            return self.write_data(data, remote_operation=remote_operation)\n\n        return lmdb_31_encode(self.w_uid, row_idx, checksum)\n"
  },
  {
    "path": "src/hangar/backends/numpy_10.py",
    "content": "\"\"\"Local Numpy memmap Backend Implementation, Identifier: ``NUMPY_10``\n\nBackend Identifiers\n===================\n\n*  Backend: ``1``\n*  Version: ``0``\n*  Format Code: ``10``\n*  Canonical Name: ``NUMPY_10``\n\nStorage Method\n==============\n\n* Data is written to specific subarray indexes inside a numpy memmapped array on disk.\n\n* Each file is a zero-initialized array of\n\n  *  ``dtype: {schema_dtype}``; ie ``np.float32`` or ``np.uint8``\n\n  *  ``shape: (COLLECTION_SIZE, *{schema_shape})``; ie ``(500, 10)`` or ``(500,\n     4, 3)``. The first index in the array is referred to as a \"collection\n     index\".\n\nCompression Options\n===================\n\nDoes not accept any compression options. No compression is applied.\n\nRecord Format\n=============\n\nFields Recorded for Each Array\n------------------------------\n\n*  Format Code\n*  File UID\n*  xxhash64_hexdigest\n*  Collection Index (0:COLLECTION_SIZE subarray selection)\n*  Subarray Shape\n\nExamples\n--------\n\n1)  Adding the first piece of data to a file:\n\n    *  Array shape (Subarray Shape): (10, 10)\n    *  File UID: \"K3ktxv\"\n    *  xxhash64_hexdigest: 94701dd9f32626e2\n    *  Collection Index: 488\n\n    ``Record Data =>  \"10:K3ktxv:94701dd9f32626e2:488:10 10\"``\n\n2)  Adding to a piece of data to a the middle of a file:\n\n    *  Array shape (Subarray Shape): (20, 2, 3)\n    *  File UID: \"Mk23nl\"\n    *  xxhash64_hexdigest: 1363344b6c051b29\n    *  Collection Index: 199\n\n    ``Record Data => \"10:Mk23nl:1363344b6c051b29:199:20 2 3\"``\n\n\nTechnical Notes\n===============\n\n*  A typical numpy memmap file persisted to disk does not retain information\n   about its datatype or shape, and as such must be provided when re-opened\n   after close. In order to persist a memmap in ``.npy`` format, we use the a\n   special function ``open_memmap`` imported from ``np.lib.format`` which can\n   open a memmap file and persist necessary header info to disk in ``.npy``\n   format.\n\n*  On each write, an ``xxhash64_hexdigest`` checksum is calculated. 
This is not\n   for use as the primary hash algorithm, but rather stored in the local record\n   format itself to serve as a quick way to verify no disk corruption occurred.\n   This is required since numpy has no built-in data integrity validation\n   methods when reading from disk.\n\"\"\"\nimport os\nfrom collections import ChainMap\nfrom functools import partial\nfrom pathlib import Path\nfrom typing import MutableMapping, Optional\n\nimport numpy as np\nfrom numpy.lib.format import open_memmap\nfrom xxhash import xxh64_hexdigest\n\nfrom .specs import NUMPY_10_DataHashSpec\nfrom ..constants import DIR_DATA_REMOTE, DIR_DATA_STAGE, DIR_DATA_STORE, DIR_DATA\nfrom ..op_state import reader_checkout_only, writer_checkout_only\nfrom ..utils import random_string\nfrom ..typesystem import Descriptor, OneOf, EmptyDict, checkedmeta\n\n\n# ----------------------------- Configuration ---------------------------------\n\n_FmtCode = '10'\n\n# number of subarray contents of a single numpy memmap file\nCOLLECTION_SIZE = 1000\n\n# -------------------------------- Parser Implementation ----------------------\n\n\ndef numpy_10_encode(uid: str, cksum: str, collection_idx: int, shape: tuple) -> bytes:\n    \"\"\"converts the numpy data spec to an appropriate db value\n\n    Parameters\n    ----------\n    uid : str\n        file name (schema uid) of the np file to find this data piece in.\n    cksum : str\n        xxhash64_hexdigest checksum of the data as computed on that local machine.\n    collection_idx : int\n        collection first axis index in which this data piece resides.\n    shape : tuple\n        shape of the data sample written to the collection idx. ie: what\n        subslices of the array should be read to retrieve the sample as\n        recorded.\n\n    Returns\n    -------\n    bytes\n        hash data db value recording all input specifications\n    \"\"\"\n    shape_str = \" \".join([str(i) for i in shape])\n    return f'10:{uid}:{cksum}:{collection_idx}:{shape_str}'.encode()\n\n\n# ------------------------- Accessor Object -----------------------------------\n\n\n@OneOf(list(map(lambda x: np.dtype(x).name, [\n        np.bool, np.uint8, np.uint16, np.uint32, np.uint64, np.int8, np.int16,\n        np.int32, np.int64, np.float16, np.float32, np.float64, np.longdouble])))\nclass AllowedDtypes(Descriptor):\n    # Note: np.longdouble is used since np.float128 is not guaranteed to be\n    # available on all systems. 
This is a particular issue with some Windows numpy builds.\n    pass\n\n\nclass NUMPY_10_Options(metaclass=checkedmeta):\n    _dtype = AllowedDtypes()\n    _backend_options = EmptyDict()\n\n    def __init__(self, backend_options, dtype, *args, **kwargs):\n        if backend_options is None:\n            backend_options = self.default_options\n\n        self._backend_options = backend_options\n        self._dtype = dtype\n\n    @property\n    def default_options(self):\n        return {}\n\n    @property\n    def backend_options(self):\n        return self._backend_options\n\n    @property\n    def init_requires(self):\n        return ('repo_path', 'schema_shape', 'schema_dtype')\n\n\nclass NUMPY_10_FileHandles(object):\n\n    def __init__(self, repo_path: Path, schema_shape: tuple, schema_dtype: np.dtype):\n        self.repo_path = repo_path\n        self.schema_shape = schema_shape\n        self.schema_dtype = schema_dtype\n        self._dflt_backend_opts: Optional[dict] = None\n\n        self.rFp: MutableMapping[str, np.memmap] = {}\n        self.wFp: MutableMapping[str, np.memmap] = {}\n        self.Fp = ChainMap(self.rFp, self.wFp)\n\n        self.mode: Optional[str] = None\n        self.w_uid: Optional[str] = None\n        self.hIdx: Optional[int] = None\n\n        self.STAGEDIR: Path = Path(self.repo_path, DIR_DATA_STAGE, _FmtCode)\n        self.REMOTEDIR: Path = Path(self.repo_path, DIR_DATA_REMOTE, _FmtCode)\n        self.DATADIR: Path = Path(self.repo_path, DIR_DATA, _FmtCode)\n        self.STOREDIR: Path = Path(self.repo_path, DIR_DATA_STORE, _FmtCode)\n        self.DATADIR.mkdir(exist_ok=True)\n\n    @reader_checkout_only\n    def __getstate__(self) -> dict:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.close()\n        state = self.__dict__.copy()\n        del state['rFp']\n        del state['wFp']\n        del state['Fp']\n        return state\n\n    def __setstate__(self, state: dict) -> None:  # pragma: no cover\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.__dict__.update(state)\n        self.rFp = {}\n        self.wFp = {}\n        self.Fp = ChainMap(self.rFp, self.wFp)\n        self.open(mode=self.mode)\n\n    def __enter__(self):\n        return self\n\n    def __exit__(self, *exc):\n        if self.w_uid in self.wFp:\n            self.wFp[self.w_uid].flush()\n\n    @property\n    def backend_opts(self):\n        return self._dflt_backend_opts\n\n    @writer_checkout_only\n    def _backend_opts_set(self, val):\n        \"\"\"Nonstandard descriptor method. See notes in ``backend_opts.setter``.\n        \"\"\"\n        self._dflt_backend_opts = val\n        return\n\n    @backend_opts.setter\n    def backend_opts(self, value):\n        \"\"\"\n        Using a separate setter method (with ``@writer_checkout_only`` decorator\n        applied) due to a bug in python <3.8.\n\n        From: https://bugs.python.org/issue19072\n            > The classmethod decorator when applied to a function of a class,\n            > does not honour the descriptor binding protocol for whatever it\n            > wraps. 
This means it will fail when applied around a function which\n            > has a decorator already applied to it and where that decorator\n            > expects that the descriptor binding protocol is executed in order\n            > to properly bind the function to the class.\n        \"\"\"\n        return self._backend_opts_set(value)\n\n    def open(self, mode: str, *, remote_operation: bool = False):\n        \"\"\"open numpy file handles in the format-coded directories\n\n        Parameters\n        ----------\n        mode : str\n            one of `a` for `write-enabled` mode or `r` for read-only\n        remote_operation : bool, optional, kwarg only\n            True if remote operations call this method. Changes the symlink\n            directories used while writing. Default is False.\n        \"\"\"\n        self.mode = mode\n        if self.mode == 'a':\n            process_dir = self.REMOTEDIR if remote_operation else self.STAGEDIR\n            process_dir.mkdir(exist_ok=True)\n            for uidpth in process_dir.iterdir():\n                if uidpth.suffix == '.npy':\n                    file_pth = self.DATADIR.joinpath(uidpth.name)\n                    self.rFp[uidpth.stem] = partial(open_memmap, file_pth, 'r')\n\n        if not remote_operation:\n            if not self.STOREDIR.is_dir():\n                return\n            for uidpth in self.STOREDIR.iterdir():\n                if uidpth.suffix == '.npy':\n                    file_pth = self.DATADIR.joinpath(uidpth.name)\n                    self.rFp[uidpth.stem] = partial(open_memmap, file_pth, 'r')\n\n    def close(self, *args, **kwargs):\n        \"\"\"Close any open file handles.\n        \"\"\"\n        if self.mode == 'a':\n            if self.w_uid in self.wFp:\n                self.wFp[self.w_uid].flush()\n                self.w_uid = None\n                self.hIdx = None\n            for k in list(self.wFp.keys()):\n                del self.wFp[k]\n\n        for k in list(self.rFp.keys()):\n            del self.rFp[k]\n\n    @staticmethod\n    def delete_in_process_data(repo_path: Path, *, remote_operation: bool = False):\n        \"\"\"Removes some set of files entirely from the stage/remote directory.\n\n        DANGER ZONE. This should essentially only be used to perform hard resets\n        of the repository state.\n\n        Parameters\n        ----------\n        repo_path : Path\n            path to the repository on disk\n        remote_operation : optional, kwarg only, bool\n            If true, modify contents of the remote_dir, if false (default) modify\n            contents of the staging directory.\n        \"\"\"\n        data_dir = Path(repo_path, DIR_DATA, _FmtCode)\n        pdir = DIR_DATA_STAGE if not remote_operation else DIR_DATA_REMOTE\n        process_dir = Path(repo_path, pdir, _FmtCode)\n        if not process_dir.is_dir():\n            return\n\n        for uidpth in process_dir.iterdir():\n            if uidpth.suffix == '.npy':\n                os.remove(process_dir.joinpath(uidpth.name))\n                os.remove(data_dir.joinpath(uidpth.name))\n        os.rmdir(process_dir)\n\n    def _create_schema(self, *, remote_operation: bool = False):\n        \"\"\"stores the shape and dtype as the schema of a column.\n\n        Parameters\n        ----------\n        remote_operation : optional, kwarg only, bool\n            if this schema is being created from a remote fetch operation, then do not\n            place the file symlink in the staging directory. 
Instead symlink it\n            to a special remote staging directory. (default is False, which places the\n            symlink in the stage data directory.)\n        \"\"\"\n        uid = random_string()\n        file_path = self.DATADIR.joinpath(f'{uid}.npy')\n        m = open_memmap(file_path,\n                        mode='w+',\n                        dtype=self.schema_dtype,\n                        shape=(COLLECTION_SIZE, *self.schema_shape))\n        self.wFp[uid] = m\n        self.w_uid = uid\n        self.hIdx = 0\n\n        process_dir = self.REMOTEDIR if remote_operation else self.STAGEDIR\n        Path(process_dir, f'{uid}.npy').touch()\n\n    def read_data(self, hashVal: NUMPY_10_DataHashSpec) -> np.ndarray:\n        \"\"\"Read data from disk written in the NUMPY_10 backend\n\n        Parameters\n        ----------\n        hashVal : NUMPY_10_DataHashSpec\n            record specification stored in the db\n\n        Returns\n        -------\n        np.ndarray\n            tensor data stored at the provided hashVal specification.\n\n        Raises\n        ------\n        RuntimeError\n            If the recorded checksum does not match the received checksum.\n\n        Notes\n        -----\n\n        TO AVOID DATA LOSS / CORRUPTION:\n\n        * On a read operation, we copy memmap subarray tensor data to a new\n          `np.ndarray` instance so as to prevent writes on a raw memmap result\n          slice (a `np.memmap` instance) from propagating to data on disk.\n\n        * This is an issue for reads from a write-enabled checkout where data\n          was just written, since the np flags \"WRITEABLE\" and \"OWNDATA\" will be\n          True, and writes to the returned array would overwrite that data\n          slice on disk.\n\n        * For read-only checkouts, modifications to the resultant array would\n          perform a \"copy on write\"-like operation which would be propagated to\n          all future reads of the subarray from that process, but which would\n          not be persisted to disk.\n        \"\"\"\n        srcSlc = (hashVal.collection_idx, *[slice(0, x) for x in hashVal.shape])\n        try:\n            res = self.Fp[hashVal.uid][srcSlc]\n        except TypeError:\n            self.Fp[hashVal.uid] = self.Fp[hashVal.uid]()\n            res = self.Fp[hashVal.uid][srcSlc]\n        except KeyError:\n            process_dir = self.STAGEDIR if self.mode == 'a' else self.STOREDIR\n            if Path(process_dir, f'{hashVal.uid}.npy').is_file():\n                file_pth = self.DATADIR.joinpath(f'{hashVal.uid}.npy')\n                self.rFp[hashVal.uid] = open_memmap(file_pth, 'r')\n                res = self.Fp[hashVal.uid][srcSlc]\n            else:\n                raise\n\n        out = np.array(res, dtype=res.dtype, order='C')\n        if xxh64_hexdigest(out) != hashVal.checksum:\n            raise RuntimeError(\n                f'DATA CORRUPTION Checksum {xxh64_hexdigest(out)} != recorded {hashVal}')\n        return out\n\n    def write_data(self, array: np.ndarray, *, remote_operation: bool = False) -> bytes:\n        \"\"\"writes array data to disk in the NUMPY_10 backend\n\n        Parameters\n        ----------\n        array : np.ndarray\n            tensor to write to disk\n        remote_operation : bool, optional, kwarg only\n            True if writing in a remote operation, otherwise False. Default is\n            False.\n\n        Returns\n        -------\n        bytes\n            db hash record value specifying location information\n        \"\"\"\n        
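# when the current collection file is full (COLLECTION_SIZE subarrays\n        # written), roll over to a freshly created memmap file before writing.\n        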
checksum = xxh64_hexdigest(array)\n        if self.w_uid in self.wFp:\n            self.hIdx += 1\n            if self.hIdx >= COLLECTION_SIZE:\n                self.wFp[self.w_uid].flush()\n                self._create_schema(remote_operation=remote_operation)\n        else:\n            self._create_schema(remote_operation=remote_operation)\n\n        destSlc = (self.hIdx, *[slice(0, x) for x in array.shape])\n        self.wFp[self.w_uid][destSlc] = array\n        self.wFp[self.w_uid].flush()\n        return numpy_10_encode(self.w_uid, checksum, self.hIdx, array.shape)\n"
  },
  {
    "path": "src/hangar/backends/remote_50.py",
    "content": "\"\"\"Remote server location unknown backend, Identifier: ``REMOTE_50``\n\nBackend Identifiers\n===================\n\n*  Backend: ``5``\n*  Version: ``0``\n*  Format Code: ``50``\n*  Canonical Name: ``REMOTE_50``\n\nStorage Method\n==============\n\n*  This backend merely acts to record that there is some data sample with some\n   ``hash`` and ``schema_shape`` present in the repository. It does not store the\n   actual data on the local disk, but indicates that if it should be retrieved,\n   you need to ask the remote hangar server for it. Once present on the local\n   disk, the backend locating info will be updated with one of the `local` data\n   backend specifications.\n\nRecord Format\n=============\n\nFields Recorded for Each Array\n------------------------------\n\n*  Format Code\n*  Schema Hash\n\nSeparators used\n---------------\n\n* ``SEP_KEY: \":\"``\n\nExamples\n--------\n\n1)  Adding the first piece of data to a file:\n\n    *  Schema Hash: \"ae43A21a\"\n\n    ``Record Data => '50:ae43A21a'``\n\n1)  Adding to a piece of data to a the middle of a file:\n\n    *  Schema Hash: \"ae43A21a\"\n\n    ``Record Data => '50:ae43A21a'``\n\nTechnical Notes\n===============\n\n*  The schema_hash field is required in order to allow effective placement of\n   actual retrieved data into suitable sized collections on a ``fetch-data()``\n   operation\n\"\"\"\nfrom pathlib import Path\nfrom typing import Optional\n\nfrom .specs import REMOTE_50_DataHashSpec\nfrom ..op_state import writer_checkout_only, reader_checkout_only\nfrom ..typesystem import EmptyDict, checkedmeta\n\n\n# -------------------------------- Parser Implementation ----------------------\n\n_FmtCode = '50'\n\n\ndef remote_50_encode(schema_hash: str = '') -> bytes:\n    \"\"\"returns an db value saying that this hash exists somewhere on a remote\n\n    Returns\n    -------\n    bytes\n        hash data db value\n    \"\"\"\n    return f'50:{schema_hash}'.encode()\n\n\n# ------------------------- Accessor Object -----------------------------------\n\n\nclass REMOTE_50_Options(metaclass=checkedmeta):\n    _backend_options = EmptyDict()\n\n    def __init__(self, backend_options, *args, **kwargs):\n        if backend_options is None:\n            backend_options = self.default_options\n        self._backend_options = backend_options\n\n    @property\n    def default_options(self):\n        return {}\n\n    @property\n    def backend_options(self):\n        return self._backend_options\n\n    @property\n    def init_requires(self):\n        return ('repo_path',)\n\n\nclass REMOTE_50_Handler(object):\n\n    def __init__(self, repo_path: Path, *args, **kwargs):\n        self.repo_path = repo_path\n        self._dflt_backend_opts: Optional[dict] = None\n        self._mode: Optional[str] = None\n\n    def __enter__(self):\n        return self\n\n    def __exit__(self, *exc):\n        return\n\n    @reader_checkout_only\n    def __getstate__(self) -> dict:  # pragma: no cover\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.close()\n        state = self.__dict__.copy()\n        return state\n\n    def __setstate__(self, state: dict) -> None:  # pragma: no cover\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        self.__dict__.update(state)\n        self.open(mode=self._mode)\n\n    @property\n    def backend_opts(self):\n        return self._dflt_backend_opts\n\n    @writer_checkout_only\n    def _backend_opts_set(self, val):\n 
       \"\"\"Nonstandard descriptor method. See notes in ``backend_opts.setter``.\n        \"\"\"\n        self._dflt_backend_opts = val\n        return\n\n    @backend_opts.setter\n    def backend_opts(self, value):\n        \"\"\"\n        Using seperate setter method (with ``@writer_checkout_only`` decorator\n        applied) due to bug in python <3.8.\n\n        From: https://bugs.python.org/issue19072\n            > The classmethod decorator when applied to a function of a class,\n            > does not honour the descriptor binding protocol for whatever it\n            > wraps. This means it will fail when applied around a function which\n            > has a decorator already applied to it and where that decorator\n            > expects that the descriptor binding protocol is executed in order\n            > to properly bind the function to the class.\n        \"\"\"\n        return self._backend_opts_set(value)\n\n    def open(self, mode, *args, **kwargs):\n        self._mode = mode\n        return\n\n    def close(self, *args, **kwargs):\n        return\n\n    @staticmethod\n    def delete_in_process_data(*args, **kwargs) -> None:\n        \"\"\"mockup of clearing staged directory for upstream calls.\n        \"\"\"\n        return\n\n    def read_data(self, hashVal: REMOTE_50_DataHashSpec) -> None:\n        raise FileNotFoundError(\n            f'data hash spec: {REMOTE_50_DataHashSpec} does not exist on this machine. '\n            f'Perform a `data-fetch` operation to retrieve it from the remote server.')\n\n    def write_data(self, schema_hash: str, *args, **kwargs) -> bytes:\n        \"\"\"Provide a formatted byte representation for storage as a remote reference\n\n        Parameters\n        ----------\n        schema_hash : str\n            schema hash which the referenced data sample should be accessed under\n\n        Returns\n        -------\n        bytes\n            formated raw values encoding lookup information\n        \"\"\"\n        return remote_50_encode(schema_hash=schema_hash)\n"
  },
  {
    "path": "src/hangar/backends/specparse.pyx",
    "content": "# decoding methods to convert from byte string -> spec struct.\n\nfrom .specs cimport HDF5_01_DataHashSpec, \\\n    HDF5_00_DataHashSpec, \\\n    NUMPY_10_DataHashSpec, \\\n    LMDB_30_DataHashSpec, \\\n    LMDB_31_DataHashSpec, \\\n    REMOTE_50_DataHashSpec\n\n\ncdef HDF5_01_DataHashSpec HDF5_01_Parser(str inp):\n    cdef str fmt, uid, cksum, dset, dset_idx\n    cdef tuple shape_tup\n    cdef list shape_list = []\n    cdef int dataset_idx_int\n    cdef unsigned char i, c, cc\n    cdef unsigned char n = len(inp)\n    cdef HDF5_01_DataHashSpec res\n\n    c = 0\n    cc = 0\n    for i in range(n):\n        if inp[i] == ':':\n            if cc == 0:\n                fmt = inp[c:i]\n            elif cc == 1:\n                uid = inp[c:i]\n            elif cc == 2:\n                cksum = inp[c:i]\n            elif cc == 3:\n                dset = inp[c:i]\n            elif cc == 4:\n                dset_idx = inp[c:i]\n            c = i + 1\n            cc = cc + 1\n    shape_vs = inp[c:n]\n\n    c = 0\n    n = len(shape_vs)\n    for i in range(n):\n        if shape_vs[i] == ' ':\n            shape_list.append(int(shape_vs[c:i]))\n            c = i + 1\n    if shape_vs[c:n] != '':\n        shape_list.append(int(shape_vs[c:]))\n\n    shape_tup = tuple(shape_list)\n    dataset_idx_int = int(dset_idx)\n    res = HDF5_01_DataHashSpec(fmt, uid, cksum, dset, dataset_idx_int, shape_tup)\n    return res\n\n\ncdef HDF5_00_DataHashSpec HDF5_00_Parser(str inp):\n    cdef str fmt, uid, cksum, dset, dset_idx\n    cdef tuple shape_tup\n    cdef list shape_list = []\n    cdef int dataset_idx_int\n    cdef unsigned char i, c, cc\n    cdef unsigned char n = len(inp)\n    cdef HDF5_00_DataHashSpec res\n\n    c = 0\n    cc = 0\n    for i in range(n):\n        if inp[i] == ':':\n            if cc == 0:\n                fmt = inp[c:i]\n            elif cc == 1:\n                uid = inp[c:i]\n            elif cc == 2:\n                cksum = inp[c:i]\n            elif cc == 3:\n                dset = inp[c:i]\n            elif cc == 4:\n                dset_idx = inp[c:i]\n            c = i + 1\n            cc = cc + 1\n    shape_vs = inp[c:n]\n\n    c = 0\n    n = len(shape_vs)\n    for i in range(n):\n        if shape_vs[i] == ' ':\n            shape_list.append(int(shape_vs[c:i]))\n            c = i + 1\n    if shape_vs[c:n] != '':\n        shape_list.append(int(shape_vs[c:]))\n\n    shape_tup = tuple(shape_list)\n    dataset_idx_int = int(dset_idx)\n    res = HDF5_00_DataHashSpec(fmt, uid, cksum, dset, dataset_idx_int, shape_tup)\n    return res\n\n\ncdef NUMPY_10_DataHashSpec NUMPY_10_Parser(str inp):\n    cdef str fmt, uid, cksum, collection_idx\n    cdef tuple shape_tup\n    cdef list shape_list = []\n    cdef int collection_idx_int\n    cdef unsigned char i, c, cc\n    cdef unsigned char n = len(inp)\n    cdef NUMPY_10_DataHashSpec res\n\n    c = 0\n    cc = 0\n    for i in range(n):\n        if inp[i] == ':':\n            if cc == 0:\n                fmt = inp[c:i]\n            elif cc == 1:\n                uid = inp[c:i]\n            elif cc == 2:\n                cksum = inp[c:i]\n            elif cc == 3:\n                collection_idx = inp[c:i]\n            c = i + 1\n            cc = cc + 1\n    shape_vs = inp[c:n]\n\n    c = 0\n    n = len(shape_vs)\n    for i in range(n):\n        if shape_vs[i] == ' ':\n            shape_list.append(int(shape_vs[c:i]))\n            c = i + 1\n    if shape_vs[c:n] != '':\n        shape_list.append(int(shape_vs[c:]))\n\n    shape_tup = 
tuple(shape_list)\n    collection_idx_int = int(collection_idx)\n    res = NUMPY_10_DataHashSpec(fmt, uid, cksum, collection_idx_int, shape_tup)\n    return res\n\n\ncdef LMDB_30_DataHashSpec LMDB_30_Parser(str inp):\n    cdef str fmt, uid, row_idx, checksum\n    cdef unsigned char i, c, cc\n    cdef unsigned char n = len(inp)\n    cdef LMDB_30_DataHashSpec res\n\n    c = 0\n    cc = 0\n    for i in range(n):\n        if inp[i] == ':':\n            if cc == 0:\n                fmt = inp[c:i]\n            elif cc == 1:\n                uid = inp[c:i]\n            elif cc == 2:\n                row_idx = inp[c:i]\n            c = i + 1\n            cc = cc + 1\n    checksum = inp[c:n]\n\n    res = LMDB_30_DataHashSpec(fmt, uid, row_idx, checksum)\n    return res\n\n\ncdef LMDB_31_DataHashSpec LMDB_31_Parser(str inp):\n    cdef str fmt, uid, row_idx, checksum\n    cdef unsigned char i, c, cc\n    cdef unsigned char n = len(inp)\n    cdef LMDB_31_DataHashSpec res\n\n    c = 0\n    cc = 0\n    for i in range(n):\n        if inp[i] == ':':\n            if cc == 0:\n                fmt = inp[c:i]\n            elif cc == 1:\n                uid = inp[c:i]\n            elif cc == 2:\n                row_idx = inp[c:i]\n            c = i + 1\n            cc = cc + 1\n    checksum = inp[c:n]\n\n    res = LMDB_31_DataHashSpec(fmt, uid, row_idx, checksum)\n    return res\n\n\ncdef REMOTE_50_DataHashSpec REMOTE_50_Parser(str inp):\n    cdef str fmt, schema_hash\n    cdef unsigned char i, c\n    cdef unsigned char n = len(inp)\n    cdef REMOTE_50_DataHashSpec res\n\n    c = 0\n    for i in range(n):\n        if inp[i] == ':':\n            fmt = inp[c:i]\n            c = i + 1\n    schema_hash = inp[c:]\n    res = REMOTE_50_DataHashSpec(fmt, schema_hash)\n    return res\n\n\ncpdef object backend_decoder(bytes inp):\n    cdef str backend, inp_str\n    inp_str = inp.decode('utf-8')\n    backend = inp_str[:2]\n    if backend == '00':\n        return HDF5_00_Parser(inp_str)\n    elif backend == '01':\n        return HDF5_01_Parser(inp_str)\n    elif backend == '10':\n        return NUMPY_10_Parser(inp_str)\n    elif backend == '30':\n        return LMDB_30_Parser(inp_str)\n    elif backend == '31':\n        return LMDB_31_Parser(inp_str)\n    elif backend == '50':\n        return REMOTE_50_Parser(inp_str)\n    else:\n        raise ValueError(f'unknown backend type for input str {inp_str}')\n"
  },
  {
    "path": "src/hangar/backends/specs.pxd",
    "content": "# header files for spec containers\n\ncdef class HDF5_01_DataHashSpec:\n\n    cdef readonly str backend\n    cdef readonly str uid\n    cdef readonly str checksum\n    cdef readonly str dataset\n    cdef readonly int dataset_idx\n    cdef readonly tuple shape\n\n\ncdef class HDF5_00_DataHashSpec:\n\n    cdef readonly str backend\n    cdef readonly str uid\n    cdef readonly str checksum\n    cdef readonly str dataset\n    cdef readonly int dataset_idx\n    cdef readonly tuple shape\n\n\ncdef class NUMPY_10_DataHashSpec:\n\n    cdef readonly str backend\n    cdef readonly str uid\n    cdef readonly str checksum\n    cdef readonly int collection_idx\n    cdef readonly tuple shape\n\n\ncdef class LMDB_30_DataHashSpec:\n\n    cdef readonly str backend\n    cdef readonly str uid\n    cdef readonly str row_idx\n    cdef readonly  str checksum\n\n\ncdef class LMDB_31_DataHashSpec:\n\n    cdef readonly str backend\n    cdef readonly str uid\n    cdef readonly str row_idx\n    cdef readonly  str checksum\n\n\ncdef class REMOTE_50_DataHashSpec:\n\n    cdef readonly str backend\n    cdef readonly str schema_hash\n"
  },
  {
    "path": "src/hangar/backends/specs.pyx",
    "content": "# memory efficient container classes for data backends specs.\n# Allow for attribute access similar to named tuples.\n\ncdef class HDF5_01_DataHashSpec:\n\n    def __init__(self, str backend, str uid, str checksum, str dataset,\n                 int dataset_idx, tuple shape):\n\n        self.backend = backend\n        self.uid = uid\n        self.checksum = checksum\n        self.dataset = dataset\n        self.dataset_idx = dataset_idx\n        self.shape = shape\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'backend=\"{self.backend}\", '\n                f'uid=\"{self.uid}\", '\n                f'checksum=\"{self.checksum}\", '\n                f'dataset=\"{self.dataset}\", '\n                f'dataset_idx={self.dataset_idx}, '\n                f'shape={self.shape})')\n\n    def __iter__(self):\n        for attr in ['backend', 'uid', 'checksum', 'dataset', 'dataset_idx', 'shape']:\n            yield getattr(self, attr)\n\n    @property\n    def islocal(self):\n        return True\n\n\ncdef class HDF5_00_DataHashSpec:\n\n    def __init__(self, str backend, str uid, str checksum,\n                 str dataset, int dataset_idx, tuple shape):\n\n        self.backend = backend\n        self.uid = uid\n        self.checksum = checksum\n        self.dataset = dataset\n        self.dataset_idx = dataset_idx\n        self.shape = shape\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'backend=\"{self.backend}\", '\n                f'uid=\"{self.uid}\", '\n                f'checksum=\"{self.checksum}\", '\n                f'dataset=\"{self.dataset}\", '\n                f'dataset_idx={self.dataset_idx}, '\n                f'shape={self.shape})')\n\n    def __iter__(self):\n        for attr in ['backend', 'uid', 'checksum', 'dataset', 'dataset_idx', 'shape']:\n            yield getattr(self, attr)\n\n    @property\n    def islocal(self):\n        return True\n\n\ncdef class NUMPY_10_DataHashSpec:\n\n    def __init__(self, str backend, str uid, str checksum,\n                 int collection_idx, tuple shape):\n\n        self.backend = backend\n        self.uid = uid\n        self.checksum = checksum\n        self.collection_idx = collection_idx\n        self.shape = shape\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'backend=\"{self.backend}\", '\n                f'uid=\"{self.uid}\", '\n                f'checksum=\"{self.checksum}\", '\n                f'collection_idx={self.collection_idx}, '\n                f'shape={self.shape})')\n\n    def __iter__(self):\n        for attr in ['backend', 'uid', 'checksum', 'collection_idx', 'shape']:\n            yield getattr(self, attr)\n\n    @property\n    def islocal(self):\n        return True\n\n\ncdef class LMDB_30_DataHashSpec:\n\n    def __init__(self, str backend, str uid, str row_idx, str checksum):\n\n        self.backend = backend\n        self.uid = uid\n        self.row_idx = row_idx\n        self.checksum = checksum\n\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'backend=\"{self.backend}\", '\n                f'uid=\"{self.uid}\", '\n                f'row_idx={self.row_idx}, '\n                f'checksum=\"{self.checksum}\")')\n\n    def __iter__(self):\n        for attr in ['backend', 'uid', 'row_idx', 'checksum']:\n            yield getattr(self, attr)\n\n    @property\n    def islocal(self):\n        return True\n\n\n\ncdef class 
cdef class LMDB_31_DataHashSpec:\n\n    def __init__(self, str backend, str uid, str row_idx, str checksum):\n\n        self.backend = backend\n        self.uid = uid\n        self.row_idx = row_idx\n        self.checksum = checksum\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'backend=\"{self.backend}\", '\n                f'uid=\"{self.uid}\", '\n                f'row_idx={self.row_idx}, '\n                f'checksum=\"{self.checksum}\")')\n\n    def __iter__(self):\n        for attr in ['backend', 'uid', 'row_idx', 'checksum']:\n            yield getattr(self, attr)\n\n    @property\n    def islocal(self):\n        return True\n\n\ncdef class REMOTE_50_DataHashSpec:\n\n    def __init__(self, str backend, str schema_hash):\n\n        self.backend = backend\n        self.schema_hash = schema_hash\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'backend=\"{self.backend}\", '\n                f'schema_hash=\"{self.schema_hash}\")')\n\n    def __iter__(self):\n        for attr in ['backend', 'schema_hash']:\n            yield getattr(self, attr)\n\n    @property\n    def islocal(self):\n        return False\n"
  },
  {
    "path": "src/hangar/bulk_importer.py",
    "content": "\"\"\"Bulk importer methods to ingest large quantities of data into Hangar.\n\nThe following module is designed to address challenges inherent to writing\nmassive amounts of data to a hangar repository via the standard API. Since\nwrite-enabled checkouts are limited to processing in a single thread, the\ntime required to import hundreds of Gigabytes (or Terabytes) of data into\nHangar (from external sources) can become prohibitivly long. This module\nimplements a multi-processed importer which reduces import time nearly\nlinearly with the number of CPU cores allocated on a machine.\n\nThere are a number of challenges to overcome:\n\n1. How to validate data against a column schema?\n\n    - Does the column exist?\n\n    - Are the key(s) valid?\n\n    - Is the data a valid type/shape/precision valid for the the selected\n      column schema?\n\n2. How to handle duplicated data?\n\n    -  If an identical piece of data is recorded in the repository already,\n       only record the sample reference (do not write the data to disk again).\n\n    - If the bulk import method would write identical pieces of data to the\n      repository multiple times, and the data does not already exist, then that\n      piece of content should only be written to disk once. Only sample\n      references should be saved after that.\n\n3. How to handle transactionality?\n\n    - What happens if some column, sample keys, or data piece is invalid and\n      cannot be written as desired?\n\n    - How to rollback partial changes if the process is inturupted in\n      the middle of a bulk import operation?\n\n4. How to limit memory usage if many processes are trying to load and\n   write large tensors?\n\n\nRough outline of steps:\n\n    1. Validate UDF & Argument Signature\n\n    2. Read, Validate, and Hash UDF results --> Task Recipe\n\n    3. Prune Recipe\n\n    4. Read, Validate, Write Data to Isolated Backend Storage\n\n    5. Record Sample References in Isolated Environment\n\n    6. If all successful, make isolated data known to repository core,\n       otherwise abort to starting state.\n\"\"\"\n__all__ = ('UDF_Return', 'run_bulk_import')\n\nimport concurrent.futures\nimport multiprocessing as mp\nimport multiprocessing.queues as mpq\nimport os\nimport pickle\nimport queue\nimport random\nimport shutil\nimport warnings\nfrom concurrent.futures import ThreadPoolExecutor\nfrom contextlib import closing, contextmanager\nfrom inspect import signature, isgeneratorfunction\nfrom math import ceil\nfrom operator import attrgetter, methodcaller\nfrom pathlib import Path\nfrom tempfile import TemporaryDirectory\nfrom typing import (\n    NamedTuple, Union, Tuple, List, Iterator,\n    Callable, Dict, Optional, TYPE_CHECKING\n)\n\nimport cloudpickle\nimport numpy as np\nfrom tqdm import tqdm\n\nfrom .columns.common import open_file_handles\nfrom .constants import DIR_DATA, DIR_DATA_REMOTE, DIR_DATA_STAGE, DIR_DATA_STORE\nfrom .records import hashs\nfrom .records.column_parsers import (\n    hash_data_raw_key_from_db_key,\n    hash_data_db_key_from_raw_key,\n    flat_data_db_key_from_names,\n    nested_data_db_key_from_names,\n    data_record_db_val_from_digest,\n)\nfrom .txnctx import TxnRegister\nfrom .utils import grouper, is_valid_directory_path, bound\n\nif TYPE_CHECKING:\n    import lmdb\n    from . 
UDF_T = Callable[..., Iterator['UDF_Return']]\nKeyType = Union[str, int]\n\n\n# ----------------- User Facing Portions of Bulk Data Loader -------------------\n\n\n# noinspection PyUnresolvedReferences\nclass UDF_Return(NamedTuple):\n    \"\"\"User-Defined Function return container for bulk importer read functions\n\n    Attributes\n    ----------\n    column: str\n        column name to place data into\n    key: Union[KeyType, Tuple[KeyType, KeyType]]\n        key to place flat sample into, or 2-tuple of keys for nested samples\n    data: Union[np.ndarray, str, bytes]\n        piece of data to place in the column with the provided key.\n    \"\"\"\n    column: str\n    key: Union[KeyType, Tuple[KeyType, KeyType]]\n    data: Union[np.ndarray, str, bytes]\n\n    def __eq__(self, other):\n        if not self.__class__.__name__ == other.__class__.__name__:\n            return NotImplemented\n\n        if self.column != other.column:\n            return False\n        if self.key != other.key:\n            return False\n\n        if isinstance(self.data, np.ndarray):\n            if not np.array_equal(self.data, other.data):\n                return False\n        elif self.data != other.data:\n            return False\n        return True\n\n\ndef run_bulk_import(\n        repo: 'Repository',\n        branch_name: str,\n        column_names: List[str],\n        udf: UDF_T,\n        udf_kwargs: List[dict],\n        *,\n        ncpus: int = 0,\n        autocommit: bool = True\n):\n    \"\"\"Perform a bulk import operation from a given user-defined function.\n\n    In order to allow arbitrary input data sources while ensuring that the\n    core promises of hangar hold, we require the following from users:\n\n    Define some arbitrary function (i.e. a \"user-defined function\" / \"UDF\") which\n    accepts some arguments and yields data. The UDF must be a generator function,\n    yielding only values which are of :class:`~.UDF_Return` type. The results\n    yielded by the UDF must be deterministic for a given set of inputs. This\n    includes all values of the :class:`~.UDF_Return` (``columns`` and ``keys``,\n    as well as ``data``).\n\n    A list of input arguments to the UDF must be provided; this is formatted as a\n    sequence (list / tuple) of keyword-arg dictionaries, each of which must be\n    valid when unpacked and bound to the UDF signature. Additionally, all columns\n    must be specified up front. If any column named in a :class:`~.UDF_Return`\n    was not pre-specified, the entire operation will fail.\n\n    Notes\n    -----\n\n    *  This is an all-or-nothing operation: either all data is successfully\n       read, validated, and written to the storage backends, or none of it\n       is. A single malformed key or data type/shape will cause the entire\n       import operation to abort.\n\n    *  The input kwargs should be fairly small (of no consequence to load\n       into memory), while the data yielded out may be large. The results of the\n       UDF will only be stored in memory for a very short period (just the time\n       it takes to be validated against the column schema and compressed /\n       flushed to disk).\n\n    *  Every step of the process is executed as a generator, lazily loading\n       data the entire way. 
If possible, we recommend writing the UDF such that\n       data is not allocated in memory before it is ready to be yielded.\n\n    *  Where possible, the task recipe will be pruned and optimized in such\n       a way that iteration over the UDF will be short-circuited during the\n       second pass (writing data to the backend). As this can greatly reduce\n       processing time, we recommend yielding the data pieces most likely\n       to be unique first from the UDF.\n\n    Warnings\n    --------\n\n    *  Please be aware that these methods should not be executed within a\n       Jupyter Notebook / Jupyter Lab when running the bulk importer at scale.\n       The internal implementation makes significant use of multiprocess Queues\n       for work distribution and recording. The heavy loads placed on the system\n       have been observed to place strain on Jupyter's ZeroMQ implementation,\n       resulting in random failures which may or may not even display a traceback\n       to indicate failure mode.\n\n       A small sample set of data can be used within Jupyter to test an\n       implementation without problems, but for full scale operations it is best\n       run in a script with the operations protected by a ``__main__`` block.\n\n    Examples\n    --------\n\n    >>> import os\n    >>> import numpy as np\n    >>> from PIL import Image\n    >>> from hangar.bulk_importer import UDF_Return\n\n    >>> def image_loader(file_path):\n    ...     root, sample_file = os.path.split(file_path)\n    ...     category = os.path.basename(root)\n    ...     sample_name, _ = os.path.splitext(sample_file)\n    ...\n    ...     im = Image.open(file_path)\n    ...     arr = np.array(im.resize((512, 512)))\n    ...     im_record = UDF_Return(column='image', key=(category, sample_name), data=arr)\n    ...     yield im_record\n    ...\n    ...     path_record = UDF_Return(column='file_str', key=(category, sample_name), data=file_path)\n    ...     yield path_record\n    ...\n    >>> udf_kwargs = [\n    ...     {'file_path': '/foo/cat/image_001.jpeg'},\n    ...     {'file_path': '/foo/cat/image_002.jpeg'},\n    ...     {'file_path': '/foo/dog/image_001.jpeg'},\n    ...     {'file_path': '/foo/bird/image_011.jpeg'},\n    ...     {'file_path': '/foo/bird/image_003.jpeg'}\n    ... ]\n    >>> repo = Repository('foo/path/to/repo')\n    >>> from hangar.bulk_importer import run_bulk_import\n    >>> run_bulk_import(\n    ...     repo, branch_name='master', column_names=['file_str', 'image'],\n    ...     udf=image_loader, udf_kwargs=udf_kwargs)\n\n    However, the following will not work, since the output is non-deterministic.\n\n    >>> from random import random\n    >>> def nondeterministic(x, y):\n    ...     first = str(x * y)\n    ...     yield UDF_Return(column='valstr', key=f'{x}_{y}', data=first)\n    ...\n    ...     second = str(x * y * random())\n    ...     yield UDF_Return(column='valstr', key=f'{x}_{y}', data=second)\n    ...\n    >>> udf_kwargs = [\n    ...     {'x': 1, 'y': 2},\n    ...     {'x': 1, 'y': 3},\n    ...     {'x': 2, 'y': 4},\n    ... ]\n    >>> run_bulk_import(\n    ...     repo, branch_name='master', column_names=['valstr'],\n    ...     udf=nondeterministic, udf_kwargs=udf_kwargs)\n    Traceback (most recent call last):\n      File \"<stdin>\", line 1, in <module>\n    ValueError: contents returned in subsequent calls to UDF with identical\n      kwargs yielded different results. UDFs MUST generate deterministic\n      results for the given inputs. 
Input kwargs generating this result:\n      {'x': 1, 'y': 2}.\n\n    Not every column must receive data from every input to the UDF; the number of\n    data pieces yielded can also vary arbitrarily (so long as the results are\n    deterministic for a particular set of inputs).\n\n    >>> def maybe_load(x_arr, y_arr, sample_name, columns=['default']):\n    ...     for column in columns:\n    ...         arr = np.multiply(x_arr, y_arr)\n    ...         yield UDF_Return(column=column, key=sample_name, data=arr)\n    ...     #\n    ...     # do some strange processing which only outputs another column sometimes\n    ...     if len(columns) == 1:\n    ...         other = np.array(x_arr.shape) * np.array(y_arr.shape)\n    ...         yield UDF_Return(column='strange_column', key=sample_name, data=other)\n    ...\n    >>> udf_kwargs = [\n    ...     {'x_arr': np.arange(10), 'y_arr': np.arange(10) + 1, 'sample_name': 'sample_1'},\n    ...     {'x_arr': np.arange(10), 'y_arr': np.arange(10) + 1, 'sample_name': 'sample_2', 'columns': ['foo', 'bar', 'default']},\n    ...     {'x_arr': np.arange(10) * 2, 'y_arr': np.arange(10), 'sample_name': 'sample_3'},\n    ... ]\n    >>> run_bulk_import(\n    ...     repo, branch_name='master',\n    ...     column_names=['default', 'foo', 'bar', 'strange_column'],\n    ...     udf=maybe_load, udf_kwargs=udf_kwargs)\n\n    Parameters\n    ----------\n    repo : 'Repository'\n        Initialized repository object to import data into.\n    branch_name : str\n        Name of the branch to checkout and import data into.\n    column_names : List[str]\n        Names of all columns which data should be saved to.\n    udf : UDF_T\n        User-Defined Function (generator style; yielding an arbitrary number\n        of values when iterated on) which is passed an unpacked kwarg dict as input\n        and yields a single :class:`~.UDF_Return` instance at a time when iterated over.\n    udf_kwargs : List[dict]\n        A sequence of keyword argument dictionaries which are individually unpacked\n        as inputs into the user-defined function (UDF).\n    ncpus : int, optional, default=0\n        Number of parallel processes used to read data files & write to the hangar\n        backend stores. If <= 0, then the default is set to ``num_cpus / 2``. The\n        value of this parameter should never exceed the total CPU count of the\n        system. Import time scales mostly linearly with ncpus. Optimal performance\n        is achieved by balancing memory usage of the ``UDF`` function and backend\n        storage writer processes against the total system memory.\n    autocommit : bool, optional, default=True\n        Control whether a commit should be made after successfully importing the\n        specified data to the staging area of the branch.\n    \"\"\"\n
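    # Flow sketch for the body below: pass one runs the UDF over every kwargs\n    # dict to validate and hash results (the task "recipe"); the recipe is then\n    # pruned of duplicate / already-stored digests; pass two re-runs the UDF\n    # only for the remaining digests and writes them to isolated storage.\n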
    _BATCH_SIZE = 10  # TODO: Is this necessary?\n\n    columns: Dict[str, 'ModifierTypes'] = {}\n    column_layouts: Dict[str, str] = {}\n    schemas: Dict[str, 'ColumnBase'] = {}\n\n    with closing(repo.checkout(write=True, branch=branch_name)) as co:\n        for name in column_names:\n            _col = co.columns[name]\n            _schema = _col._schema\n            columns[name] = _col\n            column_layouts[name] = _col.column_layout\n            schemas[name] = _schema\n\n        print('Validating reader function and argument input')\n        _check_user_input_func(columns=columns, udf=udf, udf_kwargs=udf_kwargs)\n        serialized_udf = _serialize_udf(udf)\n\n        ncpu = _process_num_cpus(ncpus)\n        print(f'Using {ncpu} worker processes')\n\n        recipe = _run_prepare_recipe(\n            column_layouts=column_layouts,\n            schemas=schemas,\n            udf=serialized_udf,\n            udf_kwargs=udf_kwargs,\n            ncpu=ncpu,\n            batch_size=_BATCH_SIZE)\n        print('Unifying naive recipe task set.')\n        unified_recipe = _unify_recipe_contents(recipe)\n        print('Pruning redundant steps & eliminating tasks on data stored in hangar.')\n        reduced_recipe = _reduce_recipe_on_required_digests(recipe, co._hashenv)\n\n        nsteps_reduced_recipe = _num_steps_in_task_list(reduced_recipe)\n        optim_percent = ((len(unified_recipe) - nsteps_reduced_recipe) / len(unified_recipe)) * 100\n        print(f'Reduced recipe workload tasks by: {optim_percent:.2f}%')\n        print(f' - Num tasks for naive ingest   : {len(unified_recipe)}')\n        print(f' - Num tasks after optimization : {nsteps_reduced_recipe}')\n\n        hangardirpth = repo._repo_path\n        if len(reduced_recipe) >= 1:\n            print('Starting multiprocessed data importer.')\n            with TemporaryDirectory(dir=str(hangardirpth)) as tmpdirname:\n                tmpdirpth = _mock_hangar_directory_structure(tmpdirname)\n                written_data_steps = _run_write_recipe_data(\n                    tmp_dir=tmpdirpth,\n                    columns=columns,\n                    schemas=schemas,\n                    udf=serialized_udf,\n                    recipe_tasks=reduced_recipe,\n                    ncpu=ncpu,\n                    batch_size=_BATCH_SIZE)\n                print('Finalizing written data pieces in hangar repo directory...')\n                _move_tmpdir_data_files_to_repodir(repodir=hangardirpth, tmpdir=tmpdirpth)\n            _write_digest_to_bespec_mapping(\n                executed_steps=written_data_steps,\n                hashenv=co._hashenv,\n                stagehashenv=co._stagehashenv)\n        else:\n            print('No actions requiring the data import remain after optimizations.')\n\n        print('Mapping full recipe requested via UDF to optimized task set actually processed.')\n        _write_full_recipe_sample_key_to_digest_mapping(sample_steps=unified_recipe, dataenv=co._stageenv)\n\n        if autocommit:\n            print('autocommitting changes.')\n            co.commit(f'Auto commit after bulk import of 
{len(unified_recipe)} samples to '\n                      f'columns {column_names} on branch {branch_name}')\n        else:\n            print('skipping autocommit')\n\n        print('Bulk data importer operation completed successfully')\n        return\n\n\n# ---------------- Internal Implementation of Bulk Data Loader ----------------\n\n\nclass _ContentDescriptionPrep(NamedTuple):\n    column: str\n    layout: str\n    key: Union[Tuple[KeyType, KeyType], KeyType]\n    digest: str\n    udf_iter_idx: int\n\n    def db_record_key(self):\n        if self.layout == 'nested':\n            db_key = nested_data_db_key_from_names(self.column, self.key[0], self.key[1])\n        elif self.layout == 'flat':\n            db_key = flat_data_db_key_from_names(self.column, self.key)\n        else:\n            raise ValueError(f'unknown column layout value {self.layout} encountered while formatting db record key')\n        return db_key\n\n    def db_record_val(self):\n        return data_record_db_val_from_digest(self.digest)\n\n\nclass _Task(NamedTuple):\n    udf_kwargs: dict\n    udf_iter_indices: Tuple[int, ...]\n    expected_digests: Tuple[str, ...]\n\n    def num_steps(self):\n        return len(self.udf_iter_indices)\n\n\nclass _WrittenContentDescription(NamedTuple):\n    \"\"\"Description of a data content piece saved by the multiprocess content writer\n\n    Attributes\n    ----------\n    digest: str\n        digest of the data piece written.\n    bespec: bytes\n        backend location spec in db formatted bytes representation.\n    \"\"\"\n    digest: str\n    bespec: bytes\n\n\ndef _num_steps_in_task_list(task_list: List[_Task]) -> int:\n    num_steps_method = methodcaller('num_steps')\n    return sum(map(num_steps_method, task_list))\n\n\ndef _serialize_udf(udf: UDF_T) -> bytes:\n    raw = cloudpickle.dumps(udf, protocol=pickle.HIGHEST_PROTOCOL)\n    return raw\n\n\ndef _deserialize_udf(raw: bytes) -> UDF_T:\n    udf = cloudpickle.loads(raw)\n    return udf\n\n\n
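# cloudpickle (rather than plain pickle) is used so that closures and locally\n# defined functions survive the trip to worker processes. A round-trip sanity\n# sketch, with an illustrative stand-in for a real UDF:\n#\n#   >>> f = lambda x: x + 1\n#   >>> _deserialize_udf(_serialize_udf(f))(1)\n#   2\n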
def _process_num_cpus(ncpus: int) -> int:\n    \"\"\"Determine how many worker processes to spin up in the bulk importer\n\n    Parameters\n    ----------\n    ncpus: int\n        User specified number of worker processes. If <= 0, set to num CPU cores / 2.\n\n    Returns\n    -------\n    int\n    \"\"\"\n    node_cpus = os.cpu_count()\n    if ncpus <= 0:\n        cpu_try = ceil(node_cpus / 2)\n        ncpus = bound(1, node_cpus, cpu_try)\n    elif ncpus > node_cpus:\n        warnings.warn(\n            f'Input number of CPUs exceeds maximum on node. {ncpus} > {node_cpus}',\n            category=UserWarning\n        )\n    return ncpus\n\n\ndef _check_user_input_func(\n        columns,\n        udf: UDF_T,\n        udf_kwargs: List[dict],\n        *,\n        prerun_check_percentage: float = 0.02\n):\n    \"\"\"Perform a few sanity tests to ensure the kwargs and udf produce valid data.\n\n    Parameters\n    ----------\n    columns\n        initialized columns object dict.\n    udf : UDF_T\n        user provided generator function which takes some kwargs and yields data samples.\n    udf_kwargs : List[dict]\n        kwarg dicts to unpack into UDF via `udf(**kwargs)`\n    prerun_check_percentage : float, kwargonly, default=0.02\n        value between (0.0, 1.0) representing what percentage of items in the full\n        work list should be selected (at random) to be processed by the udf &\n        verified against the column schema.\n\n        This is meant to serve as a quick sanity check (to test whether success is\n        even possible) before launching the full pipeline with multiple worker\n        processes.\n    \"\"\"\n    if not isgeneratorfunction(udf):\n        raise TypeError(f'UDF {udf} is not a user defined generator function.')\n\n    try:\n        _raw_udf = _serialize_udf(udf)\n        _deserialized = _deserialize_udf(_raw_udf)\n    except (pickle.PicklingError, pickle.UnpicklingError) as e:\n        my_err = RuntimeError(f'Could not pickle/unpickle UDF {udf} using cloudpickle.')\n        raise my_err from e\n\n    sig = signature(udf)\n    for idx, kwargs in enumerate(tqdm(udf_kwargs, desc='Validating argument signature')):\n        try:\n            sig.bind(**kwargs)\n        except TypeError as e:\n            my_err = TypeError(f'Value {kwargs} at index {idx} of `udf_kwargs` is invalid.')\n            raise my_err from e\n\n    num_choices_by_percent = ceil(len(udf_kwargs) * prerun_check_percentage)\n    num_choices = bound(2, 100, num_choices_by_percent)\n    work_samples = random.choices(udf_kwargs, k=num_choices)\n    for kwargs in tqdm(work_samples, desc='Performing pre-run sanity check'):\n        first_results = []\n        for first_res in udf(**kwargs):\n            if not first_res.__class__.__name__ == UDF_Return.__name__:\n                raise TypeError(\n                    f'UDF must yield only values of type {UDF_Return}, received '\n                    f'{type(first_res)} from input kwargs: {kwargs}')\n            if first_res.column not in columns:\n                raise ValueError(\n                    f'UDF_Return column value {first_res.column} was not specified in bulk '\n                    f'loader input. kwargs triggering this UDF_Return failure: {kwargs}')\n            _col = columns[first_res.column]\n            if _col.column_layout == 'flat':\n                _col._set_arg_validate(first_res.key, first_res.data)\n            else:\n                _col._set_arg_validate(first_res.key[0], {first_res.key[1]: first_res.data})\n            first_results.append(first_res)\n\n        _DeterministicError = ValueError(\n            f'contents returned in subsequent calls to UDF with identical kwargs '\n            f'yielded different results. UDFs MUST generate deterministic results '\n            f'for the given inputs. 
Input kwargs generating this result: {kwargs}')\n        second_len = 0\n        for second_idx, second_res in enumerate(udf(**kwargs)):\n            if not second_res == first_results[second_idx]:\n                raise _DeterministicError\n            second_len += 1\n        if second_len != len(first_results):\n            raise _DeterministicError\n\n    return True\n\n\nclass _MPQueue(mpq.Queue):\n    \"\"\"Interruptible multiprocess Queue class which does not throw errors.\n    \"\"\"\n\n    def __init__(self, *args, **kwargs):\n        ctx = mp.get_context()\n        super().__init__(*args, **kwargs, ctx=ctx)\n\n    def safe_get(self, timeout=0.5):\n        try:\n            if timeout is None:\n                return self.get(False)\n            else:\n                return self.get(True, timeout)\n        except queue.Empty:\n            return None\n\n    def safe_put(self, item, timeout=0.5) -> bool:\n        try:\n            self.put(item, False, timeout)\n            return True\n        except queue.Full:\n            return False\n\n    def drain(self):\n        item = self.safe_get()\n        while item:\n            yield item\n            item = self.safe_get()\n\n    def safe_close(self) -> int:\n        num_left = sum(1 for __ in self.drain())\n        self.close()\n        self.join_thread()\n        return num_left\n\n\n
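# Usage pattern relied on by the worker classes below: producers `safe_put`\n# batches of work, consumers `safe_get` until None signals an empty queue, and\n# `safe_close` drains any stragglers so join_thread() cannot block shutdown.\n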
class _BatchProcessPrepare(mp.Process):\n\n    def __init__(\n            self,\n            udf: bytes,\n            schemas: Dict[str, 'ColumnBase'],\n            column_layouts: Dict[str, str],\n            in_queue: _MPQueue,\n            out_queue: _MPQueue,\n            *args, **kwargs\n    ):\n        \"\"\"Read all data generated by the UDF for each set of input kwargs.\n\n        Validates that the reader function works, that it yields the correct\n        UDF_Return type, that key / column names are compatible, and that the\n        data is suitable for the column schema; calculates the digest of each\n        data piece and its index location in the UDF iteration.\n\n        Parameters\n        ----------\n        udf\n            user provided function yielding UDF_Return instances when iterated over\n        schemas\n            dict mapping column names -> initialized schema objects. This is required in\n            order to properly calculate the data hash digests.\n        column_layouts\n            dict mapping column names -> column layout string\n        in_queue\n            queue containing work pieces (kwargs) to process via UDF `mp.Queue[List[dict]]`\n        out_queue\n            queue containing mp.Queue[List[Tuple[dict, List[_ContentDescriptionPrep]]]]\n            mapping kwargs -> content description read in.\n        \"\"\"\n        super().__init__(*args, **kwargs)\n        self.column_layouts = column_layouts\n        self._udf_raw: bytes = udf\n        self.udf: Optional[UDF_T] = None\n        self.in_queue = in_queue\n        self.out_queue = out_queue\n        self.schemas = schemas\n\n    def _setup(self):\n        self.udf = _deserialize_udf(self._udf_raw)\n\n    def _input_tasks(self) -> Iterator[List[dict]]:\n        udf_kwargs = self.in_queue.safe_get(timeout=2.0)\n        while udf_kwargs is not None:\n            yield udf_kwargs\n            udf_kwargs = self.in_queue.safe_get()\n\n    def run(self):\n        self._setup()\n        for udf_kwargs in self._input_tasks():\n            udf_kwargs_res = (\n                (kwargs, self.udf(**kwargs)) for kwargs in udf_kwargs if isinstance(kwargs, dict)\n            )\n            content_digests = []\n            for kwargs, udf_data_generator in udf_kwargs_res:\n                if kwargs is None:\n                    continue\n\n                udf_kwarg_content_digests = []\n                for udf_iter_idx, udf_return in enumerate(udf_data_generator):\n                    _column = udf_return.column\n                    _key = udf_return.key\n                    _data = udf_return.data\n                    _schema = self.schemas[_column]\n                    _layout = self.column_layouts[_column]\n\n                    iscompat = _schema.verify_data_compatible(_data)\n                    if not iscompat.compatible:\n                        raise ValueError(f'data for key {_key} incompatible due to {iscompat.reason}')\n                    digest = _schema.data_hash_digest(_data)\n                    res = _ContentDescriptionPrep(_column, _layout, _key, digest, udf_iter_idx)\n                    udf_kwarg_content_digests.append(res)\n                content_digests.append((kwargs, udf_kwarg_content_digests))\n            self.out_queue.safe_put(content_digests)\n\n\ndef _run_prepare_recipe(\n        column_layouts: Dict[str, str],\n        schemas: Dict[str, 'ColumnBase'],\n        udf: bytes,\n        udf_kwargs: List[dict],\n        *,\n        ncpu: int = 0,\n        batch_size: int = 10\n) -> List[Tuple[dict, List[_ContentDescriptionPrep]]]:\n\n    # Setup & populate queue with batched arguments\n    in_queue = _MPQueue()\n    out_queue = _MPQueue()\n    n_queue_tasks = ceil(len(udf_kwargs) / batch_size)\n    for keys_kwargs in grouper(udf_kwargs, batch_size):\n        in_queue.safe_put(keys_kwargs)\n\n    out, jobs = [], []\n    try:\n        # start worker processes\n        for _ in range(ncpu):\n            t = _BatchProcessPrepare(\n                udf=udf,\n                schemas=schemas,\n                column_layouts=column_layouts,\n                in_queue=in_queue,\n                out_queue=out_queue)\n            jobs.append(t)\n            t.start()\n\n        # collect outputs and fill queue with more work if low\n        # terminate if no more work should be done.\n        with tqdm(total=len(udf_kwargs), desc='Constructing task recipe') as pbar:\n            
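# each batch of kwargs placed on in_queue yields exactly one result list\n            # on out_queue, so counting processed groups tells us when every\n            # worker has finished\n            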
ngroups_processed = 0\n            while ngroups_processed < n_queue_tasks:\n                data_key_location_hash_digests = out_queue.safe_get(timeout=30)\n                if data_key_location_hash_digests is None:\n                    continue\n                ngroups_processed += 1\n                for saved in data_key_location_hash_digests:\n                    pbar.update(1)\n                    out.append(saved)\n\n        in_queue.safe_close()\n        out_queue.safe_close()\n        for j in jobs:\n            # Process.join() returns (rather than raising) on timeout; check\n            # liveness explicitly to terminate stragglers\n            j.join(timeout=0.2)\n            if j.is_alive():\n                j.terminate()\n    except (KeyboardInterrupt, InterruptedError):\n        in_queue.safe_close()\n        out_queue.safe_close()\n        while jobs:\n            j = jobs.pop()\n            if j.is_alive():\n                print(f'terminating PID {j.pid}')\n                j.terminate()\n            else:\n                exitcode = j.exitcode\n                if exitcode:\n                    print(f'PID {j.pid} exitcode: {exitcode}')\n        raise\n    return out\n\n\nclass _BatchProcessWriter(mp.Process):\n\n    def __init__(\n            self,\n            udf: bytes,\n            backends: Dict[str, str],\n            schemas: Dict[str, 'ColumnBase'],\n            tmp_pth: Path,\n            in_queue: _MPQueue,\n            out_queue: _MPQueue,\n            *args, **kwargs\n    ):\n        \"\"\"Write recipe task data into isolated backend stores.\n\n        Parameters\n        ----------\n        udf\n            user provided function yielding UDF_Return instances when iterated over.\n        backends\n            dict mapping column name -> backend code.\n        schemas\n            dict mapping column names -> initialized schema objects. This is required in\n            order to properly calculate the data hash digests.\n        tmp_pth\n            tempdir path to write data to\n        in_queue\n            grouped task lists `mp.Queue[List[_Task]]`\n        out_queue\n            written content description `mp.Queue[List[_WrittenContentDescription]]`\n        args\n        kwargs\n        \"\"\"\n        super().__init__(*args, **kwargs)\n        self._udf_raw: bytes = udf\n        self.udf: Optional[UDF_T] = None\n        self.backends = backends\n        self.backend_instances = {}\n        self.in_queue = in_queue\n        self.out_queue = out_queue\n        self.schemas = schemas\n        self.tmp_pth = tmp_pth\n\n    def _setup(self):\n        \"\"\"\n        Because backend FileHandle classes have a reader checkout only condition\n        check set on __getstate__, we open individual classes (and files) in the\n        actual processes they will be used in (rather than trying to pickle them).\n        \"\"\"\n        self.udf = _deserialize_udf(self._udf_raw)\n\n        for column_name, column_backend in self.backends.items():\n            be_instance_map = open_file_handles(\n                backends=[column_backend],\n                path=self.tmp_pth,\n                mode='a',\n                schema=self.schemas[column_name])\n            be_instance = be_instance_map[column_backend]\n            self.backend_instances[column_name] = be_instance\n\n    def _input_tasks(self) -> Iterator[List[_Task]]:\n        tasks_list = self.in_queue.safe_get(timeout=2)\n        while tasks_list is not None:\n            yield tasks_list\n            tasks_list = self.in_queue.safe_get()\n\n    @contextmanager\n    def _enter_backends(self):\n        
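# all backend handles are entered up front so that every task processed\n        # by this worker reuses the same open files; the finally clause below\n        # guarantees they are exited (closing out writes) even if a task raises\n        try:\n            for be in self.backend_instances.keys():\n                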
self.backend_instances[be].__enter__()\n            yield\n        finally:\n            for be in self.backend_instances.keys():\n                self.backend_instances[be].__exit__()\n\n    def run(self):\n        self._setup()\n        with self._enter_backends():\n            for tasks_list in self._input_tasks():\n                tasks = (\n                    (task, self.udf(**task.udf_kwargs)) for task in tasks_list if isinstance(task, _Task)\n                )\n                written_digests_locations = []\n                for task, applied_udf in tasks:\n                    relevant_udf_indices = iter(task.udf_iter_indices)\n                    desired_udf_idx = next(relevant_udf_indices)\n                    for gen_idx, res in enumerate(applied_udf):\n                        if gen_idx < desired_udf_idx:\n                            continue\n\n                        column = res.column\n                        data = res.data\n                        digest = self.schemas[column].data_hash_digest(data)\n                        location_spec = self.backend_instances[column].write_data(data)\n                        res = _WrittenContentDescription(digest, location_spec)\n                        written_digests_locations.append(res)\n                        try:\n                            desired_udf_idx = next(relevant_udf_indices)\n                        except StopIteration:\n                            break\n                self.out_queue.safe_put(written_digests_locations)\n\n\ndef _run_write_recipe_data(\n        tmp_dir: Path,\n        columns: Dict[str, 'ModifierTypes'],\n        schemas: Dict[str, 'ColumnBase'],\n        udf: bytes,\n        recipe_tasks: List[_Task],\n        *,\n        ncpu=0,\n        batch_size=10\n) -> List[_WrittenContentDescription]:\n\n    # Setup & populate queue with batched arguments\n    in_queue = _MPQueue()\n    out_queue = _MPQueue()\n    n_queue_tasks = ceil(len(recipe_tasks) / batch_size)\n    for keys_kwargs in grouper(recipe_tasks, batch_size):\n        in_queue.put_nowait(keys_kwargs)\n\n    out, jobs = [], []\n    try:\n        # start worker processes\n        backends = {}\n        for col_name, column in columns.items():\n            backends[col_name] = column.backend\n        for _ in range(ncpu):\n            t = _BatchProcessWriter(\n                udf=udf,\n                backends=backends,\n                schemas=schemas,\n                tmp_pth=tmp_dir,\n                in_queue=in_queue,\n                out_queue=out_queue)\n            jobs.append(t)\n            t.start()\n\n        # collect outputs and fill queue with more work if low\n        # terminate if no more work should be done.\n        nsteps = _num_steps_in_task_list(recipe_tasks)\n        with tqdm(total=nsteps, desc='Executing Data Import Recipe') as pbar:\n            ngroups_processed = 0\n            while ngroups_processed < n_queue_tasks:\n                data_key_location_hash_digests = out_queue.safe_get(timeout=30)\n                if data_key_location_hash_digests is None:\n                    continue\n                ngroups_processed += 1\n                for saved in data_key_location_hash_digests:\n                    pbar.update(1)\n                    out.append(saved)\n        in_queue.safe_close()\n        out_queue.safe_close()\n        for j in jobs:\n            # Process.join() returns (rather than raising) on timeout; check\n            # liveness explicitly to terminate stragglers\n            j.join(timeout=0.2)\n            if j.is_alive():\n                j.terminate()\n    except (KeyboardInterrupt, InterruptedError):\n        
in_queue.safe_close()\n        out_queue.safe_close()\n        while jobs:\n            j = jobs.pop()\n            if j.is_alive():\n                print(f'terminating PID {j.pid}')\n                j.terminate()\n            else:\n                exitcode = j.exitcode\n                if exitcode:\n                    print(f'PID {j.pid} exitcode: {exitcode}')\n        raise\n    return out\n\n\ndef _unify_recipe_contents(recipe: List[Tuple[dict, List[_ContentDescriptionPrep]]]) -> List[_ContentDescriptionPrep]:\n    \"\"\"Flatten and isolate all ContentDescriptionPrep in a flat recipe list.\n\n    Parameters\n    ----------\n    recipe: List[Tuple[dict, List[_ContentDescriptionPrep]]]\n\n    Returns\n    -------\n    List[_ContentDescriptionPrep]\n        Flat list where each element records a sample's column name, layout, keys, & digest.\n    \"\"\"\n    unified_content = []\n    for udf_kwargs, udf_contents in recipe:\n        for content in udf_contents:\n            unified_content.append(content)\n    return unified_content\n\n\ndef _reduce_recipe_on_required_digests(recipe: List[Tuple[dict, List[_ContentDescriptionPrep]]], hashenv):\n    \"\"\"Before writing, eliminate duplicate steps which would write identical\n    data and steps which would write data already recorded in the repository.\n\n    Parameters\n    ----------\n    recipe: List[Tuple[dict, List[_ContentDescriptionPrep]]]\n\n    Returns\n    -------\n    List[_Task]:\n        reduced recipe tasks to serve as input for the mp writer.\n\n    Notes\n    -----\n    - Any number of samples may be added which have unique keys/kwargs,\n      but whose udf returns identical data. To avoid writing\n      identical data to disk multiple times, we select just one sample\n      (at random) for each unique digest in the recipe. We write the\n      data to disk alongside the digest -> backend spec mapping. Once\n      all unique data sample steps are written, we use the full sample\n      step recipe to record the sample name -> digest mapping without\n      needing to actually process the data that a full step execution\n      would have produced.\n\n    - A similar exclusion is made for steps which produce data which is\n      already recorded in the repository. The only difference is that\n      we do not process writing of these steps at all (for any sample).\n      Since the digest -> backend spec map already exists, we just need\n      to process the key -> digest mapping.\n    \"\"\"\n
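    # Worked example (digests illustrative): if the recipe yields digests\n    # [d1, d2, d1, d3] and d3 is already stored in `hashenv`, the reduced task\n    # list covers one occurrence of d1 plus d2 -- two write steps instead of\n    # four; the full key -> digest mapping is still recorded for all samples.\n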
    recipe_contents = _unify_recipe_contents(recipe)\n    digest_getter = attrgetter('digest')\n    recipe_digests = set(map(digest_getter, recipe_contents))\n\n    hq = hashs.HashQuery(hashenv)\n    recipe_digests_db = set(map(hash_data_db_key_from_raw_key, recipe_digests))\n    existing_digests_db = hq.intersect_keys_db(recipe_digests_db)\n    missing_digests_db = recipe_digests_db.difference(existing_digests_db)\n    missing_digests = set(map(hash_data_raw_key_from_db_key, missing_digests_db))\n\n    remaining_digests = set(missing_digests)\n    task_list = []\n    for udf_kwargs, content_prep_recipes in recipe:\n        task_udf_kwargs = None  # Set to value if kwargs should be included\n        udf_indices = []\n        expected_digests = []\n        for content_prep in content_prep_recipes:\n            _digest = content_prep.digest\n            if _digest in remaining_digests:\n                udf_indices.append(content_prep.udf_iter_idx)\n                expected_digests.append(_digest)\n                task_udf_kwargs = udf_kwargs\n                remaining_digests.remove(_digest)\n        # an empty kwargs dict is valid UDF input, so test against None rather\n        # than relying on dict truthiness\n        if task_udf_kwargs is not None:\n            _task = _Task(udf_kwargs, tuple(udf_indices), tuple(expected_digests))\n            task_list.append(_task)\n\n    return task_list\n\n\ndef _write_digest_to_bespec_mapping(\n        executed_steps: List[_WrittenContentDescription],\n        hashenv: 'lmdb.Environment',\n        stagehashenv: 'lmdb.Environment'\n):\n    \"\"\"Write written content digests and bespec to hash and stagehash db.\n    \"\"\"\n    digests_bespecs = []\n    for spec in executed_steps:\n        dbSpec = spec.bespec\n        dbDigest = hash_data_db_key_from_raw_key(spec.digest)\n        digests_bespecs.append((dbDigest, dbSpec))\n\n    hashtxn = TxnRegister().begin_writer_txn(hashenv)\n    stagehashtxn = TxnRegister().begin_writer_txn(stagehashenv)\n    try:\n        for dbDigest, dbSpec in digests_bespecs:\n            stagehashtxn.put(dbDigest, dbSpec)\n            hashtxn.put(dbDigest, dbSpec)\n    finally:\n        TxnRegister().commit_writer_txn(hashenv)\n        TxnRegister().commit_writer_txn(stagehashenv)\n\n\ndef _write_full_recipe_sample_key_to_digest_mapping(\n        sample_steps: List[_ContentDescriptionPrep],\n        dataenv: 'lmdb.Environment'\n):\n    \"\"\"Write sample name -> digest key/value pairs in checkout data (stage) db.\n    \"\"\"\n    db_kvs = []\n    for step in sample_steps:\n        staging_key = step.db_record_key()\n        staging_val = step.db_record_val()\n        db_kvs.append((staging_key, staging_val))\n\n    datatxn = TxnRegister().begin_writer_txn(dataenv)\n    try:\n        for dbk, dbv in db_kvs:\n            datatxn.put(dbk, dbv)\n    finally:\n        TxnRegister().commit_writer_txn(dataenv)\n\n\ndef _mock_hangar_directory_structure(dir_name: str) -> Path:\n    \"\"\"Set up the folder structure of a hangar repo within a temporary directory path.\n\n    Parameters\n    ----------\n    dir_name\n        directory path to create the hangar dir structure in.\n\n    Returns\n    -------\n    mocked hangar directory path.\n    \"\"\"\n    dirpth = Path(dir_name)\n    is_valid_directory_path(dirpth)\n\n    dirpth.joinpath(DIR_DATA_STORE).mkdir()\n    dirpth.joinpath(DIR_DATA_STAGE).mkdir()\n    
dirpth.joinpath(DIR_DATA_REMOTE).mkdir()\n    dirpth.joinpath(DIR_DATA).mkdir()\n    return dirpth\n\n\ndef _move_tmpdir_data_files_to_repodir(repodir: Path, tmpdir: Path):\n    tmp_stage_dir = tmpdir.joinpath(DIR_DATA_STAGE)\n    tmp_data_dir = tmpdir.joinpath(DIR_DATA)\n    hangar_stage_dir = repodir.joinpath(DIR_DATA_STAGE)\n    hangar_data_dir = repodir.joinpath(DIR_DATA)\n\n    task_list = []\n    for be_pth in tmp_stage_dir.iterdir():\n        if be_pth.is_dir():\n            for fpth in be_pth.iterdir():\n                if fpth.is_file() and not fpth.stem.startswith('.'):\n                    tmp_stage_fp = tmp_stage_dir.joinpath(be_pth.name, fpth.name)\n                    hangar_stage_fp = hangar_stage_dir.joinpath(be_pth.name, fpth.name)\n                    task_list.append((tmp_stage_fp, hangar_stage_fp))\n\n                    if hangar_stage_fp.suffix.endswith('dir'):\n                        # data directories (i.e. lmdb) have a stage_file suffix ending in\n                        # 'dir' (for lmdb this is a suffix of `.lmdbdir`). The stage_file\n                        # stem is the directory name which needs to be moved.\n                        tmp_data_fp = tmp_data_dir.joinpath(be_pth.name, fpth.stem)\n                        hangar_data_fp = hangar_data_dir.joinpath(be_pth.name, fpth.stem)\n                    else:\n                        # files are a 1:1 copy of stage_file:data_file\n                        tmp_data_fp = tmp_data_dir.joinpath(be_pth.name, fpth.name)\n                        hangar_data_fp = hangar_data_dir.joinpath(be_pth.name, fpth.name)\n                    task_list.append((tmp_data_fp, hangar_data_fp))\n\n    _MoveException = None\n    num_workers = bound(5, 32, os.cpu_count() + 4)\n    with ThreadPoolExecutor(max_workers=num_workers, thread_name_prefix='hangar_import_shutil') as e:\n        future_result = [e.submit(shutil.move, str(src), str(dst)) for src, dst in task_list]\n        for future in concurrent.futures.as_completed(future_result):\n            if future.exception() is not None:\n                _MoveException = future.exception()\n\n    if _MoveException is not None:\n        print('Error encountered while persisting imported data in hangar repo directory.')\n        print('Beginning change set rollback.')\n        for _, dest_fp in task_list:\n            if dest_fp.is_file():\n                os.remove(str(dest_fp))\n                print(f'- {dest_fp}')\n            elif dest_fp.is_dir():\n                shutil.rmtree(str(dest_fp))\n                print(f'- {dest_fp}')\n        print('Rollback completed successfully')\n        raise _MoveException\n    return True\n"
  },
  {
    "path": "src/hangar/checkout.py",
    "content": "import atexit\nfrom pathlib import Path\nimport weakref\nfrom contextlib import suppress, ExitStack\nfrom uuid import uuid4\nfrom typing import Optional, Union\n\nimport numpy as np\nimport lmdb\n\nfrom .mixins import GetMixin, CheckoutDictIteration\nfrom .columns import (\n    ColumnTxn,\n    Columns,\n    generate_nested_column,\n    generate_flat_column,\n)\nfrom .diff import ReaderUserDiff, WriterUserDiff\nfrom .merger import select_merge_algorithm\nfrom .records import commiting, hashs, heads, summarize\nfrom .typesystem import (\n    NdarrayFixedShape,\n    NdarrayVariableShape,\n    StringVariableShape,\n    BytesVariableShape,\n)\nfrom .utils import is_suitable_user_key, is_ascii\nfrom .records import (\n    schema_db_key_from_column,\n    schema_hash_record_db_val_from_spec,\n    schema_hash_db_key_from_digest,\n    schema_record_db_val_from_digest,\n)\n\n\nclass ReaderCheckout(GetMixin, CheckoutDictIteration):\n    \"\"\"Checkout the repository as it exists at a particular branch.\n\n    This class is instantiated automatically from a repository checkout\n    operation. This object will govern all access to data and interaction methods\n    the user requests.\n\n        >>> co = repo.checkout()\n        >>> isinstance(co, ReaderCheckout)\n        True\n\n    If a commit hash is provided, it will take precedent over the branch name\n    parameter. If neither a branch not commit is specified, the staging\n    environment's base branch ``HEAD`` commit hash will be read.\n\n        >>> co = repo.checkout(commit='foocommit')\n        >>> co.commit_hash\n        'foocommit'\n        >>> co.close()\n        >>> co = repo.checkout(branch='testbranch')\n        >>> co.commit_hash\n        'someothercommithashhere'\n        >>> co.close()\n\n    Unlike :class:`WriterCheckout`, any number of :class:`ReaderCheckout`\n    objects can exist on the repository independently. Like the\n    ``write-enabled`` variant, the :meth:`close` method should be called after\n    performing the necessary operations on the repo. However, as there is no\n    concept of a ``lock`` for ``read-only`` checkouts, this is just to free up\n    memory resources, rather than changing recorded access state.\n\n    In order to reduce the chance that the python interpreter is shut down\n    without calling :meth:`close`,  - a common mistake during ipython / jupyter\n    sessions - an `atexit <https://docs.python.org/3/library/atexit.html>`_\n    hook is registered to :meth:`close`. If properly closed by the user, the\n    hook is unregistered after completion with no ill effects. So long as a the\n    process is NOT terminated via non-python ``SIGKILL``, fatal internal python\n    error, or or special ``os exit`` methods, cleanup will occur on interpreter\n    shutdown and resources will be freed. 
If a non-handled termination method\n    does occur, the implications of holding resources vary on a per-OS basis.\n    While no risk to data integrity is observed, repeated misuse may require a\n    system reboot in order to achieve expected performance characteristics.\n    \"\"\"\n\n    def __init__(self,\n                 base_path: Path,\n                 dataenv: lmdb.Environment,\n                 hashenv: lmdb.Environment,\n                 branchenv: lmdb.Environment,\n                 refenv: lmdb.Environment,\n                 commit: str):\n        \"\"\"Developer documentation of init method.\n\n        Parameters\n        ----------\n        base_path : Path\n            directory path to the Hangar repository on disk\n        dataenv : lmdb.Environment\n            db where the checkout record data is unpacked and stored.\n        hashenv : lmdb.Environment\n            db where the hash records are stored.\n        branchenv : lmdb.Environment\n            db where the branch records are stored.\n        refenv : lmdb.Environment\n            db where the commit references are stored.\n        commit : str\n            specific commit hash to checkout\n        \"\"\"\n        self._commit_hash = commit\n        self._repo_path = base_path\n        self._dataenv = dataenv\n        self._hashenv = hashenv\n        self._branchenv = branchenv\n        self._refenv = refenv\n        self._enter_count = 0\n        self._stack: Optional[ExitStack] = None\n\n        self._columns = Columns._from_commit(\n            repo_pth=self._repo_path,\n            hashenv=self._hashenv,\n            cmtrefenv=self._dataenv)\n        self._differ = ReaderUserDiff(\n            commit_hash=self._commit_hash,\n            branchenv=self._branchenv,\n            refenv=self._refenv)\n        atexit.register(self.close)\n\n    def _repr_pretty_(self, p, cycle):\n        \"\"\"pretty repr for printing in jupyter notebooks\n        \"\"\"\n        self._verify_alive()\n        res = f'Hangar {self.__class__.__name__}\\\n                \\n    Writer       : False\\\n                \\n    Commit Hash  : {self._commit_hash}\\\n                \\n    Num Columns  : {len(self)}\\n'\n        p.text(res)\n\n    def __repr__(self):\n        self._verify_alive()\n        res = f'{self.__class__}('\\\n              f'base_path={self._repo_path} '\\\n              f'dataenv={self._dataenv} '\\\n              f'hashenv={self._hashenv} '\\\n              f'commit={self._commit_hash})'\n        return res\n\n    def __enter__(self):\n        self._verify_alive()\n        with ExitStack() as stack:\n            if self._enter_count == 0:\n                stack.enter_context(self._columns)\n            self._enter_count += 1\n            self._stack = stack.pop_all()\n        return self\n\n    def __exit__(self, *exc):\n        self._stack.close()\n        self._enter_count -= 1\n\n    
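# Note on the context manager methods above: they are re-entrant. Only the\n    # outermost `with` block actually enters the columns context; _enter_count\n    # tracks nesting so inner uses are no-ops.\n\n    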
\n            raise err from None\n\n    @property\n    def _is_conman(self) -> bool:\n        self._verify_alive()\n        return bool(self._enter_count)\n\n    @property\n    def columns(self) -> Columns:\n        \"\"\"Provides access to column interaction object.\n\n        Can be used to either return the columns accessor for all elements or\n        a single column instance by using dictionary style indexing.\n\n            >>> co = repo.checkout(write=False)\n            >>> len(co.columns)\n            1\n            >>> print(co.columns.keys())\n            ['foo']\n            >>> fooCol = co.columns['foo']\n            >>> fooCol.dtype\n            np.fooDtype\n            >>> cols = co.columns\n            >>> fooCol = cols['foo']\n            >>> fooCol.dtype\n            np.fooDtype\n            >>> fooCol = cols.get('foo')\n            >>> fooCol.dtype\n            np.fooDtype\n\n        .. seealso::\n\n            The class :class:`~.columns.column.Columns` contains all methods\n            accessible by this property accessor\n\n        Returns\n        -------\n        :class:`~.columns.column.Columns`\n            the columns object which behaves exactly like a\n            columns accessor class but which can be invalidated when the writer\n            lock is released.\n        \"\"\"\n        self._verify_alive()\n        return self._columns\n\n    @property\n    def diff(self) -> ReaderUserDiff:\n        \"\"\"Access the differ methods for a read-only checkout.\n\n        .. seealso::\n\n            The class :class:`ReaderUserDiff` contains all methods accessible\n            by this property accessor\n\n        Returns\n        -------\n        ReaderUserDiff\n            weakref proxy to the differ object (and contained methods) which behaves\n            exactly like the differ class but which can be invalidated when the\n            writer lock is released.\n        \"\"\"\n        self._verify_alive()\n        wr = weakref.proxy(self._differ)\n        return wr\n\n    @property\n    def commit_hash(self) -> str:\n        \"\"\"Commit hash this read-only checkout's data is read from.\n\n            >>> co = repo.checkout()\n            >>> co.commit_hash\n            'foohashdigesthere'\n\n        Returns\n        -------\n        str\n            commit hash of the checkout\n        \"\"\"\n        self._verify_alive()\n        return self._commit_hash\n\n    def log(self,\n            branch: str = None,\n            commit: str = None,\n            *,\n            return_contents: bool = False,\n            show_time: bool = False,\n            show_user: bool = False) -> Optional[dict]:\n        \"\"\"Displays a pretty printed commit log graph to the terminal.\n\n        .. note::\n\n            For programmatic access, the return_contents value can be set to true\n            which will retrieve relevant commit specifications as dictionary\n            elements.\n\n        If neither the `branch` nor `commit` argument is supplied, the commit\n        digest of the current reader checkout will be used as the default.\n\n        Parameters\n        ----------\n        branch : str, optional\n            The name of the branch to start the log process from. (Default value\n            = None)\n        commit : str, optional\n            The commit hash to start the log process from. (Default value = None)
\n        return_contents : bool, optional, kwarg only\n            If true, return the commit graph specifications in a dictionary\n            suitable for programmatic access/evaluation.\n        show_time : bool, optional, kwarg only\n            If true and return_contents is False, show the time of each commit\n            on the printed log graph\n        show_user : bool, optional, kwarg only\n            If true and return_contents is False, show the committer of each\n            commit on the printed log graph\n\n        Returns\n        -------\n        Optional[dict]\n            Dict containing the commit ancestor graph, and all specifications.\n        \"\"\"\n        self._verify_alive()\n        if (branch is None) and (commit is None):\n            commit = self.commit_hash\n        res = summarize.log(branchenv=self._branchenv,\n                            refenv=self._refenv,\n                            branch=branch,\n                            commit=commit,\n                            return_contents=return_contents,\n                            show_time=show_time,\n                            show_user=show_user)\n        return res\n\n    def close(self) -> None:\n        \"\"\"Gracefully close the reader checkout object.\n\n        Though not strictly required for reader checkouts (as opposed to\n        writers), closing the checkout after reading will free file handles and\n        system resources, which may improve performance for repositories with\n        multiple simultaneous read checkouts.\n        \"\"\"\n        self._verify_alive()\n        if isinstance(self._stack, ExitStack):\n            self._stack.close()\n\n        self._columns._destruct()\n        for attr in list(self.__dict__.keys()):\n            delattr(self, attr)\n        atexit.unregister(self.close)\n        return\n\n\n# --------------- Write enabled checkout ---------------------------------------\n\n\nclass WriterCheckout(GetMixin, CheckoutDictIteration):\n    \"\"\"Checkout the repository at the head of a given branch for writing.\n\n    This is the entry point for all writing operations to the repository; the\n    writer class records all interactions in a special ``\"staging\"`` area,\n    which is based off the state of the repository as it existed at the\n    ``HEAD`` commit of a branch.\n\n        >>> co = repo.checkout(write=True)\n        >>> co.branch_name\n        'master'\n        >>> co.commit_hash\n        'masterheadcommithash'\n        >>> co.close()\n\n    At the moment, only one instance of this class can write data to the\n    staging area at a time. After the desired operations have been completed,\n    it is crucial to call :meth:`close` to release the writer lock. In\n    addition, after any changes have been made to the staging area, the branch\n    ``HEAD`` cannot be changed. In order to check out another branch ``HEAD``\n    for writing, you must either :meth:`commit` the changes, or perform a\n    hard-reset of the staging area to the last commit via\n    :meth:`reset_staging_area`.\n\n    In order to reduce the chance that the python interpreter is shut down\n    without calling :meth:`close`, which releases the writer lock - a common\n    mistake during ipython / jupyter sessions - an `atexit\n    <https://docs.python.org/3/library/atexit.html>`_ hook is registered to\n    :meth:`close`. If properly closed by the user, the hook is unregistered\n    after completion with no ill effects. So long as the
\n    process is NOT terminated via non-python SIGKILL, fatal internal python\n    error, or special os exit methods, cleanup will occur on interpreter\n    shutdown and the writer lock will be released. If a non-handled\n    termination method does occur, the\n    :meth:`~.Repository.force_release_writer_lock` method must be called\n    manually when a new python process wishes to open the writer checkout.\n    \"\"\"\n\n    def __init__(self,\n                 repo_pth: Path,\n                 branch_name: str,\n                 hashenv: lmdb.Environment,\n                 refenv: lmdb.Environment,\n                 stageenv: lmdb.Environment,\n                 branchenv: lmdb.Environment,\n                 stagehashenv: lmdb.Environment,\n                 mode: str = 'a'):\n        \"\"\"Developer documentation of init method.\n\n        Parameters\n        ----------\n        repo_pth : Path\n            local file path of the repository.\n        branch_name : str\n            name of the branch whose ``HEAD`` commit will form the starting state\n            of the staging area.\n        hashenv : lmdb.Environment\n            db where the hash records are stored.\n        refenv : lmdb.Environment\n            db where the commit record data is unpacked and stored.\n        stageenv : lmdb.Environment\n            db where the stage record data is unpacked and stored.\n        branchenv : lmdb.Environment\n            db where the head record data is unpacked and stored.\n        stagehashenv : lmdb.Environment\n            db where the staged hash record data is stored.\n        mode : str, optional\n            open in write or read only mode, default is 'a' which is write-enabled.\n        \"\"\"\n        self._enter_count = 0\n        self._repo_path: Path = repo_pth\n        self._branch_name = branch_name\n        self._writer_lock = str(uuid4())\n        self._stack: Optional[ExitStack] = None\n\n        self._refenv = refenv\n        self._hashenv = hashenv\n        self._stageenv = stageenv\n        self._branchenv = branchenv\n        self._stagehashenv = stagehashenv\n\n        self._columns: Optional[Columns] = None\n        self._differ: Optional[WriterUserDiff] = None\n        self._setup()\n        atexit.register(self.close)\n\n    def _repr_pretty_(self, p, cycle):\n        \"\"\"pretty repr for printing in jupyter notebooks\n        \"\"\"\n        self._verify_alive()\n        res = f'Hangar {self.__class__.__name__}\\\n                \\n    Writer       : True\\\n                \\n    Base Branch  : {self._branch_name}\\\n                \\n    Num Columns  : {len(self)}\\n'\n        p.text(res)\n\n    def __repr__(self):\n        self._verify_alive()\n        res = f'{self.__class__}('\\\n              f'base_path={self._repo_path} '\\\n              f'branch_name={self._branch_name} ' \\\n              f'hashenv={self._hashenv} '\\\n              f'refenv={self._refenv} '\\\n              f'stageenv={self._stageenv} '\\\n              f'branchenv={self._branchenv})\\n'\n        return res\n\n    def __enter__(self):\n        self._verify_alive()\n        with ExitStack() as stack:\n            if self._enter_count == 0:\n                stack.enter_context(self._columns)\n            self._enter_count += 1\n            self._stack = stack.pop_all()\n        return self\n\n    def __exit__(self, *exc):\n        self._stack.close()\n        self._enter_count -= 1\n\n    @property\n    def _is_conman(self):\n        self._verify_alive()
\n        # nonzero count indicates this checkout is currently inside a context manager\n        return bool(self._enter_count)\n\n    def _verify_alive(self):\n        \"\"\"Ensures that this class instance holds the writer lock in the database.\n\n        Raises\n        ------\n        PermissionError\n            If the checkout was previously closed (no :attr:`_writer_lock`)\n            or if the writer lock value does not match that recorded in the\n            branch db\n        \"\"\"\n        if not hasattr(self, '_writer_lock'):\n            with suppress(AttributeError):\n                self._columns._destruct()\n                del self._columns\n            with suppress(AttributeError):\n                del self._differ\n            err = f'Unable to operate on past checkout objects which have been '\\\n                  f'closed. No operation occurred. Please use a new checkout.'\n            raise PermissionError(err) from None\n\n        try:\n            heads.acquire_writer_lock(self._branchenv, self._writer_lock)\n        except Exception as e:\n            with suppress(AttributeError):\n                self._columns._destruct()\n                del self._columns\n            with suppress(AttributeError):\n                del self._differ\n            raise e from None\n\n    def _setup(self):\n        \"\"\"setup the staging area appropriately for a write enabled checkout.\n\n        On setup, we cannot be sure what branch the staging area was previously\n        checked out on, and we cannot be sure if there are any 'uncommitted\n        changes' in the staging area (ie. the staging area is ``DIRTY``). The\n        setup methods here ensure that we can safely make any changes to the\n        staging area without overwriting uncommitted changes, and then perform\n        the setup steps to check out the staging area state at that point in time.\n\n        Raises\n        ------\n        ValueError\n            if there are changes previously made in the staging area which were\n            based on one branch's ``HEAD``, but a different branch was specified to\n            be used for the base of this checkout.\n        \"\"\"\n        self._verify_alive()\n        current_head = heads.get_staging_branch_head(self._branchenv)\n        currentDiff = WriterUserDiff(stageenv=self._stageenv,\n                                     refenv=self._refenv,\n                                     branchenv=self._branchenv,\n                                     branch_name=current_head)\n        if currentDiff.status() == 'DIRTY':\n            if current_head != self._branch_name:\n                e = ValueError(\n                    f'Unable to check out branch: {self._branch_name} for writing '\n                    f'as the staging area has uncommitted changes on branch: '\n                    f'{current_head}. 
Please commit or stash uncommitted changes '\n                    f'before checking out a different branch for writing.')\n                self.close()\n                raise e\n        else:\n            if current_head != self._branch_name:\n                try:\n                    cmt = heads.get_branch_head_commit(\n                        branchenv=self._branchenv, branch_name=self._branch_name)\n                except ValueError as e:\n                    self.close()\n                    raise e\n                commiting.replace_staging_area_with_commit(\n                    refenv=self._refenv, stageenv=self._stageenv, commit_hash=cmt)\n                heads.set_staging_branch_head(\n                    branchenv=self._branchenv, branch_name=self._branch_name)\n\n        self._columns = Columns._from_staging_area(\n            repo_pth=self._repo_path,\n            hashenv=self._hashenv,\n            stageenv=self._stageenv,\n            stagehashenv=self._stagehashenv)\n        self._differ = WriterUserDiff(\n            stageenv=self._stageenv,\n            refenv=self._refenv,\n            branchenv=self._branchenv,\n            branch_name=self._branch_name)\n\n    @property\n    def columns(self) -> Columns:\n        \"\"\"Provides access to column interaction object.\n\n        Can be used to either return the columns accessor for all elements or\n        a single column instance by using dictionary style indexing.\n\n            >>> co = repo.checkout(write=True)\n            >>> cols = co.columns\n            >>> len(cols)\n            0\n            >>> fooCol = co.add_ndarray_column('foo', shape=(10, 10), dtype=np.uint8)\n            >>> len(co.columns)\n            1\n            >>> len(co)\n            1\n            >>> list(co.columns.keys())\n            ['foo']\n            >>> list(co.keys())\n            ['foo']\n            >>> fooCol = co.columns['foo']\n            >>> fooCol.dtype\n            np.fooDtype\n            >>> fooCol = cols.get('foo')\n            >>> fooCol.dtype\n            np.fooDtype\n            >>> 'foo' in co.columns\n            True\n            >>> 'bar' in co.columns\n            False\n\n        .. seealso::\n\n            The class :class:`~.columns.column.Columns` contains all methods\n            accessible by this property accessor\n\n        Returns\n        -------\n        :class:`~.columns.column.Columns`\n            the columns object which behaves exactly like a columns accessor\n            class but which can be invalidated when the writer lock is\n            released.\n        \"\"\"\n        self._verify_alive()\n        return self._columns\n\n    @property\n    def diff(self) -> WriterUserDiff:\n        \"\"\"Access the differ methods which are aware of any staged changes.\n\n        .. 
seealso::\n\n            The class :class:`hangar.diff.WriterUserDiff` contains all methods\n            accessible by this property accessor\n\n        Returns\n        -------\n        WriterUserDiff\n            weakref proxy to the differ object (and contained methods) which\n            behaves exactly like the differ class but which can be invalidated\n            when the writer lock is released.\n        \"\"\"\n        self._verify_alive()\n        wr = weakref.proxy(self._differ)\n        return wr\n\n    @property\n    def branch_name(self) -> str:\n        \"\"\"Branch this write enabled checkout's staging area was based on.\n\n        Returns\n        -------\n        str\n            name of the branch whose commit ``HEAD`` changes are staged from.\n        \"\"\"\n        self._verify_alive()\n        return self._branch_name\n\n    @property\n    def commit_hash(self) -> str:\n        \"\"\"Commit hash which the staging area of `branch_name` is based on.\n\n        Returns\n        -------\n        str\n            commit hash\n        \"\"\"\n        self._verify_alive()\n        cmt = heads.get_branch_head_commit(branchenv=self._branchenv,\n                                           branch_name=self._branch_name)\n        return cmt\n\n    def log(self,\n            branch: str = None,\n            commit: str = None,\n            *,\n            return_contents: bool = False,\n            show_time: bool = False,\n            show_user: bool = False) -> Optional[dict]:\n        \"\"\"Displays a pretty printed commit log graph to the terminal.\n\n        .. note::\n\n            For programmatic access, the return_contents value can be set to true\n            which will retrieve relevant commit specifications as dictionary\n            elements.\n\n        If neither the `branch` nor `commit` argument is supplied, the branch\n        which is currently checked out for writing will be used as the default.
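\n\n        As a purely illustrative sketch, one may print the graph with commit\n        times shown, or capture the graph contents for programmatic use:\n\n            >>> co.log(show_time=True)\n            >>> contents = co.log(return_contents=True)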
\n\n        Parameters\n        ----------\n        branch : str, optional\n            The name of the branch to start the log process from. (Default value\n            = None)\n        commit : str, optional\n            The commit hash to start the log process from. (Default value = None)\n        return_contents : bool, optional, kwarg only\n            If true, return the commit graph specifications in a dictionary\n            suitable for programmatic access/evaluation.\n        show_time : bool, optional, kwarg only\n            If true and return_contents is False, show the time of each commit\n            on the printed log graph\n        show_user : bool, optional, kwarg only\n            If true and return_contents is False, show the committer of each\n            commit on the printed log graph\n\n        Returns\n        -------\n        Optional[dict]\n            Dict containing the commit ancestor graph, and all specifications.\n        \"\"\"\n        self._verify_alive()\n        if (branch is None) and (commit is None):\n            branch = self.branch_name\n        res = summarize.log(branchenv=self._branchenv,\n                            refenv=self._refenv,\n                            branch=branch,\n                            commit=commit,\n                            return_contents=return_contents,\n                            show_time=show_time,\n                            show_user=show_user)\n        return res\n\n    def add_str_column(self,\n                       name: str,\n                       contains_subsamples: bool = False,\n                       *,\n                       backend: Optional[str] = None,\n                       backend_options: Optional[dict] = None):\n        \"\"\"Initializes a :class:`str` container column\n\n        Columns are created in order to store some arbitrary collection of data\n        pieces. In this case, we store :class:`str` data. Items need not be\n        related to each-other in any direct capacity; the only criteria hangar\n        requires is that all pieces of data stored in the column have a\n        compatible schema with each-other (more on this below). Each piece of\n        data is indexed by some key (either user defined or automatically\n        generated depending on the user's preferences). Both single level\n        stores (sample keys mapping to data on disk) and nested stores (where\n        some sample key maps to an arbitrary number of subsamples, in turn each\n        pointing to some piece of store data on disk) are supported.\n\n        All data pieces within a column have the same data type. For\n        :class:`str` columns, there is no distinction between\n        ``'variable_shape'`` and ``'fixed_shape'`` schema types. Values are\n        allowed to take on a value of any size so long as the datatype and\n        contents are valid for the schema definition.\n\n        Parameters\n        ----------\n        name : str\n            Name assigned to the column\n        contains_subsamples : bool, optional\n            True if the column should store data in a nested structure.\n            In this scheme, a sample key is used to index an arbitrary number\n            of subsamples which map some (sub)key to a piece of data. If False,\n            sample keys map directly to a single piece of data; essentially\n            acting as a single level key/value store. By default, False.\n        backend : Optional[str], optional\n            ADVANCED USERS ONLY, backend format code to use for column data. If\n            None, automatically inferred and set based on data shape and type.\n            by default None\n        backend_options : Optional[dict], optional\n            ADVANCED USERS ONLY, filter opts to apply to column data. If None,
\n            automatically inferred and set based on data shape and type.\n            by default None\n\n        Returns\n        -------\n        :class:`~.columns.column.Columns`\n            instance object of the initialized column.\n        \"\"\"\n        self._verify_alive()\n        if self.columns._any_is_conman() or self._is_conman:\n            raise PermissionError('Not allowed while context manager is used.')\n\n        # ------------- Checks for argument validity --------------------------\n\n        try:\n            if (not is_suitable_user_key(name)) or (not is_ascii(name)):\n                raise ValueError(\n                    f'Column name provided: `{name}` is invalid. Can only contain '\n                    f'alpha-numeric or \".\" \"_\" \"-\" ascii characters (no whitespace). '\n                    f'Must be <= 64 characters long')\n            if name in self._columns:\n                raise LookupError(f'Column already exists with name: {name}.')\n            if not isinstance(contains_subsamples, bool):\n                raise ValueError(f'contains_subsamples argument must be bool, '\n                                 f'not type {type(contains_subsamples)}')\n        except (ValueError, LookupError) as e:\n            raise e from None\n\n        # ---------- schema validation handled automatically by typesystem ----\n\n        layout = 'nested' if contains_subsamples else 'flat'\n        schema = StringVariableShape(\n            dtype=str, column_layout=layout, backend=backend, backend_options=backend_options)\n\n        # ------------------ create / return new column -----------------------\n\n        col = self._initialize_new_column(\n            column_name=name, column_layout=layout, schema=schema)\n        return col\n\n    def add_bytes_column(self,\n                         name: str,\n                         contains_subsamples: bool = False,\n                         *,\n                         backend: Optional[str] = None,\n                         backend_options: Optional[dict] = None):\n        \"\"\"Initializes a :class:`bytes` container column\n\n        Columns are created in order to store some arbitrary collection of data\n        pieces. In this case, we store :class:`bytes` data. Items need not be\n        related to each-other in any direct capacity; the only criteria hangar\n        requires is that all pieces of data stored in the column have a\n        compatible schema with each-other (more on this below). Each piece of\n        data is indexed by some key (either user defined or automatically\n        generated depending on the user's preferences). Both single level\n        stores (sample keys mapping to data on disk) and nested stores (where\n        some sample key maps to an arbitrary number of subsamples, in turn each\n        pointing to some piece of store data on disk) are supported.\n\n        All data pieces within a column have the same data type. For\n        :class:`bytes` columns, there is no distinction between\n        ``'variable_shape'`` and ``'fixed_shape'`` schema types. Values are
\n        allowed to take on a value of any size so long as the datatype and\n        contents are valid for the schema definition.\n\n        Parameters\n        ----------\n        name : str\n            Name assigned to the column\n        contains_subsamples : bool, optional\n            True if the column should store data in a nested structure.\n            In this scheme, a sample key is used to index an arbitrary number\n            of subsamples which map some (sub)key to a piece of data. If False,\n            sample keys map directly to a single piece of data; essentially\n            acting as a single level key/value store. By default, False.\n        backend : Optional[str], optional\n            ADVANCED USERS ONLY, backend format code to use for column data. If\n            None, automatically inferred and set based on data shape and type.\n            by default None\n        backend_options : Optional[dict], optional\n            ADVANCED USERS ONLY, filter opts to apply to column data. If None,\n            automatically inferred and set based on data shape and type.\n            by default None\n\n        Returns\n        -------\n        :class:`~.columns.column.Columns`\n            instance object of the initialized column.\n        \"\"\"\n        self._verify_alive()\n        if self.columns._any_is_conman() or self._is_conman:\n            raise PermissionError('Not allowed while context manager is used.')\n\n        # ------------- Checks for argument validity --------------------------\n\n        try:\n            if (not is_suitable_user_key(name)) or (not is_ascii(name)):\n                raise ValueError(\n                    f'Column name provided: `{name}` is invalid. Can only contain '\n                    f'alpha-numeric or \".\" \"_\" \"-\" ascii characters (no whitespace). '\n                    f'Must be <= 64 characters long')\n            if name in self._columns:\n                raise LookupError(f'Column already exists with name: {name}.')\n            if not isinstance(contains_subsamples, bool):\n                raise ValueError(f'contains_subsamples argument must be bool, '\n                                 f'not type {type(contains_subsamples)}')\n        except (ValueError, LookupError) as e:\n            raise e from None\n\n        # ---------- schema validation handled automatically by typesystem ----\n\n        layout = 'nested' if contains_subsamples else 'flat'\n        schema = BytesVariableShape(\n            dtype=bytes, column_layout=layout, backend=backend, backend_options=backend_options)\n\n        # ------------------ create / return new column -----------------------\n\n        col = self._initialize_new_column(\n            column_name=name, column_layout=layout, schema=schema)\n        return col\n\n    def add_ndarray_column(self,\n                           name: str,\n                           shape: Optional[Union[int, tuple]] = None,\n                           dtype: Optional[np.dtype] = None,\n                           prototype: Optional[np.ndarray] = None,\n                           variable_shape: bool = False,\n                           contains_subsamples: bool = False,\n                           *,\n                           backend: Optional[str] = None,\n                           backend_options: Optional[dict] = None):\n        \"\"\"Initializes a :class:`numpy.ndarray` container column.\n\n        Columns are created in order to store some arbitrary collection of data\n        pieces. In this case, we store :class:`numpy.ndarray` data. Items need
\n        not be related to each-other in any direct capacity; the only criteria\n        hangar requires is that all pieces of data stored in the column have a\n        compatible schema with each-other (more on this below). Each piece of\n        data is indexed by some key (either user defined or automatically\n        generated depending on the user's preferences). Both single level\n        stores (sample keys mapping to data on disk) and nested stores (where\n        some sample key maps to an arbitrary number of subsamples, in turn each\n        pointing to some piece of store data on disk) are supported.\n\n        All data pieces within a column have the same data type and number of\n        dimensions. The size of each dimension can be either fixed (the default\n        behavior) or variable per sample. For fixed dimension sizes, all data\n        pieces written to the column must have the same shape & size which was\n        specified at the time the column was initialized. Alternatively,\n        variable sized columns can write data pieces with dimensions of any\n        size (up to a specified maximum).
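\n\n        As an illustrative sketch (column names here are hypothetical), a\n        fixed-shape and a variable-shape column could be initialized via:\n\n            >>> fixedCol = co.add_ndarray_column('foo', shape=(10, 10), dtype=np.uint8)\n            >>> varCol = co.add_ndarray_column('bar', prototype=np.zeros((5, 5)),\n            ...                                variable_shape=True)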
\n\n        Parameters\n        ----------\n        name : str\n            The name assigned to this column.\n        shape : Optional[Union[int, Tuple[int]]]\n            The shape of the data samples which will be written in this column.\n            This argument and the `dtype` argument are required if a `prototype`\n            is not provided, defaults to None.\n        dtype : Optional[:class:`numpy.dtype`]\n            The datatype of this column. This argument and the `shape` argument\n            are required if a `prototype` is not provided, defaults to None.\n        prototype : Optional[:class:`numpy.ndarray`]\n            A sample array of correct datatype and shape which will be used to\n            initialize the column storage mechanisms. If this is provided, the\n            `shape` and `dtype` arguments must not be set, defaults to None.\n        variable_shape : bool, optional\n            If this is a variable sized column. If true, the maximum shape is\n            set from the provided ``shape`` or ``prototype`` argument. Any sample\n            added to the column can then have dimension sizes <= to this\n            initial specification (so long as they have the same rank as what\n            was specified) defaults to False.\n        contains_subsamples : bool, optional\n            True if the column should store data in a nested structure.\n            In this scheme, a sample key is used to index an arbitrary number of\n            subsamples which map some (sub)key to some piece of data. If False,\n            sample keys map directly to a single piece of data; essentially\n            acting as a single level key/value store. By default, False.\n        backend : Optional[str], optional\n            ADVANCED USERS ONLY, backend format code to use for column data. If\n            None, automatically inferred and set based on data shape and type.\n            by default None\n        backend_options : Optional[dict], optional\n            ADVANCED USERS ONLY, filter opts to apply to column data. If None,\n            automatically inferred and set based on data shape and type.\n            by default None\n\n        Returns\n        -------\n        :class:`~.columns.column.Columns`\n            instance object of the initialized column.\n        \"\"\"\n        self._verify_alive()\n        if self.columns._any_is_conman() or self._is_conman:\n            raise PermissionError('Not allowed while context manager is used.')\n\n        # ------------- Checks for argument validity --------------------------\n\n        try:\n            if (not is_suitable_user_key(name)) or (not is_ascii(name)):\n                raise ValueError(\n                    f'Column name provided: `{name}` is invalid. Can only contain '\n                    f'alpha-numeric or \".\" \"_\" \"-\" ascii characters (no whitespace). '\n                    f'Must be <= 64 characters long')\n            if name in self.columns:\n                raise LookupError(f'Column already exists with name: {name}.')\n            if not isinstance(contains_subsamples, bool):\n                raise ValueError(f'contains_subsamples argument must be bool, '\n                                 f'not type {type(contains_subsamples)}')\n\n            # If shape/dtype is passed instead of a prototype arg, we use those values\n            # to initialize a numpy array prototype. Using a :class:`numpy.ndarray`\n            # for specification of dtype / shape params lets us offload much of the\n            # required type checking / sanitization of userspace input to libnumpy,\n            # rather than attempting to cover all possible cases here.\n            if prototype is not None:\n                if (shape is not None) or (dtype is not None):\n                    raise ValueError(f'cannot set both prototype and shape/dtype args.')\n            else:\n                prototype = np.zeros(shape, dtype=dtype)\n            dtype = prototype.dtype\n            shape = prototype.shape\n            if not all([x > 0 for x in shape]):\n                raise ValueError(f'all dimensions must be sized greater than zero')\n        except (ValueError, LookupError) as e:\n            raise e from None\n\n        # ---------- schema validation handled automatically by typesystem ----\n\n        column_layout = 'nested' if contains_subsamples else 'flat'\n        if variable_shape:\n            schema = NdarrayVariableShape(dtype=dtype, shape=shape, column_layout=column_layout,\n                                          backend=backend, backend_options=backend_options)\n        else:\n            schema = NdarrayFixedShape(dtype=dtype, shape=shape, column_layout=column_layout,\n                                       backend=backend, backend_options=backend_options)\n\n        # ------------------ create / return new column -----------------------\n\n        col = self._initialize_new_column(\n            column_name=name, column_layout=column_layout, schema=schema)\n        return col\n\n    def _initialize_new_column(self,\n                               column_name: str,\n                               column_layout: str,\n                               schema) -> Columns:\n        \"\"\"Initialize a column and write spec to record db.\n\n        Parameters\n        ----------\n        column_name: str\n            name of the column\n        column_layout: str\n            One of ['flat', 'nested'] indicating column layout class to use\n            during generation.\n        schema: ColumnBase\n            schema class instance providing column data spec, schema/column digest,\n            data validator / hashing methods, and backend ID / options; all of which
\n            are needed to successfully create & save the column instance\n\n        Returns\n        -------\n        Columns\n            initialized column class instance.\n        \"\"\"\n        # -------- set vals in lmdb only after schema is sure to exist --------\n\n        schema_digest = schema.schema_hash_digest()\n        columnSchemaKey = schema_db_key_from_column(column_name, layout=column_layout)\n        columnSchemaVal = schema_record_db_val_from_digest(schema_digest)\n        hashSchemaKey = schema_hash_db_key_from_digest(schema_digest)\n        hashSchemaVal = schema_hash_record_db_val_from_spec(schema.schema)\n\n        txnctx = ColumnTxn(self._stageenv, self._hashenv, self._stagehashenv)\n        with txnctx.write() as ctx:\n            ctx.dataTxn.put(columnSchemaKey, columnSchemaVal)\n            ctx.hashTxn.put(hashSchemaKey, hashSchemaVal, overwrite=False)\n\n        # ------------- create column instance and return to user -------------\n\n        if column_layout == 'nested':\n            setup_args = generate_nested_column(\n                txnctx=txnctx, column_name=column_name,\n                path=self._repo_path, schema=schema, mode='a')\n        else:\n            setup_args = generate_flat_column(\n                txnctx=txnctx, column_name=column_name,\n                path=self._repo_path, schema=schema, mode='a')\n\n        self.columns._columns[column_name] = setup_args\n        return self.columns[column_name]\n\n    def merge(self, message: str, dev_branch: str) -> str:\n        \"\"\"Merge the currently checked out commit with the provided branch name.\n\n        If a fast-forward merge is possible, it will be performed, and the\n        commit message argument to this function will be ignored.\n\n        Parameters\n        ----------\n        message : str\n            commit message to attach to a three-way merge\n        dev_branch : str\n            name of the branch which should be merged into this branch\n            (ie `master`)\n\n        Returns\n        -------\n        str\n            commit hash of the new commit for the `master` branch this checkout\n            was started from.\n        \"\"\"\n        self._verify_alive()\n        commit_hash = select_merge_algorithm(\n            message=message,\n            branchenv=self._branchenv,\n            stageenv=self._stageenv,\n            refenv=self._refenv,\n            stagehashenv=self._stagehashenv,\n            master_branch=self._branch_name,\n            dev_branch=dev_branch,\n            repo_path=self._repo_path,\n            writer_uuid=self._writer_lock)\n\n        for asetHandle in self._columns.values():\n            with suppress(KeyError):\n                asetHandle._close()\n\n        self._columns = Columns._from_staging_area(\n            repo_pth=self._repo_path,\n            hashenv=self._hashenv,\n            stageenv=self._stageenv,\n            stagehashenv=self._stagehashenv)\n        self._differ = WriterUserDiff(\n            stageenv=self._stageenv,\n            refenv=self._refenv,\n            branchenv=self._branchenv,\n            branch_name=self._branch_name)\n\n        return commit_hash\n\n    def commit(self, commit_message: str) -> str:\n        \"\"\"Commit the changes made in the staging area on the checkout branch.
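\n\n        As an illustrative sketch (the column name, sample key, and digest\n        shown are hypothetical):\n\n            >>> co.columns['foo']['new_key'] = np.zeros((10, 10), dtype=np.uint8)\n            >>> co.commit('added a new sample to column foo')\n            'newcommitdigesthere'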
\n\n        Parameters\n        ----------\n        commit_message : str\n            user provided message for a log of what was changed in this commit.\n            Should a fast-forward commit be possible, this message will NOT be\n            added to the fast-forward ``HEAD``.\n\n        Returns\n        -------\n        str\n            The commit hash of the new commit.\n\n        Raises\n        ------\n        RuntimeError\n            If no changes have been made in the staging area, no commit occurs.\n        \"\"\"\n        self._verify_alive()\n\n        open_columns = []\n        for column in self._columns.values():\n            if column._is_conman:\n                open_columns.append(column.column)\n\n        try:\n            for column_name in open_columns:\n                self._columns[column_name].__exit__()\n\n            if self._differ.status() == 'CLEAN':\n                e = RuntimeError('No changes made in staging area. Cannot commit.')\n                raise e from None\n\n            self._columns._close()\n            commit_hash = commiting.commit_records(message=commit_message,\n                                                   branchenv=self._branchenv,\n                                                   stageenv=self._stageenv,\n                                                   refenv=self._refenv,\n                                                   repo_path=self._repo_path)\n            # purge recs then reopen file handles so that we don't have to invalidate\n            # previous weakproxy references like if we just called :meth:`_setup`\n            hashs.clear_stage_hash_records(self._stagehashenv)\n            self._columns._open()\n\n        finally:\n            for column_name in open_columns:\n                self._columns[column_name].__enter__()\n\n        return commit_hash\n\n    def reset_staging_area(self, *, force=False) -> str:\n        \"\"\"Perform a hard reset of the staging area to the last commit head.\n\n        After this operation completes, the writer checkout will automatically\n        close in the typical fashion (any held references to :attr:`columns`\n        or :attr:`metadata` objects will finalize and destruct as normal). In\n        order to perform any further operation, a new checkout needs to be\n        opened.\n\n        .. warning::\n\n            This operation is IRREVERSIBLE. All records and data which are not\n            stored in a previous commit will be permanently deleted.\n\n        Returns\n        -------\n        str\n            Commit hash of the head which the staging area is reset to.\n\n        Raises\n        ------\n        RuntimeError\n            If no changes have been made to the staging area and ``force`` is\n            not set.\n        \"\"\"\n        self._verify_alive()\n        print(f'Hard reset requested with writer_lock: {self._writer_lock}')\n\n        if self._differ.status() == 'CLEAN':\n            if not force:\n                e = RuntimeError(f'No changes made in staging area. 
No reset necessary.')\n                raise e from None\n\n        if isinstance(self._stack, ExitStack):\n            self._stack.close()\n        if hasattr(self._columns, '_destruct'):\n            self._columns._destruct()\n\n        hashs.remove_stage_hash_records_from_hashenv(self._hashenv, self._stagehashenv)\n        hashs.clear_stage_hash_records(self._stagehashenv)\n        hashs.backends_remove_in_process_data(self._repo_path)\n\n        branch_head = heads.get_staging_branch_head(self._branchenv)\n        head_commit = heads.get_branch_head_commit(self._branchenv, branch_head)\n        if head_commit == '':\n            with suppress(ValueError):\n                commiting.replace_staging_area_with_commit(refenv=self._refenv,\n                                                           stageenv=self._stageenv,\n                                                           commit_hash=head_commit)\n        else:\n            commiting.replace_staging_area_with_commit(refenv=self._refenv,\n                                                       stageenv=self._stageenv,\n                                                       commit_hash=head_commit)\n\n        self._columns = Columns._from_staging_area(\n            repo_pth=self._repo_path,\n            hashenv=self._hashenv,\n            stageenv=self._stageenv,\n            stagehashenv=self._stagehashenv)\n        self._differ = WriterUserDiff(\n            stageenv=self._stageenv,\n            refenv=self._refenv,\n            branchenv=self._branchenv,\n            branch_name=self._branch_name)\n        return head_commit\n\n    def close(self) -> None:\n        \"\"\"Close all handles to the writer checkout and release the writer lock.\n\n        Failure to call this method after the writer checkout has been used\n        will result in a lock being placed on the repository which will not\n        allow any writes until it has been manually cleared.\n        \"\"\"\n        with suppress(lmdb.Error):\n            self._verify_alive()\n\n        if isinstance(self._stack, ExitStack):\n            self._stack.close()\n\n        if hasattr(self, '_columns'):\n            if hasattr(self._columns, '_destruct'):\n                self._columns._destruct()\n\n        with suppress(lmdb.Error):\n            heads.release_writer_lock(self._branchenv, self._writer_lock)\n\n        for attr in list(self.__dict__.keys()):\n            delattr(self, attr)\n        atexit.unregister(self.close)\n        return\n"
  },
  {
    "path": "src/hangar/cli/__init__.py",
    "content": "from .cli import main\n\n__all__ = ['main']\n"
  },
  {
    "path": "src/hangar/cli/cli.py",
    "content": "\"\"\"Module that contains the command line app.\n\nWhy does this file exist, and why not put this in __main__?\n\n   You might be tempted to import things from __main__ later, but that will cause\n   problems: the code will get executed twice:\n\n      - When you run `python -m hangar` python will execute\n        ``__main__.py`` as a script. That means there won't be any\n        ``hangar.__main__`` in ``sys.modules``.\n      - When you import __main__ it will get executed again (as a module) because\n        there's no ``hangar.__main__`` in ``sys.modules``.\n\nAlso see (1) from http://click.pocoo.org/7/setuptools/#setuptools-integration\n\"\"\"\nimport os\nimport time\nfrom pathlib import Path\n\nimport click\nimport numpy as np\n\nfrom hangar import Repository, __version__\n\nfrom .utils import parse_custom_arguments, StrOrIntType\n\n\npass_repo = click.make_pass_decorator(Repository, ensure=True)\n\n\n@click.group(no_args_is_help=True, add_help_option=True, invoke_without_command=True)\n@click.version_option(version=__version__, help='display current Hangar Version')\n@click.pass_context\ndef main(ctx):  # pragma: no cover\n    P = os.getcwd()\n    ctx.obj = Repository(path=P, exists=False)\n\n\n# -------------------------------- Init ---------------------------------------\n\n\n@main.command()\n@click.option('--name', prompt='User Name', help='first and last name of user')\n@click.option('--email', prompt='User Email', help='email address of the user')\n@click.option('--overwrite', is_flag=True, default=False,\n              help='overwrite a repository if it exists at the current path')\n@pass_repo\ndef init(repo: Repository, name, email, overwrite):\n    \"\"\"Initialize an empty repository at the current path.\n    \"\"\"\n    if repo.initialized and (not overwrite):\n        click.echo(f'Repo already exists at: {repo.path}')\n    else:\n        repo.init(user_name=name, user_email=email, remove_old=overwrite)\n\n\n# -------------------------- Writer Lock -------------------------------------\n\n\n@main.command(name='writer-lock')\n@click.option('--force-release', 'force_release_', is_flag=True, default=False,\n              help='force release writer lock from the CLI.')\n@pass_repo\ndef writer_lock_held(repo: Repository, force_release_):\n    \"\"\"Determine if the writer lock is held for a repository.\n\n    Passing the ``--force-release`` flag will instantly release the writer lock,\n    invalidating any process which currently holds it.\n    \"\"\"\n    if force_release_:\n        repo.force_release_writer_lock()\n        click.echo(f'Success force release of writer lock.')\n    else:\n        if repo.writer_lock_held:\n            click.echo(f'Writer lock is held.')\n        else:\n            click.echo(f'Writer lock is available.')\n\n\n\n# -------------------------- Checkout Writer ----------------------------------\n\n\n@main.command()\n@click.argument('branchname', nargs=1, required=True)\n@pass_repo\ndef checkout(repo: Repository, branchname):\n    \"\"\"Checkout writer head branch at BRANCHNAME.\n\n    This method requires that no process currently holds the writer lock.\n    In addition, it requires that the contents of the staging area are\n    'CLEAN' (no changes have been staged).\n    \"\"\"\n    try:\n        co = repo.checkout(write=True, branch=branchname)\n        co.close()\n        click.echo(f'Writer checkout head set to branch: {branchname}')\n    except (ValueError, PermissionError) as e:\n        raise 
\n\n\n@main.command()\n@click.option('--message', '-m', multiple=True,\n              help=('The commit message. If provided multiple times '\n                    'each argument gets converted into a new line.'))\n@pass_repo\ndef commit(repo: Repository, message):\n    \"\"\"Commits outstanding changes.\n\n    Commit changes to the given files into the repository. You will need to\n    'push' to push up your changes to other repositories.\n    \"\"\"\n    from hangar.records.summarize import status\n\n    co = repo.checkout(write=True)\n    try:\n        if not message:\n            diff = co.diff.staged()\n            status_txt = status(co._hashenv, co.branch_name, diff.diff)\n            status_txt.seek(0)\n            marker = '# Changes To Be committed: \\n'\n            hint = ['\\n', '\\n', marker, '# \\n']\n            for line in status_txt.readlines():\n                hint.append(f'# {line}')\n            # open default system editor\n            message = click.edit(''.join(hint))\n            if message is None:\n                click.echo('Aborted!')\n                return\n            msg = message.split(marker)[0].rstrip()\n            if not msg:\n                click.echo('Aborted! Empty commit message')\n                return\n        else:\n            msg = '\\n'.join(message)\n\n        click.echo('Commit message:\\n' + msg)\n        try:\n            digest = co.commit(msg)\n            click.echo(f'Commit Successful. Digest: {digest}')\n        except RuntimeError as e:\n            raise click.ClickException(e)\n    finally:\n        co.close()\n\n\n# -------------------------- Column Interactor ------------------------------\n\n\n@main.group(no_args_is_help=True, add_help_option=True)\n@click.pass_context\ndef column(ctx):  # pragma: no cover\n    \"\"\"Operations for working with columns in the writer checkout.\n    \"\"\"\n    pass\n\n\n@column.command(name='create')\n@click.option('--variable-shape', 'variable_', is_flag=True, default=False,\n              help='flag indicating sample dimensions can be any size up to max shape.')\n@click.option('--contains-subsamples', 'subsamples_', is_flag=True, default=False,\n              help=('flag indicating if this is a column which nests multiple '\n                    'subsamples under a common sample key.'))\n@click.argument('name', nargs=1, type=click.STRING, required=True)\n@click.argument('dtype', nargs=1, type=click.Choice([\n    'UINT8', 'INT8', 'UINT16', 'INT16', 'UINT32', 'INT32',\n    'UINT64', 'INT64', 'FLOAT16', 'FLOAT32', 'FLOAT64', 'STR']), required=True)\n@click.argument('shape', nargs=-1, type=click.INT, required=False)\n@pass_repo\ndef create_column(repo: Repository, name, dtype, shape, variable_, subsamples_):\n    \"\"\"Create a column with NAME and DTYPE of SHAPE.\n\n    The column will be created in the staging area / branch last used by a\n    writer-checkout. Valid NAMEs contain only ascii letters and [``'.'``,\n    ``'_'``, ``'-'``] (no whitespace). The DTYPE must be one of [``'UINT8'``,
\n    ``'INT8'``, ``'UINT16'``, ``'INT16'``, ``'UINT32'``, ``'INT32'``,\n    ``'UINT64'``, ``'INT64'``, ``'FLOAT16'``, ``'FLOAT32'``, ``'FLOAT64'``,\n    ``'STR'``].\n\n    If an ndarray dtype is specified (not 'STR'), then the SHAPE must be the\n    last argument(s) specified, where each dimension size is identified by\n    a (space separated) list of numbers.\n\n    Examples:\n\n    To specify a column for some training images of dtype uint8 and shape\n    (256, 256, 3) we should say:\n\n       .. code-block:: console\n\n          $ hangar column create train_images UINT8 256 256 3\n\n    To specify that the samples can be variably shaped (have any dimension size\n    up to the maximum SHAPE specified) we would say:\n\n       .. code-block:: console\n\n          $ hangar column create train_images UINT8 256 256 3 --variable-shape\n\n    or equivalently:\n\n       .. code-block:: console\n\n          $ hangar column create --variable-shape train_images UINT8 256 256 3\n\n    To specify that the column contains a nested set of subsample data under a\n    common sample key, the ``--contains-subsamples`` flag can be used.\n\n       .. code-block:: console\n\n          $ hangar column create --contains-subsamples train_images UINT8 256 256 3\n\n    \"\"\"\n    try:\n        co = repo.checkout(write=True)\n        if dtype == 'STR':\n            col = co.add_str_column(name=name, contains_subsamples=subsamples_)\n        else:\n            col = co.add_ndarray_column(name=name,\n                                        shape=shape,\n                                        dtype=np.typeDict[dtype.lower()],\n                                        variable_shape=variable_,\n                                        contains_subsamples=subsamples_)\n        click.echo(f'Initialized Column: {col.column}')\n    except (ValueError, LookupError, PermissionError) as e:\n        raise click.ClickException(e)\n    finally:\n        try:\n            co.close()\n        except NameError:\n            pass\n\n\n@column.command(name='remove')\n@click.argument('name', nargs=1, type=click.STRING, required=True)\n@pass_repo\ndef remove_column(repo: Repository, name):\n    \"\"\"Delete the column NAME (and all samples) from staging area.\n\n    The column will be removed from the staging area / branch last used by a\n    writer-checkout.\n    \"\"\"\n    try:\n        co = repo.checkout(write=True)\n        removed = co.columns.delete(name)\n        click.echo(f'Successfully removed column: {removed}')\n    except (ValueError, KeyError, PermissionError) as e:\n        raise click.ClickException(e)\n    finally:\n        try:\n            co.close()\n        except NameError:\n            pass\n\n\n# ---------------------------- Remote Interaction -----------------------------\n\n\n@main.command()\n@click.argument('remote', nargs=1, required=True)\n@click.option('--name', prompt='User Name', help='first and last name of user')\n@click.option('--email', prompt='User Email', help='email address of the user')\n@click.option('--overwrite', is_flag=True, default=False,\n              help='overwrite a repository if it exists at the current path')\n@pass_repo\ndef clone(repo: Repository, remote, name, email, overwrite):\n    \"\"\"Initialize a repository at the current path and fetch updated records from REMOTE.\n\n    Note: This method does not actually download the data to disk.
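\n\n    As an illustrative sketch (``localhost:50051`` stands in for a real\n    server address), an invocation might look like:\n\n    .. code-block:: console\n\n       $ hangar clone --name 'Jane Doe' --email 'jane@example.com' localhost:50051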
\n\n    Please look\n    into the ``fetch-data`` command.\n    \"\"\"\n    if repo.initialized and (not overwrite):\n        click.echo(f'Repo already exists at: {repo.path}')\n    else:\n        repo.clone(name, email, remote, remove_old=overwrite)\n\n\n@main.command(name='fetch')\n@click.argument('remote', nargs=1, required=True)\n@click.argument('branch', nargs=1, required=True)\n@pass_repo\ndef fetch_records(repo: Repository, remote, branch):\n    \"\"\"Retrieve the commit history from REMOTE for BRANCH.\n\n    This method does not fetch the data associated with the commits. See\n    ``fetch-data`` to download the tensor data corresponding to a commit.\n    \"\"\"\n    bName = repo.remote.fetch(remote=remote, branch=branch)\n    click.echo(f'Fetched branch Name: {bName}')\n\n\n@main.command(name='fetch-data')\n@click.argument('remote', nargs=1, required=True)\n@click.argument('startpoint', nargs=1, required=True)\n@click.option('--column', '-d', multiple=True, required=False, default=None,\n              help='specify any number of column keys to fetch data for.')\n@click.option('--all-history', '-a', 'all_', is_flag=True, default=False, required=False,\n              help='Retrieve data referenced in every parent commit accessible to the STARTPOINT')\n@pass_repo\ndef fetch_data(repo: Repository, remote, startpoint, column, all_):\n    \"\"\"Get data from REMOTE referenced by STARTPOINT (short-commit or branch).\n\n    The default behavior is to only download a single commit's data or the HEAD\n    commit of a branch. Please review optional arguments for other behaviors.\n    \"\"\"\n    from hangar.records.commiting import expand_short_commit_digest\n    from hangar.records.heads import get_branch_head_commit\n    from hangar.records.heads import get_staging_branch_head\n\n    if startpoint is None:\n        branch = get_staging_branch_head(repo._env.branchenv)\n        commit = get_branch_head_commit(repo._env.branchenv, branch)\n    elif startpoint in repo.list_branches():\n        commit = get_branch_head_commit(repo._env.branchenv, startpoint)\n    else:\n        commit = expand_short_commit_digest(repo._env.refenv, startpoint)\n    click.echo(f'Fetching data for commit: {commit}')\n\n    if len(column) == 0:\n        column = None\n\n    commits = repo.remote.fetch_data(remote=remote,\n                                     commit=commit,\n                                     column_names=column,\n                                     retrieve_all_history=all_)\n    click.echo(f'Completed fetching data for commits: {commits}')\n\n\n@main.command()\n@click.argument('remote', nargs=1, required=True)\n@click.argument('branch', nargs=1, required=True)\n@pass_repo\ndef push(repo: Repository, remote, branch):\n    \"\"\"Upload local BRANCH commit history / data to REMOTE server.\n    \"\"\"\n    commit_hash = repo.remote.push(remote=remote, branch=branch)\n    click.echo(f'Push data for commit hash: {commit_hash}')\n\n\n# ----------------------- Remote Server References ----------------------------\n\n\n@main.group(no_args_is_help=True, add_help_option=True)\n@click.pass_context\ndef remote(ctx):  # pragma: no cover\n    \"\"\"Operations for working with remote server references\n    \"\"\"\n    pass\n\n\n@remote.command(name='list')\n@pass_repo\ndef list_remotes(repo: Repository):\n    \"\"\"List all remote repository records.\n    \"\"\"\n    click.echo(repo.remote.list_all())\n\n\n@remote.command(name='add')\n@click.argument('name', nargs=1, required=True)\n@click.argument('address', nargs=1, required=True)
required=True)\n@pass_repo\ndef add_remote(repo: Repository, name, address):\n    \"\"\"Add a new remote server NAME with url ADDRESS to the local client.\n\n    This name must be unique. In order to update an old remote, please remove it\n    and re-add the remote NAME / ADDRESS combination.\n    \"\"\"\n    click.echo(repo.remote.add(name=name, address=address))\n\n\n@remote.command(name='remove')\n@click.argument('name', nargs=1, required=True)\n@pass_repo\ndef remove_remote(repo: Repository, name):\n    \"\"\"Remove the remote server NAME from the local client.\n\n    This will not remove any tracked remote reference branches.\n    \"\"\"\n    click.echo(repo.remote.remove(name=name))\n\n\n# ---------------------------- User Visualizations ----------------------------\n\n\n@main.command()\n@click.argument('dev', nargs=1, required=True)\n@click.argument('master', nargs=1, required=False, default=None)\n@pass_repo\ndef diff(repo: Repository, dev, master):\n    \"\"\"Display diff of DEV commit/branch to MASTER commit/branch.\n\n    If no MASTER is specified, then the staging area branch HEAD will\n    be used as the commit digest for MASTER. This operation will\n    return a diff which could be interpreted as if you were merging\n    the changes in DEV into MASTER.\n\n    TODO: VERIFY ORDER OF OUTPUT IS CORRECT.\n    \"\"\"\n    from hangar.records.commiting import expand_short_commit_digest\n    from hangar.records.commiting import get_staging_branch_head\n    from hangar.records.summarize import status\n\n    if dev not in repo.list_branches():\n        dev = expand_short_commit_digest(repo._env.refenv, dev)\n\n    if master is None:\n        master = get_staging_branch_head(repo._env.branchenv)\n    elif master not in repo.list_branches():\n        master = expand_short_commit_digest(repo._env.refenv, master)\n\n    diff_spec = repo.diff(master, dev)\n    buf = status(hashenv=repo._env.hashenv, branch_name=dev, diff=diff_spec.diff)\n    click.echo(buf.getvalue())\n\n\n@main.command()\n@click.argument('startpoint', nargs=1, required=False)\n@pass_repo\ndef summary(repo: Repository, startpoint):\n    \"\"\"Display content summary at STARTPOINT (short-digest or branch).\n\n    If no argument is passed in, the staging area branch HEAD will be used as the\n    starting point. In order to receive a machine-readable and more complete\n    version of this information, please see the ``Repository.summary()`` method\n    of the API.\n
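\n    For example, to summarize the contents at the head of a branch (the\n    branch name shown here is illustrative):\n\n       .. code-block:: console\n\n          $ hangar summary master\n    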
\"\"\"\n    from hangar.records.commiting import expand_short_commit_digest\n\n    if startpoint is None:\n        click.echo(repo.summary())\n    elif startpoint in repo.list_branches():\n        click.echo(repo.summary(branch=startpoint))\n    else:\n        base_commit = expand_short_commit_digest(repo._env.refenv, startpoint)\n        click.echo(repo.summary(commit=base_commit))\n\n\n@main.command()\n@click.argument('startpoint', required=False, default=None)\n@pass_repo\ndef log(repo: Repository, startpoint):\n    \"\"\"Display commit graph starting at STARTPOINT (short-digest or name)\n\n    If no argument is passed in, the staging area branch HEAD will be used as the\n    starting point.\n    \"\"\"\n    from hangar.records.commiting import expand_short_commit_digest\n\n    if startpoint is None:\n        click.echo(repo.log())\n    elif startpoint in repo.list_branches():\n        click.echo(repo.log(branch=startpoint))\n    else:\n        base_commit = expand_short_commit_digest(repo._env.refenv, startpoint)\n        click.echo(repo.log(commit=base_commit))\n\n\n@main.command()\n@pass_repo\ndef status(repo: Repository):\n    \"\"\"Display changes made in the staging area compared to its base commit.\n    \"\"\"\n    from hangar.records.summarize import status\n    co = repo.checkout(write=True)\n    try:\n        diff = co.diff.staged()\n        click.echo(status(co._hashenv, co.branch_name, diff.diff).getvalue(), nl=False)\n    finally:\n        co.close()\n\n\n# ------------------------------- Branching -----------------------------------\n\n\n@main.group(no_args_is_help=True, add_help_option=True)\n@click.pass_context\ndef branch(ctx):  # pragma: no cover\n    \"\"\"Operate on and list branch pointers.\n    \"\"\"\n    pass\n\n\n@branch.command(name='list')\n@pass_repo\ndef branch_list(repo: Repository):\n    \"\"\"List all branch names.\n\n    Includes both remote branches as well as local branches.\n    \"\"\"\n    click.echo(repo.list_branches())\n\n\n@branch.command(name='create')\n@click.argument('name', nargs=1, required=True)\n@click.argument('startpoint', nargs=1, default=None, required=False)\n@pass_repo\ndef branch_create(repo: Repository, name, startpoint):\n    \"\"\"Create a branch with NAME at STARTPOINT (short-digest or branch)\n\n    If no STARTPOINT is provided, the new branch is positioned at the HEAD of\n    the staging area branch automatically.\n    \"\"\"\n    from hangar.records.commiting import expand_short_commit_digest\n    from hangar.records.heads import get_branch_head_commit\n    from hangar.records.heads import get_staging_branch_head\n\n    branch_names = repo.list_branches()\n    if name in branch_names:\n        e = ValueError(f'branch name: {name} already exists')\n        raise click.ClickException(e)\n\n    try:\n        if startpoint is None:\n            branch = get_staging_branch_head(repo._env.branchenv)\n            base_commit = get_branch_head_commit(repo._env.branchenv, branch)\n        elif startpoint in branch_names:\n            base_commit = get_branch_head_commit(repo._env.branchenv, startpoint)\n        else:\n            base_commit = expand_short_commit_digest(repo._env.refenv, startpoint)\n\n        res = repo.create_branch(name, base_commit=base_commit)\n    except (KeyError, ValueError, RuntimeError) as e:\n        raise click.ClickException(e)\n\n    
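# report the newly created branch name and its HEAD commit digest\n    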
click.echo(f'Created BRANCH: {res.name} HEAD: {res.digest}')\n\n\n@branch.command(name='delete')\n@click.argument('name', nargs=1, required=True)\n@click.option('--force', '-f', is_flag=True, default=False,\n              help='flag to force delete branch which has un-merged history.')\n@pass_repo\ndef branch_remove(repo: Repository, name, force):\n    \"\"\"Remove a branch pointer with the provided NAME.\n\n    The NAME must be a branch present on the local machine.\n    \"\"\"\n    try:\n        res = repo.remove_branch(name, force_delete=force)\n    except (ValueError, PermissionError, RuntimeError) as e:\n        raise click.ClickException(e)\n\n    click.echo(f'Deleted BRANCH: {res.name} HEAD: {res.digest}')\n\n\n# ---------------------------- Server Commands --------------------------------\n\n\n@main.command()\n@click.option('--overwrite', is_flag=True, default=False,\n              help='overwrite the hangar server instance if it exists at the current path.')\n@click.option('--ip', default='localhost', show_default=True,\n              help='the ip to start the server on. default is `localhost`')\n@click.option('--port', default='50051', show_default=True,\n              help='port to start the server on. default is `50051`')\n@click.option('--timeout', default=60 * 60 * 24, required=False, show_default=True,\n              help='time (in seconds) before server is stopped automatically')\ndef server(overwrite, ip, port, timeout):\n    \"\"\"Start a hangar server, initializing one if it does not exist.\n\n    The server is configured to stop working 24 hours from the time it was\n    initially started. To modify this value, please see the ``--timeout``\n    parameter.\n\n    The hangar server directory layout, contents, and access conventions are\n    similar to, though different enough from, the regular user \"client\"\n    implementation that it is not possible to fully access all information via\n    regular API methods. These changes occur as a result of the uniformity of\n    operations promised by both the RPC structure and negotiations between the\n    client/server upon connection.\n\n    More simply put, we know more, so we can optimize access more; similar, but\n    not identical.\n
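\n    For example, to serve the repository in the current directory on the\n    default address and port, stopping automatically after one hour:\n\n       .. code-block:: console\n\n          $ hangar server --ip localhost --port 50051 --timeout 3600\n    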
\"\"\"\n    from hangar.remote.server import serve\n\n    P = os.getcwd()\n    ip_port = f'{ip}:{port}'\n    server, hangserver, channel_address = serve(P, overwrite, channel_address=ip_port)\n    server.start()\n    click.echo('Hangar Server Started')\n    click.echo(f'* Start Time: {time.asctime()}')\n    click.echo(f'* Base Directory Path: {P}')\n    click.echo(f'* Operating on `IP_ADDRESS:PORT`: {channel_address}')\n    try:\n        startTime = time.time()\n        while True:\n            time.sleep(0.1)\n            if time.time() - startTime > timeout:\n                raise SystemExit\n    except (KeyboardInterrupt, SystemExit):\n        click.echo(f'Server Stopped at Time: {time.asctime()}')\n        hangserver.close()\n        server.stop(0)\n\n\n# ---------------------------- Import Exporters -------------------------------\n\n\n@main.command(name='import',\n              context_settings=dict(allow_extra_args=True, ignore_unknown_options=True, ))\n@click.argument('column', required=True)\n@click.argument('path',\n                required=True,\n                type=click.Path(exists=True, dir_okay=True, file_okay=True, readable=True,\n                                resolve_path=True))\n@click.option('--branch', default=None, help='branch to import data')\n@click.option('--plugin', default=None, help='override auto-inferred plugin')\n@click.option('--overwrite', is_flag=True,\n              help='overwrite data samples with the same name as the imported data file ')\n@pass_repo\n@click.pass_context\ndef import_data(ctx, repo: Repository, column, path, branch, plugin, overwrite):\n    \"\"\"Import file or directory of files at PATH to COLUMN in the staging area.\n\n    If passing in a directory, all files in the directory will be imported; if\n    passing in a file, just the file specified will be imported.\n    \"\"\"\n    # TODO: ignore warning through env variable\n    from types import GeneratorType\n    from hangar import external\n    from hangar.records.heads import get_staging_branch_head\n\n    kwargs = parse_custom_arguments(ctx.args)\n    if branch is None:\n        branch = get_staging_branch_head(repo._env.branchenv)\n    elif branch not in repo.list_branches():\n        raise click.ClickException(f'Branch name: {branch} does not exist, Exiting.')\n    click.echo(f'Writing to branch: {branch}')\n\n    co = repo.checkout(write=True, branch=branch)\n    try:\n        active_aset = co.columns.get(column)\n        p = Path(path)\n        files = [f.resolve() for f in p.iterdir()] if p.is_dir() else [p.resolve()]\n        with active_aset as aset, click.progressbar(files) as filesBar:\n            for f in filesBar:\n                ext = ''.join(f.suffixes).strip('.')  # multi-suffix files (tar.bz2)\n                loaded = external.load(f, plugin=plugin, extension=ext, **kwargs)\n                if not isinstance(loaded, GeneratorType):\n                    loaded = [loaded]\n                for arr, fname in loaded:\n                    if (not overwrite) and (fname in aset):\n                        continue\n                    try:\n                        aset[fname] = arr\n                    except ValueError as e:\n                        click.echo(e)\n    except (ValueError, 
KeyError) as e:\n        raise click.ClickException(e)\n    finally:\n        co.close()\n\n\n@main.command(name='export',\n              context_settings=dict(allow_extra_args=True, ignore_unknown_options=True, ))\n@click.argument('column', nargs=1, required=True)\n@click.argument('startpoint', nargs=1, default=None, required=False)\n@click.option('-o', '--out', 'outdir',\n              nargs=1,\n              required=False,\n              default=os.getcwd(),\n              type=click.Path(exists=True, dir_okay=True, file_okay=False, readable=True,\n                              resolve_path=True),\n              help=\"Directory to export data\")\n@click.option('-s', '--sample',\n              nargs=1,\n              default=None,\n              type=StrOrIntType(),\n              help=('Sample name to export. Default implementation is to interpret all input '\n                    'names as string type. As a column can contain samples with both ``str`` '\n                    'and ``int`` types, we allow you to specify ``name type`` of the sample. To '\n                    'identify a potentially ambiguous name, we allow you to prepend the type of '\n                    'sample name followed by a colon and then the sample name (ex. ``str:54`` '\n                    'or ``int:54``). This can be done for any sample key.'))\n@click.option('-f', '--format', 'format_',\n              nargs=1,\n              required=False,\n              help='File format of output file')\n@click.option('--plugin', required=False, help='override auto-inferred plugin')\n@pass_repo\n@click.pass_context\ndef export_data(ctx, repo: Repository, column, outdir, startpoint, sample, format_, plugin):\n    \"\"\"Export COLUMN sample data as it existed at STARTPOINT to some format and path.\n\n    Specifying which sample should be exported is possible using the\n    ``--sample`` switch (without this, all the samples in the given column will\n    be exported). Since hangar supports both int and str datatypes for sample\n    names, specifying the type along with the sample name might be necessary at\n    times. It is possible to do that by separating the type and name with a\n    colon.\n\n    Example:\n\n       1. if the sample name is the string form of the number 10 - ``str:10`` or ``10``\n\n       2. if the sample name is ``sample1`` - ``str:sample1`` or ``sample1``\n\n       3. if the sample name is an int, say 10 - ``int:10``\n
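\n    For example, to export every sample of a column to PNG files in the\n    current working directory (the column and branch names shown here are\n    illustrative):\n\n       .. code-block:: console\n\n          $ hangar export train_images master --format png\n    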
\"\"\"\n    from hangar.records.commiting import expand_short_commit_digest\n    from hangar.records.heads import get_branch_head_commit, get_staging_branch_head\n    from hangar import external\n    kwargs = parse_custom_arguments(ctx.args)\n\n    if startpoint in repo.list_branches():\n        base_commit = get_branch_head_commit(repo._env.branchenv, startpoint)\n    elif startpoint:\n        base_commit = expand_short_commit_digest(repo._env.refenv, startpoint)\n    else:\n        branch_name = get_staging_branch_head(repo._env.branchenv)\n        base_commit = get_branch_head_commit(repo._env.branchenv, branch_name)\n\n    co = repo.checkout(commit=base_commit)\n    try:\n        aset = co.columns.get(column)\n        sampleNames = [sample] if sample is not None else list(aset.keys())\n        extension = format_.lstrip('.') if format_ else None\n        with aset, click.progressbar(sampleNames) as sNamesBar:\n            for sampleN in sNamesBar:\n                data = aset[sampleN]\n                formatted_sampleN = f'{type(sampleN).__name__}:{sampleN}'\n                try:\n                    external.save(data, outdir, formatted_sampleN, extension, plugin, **kwargs)\n                except Exception as e:\n                    raise click.ClickException(e)\n    except KeyError as e:\n        raise click.ClickException(e)\n    finally:\n        co.close()\n\n\n@main.command(name='view',\n              context_settings=dict(allow_extra_args=True, ignore_unknown_options=True, ))\n@click.argument('column', nargs=1, type=str, required=True)\n@click.argument('sample', nargs=1, type=StrOrIntType(), required=True)\n@click.argument('startpoint', nargs=1, default=None, required=False)\n@click.option('-f', '--format', 'format_', required=False, help='File format of output file')\n@click.option('--plugin', default=None, help='Plugin name to use instead of auto-inferred plugin')\n@pass_repo\n@click.pass_context\ndef view_data(ctx, repo: Repository, column, sample, startpoint, format_, plugin):\n    \"\"\"Use a plugin to view the data of some SAMPLE in COLUMN at STARTPOINT.\n    \"\"\"\n    from hangar.records.commiting import expand_short_commit_digest\n    from hangar.records.heads import get_branch_head_commit, get_staging_branch_head\n    from hangar import external\n\n    kwargs = parse_custom_arguments(ctx.args)\n    if startpoint in repo.list_branches():\n        base_commit = get_branch_head_commit(repo._env.branchenv, startpoint)\n    elif startpoint:\n        base_commit = expand_short_commit_digest(repo._env.refenv, startpoint)\n    else:\n        branch_name = get_staging_branch_head(repo._env.branchenv)\n        base_commit = get_branch_head_commit(repo._env.branchenv, branch_name)\n\n    co = repo.checkout(commit=base_commit)\n    try:\n        aset = co.columns.get(column)\n        extension = format_.lstrip('.') if format_ else None\n        data = aset[sample]\n        try:\n            external.show(data, plugin=plugin, extension=extension, **kwargs)\n        except Exception as e:\n            raise click.ClickException(e)\n    except KeyError as e:\n        raise click.ClickException(e)\n    finally:\n        co.close()\n\n\n# ---------------------------- Developer Utils --------------------------------\n\n\n@main.command(name='db-view', hidden=True)\n@click.option('-a', is_flag=True, help='display all dbs in the repository')\n@click.option('-b', is_flag=True, help='display the branch/heads db')\n@click.option('-r', 
is_flag=True, help='display the references db')\n@click.option('-d', is_flag=True, help='display the data hash db')\n@click.option('-s', is_flag=True, help='display the stage record db')\n@click.option('-z', is_flag=True, help='display the staged hash record db')\n@click.option('--limit', default=30, help='limit the number of records displayed before truncation')\n@pass_repo\ndef lmdb_record_details(repo: Repository, a, b, r, d, s, z, limit):\n    \"\"\"DEVELOPER TOOL ONLY\n\n    Display key/value pairs making up the dbs.\n    \"\"\"\n    from hangar.context import Environments\n    from hangar.records.summarize import details\n    from hangar import constants as c\n\n    if repo._repo_path.is_dir():\n        repo_path = repo._repo_path\n    elif repo._repo_path.parent.joinpath(c.DIR_HANGAR_SERVER).is_dir():\n        repo_path = repo._repo_path.parent.joinpath(c.DIR_HANGAR_SERVER)\n    else:\n        click.echo(f'NO HANGAR INSTALLATION AT PATH: {repo._repo_path.parent}')\n        return\n\n    envs = Environments(pth=repo_path)\n    try:\n        if a:\n            b, r, d, s, z = True, True, True, True, True\n        if b:\n            click.echo(details(envs.branchenv, line_limit=limit).getvalue())\n        if r:\n            click.echo(details(envs.refenv, line_limit=limit).getvalue())\n        if d:\n            click.echo(details(envs.hashenv, line_limit=limit).getvalue())\n        if s:\n            click.echo(details(envs.stageenv, line_limit=limit).getvalue())\n        if z:\n            click.echo(details(envs.stagehashenv, line_limit=limit).getvalue())\n    finally:\n        envs._close_environments()\n"
  },
  {
    "path": "src/hangar/cli/utils.py",
    "content": "import click\n\n\nclass StrOrIntType(click.ParamType):\n    \"\"\"Custom type for click to parse the sample name\n    argument to integer or string\n    \"\"\"\n\n    def convert(self, value, param, ctx):\n        if not value:\n            return None\n\n        try:\n            stype, sample = value.split(':') if ':' in value else ('str', value)\n        except ValueError:\n            self.fail(f\"Sample name {value} not formatted properly\", param, ctx)\n        try:\n            if stype not in ('str', 'int'):\n                self.fail(f\"type {stype} is not allowed\", param, ctx)\n            return int(sample) if stype == 'int' else str(sample)\n        except (ValueError, TypeError):\n            self.fail(f\"{sample} is not a valid {stype}\", param, ctx)\n\n\ndef parse_custom_arguments(click_args: list) -> dict:\n    \"\"\"\n    Parse all the unknown arguments from click for downstream tasks. Used in\n    user plugins for custom command line arguments.\n\n    Parameters\n    ----------\n    click_args : list\n        Unknown arguments from click\n\n    Returns\n    -------\n    parsed : dict\n        Parsed arguments stored as key value pair\n\n    Note\n    -----\n    Unknown arguments must be long arguments i.e should start with --\n    \"\"\"\n    parsed = {}\n    for i in range(0, len(click_args), 2):\n        key = click_args[i]\n        val = click_args[i + 1]\n        if not key.startswith('--'):\n            raise RuntimeError(f\"Could not parse argument {key}. It should be prefixed with `--`\")\n        parsed[key[2:]] = val\n    return parsed\n"
  },
  {
    "path": "src/hangar/columns/__init__.py",
    "content": "from .column import Columns, ModifierTypes\nfrom .common import ColumnTxn\nfrom .constructors import (\n    generate_flat_column,\n    generate_nested_column,\n    column_type_object_from_schema\n)\nfrom .introspection import is_column, is_writer_column\n\n__all__ = (\n    'Columns',\n    'ModifierTypes',\n    'generate_flat_column',\n    'generate_nested_column',\n    'column_type_object_from_schema',\n    'ColumnTxn',\n    'is_column',\n    'is_writer_column'\n)\n"
  },
  {
    "path": "src/hangar/columns/column.py",
    "content": "\"\"\"Constructor and Interaction Class for Columns\n\"\"\"\nfrom contextlib import ExitStack\nfrom pathlib import Path\nfrom typing import Iterable, List, Mapping, Optional, Tuple, Union, Dict, TYPE_CHECKING\n\nimport lmdb\n\nfrom .common import ColumnTxn\nfrom .constructors import (\n    generate_flat_column, generate_nested_column, column_type_object_from_schema\n)\nfrom ..records import (\n    schema_db_key_from_column,\n    schema_hash_db_key_from_digest,\n    schema_column_record_from_db_key,\n    schema_spec_from_db_val,\n    dynamic_layout_data_record_db_start_range_key,\n)\nfrom ..records.queries import RecordQuery\nfrom ..op_state import writer_checkout_only\nfrom ..txnctx import TxnRegister\n\nif TYPE_CHECKING:\n    from .layout_flat import FlatSampleWriter\n    from .layout_nested import NestedSampleWriter, FlatSubsampleWriter\n\n\nModifierTypes = Union['NestedSampleWriter', 'FlatSubsampleWriter', 'FlatSampleWriter']\nKeyType = Union[str, int]\n\n\nclass Columns:\n    \"\"\"Common access patterns and initialization/removal of columns in a checkout.\n\n    This object is the entry point to all data stored in their\n    individual columns. Each column contains a common schema which dictates\n    the general shape, dtype, and access patters which the backends optimize\n    access for. The methods contained within allow us to create, remove, query,\n    and access these collections of common data pieces.\n    \"\"\"\n\n    def __init__(self,\n                 mode: str,\n                 repo_pth: Path,\n                 columns: Dict[str, ModifierTypes],\n                 hashenv: Optional[lmdb.Environment] = None,\n                 dataenv: Optional[lmdb.Environment] = None,\n                 stagehashenv: Optional[lmdb.Environment] = None,\n                 txnctx: Optional[ColumnTxn] = None):\n        \"\"\"Developer documentation for init method.\n\n        .. warning::\n\n            This class should not be instantiated directly. Instead use the factory\n            functions :py:meth:`_from_commit` or :py:meth:`_from_staging` to return\n            a pre-initialized class instance appropriately constructed for either a\n            read-only or write-enabled checkout.\n\n        Parameters\n        ----------\n        mode : str\n            one of 'r' or 'a' to indicate read or write mode\n        repo_pth : Path\n            path to the repository on disk\n        columns : Mapping[str, Union[ArraysetDataReader, ArraysetDataWriter]]\n            dictionary of ArraysetData objects\n        hashenv : Optional[lmdb.Environment]\n            environment handle for hash records\n        dataenv : Optional[lmdb.Environment]\n            environment handle for the unpacked records. 
`data` is meant to refer to\n            the fact that the stageenv is passed in for write-enabled, and a\n            cmtrefenv for read-only checkouts.\n        stagehashenv : Optional[lmdb.Environment]\n            environment handle for newly added staged data hash records.\n        txnctx : Optional[ColumnTxn]\n            class implementing context managers to handle lmdb transactions\n        \"\"\"\n        self._stack: Optional[ExitStack] = None\n        self._is_conman_counter = 0\n        self._mode = mode\n        self._repo_pth = repo_pth\n        self._columns = columns\n\n        self._hashenv = hashenv\n        self._dataenv = dataenv\n        self._stagehashenv = stagehashenv\n        self._txnctx = txnctx\n\n    def _open(self):\n        for v in self._columns.values():\n            v._open()\n\n    def _close(self):\n        for v in self._columns.values():\n            v._close()\n\n    def _destruct(self):\n        if isinstance(self._stack, ExitStack):\n            self._stack.close()\n        self._close()\n        for column in self._columns.values():\n            column._destruct()\n        for attr in list(self.__dict__.keys()):\n            delattr(self, attr)\n\n    def __getattr__(self, name):\n        \"\"\"Raise permission error after checkout is closed.\n\n         Only runs after a call to :meth:`_destruct`, which is responsible\n         for deleting all attributes from the object instance.\n        \"\"\"\n        try:\n            self.__getattribute__('_mode')  # once checkout is closed, this won't exist.\n        except AttributeError:\n            err = (f'Unable to operate on past checkout objects which have been '\n                   f'closed. No operation occurred. Please use a new checkout.')\n            raise PermissionError(err) from None\n        return self.__getattribute__(name)\n\n# ------------- Methods Available To Both Read & Write Checkouts ------------------\n\n    def _repr_pretty_(self, p, cycle):\n        res = f'Hangar {self.__class__.__qualname__}\\\n                \\n    Writeable         : {False if self._mode == \"r\" else True}\\\n                \\n    Number of Columns : {len(self)}\\\n                \\n    Column Names / Partial Remote References:\\\n                \\n      - ' + '\\n      - '.join(\n            f'{asetn} / {aset.contains_remote_references}'\n            for asetn, aset in self._columns.items())\n        p.text(res)\n\n    def __repr__(self):\n        res = f'{self.__class__}('\\\n              f'repo_pth={self._repo_pth}, '\\\n              f'columns={self._columns}, '\\\n              f'mode={self._mode})'\n        return res\n\n    def _ipython_key_completions_(self):\n        \"\"\"Let ipython know that any key based access can use the column keys\n\n        Since we don't want to inherit from dict, nor mess with `__dir__` for\n        the sanity of developers, this is the best way to ensure users can\n        autocomplete keys.\n\n        Returns\n        -------\n        list\n            list of strings, each being one of the column keys for access.\n        \"\"\"\n        return self.keys()\n\n    def __getitem__(self, key: str) -> ModifierTypes:\n        \"\"\"Dict style access to return the column object with specified key/name.\n\n        Parameters\n        ----------\n        key : string\n            name of the column object to get.\n\n        Returns\n        -------\n        ModifierTypes\n            The object which is returned depends on the mode of checkout\n            specified. If the column was checked out write-enabled, a writer\n            object is returned; otherwise a read-only object is returned.\n
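\n        Example (assumes ``co`` is an open checkout; the column name shown is\n        illustrative)::\n\n            >>> train_col = co.columns['train_images']\n        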
\"\"\"\n        try:\n            return self._columns[key]\n        except KeyError:\n            raise KeyError(f'No column exists with name: {key}')\n\n    def __contains__(self, key: str) -> bool:\n        \"\"\"Determine if a column with a particular name is stored in the checkout\n\n        Parameters\n        ----------\n        key : str\n            name of the column to check for\n\n        Returns\n        -------\n        bool\n            True if a column with the provided name exists in the checkout,\n            otherwise False.\n        \"\"\"\n        return key in self._columns\n\n    def __len__(self) -> int:\n        \"\"\"Get the number of columns contained in the checkout.\n        \"\"\"\n        return len(self._columns)\n\n    def __iter__(self) -> Iterable[str]:\n        return iter(self._columns)\n\n    @property\n    def _is_conman(self):\n        return bool(self._is_conman_counter)\n\n    def _any_is_conman(self) -> bool:\n        \"\"\"Determine if self or any contained column class is a context manager.\n\n        Returns\n        -------\n        bool\n            True if this instance or any contained column is currently open in\n            a context manager, otherwise False.\n        \"\"\"\n        res = any([self._is_conman, *[x._is_conman for x in self._columns.values()]])\n        return res\n\n    def __enter__(self):\n        with ExitStack() as stack:\n            for asetN in list(self._columns.keys()):\n                stack.enter_context(self._columns[asetN])\n            self._is_conman_counter += 1\n            self._stack = stack.pop_all()\n        return self\n\n    def __exit__(self, *exc):\n        self._is_conman_counter -= 1\n        self._stack.close()\n\n    @property\n    def iswriteable(self) -> bool:\n        \"\"\"Bool indicating if this column object is write-enabled. 
Read-only attribute.\n        \"\"\"\n        return False if self._mode == 'r' else True\n\n    @property\n    def contains_remote_references(self) -> Mapping[str, bool]:\n        \"\"\"Dict of bool indicating data reference locality in each column.\n\n        Returns\n        -------\n        Mapping[str, bool]\n            For each column name key, boolean value where False indicates all\n            samples in column exist locally, True if some reference remote\n            sources.\n        \"\"\"\n        res = {}\n        for asetn, aset in self._columns.items():\n            res[asetn] = aset.contains_remote_references\n        return res\n\n    @property\n    def remote_sample_keys(self) -> Mapping[str, Iterable[Union[int, str]]]:\n        \"\"\"Determine column sample names which reference remote sources.\n\n        Returns\n        -------\n        Mapping[str, Iterable[Union[int, str]]]\n            dict where keys are column names and values are iterables of\n            samples in the column containing remote references\n        \"\"\"\n        res = {}\n        for asetn, aset in self._columns.items():\n            res[asetn] = aset.remote_reference_keys\n        return res\n\n    def keys(self) -> List[str]:\n        \"\"\"List all column keys (names) in the checkout\n\n        Returns\n        -------\n        List[str]\n            list of column names\n        \"\"\"\n        return list(self._columns.keys())\n\n    def values(self) -> Iterable[ModifierTypes]:\n        \"\"\"Yield all column object instances in the checkout.\n\n        Yields\n        ------\n        Iterable[ModifierTypes]\n            Generator of ColumnData accessor objects (set to read or write mode\n            as appropriate)\n        \"\"\"\n        for asetN in list(self._columns.keys()):\n            asetObj = self._columns[asetN]\n            yield asetObj\n\n    def items(self) -> Iterable[Tuple[str, ModifierTypes]]:\n        \"\"\"Generator providing access to ``column_name``, column object pairs\n\n        Yields\n        ------\n        Iterable[Tuple[str, ModifierTypes]]\n            returns a two-tuple of all column names/object pairs in the checkout.\n        \"\"\"\n        for asetN in list(self._columns.keys()):\n            asetObj = self._columns[asetN]\n            yield (asetN, asetObj)\n\n    def get(self, name: str) -> ModifierTypes:\n        \"\"\"Returns a column access object.\n\n        This can be used in lieu of the dictionary style access.\n\n        Parameters\n        ----------\n        name : str\n            name of the column to return\n\n        Returns\n        -------\n        ModifierTypes\n            ColumnData accessor (set to read or write mode as appropriate) which\n            governs interaction with the data\n        \"\"\"\n        return self[name]\n\n    # -------------------- Writer-Enabled Methods Only ------------------------\n\n    @writer_checkout_only\n    def __delitem__(self, key: str) -> str:\n        \"\"\"Remove a column and all data records in a write-enabled process.\n\n        Parameters\n        ----------\n        key : str\n            Name of the column to remove from the repository. 
This will remove\n            all records from the staging area (though the actual data and all\n            records are still accessible) if they were previously committed.\n\n        Returns\n        -------\n        str\n            If successful, the name of the removed column.\n\n        Raises\n        ------\n        PermissionError\n            If any enclosed column is opened in a context manager.\n        \"\"\"\n        if self._any_is_conman():\n            raise PermissionError(\n                'Not allowed while any columns class is opened in a context manager')\n        return self.delete(key)\n\n    @writer_checkout_only\n    def delete(self, column: str) -> str:\n        \"\"\"Remove the column and all data contained within it.\n\n        Parameters\n        ----------\n        column : str\n            name of the column to remove\n\n        Returns\n        -------\n        str\n            name of the removed column\n\n        Raises\n        ------\n        PermissionError\n            If any enclosed column is opened in a context manager.\n        KeyError\n            If a column does not exist with the provided name\n        \"\"\"\n        if self._any_is_conman():\n            raise PermissionError(\n                'Not allowed while any columns class is opened in a context manager')\n\n        with ExitStack() as stack:\n            datatxn = TxnRegister().begin_writer_txn(self._dataenv)\n            stack.callback(TxnRegister().commit_writer_txn, self._dataenv)\n\n            if column not in self._columns:\n                e = KeyError(f'Cannot remove: {column}. Key does not exist.')\n                raise e from None\n\n            column_layout = self._columns[column].column_layout\n            columnSchemaKey = schema_db_key_from_column(column, layout=column_layout)\n            column_record = schema_column_record_from_db_key(columnSchemaKey)\n            startRangeKey = dynamic_layout_data_record_db_start_range_key(column_record)\n\n            self._columns[column]._close()\n            self._columns.__delitem__(column)\n            with datatxn.cursor() as cursor:\n                cursor.first()\n                recordsExist = cursor.set_range(startRangeKey)\n                while recordsExist:\n                    k = cursor.key()\n                    if k.startswith(startRangeKey):\n                        recordsExist = cursor.delete()\n                    else:\n                        recordsExist = False\n            datatxn.delete(columnSchemaKey)\n\n        return column\n\n    @classmethod\n    def _from_staging_area(cls, repo_pth, hashenv, stageenv, stagehashenv):\n        \"\"\"INTERNAL USE ONLY\n\n        Class method factory to checkout :class:`Columns` in write mode\n\n        Once you get here, we assume the write lock verification has\n        passed, and that write operations are safe to perform.\n\n        Parameters\n        ----------\n        repo_pth : Path\n            directory path to the hangar repository on disk\n        hashenv : lmdb.Environment\n            environment where tensor data hash records are open in write mode.\n        stageenv : lmdb.Environment\n            environment where staging records (dataenv) are opened in write mode.\n        stagehashenv : lmdb.Environment\n            environment where the staged hash records are stored in write mode\n\n        Returns\n        -------\n        :class:`~column.Columns`\n            Interface class with write-enabled attributes activated which contains\n       
     live column data accessors in `write` mode.\n        \"\"\"\n        columns = {}\n        txnctx = ColumnTxn(stageenv, hashenv, stagehashenv)\n        query = RecordQuery(stageenv)\n        stagedSchemaSpecs = query.schema_specs()\n\n        staged_col_schemas = {}\n        with txnctx.read() as r_txn:\n            # need to do some conversions here...\n            # ref record digest -> hash db key -> schema spec dict -> schema obj\n            for column_record, schema_digest_rec in stagedSchemaSpecs.items():\n                hashSchemaKey = schema_hash_db_key_from_digest(schema_digest_rec.digest)\n                hashSchemaVal = r_txn.hashTxn.get(hashSchemaKey)\n                schema_dict = schema_spec_from_db_val(hashSchemaVal)\n                schema = column_type_object_from_schema(schema_dict)\n                staged_col_schemas[column_record] = schema\n\n        for column_record, schema in staged_col_schemas.items():\n            if column_record.layout == 'nested':\n                column = generate_nested_column(\n                    txnctx=txnctx, column_name=column_record.column,\n                    path=repo_pth, schema=schema, mode='a')\n            else:\n                column = generate_flat_column(\n                    txnctx=txnctx, column_name=column_record.column,\n                    path=repo_pth, schema=schema, mode='a')\n            columns[column_record.column] = column\n\n        return cls(mode='a',\n                   repo_pth=repo_pth,\n                   columns=columns,\n                   hashenv=hashenv,\n                   dataenv=stageenv,\n                   stagehashenv=stagehashenv,\n                   txnctx=txnctx)\n\n    @classmethod\n    def _from_commit(cls, repo_pth, hashenv, cmtrefenv):\n        \"\"\"INTERNAL USE ONLY\n\n        Class method factory to checkout :class:`.Columns` in read-only mode\n\n        For read mode, no locks need to be verified, but construction should\n        occur through this interface only.\n\n        Parameters\n        ----------\n        repo_pth : Path\n            directory path to the hangar repository on disk\n        hashenv : lmdb.Environment\n            environment where tensor data hash records are open in read-only mode.\n        cmtrefenv : lmdb.Environment\n            environment where staging checkout records are opened in read-only mode.\n\n        Returns\n        -------\n        :class:`~column.Columns`\n            Interface class with write-enabled attributes deactivated which\n            contains live column data accessors in `read-only` mode.\n        \"\"\"\n        columns = {}\n        txnctx = ColumnTxn(cmtrefenv, hashenv, None)\n        query = RecordQuery(cmtrefenv)\n        cmtSchemaSpecs = query.schema_specs()\n\n        cmt_col_schemas = {}\n        with txnctx.read() as r_txn:\n            # need to do some conversions here...\n            # ref record digest -> hash db key -> schema spec dict -> schema obj\n            for column_record, schema_digest_rec in cmtSchemaSpecs.items():\n                hashSchemaKey = schema_hash_db_key_from_digest(schema_digest_rec.digest)\n                hashSchemaVal = r_txn.hashTxn.get(hashSchemaKey)\n                schema_dict = schema_spec_from_db_val(hashSchemaVal)\n                schema = column_type_object_from_schema(schema_dict)\n                cmt_col_schemas[column_record] = schema\n\n        for column_record, schema in cmt_col_schemas.items():\n            if column_record.layout == 'nested':\n                column = 
generate_nested_column(\n                    txnctx=txnctx, column_name=column_record.column,\n                    path=repo_pth, schema=schema, mode='r')\n            else:\n                column = generate_flat_column(\n                    txnctx=txnctx, column_name=column_record.column,\n                    path=repo_pth, schema=schema, mode='r')\n            columns[column_record.column] = column\n\n        return cls(mode='r',\n                   repo_pth=repo_pth,\n                   columns=columns,\n                   hashenv=None,\n                   dataenv=None,\n                   stagehashenv=None,\n                   txnctx=None)\n"
  },
  {
    "path": "src/hangar/columns/common.py",
    "content": "from contextlib import contextmanager\nfrom typing import Optional\n\nimport lmdb\n\nfrom ..txnctx import TxnRegister\n\n\nclass ColumnTxn(object):\n    \"\"\"Provides context manager ready methods to handle lmdb transactions.\n\n    In order to prevent passing around lmdb.Environment objects, we instantiate\n    this class once for each column column and pass weakref proxy handels\n    around to reference this object. Calling open / close methods (or using the\n    ``with`` style methods) initializes transactions for the appropraite\n    environments which are stored in instance attributes for access by the\n    caller.\n    \"\"\"\n\n    __slots__ = ('stagehashenv', 'dataenv', 'hashenv', 'hashTxn',\n                 'dataTxn', 'stageHashTxn', '_TxnRegister', '__weakref__')\n\n    def __init__(self, dataenv, hashenv, stagehashenv):\n\n        self._TxnRegister = TxnRegister()\n        self.stagehashenv = stagehashenv\n        self.dataenv = dataenv\n        self.hashenv = hashenv\n\n        self.hashTxn: Optional[lmdb.Transaction] = None\n        self.dataTxn: Optional[lmdb.Transaction] = None\n        self.stageHashTxn: Optional[lmdb.Transaction] = None\n\n    @property\n    def _debug_(self):  # pragma: no cover\n        return {\n            f'__class__': self.__class__,\n            f'_TxnRegister': self._TxnRegister._debug_,\n            f'dataenv': self.dataenv,\n            f'hashenv': self.hashenv,\n            f'hashTxn': self.hashTxn,\n            f'dataTxn': self.dataTxn,\n            f'stageHashTxn': self.stageHashTxn,\n        }\n\n    def open_read(self):\n        \"\"\"Manually open read-only transactions, caller responsible for closing.\n        \"\"\"\n        self.hashTxn = self._TxnRegister.begin_reader_txn(self.hashenv)\n        self.dataTxn = self._TxnRegister.begin_reader_txn(self.dataenv)\n        return self\n\n    def close_read(self):\n        \"\"\"Manually close read-only transactions, must be called after manual open.\n        \"\"\"\n        self.hashTxn = self._TxnRegister.abort_reader_txn(self.hashenv)\n        self.dataTxn = self._TxnRegister.abort_reader_txn(self.dataenv)\n\n    def open_write(self):\n        \"\"\"Manually open write-enabled transactions, caller responsible for closing.\n        \"\"\"\n        self.hashTxn = self._TxnRegister.begin_writer_txn(self.hashenv)\n        self.dataTxn = self._TxnRegister.begin_writer_txn(self.dataenv)\n        self.stageHashTxn = self._TxnRegister.begin_writer_txn(self.stagehashenv)\n        return self\n\n    def close_write(self):\n        \"\"\"Manually close write-enabled transactions, must be called after manual open.\n        \"\"\"\n        self.hashTxn = self._TxnRegister.commit_writer_txn(self.hashenv)\n        self.dataTxn = self._TxnRegister.commit_writer_txn(self.dataenv)\n        self.stageHashTxn = self._TxnRegister.commit_writer_txn(self.stagehashenv)\n\n    @contextmanager\n    def read(self):\n        \"\"\"Use ``with`` style context manager to open read-only transaction.\n\n        Transaction is automatically closed for the caller irregardless of any\n        application exceptions.\n        \"\"\"\n        try:\n            yield self.open_read()\n        finally:\n            self.close_read()\n\n    @contextmanager\n    def write(self):\n        \"\"\"Use ``with`` style context manager to open write-enabled transaction.\n\n        Transaction is automatically closed for the caller irregardless of any\n        application exceptions.\n        \"\"\"\n        try:\n          
  yield self.open_write()\n        finally:\n            self.close_write()\n\n\ndef open_file_handles(backends, path, mode, schema, *, remote_operation=False):\n    \"\"\"Open backend accessor file handles for reading\n\n    Parameters\n    ----------\n    backends : Set[str]\n        if ``mode == 'r'`` then this should be the backend format\n        codes used in the column. if ``mode == 'a'``, then this should be a\n        list of the allowed backend format codes this schema can feasibly\n        write to.\n    path : Path\n        path to the hangar repository on disk\n    mode : str\n        one of ['r', 'a'] indicating read or write mode to open backends in.\n    schema : ColumnDefinitionTypes\n        schema spec so required values can be filled in to backend openers.\n\n    Returns\n    -------\n    AccessorMapType\n        dict mapping backend format codes to initialized instances of each\n        read-only backend.\n    \"\"\"\n    from ..backends import BACKEND_ACCESSOR_MAP\n\n    fhandles = {}\n    for be, accessor in BACKEND_ACCESSOR_MAP.items():\n        if be in backends:\n            if accessor is None:\n                continue\n\n            init_requires = schema._beopts.init_requires\n            # TODO rework names for this hack\n            kwargs = {}\n            for arg in init_requires:\n                if arg == 'repo_path':\n                    kwargs[arg] = path\n                elif arg == 'schema_shape':\n                    kwargs[arg] = schema.shape\n                elif arg == 'schema_dtype':\n                    kwargs[arg] = schema.dtype\n\n            fhandles[be] = accessor(**kwargs)\n            fhandles[be].open(mode=mode, remote_operation=remote_operation)\n\n    if mode == 'a':\n        if schema.backend in fhandles:\n            fhandles[schema.backend].backend_opts = schema.backend_options\n    return fhandles\n"
  },
  {
    "path": "src/hangar/columns/constructors.py",
    "content": "\"\"\"Constructors for initializing FlatSampleReader and NestedSampleReader columns\n\"\"\"\nimport warnings\nfrom _weakref import proxy\nfrom collections import defaultdict\nfrom typing import Union\n\nfrom wrapt import ObjectProxy\n\nfrom .common import open_file_handles\nfrom .layout_flat import FlatSampleReader, FlatSampleWriter\nfrom .layout_nested import (\n    FlatSubsampleReader, FlatSubsampleWriter,\n    NestedSampleReader, NestedSampleWriter,\n)\nfrom ..records.queries import RecordQuery\nfrom ..records import hash_data_db_key_from_raw_key\nfrom ..typesystem import (\n    NdarrayFixedShape,\n    NdarrayVariableShape,\n    StringVariableShape,\n    BytesVariableShape\n)\nfrom ..backends import BACKEND_IS_LOCAL_MAP, backend_decoder\n\n\n# --------------- methods common to all column layout types -------------------\n\n\nKeyType = Union[str, int]\n\n_column_definitions = (\n    NdarrayVariableShape,\n    NdarrayFixedShape,\n    StringVariableShape,\n    BytesVariableShape\n)\n\n\ndef column_type_object_from_schema(schema: dict):\n    for c in _column_definitions:\n        try:\n            instance = c(**schema)\n            return instance\n        except (TypeError, ValueError) as e:\n            pass\n    else:  # N.B. for-else loop (ie. \"no-break\")\n        raise ValueError(f'Could not instantiate column schema object for {schema}')\n\n\ndef _warn_remote(aset_name):\n    warnings.warn(\n        f'Column: {aset_name} contains `reference-only` samples, with '\n        f'actual data residing on a remote server. A `fetch-data` '\n        f'operation is required to access these samples.', UserWarning)\n\n\n# --------- FlatSampleReader constructor metaclass / setup methods ------------------\n\n\ndef _flat_load_sample_keys_and_specs(column_name, txnctx):\n    \"\"\"Load flat sample key / backend location mapping info memory.\n\n    Parameters\n    ----------\n    column_name: str\n        name of the column to load.\n    txnctx: ColumnTxn\n        transaction context object used to access commit ref info on disk\n\n    Returns\n    -------\n    Tuple[FlatSampleMapType, Set[str]]\n        First element is single level dictionary mapping sample key to backend\n        location. Second element is set of all unique backends encountered\n        for every data pice in the column.\n    \"\"\"\n    seen_bes = set()\n    sspecs = {}\n    with txnctx.read() as ctx:\n        hashTxn = ctx.hashTxn\n        asetNamesSpec = RecordQuery(ctx.dataenv).column_data_records(column_name)\n        for asetNames, dataSpec in asetNamesSpec:\n            hashKey = hash_data_db_key_from_raw_key(dataSpec.digest)\n            hash_ref = hashTxn.get(hashKey)\n            be_loc = backend_decoder(hash_ref)\n            sspecs[asetNames.sample] = be_loc\n    seen_bes.update((spc.backend for spc in sspecs.values()))\n    return (sspecs, seen_bes)\n\n\ndef generate_flat_column(txnctx, column_name, path, schema, mode):\n    \"\"\"Generate instance ready structures for read-only checkouts\n\n    Parameters\n    ----------\n    txnctx : ColumnTxn\n        transaction context object used to access commit ref info on disk\n    column_name : str\n        name of the column that the reader constructors are being\n        generated for\n    path : Path\n        path to the repository on disk\n    schema : ColumnDefinitionTypes\n        schema definition of the column.\n    mode: str\n        read-only or write-enabled mode. 
one of ['a', 'r'].\n\n    Returns\n    -------\n    :class:`~.flat.FlatSampleReader`\n        Top level column accessor classes fully initialized for requested\n        state. Initialized structures defining and initializing access to\n        the sample data on disk.\n    \"\"\"\n    sspecs, bes = _flat_load_sample_keys_and_specs(column_name, txnctx)\n    if not all([BACKEND_IS_LOCAL_MAP[be] for be in bes]):\n        _warn_remote(column_name)\n    if mode == 'a':\n        bes.add(schema.backend)\n    file_handles = open_file_handles(backends=bes, path=path, mode=mode, schema=schema)\n\n    if mode == 'r':\n        res = FlatSampleReader(columnname=column_name,\n                               samples=sspecs,\n                               backend_handles=file_handles,\n                               schema=schema,\n                               repo_path=path,\n                               mode=mode)\n    elif mode == 'a':\n        res = FlatSampleWriter(aset_ctx=txnctx,\n                               columnname=column_name,\n                               samples=sspecs,\n                               backend_handles=file_handles,\n                               schema=schema,\n                               repo_path=path,\n                               mode=mode)\n    else:\n        raise ValueError(f'mode {mode} is not valid.')\n\n    return res\n\n\n# --------- NestedSampleReader constructor metaclass / setup methods ----------------\n\n\ndef _nested_load_sample_keys_and_specs(column_name, txnctx):\n    \"\"\"Load nested sample/subsample keys and backend location into memory from disk.\n\n    Parameters\n    ----------\n    column_name : str\n        name of the column to load.\n    txnctx : ColumnTxn\n        transaction context object used to access commit ref info on disk\n\n    Returns\n    -------\n    Tuple[NestedSampleMapType, Set[str]]\n        First element is nested dictionary where each sample name maps to\n        subsample contents dict (associating subsample names with backend\n        locations). Second element is set of all unique backends encountered\n        for every data piece in the column.\n    \"\"\"\n    seen_bes = set()\n    sspecs = defaultdict(dict)\n    with txnctx.read() as ctx:\n        hashTxn = ctx.hashTxn\n        asetNamesSpec = RecordQuery(ctx.dataenv).column_data_records(column_name)\n        for asetNames, dataSpec in asetNamesSpec:\n            hashKey = hash_data_db_key_from_raw_key(dataSpec.digest)\n            hash_ref = hashTxn.get(hashKey)\n            be_loc = backend_decoder(hash_ref)\n            sspecs[asetNames.sample].update({asetNames.subsample: be_loc})\n            seen_bes.add(be_loc.backend)\n    return (sspecs, seen_bes)\n\n\ndef generate_nested_column(txnctx, column_name, path, schema, mode):\n    \"\"\"Generate instance ready structures for read-only or write-enabled checkouts\n\n    Parameters\n    ----------\n    txnctx : ColumnTxn\n        transaction context object used to access commit ref info on disk\n    column_name : str\n        name of the column that the reader constructors are being\n        generated for\n    path : Path\n        path to the repository on disk\n    schema : ColumnDefinitionTypes\n        schema definition of the column.\n    mode : str\n        read-only or write-enabled mode. one of ['a', 'r'].\n\n    Returns\n    -------\n    :class:`~.nested.NestedSampleReader`\n        Top level column accessor classes fully initialized for requested\n        state. 
Initialized structures defining and initializing access to\n        the subsample data on disk.\n    \"\"\"\n    specs, bes = _nested_load_sample_keys_and_specs(column_name, txnctx)\n    if not all([BACKEND_IS_LOCAL_MAP[be] for be in bes]):\n        _warn_remote(column_name)\n    if mode == 'a':\n        bes.add(schema.backend)\n    fhand = open_file_handles(backends=bes, path=path, mode=mode, schema=schema)\n    samples = {}\n    schema_proxy = proxy(schema)\n    fhand['enter_count'] = 0\n\n    if mode == 'r':\n        for samp, subspecs in specs.items():\n            samples[samp] = FlatSubsampleReader(\n                columnname=column_name,\n                samplen=samp,\n                be_handles=fhand,\n                specs=subspecs,\n                mode='r')\n        res = NestedSampleReader(\n            columnname=column_name,\n            samples=samples,\n            backend_handles=fhand,\n            repo_path=path,\n            mode='r',\n            schema=schema)\n    elif mode == 'a':\n        fhand = ObjectProxy(fhand)\n        fhand_proxy = proxy(fhand)\n        for samp, subspecs in specs.items():\n            samples[samp] = FlatSubsampleWriter(\n                schema=schema_proxy,\n                aset_ctx=proxy(txnctx),\n                repo_path=path,\n                columnname=column_name,\n                samplen=samp,\n                be_handles=fhand_proxy,\n                specs=subspecs,\n                mode='a')\n        res = NestedSampleWriter(\n            aset_ctx=txnctx,\n            columnname=column_name,\n            samples=samples,\n            backend_handles=fhand,\n            schema=schema,\n            repo_path=path,\n            mode='a')\n    else:\n        raise ValueError(f'mode {mode} is not valid.')\n\n    return res\n"
  },
  {
    "path": "src/hangar/columns/introspection.py",
    "content": "from .layout_flat import FlatSampleReader, FlatSampleWriter\nfrom .layout_nested import (\n    FlatSubsampleReader,\n    FlatSubsampleWriter,\n    NestedSampleReader,\n    NestedSampleWriter\n)\n\n\ndef is_column(obj) -> bool:\n    \"\"\"Determine if arbitrary input is an instance of a column layout.\n\n    Returns\n    -------\n    bool: True if input is an column, otherwise False.\n    \"\"\"\n    return isinstance(obj, (FlatSampleReader, FlatSubsampleReader, NestedSampleReader))\n\n\ndef is_writer_column(obj) -> bool:\n    \"\"\"Determine if arbitrary input is an instance of a write-enabled column layout.\n\n    Returns\n    -------\n    bool: True if input is write-enabled column, otherwise False.\n    \"\"\"\n    return isinstance(obj, (FlatSampleWriter, FlatSubsampleWriter, NestedSampleWriter))\n"
  },
  {
    "path": "src/hangar/columns/layout_flat.py",
    "content": "\"\"\"Accessor class for columns containing single-level key/value mappings\n\nThe FlatSampleReader container is used to store data (in any backend) in a column\ncontaining a single level key/value mapping from names/ids to data.\n\nAll backends are supported.\n\"\"\"\nfrom contextlib import ExitStack\nfrom pathlib import Path\nfrom operator import attrgetter as op_attrgetter\nfrom typing import Tuple, Union, Iterable, Optional, Any\n\nfrom .common import open_file_handles\nfrom ..records import (\n    data_record_db_val_from_digest,\n    data_record_digest_val_from_db_val,\n    flat_data_db_key_from_names,\n    hash_data_db_key_from_raw_key,\n    schema_db_key_from_column,\n    schema_hash_db_key_from_digest,\n    schema_hash_record_db_val_from_spec,\n    schema_record_db_val_from_digest\n)\nfrom ..records.parsing import generate_sample_name\nfrom ..backends import backend_decoder\nfrom ..op_state import reader_checkout_only\nfrom ..utils import is_suitable_user_key\nfrom ..optimized_utils import valfilter, valfilterfalse\n\n\nKeyType = Union[str, int]\n\n\nclass FlatSampleReader:\n    \"\"\"Class implementing get access to data in a column.\n\n    This class exposes the standard API to access data stored in a single level\n    key / value mapping column. Usage is modeled after the python :class:`dict`\n    style syntax -- with a few additional utility and inspection methods and\n    properties added. Methods named after those of a python :class:`dict` have\n    syntactically identical arguments and behavior to that of the standard\n    library.\n\n    If not opened in a ``write-enabled`` checkout, then attempts to add or\n    delete data or container properties will raise an exception (in the form of\n    a :class:`PermissionError`). No changes will be propogated unless a\n    ``write-enabled`` checkout is used.\n\n    This object can be serialized -- pickled -- for parallel processing /\n    reading if opened in a ``read-only`` checkout. Parallel operations are both\n    thread and process safe, though performance may significantly differ\n    between multithreaded vs multiprocessed code (depending on the backend data\n    is stored in). Attempts to serialize objects opened in ``write-enabled``\n    checkouts are not supported and will raise a :class:`PermissionError` if\n    attempted. 
This behavior is enforced in order to ensure data and record\n    integrity while writing to the repository.\n    \"\"\"\n\n    __slots__ = ('_mode', '_column_name', '_samples', '_be_fs',\n                 '_path', '_stack', '_enter_count', '_schema')\n    _attrs = __slots__\n\n    def __init__(self,\n                 columnname: str,\n                 samples,\n                 backend_handles,\n                 schema,\n                 repo_path: Path,\n                 mode: str,\n                 *args, **kwargs):\n\n        self._stack: Optional[ExitStack] = None\n        self._mode = mode\n        self._column_name = columnname\n        self._samples = samples\n        self._be_fs = backend_handles\n        self._path = repo_path\n        self._schema = schema\n        self._enter_count = 0\n\n    @property\n    def _debug_(self):  # pragma: no cover\n        return {\n            '__class__': self.__class__,\n            '_mode': self._mode,\n            '_column_name': self._column_name,\n            '_be_fs': self._be_fs,\n            '_path': self._path,\n            '_contains_subsamples': self.contains_subsamples,\n            '_stack': self._stack._exit_callbacks if self._stack else self._stack,\n            '_enter_count': self._enter_count,\n        }\n\n    def __repr__(self):\n        res = (\n            f'{self.__class__.__qualname__}('\n            f'repo_pth={self._path}, '\n            f'aset_name={self._column_name}, '\n            f\"{[f'{key}={val}, ' for key, val in self._schema.schema.items()]}, \"\n            f'mode={self._mode})')\n        return res\n\n    def _repr_pretty_(self, p, cycle):\n        res = f'Hangar {self.__class__.__qualname__} \\\n                \\n    Column Name              : {self._column_name}\\\n                \\n    Writeable                : {self.iswriteable}\\\n                \\n    Column Type              : {self.column_type}\\\n                \\n    Column Layout            : {self.column_layout}\\\n                \\n    Schema Type              : {self.schema_type}\\\n                \\n    DType                    : {self.dtype}\\\n                \\n    Shape                    : {self.shape}\\\n                \\n    Number of Samples        : {self.__len__()}\\\n                \\n    Partial Remote Data Refs : {bool(self.contains_remote_references)}\\n'\n        p.text(res)\n\n    def _ipython_key_completions_(self):  # pragma: no cover\n        \"\"\"Let ipython know that any key based access can use the column keys\n\n        Since we don't want to inherit from dict, nor mess with `__dir__` for\n        the sanity of developers, this is the best way to ensure users can\n        autocomplete keys.\n\n        Returns\n        -------\n        list\n            list of strings, each being one of the column keys for access.\n        \"\"\"\n        return list(self.keys())\n\n    @reader_checkout_only\n    def __getstate__(self) -> dict:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        return {slot: getattr(self, slot) for slot in self.__slots__}\n\n    def __setstate__(self, state: dict) -> None:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n\n        Technically should be decorated with @reader_checkout_only, but since\n        at instance creation that is not an attribute, the decorator won't\n        know. 
Since only readers can be pickled, this isn't much of an issue.\n        \"\"\"\n        for slot, value in state.items():\n            setattr(self, slot, value)\n\n    def __enter__(self):\n        return self\n\n    def __exit__(self, *exc):\n        return\n\n    def _destruct(self):\n        if isinstance(self._stack, ExitStack):\n            self._stack.close()\n        self._close()\n        for attr in self._attrs:\n            delattr(self, attr)\n\n    def __getattr__(self, name):\n        \"\"\"Raise permission error after checkout is closed.\n\n         Only runs after a call to :meth:`_destruct`, which is responsible for\n         deleting all attributes from the object instance.\n        \"\"\"\n        try:\n            self.__getattribute__('_mode')  # once checkout is closed, this won't exist.\n        except AttributeError:\n            err = (f'Unable to operate on past checkout objects which have been '\n                   f'closed. No operation occurred. Please use a new checkout.')\n            raise PermissionError(err) from None\n        return self.__getattribute__(name)\n\n    @property\n    def _is_conman(self) -> bool:\n        return bool(self._enter_count)\n\n    def __iter__(self) -> Iterable[KeyType]:\n        \"\"\"Create iterator yielding the column sample keys.\n\n        Yields\n        ------\n        Iterable[KeyType]\n            Sample key contained in the column.\n        \"\"\"\n        yield from self.keys()\n\n    def __len__(self) -> int:\n        \"\"\"Check how many samples are present in a given column.\n        \"\"\"\n        return len(self._samples)\n\n    def __contains__(self, key: KeyType) -> bool:\n        \"\"\"Determine if a key is a valid sample name in the column.\n        \"\"\"\n        return key in self._samples\n\n    def _open(self):\n        for val in self._be_fs.values():\n            val.open(mode=self._mode)\n\n    def _close(self):\n        for val in self._be_fs.values():\n            val.close()\n\n    def __getitem__(self, key: KeyType):\n        \"\"\"Retrieve data for some sample key via dict style access conventions.\n\n        .. seealso:: :meth:`get`\n\n        Parameters\n        ----------\n        key : KeyType\n            Sample key to retrieve from the column.\n\n        Returns\n        -------\n        value\n            Data corresponding to the provided sample key.\n\n        Raises\n        ------\n        KeyError\n            if no sample with the requested key exists.\n        \"\"\"\n        spec = self._samples[key]\n        return self._be_fs[spec.backend].read_data(spec)\n\n    def get(self, key: KeyType, default=None):\n        \"\"\"Retrieve the data associated with some sample key\n\n        Parameters\n        ----------\n        key : KeyType\n            The name of the sample to retrieve. Passing a single\n            sample key will return the stored data value.\n        default : Any\n            if a `key` parameter is not found, then return this value instead.\n            By default, None.\n\n        Returns\n        -------\n        value\n            data stored under the sample key if it exists, else the\n            default value if not found.\n
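\n        Examples\n        --------\n        Illustrative sketch only -- ``col`` stands for a hypothetical flat\n        column accessor containing a sample named ``'foo'`` (neither name is\n        part of this API):\n\n        >>> data = col.get('foo')\n        >>> col.get('nonexistent-key') is None\n        True\n        >>> col.get('nonexistent-key', default=42)\n        42\n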
        \"\"\"\n        try:\n            return self[key]\n        except KeyError:\n            return default\n\n    @property\n    def column(self) -> str:\n        \"\"\"Name of the column.\n        \"\"\"\n        return self._column_name\n\n    @property\n    def column_type(self):\n        \"\"\"Data container type of the column ('ndarray', 'str', etc).\n        \"\"\"\n        return self._schema.column_type\n\n    @property\n    def column_layout(self):\n        \"\"\"Column layout type ('nested', 'flat', etc).\n        \"\"\"\n        return self._schema.column_layout\n\n    @property\n    def schema_type(self):\n        \"\"\"Schema type of the contained data ('variable_shape', 'fixed_shape', etc).\n        \"\"\"\n        return self._schema.schema_type\n\n    @property\n    def dtype(self):\n        \"\"\"Dtype of the column's data (np.float, str, etc).\n        \"\"\"\n        return self._schema.dtype\n\n    @property\n    def shape(self):\n        \"\"\"(Max) shape of data that can (is) written in the column.\n        \"\"\"\n        try:\n            return self._schema.shape\n        except AttributeError:\n            return None\n\n    @property\n    def backend(self) -> str:\n        \"\"\"Code indicating which backing store is used when writing data.\n        \"\"\"\n        return self._schema.backend\n\n    @property\n    def backend_options(self):\n        \"\"\"Filter / Compression options applied to backend when writing data.\n        \"\"\"\n        return self._schema.backend_options\n\n    @property\n    def iswriteable(self) -> bool:\n        \"\"\"Bool indicating if this column object is write-enabled.\n        \"\"\"\n        return False if self._mode == 'r' else True\n\n    @property\n    def contains_subsamples(self) -> bool:\n        \"\"\"Bool indicating if sub-samples are contained in this column container.\n        \"\"\"\n        return False\n\n    @property\n    def contains_remote_references(self) -> bool:\n        \"\"\"Bool indicating if any samples in the column reference remote data.\n\n        The data associated with samples referencing some remote server will\n        need to be downloaded (``fetched`` in the hangar vocabulary) before\n        they can be read into memory.\n\n        Returns\n        -------\n        bool\n            True if at least one sample in the column references data stored\n            on some remote server. 
False if all sample data is available on the\n            machine's local disk.\n        \"\"\"\n        _islocal_func = op_attrgetter('islocal')\n        return not all(map(_islocal_func, self._samples.values()))\n\n    @property\n    def remote_reference_keys(self) -> Tuple[KeyType]:\n        \"\"\"Compute sample names whose data is stored in a remote server reference.\n\n        Returns\n        -------\n        Tuple[KeyType]\n            list of sample keys in the column whose data references indicate\n            they are stored on a remote server.\n        \"\"\"\n        _islocal_func = op_attrgetter('islocal')\n        return tuple(valfilterfalse(_islocal_func, self._samples).keys())\n\n    def _mode_local_aware_key_looper(self, local: bool) -> Iterable[KeyType]:\n        \"\"\"Generate keys for iteration with dict update safety ensured.\n\n        Parameters\n        ----------\n        local : bool\n            True if keys should be returned which only exist on the local machine.\n            False if keys referencing remote data should be included as well.\n\n        Returns\n        -------\n        Iterable[KeyType]\n            Sample keys conforming to the `local` argument spec.\n        \"\"\"\n        _islocal_func = op_attrgetter('islocal')\n        if local:\n            if self._mode == 'r':\n                yield from valfilter(_islocal_func, self._samples).keys()\n            else:\n                yield from tuple(valfilter(_islocal_func, self._samples).keys())\n        else:\n            if self._mode == 'r':\n                yield from self._samples.keys()\n            else:\n                yield from tuple(self._samples.keys())\n\n    def keys(self, local: bool = False) -> Iterable[KeyType]:\n        \"\"\"Generator yielding the name (key) of every sample.\n\n        Parameters\n        ----------\n        local : bool, optional\n            If True, returned keys will only correspond to data which is\n            available for reading on the local disk, by default False.\n\n        Yields\n        ------\n        Iterable[KeyType]\n            Keys of one sample at a time inside the column.\n        \"\"\"\n        yield from self._mode_local_aware_key_looper(local)\n\n    def values(self, local: bool = False) -> Iterable[Any]:\n        \"\"\"Generator yielding the data for every sample.\n\n        Parameters\n        ----------\n        local : bool, optional\n            If True, returned values will only correspond to data which is\n            available for reading on the local disk. 
No attempt will be made to\n            read data existing on a remote server, by default False.\n\n        Yields\n        ------\n        Iterable[Any]\n            Values of one sample at a time inside the column.\n        \"\"\"\n        for key in self._mode_local_aware_key_looper(local):\n            yield self[key]\n\n    def items(self, local: bool = False) -> Iterable[Tuple[KeyType, Any]]:\n        \"\"\"Generator yielding (name, data) tuple for every sample.\n\n        Parameters\n        ----------\n        local : bool, optional\n            If True, returned keys/values will only correspond to data which is\n            available for reading on the local disk. No attempt will be made to\n            read data existing on a remote server, by default False.\n\n        Yields\n        ------\n        Iterable[Tuple[KeyType, Any]]\n            Name and stored value for every sample inside the column.\n        \"\"\"\n        for key in self._mode_local_aware_key_looper(local):\n            yield (key, self[key])\n\n# ---------------- writer methods only after this point -------------------\n\n\nclass FlatSampleWriter(FlatSampleReader):\n\n    __slots__ = ('_txnctx',)\n    _attrs = __slots__ + FlatSampleReader.__slots__\n\n    def __init__(self, aset_ctx, *args, **kwargs):\n        super().__init__(*args, **kwargs)\n        self._txnctx = aset_ctx\n\n    def __enter__(self):\n        with ExitStack() as stack:\n            self._txnctx.open_write()\n            stack.callback(self._txnctx.close_write)\n            if self._enter_count == 0:\n                for k in self._be_fs.keys():\n                    stack.enter_context(self._be_fs[k])\n            self._enter_count += 1\n            self._stack = stack.pop_all()\n        return self\n\n    def __exit__(self, *exc):\n        self._stack.close()\n        self._enter_count -= 1\n\n    def _set_arg_validate(self, key, value):\n        \"\"\"Verify if key / value pair is valid to be written in this column\n\n        Parameters\n        ----------\n        key\n            name to associate with this data piece\n        value\n            piece of data to store in the column\n\n        Raises\n        ------\n        ValueError\n            If key is not valid type/contents or if value is not correct object\n            type / if it does not conform to column schema\n        \"\"\"\n        if not is_suitable_user_key(key):\n            raise ValueError(f'Sample name `{key}` is not suitable.')\n\n        isCompat = self._schema.verify_data_compatible(value)\n        if not isCompat.compatible:\n            raise ValueError(isCompat.reason)\n\n    def _perform_set(self, key, value):\n        \"\"\"Internal write method. 
Assumes all arguments validated and context is open\n\n        Parameters\n        ----------\n        key\n            sample key to store\n        value\n            data to store\n        \"\"\"\n        full_hash = self._schema.data_hash_digest(value)\n\n        hashKey = hash_data_db_key_from_raw_key(full_hash)\n        # check if data record already exists with given key\n        dataRecKey = flat_data_db_key_from_names(self._column_name, key)\n        existingDataRecVal = self._txnctx.dataTxn.get(dataRecKey, default=False)\n        if existingDataRecVal:\n            # check if data record already with same key & hash value\n            existingDataRec = data_record_digest_val_from_db_val(existingDataRecVal)\n            if full_hash == existingDataRec.digest:\n                return\n\n        # write new data if data hash does not exist\n        existingHashVal = self._txnctx.hashTxn.get(hashKey, default=False)\n        if existingHashVal is False:\n            hashVal = self._be_fs[self._schema.backend].write_data(value)\n            self._txnctx.hashTxn.put(hashKey, hashVal)\n            self._txnctx.stageHashTxn.put(hashKey, hashVal)\n            hash_spec = backend_decoder(hashVal)\n        else:\n            hash_spec = backend_decoder(existingHashVal)\n            if hash_spec.backend not in self._be_fs:\n                # when adding data which is already stored in the repository, the\n                # backing store for the existing data location spec may not be the\n                # same as the backend which the data piece would have been saved in here.\n                #\n                # As only the backends actually referenced by a columns samples are\n                # initialized (accessible by the column), there is no guarantee that\n                # an accessor exists for such a sample. In order to prevent internal\n                # errors from occurring due to an uninitialized backend if a previously\n                # existing data piece is \"saved\" here and subsequently read back from\n                # the same writer checkout, we perform an existence check and backend\n                # initialization, if appropriate.\n                fh = open_file_handles(backends=(hash_spec.backend,),\n                                       path=self._path,\n                                       mode='a',\n                                       schema=self._schema)\n                self._be_fs[hash_spec.backend] = fh[hash_spec.backend]\n\n        # add the record to the db\n        dataRecVal = data_record_db_val_from_digest(full_hash)\n        self._txnctx.dataTxn.put(dataRecKey, dataRecVal)\n        self._samples[key] = hash_spec\n\n    def __setitem__(self, key, value):\n        \"\"\"Store a piece of data in a column.\n\n        .. seealso::\n\n            :meth:`update` for an implementation analogous to python's built in\n            :meth:`dict.update` method which accepts a dict or iterable of\n            key/value pairs to add in the same operation.\n\n        Parameters\n        ----------\n        key\n            name to assign to the sample (assuming the column accepts named\n            samples), If str, can only contain alpha-numeric ascii characters\n            (in addition to '-', '.', '_'). Integer key must be >= 0. 
\n        value\n            data to store as a sample in the column.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n            self._set_arg_validate(key, value)\n            self._perform_set(key, value)\n\n    def append(self, value) -> KeyType:\n        \"\"\"Store some data in a sample with an automatically generated key.\n\n        This method should only be used if the context some piece of data is\n        used in is independent from its value (i.e. when reading data back,\n        there is no useful information which needs to be conveyed between the\n        data source's name/id and the value of that piece of information.)\n        Think carefully before going this route, as this posit does not apply\n        to many common use cases.\n\n        To store the data with a user defined key, use :meth:`update` or\n        :meth:`__setitem__`\n\n        Parameters\n        ----------\n        value\n            Piece of data to store in the column.\n\n        Returns\n        -------\n        KeyType\n            Name of the generated key this data is stored with.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n            key = generate_sample_name()\n            while key in self._samples:\n                key = generate_sample_name()\n            self._set_arg_validate(key, value)\n            self._perform_set(key, value)\n            return key\n\n    def update(self, other=None, **kwargs):\n        \"\"\"Store some data with the key/value pairs from other, overwriting existing keys.\n\n        :meth:`update` implements functionality similar to python's builtin\n        :meth:`dict.update` method, accepting either a dictionary or other\n        iterable (of length two) listing key / value pairs.\n\n        Parameters\n        ----------\n        other\n            Accepts either another dictionary object or an iterable of\n            key/value pairs (as tuples or other iterables of length two)\n            mapping sample names to data value instances. If a sample\n            name is string type, it can only contain alpha-numeric ascii\n            characters (in addition to '-', '.', '_'). Int key must be >= 0. By\n            default, None.\n        **kwargs\n            keyword arguments provided will be saved with keywords as sample keys\n            (string type only) and values as np.array instances.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n\n            if other:\n                if not isinstance(other, dict):\n                    other = dict(other)\n                else:\n                    other = other.copy()\n            elif other is None:\n                other = {}\n            if kwargs:\n                # we have to merge kwargs dict with `other` before operating on\n                # either so all validation and writing occur atomically\n                other.update(kwargs)\n\n            for key, val in other.items():\n                self._set_arg_validate(key, val)\n            for key, val in other.items():\n                self._perform_set(key, val)\n\n    def __delitem__(self, key: KeyType) -> None:\n        \"\"\"Remove a sample from the column. Convenience method to :meth:`delete`.\n\n        .. 
seealso::\n\n            :meth:`pop` to return a value and then delete it in the same operation\n\n        Parameters\n        ----------\n        key : KeyType\n            Name of the sample to remove from the column.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n\n            if key not in self._samples:\n                raise KeyError(key)\n\n            dataKey = flat_data_db_key_from_names(self._column_name, key)\n            isRecordDeleted = self._txnctx.dataTxn.delete(dataKey)\n            if isRecordDeleted is False:\n                raise RuntimeError(\n                    f'Internal error. Not able to delete key {key} from staging '\n                    f'db even though existence passed in memory verification. '\n                    f'Please report this message in full to the hangar development team.',\n                    f'Specified key: <{type(key)} {key}>', f'Calculated dataKey: <{dataKey}>',\n                    f'isRecordDeleted: <{isRecordDeleted}>', f'DEBUG STRING: {self._debug_}')\n            del self._samples[key]\n\n    def pop(self, key: KeyType):\n        \"\"\"Retrieve some value for some key(s) and delete it in the same operation.\n\n        Parameters\n        ----------\n        key : KeyType\n            Sample key to remove\n\n        Returns\n        -------\n        value\n            Upon success, the value of the removed key.\n\n        Raises\n        ------\n        KeyError\n            If there is no sample with some key in the column.\n        \"\"\"\n        value = self[key]\n        del self[key]\n        return value\n\n    def change_backend(self, backend: str, backend_options: Optional[dict] = None):\n        \"\"\"Change the default backend and filters applied to future data writes.\n\n        .. warning::\n\n           This method is meant for advanced users only. Please refer to the\n           hangar backend codebase for information on accepted parameters and\n           options.\n\n        Parameters\n        ----------\n        backend : str\n            Backend format code to switch to.\n        backend_options : Optional[dict]\n            Backend option specification to use (if specified). 
If left to\n            default value of None, then default options for backend are\n            automatically used.\n\n        Raises\n        ------\n        RuntimeError\n            If this method was called while this column is invoked in a\n            context manager\n        ValueError\n            If the backend format code is not valid.\n        \"\"\"\n        if self._is_conman:\n            raise RuntimeError('Cannot call method inside column context manager.')\n\n        self._schema.change_backend(backend, backend_options=backend_options)\n\n        new_schema_digest = self._schema.schema_hash_digest()\n        columnSchemaKey = schema_db_key_from_column(self._column_name, layout=self.column_layout)\n        columnSchemaVal = schema_record_db_val_from_digest(new_schema_digest)\n        hashSchemaKey = schema_hash_db_key_from_digest(new_schema_digest)\n        hashSchemaVal = schema_hash_record_db_val_from_spec(self._schema.schema)\n\n        # -------- set vals in lmdb only after schema is sure to exist --------\n\n        with self._txnctx.write() as ctx:\n            ctx.dataTxn.put(columnSchemaKey, columnSchemaVal)\n            ctx.hashTxn.put(hashSchemaKey, hashSchemaVal, overwrite=False)\n\n        new_backend = self._schema.backend\n        if new_backend not in self._be_fs:\n            fhands = open_file_handles(\n                backends=[new_backend],\n                path=self._path,\n                mode='a',\n                schema=self._schema)\n            self._be_fs[new_backend] = fhands[new_backend]\n        else:\n            self._be_fs[new_backend].close()\n        self._be_fs[new_backend].open(mode='a')\n        self._be_fs[new_backend].backend_opts = self._schema.backend_options\n        return\n"
  },
  {
    "path": "src/hangar/columns/layout_nested.py",
    "content": "\"\"\"Accessor column containing nested mapping of data under top level keys.\n\"\"\"\nfrom contextlib import ExitStack\nfrom pathlib import Path\nfrom typing import (\n    Tuple, Union, Dict, Iterable, Any, Optional\n)\nfrom operator import attrgetter as op_attrgetter\nfrom operator import getitem as op_getitem\nfrom weakref import proxy\nfrom functools import reduce\n\nfrom .common import open_file_handles\nfrom ..records import (\n    data_record_db_val_from_digest,\n    data_record_digest_val_from_db_val,\n    nested_data_db_key_from_names,\n    hash_data_db_key_from_raw_key,\n    schema_db_key_from_column,\n    schema_hash_db_key_from_digest,\n    schema_hash_record_db_val_from_spec,\n    schema_record_db_val_from_digest,\n)\nfrom ..records.parsing import generate_sample_name\nfrom ..backends import backend_decoder, BACKEND_ACCESSOR_MAP\nfrom ..op_state import reader_checkout_only\nfrom ..utils import is_suitable_user_key\nfrom ..optimized_utils import valfilter, valfilterfalse\n\n\nKeyType = Union[str, int]\nEllipsisType = type(Ellipsis)\nSubsampleGetKeysType = Union[KeyType, EllipsisType, slice]\nSampleGetKeysType = Union[KeyType, Tuple[KeyType, SubsampleGetKeysType]]\n\n\nclass FlatSubsampleReader(object):\n\n    __slots__ = ('_column_name', '_stack', '_be_fs',\n                 '_mode', '_subsamples', '_samplen')\n    _attrs = __slots__\n\n    def __init__(self,\n                 columnname: str,\n                 samplen: str,\n                 be_handles: BACKEND_ACCESSOR_MAP,\n                 specs,\n                 mode: str,\n                 *args, **kwargs):\n\n        self._column_name = columnname\n        self._samplen = samplen\n        self._be_fs = be_handles\n        self._subsamples = specs\n        self._mode = mode\n        self._stack: Optional[ExitStack] = None\n\n    @property\n    def _debug_(self):  # pragma: no cover\n        return {\n            '__class__': self.__class__,\n            '_column_name': self._column_name,\n            '_samplen': self._samplen,\n            '_be_fs': self._be_fs,\n            '_subsamples': self._subsamples,\n            '_mode': self._mode,\n            '_stack': self._stack._exit_callbacks if self._stack else self._stack,\n        }\n\n    def __repr__(self):\n        res = f'{self.__class__}('\\\n              f'column_name={self._column_name}, '\\\n              f'sample_name={self._samplen})'\n        return res\n\n    def _repr_pretty_(self, p, cycle):\n        res = f'Hangar {self.__class__.__name__} \\\n                \\n    Column Name          : {self._column_name}\\\n                \\n    Sample Name          : {self._samplen}\\\n                \\n    Writeable            : \"{self.iswriteable}\"\\\n                \\n    Number of Subsamples : {len(self)}\\n'\n        p.text(res)\n\n    def _ipython_key_completions_(self):\n        \"\"\"Let ipython know that any key based access can use the column keys\n\n        Since we don't want to inherit from dict, nor mess with `__dir__` for\n        the sanity of developers, this is the best way to ensure users can\n        autocomplete keys.\n\n        Returns\n        -------\n        list\n            list of strings, each being one of the column keys for access.\n        \"\"\"\n        return list(self.keys())\n\n    def __enter__(self):\n        self._enter_count += 1\n        return self\n\n    def __exit__(self, *exc):\n        self._enter_count -= 1\n\n    def _destruct(self):\n        if isinstance(self._stack, ExitStack):\n            
self._stack.close()\n        for attr in self._attrs:\n            delattr(self, attr)\n\n    def __getattr__(self, name):\n        \"\"\"Raise permission error after checkout is closed.\n\n         Only runs after a call to :meth:`_destruct`, which is responsible for\n         deleting all attributes from the object instance.\n        \"\"\"\n        try:\n            self.__getattribute__('_mode')  # once checkout is closed, this won't exist.\n        except AttributeError:\n            err = (f'Unable to operate on past checkout objects which have been '\n                   f'closed. No operation occurred. Please use a new checkout.')\n            raise PermissionError(err) from None\n        return self.__getattribute__(name)\n\n    @reader_checkout_only\n    def __getstate__(self) -> dict:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        return {slot: getattr(self, slot) for slot in self.__slots__}\n\n    def __setstate__(self, state: dict) -> None:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n\n        Technically should be decorated with @reader_checkout_only, but since\n        at instance creation that is not an attribute, the decorator won't\n        know. Since only readers can be pickled, this isn't much of an issue.\n        \"\"\"\n        for slot, value in state.items():\n            setattr(self, slot, value)\n\n    def __len__(self) -> int:\n        return len(self._subsamples)\n\n    def __contains__(self, key: KeyType) -> bool:\n        return key in self._subsamples\n\n    def __iter__(self) -> Iterable[KeyType]:\n        yield from self.keys()\n\n    def __getitem__(self, key: SubsampleGetKeysType) -> Union[Any, Dict[KeyType, Any]]:\n        \"\"\"Retrieve data for some subsample key via dict style access conventions.\n\n        .. seealso:: :meth:`get`\n\n        Parameters\n        ----------\n        key : SubsampleGetKeysType\n            Subsample key to retrieve from the sample. Alternatively, ``slice``\n            syntax can be used to retrieve a selection of subsample\n            keys/values. An empty slice (``: == slice(None)``) or ``Ellipsis``\n            (``...``) will return all subsample keys/values. Passing a\n            non-empty slice (``[1:5] == slice(1, 5)``) will select keys to\n            retrieve by enumerating all subsamples and retrieving the element\n            (key) for each step across the range. Note: order of enumeration is\n            not guaranteed; do not rely on any ordering observed when using\n            this method.\n\n        Returns\n        -------\n        Union[Any, Dict[KeyType, Any]]\n            Subsample data corresponding to the provided key, or a dictionary\n            of subsample keys/data if Ellipsis or a slice is passed as the key.\n\n        Raises\n        ------\n        KeyError\n            If no subsample with the requested key exists.\n
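\n        Examples\n        --------\n        Illustrative sketch only -- ``subsamples`` stands for a hypothetical\n        subsample accessor holding subsamples keyed ``0`` and ``1`` (the\n        names are not part of this API):\n\n        >>> value = subsamples[0]        # data of a single subsample\n        >>> mapping = subsamples[...]    # dict of every subsample key/value\n        >>> mapping = subsamples[:]      # empty slice also selects everything\n        >>> partial = subsamples[0:1]    # dict built from a slice of the keys\n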
        \"\"\"\n        # select subsample(s) with regular keys\n        if isinstance(key, (str, int)):\n            spec = self._subsamples[key]\n            return self._be_fs[spec.backend].read_data(spec)\n        # select all subsamples\n        elif key is Ellipsis:\n            res = {}\n            for subsample, spec in self._subsamples.items():\n                res[subsample] = self._be_fs[spec.backend].read_data(spec)\n            return res\n        # slice subsamples by enumeration order of keys\n        elif isinstance(key, slice):\n            res = {}\n            subsample_spec_slice = tuple(self._subsamples.items())[key]\n            for subsample, spec in subsample_spec_slice:\n                res[subsample] = self._be_fs[spec.backend].read_data(spec)\n            return res\n        else:\n            raise TypeError(f'key {key} type {type(key)} not valid.')\n\n    @property\n    def _enter_count(self):\n        return self._be_fs['enter_count']\n\n    @_enter_count.setter\n    def _enter_count(self, value):\n        self._be_fs['enter_count'] = value\n\n    @property\n    def _is_conman(self):\n        return bool(self._enter_count)\n\n    @property\n    def sample(self) -> KeyType:\n        \"\"\"Name of the sample this column subsamples are stored under.\n        \"\"\"\n        return self._samplen\n\n    @property\n    def column(self) -> str:\n        \"\"\"Name of the column.\n        \"\"\"\n        return self._column_name\n\n    @property\n    def iswriteable(self) -> bool:\n        \"\"\"Bool indicating if this column object is write-enabled.\n        \"\"\"\n        return False if self._mode == 'r' else True\n\n    @property\n    def data(self) -> Dict[KeyType, Any]:\n        \"\"\"Return dict mapping every subsample key / data value stored in the sample.\n\n        Returns\n        -------\n        Dict[KeyType, Any]\n            Dictionary mapping subsample name(s) (keys) to their stored values\n            as :class:`numpy.ndarray` instances.\n        \"\"\"\n        return self[...]\n\n    def _mode_local_aware_key_looper(self, local: bool) -> Iterable[KeyType]:\n        \"\"\"Generate keys for iteration with dict update safety ensured.\n\n        Parameters\n        ----------\n        local : bool\n            True if keys should be returned which only exist on the local\n            machine. 
False if keys referencing remote data should be included as well.\n\n        Returns\n        -------\n        Iterable[KeyType]\n            Sample keys conforming to the `local` argument spec.\n        \"\"\"\n        _islocal_func = op_attrgetter('islocal')\n        if local:\n            if self._mode == 'r':\n                yield from valfilter(_islocal_func, self._subsamples).keys()\n            else:\n                yield from tuple(valfilter(_islocal_func, self._subsamples).keys())\n        else:\n            if self._mode == 'r':\n                yield from self._subsamples.keys()\n            else:\n                yield from tuple(self._subsamples.keys())\n\n    @property\n    def contains_remote_references(self) -> bool:\n        \"\"\"Bool indicating if any subsamples in this sample reference remote data.\n\n        The data associated with subsamples referencing some remote server will\n        need to be downloaded (``fetched`` in the hangar vocabulary) before\n        they can be read into memory.\n\n        Returns\n        -------\n        bool\n            True if at least one subsample in the sample references data\n            stored on some remote server. False if all subsample data is\n            available on the machine's local disk.\n        \"\"\"\n        _islocal_func = op_attrgetter('islocal')\n        return not all(map(_islocal_func, self._subsamples.values()))\n\n    @property\n    def remote_reference_keys(self) -> Tuple[KeyType]:\n        \"\"\"Compute subsample names whose data is stored in a remote server reference.\n\n        Returns\n        -------\n        Tuple[KeyType]\n            list of subsample keys in the sample whose data references indicate\n            they are stored on a remote server.\n        \"\"\"\n        _islocal_func = op_attrgetter('islocal')\n        return tuple(valfilterfalse(_islocal_func, self._subsamples).keys())\n\n    def keys(self, local: bool = False) -> Iterable[KeyType]:\n        \"\"\"Generator yielding the name (key) of every subsample.\n\n        Parameters\n        ----------\n        local : bool, optional\n            If True, returned keys will only correspond to data which is\n            available for reading on the local disk, by default False.\n\n        Yields\n        ------\n        Iterable[KeyType]\n            Keys of one subsample at a time inside the sample.\n        \"\"\"\n        yield from self._mode_local_aware_key_looper(local)\n\n    def values(self, local: bool = False) -> Iterable[Any]:\n        \"\"\"Generator yielding the data for every subsample.\n\n        Parameters\n        ----------\n        local : bool, optional\n            If True, returned values will only correspond to data which is\n            available for reading on the local disk. 
No attempt will be made to\n            read data existing on a remote server, by default False.\n\n        Yields\n        ------\n        Iterable[Any]\n            Values of one subsample at a time inside the sample.\n        \"\"\"\n        for key in self._mode_local_aware_key_looper(local):\n            yield self[key]\n\n    def items(self, local: bool = False) -> Iterable[Tuple[KeyType, Any]]:\n        \"\"\"Generator yielding (name, data) tuple for every subsample.\n\n        Parameters\n        ----------\n        local : bool, optional\n            If True, returned keys/values will only correspond to data which is\n            available for reading on the local disk, No attempt will be made to\n            read data existing on a remote server, by default False.\n\n        Yields\n        ------\n        Iterable[Tuple[KeyType, Any]]\n            Name and stored value for every subsample inside the sample.\n        \"\"\"\n        for key in self._mode_local_aware_key_looper(local):\n            yield (key, self[key])\n\n    def get(self, key: KeyType, default=None):\n        \"\"\"Retrieve the data associated with some subsample key\n\n        Parameters\n        ----------\n        key : SubsampleGetKeysType\n            The name of the subsample(s) to retrieve. Passing a single\n            subsample key will return the stored :class:`numpy.ndarray`\n        default\n            if a `key` parameter is not found, then return this value instead.\n            By default, None.\n\n        Returns\n        -------\n        value\n            data stored under subsample key if key exists, else default\n            value if not found.\n        \"\"\"\n        try:\n            return self[key]\n        except KeyError:\n            return default\n\n\n# ---------------- writer methods only after this point -------------------\n\n\nclass FlatSubsampleWriter(FlatSubsampleReader):\n\n    __slots__ = ('_schema', '_txnctx', '_path')\n    _attrs = __slots__ + FlatSubsampleReader.__slots__\n\n    def __init__(self,\n                 schema,\n                 repo_path: Path,\n                 aset_ctx=None,\n                 *args, **kwargs):\n\n        super().__init__(*args, **kwargs)\n        self._path = repo_path\n        self._schema = schema\n        self._txnctx = aset_ctx\n\n    def __enter__(self):\n        with ExitStack() as stack:\n            self._txnctx.open_write()\n            stack.callback(self._txnctx.close_write)\n            if self._enter_count == 0:\n                for k in self._be_fs.keys():\n                    if k in ('enter_count', 'schema_spec'):\n                        continue\n                    stack.enter_context(self._be_fs[k])\n            self._enter_count += 1\n            self._stack = stack.pop_all()\n        return self\n\n    def __exit__(self, *exc):\n        self._stack.close()\n        self._enter_count -= 1\n        if self._enter_count == 0:\n            self._stack = None\n\n    def _set_arg_validate(self, key, value):\n        if not is_suitable_user_key(key):\n            raise ValueError(f'Sample name `{key}` is not suitable.')\n        isCompat = self._schema.verify_data_compatible(value)\n        if not isCompat.compatible:\n            raise ValueError(isCompat.reason)\n\n    def _perform_set(self, key, value):\n        \"\"\"Internal write method. 
Assumes all arguments validated and cm open.\n\n        Parameters\n        ----------\n        key\n            subsample key to store\n        value\n            data to store\n        \"\"\"\n        # full_hash = ndarray_hasher_tcode_0(value)\n        full_hash = self._schema.data_hash_digest(value)\n        hashKey = hash_data_db_key_from_raw_key(full_hash)\n\n        # check if data record already exists with given key\n        dataRecKey = nested_data_db_key_from_names(self._column_name, self._samplen, key)\n        existingDataRecVal = self._txnctx.dataTxn.get(dataRecKey, default=False)\n        if existingDataRecVal:\n            # check if data record already with same key & hash value\n            existingDataRec = data_record_digest_val_from_db_val(existingDataRecVal)\n            if full_hash == existingDataRec.digest:\n                return\n\n        # write new data if data hash does not exist\n        existingHashVal = self._txnctx.hashTxn.get(hashKey, default=False)\n        if existingHashVal is False:\n            backendCode = self._schema.backend\n            hashVal = self._be_fs[backendCode].write_data(value)\n            self._txnctx.hashTxn.put(hashKey, hashVal)\n            self._txnctx.stageHashTxn.put(hashKey, hashVal)\n            hash_spec = backend_decoder(hashVal)\n        else:\n            hash_spec = backend_decoder(existingHashVal)\n            if hash_spec.backend not in self._be_fs:\n                # when adding data which is already stored in the repository, the\n                # backing store for the existing data location spec may not be the\n                # same as the backend which the data piece would have been saved in here.\n                #\n                # As only the backends actually referenced by a columns samples are\n                # initialized (accessible by the column), there is no guarantee that\n                # an accessor exists for such a sample. In order to prevent internal\n                # errors from occurring due to an uninitialized backend if a previously\n                # existing data piece is \"saved\" here and subsequently read back from\n                # the same writer checkout, we perform an existence check and backend\n                # initialization, if appropriate.\n                fh = open_file_handles(backends=(hash_spec.backend,),\n                                       path=self._path,\n                                       mode='a',\n                                       schema=self._schema)\n                self._be_fs[hash_spec.backend] = fh[hash_spec.backend]\n\n        # add the record to the db\n        dataRecVal = data_record_db_val_from_digest(full_hash)\n        self._txnctx.dataTxn.put(dataRecKey, dataRecVal)\n        self._subsamples[key] = hash_spec\n\n    def __setitem__(self, key, value):\n        \"\"\"Store data as a subsample. Convenience method to :meth:`add`.\n\n        .. 
seealso::\n\n            :meth:`update` for an implementation analogous to python's built\n            in :meth:`dict.update` method which accepts a dict or iterable of\n            key/value pairs to add in the same operation.\n\n        Parameters\n        ----------\n        key\n            Key (name) of the subsample to add to the column.\n        value\n            Data to add as the subsample.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n            self._set_arg_validate(key, value)\n            self._perform_set(key, value)\n\n    def append(self, value) -> KeyType:\n        \"\"\"Store some data in a subsample with an automatically generated key.\n\n        This method should only be used if the context some piece of data is\n        used in is independent from its value (i.e. when reading data back,\n        there is no useful information which needs to be conveyed between the\n        data source's name/id and the value of that piece of information.)\n        Think carefully before going this route, as this posit does not apply\n        to many common use cases.\n\n        .. seealso::\n\n            In order to store the data with a user defined key, use\n            :meth:`update` or :meth:`__setitem__`\n\n        Parameters\n        ----------\n        value\n            Piece of data to store in the column.\n\n        Returns\n        -------\n        KeyType\n            Name of the generated key this data is stored with.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n            key = generate_sample_name()\n            while key in self._subsamples:\n                key = generate_sample_name()\n            self._set_arg_validate(key, value)\n            self._perform_set(key, value)\n            return key\n\n    def update(self, other=None, **kwargs):\n        \"\"\"Store data with the key/value pairs, overwriting existing keys.\n\n        :meth:`update` implements functionality similar to python's builtin\n        :meth:`dict.update` method, accepting either a dictionary or other\n        iterable (of length two) listing key / value pairs.\n\n        Parameters\n        ----------\n        other\n            Accepts either another dictionary object or an iterable of\n            key/value pairs (as tuples or other iterables of length two)\n            mapping subsample names to data values. If a subsample name is\n            string type, it can only contain alpha-numeric ascii characters\n            (in addition to '-', '.', '_'). Int key must be >= 0. By default, None.\n        **kwargs\n            keyword arguments provided will be saved with keywords as subsample\n            keys (string type only) and values as np.array instances.\n
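\n        Examples\n        --------\n        Sketch only -- ``subsamples``, ``arr1`` and ``arr2`` are hypothetical\n        names for a write-enabled subsample accessor and two schema-compatible\n        data pieces:\n\n        >>> subsamples.update({'left': arr1, 'right': arr2})\n        >>> subsamples.update([('center', arr1)])\n        >>> subsamples.update(background=arr2)\n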
        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n\n            if other:\n                if not isinstance(other, dict):\n                    other = dict(other)\n                else:\n                    other = other.copy()\n            elif other is None:\n                other = {}\n            if kwargs:\n                # we have to merge kwargs dict with `other` before operating on\n                # either so all validation and writing occur atomically\n                other.update(kwargs)\n\n            for key, val in other.items():\n                self._set_arg_validate(key, val)\n            for key, val in other.items():\n                self._perform_set(key, val)\n\n    def __delitem__(self, key: KeyType):\n        \"\"\"Remove a subsample from the sample.\n\n        .. seealso::\n\n            :meth:`pop` to simultaneously get a key's value and delete it.\n\n        Parameters\n        ----------\n        key : KeyType\n            Name of the subsample to remove from the sample.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n\n            if key not in self._subsamples:\n                raise KeyError(key)\n\n            dbKey = nested_data_db_key_from_names(self._column_name, self._samplen, key)\n            isRecordDeleted = self._txnctx.dataTxn.delete(dbKey)\n            if isRecordDeleted is False:\n                raise RuntimeError(\n                    f'Internal error. Not able to delete key {key} from staging '\n                    f'db even though existence passed in memory verification. 
'\n                    f'Please report this message in full to the hangar development team.',\n                    f'Specified key: <{type(key)} {key}>', f'Calculated dbKey: <{dbKey}>',\n                    f'isRecordDeleted: <{isRecordDeleted}>', f'DEBUG STRING: {self._debug_}')\n            del self._subsamples[key]\n\n    def pop(self, key: KeyType):\n        \"\"\"Retrieve some value for some key(s) and delete it in the same operation.\n\n        Parameters\n        ----------\n        key : KeysType\n            Sample key to remove\n\n        Returns\n        -------\n        value\n            Upon success, the value of the removed key.\n        \"\"\"\n        value = self[key]\n        del self[key]\n        return value\n\n\nclass NestedSampleReader:\n\n    __slots__ = ('_mode', '_column_name', '_samples',\n                 '_be_fs', '_path', '_stack', '_schema')\n    _attrs = __slots__\n\n    def __init__(self,\n                 columnname: str,\n                 samples: Dict[KeyType, FlatSubsampleReader],\n                 backend_handles: Dict[str, Any],\n                 repo_path: Path,\n                 mode: str,\n                 schema=None,\n                 *args, **kwargs):\n\n        self._mode = mode\n        self._column_name = columnname\n        self._samples = samples\n        self._be_fs = backend_handles\n        self._path = repo_path\n        self._stack: Optional[ExitStack] = None\n        self._schema = schema\n\n    def __repr__(self):\n        res = (\n            f'{self.__class__.__qualname__}('\n            f'repo_pth={self._path}, '\n            f'columnname={self._column_name}, '\n            f\"{[f'{key}={val}, ' for key, val in self._schema.schema.items()]}, \"\n            f'mode={self._mode})')\n        return res\n\n    def _repr_pretty_(self, p, cycle):\n        res = f'Hangar {self.__class__.__qualname__} \\\n                \\n    Column Name              : {self.column}\\\n                \\n    Writeable                : {self.iswriteable}\\\n                \\n    Column Type              : {self.column_type}\\\n                \\n    Column Layout            : {self.column_layout}\\\n                \\n    Schema Type              : {self.schema_type}\\\n                \\n    DType                    : {self.dtype}\\\n                \\n    Shape                    : {self.shape}\\\n                \\n    Number of Samples        : {len(self)}\\\n                \\n    Number of Subsamples     : {self.num_subsamples}\\\n                \\n    Partial Remote Data Refs : {bool(self.contains_remote_references)}\\n'\n        p.text(res)\n\n    def _ipython_key_completions_(self):\n        \"\"\"Let ipython know that any key based access can use the column keys\n\n        Since we don't want to inherit from dict, nor mess with `__dir__` for\n        the sanity of developers, this is the best way to ensure users can\n        autocomplete keys.\n\n        Returns\n        -------\n        list\n            list of strings, each being one of the column keys for access.\n        \"\"\"\n        return list(self.keys())\n\n    def __enter__(self):\n        self._enter_count += 1\n        return self\n\n    def __exit__(self, *exc):\n        self._enter_count -= 1\n\n    def _destruct(self):\n        if isinstance(self._stack, ExitStack):\n            self._stack.close()\n        self._close()\n        for sample in self._samples.values():\n            sample._destruct()\n        for attr in self._attrs:\n            delattr(self, attr)\n\n    
def __getattr__(self, name):\n        \"\"\"Raise permission error after checkout is closed.\n\n         Only runs after a call to :meth:`_destruct`, which is responsible\n         for deleting all attributes from the object instance.\n        \"\"\"\n        try:\n            self.__getattribute__('_mode')  # once checkout is closed, this won't exist.\n        except AttributeError:\n            err = (f'Unable to operate on past checkout objects which have been '\n                   f'closed. No operation occurred. Please use a new checkout.')\n            raise PermissionError(err) from None\n        return self.__getattribute__(name)\n\n    @reader_checkout_only\n    def __getstate__(self) -> dict:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n        \"\"\"\n        return {slot: getattr(self, slot) for slot in self.__slots__}\n\n    def __setstate__(self, state: dict) -> None:\n        \"\"\"ensure multiprocess operations can pickle relevant data.\n\n        Technically should be decorated with @reader_checkout_only, but since\n        at instance creation the '_mode' is not a set attribute, the decorator\n        won't know how to process. Since only readers can be pickled, this\n        isn't much of an issue.\n        \"\"\"\n        for slot, value in state.items():\n            setattr(self, slot, value)\n\n    def __getitem__(\n            self, key: SampleGetKeysType\n    ) -> Union[FlatSubsampleReader, Union[Any, Dict[KeyType, Any]]]:\n        \"\"\"Get the sample access class for some sample key.\n\n        Parameters\n        ----------\n        key\n            Name of sample to retrieve. Alternatively, a tuple or list of\n            ``(sample, subsample)`` keys may be passed to drill down and\n            retrieve data for a single subsample directly.\n\n        Returns\n        -------\n        Union[FlatSubsampleReader, Union[Any, Dict[KeyType, Any]]]\n            Sample accessor corresponding to the given key\n\n        Raises\n        ------\n        KeyError\n            If no sample with the provided key exists.\n        \"\"\"\n        if isinstance(key, (list, tuple)):\n            return reduce(op_getitem, key, self._samples)\n        else:\n            res = self._samples[key]\n        return res\n\n    def __iter__(self) -> Iterable[KeyType]:\n        \"\"\"Create iterator yielding the column sample keys.\n\n        Yields\n        ------\n        Iterable[KeyType]\n            Sample key contained in the column.\n        \"\"\"\n        yield from self.keys()\n\n    def __len__(self) -> int:\n        \"\"\"Find number of samples in the column\n        \"\"\"\n        return len(self._samples)\n\n    def __contains__(self, key: KeyType) -> bool:\n        \"\"\"Determine if some sample key exists in the column.\n        \"\"\"\n        return key in self._samples\n\n    def _open(self):\n        for val in self._be_fs.values():\n            try:\n                # since we are storing non backend accessor information in the\n                # be_fs weakref proxy for the purpose of memory savings, not\n                # all elements have an `open` method\n                val.open(mode=self._mode)\n            except AttributeError:\n                pass\n\n    def _close(self):\n        for val in self._be_fs.values():\n            # since we are storing non backend accessor information in the\n            # be_fs weakref proxy for the purpose of memory savings, not all\n            # elements have a `close` method\n            try:\n                val.close()\n            except AttributeError:\n                pass\n\n    @property\n    def _enter_count(self):\n        return 
self._be_fs['enter_count']\n\n    @_enter_count.setter\n    def _enter_count(self, value):\n        self._be_fs['enter_count'] = value\n\n    @property\n    def _is_conman(self):\n        return bool(self._enter_count)\n\n    @property\n    def column(self) -> str:\n        \"\"\"Name of the column.\n        \"\"\"\n        return self._column_name\n\n    @property\n    def column_type(self):\n        \"\"\"Data container type of the column ('ndarray', 'str', etc).\n        \"\"\"\n        return self._schema.column_type\n\n    @property\n    def column_layout(self):\n        \"\"\"Column layout type ('nested', 'flat', etc).\n        \"\"\"\n        return self._schema.column_layout\n\n    @property\n    def schema_type(self):\n        \"\"\"Schema type of the contained data ('variable_shape', 'fixed_shape', etc).\n        \"\"\"\n        return self._schema.schema_type\n\n    @property\n    def dtype(self):\n        \"\"\"Dtype of the column's data (np.float, str, etc).\n        \"\"\"\n        return self._schema.dtype\n\n    @property\n    def shape(self):\n        \"\"\"(Max) shape of data that can (is) written in the column.\n        \"\"\"\n        try:\n            return self._schema.shape\n        except AttributeError:\n            return None\n\n    @property\n    def backend(self) -> str:\n        \"\"\"Code indicating which backing store is used when writing data.\n        \"\"\"\n        return self._schema.backend\n\n    @property\n    def backend_options(self):\n        \"\"\"Filter / Compression options applied to backend when writing data.\n        \"\"\"\n        return self._schema.backend_options\n\n    @property\n    def iswriteable(self) -> bool:\n        \"\"\"Bool indicating if this column object is write-enabled.\n        \"\"\"\n        return False if self._mode == 'r' else True\n\n    def _mode_local_aware_key_looper(self, local: bool) -> Iterable[KeyType]:\n        \"\"\"Generate keys for iteration with dict update safety ensured.\n\n        Parameters\n        ----------\n        local\n            True if keys should be returned which only exist on the local\n            machine. False if keys referencing remote data should be included\n            as well.\n\n        Returns\n        -------\n        Iterable[KeyType]\n            Sample keys conforming to the `local` argument spec.\n        \"\"\"\n        _contains_remote_func = op_attrgetter('contains_remote_references')\n        if local:\n            if self._mode == 'r':\n                yield from valfilterfalse(_contains_remote_func, self._samples).keys()\n            else:\n                yield from tuple(valfilterfalse(_contains_remote_func, self._samples).keys())\n        else:\n            if self._mode == 'r':\n                yield from self._samples.keys()\n            else:\n                yield from tuple(self._samples.keys())\n\n    @property\n    def contains_remote_references(self) -> bool:\n        \"\"\"Bool indicating if any samples in the column contain remote references.\n\n        The data associated with subsamples referencing some remote server will\n        need to be downloaded (``fetched`` in the hangar vocabulary) before\n        they can be read into memory.\n\n        Returns\n        -------\n        bool\n            True if at least one subsample in the column references data\n            stored on some remote server. 
            True if at least one subsample in the column references data\n            stored on some remote server. False if all sample data is\n            available on the machine's local disk.\n        \"\"\"\n        _contains_remote_func = op_attrgetter('contains_remote_references')\n        return any(map(_contains_remote_func, self._samples.values()))\n\n    @property\n    def remote_reference_keys(self) -> Tuple[KeyType]:\n        \"\"\"Compute sample names containing subsample data stored as remote server references.\n\n        Returns\n        -------\n        Tuple[KeyType]\n            Tuple of sample keys in the column whose data references indicate\n            they are stored on a remote server.\n        \"\"\"\n        _remote_keys_func = op_attrgetter('remote_reference_keys')\n        return tuple(valfilter(_remote_keys_func, self._samples).keys())\n\n    @property\n    def contains_subsamples(self) -> bool:\n        \"\"\"Bool indicating if sub-samples are contained in this column container.\n        \"\"\"\n        return True\n\n    @property\n    def num_subsamples(self) -> int:\n        \"\"\"Calculate total number of subsamples existing in all samples in column\n        \"\"\"\n        total = 0\n        for sample in self._samples.values():\n            total += len(sample)\n        return total\n\n    def keys(self, local: bool = False) -> Iterable[KeyType]:\n        \"\"\"Generator yielding the name (key) of every sample.\n\n        Parameters\n        ----------\n        local : bool, optional\n            If True, returned keys will only correspond to data which is\n            available for reading on the local disk, by default False.\n\n        Yields\n        ------\n        Iterable[KeyType]\n            Keys of one sample at a time inside the column.\n        \"\"\"\n        yield from self._mode_local_aware_key_looper(local)\n
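\n    # Illustrative read patterns (column / sample names are placeholders):\n    #\n    #   for sample_name in nested_col.keys(local=True): ...\n    #   for sample_name, sample_accessor in nested_col.items(): ...\n    #   subsample_data = nested_col[('sample_1', 'subsample_a')]\n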
\n    def values(self, local: bool = False) -> Iterable[Any]:\n        \"\"\"Generator yielding the sample accessor for every sample.\n\n        Parameters\n        ----------\n        local : bool, optional\n            If True, returned values will only correspond to data which is\n            available for reading on the local disk. No attempt will be made to\n            read data existing on a remote server, by default False.\n\n        Yields\n        ------\n        Iterable[Any]\n            Accessor of one sample at a time inside the column.\n        \"\"\"\n        for key in self._mode_local_aware_key_looper(local):\n            yield self[key]\n\n    def items(self, local: bool = False) -> Iterable[Tuple[KeyType, Any]]:\n        \"\"\"Generator yielding (name, accessor) tuple for every sample.\n\n        Parameters\n        ----------\n        local : bool, optional\n            If True, returned keys/values will only correspond to data which is\n            available for reading on the local disk. No attempt will be made to\n            read data existing on a remote server, by default False.\n\n        Yields\n        ------\n        Iterable[Tuple[KeyType, Any]]\n            Name and accessor for every sample inside the column.\n        \"\"\"\n        for key in self._mode_local_aware_key_looper(local):\n            yield (key, self[key])\n\n    def get(\n            self, key: SampleGetKeysType, default: Any = None\n    ) -> Union[FlatSubsampleReader, Union[Any, Dict[KeyType, Any]]]:\n        \"\"\"Retrieve data for some sample key(s) in the column.\n\n        Parameters\n        ----------\n        key\n            The name of the sample(s) to retrieve\n        default\n            If the `key` parameter is not found, then return this value instead.\n            By default, None.\n\n        Returns\n        -------\n        Union[FlatSubsampleReader, Union[Any, Dict[KeyType, Any]]]\n            Sample accessor class given by name ``key`` which can be used to\n            access subsample data.\n        \"\"\"\n        try:\n            return self[key]\n        except KeyError:\n            return default\n\n\n# ---------------- writer methods only after this point -------------------\n\n\nclass NestedSampleWriter(NestedSampleReader):\n\n    __slots__ = ('_txnctx',)\n    _attrs = __slots__ + NestedSampleReader.__slots__\n\n    def __init__(self, aset_ctx=None, *args, **kwargs):\n\n        super().__init__(*args, **kwargs)\n        self._txnctx = aset_ctx\n\n    def __enter__(self):\n        with ExitStack() as stack:\n            self._txnctx.open_write()\n            stack.callback(self._txnctx.close_write)\n            if self._enter_count == 0:\n                for k in tuple(self._be_fs.keys()):\n                    if k in ('enter_count', 'schema_spec'):\n                        continue\n                    stack.enter_context(self._be_fs[k])\n            self._enter_count += 1\n            self._stack = stack.pop_all()\n        return self\n\n    def __exit__(self, *exc):\n        self._stack.close()\n        self._enter_count -= 1\n\n    def _set_arg_validate(self, sample_key, subsample_map):\n        if not is_suitable_user_key(sample_key):\n            raise ValueError(f'Sample name `{sample_key}` is not suitable.')\n\n        for subsample_key, subsample_val in subsample_map.items():\n            if not is_suitable_user_key(subsample_key):\n                raise ValueError(f'Subsample name `{subsample_key}` is not suitable.')\n            isCompat = self._schema.verify_data_compatible(subsample_val)\n            if not isCompat.compatible:\n                raise ValueError(isCompat.reason)\n
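\n    # If constructing a brand new sample accessor succeeds but writing its\n    # contents fails, `_perform_set` removes the partially created sample so\n    # no empty entry is left behind.\n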
\n    def _perform_set(self, key, value) -> None:\n        if key in self._samples:\n            self._samples[key].update(value)\n        else:\n            self._samples[key] = FlatSubsampleWriter(\n                schema=proxy(self._schema),\n                aset_ctx=proxy(self._txnctx),\n                repo_path=self._path,\n                columnname=self._column_name,\n                samplen=key,\n                be_handles=proxy(self._be_fs),\n                specs={},\n                mode='a')\n            try:\n                self._samples[key].update(value)\n            except Exception as e:\n                del self._samples[key]\n                raise e\n\n    def __setitem__(self, key, value) -> None:\n        \"\"\"Store a sample key / subsample data map, overwriting existing keys.\n\n        .. seealso::\n\n            :meth:`update` for alternative syntax for setting values.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n            value = dict(value)\n            self._set_arg_validate(key, value)\n            self._perform_set(key, value)\n\n    def update(self, other=None, **kwargs) -> None:\n        \"\"\"Store some data with the key/value pairs, overwriting existing keys.\n\n        :meth:`update` implements functionality similar to python's builtin\n        :meth:`dict.update` method, accepting either a dictionary or an\n        iterable of length-two sequences listing key / value pairs.\n\n        Parameters\n        ----------\n        other\n            Dictionary mapping sample names to subsample data maps, or a\n            sequence (list or tuple) where element one is the sample name and\n            element two is a subsample data map.\n        **kwargs\n            keyword arguments provided will be saved with keywords as sample\n            keys (string type only) and values as a mapping of subsample keys\n            to data values.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n\n            if isinstance(other, dict):\n                other = other.copy()\n            elif other:\n                other = dict(other)\n            else:\n                other = {}\n            if kwargs:\n                # we merge kwargs dict with `other` before operating on either\n                # so all necessary validation and writing occur atomically\n                other.update(kwargs)\n            for sample in tuple(other.keys()):\n                other[sample] = dict(other[sample])\n\n            for key, val in other.items():\n                self._set_arg_validate(key, val)\n            for key, val in other.items():\n                self._perform_set(key, val)\n
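\n    # Illustrative write patterns (assumes a write-enabled checkout; names and\n    # data values are placeholders):\n    #\n    #   nested_col['sample_1'] = {'subsample_a': data_a, 'subsample_b': data_b}\n    #   nested_col.update({'sample_2': {'subsample_a': other_data}})\n    #   nested_col.update(sample_3={'subsample_a': more_data})\n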
\n    def __delitem__(self, key: KeyType):\n        \"\"\"Remove a sample (including all contained subsamples) from the column.\n\n        .. seealso::\n\n            :meth:`pop` for an alternative implementing a simultaneous get\n            value and delete operation.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n\n            sample = self._samples[key]\n            subsample_keys = list(sample.keys())\n            for subkey in subsample_keys:\n                del sample[subkey]\n\n            self._samples[key]._destruct()\n            del self._samples[key]\n\n    def pop(self, key: KeyType) -> Dict[KeyType, Any]:\n        \"\"\"Retrieve the value for some sample key and delete it in the same operation.\n\n        Parameters\n        ----------\n        key : KeyType\n            sample key to remove\n\n        Returns\n        -------\n        Dict[KeyType, Any]\n            Upon success, a dictionary mapping subsample names to subsample\n            values for the sample key removed by this method.\n        \"\"\"\n        res = self._samples[key].data\n        del self[key]\n        return res\n\n    def change_backend(self, backend: str, backend_options: Optional[dict] = None):\n        \"\"\"Change the default backend and filters applied to future data writes.\n\n        .. warning::\n\n           This method is meant for advanced users only. Please refer to the\n           hangar backend codebase for information on accepted parameters and\n           options.\n\n        Parameters\n        ----------\n        backend : str\n            Backend format code to switch to.\n        backend_options\n            Backend option specification to use (if specified). If left as the\n            default value of None, then the default options for the backend\n            are automatically used.\n\n        Raises\n        ------\n        RuntimeError\n            If this method is called while the column is invoked as a\n            context manager\n        ValueError\n            If the backend format code is not valid.\n        \"\"\"\n        if self._is_conman:\n            raise RuntimeError('Cannot call method inside column context manager.')\n\n        self._schema.change_backend(backend, backend_options=backend_options)\n\n        new_schema_digest = self._schema.schema_hash_digest()\n        columnSchemaKey = schema_db_key_from_column(self._column_name, layout=self.column_layout)\n        columnSchemaVal = schema_record_db_val_from_digest(new_schema_digest)\n        hashSchemaKey = schema_hash_db_key_from_digest(new_schema_digest)\n        hashSchemaVal = schema_hash_record_db_val_from_spec(self._schema.schema)\n\n        # -------- set vals in lmdb only after schema is sure to exist --------\n\n        with self._txnctx.write() as ctx:\n            ctx.dataTxn.put(columnSchemaKey, columnSchemaVal)\n            ctx.hashTxn.put(hashSchemaKey, hashSchemaVal, overwrite=False)\n\n        new_backend = self._schema.backend\n        if new_backend not in self._be_fs:\n            fhands = open_file_handles(\n                backends=[new_backend],\n                path=self._path,\n                mode='a',\n                schema=self._schema)\n            self._be_fs[new_backend] = fhands[new_backend]\n        else:\n            self._be_fs[new_backend].close()\n        self._be_fs[new_backend].open(mode='a')\n        self._be_fs[new_backend].backend_opts = self._schema.backend_options\n        return\n"
  },
  {
    "path": "src/hangar/constants.py",
    "content": "from .utils import is_64bits, parse_bytes\n\n# parsing constants\n\nSEP_KEY = ':'\nSEP_LST = ' '\nSEP_CMT = ' << '\nSEP_SLC = \"*\"\nSEP_HSH = '$'\n\nCMT_KV_JOIN_KEY = SEP_LST.encode()\nCMT_DIGEST_JOIN_KEY = ''\nCMT_REC_JOIN_KEY = SEP_HSH.encode()\n\nK_INT = f'#'  # must be length 1 value\nK_BRANCH = f'branch{SEP_KEY}'\nK_HEAD = 'head'\nK_REMOTES = f'remote{SEP_KEY}'\nK_STGARR = f'a{SEP_KEY}'\nK_STGMETA = f'l{SEP_KEY}'\nK_SCHEMA = f's{SEP_KEY}'\nK_HASH = f'h{SEP_KEY}'\nK_WLOCK = f'writerlock{SEP_KEY}'\nK_VERSION = 'software_version'\n\nWLOCK_SENTINAL = 'LOCK_AVAILABLE'\n\n# directory names\n\nDIR_HANGAR = '.hangar'\nDIR_HANGAR_SERVER = '.hangar_server'\nDIR_DATA = 'data'\nDIR_DATA_STORE = 'store_data'\nDIR_DATA_STAGE = 'stage_data'\nDIR_DATA_REMOTE = 'remote_data'\n\n# configuration file names:\n\nCONFIG_USER_NAME = 'config_user.ini'\nCONFIG_SERVER_NAME = 'config_server.ini'\n\n# LMDB database names and settings.\n\n\nLMDB_SETTINGS = {\n    # lmdb cannot open map larger than 2GB on 32 bit machines.\n    'map_size': 50_000_000_000,\n    'meminit': False,\n    'subdir': False,\n    'lock': False,\n    'max_spare_txns': 10,\n}\n\nLMDB_REF_NAME = 'ref.lmdb'\nLMDB_HASH_NAME = 'hash.lmdb'\nLMDB_BRANCH_NAME = 'branch.lmdb'\nLMDB_STAGE_REF_NAME = 'stage_ref.lmdb'\nLMDB_STAGE_HASH_NAME = 'stage_hash.lmdb'\n\n# readme file\n\nREADME_FILE_NAME = 'README.txt'\n\n\n"
  },
  {
    "path": "src/hangar/context.py",
    "content": "import configparser\nimport os\nfrom pathlib import Path\nimport platform\nimport shutil\nimport tempfile\nimport warnings\nfrom typing import MutableMapping, Optional\n\nimport lmdb\n\nfrom . import __version__\nfrom .constants import (\n    CONFIG_USER_NAME,\n    DIR_DATA_REMOTE,\n    DIR_DATA_STAGE,\n    DIR_DATA_STORE,\n    DIR_DATA,\n    LMDB_BRANCH_NAME,\n    LMDB_HASH_NAME,\n    LMDB_REF_NAME,\n    LMDB_SETTINGS,\n    LMDB_STAGE_HASH_NAME,\n    LMDB_STAGE_REF_NAME,\n    README_FILE_NAME,\n)\nfrom .records.commiting import unpack_commit_ref\nfrom .records.heads import (\n    create_branch,\n    get_branch_head_commit,\n    get_staging_branch_head,\n    set_staging_branch_head,\n)\nfrom .records.parsing import repo_version_raw_spec_from_raw_string\nfrom .records.vcompat import (\n    is_repo_software_version_compatible,\n    set_repository_software_version,\n    startup_check_repo_version,\n)\nfrom .utils import readme_contents, is_64bits\n\n\nclass Environments(object):\n\n    def __init__(self, pth: Path):\n\n        self.repo_path: Path = pth\n        self.refenv: Optional[lmdb.Environment] = None\n        self.hashenv: Optional[lmdb.Environment] = None\n        self.stageenv: Optional[lmdb.Environment] = None\n        self.branchenv: Optional[lmdb.Environment] = None\n        self.stagehashenv: Optional[lmdb.Environment] = None\n        self.cmtenv: MutableMapping[str, lmdb.Environment] = {}\n        self._startup()\n\n    @property\n    def repo_is_initialized(self) -> bool:\n        \"\"\"Property to check if the repository is initialized, read-only attribute\n\n        Returns\n        -------\n        bool\n            True if repo environments are initialized, False otherwise\n        \"\"\"\n        ret = True if isinstance(self.refenv, lmdb.Environment) else False\n        return ret\n\n    def _startup(self) -> bool:\n        \"\"\"When first access to the Repo starts, attempt to open the db envs.\n\n        This function is designed to fail if a repository does not exist at the\n        :py:attribute:`repo_path` which is specified, so the user can\n        explicitly choose to initialize the repo. 
        explicitly choose to initialize the repo. Once opened, the lmdb\n        environments should not be closed until the program terminates.\n\n        Returns\n        -------\n        bool\n            False if no repository exists at the given path, otherwise True.\n\n        Warns\n        -----\n        UserWarning\n            Should the repository not exist at the provided repo path.\n\n        Raises\n        ------\n        RuntimeError\n            If the repository version is not compatible with the current\n            software.\n        \"\"\"\n\n        if not self.repo_path.joinpath(LMDB_BRANCH_NAME).is_file():\n            msg = f'No repository exists at {self.repo_path}, please use `repo.init()` method'\n            warnings.warn(msg, UserWarning)\n            return False\n\n        if not is_64bits():\n            raise OSError('Hangar cannot run on 32 bit machines')\n\n        repo_ver = startup_check_repo_version(self.repo_path)\n        curr_ver = repo_version_raw_spec_from_raw_string(v_str=__version__)\n        if not is_repo_software_version_compatible(repo_ver, curr_ver):\n            msg = f'repository written version: {repo_ver} is not compatible '\\\n                  f'with the current Hangar software version: {curr_ver}'\n            raise RuntimeError(msg)\n\n        self._open_environments()\n        return True\n\n    def init_repo(self,\n                  user_name: str,\n                  user_email: str,\n                  remove_old: bool = False) -> Path:\n        \"\"\"Create a new hangar repository at the specified environment path.\n\n        Parameters\n        ----------\n        user_name : str\n            Name of the repository user.\n        user_email : str\n            Email address of the repository user.\n        remove_old : bool, optional\n            DEVELOPER USE ONLY --- Remove all data and records stored in the\n            repository if this option is enabled, defaults to False.\n\n        Returns\n        -------\n        Path\n            The path to the newly created repository on disk.\n\n        Raises\n        ------\n        OSError\n            If a hangar repository exists at the specified path, and `remove_old`\n            was not set to ``True``.\n        \"\"\"\n        if self.repo_path.joinpath(LMDB_BRANCH_NAME).is_file():\n            if remove_old is True:\n                shutil.rmtree(str(self.repo_path))\n            else:\n                raise OSError(f'Hangar Directory: {self.repo_path} already exists')\n\n        self.repo_path.mkdir()\n        self.repo_path.joinpath(DIR_DATA_STORE).mkdir()\n        self.repo_path.joinpath(DIR_DATA_STAGE).mkdir()\n        self.repo_path.joinpath(DIR_DATA_REMOTE).mkdir()\n        self.repo_path.joinpath(DIR_DATA).mkdir()\n        print(f'Hangar Repo initialized at: {self.repo_path}')\n\n        userConf = {'USER': {'name': user_name, 'email': user_email}}\n        CFG = configparser.ConfigParser()\n        CFG.read_dict(userConf)\n        with self.repo_path.joinpath(CONFIG_USER_NAME).open('w') as f:\n            CFG.write(f)\n\n        readmeTxt = readme_contents(user_name, user_email)\n        with self.repo_path.joinpath(README_FILE_NAME).open('w') as f:\n            f.write(readmeTxt.getvalue())\n\n        self._open_environments()\n        set_repository_software_version(branchenv=self.branchenv, ver_str=__version__)\n        create_branch(self.branchenv, 'master', '')\n        set_staging_branch_head(self.branchenv, 'master')\n        return self.repo_path\n
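\n    # Illustrative bootstrap sequence (path and identity values are\n    # placeholders):\n    #\n    #   envs = Environments(Path('/some/path/.hangar'))  # warns if no repo exists\n    #   envs.init_repo(user_name='A User', user_email='a.user@example.com')\n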
\n    def checkout_commit(self, branch_name: str = '', commit: str = '') -> str:\n        \"\"\"Set up db environment with unpacked commit ref records.\n\n        Parameters\n        ----------\n        branch_name : str, optional\n            name of the branch to read, defaults to ''\n        commit : str, optional\n            name of the commit to read, defaults to ''\n\n        Returns\n        -------\n        str\n            commit hash which was checked out\n        \"\"\"\n        if commit != '':\n            commit_hash = commit\n            txt = f' * Checking out COMMIT: {commit_hash}'\n        elif branch_name != '':\n            commit_hash = get_branch_head_commit(self.branchenv, branch_name)\n            txt = f' * Checking out BRANCH: {branch_name} with current HEAD: {commit_hash}'\n        else:\n            head_branch = get_staging_branch_head(self.branchenv)\n            commit_hash = get_branch_head_commit(self.branchenv, head_branch)\n            txt = f'\\n Neither BRANCH nor COMMIT specified.'\\\n                  f'\\n * Checking out writing HEAD BRANCH: {head_branch}'\n        print(txt)\n\n        # On UNIX-like systems, an open process retains the ability to\n        # interact with disk space allocated to a file even after it is\n        # removed from disk. Windows does not, and will not allow a file to be\n        # removed while a process is interacting with it. While the CM form is\n        # cleaner, this hack allows similar usage on Windows platforms.\n\n        if platform.system() != 'Windows':\n            with tempfile.TemporaryDirectory() as tempD:\n                tmpDF = os.path.join(tempD, f'{commit_hash}.lmdb')\n                tmpDB = lmdb.open(path=tmpDF, **LMDB_SETTINGS)\n                unpack_commit_ref(self.refenv, tmpDB, commit_hash)\n                self.cmtenv[commit_hash] = tmpDB\n        else:\n            tempD = tempfile.mkdtemp()\n            tmpDF = os.path.join(tempD, f'{commit_hash}.lmdb')\n            tmpDB = lmdb.open(path=tmpDF, **LMDB_SETTINGS)\n            unpack_commit_ref(self.refenv, tmpDB, commit_hash)\n            self.cmtenv[commit_hash] = tmpDB\n\n        return commit_hash\n\n    def _open_environments(self):\n        \"\"\"Open the standard lmdb databases at the repo path.\n\n        If any commits are checked out (in an unpacked state), read those in as\n        well.\n        \"\"\"\n        ref_pth = str(self.repo_path.joinpath(LMDB_REF_NAME))\n        hash_pth = str(self.repo_path.joinpath(LMDB_HASH_NAME))\n        stage_pth = str(self.repo_path.joinpath(LMDB_STAGE_REF_NAME))\n        branch_pth = str(self.repo_path.joinpath(LMDB_BRANCH_NAME))\n        stagehash_pth = str(self.repo_path.joinpath(LMDB_STAGE_HASH_NAME))\n\n        self.refenv = lmdb.open(path=ref_pth, **LMDB_SETTINGS)\n        self.hashenv = lmdb.open(path=hash_pth, **LMDB_SETTINGS)\n        self.stageenv = lmdb.open(path=stage_pth, **LMDB_SETTINGS)\n        self.branchenv = lmdb.open(path=branch_pth, **LMDB_SETTINGS)\n        self.stagehashenv = lmdb.open(path=stagehash_pth, **LMDB_SETTINGS)\n\n    def _close_environments(self):\n\n        self.refenv.close()\n        self.hashenv.close()\n        self.stageenv.close()\n        self.branchenv.close()\n        self.stagehashenv.close()\n        for env in self.cmtenv.values():\n            if platform.system() == 'Windows':\n                envpth = env.path()\n                env.close()\n                os.remove(envpth)\n            else:\n                env.close()\n"
  },
  {
    "path": "src/hangar/dataset/__init__.py",
    "content": "__all__ = ('make_numpy_dataset', 'make_torch_dataset', 'make_tensorflow_dataset')\n\nfrom typing import Sequence, Callable, TYPE_CHECKING, Union, List, Tuple\n\nif TYPE_CHECKING:\n    from ..columns import ModifierTypes as Columns\n    from .torch_dset import TorchDataset\n    from .numpy_dset import NumpyDataset\n    from .tensorflow_dset import tf_Dataset\n    KeyType = Union[str, int, List, Tuple]\n\n\ndef make_numpy_dataset(\n        columns: Sequence['Columns'],\n        keys: 'KeyType' = None,\n        batch_size: int = None,\n        drop_last: bool = False,\n        shuffle: bool = True,\n        collate_fn: Callable = None) -> 'NumpyDataset':\n    \"\"\"Group column into a single numpy dataset, provides iterative looping over data.\n\n    This API also provides the options to batch the data which is a major difference\n    between other dataset APIs. In traditional Machine learning applications, it's quite\n    natural to load the whole dataset as a single batch because it's possible to fit into\n    the system memory. Passing the size of the dataset as the batch size would make it\n    possible here to do just that. This API also acts as an entry point for other\n    non-supported frameworks to load data from hangar as batches into the training loop.\n\n    Parameters\n    ----------\n    columns\n        A column object, a tuple of column object or a list of column\n        objects.\n    keys\n        An sequence collection of sample names. If given only those samples will\n        fetched from the column\n    batch_size\n        Size of the batch. This will batch the dataset on the zeroth dimension. For\n        example, if the data is of the shape (H x W x C) the batched data will be shaped\n        as (B x H x W x C) where B is the batch size\n    drop_last\n        Should the last uncompleted batch be dropped\n    shuffle\n        Should the data be shuffled on each epoch\n    collate_fn\n        A function to collate samples together in a batch. In case this option is absent,\n        the heuristics to collate the batch is\n            1. If the column is an ndarray flat column, then `np.stack` will be used\n            2. If the column is with any other properties, `list.append` will be used\n        Note that the batch of data that comes to callate_fn will have each elements consist\n        of datapoints from all the columns. For example, if the columns from where the data\n        being fetched are col1 and col2 then the batch would look like\n\n        ```python\n        [\n            (data0_col1, data0_col2),\n            (data1_col1, data1_col2),\n            (data2_col1, data2_col2),\n            ...\n        ]\n        ```\n\n    Examples\n    --------\n    >>> from hangar import Repository\n    >>> from hangar.dataset import make_numpy_dataset\n    >>> repo = Repository('.')\n    >>> co = repo.checkout()\n    >>> imgcol = co.columns['images']\n    >>> classcol = co.columns['classes']\n    >>> dataset = make_numpy_dataset((imgcol, classcol), batch_size=64)\n    >>> for batch in dataset:\n    ...     out = train_model(batch[0])\n    ...     
    ...     loss = loss_fn(out, batch[1])\n\n    Returns\n    -------\n    :class:`~.numpy_dset.NumpyDataset`\n    \"\"\"\n    from .numpy_dset import _make_numpy_dataset\n    return _make_numpy_dataset(\n        columns=columns,\n        keys=keys,\n        batch_size=batch_size,\n        drop_last=drop_last,\n        shuffle=shuffle,\n        collate_fn=collate_fn)\n\n\ndef make_torch_dataset(\n        columns: Sequence['Columns'],\n        keys: 'KeyType' = None,\n        as_dict: bool = False) -> 'TorchDataset':\n    \"\"\"Returns a :class:`torch.utils.data.Dataset` object which can be loaded into\n    a :class:`torch.utils.data.DataLoader`.\n\n    .. note::\n\n        PyTorch's :class:`torch.utils.data.DataLoader` can effectively do custom\n        operations such as shuffling, batching, multiprocessed reads etc. and hence we\n        limit the surface area of the dataset API here just to open the channel for\n        reading. Use DataLoaders for such operations\n\n    .. warning::\n\n       On Windows systems, setting the parameter ``num_workers`` in the\n       resulting :class:`torch.utils.data.DataLoader` method will result in a\n       RuntimeError or deadlock. This is due to limitations of multiprocess\n       start methods on Windows itself. Using the default argument value\n       (``num_workers=0``) will let the DataLoader work in single process mode\n       as expected.\n\n    Parameters\n    ----------\n    columns\n        A column object, a tuple of column objects or a list of column\n        objects.\n    keys\n        A sequence of sample names. If given, only those samples will be\n        fetched from the column\n    as_dict\n        Return the data as an OrderedDict with column names as keys. If False,\n        it returns a tuple of arrays\n\n    Examples\n    --------\n    >>> from hangar import Repository\n    >>> from torch.utils.data import DataLoader\n    >>> from hangar.dataset import make_torch_dataset\n    >>> from collections import namedtuple\n    >>> repo = Repository('.')\n    >>> co = repo.checkout()\n    >>> imgcol = co.columns['images']\n    >>> classcol = co.columns['classes']\n    >>> dataset = make_torch_dataset((imgcol, classcol), as_dict=True)\n    >>> loader = DataLoader(dataset, batch_size=16)\n    >>> for batch in loader:\n    ...     out = train_model(batch['images'])\n    ...     loss = loss_fn(out, batch['classes'])\n\n    Returns\n    -------\n    :class:`torch.utils.data.Dataset`\n    \"\"\"\n    from .torch_dset import _make_torch_dataset\n    return _make_torch_dataset(columns=columns, keys=keys, as_dict=as_dict)\n\n\ndef make_tensorflow_dataset(\n        columns: Sequence['Columns'],\n        keys: 'KeyType' = None,\n        shuffle: bool = False) -> 'tf_Dataset':\n    \"\"\"Make a tensorflow dataset from a hangar column.\n\n    This method uses the `from_generator` function from `tensorflow.data.Dataset` with a\n    generator function that wraps all the hangar columns. This function also accepts an\n    optional ``shuffle`` argument that does a global shuffle across all the samples.\n    This is convenient since a Tensorflow Dataset otherwise shuffles by loading only a\n    subset of data which can fit into memory and shuffling that subset.\n\n    .. warning::\n\n        This function relies on `tf.data.Dataset.from_generator`, which calls into the\n        python interpreter to run the generator function. This generator function\n
        will not be serialized in a GraphDef and hence has limited portability. The\n        operation must run in the same address space as the Python program that calls\n        'make_tensorflow_dataset'. Also, since it calls back into the python interpreter,\n        it is subject to the GIL and is not parallelizable even with a `Dataset.map`\n        call. In fact, any attempt to parallelize the read will result in worse\n        performance\n\n    Parameters\n    ----------\n    columns\n        A column object, a tuple of column objects or a list of column objects\n    keys\n        A sequence of sample names. If given, only those samples will be fetched from\n        the column\n    shuffle\n        The generator uses this to decide whether a global shuffle across all the\n        samples is required. This does not restrict the user from calling\n        ``shuffle()`` on the returned dataset\n\n    Examples\n    --------\n    >>> from hangar import Repository\n    >>> from hangar.dataset import make_tensorflow_dataset\n    >>> import tensorflow as tf\n    >>> tf.compat.v1.enable_eager_execution()\n    >>> repo = Repository('')\n    >>> co = repo.checkout()\n    >>> data = co.columns['mnist_data']\n    >>> target = co.columns['mnist_target']\n    >>> tf_dset = make_tensorflow_dataset([data, target])\n    >>> tf_dset = tf_dset.batch(512)\n    >>> for bdata, btarget in tf_dset:\n    ...     print(bdata.shape, btarget.shape)\n\n    Returns\n    -------\n    :class:`tf_Dataset`\n    \"\"\"\n    from .tensorflow_dset import _make_tensorflow_dataset\n    return _make_tensorflow_dataset(columns=columns, keys=keys, shuffle=shuffle)\n"
  },
  {
    "path": "src/hangar/dataset/common.py",
    "content": "import typing\nfrom typing import Union, Sequence, Tuple, List, Dict\nfrom collections import OrderedDict\n\nfrom ..columns import is_column, is_writer_column\nfrom ..optimized_utils import is_ordered_sequence\n\nif typing.TYPE_CHECKING:\n    from hangar.columns.column import ModifierTypes as Columns\n    KeyType = Union[str, int, List, Tuple]\n\n\nclass HangarDataset:\n    \"\"\"Dataset class that does the initial checks to verify whether the provided\n    columns can be arranged together as a dataset. These verifications are done on the\n    keys of each column. If ``keys`` argument is ``None``, initializer of this class\n    makes the key list by checking the local keys across all columns.\n    If ``keys`` argument is provided, then it assumes the provided keys are valid and\n    restrain from doing any more check on it.\n    It provides the ``__getitem__`` accessor for downstream process to consume the\n    grouped data\n\n\n    Parameters\n    ----------\n    columns\n        A single column object of a sequence the column objects\n    keys\n        An sequence collection of sample names. If given only those samples will\n        fetched from the column\n    \"\"\"\n\n    def __init__(self,\n                 columns: Union['Columns', Sequence['Columns']],\n                 keys: 'KeyType' = None):\n\n        self._columns: Dict[str, 'Columns'] = OrderedDict()\n        self._is_conman_counter = 0\n\n        if is_ordered_sequence(columns):\n            if len(columns) == 0:\n                raise TypeError(f'Atleast one element must exist in input sequence.')\n        else:\n            columns = (columns,)\n\n        for obj in columns:\n            if not is_column(obj):\n                raise TypeError(\n                    f'All elements of input sequence must be hangar column objects.')\n            elif is_writer_column(obj):\n                raise PermissionError(\n                    f'Columns cannot be used while accessed via a `write-enabled` '\n                    f'checkout. Please close the checkout and reopen the column in '\n                    f'via a new checkout opened in `read-only` mode.')\n            column_name = obj.column\n            self._columns[column_name] = obj\n\n        if keys:\n            self._keys = keys\n        else:\n            if len(set((col.column_layout for col in self._columns.values()))) != 1:  # all same type\n                raise ValueError(f\"keys must be passed when all columns are not same type\")\n\n            keys = []\n            standard_keys = set()\n            for idx, col in enumerate(self._columns.values()):\n                # only match top level keys, even for nested columns\n                if idx == 0:\n                    standard_keys = set(col.keys(local=True))\n                    if len(standard_keys) == 0:\n                        raise RuntimeError(\"No local data found\")\n                else:\n                    key_set = set(col.keys(local=True))\n                    if len(standard_keys.symmetric_difference(key_set)) != 0:\n                        raise KeyError(\"Keys from multiple columns couldn't be matched. \"\n                                       \"Pass keys explicitly while creating dataset\")\n                if col.column_layout == 'flat':\n                    column_keys = (sample for sample in col.keys(local=True))\n                elif col.column_layout == 'nested':\n                    column_keys = ((sample, ...) 
                if col.column_layout == 'flat':\n                    column_keys = (sample for sample in col.keys(local=True))\n                elif col.column_layout == 'nested':\n                    column_keys = ((sample, ...) for sample in col.keys(local=True))\n                else:\n                    raise RuntimeError(f'unknown column layout: {col}')\n\n                keys.append(column_keys)\n            if len(keys) == 1:\n                self._keys = tuple(keys[0])\n            else:\n                self._keys = tuple(zip(*keys))\n\n    @property\n    def columns(self):\n        return self._columns\n\n    def __len__(self):\n        return len(self._keys)\n\n    def index_get(self, index: int):\n        \"\"\"Take one sample index and return the item from each column for the\n        key(s) stored at that index.\n        \"\"\"\n        keys = self._keys[index]\n        if len(self._columns) == 1:\n            for col in self.columns.values():\n                return col[keys]\n        else:\n            if len(self.columns) != len(keys):\n                raise RuntimeError(\n                    f'Internal error setting up columns/keys. '\n                    f'columns: {self.columns} keys: {keys}'\n                )\n            res = (column[key] for column, key in zip(self.columns.values(), keys))\n            return tuple(res)\n"
  },
  {
    "path": "src/hangar/dataset/numpy_dset.py",
    "content": "from typing import Sequence, Callable, TYPE_CHECKING, Union, List, Tuple\nimport random\n\nimport numpy as np\n\nfrom .common import HangarDataset\n\nif TYPE_CHECKING:\n    from ..columns import ModifierTypes\n    Columns = ModifierTypes\n    KeyType = Union[str, int, List, Tuple]\n\n\ndef default_collate_fn(batch):\n    elem = batch[0]\n    if isinstance(elem, np.ndarray):\n        # TODO: stack to numpy array (out=) for performance\n        return np.stack(batch)\n    elif isinstance(elem, str):\n        return batch\n    elif isinstance(elem, dict):  # nested\n        return batch\n    elif isinstance(elem, tuple):  # multiple columns\n        out = (default_collate_fn(dt) for dt in zip(*batch))\n        return tuple(out)\n\n\nclass NumpyDataset:\n    \"\"\"NumpyDataset class provides interfaces for users to iterate over the batches of\n    data from different columns. The only user facing APIs it exposes are ``__len__`` and\n    ``__iter__``. Batch and shuffle operations are handled by `:func:`make_numpy_dataset`\n    based on the arguments it gets and hence user should not interact with this class for\n    such operations. Note that, user would never instantiate this class directly. Instead\n    `:func:`make_numpy_dataset` act as the entry point and return an object of this class\n    to the user\n\n    Parameters\n    ----------\n    dataset\n        Hangar's Dataset object that groups columns for downstream processing\n    batch_size\n        Size of the individual batch. If specified batches of this size will be returned\n        on each iteration\n    drop_last\n        Should drop the last incomplete batch\n    shuffle\n        Should shuffle the batch on each epoch\n    collate_fn\n        A function to collate samples together in a batch. In case this option is absent,\n        the heuristics to collate the batch is\n            1. If the column is an ndarray flat column, then `np.stack` will be used\n            2. If the column is with any other properties, `list.append` will be used\n        Note that the batch of data that comes to callate_fn will have each elements consist\n        of datapoints from all the columns. 
\n\nclass NumpyDataset:\n    \"\"\"NumpyDataset class provides interfaces for users to iterate over the batches of\n    data from different columns. The only user facing APIs it exposes are ``__len__`` and\n    ``__iter__``. Batch and shuffle operations are handled by :func:`make_numpy_dataset`\n    based on the arguments it gets, and hence the user should not interact with this\n    class for such operations. Note that a user would never instantiate this class\n    directly. Instead, :func:`make_numpy_dataset` acts as the entry point and returns an\n    object of this class to the user\n\n    Parameters\n    ----------\n    dataset\n        Hangar's Dataset object that groups columns for downstream processing\n    batch_size\n        Size of the individual batch. If specified, batches of this size will be\n        returned on each iteration\n    drop_last\n        Should drop the last incomplete batch\n    shuffle\n        Should shuffle the batch on each epoch\n    collate_fn\n        A function to collate samples together in a batch. In case this option is absent,\n        the heuristics used to collate the batch are\n            1. If the column is an ndarray flat column, then `np.stack` will be used\n            2. If the column has any other properties, `list.append` will be used\n        Note that each element of the batch passed to collate_fn consists of data points\n        from all the columns. For example, if the columns from which the data is being\n        fetched are col1 and col2 then the batch would look like\n\n        ```python\n        [\n            (data0_col1, data0_col2),\n            (data1_col1, data1_col2),\n            (data2_col1, data2_col2),\n            ...\n        ]\n        ```\n    \"\"\"\n    def __init__(self, dataset: HangarDataset, batch_size: int, drop_last: bool,\n                 shuffle: bool, collate_fn: Callable = None):\n        self._dataset = dataset\n        self._num_batches = None\n        self._batch_size = None\n        if batch_size:\n            self.collate_fn = collate_fn if collate_fn else default_collate_fn\n            self._batch(batch_size, drop_last)\n        else:\n            if collate_fn:\n                raise RuntimeError(\"Found `collate_fn` in the arguments, which is a no-op \"\n                                   \"since batching is not enabled\")\n            if drop_last:\n                raise RuntimeError(\"Setting `drop_last` is a no-op when batching is not enabled\")\n        self._shuffle = shuffle\n        self._indices = list(range(len(self._dataset)))\n\n    @property\n    def dataset(self):\n        return self._dataset\n\n    @property\n    def num_batches(self):\n        return self._num_batches\n\n    @property\n    def batch_size(self):\n        return self._batch_size\n\n    @batch_size.setter\n    def batch_size(self, value: int):\n        if not isinstance(value, int):\n            raise TypeError(f'Expected integer type, received {type(value)}')\n        elif value < 1:\n            raise ValueError(f'batch_size value must be >= 1, received {value}')\n        self._batch_size = value\n\n    @property\n    def shuffle(self):\n        return self._shuffle\n\n    @shuffle.setter\n    def shuffle(self, value: bool):\n        if not isinstance(value, bool):\n            raise TypeError(f'Expected bool type, received {type(value)}')\n        self._shuffle = value\n\n    def __len__(self):\n        return len(self._dataset)\n
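\n    # Batch count sketch: with 10 samples and batch_size=3, divmod(10, 3)\n    # evaluates to (3, 1); `_batch` below then records 4 batches when\n    # drop_last is False (three full plus one partial), or 3 otherwise.\n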
\n    def _batch(self, batch_size, drop_last=True) -> None:\n        \"\"\"Private method which calculates the batch parameters. These\n        calculated parameters will be considered by the ``__iter__`` method while\n        fetching the batches for downstream processing. This function will be called at\n        the time of object instantiation and should not be triggered independently\n\n        Parameters\n        ----------\n        batch_size : int\n            Size of the individual batch. If specified, batches of this size will be\n            returned on each iteration\n        drop_last : bool\n            Should drop the last incomplete batch\n        \"\"\"\n        num_batches, has_last = divmod(len(self._dataset), batch_size)\n        if num_batches == 0:\n            raise RuntimeError(\"Batch size exceeded the number of samples\")\n        if has_last and not drop_last:\n            num_batches += 1\n        self._num_batches = num_batches\n        self._batch_size = batch_size\n\n    def __iter__(self):\n        if self._shuffle:\n            random.shuffle(self._indices)\n        if self._num_batches is None:\n            for i in self._indices:\n                yield self._dataset.index_get(i)\n        else:\n            start = 0\n            end = self._batch_size\n            for _ in range(self._num_batches):\n                batch = self._indices[start:end]\n                out = [self._dataset.index_get(i) for i in batch]\n                start = end\n                end = end + self._batch_size\n                yield self.collate_fn(out)\n\n\ndef _make_numpy_dataset(columns: Sequence['Columns'],\n                        keys: 'KeyType' = None,\n                        batch_size: int = None,\n                        drop_last: bool = False,\n                        shuffle: bool = True,\n                        collate_fn: Callable = None) -> NumpyDataset:\n    \"\"\"Group columns into a single numpy dataset, providing iterative looping over data.\n\n    This API also provides the option to batch the data, which is a major difference\n    from the other dataset APIs. In traditional machine learning applications it is quite\n    natural to load the whole dataset as a single batch, because it fits into the system\n    memory. Passing the size of the dataset as the batch size makes it possible here to\n    do just that. This API also acts as an entry point for otherwise unsupported\n    frameworks to load data from hangar as batches into the training loop.\n\n    .. note::\n\n        Columns with layouts ``str`` or ``ndarray nested`` are not compatible with the\n        dataset APIs in the current release. So making a dataset is only possible for\n        columns with layout ``ndarray flat``\n\n    .. note::\n\n        This is an experimental method in the current Hangar version. Please be aware\n        that significant changes may be introduced in future releases without advance\n        notice or deprecation warnings.\n\n    Parameters\n    ----------\n    columns : :class:`~hangar.columns.column.Columns` or Sequence\n        A column object, a tuple of column objects or a list of column\n        objects.\n    keys : Union[str, int, List, Tuple]\n        A sequence of sample names. If given, only those samples will be\n        fetched from the column\n    batch_size : int\n        Size of the batch. This will batch the dataset on the zeroth dimension. For\n        example, if the data is of the shape (H x W x C) the batched data will be shaped\n        as (B x H x W x C) where B is the batch size\n    drop_last : bool\n        Should the last incomplete batch be dropped\n    shuffle : bool\n        Should the data be shuffled on each epoch\n    collate_fn : Callable\n        A function to collate samples together in a batch. In case this option is absent,\n        the heuristics used to collate the batch are\n            1. If the column is an ndarray flat column, then `np.stack` will be used\n
            2. If the column has any other properties, `list.append` will be used\n        Note that each element of the batch passed to collate_fn consists of data points\n        from all the columns. For example, if the columns from which the data is being\n        fetched are col1 and col2 then the batch would look like\n\n        ```python\n        [\n            (data0_col1, data0_col2),\n            (data1_col1, data1_col2),\n            (data2_col1, data2_col2),\n            ...\n        ]\n        ```\n\n    Returns\n    -------\n    :class:`.NumpyDataset`\n\n    DEVELOPER NOTE\n    --------------\n    - Any update to this function signature or docstring must be reflected in the\n      equivalent loader function in hangar/dataset/__init__.py. This function is\n      \"copied\" to a top level __init__.py to allow unified API and lazyloader access\n    \"\"\"\n    dataset = HangarDataset(columns, keys)\n    dataset = NumpyDataset(dataset, batch_size, drop_last, shuffle, collate_fn)\n    return dataset\n"
  },
  {
    "path": "src/hangar/dataset/tensorflow_dset.py",
    "content": "from typing import Sequence, Callable, List, Tuple, Union\nimport typing\nfrom functools import partial\nimport random\n\ntry:\n    import tensorflow as tf\nexcept (ImportError, ModuleNotFoundError):\n    raise ImportError(\n        'Could not import \"tensorflow\" library. Ensure library is '\n        'installed correctly to use tensorflow dataloader functions') from None\n\nfrom .common import HangarDataset\n\nif typing.TYPE_CHECKING:\n    tf_TensorType = tf.python.framework.dtypes.DType\n    tf_TensorShape = tf.TensorShape\n    tf_Dataset = tf.data.Dataset\n    KeyType = Union[str, int, List, Tuple]\n    from ..columns.column import ModifierTypes as Columns\n    import numpy as np\n\n\ndef yield_data(dataset: HangarDataset, indices: list,\n               shuffle: bool) -> Tuple['np.ndarray']:\n    if shuffle:\n        random.shuffle(indices)\n    for i in indices:\n        out = dataset.index_get(i)\n        yield out if isinstance(out, tuple) else (out,)\n\n\ndef _make_tensorflow_dataset(columns: Sequence['Columns'],\n                             keys: 'KeyType' = None,\n                             shuffle: bool = False) -> 'tf_Dataset':\n    \"\"\"Make a tensorflow dataset from a hangar column.\n\n    This method uses `from_generator` function from `tensorflow.data.Dataset` with a\n    generator function that wraps all the hangar columns. This function also accepts an\n    optional ``shuffle`` argument that does a global shuffle across all the samples.\n    This is convenient since Tensorflow Dataset does shuffling by loading the subset\n    of data which can fit into the memory and shuffle that subset.\n\n    .. Note::\n\n        Column with layouts ``str`` or ``ndarray nested`` are not compatible with the\n        dataset APIs in the current release. So making dataset is only possible for\n        columns with layout ``ndarray flat``\n\n    .. warning::\n\n        This function relies on `tf.data.Dataset.from_generator` and which calls into the\n        python interpreter for running the generator funciton. This generator function\n        will not be serialized in a GraphDef and hence has limited portability. The\n        operation must run in the same address space as the Python program that calls\n        'make_tensorflow_dataset'. Also, since it calls back into the python interpreter,\n        we'll have the GIL problem and is not parellel-izable even with a `Dataset.map`\n        call. In fact, any attempts to parellelize the read will result in worse\n        performance\n\n    .. note::\n\n        This is an experimental method in the current Hangar version. Please be aware\n        that Significant changes may be introduced in future releases without advance\n        notice or deprication warnings.\n\n    Parameters\n    ----------\n    columns\n        A column object, a tuple of column object or a list of column objects`\n    keys\n        An sequence of sample names. If given only those samples will fetched from\n        the column\n    shuffle\n        The generator uses this to decide a global shuffle across all the samples is\n        required or not. 
        The generator uses this to decide whether a global shuffle across all the\n        samples is required. This does not restrict the user from calling\n        ``shuffle()`` on the returned dataset\n\n\n    Examples\n    --------\n    >>> from hangar import Repository\n    >>> from hangar.dataset import make_tensorflow_dataset\n    >>> import tensorflow as tf\n    >>> tf.compat.v1.enable_eager_execution()\n    >>> repo = Repository('')\n    >>> co = repo.checkout()\n    >>> data = co.columns['mnist_data']\n    >>> target = co.columns['mnist_target']\n    >>> tf_dset = make_tensorflow_dataset([data, target])\n    >>> tf_dset = tf_dset.batch(512)\n    >>> for bdata, btarget in tf_dset:\n    ...     print(bdata.shape, btarget.shape)\n\n    Returns\n    -------\n    :class:`tf_Dataset`\n\n    DEVELOPER NOTE\n    --------------\n    - Any update to this function signature or docstring must be reflected in the\n      equivalent loader function in hangar/dataset/__init__.py. This function is\n      \"copied\" to a top level __init__.py to allow unified API and lazyloader access\n    \"\"\"\n\n    dataset = HangarDataset(columns, keys)\n    indices = list(range(len(dataset)))\n    generator: Callable = partial(yield_data, dataset, indices, shuffle)\n    shapes: List[tf_TensorShape] = []\n    types: List[tf_TensorType] = []\n\n    for col in dataset.columns.values():\n        if col.schema_type == 'variable_shape':\n            shape = (None,) * len(col.shape)\n        else:\n            shape = col.shape\n        shapes.append(tf.TensorShape(shape))\n        types.append(tf.as_dtype(col.dtype))\n\n    return tf.data.Dataset.from_generator(generator=generator,\n                                          output_types=tuple(types),\n                                          output_shapes=tuple(shapes))\n"
  },
  {
    "path": "src/hangar/dataset/torch_dset.py",
    "content": "from typing import Sequence, TYPE_CHECKING, Union, List, Tuple\nfrom collections import OrderedDict\n\ntry:\n    import torch\nexcept (ImportError, ModuleNotFoundError):\n    raise ImportError(\n        'Could not import \"pytorch\" library. Ensure library is '\n        'installed correctly to use pytorch dataloader functions')\n\nfrom .common import HangarDataset\n\nif TYPE_CHECKING:\n    from ..columns.column import ModifierTypes as Columns\n    KeyType = Union[str, int, List, Tuple]\n\n\nclass TorchDataset(torch.utils.data.Dataset):\n    \"\"\"TorchDataset inherits :class:`torch.utils.data.Dataset` and accepts few convenient\n    arguments to wrap hangar columns to be used in :class:`torch.utils.data.DataLoaders`.\n    It accepts a hangar Dataset object which exposes all the user requested columns and\n    an array of keys to sample from. For more details, checkout\n    `PyTorch Dataset <https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset>`_\n    \"\"\"\n\n    def __init__(self, hangar_dataset: HangarDataset, as_dict: bool = False):\n        self.dataset = hangar_dataset\n        self.column_names = list(hangar_dataset.columns.keys())\n        self._as_dict = as_dict\n\n    def __len__(self) -> int:\n        return len(self.dataset)\n\n    def __getitem__(self, index: int):\n        data = self.dataset.index_get(index)\n        if not self._as_dict:\n            return data\n        if len(self.column_names) == 1:\n            return {self.column_names[0]: data}\n        else:\n            return OrderedDict(zip(self.column_names, data))\n\n\ndef _make_torch_dataset(columns: Sequence['Columns'],\n                        keys: 'KeyType' = None,\n                        as_dict: bool = False) -> TorchDataset:\n    \"\"\"Returns a :class:`torch.utils.data.Dataset` object which can be loaded into\n    a :class:`torch.utils.data.DataLoader`.\n\n    .. note::\n\n        Column with layouts ``str`` or ``ndarray nested`` are not compatible with the\n        dataset APIs in the current release. So making dataset is only possible for\n        columns with layout ``ndarray flat``\n\n    .. note::\n\n        PyTorch's :class:`torch.utils.data.DataLoader` can effectively do custom\n        operations such as shuffling, batching, multiprocessed read etc and hence we\n        limit the surface area of the dataset API here just to open the channel for\n        reading. Use DataLoaders for such operations\n\n    .. warning::\n\n       On Windows systems, setting the parameter ``num_workers`` in the\n       resulting :class:`torch.utils.data.DataLoader` method will result in a\n       RuntimeError or deadlock. This is due to limitations of multiprocess\n       start methods on Windows itself. Using the default argument value\n       (``num_workers=0``) will let the DataLoader work in single process mode\n       as expected.\n\n    .. note::\n\n        This is an experimental method in the current Hangar version. Please be aware\n        that Significant changes may be introduced in future releases without advance\n        notice or deprication warnings.\n\n    Parameters\n    ----------\n    columns\n        A column object, a tuple of column object or a list of column\n        objects.\n    keys\n        An sequence collection of sample names. If given only those samples will\n        fetched from the column\n    as_dict\n        Return the data as an OrderedDict with column names as keys. 
\n\ndef _make_torch_dataset(columns: Sequence['Columns'],\n                        keys: 'KeyType' = None,\n                        as_dict: bool = False) -> TorchDataset:\n    \"\"\"Returns a :class:`torch.utils.data.Dataset` object which can be loaded into\n    a :class:`torch.utils.data.DataLoader`.\n\n    .. note::\n\n        Columns with layouts ``str`` or ``ndarray nested`` are not compatible with the\n        dataset APIs in the current release. So making a dataset is only possible for\n        columns with layout ``ndarray flat``\n\n    .. note::\n\n        PyTorch's :class:`torch.utils.data.DataLoader` can effectively do custom\n        operations such as shuffling, batching, multiprocessed reads etc. and hence we\n        limit the surface area of the dataset API here just to open the channel for\n        reading. Use DataLoaders for such operations\n\n    .. warning::\n\n       On Windows systems, setting the parameter ``num_workers`` in the\n       resulting :class:`torch.utils.data.DataLoader` method will result in a\n       RuntimeError or deadlock. This is due to limitations of multiprocess\n       start methods on Windows itself. Using the default argument value\n       (``num_workers=0``) will let the DataLoader work in single process mode\n       as expected.\n\n    .. note::\n\n        This is an experimental method in the current Hangar version. Please be aware\n        that significant changes may be introduced in future releases without advance\n        notice or deprecation warnings.\n\n    Parameters\n    ----------\n    columns\n        A column object, a tuple of column objects or a list of column\n        objects.\n    keys\n        A sequence of sample names. If given, only those samples will be\n        fetched from the column\n    as_dict\n        Return the data as an OrderedDict with column names as keys. If False,\n        it returns a tuple of arrays\n\n    Examples\n    --------\n    >>> from hangar import Repository\n    >>> from torch.utils.data import DataLoader\n    >>> from hangar.dataset import make_torch_dataset\n    >>> from collections import namedtuple\n    >>> repo = Repository('.')\n    >>> co = repo.checkout()\n    >>> imgcol = co.columns['images']\n    >>> classcol = co.columns['classes']\n    >>> dataset = make_torch_dataset((imgcol, classcol), as_dict=True)\n    >>> loader = DataLoader(dataset, batch_size=16)\n    >>> for batch in loader:\n    ...     out = train_model(batch['images'])\n    ...     loss = loss_fn(out, batch['classes'])\n\n    Returns\n    -------\n    :class:`torch.utils.data.Dataset`\n\n    DEVELOPER NOTE\n    --------------\n    - Any update to this function signature or docstring must be reflected in the\n      equivalent loader function in hangar/dataset/__init__.py. This function is\n      \"copied\" to a top level __init__.py to allow unified API and lazyloader access\n    \"\"\"\n    hangar_dataset = HangarDataset(columns, keys)\n    return TorchDataset(hangar_dataset, as_dict)\n"
  },
  {
    "path": "src/hangar/diagnostics/__init__.py",
    "content": "__version__ = '0.5.2'\n\nfrom .graphing import Graph\n\n__all__ = ['Graph']"
  },
  {
    "path": "src/hangar/diagnostics/ecosystem.py",
    "content": "from typing import Dict, List, Tuple, Union\n\n\nrequired_packages = [\n    ('hangar', lambda p: p.__version__),\n    ('click', lambda p: p.__version__),\n    ('lmdb', lambda p: p.__version__),\n    ('h5py', lambda p: p.__version__),\n    ('hdf5plugin', lambda p: p.version),\n    ('numpy', lambda p: p.__version__),\n    ('blosc', lambda p: p.__version__),\n    ('tqdm', lambda p: p.__version__),\n    ('wrapt', lambda p: p.__version__),\n    ('grpc', lambda p: p.__version__),\n    ('xxhash', lambda p: p.VERSION),\n]\n\n\ndef get_versions() -> dict:\n    \"\"\"Return information on software, machine, installed versions of packages.\n\n    dict\n        host, package, and `optional` package info.\n    \"\"\"\n    d = {'host': get_system_info(),\n         'packages': get_package_info(required_packages),\n         'optional': get_optional_info()}\n    return d\n\n\ndef get_system_info() -> List[Tuple[str, str]]:\n    \"\"\"Return local computer python, OS, and Machine info\n\n    Returns\n    -------\n    List[Tuple[str, str]]\n        field collected and value of the system parameter.\n    \"\"\"\n    import locale\n    import os\n    import platform\n    import struct\n    import sys\n\n    (sysname, nodename, release,\n     version, machine, processor) = platform.uname()\n\n    try:\n        loc = locale.getlocale()\n    except ValueError:  # pragma: no cover\n        loc = None\n\n    host = [\n        ('python', f'{sys.version_info[:]}'),\n        ('python-bits', f'{struct.calcsize(\"P\") * 8}'),\n        ('OS', f'{sysname}'),\n        ('OS-release', f'{release}'),\n        ('machine', f'{machine}'),\n        ('processor', f'{processor}'),\n        ('byteorder', f'{sys.byteorder}'),\n        ('LC_ALL', f'{os.environ.get(\"LC_ALL\", \"None\")}'),\n        ('LANG', f'{os.environ.get(\"LANG\", \"None\")}'),\n        ('LOCALE', f'{loc}'),\n        ('cpu-count', f'{os.cpu_count()}'),\n    ]\n\n    return host\n\n\ndef get_optional_info() -> Dict[str, Union[str, bool]]:\n    \"\"\"Get optional package info (tensorflow, pytorch, hdf5_bloscfilter, etc.)\n\n    Returns\n    -------\n    Dict[str, Union[str, False]]\n        package name, package version (if installed, otherwise False)\n    \"\"\"\n    res = {}\n    try:\n        import h5py\n        bloscFilterAvail = h5py.h5z.filter_avail(32001)\n    except ImportError:  # pragma: no cover\n        bloscFilterAvail = False\n    res['blosc-hdf5-plugin'] = bloscFilterAvail\n\n    try:\n        import torch\n        torchVersion = torch.__version__\n    except ImportError:  # pragma: no cover\n        torchVersion = False\n    res['pytorch'] = torchVersion\n\n    try:\n        import tensorflow\n        tensorflowVersion = tensorflow.__version__\n    except ImportError:  # pragma: no cover\n        tensorflowVersion = False\n    res['tensorflow'] = tensorflowVersion\n\n    return res\n\n\ndef get_package_info(pkgs):\n    \"\"\" get package versions for the passed required & optional packages.\n\n    Using local imports to avoid import overhead on interpreter startup.\n    \"\"\"\n    import importlib\n\n    pversions = []\n    for modname, ver_f in pkgs:\n        try:\n            mod = importlib.import_module(modname)\n            ver = ver_f(mod)\n            pversions.append((modname, ver))\n        except Exception:  # pragma: no cover\n            pversions.append((modname, None))\n\n    return pversions\n"
  },
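  {
    "path": "examples/diagnostics_versions_demo.py",
    "content": "\"\"\"Hypothetical usage sketch (example only, not a file shipped with hangar).\n\nShows how the diagnostics in ecosystem.py are expected to be driven:\n`get_versions()` returns a dict with `host`, `packages`, and `optional`\nkeys, which can be pretty printed when triaging bug reports.\n\"\"\"\nfrom hangar.diagnostics.ecosystem import get_versions\n\n\ndef main():\n    info = get_versions()\n    # `host` is a list of (field, value) tuples describing the machine.\n    for field, value in info['host']:\n        print(f'{field:>12}: {value}')\n    # `packages` is a list of (name, version) tuples; version is None\n    # when the package could not be imported.\n    for name, version in info['packages']:\n        print(f'{name:>12}: {version}')\n    # `optional` maps package name -> version string, or False if the\n    # optional dependency is not installed.\n    for name, version in info['optional'].items():\n        print(f'{name:>12}: {version}')\n\n\nif __name__ == '__main__':\n    main()\n"
  },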
  {
    "path": "src/hangar/diagnostics/graphing.py",
    "content": "# -*- coding: utf-8 -*-\n\"\"\"\nPortions of this code have been taken and modified from the \"asciidag\" project.\n\nURL:      https://github.com/sambrightman/asciidag/\nFile:     asciidag/graph.py\nCommit:   7c1eefe3895630dc3906bbe9d553e0169202756a\nAccessed: 25 MAR 2019\n\nasciidag License\n-------------------------------------------------------------------------------\nLicense: Mozilla Public License 2.0\nURL:     https://github.com/sambrightman/asciidag/blob/7c1eefe3895630dc3906bbe9d553e0169202756a/LICENSE\n\"\"\"\n\nimport sys\nimport time\nfrom enum import Enum\n\n__all__ = ('Graph',)\n\nCOLOR_NORMAL = \"\"\nCOLOR_RESET = \"\\033[m\"\nCOLOR_BOLD = \"\\033[1m\"\nCOLOR_RED = \"\\033[31m\"\nCOLOR_GREEN = \"\\033[32m\"\nCOLOR_YELLOW = \"\\033[33m\"\nCOLOR_BLUE = \"\\033[34m\"\nCOLOR_MAGENTA = \"\\033[35m\"\nCOLOR_CYAN = \"\\033[36m\"\nCOLOR_BOLD_RED = \"\\033[1;31m\"\nCOLOR_BOLD_GREEN = \"\\033[1;32m\"\nCOLOR_BOLD_YELLOW = \"\\033[1;33m\"\nCOLOR_BOLD_BLUE = \"\\033[1;34m\"\nCOLOR_BOLD_MAGENTA = \"\\033[1;35m\"\nCOLOR_BOLD_CYAN = \"\\033[1;36m\"\nCOLOR_BG_RED = \"\\033[41m\"\nCOLOR_BG_GREEN = \"\\033[42m\"\nCOLOR_BG_YELLOW = \"\\033[43m\"\nCOLOR_BG_BLUE = \"\\033[44m\"\nCOLOR_BG_MAGENTA = \"\\033[45m\"\nCOLOR_BG_CYAN = \"\\033[46m\"\n\nCOLUMN_COLORS_ANSI = [\n    COLOR_BOLD_RED,\n    COLOR_BOLD_GREEN,\n    COLOR_BOLD_YELLOW,\n    COLOR_BOLD_BLUE,\n    COLOR_BOLD_MAGENTA,\n    COLOR_BOLD_CYAN,\n    COLOR_RED,\n    COLOR_GREEN,\n    COLOR_YELLOW,\n    COLOR_BLUE,\n    COLOR_MAGENTA,\n    COLOR_CYAN,\n    COLOR_RESET,\n]\n\n\nclass Column(object):  # pylint: disable=too-few-public-methods\n    \"\"\"A single column of output.\n\n    Attributes:\n        commit -- The parent commit of this column.\n        color  -- The color to (optionally) print this column in.\n                  This is an index into column_colors.\n\n    \"\"\"\n\n    def __init__(self, commit, color):\n        self.commit = commit\n        self.color = color\n\n\nclass GraphState(Enum):  # pylint: disable=too-few-public-methods\n    PADDING = 0\n    SKIP = 1\n    PRE_COMMIT = 2\n    COMMIT = 3\n    POST_MERGE = 4\n    COLLAPSING = 5\n\n\nclass Graph(object):  # pragma: no cover\n    \"\"\"\n    The commit currently being processed\n        struct commit *commit\n\n    The number of interesting parents that this commit has. Note that this is\n    not the same as the actual number of parents. This count excludes parents\n    that won't be printed in the graph output, as determined by\n    is_interesting().\n        int num_parents\n\n    The width of the graph output for this commit. All rows for this commit are\n    padded to this width, so that messages printed after the graph output are\n    aligned.\n        int width\n\n    The next expansion row to print when state is GraphState.PRE_COMMIT\n        int expansion_row\n\n    The current output state. This tells us what kind of line next_line()\n    should output.\n        enum graph_state state\n\n    The output state for the previous line of output. This is primarily used to\n    determine how the first merge line should appear, based on the last line of\n    the previous commit.\n        enum graph_state prev_state\n\n    The index of the column that refers to this commit. If none of the incoming\n    columns refer to this commit, this will be equal to num_columns.\n        int commit_index\n\n    The commit_index for the previously displayed commit. 
This is used to\n    determine how the first line of a merge graph output should appear, based\n    on the last line of the previous commit.\n        int prev_commit_index\n\n    The maximum number of columns that can be stored in the columns and\n    new_columns arrays. This is also half the number of entries that can be\n    stored in the mapping and new_mapping arrays.\n        int column_capacity\n\n    The number of columns (also called \"branch lines\" in some places)\n        int num_columns\n\n    The number of columns in the new_columns array\n        int num_new_columns\n\n    The number of entries in the mapping array\n        int mapping_size\n\n    The column state before we output the current commit.\n        struct column *columns\n\n    The new column state after we output the current commit. Only valid when\n    state is GraphState.COLLAPSING.\n        struct column *new_columns\n\n    An array that tracks the current state of each character in the output line\n    during state GraphState.COLLAPSING. Each entry is -1 if this character is\n    empty, or a non-negative integer if the character contains a branch line.\n    The value of the integer indicates the target position for this branch\n    line. (I.e., this array maps the current column positions to their desired\n    positions.)\n\n    The maximum capacity of this array is always sizeof(int) * 2 *\n    column_capacity.\n        int *mapping\n\n    A temporary array for computing the next mapping state while we are\n    outputting a mapping line. This is stored as part of the git_graph simply\n    so we don't have to allocate a new temporary array each time we have to\n    output a collapsing line.\n        int *new_mapping\n\n    The current default column color being used. This is stored as an index\n    into the array column_colors.\n        unsigned short default_column_color\n    \"\"\"\n    def __init__(self,\n                 fh=None,\n                 first_parent_only=False,\n                 use_color=True,\n                 column_colors=None):\n        \"\"\"State machine for processing DAG nodes into ASCII graphs.\n\n        show_nodes() deals with sorting the nodes from tips down into\n        topological order. It then displays them line-by-line.\n\n        \"\"\"\n        self.commit = None\n        self.buf = ''\n\n        if fh is None:\n            self.outfile = sys.stdout\n        else:\n            self.outfile = fh\n        self.first_parent_only = first_parent_only\n        self.use_color = use_color\n        if column_colors is None:\n            self.column_colors = COLUMN_COLORS_ANSI\n        else:\n            self.column_colors = column_colors\n\n        self.num_parents = 0\n        self.width = 0\n        self.expansion_row = 0\n        self.state = GraphState.PADDING\n        self.prev_state = GraphState.PADDING\n        self.commit_index = 0\n        self.prev_commit_index = 0\n        self.num_columns = 0\n        self.num_new_columns = 0\n        self.mapping_size = 0\n        # Start the column color at the maximum value, since we'll always\n        # increment it for the first commit we output. 
This way we start at 0\n        # for the first commit.\n        self.default_column_color = len(self.column_colors) - 1\n\n        self.columns = {}\n        self.new_columns = {}\n        self.mapping = {}\n        self.new_mapping = {}\n\n    def show_nodes(self, dag, spec, branch, start, order, stop='',\n                   *, show_time=True, show_user=True):\n        \"\"\"Printing function that displays a DAG representing the commit history\n\n        Print a revision history alongside a revision graph drawn with ASCII\n        characters. Nodes printed as an * character are parents of the working\n        directory. Any unreachable (but referenced nodes) are displayed at +\n\n        Parameters\n        ----------\n        dag : dict\n            directed acyclic graph of nodes and connections in commits. No more than\n            2 connections per node\n        spec: dict\n            dictionary of commit specification (user name, email, message, etc).\n        branch : dict\n            dict of commit hash -> list of branch names whose HEAD commit is at\n            that key.\n        start : string\n            commit hash to act as the top of the topological sort.\n        order: list\n            time based ordering of commit hashs\n        stop : str, optional\n            commit hash to stop generating the graph at if the DAG contains more\n            history than is needed (the default is '', which is the \"parent\" of\n            the initial repository commit.)\n        \"\"\"\n        if start == stop:\n            return\n\n        fmtSpec = {}\n        for cmt, cmtspec in spec.items():\n            if show_time:\n                t = f\"({time.strftime('%d%b%Y %H:%M:%S', time.gmtime(cmtspec['commit_time']))})\"\n            else:\n                t = ''\n            if show_user:\n                u = f\"({cmtspec['commit_user']})\"\n            else:\n                u = ''\n            m = cmtspec['commit_message']\n            br = ' '\n            if cmt in branch:\n                for branchName in branch[cmt]:\n                    if self.use_color is True:\n                        br = f'{br}({COLOR_BOLD_RED}{branchName}{COLOR_RESET}) '\n                    else:\n                        br = f'{br}({branchName}) '\n            fmtSpec[cmt] = f'{cmt}{br}{t}{u}: {m}'\n\n        for rev in order:\n            parents = dag[rev]\n            self._update(rev, parents)\n            self._show_commit()\n            self.outfile.write(fmtSpec[rev])\n            if not self._is_commit_finished():\n                self.outfile.write('\\n')\n                self._show_remainder()\n            self.outfile.write('\\n')\n\n    def _write_column(self, col, col_char):\n        if col.color is not None:\n            self.buf += self.column_colors[col.color]\n        self.buf += col_char\n        if col.color is not None:\n            self.buf += self.column_colors[-1]\n\n    def _update_state(self, state):\n        self.prev_state = self.state\n        self.state = state\n\n    def _interesting_parents(self):\n        for parent in self.commit_parents:\n            yield parent\n            if self.first_parent_only:\n                break\n\n    def _get_current_column_color(self):\n        if not self.use_color:\n            return None\n        return self.default_column_color\n\n    def _increment_column_color(self):\n        self.default_column_color = ((self.default_column_color + 1)\n                                     % len(self.column_colors))\n\n    def 
_find_commit_color(self, commit):\n        for i in range(self.num_columns):\n            if self.columns[i].commit == commit:\n                return self.columns[i].color\n        return self._get_current_column_color()\n\n    def _insert_into_new_columns(self, commit, mapping_index):\n        \"\"\"\n        If the commit is already in the new_columns list, we don't need to add\n        it. Just update the mapping correctly.\n        \"\"\"\n        for i in range(self.num_new_columns):\n            if self.new_columns[i].commit == commit:\n                self.mapping[mapping_index] = i\n                return mapping_index + 2\n\n        # This commit isn't already in new_columns. Add it.\n        column = Column(commit, self._find_commit_color(commit))\n        self.new_columns[self.num_new_columns] = column\n        self.mapping[mapping_index] = self.num_new_columns\n        self.num_new_columns += 1\n        return mapping_index + 2\n\n    def _update_width(self, is_commit_in_existing_columns):\n        \"\"\"\n        Compute the width needed to display the graph for this commit. This is\n        the maximum width needed for any row. All other rows will be padded to\n        this width.\n\n        Compute the number of columns in the widest row: Count each existing\n        column (self.num_columns), and each new column added by this commit.\n        \"\"\"\n        max_cols = self.num_columns + self.num_parents\n\n        # Even if the current commit has no parents to be printed, it still\n        # takes up a column for itself.\n        if self.num_parents < 1:\n            max_cols += 1\n\n        # We added a column for the current commit as part of self.num_parents.\n        # If the current commit was already in self.columns, then we have double\n        # counted it.\n        if is_commit_in_existing_columns:\n            max_cols -= 1\n\n        # Each column takes up 2 spaces\n        self.width = max_cols * 2\n\n    def _update_columns(self):\n        \"\"\"\n        Swap self.columns with self.new_columns self.columns contains the state\n        for the previous commit, and new_columns now contains the state for our\n        commit.\n\n        We'll re-use the old columns array as storage to compute the new columns\n        list for the commit after this one.\n        \"\"\"\n        self.columns, self.new_columns = self.new_columns, self.columns\n        self.num_columns = self.num_new_columns\n        self.num_new_columns = 0\n\n        # Now update new_columns and mapping with the information for the commit\n        # after this one.\n        #\n        # First, make sure we have enough room. At most, there will be\n        # self.num_columns + self.num_parents columns for the next commit.\n        max_new_columns = self.num_columns + self.num_parents\n\n        # Clear out self.mapping\n        self.mapping_size = 2 * max_new_columns\n        for i in range(self.mapping_size):\n            self.mapping[i] = -1\n\n        # Populate self.new_columns and self.mapping\n        #\n        # Some of the parents of this commit may already be in self.columns. If\n        # so, self.new_columns should only contain a single entry for each such\n        # commit. 
self.mapping should contain information about where each\n        # current branch line is supposed to end up after the collapsing is\n        # performed.\n        seen_this = False\n        mapping_idx = 0\n        is_commit_in_columns = True\n        for i in range(self.num_columns + 1):\n            if i == self.num_columns:\n                if seen_this:\n                    break\n                is_commit_in_columns = False\n                col_commit = self.commit\n            else:\n                col_commit = self.columns[i].commit\n\n            if col_commit == self.commit:\n                old_mapping_idx = mapping_idx\n                seen_this = True\n                self.commit_index = i\n                for parent in self._interesting_parents():\n                    # If this is a merge, or the start of a new childless\n                    # column, increment the current color.\n                    if self.num_parents > 1 or not is_commit_in_columns:\n                        self._increment_column_color()\n                    mapping_idx = self._insert_into_new_columns(\n                        parent,\n                        mapping_idx)\n                # We always need to increment mapping_idx by at least 2, even if\n                # it has no interesting parents. The current commit always takes\n                # up at least 2 spaces.\n                if mapping_idx == old_mapping_idx:\n                    mapping_idx += 2\n            else:\n                mapping_idx = self._insert_into_new_columns(col_commit,\n                                                            mapping_idx)\n\n        # Shrink mapping_size to be the minimum necessary\n        while (self.mapping_size > 1 and\n               self.mapping[self.mapping_size - 1] < 0):\n            self.mapping_size -= 1\n\n        # Compute self.width for this commit\n        self._update_width(is_commit_in_columns)\n\n    def _update(self, commit, parents):\n        self.commit = commit\n        self.commit_parents = parents\n        self.num_parents = len(list(self._interesting_parents()))\n\n        # Store the old commit_index in prev_commit_index.\n        # update_columns() will update self.commit_index for this commit.\n        self.prev_commit_index = self.commit_index\n\n        # Call update_columns() to update\n        # columns, new_columns, and mapping.\n        self._update_columns()\n        self.expansion_row = 0\n\n        # Update self.state.\n        # Note that we don't call update_state() here, since we don't want to\n        # update self.prev_state. No line for self.state was ever printed.\n        #\n        # If the previous commit didn't get to the GraphState.PADDING state, it\n        # never finished its output. 
Goto GraphState.SKIP, to print out a line\n        # to indicate that portion of the graph is missing.\n        #\n        # If there are 3 or more parents, we may need to print extra rows before\n        # the commit, to expand the branch lines around it and make room for it.\n        # We need to do this only if there is a branch row (or more) to the\n        # right of this commit.\n        #\n        # If less than 3 parents, we can immediately print the commit line.\n        if self.state != GraphState.PADDING:\n            self.state = GraphState.SKIP\n        elif (self.num_parents >= 3 and\n              self.commit_index < (self.num_columns - 1)):\n            self.state = GraphState.PRE_COMMIT  # noqa: E501 pylint: disable=redefined-variable-type\n        else:\n            self.state = GraphState.COMMIT\n\n    def _is_mapping_correct(self):\n        \"\"\"\n        The mapping is up to date if each entry is at its target, or is 1\n        greater than its target. (If it is 1 greater than the target, '/' will\n        be printed, so it will look correct on the next row.)\n        \"\"\"\n        for i in range(self.mapping_size):\n            target = self.mapping[i]\n            if target < 0:\n                continue\n            if target == i // 2:\n                continue\n            return False\n        return True\n\n    def _pad_horizontally(self, chars_written):\n        \"\"\"Add spaces to string end so all lines of a commit have the same width.\n\n        This way, fields printed to the right of the graph will remain aligned\n        for the entire commit.\n        \"\"\"\n        if chars_written >= self.width:\n            return\n\n        extra = self.width - chars_written\n        self.buf += ' ' * extra\n\n    def _output_padding_line(self):\n        \"\"\"Output a padding row, that leaves all branch lines unchanged\n        \"\"\"\n        for i in range(self.num_new_columns):\n            self._write_column(self.new_columns[i], '|')\n            self.buf += ' '\n\n        self._pad_horizontally(self.num_new_columns * 2)\n\n    def _output_skip_line(self):\n        \"\"\"Output an ellipsis to indicate that a portion of the graph is missing.\n        \"\"\"\n        self.buf += '...'\n        self._pad_horizontally(3)\n\n        if self.num_parents >= 3 and self.commit_index < self.num_columns - 1:\n            self._update_state(GraphState.PRE_COMMIT)\n        else:\n            self._update_state(GraphState.COMMIT)\n\n    def _output_pre_commit_line(self):\n        \"\"\"Formats a row with increased space around a commit with multiple parents.\n\n        This is done in order to make room for the commit. It should only be\n        called when there are 3 or more parents. 
We need 2 extra rows for every\n        parent over 2.\n        \"\"\"\n        assert self.num_parents >= 3, 'not enough parents to add expansion row'\n        num_expansion_rows = (self.num_parents - 2) * 2\n\n        # self.expansion_row tracks the current expansion row we are on.\n        # It should be in the range [0, num_expansion_rows - 1]\n        assert (0 <= self.expansion_row < num_expansion_rows), \\\n            'wrong number of expansion rows'\n\n        # Output the row\n        seen_this = False\n        chars_written = 0\n        for i in range(self.num_columns):\n            col = self.columns[i]\n            if col.commit == self.commit:\n                seen_this = True\n                self._write_column(col, '|')\n                self.buf += ' ' * self.expansion_row\n                chars_written += 1 + self.expansion_row\n            elif seen_this and (self.expansion_row == 0):\n                # This is the first line of the pre-commit output. If the\n                # previous commit was a merge commit and ended in the\n                # GraphState.POST_MERGE state, all branch lines after\n                # self.prev_commit_index were printed as \"\\\" on the previous\n                # line. Continue to print them as \"\\\" on this line. Otherwise,\n                # print the branch lines as \"|\".\n                if (self.prev_state == GraphState.POST_MERGE and\n                        self.prev_commit_index < i):\n                    self._write_column(col, '\\\\')\n                else:\n                    self._write_column(col, '|')\n                chars_written += 1\n            elif seen_this and (self.expansion_row > 0):\n                self._write_column(col, '\\\\')\n                chars_written += 1\n            else:\n                self._write_column(col, '|')\n                chars_written += 1\n            self.buf += ' '\n            chars_written += 1\n\n        self._pad_horizontally(chars_written)\n\n        # Increment self.expansion_row, and move to state GraphState.COMMIT if\n        # necessary\n        self.expansion_row += 1\n        if self.expansion_row >= num_expansion_rows:\n            self._update_state(GraphState.COMMIT)\n\n    # Draw an octopus merge and return the number of characters written.\n    def _draw_octopus_merge(self):\n        \"\"\"\n        Here dashless_commits represents the number of parents which don't\n        need to have dashes (because their edges fit neatly under the commit).\n        \"\"\"\n        dashless_commits = 2\n        num_dashes = ((self.num_parents - dashless_commits) * 2) - 1\n        for i in range(num_dashes):\n            col_num = i // 2 + dashless_commits + self.commit_index\n            self._write_column(self.new_columns[col_num], '-')\n        col_num = num_dashes // 2 + dashless_commits + self.commit_index\n        self._write_column(self.new_columns[col_num], '.')\n        return num_dashes + 1\n\n    def _output_commit_line(self):  # noqa: C901, E501 pylint: disable=too-many-branches\n        \"\"\"\n        Output the row containing this commit Iterate up to and including\n        self.num_columns, since the current commit may not be in any of the\n        existing columns. 
(This happens when the current commit doesn't have\n        any children that we have already processed.)\n        \"\"\"\n        seen_this = False\n        chars_written = 0\n        for i in range(self.num_columns + 1):\n            if i == self.num_columns:\n                if seen_this:\n                    break\n                col_commit = self.commit\n            else:\n                col = self.columns[i]\n                col_commit = self.columns[i].commit\n\n            if col_commit == self.commit:\n                seen_this = True\n                self.buf += '*'\n                chars_written += 1\n\n                if self.num_parents > 2:\n                    chars_written += self._draw_octopus_merge()\n            elif seen_this and self.num_parents > 2:\n                self._write_column(col, '\\\\')\n                chars_written += 1\n            elif seen_this and self.num_parents == 2:\n                # This is a 2-way merge commit. There is no\n                # GraphState.PRE_COMMIT stage for 2-way merges, so this is the\n                # first line of output for this commit. Check to see what the\n                # previous line of output was.\n                #\n                # If it was GraphState.POST_MERGE, the branch line coming into\n                # this commit may have been '\\', and not '|' or '/'. If so,\n                # output the branch line as '\\' on this line, instead of '|'.\n                # This makes the output look nicer.\n                if (self.prev_state == GraphState.POST_MERGE and\n                        self.prev_commit_index < i):\n                    self._write_column(col, '\\\\')\n                else:\n                    self._write_column(col, '|')\n                chars_written += 1\n            else:\n                self._write_column(col, '|')\n                chars_written += 1\n            self.buf += ' '\n            chars_written += 1\n\n        self._pad_horizontally(chars_written)\n        if self.num_parents > 1:\n            self._update_state(GraphState.POST_MERGE)\n        elif self._is_mapping_correct():\n            self._update_state(GraphState.PADDING)\n        else:\n            self._update_state(GraphState.COLLAPSING)\n\n    def _find_new_column_by_commit(self, commit):\n        for i in range(self.num_new_columns):\n            if self.new_columns[i].commit == commit:\n                return self.new_columns[i]\n        return None\n\n    def _output_post_merge_line(self):\n        seen_this = False\n        chars_written = 0\n        for i in range(self.num_columns + 1):\n            if i == self.num_columns:\n                if seen_this:\n                    break\n                col_commit = self.commit\n            else:\n                col = self.columns[i]\n                col_commit = col.commit\n\n            if col_commit == self.commit:\n                # Since the current commit is a merge find the columns for the\n                # parent commits in new_columns and use those to format the\n                # edges.\n                seen_this = True\n                parents = self._interesting_parents()\n                assert parents, 'merge has no parents'\n                par_column = self._find_new_column_by_commit(next(parents))\n                assert par_column, 'parent column not found'\n                self._write_column(par_column, '|')\n                chars_written += 1\n                for parent in parents:\n                    assert parent, 'parent is not valid'\n       
             par_column = self._find_new_column_by_commit(parent)\n                    assert par_column, 'parent column not found'\n                    self._write_column(par_column, '\\\\')\n                    self.buf += ' '\n                chars_written += (self.num_parents - 1) * 2\n            elif seen_this:\n                self._write_column(col, '\\\\')\n                self.buf += ' '\n                chars_written += 2\n            else:\n                self._write_column(col, '|')\n                self.buf += ' '\n                chars_written += 2\n\n        self._pad_horizontally(chars_written)\n\n        if self._is_mapping_correct():\n            self._update_state(GraphState.PADDING)\n        else:\n            self._update_state(GraphState.COLLAPSING)\n\n    def _output_collapsing_line(self):  # noqa: C901, E501 pylint: disable=too-many-branches\n        used_horizontal = False\n        horizontal_edge = -1\n        horizontal_edge_target = -1\n\n        # Clear out the new_mapping array\n        for i in range(self.mapping_size):\n            self.new_mapping[i] = -1\n\n        for i in range(self.mapping_size):\n            target = self.mapping[i]\n            if target < 0:\n                continue\n\n            # Since update_columns() always inserts the leftmost column first,\n            # each branch's target location should always be either its current\n            # location or to the left of its current location.\n            #\n            # We never have to move branches to the right. This makes the graph\n            # much more legible, since whenever branches cross, only one is\n            # moving directions.\n            assert target * 2 <= i, \\\n                'position {} targetting column {}'.format(i, target * 2)\n\n            if target * 2 == i:\n                # This column is already in the correct place\n                assert self.new_mapping[i] == -1\n                self.new_mapping[i] = target\n            elif self.new_mapping[i - 1] < 0:\n                # Nothing is to the left. Move to the left by one.\n                self.new_mapping[i - 1] = target\n                # If there isn't already an edge moving horizontally select this one.\n                if horizontal_edge == -1:\n                    horizontal_edge = i\n                    horizontal_edge_target = target\n                    # The variable target is the index of the graph column, and\n                    # therefore target * 2 + 3 is the actual screen column of\n                    # the first horizontal line.\n                    for j in range((target * 2) + 3, i - 2, 2):\n                        self.new_mapping[j] = target\n            elif self.new_mapping[i - 1] == target:\n                # There is a branch line to our left already, and it is our\n                # target. 
We combine with this line, since we share the same\n                # parent commit.\n                #\n                # We don't have to add anything to the output or new_mapping,\n                # since the existing branch line has already taken care of it.\n                pass\n            else:\n                # There is a branch line to our left, but it isn't our target.\n                # We need to cross over it.\n                #\n                # The space just to the left of this branch should always be empty.\n                #\n                # The branch to the left of that space should be our eventual target.\n                assert self.new_mapping[i - 1] > target\n                assert self.new_mapping[i - 2] < 0\n                assert self.new_mapping[i - 3] == target\n                self.new_mapping[i - 2] = target\n                # Mark this branch as the horizontal edge to prevent any other\n                # edges from moving horizontally.\n                if horizontal_edge == -1:\n                    horizontal_edge = i\n\n        # The new mapping may be 1 smaller than the old mapping\n        if self.new_mapping[self.mapping_size - 1] < 0:\n            self.mapping_size -= 1\n\n        # Output a line based on the new mapping info\n        for i in range(self.mapping_size):\n            target = self.new_mapping[i]\n            if target < 0:\n                self.buf += ' '\n            elif target * 2 == i:\n                self._write_column(self.new_columns[target], '|')\n            elif target == horizontal_edge_target and i != horizontal_edge - 1:\n                # Set the mappings for all but the first segment to -1 so that\n                # they won't continue into the next line.\n                if i != (target * 2) + 3:\n                    self.new_mapping[i] = -1\n                used_horizontal = True\n                self._write_column(self.new_columns[target], '_')\n            else:\n                if used_horizontal and i < horizontal_edge:\n                    self.new_mapping[i] = -1\n                self._write_column(self.new_columns[target], '/')\n\n        self._pad_horizontally(self.mapping_size)\n        self.mapping, self.new_mapping = self.new_mapping, self.mapping\n\n        # If self.mapping indicates that all of the branch lines are already in\n        # the correct positions, we are done. Otherwise, we need to collapse\n        # some branch lines together.\n        if self._is_mapping_correct():\n            self._update_state(GraphState.PADDING)\n\n    def _next_line(self):  # pylint: disable=too-many-return-statements\n        if self.state == GraphState.PADDING:\n            self._output_padding_line()\n            return False\n        elif self.state == GraphState.SKIP:\n            self._output_skip_line()\n            return False\n        elif self.state == GraphState.PRE_COMMIT:\n            self._output_pre_commit_line()\n            return False\n        elif self.state == GraphState.COMMIT:\n            self._output_commit_line()\n            return True\n        elif self.state == GraphState.POST_MERGE:\n            self._output_post_merge_line()\n            return False\n        elif self.state == GraphState.COLLAPSING:\n            self._output_collapsing_line()\n            return False\n        else:\n            return False\n\n    def _padding_line(self):\n        \"\"\"Output a padding line in the graph.\n\n        This is similar to next_line(). 
However, it is guaranteed to never print\n        the current commit line. Instead, if the commit line is next, it will\n        simply output a line of vertical padding, extending the branch lines\n        downwards, but leaving them otherwise unchanged.\n        \"\"\"\n        if self.state != GraphState.COMMIT:\n            self._next_line()\n            return\n\n        # Output the row containing this commit\n        # Iterate up to and including self.num_columns, since the current commit\n        # may not be in any of the existing columns. (This happens when the\n        # current commit doesn't have any children that we have already\n        # processed.)\n        for i in range(self.num_columns):\n            col = self.columns[i]\n            self._write_column(col, '|')\n            if col.commit == self.commit and self.num_parents > 2:\n                self.buf += ' ' * (self.num_parents - 2) * 2\n            else:\n                self.buf += ' '\n\n        self._pad_horizontally(self.num_columns)\n\n        # Update self.prev_state since we have output a padding line\n        self.prev_state = GraphState.PADDING\n\n    def _is_commit_finished(self):\n        return self.state == GraphState.PADDING\n\n    def _show_commit(self):\n        shown_commit_line = False\n\n        # When showing a diff of a merge against each of its parents, we are\n        # called once for each parent without update having been called. In this\n        # case, simply output a single padding line.\n        if self._is_commit_finished():\n            self._show_padding()\n            shown_commit_line = True\n\n        while not shown_commit_line and not self._is_commit_finished():\n            shown_commit_line = self._next_line()\n            self.outfile.write(self.buf)\n            if not shown_commit_line:\n                self.outfile.write('\\n')\n            self.buf = ''\n\n    def _show_padding(self):\n        self._padding_line()\n        self.outfile.write(self.buf)\n        self.buf = ''\n\n    def _show_remainder(self):\n        shown = False\n\n        if self._is_commit_finished():\n            return False\n\n        while True:\n            self._next_line()\n            self.outfile.write(self.buf)\n            self.buf = ''\n            shown = True\n\n            if not self._is_commit_finished():\n                self.outfile.write('\\n')\n            else:\n                break\n\n        return shown\n"
  },
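  {
    "path": "examples/diagnostics_graph_demo.py",
    "content": "\"\"\"Hypothetical usage sketch (example only, not a file shipped with hangar).\n\nDrives `Graph.show_nodes()` from graphing.py with a tiny hand-built\nhistory: an initial commit, one commit each on `master` and `test`, and\na merge of `test` back into `master`. The dag/spec/branch/order values\nmirror the parameter descriptions in the `show_nodes` docstring; the\ncommit digests are made-up placeholders.\n\"\"\"\nimport time\n\nfrom hangar.diagnostics import Graph\n\n# rev -> list of parent revs ('' is the parent of the initial commit)\ndag = {\n    'd4': ['c3', 'b2'],  # merge commit with two parents\n    'c3': ['a1'],\n    'b2': ['a1'],\n    'a1': [''],\n}\nnow = time.time()\nmsgs = [('a1', 'initial commit'), ('b2', 'work on test'),\n        ('c3', 'work on master'), ('d4', 'merge test into master')]\nspec = {\n    cmt: {'commit_time': now, 'commit_user': 'demo@example.com',\n          'commit_message': msg}\n    for cmt, msg in msgs\n}\nbranch = {'d4': ['master'], 'b2': ['test']}\norder = ['d4', 'c3', 'b2', 'a1']  # tips-first time ordering\n\ng = Graph(use_color=False)\ng.show_nodes(dag=dag, spec=spec, branch=branch, start='d4', order=order, stop='')\n"
  },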
  {
    "path": "src/hangar/diagnostics/integrity.py",
    "content": "from pathlib import Path\nimport warnings\n\nimport lmdb\nfrom tqdm import tqdm\n\nfrom ..records import (\n    hash_data_db_key_from_raw_key,\n    hash_schema_db_key_from_raw_key,\n)\nfrom ..backends import BACKEND_ACCESSOR_MAP\nfrom ..txnctx import TxnRegister\nfrom ..records import commiting, hashmachine, hashs, parsing, queries, heads\nfrom ..op_state import report_corruption_risk_on_parsing_error\n\n\n@report_corruption_risk_on_parsing_error\ndef _verify_column_integrity(hashenv: lmdb.Environment, repo_path: Path):\n\n    hq = hashs.HashQuery(hashenv)\n    narrays, nremote = hq.num_data_records(), 0\n    array_kvs = hq.gen_all_data_digests_and_parsed_backend_specs()\n    try:\n        bes = {}\n        for digest, spec in tqdm(array_kvs, total=narrays, desc='verifying column data'):\n            if spec.backend not in bes:\n                bes[spec.backend] = BACKEND_ACCESSOR_MAP[spec.backend](repo_path, None, None)\n                bes[spec.backend].open(mode='r')\n            if spec.islocal is False:\n                nremote += 1\n                continue\n            data = bes[spec.backend].read_data(spec)\n            tcode = hashmachine.hash_type_code_from_digest(digest)\n\n            hash_func = hashmachine.hash_func_from_tcode(tcode)\n            calc_digest = hash_func(data)\n            if calc_digest != digest:\n                raise RuntimeError(\n                    f'Data corruption detected for array. Expected digest `{digest}` '\n                    f'currently mapped to spec `{spec}`. Found digest `{calc_digest}`')\n        if nremote > 0:\n            warnings.warn(\n                'Can not verify integrity of partially fetched array data references. '\n                f'For complete proof, fetch all remote data locally. Did not verify '\n                f'{nremote}/{narrays} arrays', RuntimeWarning)\n    finally:\n        for be in bes.keys():\n            bes[be].close()\n\n\n@report_corruption_risk_on_parsing_error\ndef _verify_schema_integrity(hashenv: lmdb.Environment):\n\n    hq = hashs.HashQuery(hashenv)\n    schema_kvs = hq.gen_all_schema_digests_and_parsed_specs()\n    nschemas = hq.num_schema_records()\n    for digest, val in tqdm(schema_kvs, total=nschemas, desc='verifying schemas'):\n        tcode = hashmachine.hash_type_code_from_digest(digest)\n        hash_func = hashmachine.hash_func_from_tcode(tcode)\n        calc_digest = hash_func(val)\n        if calc_digest != digest:\n            raise RuntimeError(\n                f'Data corruption detected for schema. Expected digest `{digest}` '\n                f'currently mapped to spec `{val}`. Found digest `{calc_digest}`')\n\n\n@report_corruption_risk_on_parsing_error\ndef _verify_commit_tree_integrity(refenv: lmdb.Environment):\n\n    initialCmt = None\n    all_commits = set(commiting.list_all_commits(refenv))\n    reftxn = TxnRegister().begin_reader_txn(refenv)\n    try:\n        for cmt in tqdm(all_commits, desc='verifying commit trees'):\n            pKey = parsing.commit_parent_db_key_from_raw_key(cmt)\n            pVal = reftxn.get(pKey, default=False)\n            if pVal is False:\n                raise RuntimeError(\n                    f'Data corruption detected for parent ref of commit `{cmt}`. 
'\n                    f'Parent ref not recorded in refs db.')\n\n            p_val = parsing.commit_parent_raw_val_from_db_val(pVal)\n            parents = p_val.ancestor_spec\n            if parents.master_ancestor != '':\n                if parents.master_ancestor not in all_commits:\n                    raise RuntimeError(\n                        f'Data corruption detected in commit tree. Commit `{cmt}` '\n                        f'with ancestors val `{parents}` references non-existing '\n                        f'master ancestor `{parents.master_ancestor}`.')\n            if parents.dev_ancestor != '':\n                if parents.dev_ancestor not in all_commits:\n                    raise RuntimeError(\n                        f'Data corruption detected in commit tree. Commit `{cmt}` '\n                        f'with ancestors val `{parents}` references non-existing '\n                        f'dev ancestor `{parents.dev_ancestor}`.')\n            if (parents.master_ancestor == '') and (parents.dev_ancestor == ''):\n                if initialCmt is not None:\n                    raise RuntimeError(\n                        f'Commit tree integrity compromised. Multiple \"initial\" (commits '\n                        f'with no parents) found. First `{initialCmt}`, second `{cmt}`')\n                else:\n                    initialCmt = cmt\n    finally:\n        TxnRegister().abort_reader_txn(refenv)\n\n\n@report_corruption_risk_on_parsing_error\ndef _verify_commit_ref_digests_exist(hashenv: lmdb.Environment, refenv: lmdb.Environment):\n\n    all_commits = commiting.list_all_commits(refenv)\n    datatxn = TxnRegister().begin_reader_txn(hashenv, buffer=True)\n    try:\n        with datatxn.cursor() as cur:\n            for cmt in tqdm(all_commits, desc='verifying commit ref digests'):\n                with commiting.tmp_cmt_env(refenv, cmt) as tmpDB:\n                    rq = queries.RecordQuery(tmpDB)\n                    array_data_digests = set(rq.data_hashes())\n                    schema_digests = set(rq.schema_hashes())\n\n                    for datadigest in array_data_digests:\n                        dbk = hash_data_db_key_from_raw_key(datadigest)\n                        exists = cur.set_key(dbk)\n                        if exists is False:\n                            raise RuntimeError(\n                                f'Data corruption detected in commit refs. Commit `{cmt}` '\n                                f'references array data digest `{datadigest}` which does not '\n                                f'exist in data hash db.')\n\n                    for schemadigest in schema_digests:\n                        dbk = hash_schema_db_key_from_raw_key(schemadigest)\n                        exists = cur.set_key(dbk)\n                        if exists is False:\n                            raise RuntimeError(\n                                f'Data corruption detected in commit refs. Commit `{cmt}` '\n                                f'references schema digest `{schemadigest}` which does not '\n                                f'exist in data hash db.')\n\n    finally:\n        TxnRegister().abort_reader_txn(hashenv)\n\n\n@report_corruption_risk_on_parsing_error\ndef _verify_branch_integrity(branchenv: lmdb.Environment, refenv: lmdb.Environment):\n\n    branch_names = heads.get_branch_names(branchenv)\n    if len(branch_names) < 1:\n        raise RuntimeError(\n            f'Branch map compromised. Repo must contain atleast one branch. 
'\n            f'Found {len(branch_names)} branches.')\n\n    for bname in tqdm(branch_names, desc='verifying branches'):\n        bhead = heads.get_branch_head_commit(branchenv=branchenv, branch_name=bname)\n        exists = commiting.check_commit_hash_in_history(refenv=refenv, commit_hash=bhead)\n        if exists is False:\n            raise RuntimeError(\n                f'Branch commit map compromised. Branch name `{bname}` references '\n                f'commit digest `{bhead}` which does not exist in refs db.')\n\n    staging_bname = heads.get_staging_branch_head(branchenv)\n    if staging_bname not in branch_names:\n        raise RuntimeError(\n            f'Brach commit map compromised. Staging head refers to branch name '\n            f'`{staging_bname}` which does not exist in the branch db.')\n\n\ndef run_verification(branchenv: lmdb.Environment,\n                     hashenv: lmdb.Environment,\n                     refenv: lmdb.Environment,\n                     repo_path: Path):\n\n    _verify_branch_integrity(branchenv, refenv)\n    _verify_commit_tree_integrity(refenv)\n    _verify_commit_ref_digests_exist(hashenv, refenv)\n    _verify_schema_integrity(hashenv)\n    _verify_column_integrity(hashenv, repo_path)\n"
  },
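  {
    "path": "examples/diff_conflicts_demo.py",
    "content": "\"\"\"Hypothetical usage sketch (example only, not a file shipped with hangar).\n\nExercises `find_conflicts` from diff.py (next entry) with hand-built\nDiffOutDB structures to show how the conflict classes are detected. The\nkey/value byte strings are made-up placeholders.\n\"\"\"\nfrom hangar.diff import DiffOutDB, find_conflicts\n\n# Changes computed from the common ancestor to each branch HEAD.\nmaster = DiffOutDB(\n    added={(b'f:col1:k1', b'digest-m1')},     # k1 also added on dev, different value -> t1\n    deleted={(b'f:col1:k2', b'digest-old')},  # k2 deleted here, mutated on dev -> t21\n    mutated={(b'f:col1:k3', b'digest-m3')},   # k3 mutated on both sides -> t3\n)\ndev = DiffOutDB(\n    added={(b'f:col1:k1', b'digest-d1')},\n    deleted=set(),\n    mutated={(b'f:col1:k2', b'digest-new'), (b'f:col1:k3', b'digest-d3')},\n)\n\nconflicts = find_conflicts(master, dev)\nprint(conflicts.conflict)             # True\nprint([k for k, _ in conflicts.t1])   # [b'f:col1:k1']\nprint([k for k, _ in conflicts.t21])  # [b'f:col1:k2']\nprint([k for k, _ in conflicts.t3])   # [b'f:col1:k3']\n"
  },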
  {
    "path": "src/hangar/diff.py",
    "content": "from itertools import starmap\nfrom typing import Iterable, List, NamedTuple, Set, Tuple, Union\n\nimport lmdb\n\nfrom .records import (\n    dynamic_layout_data_record_from_db_key,\n    schema_column_record_from_db_key,\n    data_record_digest_val_from_db_val,\n    ColumnSchemaKey,\n    FlatColumnDataKey,\n    NestedColumnDataKey,\n)\nfrom .records.commiting import (\n    check_commit_hash_in_history,\n    get_commit_ancestors_graph,\n    get_commit_ref,\n    get_commit_spec,\n    tmp_cmt_env,\n)\nfrom .records.heads import get_branch_head_commit, get_branch_names\nfrom .records.queries import RecordQuery\nfrom .txnctx import TxnRegister\n\n# ------------------------- Differ Types --------------------------------------\n\n\nclass HistoryDiffStruct(NamedTuple):\n    masterHEAD: str\n    devHEAD: str\n    ancestorHEAD: str\n    canFF: bool\n\n\nclass Changes(NamedTuple):\n    schema: dict\n    samples: tuple\n\n\nclass DiffOutDB(NamedTuple):\n    added: Set[Tuple[bytes, bytes]]\n    deleted: Set[Tuple[bytes, bytes]]\n    mutated: Set[Tuple[bytes, bytes]]\n\n\nclass DiffOut(NamedTuple):\n    added: Changes\n    deleted: Changes\n    mutated: Changes\n\n\nConflictKeys = Union[str, FlatColumnDataKey, NestedColumnDataKey, ColumnSchemaKey]\n\n\nclass Conflicts(NamedTuple):\n    \"\"\"Four types of conflicts are accessible through this object.\n\n    Attributes\n    ----------\n    t1\n        Addition of key in master AND dev with different values.\n    t21\n        Removed key in master, mutated value in dev.\n    t22\n        Removed key in dev, mutated value in master.\n    t3\n        Mutated key in both master AND dev to different values.\n    conflict\n        Bool indicating if any type of conflict is present.\n    \"\"\"\n    t1: Iterable[ConflictKeys]\n    t21: Iterable[ConflictKeys]\n    t22: Iterable[ConflictKeys]\n    t3: Iterable[ConflictKeys]\n    conflict: bool\n\n\nclass DiffAndConflictsDB(NamedTuple):\n    diff: DiffOutDB\n    conflict: Conflicts\n\n\nclass DiffAndConflicts(NamedTuple):\n    diff: DiffOut\n    conflict: Conflicts\n\n\n# ------------------------------- Differ Methods ------------------------------\n\n\ndef diff_envs(base_env: lmdb.Environment, head_env: lmdb.Environment, ) -> DiffOutDB:\n    \"\"\"Main diff algorithm to determine changes between unpacked lmdb environments.\n\n    Parameters\n    ----------\n    base_env : lmdb.Environment\n        starting point to calculate changes from\n    head_env : lmdb.Environment\n        some commit which should be compared to BASE\n\n    Returns\n    -------\n    DiffOutDB\n        iterable of db formatted key/value pairs for `added`, `deleted`,\n        `mutated` fields\n    \"\"\"\n    added, deleted, mutated = [], [], []\n\n    baseTxn = TxnRegister().begin_reader_txn(base_env)\n    headTxn = TxnRegister().begin_reader_txn(head_env)\n    baseCur = baseTxn.cursor()\n    headCur = headTxn.cursor()\n    try:\n        moreBase = baseCur.first()\n        moreHead = headCur.first()\n\n        while True:\n            if moreBase and moreHead:\n                bKey, bVal = baseCur.item()\n                hKey, hVal = headCur.item()\n            elif (not moreBase) and (not moreHead):\n                break\n            # necessary to avoid deadlock at last items\n            elif not moreBase:\n                bKey = b'x'\n                bVal = b''\n                hKey, hVal = headCur.item()\n            else:  # (not moreHead)\n                hKey = b'x'\n                hVal = b''\n                bKey, 
bVal = baseCur.item()\n\n            # inserted\n            if bKey > hKey:\n                added.append((hKey, hVal))\n                moreHead = headCur.next()\n                continue\n            # deleted\n            elif bKey < hKey:\n                deleted.append((bKey, bVal))\n                moreBase = baseCur.next()\n                continue\n            # no change\n            elif (bKey == hKey) and (bVal == hVal):\n                moreBase = baseCur.next()\n                moreHead = headCur.next()\n                continue\n            # mutated\n            else:  # (bKey == hKey) and (bVal != hVal)\n                mutated.append((hKey, hVal))\n                moreBase = baseCur.next()\n                moreHead = headCur.next()\n                continue\n\n    finally:\n        baseCur.close()\n        headCur.close()\n        TxnRegister().abort_reader_txn(base_env)\n        TxnRegister().abort_reader_txn(head_env)\n\n    return DiffOutDB(set(added), set(deleted), set(mutated))\n\n\ndef _raw_from_db_change(changes: Set[Tuple[bytes, bytes]]) -> Changes:\n    \"\"\"Perform conversion for records from db -> raw\n\n    Parameters\n    ----------\n    changes : Set[Tuple[bytes, bytes]]\n        iterable of db formatted key/value pairs\n\n    Returns\n    -------\n    Changes\n        human readable formatted dict of key/value pairs.\n    \"\"\"\n    columnKeys, metadataKeys, schemaKeyVals = [], [], []\n    for k, v in changes:\n        if k[:2] == b'f:':\n            columnKeys.append(k)\n            continue\n        elif k[:2] == b'n:':\n            columnKeys.append(k)\n            continue\n        elif k[:2] == b's:':\n            schemaKeyVals.append((k, v))\n            continue\n        else:\n            raise RuntimeError(f'Unknown record type prefix encountered: '\n                               f'{k[:2]}. 
full record => k: {k} & v: {v}')\n\n    columndata = map(dynamic_layout_data_record_from_db_key, columnKeys)\n    schemas = {\n        schema_column_record_from_db_key(k):\n            data_record_digest_val_from_db_val(v) for k, v in schemaKeyVals\n    }\n    return Changes(schema=schemas, samples=tuple(columndata))\n\n\ndef _all_raw_from_db_changes(outDb: DiffAndConflictsDB) -> DiffAndConflicts:\n    \"\"\"Convert db formatted db diff/conflict results to human readable\n\n    Parameters\n    ----------\n    outDb : DiffAndConflictsDB\n        raw formatted structure containg `diff` and `conflict` fields\n\n    Returns\n    -------\n    DiffAndConflicts\n        Human readable struct containing ``diff`` and ``conflict`` fields.\n    \"\"\"\n    it = (outDb.diff.added, outDb.diff.deleted, outDb.diff.mutated)\n    out = map(_raw_from_db_change, it)  # significant perf improvement for large commits\n    outRawDiff = DiffOut(*out)\n\n    t1 = _raw_from_db_change(outDb.conflict.t1)\n    t21 = _raw_from_db_change(outDb.conflict.t21)\n    t22 = _raw_from_db_change(outDb.conflict.t22)\n    t3 = _raw_from_db_change(outDb.conflict.t3)\n    outRawConf = Conflicts(t1=t1, t21=t21, t22=t22, t3=t3, conflict=outDb.conflict.conflict)\n    res = DiffAndConflicts(diff=outRawDiff, conflict=outRawConf)\n    return res\n\n# ------------------------- Commit Differ -------------------------------------\n\n\ndef _symmetric_difference_keys(pair1: Set[Tuple[bytes, bytes]],\n                               pair2: Set[Tuple[bytes, bytes]]\n                               ) -> List[Tuple[bytes, bytes]]:\n    \"\"\"Find all keys common to both input pairs AND which have different values.\n\n    Essentially a moddified `symmetric_difference` set operation, which keeps\n    track of all seen items. Note: This ignores any `count` tracking values in\n    the input tuples (ie. 
lmdb keys ending in \":\")\n\n    Parameters\n    ----------\n    pair1 : Set[Tuple[bytes, bytes]]\n        key/value pairs making up the first set\n    pair2 : Set[Tuple[bytes, bytes]]\n        key/value pairs making up the second set\n\n    Returns\n    -------\n    List[Tuple[bytes, bytes]]\n        keys which appear in both input pair sets but which have different values.\n    \"\"\"\n    seen = set()\n    conflict = []\n    for k, v in pair1.symmetric_difference(pair2):\n        if k in seen:\n            conflict.append((k, v))\n        else:\n            seen.add(k)\n    return conflict\n\n\ndef find_conflicts(master_diff: DiffOutDB, dev_diff: DiffOutDB) -> Conflicts:\n    \"\"\"Determine if/which type of conflicting changes occur in diverged commits.\n\n    This function expects the output of :func:`diff_envs` for two commits\n    between a base commit.\n\n    Parameters\n    ----------\n    master_diff : DiffOutDB\n        changes (adds, dels, mutations) between base and master HEAD\n    dev_diff : DiffOutDB\n        changes (adds, dels, mutations) between base and dev HEAD\n\n    Returns\n    -------\n    Conflicts\n        Tuple containing fields for `t1`, `t21`, `t22`, `t3`, and (bool)\n        `conflicts` recording output info for if and what type of conflict has\n        occured\n    \"\"\"\n    t1 = _symmetric_difference_keys(master_diff.added, dev_diff.added)\n    t21 = _symmetric_difference_keys(master_diff.deleted, dev_diff.mutated)\n    t22 = _symmetric_difference_keys(master_diff.mutated, dev_diff.deleted)\n    t3 = _symmetric_difference_keys(master_diff.mutated, dev_diff.mutated)\n    isConflict = bool(any([t1, t21, t22, t3]))\n\n    res = Conflicts(t1=t1, t21=t21, t22=t22, t3=t3, conflict=isConflict)\n    return res\n\n\n# ---------------------------- Differ Base  -----------------------------------\n\n\nclass BaseUserDiff(object):\n\n    def __init__(self, branchenv: lmdb.Environment, refenv: lmdb.Environment, *args, **kwargs):\n\n        self._branchenv: lmdb.Environment = branchenv\n        self._refenv: lmdb.Environment = refenv\n\n    def _determine_ancestors(self, mHEAD: str, dHEAD: str) -> HistoryDiffStruct:\n        \"\"\"Search the commit history to determine the closest common ancestor.\n\n        The closest common ancestor is important because it serves as the \"merge\n        base\" in a 3-way merge strategy. 
This is a very naive implementation, but it\n        works well enough right now for simple branch histories.\n\n        Parameters\n        ----------\n        mHEAD : str\n            full commit hash to use as the `master` branch head commit\n        dHEAD : str\n            full commit hash to use as the `dev` branch head commit\n\n        Returns\n        -------\n        HistoryDiffStruct\n            indicating the masterHEAD, devHEAD, ancestorHEAD, and canFF which\n            tells if this is a fast-forward-able commit.\n        \"\"\"\n        mAncestors = get_commit_ancestors_graph(self._refenv, mHEAD)\n        dAncestors = get_commit_ancestors_graph(self._refenv, dHEAD)\n        cAncestors = set(mAncestors.keys()).intersection(set(dAncestors.keys()))\n        canFF = True if mHEAD in cAncestors else False\n\n        ancestorOrder = []\n        for ancestor in cAncestors:\n            timeOfCommit = get_commit_spec(self._refenv, ancestor).commit_time\n            ancestorOrder.append((ancestor, timeOfCommit))\n\n        ancestorOrder.sort(key=lambda t: t[1], reverse=True)\n        commonAncestor = ancestorOrder[0][0]\n        res = HistoryDiffStruct(\n            masterHEAD=mHEAD, devHEAD=dHEAD, ancestorHEAD=commonAncestor, canFF=canFF)\n        return res\n\n    @staticmethod\n    def _diff3(a_env: lmdb.Environment,\n               m_env: lmdb.Environment,\n               d_env: lmdb.Environment) -> DiffAndConflictsDB:\n        \"\"\"Three way diff and conflict finder from ancestor, master, and dev commits.\n\n        Parameters\n        ----------\n        a_env : lmdb.Environment\n            unpacked lmdb environment for the ancestor commit\n        m_env : lmdb.Environment\n            unpacked lmdb environment for the master commit, current HEAD\n        d_env : lmdb.Environment\n            unpacked lmdb environment for the dev commit, compare to HEAD\n\n        Returns\n        -------\n        DiffAndConflictsDB\n            structure containing (`additions`, `deletions`, `mutations`) for\n            diff, as well as the ConflictRecord struct.\n        \"\"\"\n        it = ((a_env, m_env), (a_env, d_env), (d_env, m_env))\n        diffs = tuple(starmap(diff_envs, it))  # significant perf improvement by map.\n        conflict = find_conflicts(diffs[0], diffs[1])\n        return DiffAndConflictsDB(diff=diffs[2], conflict=conflict)\n\n    @staticmethod\n    def _diff(a_env: lmdb.Environment, m_env: lmdb.Environment) -> DiffAndConflictsDB:\n        \"\"\"Fast Forward differ from ancestor to master commit.\n\n        Note: this method returns the same MasterDevDiff struct as the three\n        way commit diff method, but the `dev` and `conflicts` fields will be\n        empty\n\n        Parameters\n        ----------\n        a_env : lmdb.Environment\n            unpacked lmdb environment for the ancestor commit\n        m_env : lmdb.Environment\n            unpacked lmdb environment for the master commit\n\n        Returns\n        -------\n        DiffAndConflictsDB\n            structure containing (`additions`, `deletions`, `mutations`) for\n            the ancestor -> master (head) env diff\n        \"\"\"\n        m_diff = diff_envs(a_env, m_env)\n        conflict = Conflicts(t1=[], t21=[], t22=[], t3=[], conflict=False)\n        return DiffAndConflictsDB(diff=m_diff, conflict=conflict)\n\n\n# ------------------------ Read-Only Checkouts Only ---------------------------\n\n\nclass ReaderUserDiff(BaseUserDiff):\n    \"\"\"Methods diffing contents of a 
:class:`~hangar.checkout.ReaderCheckout` instance.\n\n    These provide diffing implementations to compare the current checkout\n    ``HEAD`` of a to a branch or commit. The results are generally returned as\n    a nested set of named tuples.\n\n    When diffing of commits or branches is performed, if there is not a linear\n    history of commits between current ``HEAD`` and the diff commit (ie. a\n    history which would permit a ``\"fast-forward\" merge``), the result field\n    named ``conflict`` will contain information on any merge conflicts that\n    would exist if staging area ``HEAD`` and the (compared) ``\"dev\" HEAD`` were\n    merged \"right now\". Though this field is present for all diff comparisons,\n    it can only contain non-empty values in the cases where a three way merge\n    would need to be performed.\n\n    ::\n\n       Fast Forward is Possible\n       ========================\n\n           (master)          (foo)\n       a ----- b ----- c ----- d\n\n\n       3-Way Merge Required\n       ====================\n\n                            (master)\n       a ----- b ----- c ----- d\n               \\\\\n                \\\\               (foo)\n                 \\\\----- ee ----- ff\n    \"\"\"\n\n    def __init__(self, commit_hash, *args, **kwargs):\n\n        super().__init__(*args, **kwargs)\n        self._commit_hash = commit_hash\n\n    def _run_diff(self, dev_commit_hash: str) -> DiffAndConflictsDB:\n        \"\"\"Compute diff between head and commit hash, returning DB formatted results\n\n        Parameters\n        ----------\n        dev_commit_hash : str\n            hash of the commit to be used as the comparison.\n\n        Returns\n        -------\n        DiffAndConflictsDB\n            two-tuple of `diff`, `conflict` (if any) calculated in the diff\n            algorithm.\n        \"\"\"\n        hist = self._determine_ancestors(self._commit_hash, dev_commit_hash)\n        mH, dH, aH = hist.masterHEAD, hist.devHEAD, hist.ancestorHEAD\n        with tmp_cmt_env(self._refenv, mH) as m_env, tmp_cmt_env(self._refenv, dH) as d_env:\n            if hist.canFF is True:\n                outDb = self._diff(m_env, d_env)\n            else:\n                with tmp_cmt_env(self._refenv, aH) as a_env:\n                    outDb = self._diff3(a_env, m_env, d_env)\n        return outDb\n\n    def commit(self, dev_commit_hash: str) -> DiffAndConflicts:\n        \"\"\"Compute diff between HEAD and commit hash, returning user-facing results.\n\n        Parameters\n        ----------\n        dev_commit_hash : str\n            hash of the commit to be used as the comparison.\n\n        Returns\n        -------\n        DiffAndConflicts\n            two-tuple of ``diff``, ``conflict`` (if any) calculated in the diff\n            algorithm.\n\n        Raises\n        ------\n        ValueError\n            if the specified ``dev_commit_hash`` is not a valid commit reference.\n        \"\"\"\n        if not check_commit_hash_in_history(self._refenv, dev_commit_hash):\n            msg = f'HANGAR VALUE ERROR: dev_commit_hash: {dev_commit_hash} does not exist'\n            raise ValueError(msg)\n\n        outDb = self._run_diff(dev_commit_hash=dev_commit_hash)\n        outRaw = _all_raw_from_db_changes(outDb)\n        return outRaw\n\n    def branch(self, dev_branch: str) -> DiffAndConflicts:\n        \"\"\"Compute diff between HEAD and branch name, returning user-facing results.\n\n        Parameters\n        ----------\n        dev_branch : str\n            name of the 
\n        \"\"\"\n        if not check_commit_hash_in_history(self._refenv, dev_commit_hash):\n            msg = f'HANGAR VALUE ERROR: dev_commit_hash: {dev_commit_hash} does not exist'\n            raise ValueError(msg)\n\n        outDb = self._run_diff(dev_commit_hash=dev_commit_hash)\n        outRaw = _all_raw_from_db_changes(outDb)\n        return outRaw\n\n    def branch(self, dev_branch: str) -> DiffAndConflicts:\n        \"\"\"Compute diff between HEAD and branch name, returning user-facing results.\n\n        Parameters\n        ----------\n        dev_branch : str\n            name of the branch whose HEAD commit will be used as the comparison\n            point for the diff.\n\n        Returns\n        -------\n        DiffAndConflicts\n            two-tuple of ``diff``, ``conflict`` (if any) calculated in the diff\n            algorithm.\n\n        Raises\n        ------\n        ValueError\n            If the specified `dev_branch` does not exist.\n        \"\"\"\n        branchNames = get_branch_names(self._branchenv)\n        if dev_branch in branchNames:\n            dHEAD = get_branch_head_commit(self._branchenv, dev_branch)\n        else:\n            msg = f'HANGAR VALUE ERROR: dev_branch: {dev_branch} invalid branch name'\n            raise ValueError(msg)\n\n        outDb = self._run_diff(dev_commit_hash=dHEAD)\n        outRaw = _all_raw_from_db_changes(outDb)\n        return outRaw\n\n\n# ---------------------- Write Enabled Checkouts Only -------------------------\n\n\nclass WriterUserDiff(BaseUserDiff):\n    \"\"\"Methods diffing contents of a :class:`~hangar.checkout.WriterCheckout` instance.\n\n    These provide diffing implementations to compare the current ``HEAD`` of a\n    checkout to a branch, commit, or the staging area ``\"base\"`` contents. The\n    results are generally returned as a nested set of named tuples. In\n    addition, the :meth:`status` method is implemented which can be used to\n    quickly determine if there are any uncommitted changes written in the\n    checkout.\n\n    When diffing of commits or branches is performed, if there is not a linear\n    history of commits between current ``HEAD`` and the diff commit (i.e. a\n    history which would permit a ``\"fast-forward\" merge``), the result field\n    named ``conflict`` will contain information on any merge conflicts that\n    would exist if staging area ``HEAD`` and the (compared) ``\"dev\" HEAD`` were\n    merged \"right now\".
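\n\n    For example, a minimal sketch (illustrative only; assumes a write-enabled\n    checkout whose differ is reached through its ``diff`` attribute, and an\n    existing branch named ``'foo'``):\n\n    .. code-block:: python\n\n        >>> co = repo.checkout(write=True, branch='master')\n        >>> co.diff.status()\n        'CLEAN'\n        >>> res = co.diff.branch('foo')\n        >>> res.conflict.conflict\n        False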
\n\n    Though this field is present for all diff comparisons,\n    it can only contain non-empty values in the cases where a three way merge\n    would need to be performed.\n\n    ::\n\n       Fast Forward is Possible\n       ========================\n\n           (master)          (foo)\n       a ----- b ----- c ----- d\n\n\n       3-Way Merge Required\n       ====================\n\n                            (master)\n       a ----- b ----- c ----- d\n               \\\\\n                \\\\               (foo)\n                 \\\\----- ee ----- ff\n    \"\"\"\n\n    def __init__(self, stageenv: lmdb.Environment, branch_name: str, *args, **kwargs):\n\n        super().__init__(*args, **kwargs)\n        self._stageenv: lmdb.Environment = stageenv\n        self._branch_name: str = branch_name\n\n    def _run_diff(self, dev_commit_hash: str) -> DiffAndConflictsDB:\n        \"\"\"Compute diff between head and commit, returning DB formatted results.\n\n        Parameters\n        ----------\n        dev_commit_hash : str\n            hash of the commit to be used as the comparison.\n\n        Returns\n        -------\n        DiffAndConflictsDB\n            two-tuple of `diff`, `conflict` (if any) calculated in the diff\n            algorithm.\n        \"\"\"\n        commit_hash = get_branch_head_commit(self._branchenv, self._branch_name)\n        hist = self._determine_ancestors(commit_hash, dev_commit_hash)\n        with tmp_cmt_env(self._refenv, hist.devHEAD) as d_env:\n            if hist.canFF is True:\n                res = self._diff(self._stageenv, d_env)\n            else:\n                with tmp_cmt_env(self._refenv, hist.ancestorHEAD) as a_env:\n                    res = self._diff3(a_env, self._stageenv, d_env)\n        return res\n\n    def commit(self, dev_commit_hash: str) -> DiffAndConflicts:\n        \"\"\"Compute diff between HEAD and commit, returning user-facing results.\n\n        Parameters\n        ----------\n        dev_commit_hash : str\n            hash of the commit to be used as the comparison.\n\n        Returns\n        -------\n        DiffAndConflicts\n            two-tuple of ``diff``, ``conflict`` (if any) calculated in the diff\n            algorithm.\n\n        Raises\n        ------\n        ValueError\n            if the specified ``dev_commit_hash`` is not a valid commit reference.\n        \"\"\"\n        if not check_commit_hash_in_history(self._refenv, dev_commit_hash):\n            msg = f'HANGAR VALUE ERROR: dev_commit_hash: {dev_commit_hash} does not exist'\n            raise ValueError(msg)\n\n        outDb = self._run_diff(dev_commit_hash=dev_commit_hash)\n        outRaw = _all_raw_from_db_changes(outDb)\n        return outRaw\n\n    def branch(self, dev_branch: str) -> DiffAndConflicts:\n        \"\"\"Compute diff between HEAD and branch, returning user-facing results.\n\n        Parameters\n        ----------\n        dev_branch : str\n            name of the branch whose HEAD commit will be used as the comparison\n            point for the diff.\n\n        Returns\n        -------\n        DiffAndConflicts\n            two-tuple of ``diff``, ``conflict`` (if any) calculated in the diff\n            algorithm.\n\n        Raises\n        ------\n        ValueError\n            If the specified ``dev_branch`` does not exist.\n        \"\"\"\n        branchNames = get_branch_names(self._branchenv)\n        if dev_branch in branchNames:\n            dHEAD = get_branch_head_commit(self._branchenv, dev_branch)\n        else:\n            msg = f'HANGAR VALUE ERROR: dev_branch: {dev_branch} invalid branch name'\n            raise ValueError(msg)\n\n        outDb = self._run_diff(dev_commit_hash=dHEAD)\n        outRaw = _all_raw_from_db_changes(outDb)\n        return outRaw\n\n    def staged(self) -> DiffAndConflicts:\n        \"\"\"Compute diff of the staging area to its base commit, returning user-facing results.\n\n        Returns\n        -------\n        DiffAndConflicts\n            two-tuple of ``diff``, ``conflict`` (if any) calculated in the diff\n            algorithm.\n        \"\"\"\n        commit_hash = get_branch_head_commit(self._branchenv, self._branch_name)\n        with tmp_cmt_env(self._refenv, commit_hash) as base_env:\n            outDb = self._diff(base_env, self._stageenv)\n        outRaw = _all_raw_from_db_changes(outDb)\n        return outRaw\n\n    def status(self) -> str:\n        \"\"\"Determine if changes have been made in the staging area.\n\n        If the contents of the staging area and its parent commit are the\n        same, the status is said to be \"CLEAN\". If even one column or\n        metadata record has changed, however, the status is \"DIRTY\".\n\n        Returns\n        -------\n        str\n            \"CLEAN\" if no changes have been made, otherwise \"DIRTY\"
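\n\n        Examples\n        --------\n        Illustrative sketch only (the column name ``'foo'``, sample key, and\n        ``new_array`` value are hypothetical):\n\n        >>> co.diff.status()\n        'CLEAN'\n        >>> co.columns['foo']['new_key'] = new_array\n        >>> co.diff.status()\n        'DIRTY'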
\n        \"\"\"\n        head_commit = get_branch_head_commit(self._branchenv, self._branch_name)\n        if head_commit == '':\n            base_refs = ()\n        else:\n            base_refs = get_commit_ref(self._refenv, head_commit)\n\n        stage_refs = tuple(RecordQuery(self._stageenv)._traverse_all_records())\n        status = 'DIRTY' if (base_refs != stage_refs) else 'CLEAN'\n        return status\n"
  },
  {
    "path": "src/hangar/external/__init__.py",
    "content": "from ._external import load, save, show, board_show\nfrom .plugin_manager import PluginManager\nfrom .base_plugin import BasePlugin\n\n\n__all__ = ['load', 'save', 'show', 'board_show', 'BasePlugin', 'PluginManager']\n\n"
  },
  {
    "path": "src/hangar/external/_external.py",
    "content": "\"\"\"\nHigh level methods let user interact with hangar without diving into the internal\nmethods of hangar. We have enabled four basic entry points as high level methods\n\n1. :func:`.load`\n2. :func:`.save`\n3. :func:`.show`\n4. :func:`.board_show`\n\nThese entry points by itself is not capable of doing anything. But they are entry\npoints to the same methods in the `hangar.external` plugins available in pypi. These\nhigh level entry points are used by the CLI for doing import, export and view\noperations as well as the `hangarboard <https://github.com/tensorwerk/hangarboard>`_\nfor visualization (using ``board_show``)\n\"\"\"\nfrom typing import Tuple\n\nimport numpy as np\n\nfrom .plugin_manager import PluginManager\n\n\npm = PluginManager()\n\n\ndef load(fpath: str,\n         plugin: str = None,\n         extension: str = None,\n         **plugin_kwargs) -> Tuple[np.ndarray, str]:\n    \"\"\"\n    Wrapper to load data from file into memory as numpy arrays using\n    plugin's `load` method\n\n    Parameters\n    ----------\n    fpath : str\n        Data file path, e.g. ``path/to/test.jpg``\n    plugin : str, optional\n        Name of plugin to use.  By default, the preferred plugin for the\n        given file format tried until a suitable. This cannot be `None` if\n        `extension` is also `None`\n    extension : str, optional\n        Format of the file. This is used to infer which plugin to use\n        in case plugin name is not provided. This cannot be `None` if\n        `plugin` is also `None`\n\n    Other Parameters\n    ----------------\n    plugin_kwargs : dict\n        Plugin specific keyword arguments. If the function is being called from\n        command line argument, all the unknown keyword arguments will be collected\n        as ``plugin_kwargs``\n\n    Returns\n    -------\n    img_array : :class:`numpy.ndarray`\n        data returned from the given plugin.\n\n    \"\"\"\n    if not pm.plugins_loaded:\n        pm.reset_plugins()\n    func = pm.get_plugin('load', plugin=plugin, extension=extension)\n    return func(fpath, **plugin_kwargs)\n\n\ndef save(arr: np.ndarray, outdir: str, sample_det: str, extension: str,\n         plugin: str = None, **plugin_kwargs):\n    \"\"\"Wrapper plugin ``save`` methods which dump :class:`numpy.ndarray` to disk.\n\n    Parameters\n    ----------\n    arr : :class:`numpy.ndarray`\n        Numpy array to be saved to file\n    outdir : str\n        Target directory\n    sample_det : str\n        Sample name and type of the sample name formatted as\n        ``sample_name_type:sample_name``\n    extension : str\n        Format of the file. This is used to infer which plugin to use in case\n        plugin name is not provided. This cannot be ``None`` if ``plugin`` is\n        also ``None``\n    plugin : str, optional\n        Name of plugin to use.  By default, the preferred plugin for the given\n        file format tried until a suitable. This cannot be ``None`` if\n        ``extension`` is also ``None``\n\n    Other Parameters\n    ----------------\n    plugin_kwargs : dict\n        Plugin specific keyword arguments. If the function is being called from\n        command line argument, all the unknown keyword arguments will be\n        collected as ``plugin_kwargs``\n\n    Notes\n    -----\n    CLI or this method does not create the file name where to save. Instead\n    they pass the required details downstream to the plugins to do that once\n    they verify the given ``outdir`` is a valid directory. 
\n\n    This design exists because we\n    expect data entries where one entry is one file (like images)\n    as well as entries where multiple entries go to a single file (like\n    CSV). With these ambiguous cases in hand, it is more sensible to let the\n    plugin handle the file creation accordingly.\n    \"\"\"\n    if not pm.plugins_loaded:\n        pm.reset_plugins()\n    func = pm.get_plugin('save', plugin=plugin, extension=extension)\n    func(arr, outdir, sample_det, extension, **plugin_kwargs)\n\n\ndef show(arr: np.ndarray, plugin: str = None,\n         extension: str = None, **plugin_kwargs):  # pragma: no cover\n    \"\"\"Wrapper to display :class:`numpy.ndarray` via plugin ``show`` method.\n\n    Parameters\n    ----------\n    arr : :class:`numpy.ndarray`\n        Data to process into some human understandable representation.\n    plugin : str, optional\n        Name of plugin to use. By default, the preferred plugins for the\n        given file format are tried until a suitable one is found. This\n        cannot be ``None`` if ``extension`` is also ``None``\n    extension : str, optional\n        Format of the file. This is used to infer which plugin to use\n        in case plugin name is not provided. This cannot be ``None`` if\n        ``plugin`` is also ``None``\n\n    Other Parameters\n    ----------------\n    plugin_kwargs : dict\n        Plugin specific keyword arguments. If the function is being called from\n        the command line, all unknown keyword arguments will be\n        collected as ``plugin_kwargs``\n    \"\"\"\n    if not pm.plugins_loaded:\n        pm.reset_plugins()\n    func = pm.get_plugin('show', plugin=plugin, extension=extension)\n    return func(arr, **plugin_kwargs)\n\n\ndef board_show(arr: np.ndarray, plugin: str = None,\n               extension: str = None, **plugin_kwargs):\n    \"\"\"\n    Wrapper to convert the numpy array using the ``board_show`` method\n    of the plugin to make it displayable in the web UI\n\n    Parameters\n    ----------\n    arr : :class:`numpy.ndarray`\n        Data to process into some human understandable representation.\n    plugin : str, optional\n        Name of plugin to use. By default, the preferred plugins for the\n        given file format are tried until a suitable one is found. This\n        cannot be ``None`` if ``extension`` is also ``None``\n    extension : str, optional\n        Format of the file. This is used to infer which plugin to use\n        in case plugin name is not provided. This cannot be ``None`` if\n        ``plugin`` is also ``None``\n\n    Other Parameters\n    ----------------\n    plugin_kwargs : dict\n        Plugin specific keyword arguments. If the function is being called from\n        the command line, all unknown keyword arguments will be\n        collected as ``plugin_kwargs``\n    \"\"\"\n    if not pm.plugins_loaded:\n        pm.reset_plugins()\n    func = pm.get_plugin('board_show', plugin=plugin, extension=extension)\n    return func(arr, **plugin_kwargs)\n"
  },
  {
    "path": "src/hangar/external/base_plugin.py",
    "content": "\"\"\"\nHangar's external plugin system is designed to make it flexible for users to\nwrite custom plugins for custom data formats. External plugins should be python\ninstallables and should make itself discoverable using package meta data. A\n`detailed documentation <https://packaging.python.org/guides/creating-and-discovering-plugins/#using-package-metadata>`_\ncan be found in the official python doc. But for a headstart and to avoid going\nthrough this somewhat complex process, we have made a `cookiecutter\n<https://github.com/tensorwerk/cookiecutter-hangar-external-plugin>`_ package.\nAll the hangar plugins follow the naming standard similar to Flask plugins i.e\n`hangar_pluginName`\n\"\"\"\n\nimport os\n\n\nclass BasePlugin(object):\n    \"\"\"Base plugin class from where all the external plugins should be inherited.\n\n    Child classes can have four methods to expose - ``load``, ``save``,\n    ``show`` and ``board_show``. These are considered as valid methods and\n    should be passed as the first argument while initializing the parent from\n    child. Child should also inform the parent about the acceptable file\n    formats by passing that as second argument. :class:`.BasePlugin` accepts\n    ``provides`` and ``accepts`` on init and exposes them which is then used by\n    plugin manager while loading the modules. BasePlugin also provides\n    ``sample_name`` function to figure out the sample name from the file path.\n    This function is used by ``load`` method to return the sample name which is\n    then used by hangar as a key to save the data\n    \"\"\"\n    def __init__(self, provides, accepts):\n        if not provides:\n            raise ValueError(\"Argument ``provides`` cannot be empty\")\n        if not accepts:\n            raise ValueError(\"Argument ``accepts`` cannot be empty\")\n        self._provides = provides\n        self._accepts = accepts\n\n    @property\n    def provides(self):\n        return self._provides\n\n    @property\n    def accepts(self):\n        return self._accepts\n\n    def load(self, fpath, *args, **kwargs):\n        \"\"\"Load some data file on disk to recover it in :class:`numpy.ndarray` form.\n\n        Loads the data provided from the disk for the file path given and\n        returns the data as :class:`numpy.ndarray` and name of the data sample.\n        Names returned from this function will be used by the import cli system\n        as the key for the returned data. This function can return either a\n        single :class:`numpy.ndarray`, sample name, combination, or a generator\n        that produces one of the the above combinations. This helps when the\n        input file is not a single data entry like an image but has multiple\n        data points like CSV files.\n\n        An example implementation that returns a single data point:\n\n        .. code-block:: python\n\n            def load(self, fpath, *args, **kwargs):\n                data = create_np_array('myimg.jpg')\n                name = create_sample_name('myimg.jpg')  # could use `self.sample_name`\n                return data, name\n\n        An example implementation that returns a generator could look like this:\n\n        .. 
\n    \"\"\"\n    def __init__(self, provides, accepts):\n        if not provides:\n            raise ValueError(\"Argument ``provides`` cannot be empty\")\n        if not accepts:\n            raise ValueError(\"Argument ``accepts`` cannot be empty\")\n        self._provides = provides\n        self._accepts = accepts\n\n    @property\n    def provides(self):\n        return self._provides\n\n    @property\n    def accepts(self):\n        return self._accepts\n\n    def load(self, fpath, *args, **kwargs):\n        \"\"\"Load some data file on disk to recover it in :class:`numpy.ndarray` form.\n\n        Loads the data provided from the disk for the file path given and\n        returns the data as :class:`numpy.ndarray` along with the name of the data sample.\n        Names returned from this function will be used by the import cli system\n        as the key for the returned data. This function can return either a\n        single (:class:`numpy.ndarray`, sample name) combination, or a generator\n        that produces one of the above combinations. This helps when the\n        input file is not a single data entry like an image but has multiple\n        data points like CSV files.\n\n        An example implementation that returns a single data point:\n\n        .. code-block:: python\n\n            def load(self, fpath, *args, **kwargs):\n                data = create_np_array('myimg.jpg')\n                name = create_sample_name('myimg.jpg')  # could use `self.sample_name`\n                return data, name\n\n        An example implementation that returns a generator could look like this:\n\n        .. code-block:: python\n\n            def load(self, fpath, *args, **kwargs):\n                for i, line in enumerate(open(fpath)):\n                    data = create_np_array(line)\n                    name = create_sample_name(fpath, i)\n                    yield data, name\n        \"\"\"\n        raise NotImplementedError\n\n    def save(self, arr, outdir, sample_detail, extension, *args, **kwargs):\n        \"\"\"Save data in a :class:`numpy.ndarray` to a specific file format on disk.\n\n        If the plugin is developed for files like CSV, JSON, etc - where\n        multiple data entries would go to the same file - this should check\n        whether the file exists already and whether it should modify / append\n        the new data entry to the structure, instead of overwriting it or\n        throwing an exception.\n\n        Note\n        ----\n        The name of the file and the whole path to save the data should be constructed\n        by this function. This can be done using the information it gets as arguments,\n        such as ``outdir``, ``sample_detail`` and ``extension``. It has been\n        offloaded to this function, instead of being handled upstream, because\n        decisions like whether multiple data entries should go to a single file or\n        multiple files cannot be predicted beforehand, as they are always data specific\n        (and hence plugin specific)\n\n        Note\n        ----\n        If the call to this function is initiated by the CLI, the ``sample_detail`` argument\n        will be a string formatted as `sample_name_type:sample_name`. For example, if\n        the sample name is `sample1` (and the type of the sample name is `str`) then\n        ``sample_detail`` will be `str:sample1`. This is to avoid the ambiguity that\n        could arise by having both integer and string forms of a number as the sample\n        name (ex: if column[123] and column[\"123\"] exist). Formatting\n        ``sample_detail`` to make a proper filename (not required) is up to the\n        plugin developer.\n        \"\"\"\n        raise NotImplementedError\n\n    def show(self, arr, *args, **kwargs):\n        \"\"\"Show/Display the data to the user.\n\n        This function should process the input :class:`numpy.ndarray` and show\n        that to the user using a data dependant display mechanism. A good\n        example for such a system is ``matplotlib.pyplot``'s ``plt.show``,\n        which displays the image data inline in the running terminal / kernel\n        ui.\n        \"\"\"\n        raise NotImplementedError\n\n    def board_show(self, arr, *args, **kwargs):\n        \"\"\"Show/display data in hangarboard format.\n\n        Hangarboard is capable of displaying the three most common data formats:\n        image, text and audio. This function should process the input\n        :class:`numpy.ndarray` data and convert it to any of the supported\n        formats.\n        \"\"\"\n        raise NotImplementedError\n\n    @staticmethod\n    def sample_name(fpath: os.PathLike) -> str:\n        \"\"\"Sample the name from the file path.\n\n        This function comes in handy since the :meth:`.load` method needs to\n        ``yield`` or ``return`` both data and sample name. If there are no specific\n        requirements regarding sample name creation, you can use this function,\n        which removes the extension from the file name and returns just the\n        name. For example, if the filepath is ``/path/to/myfile.ext``, then it\n        returns ``myfile``.\n\n        Parameters\n        ----------\n        fpath : os.PathLike\n            Path to the file which is being loaded by `load`\n        \"\"\"\n        return os.path.splitext(os.path.basename(fpath))[0]\n"
  },
  {
    "path": "src/hangar/external/plugin_manager.py",
    "content": "import pkg_resources\nfrom typing import Callable\n\n\nclass PluginManager(object):\n    \"\"\"\n    Container class that holds the information about available plugins and\n    provides required method to fetch and clean up the plugin systems\n    \"\"\"\n\n    valid_provides = ['load', 'save', 'show', 'board_show']\n\n    def __init__(self):\n        self._plugin_store = {}  # ex: {'pil': loaded_pil_module}\n        self._default_plugins = {}  # ex: {'jpg': {'save': 'pil'}}\n        self.plugins_loaded = False\n\n    def reset_plugins(self):\n        \"\"\"\n        Reset plugin clears the existing storages and then scans\n        `hangar.external.plugins` for plugins. Once `plugin store` is populated,\n        it creates finds the default plugins to make auto-inference based on\n        file format possible\n        \"\"\"\n        self._clear_plugins()\n        self._scan_plugins()\n        self._read_defaults()\n\n    def _clear_plugins(self):\n        \"\"\"\n        Clear the plugin state to the default, i.e., where no plugins are loaded\n        \"\"\"\n        self._plugin_store.clear()\n        self._default_plugins.clear()\n        self.plugins_loaded = False\n\n    def _scan_plugins(self):\n        \"\"\"\n        Scan for entry points, find the plugins and store them in provided storage\n        containers\n        \"\"\"\n        for entry_point in pkg_resources.iter_entry_points('hangar.external.plugins'):\n            PluginClass = entry_point.load()\n            self._plugin_store[entry_point.name] = PluginClass()\n        self.plugins_loaded = True\n\n    def _read_defaults(self):\n        \"\"\"\n        Populate default plugin dict that maps file formats to plugins and methods. This\n        is used to infer which plugin to use at runtime based on file format\n        \"\"\"\n        for fname, plugin in self._plugin_store.items():\n            generator = ((ext, method) for ext in plugin.accepts for method in plugin.provides)\n            for pair in generator:\n                if pair not in self._default_plugins:\n                    self._default_plugins[pair] = fname\n\n    def get_plugin(self, method: str, plugin: str = None, extension: str = None) -> Callable:\n        \"\"\"Load installed plugin.\n\n        User either needs to specify which plugin to load or should provide\n        file format to infer which plugin to use\n\n        Parameters\n        ----------\n        method : str\n            Which method to import from the plugin. Methods implemented by the\n            extension author should be declared as arguments passed into the\n            BasePlugin superclass constructor\n        plugin : str, optional\n            Which plugin to load the method from. Cannot leave as ``None`` if\n            ``extension`` is also ``None``\n        extension : str, optional\n            format of the data on the disk. 
\n        \"\"\"\n\n        if not plugin:\n            if not extension:\n                raise ValueError(\"Both `plugin` and `extension` cannot be empty together\")\n\n            plugin = self._default_plugins.get((extension, method))\n            if plugin is None:\n                raise ValueError(f\"No plugins found for the file extension {extension} that could \"\n                                 f\"do {method}\")\n        else:\n            if plugin not in self._plugin_store:\n                raise ValueError(f\"Plugin {plugin} not found\")\n        loaded_plugin = self._plugin_store[plugin]\n        try:\n            return getattr(loaded_plugin, method)\n        except AttributeError:\n            raise RuntimeError(f\"Method {method} found in `plugin.provides` but could \"\n                               f\"not be invoked from {plugin}. You might have forgotten to define \"\n                               f\"the function\")\n"
  },
  {
    "path": "src/hangar/external_cpython.pxd",
    "content": "\"\"\" Additional bindings to Python's C-API.\nThese differ from Cython's bindings in ``cpython``.\n\"\"\"\nfrom cpython.ref cimport PyObject\n\ncdef extern from \"Python.h\":\n    PyObject* PtrIter_Next \"PyIter_Next\"(object o)\n    PyObject* PtrObject_Call \"PyObject_Call\"(object callable_object, object args, object kw)\n    PyObject* PtrObject_GetItem \"PyObject_GetItem\"(object o, object key)\n    int PyDict_Next_Compat \"PyDict_Next\"(object p, Py_ssize_t *ppos, PyObject* *pkey, PyObject* *pvalue) except -1\n"
  },
  {
    "path": "src/hangar/merger.py",
    "content": "\"\"\"Merge Methods\n\nIn the current implementation only fast-forward and a competent, but limited,\nthree-way merge algorithm are implemented. All user facing API calls should be\nfunneled through the :function:`select_merge_algorithm` function\n\n.. note::\n\n    In the current implementation, it is not possible to stop a merge in progress or\n    to revert a bad merge commit. All revert like operations should be made by\n    creating new branches from the last \"good\" state, after which new merge\n    operations can be attempted (if desired.)\n\"\"\"\nfrom pathlib import Path\n\nimport lmdb\n\nfrom .diff import WriterUserDiff, diff_envs, find_conflicts\nfrom .records.commiting import (\n    tmp_cmt_env,\n    replace_staging_area_with_commit,\n    replace_staging_area_with_refs,\n    commit_records,\n)\nfrom .records.hashs import clear_stage_hash_records, backends_remove_in_process_data\nfrom .records.heads import (\n    get_staging_branch_head,\n    get_branch_head_commit,\n    set_staging_branch_head,\n    set_branch_head_commit,\n    release_writer_lock,\n    acquire_writer_lock,\n)\n\n\ndef select_merge_algorithm(message: str,\n                           branchenv: lmdb.Environment,\n                           stageenv: lmdb.Environment,\n                           refenv: lmdb.Environment,\n                           stagehashenv: lmdb.Environment,\n                           master_branch: str,\n                           dev_branch: str,\n                           repo_path: Path,\n                           *,\n                           writer_uuid: str = 'MERGE_PROCESS') -> str:\n    \"\"\"Entry point to perform a merge.\n\n    Automatically selects algorithm and does the operation if no conflicts are\n    found. This call requires that the staging area status be \"CLEAN\", if\n    a \"DIRTY\" staging environment is found, an RuntimeError will be thrown.\n\n    Parameters\n    ----------\n    message : str\n        user message describing the commit\n    branchenv : lmdb.Environment\n        where the branch references are stored\n    stageenv : lmdb.Environment\n        where the staging area is open\n    refenv : lmdb.Environment\n        where commit history is stored\n    stagehashenv: lmdb.Environment\n        where the stage hash environment data is stored\n    master_branch : str\n        name of the branch to serve as a merge master\n    dev_branch : str\n        name of the branch to use as the feature branch\n    repo_path: Path\n        path to the repository on disk\n    writer_uuid : str, optional, kwarg only\n        if the merge method is called from the repo level, the default writer\n        lock `MERGE_PROCESS` is used to ensure that a writer is active. 
\n    \"\"\"\n    current_head = get_staging_branch_head(branchenv)\n    wDiffer = WriterUserDiff(stageenv=stageenv,\n                             branchenv=branchenv,\n                             refenv=refenv,\n                             branch_name=current_head)\n    if wDiffer.status() != 'CLEAN':\n        e = RuntimeError(\n            'Changes are currently pending in the staging area. To avoid mangled '\n            'histories, the staging area must exist in a clean state. Please '\n            'reset or commit any changes before the merge operation.')\n        raise e from None\n\n    try:\n        acquire_writer_lock(branchenv=branchenv, writer_uuid=writer_uuid)\n    except PermissionError as e:\n        raise e from None\n\n    try:\n        mHEAD = get_branch_head_commit(branchenv, branch_name=master_branch)\n        dHEAD = get_branch_head_commit(branchenv, branch_name=dev_branch)\n        branchHistory = wDiffer._determine_ancestors(mHEAD=mHEAD, dHEAD=dHEAD)\n\n        if branchHistory.canFF is True:\n            print('Selected Fast-Forward Merge Strategy')\n            success = _fast_forward_merge(\n                branchenv=branchenv,\n                stageenv=stageenv,\n                refenv=refenv,\n                stagehashenv=stagehashenv,\n                master_branch=master_branch,\n                new_masterHEAD=branchHistory.devHEAD,\n                repo_path=repo_path)\n        else:\n            print('Selected 3-Way Merge Strategy')\n            success = _three_way_merge(\n                message=message,\n                master_branch=master_branch,\n                masterHEAD=branchHistory.masterHEAD,\n                dev_branch=dev_branch,\n                devHEAD=branchHistory.devHEAD,\n                ancestorHEAD=branchHistory.ancestorHEAD,\n                branchenv=branchenv,\n                stageenv=stageenv,\n                refenv=refenv,\n                stagehashenv=stagehashenv,\n                repo_path=repo_path)\n\n    except ValueError as e:\n        raise e from None\n\n    finally:\n        if writer_uuid == 'MERGE_PROCESS':\n            release_writer_lock(branchenv=branchenv, writer_uuid=writer_uuid)\n\n    return success\n\n\n# ------------------ Fast Forward Merge Methods -------------------------------\n\n\ndef _fast_forward_merge(branchenv: lmdb.Environment,\n                        stageenv: lmdb.Environment,\n                        refenv: lmdb.Environment,\n                        stagehashenv: lmdb.Environment,\n                        master_branch: str,\n                        new_masterHEAD: str,\n                        repo_path: Path) -> str:\n    \"\"\"Update branch head pointer to perform a fast-forward merge.\n\n    This method does not check that it is safe to do this operation; all\n    verification should happen before this point is reached.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        db with the branch head pointers
\n    stageenv : lmdb.Environment\n        db where the staging area records are stored.\n    refenv : lmdb.Environment\n        db where the merge commit records are stored.\n    stagehashenv: lmdb.Environment\n        db where the staged hash records are stored\n    master_branch : str\n        name of the merge_master branch which should be updated\n    new_masterHEAD : str\n        commit hash to update the master_branch name to point to.\n    repo_path: Path\n        path to the repository on disk.\n\n    Returns\n    -------\n    str\n        if successful, returns the commit hash the master branch name was\n        updated to.\n    \"\"\"\n    try:\n        replace_staging_area_with_commit(\n            refenv=refenv, stageenv=stageenv, commit_hash=new_masterHEAD)\n\n        outBranchName = set_branch_head_commit(\n            branchenv=branchenv, branch_name=master_branch, commit_hash=new_masterHEAD)\n        set_staging_branch_head(branchenv=branchenv, branch_name=master_branch)\n\n        backends_remove_in_process_data(repo_path=repo_path)\n        clear_stage_hash_records(stagehashenv=stagehashenv)\n\n    except ValueError as e:\n        raise e from None\n\n    return outBranchName\n\n\n# ----------------------- Three-Way Merge Methods -----------------------------\n\n\ndef _three_way_merge(message: str,\n                     master_branch: str,\n                     masterHEAD: str,\n                     dev_branch: str,\n                     devHEAD: str,\n                     ancestorHEAD: str,\n                     branchenv: lmdb.Environment,\n                     stageenv: lmdb.Environment,\n                     refenv: lmdb.Environment,\n                     stagehashenv: lmdb.Environment,\n                     repo_path: Path) -> str:\n    \"\"\"Merge strategy with diff/patch computed from changes since last common ancestor.\n\n    Parameters\n    ----------\n    message : str\n        commit message to apply to this merge commit (specified by the user)\n    master_branch : str\n        name of the merge master branch\n    masterHEAD : str\n        commit hash of the merge master HEAD\n    dev_branch : str\n        name of the merge dev branch\n    devHEAD : str\n        commit hash of the merge dev HEAD\n    ancestorHEAD : str\n        commit hash of the nearest common ancestor which the merge_master and\n        merge_dev branches both share in their commit history.\n    branchenv : lmdb.Environment\n        db where the branch head records are stored\n    stageenv : lmdb.Environment\n        db where the staging area records are stored.\n    refenv : lmdb.Environment\n        db where the merge commit records are stored.\n    stagehashenv: lmdb.Environment\n        db where the staged hash records are stored\n    repo_path: Path\n        path to the repository on disk.\n\n    Returns\n    -------\n    str\n        commit hash of the new merge commit if the operation was successful.\n\n    Raises\n    ------\n    ValueError\n        If a conflict is found, the operation will abort before completing.\n    \"\"\"\n    with tmp_cmt_env(refenv, ancestorHEAD) as aEnv, tmp_cmt_env(\n            refenv, masterHEAD) as mEnv, tmp_cmt_env(refenv, devHEAD) as dEnv:\n\n        m_diff = diff_envs(aEnv, mEnv)\n        d_diff = diff_envs(aEnv, dEnv)\n        conflict = find_conflicts(m_diff, d_diff)\n        if conflict.conflict is True:\n            msg = f'HANGAR VALUE ERROR: Merge ABORTED with conflict: {conflict}'\n            raise ValueError(msg) from None
\n\n        with mEnv.begin(write=True) as txn:\n            # patch the dev branch's changes (deletions, mutations, additions\n            # relative to the common ancestor) onto the master commit records\n            for k, _ in d_diff.deleted:\n                txn.delete(k)\n            for k, v in d_diff.mutated:\n                txn.put(k, v, overwrite=True)\n            for k, v in d_diff.added:\n                txn.put(k, v, overwrite=True)\n\n        # read back the merged record set; lmdb iterates in sorted key order\n        dbcont = []\n        with mEnv.begin(write=False) as txn:\n            with txn.cursor() as cur:\n                cur.first()\n                for kv in cur.iternext(keys=True, values=True):\n                    dbcont.append(kv)\n\n    backends_remove_in_process_data(repo_path=repo_path)\n    replace_staging_area_with_refs(stageenv=stageenv, sorted_content=dbcont)\n\n    commit_hash = commit_records(\n        message=message,\n        branchenv=branchenv,\n        stageenv=stageenv,\n        refenv=refenv,\n        repo_path=repo_path,\n        is_merge_commit=True,\n        merge_master=master_branch,\n        merge_dev=dev_branch)\n\n    clear_stage_hash_records(stagehashenv=stagehashenv)\n    return commit_hash\n"
  },
  {
    "path": "src/hangar/mixins/__init__.py",
    "content": "from .checkout_iteration import CheckoutDictIteration\nfrom .datasetget import GetMixin\nfrom .recorditer import CursorRangeIterator\n\n__all__ = ['GetMixin', 'CursorRangeIterator', 'CheckoutDictIteration']\n"
  },
  {
    "path": "src/hangar/mixins/checkout_iteration.py",
    "content": "\n\nclass CheckoutDictIteration:\n    \"\"\"Mixin class for checkout objects which mock common iter methods\n\n    Methods\n    -------\n    __len__\n    __contains__\n    __iter__\n    keys\n    values\n    items\n    \"\"\"\n\n    def __len__(self):\n        \"\"\"Returns number of columns in the checkout.\n        \"\"\"\n        self._verify_alive()\n        return len(self.columns)\n\n    def __contains__(self, key):\n        \"\"\"Determine if some column name (key) exists in the checkout.\n        \"\"\"\n        self._verify_alive()\n        return bool(key in self.columns)\n\n    def __iter__(self):\n        \"\"\"Iterate over column keys\"\"\"\n        self._verify_alive()\n        return iter(self.columns)\n\n    def keys(self):\n        \"\"\"Generator yielding the name (key) of every column\n        \"\"\"\n        self._verify_alive()\n        yield from self.columns.keys()\n\n    def values(self):\n        \"\"\"Generator yielding accessor object of every column\n        \"\"\"\n        self._verify_alive()\n        yield from self.columns.values()\n\n    def items(self):\n        \"\"\"Generator yielding tuple of (name, accessor object) of every column\n        \"\"\"\n        self._verify_alive()\n        yield from self.columns.items()\n"
  },
  {
    "path": "src/hangar/mixins/datasetget.py",
    "content": "from functools import reduce\nfrom operator import getitem as op_getitem\nfrom contextlib import ExitStack\n\n# noinspection PyUnresolvedReferences\nclass GetMixin:\n    \"\"\"Mixin methods for the checkout object classes.\n\n    Used since the read and write enabled checkouts have the same :meth:`__get__`\n    and :meth:`get` methods\n    \"\"\"\n\n    def __getitem__(self, index):\n        \"\"\"Dictionary style access to columns and samples\n\n        Checkout object can be thought of as a \"dataset\" (\"dset\") mapping a\n        view of samples across columns.\n\n            >>> dset = repo.checkout(branch='master')\n            >>>\n            # Get an column contained in the checkout.\n            >>> dset['foo']\n            ColumnDataReader\n            >>>\n            # Get a specific sample from ``'foo'`` (returns a single array)\n            >>> dset['foo', '1']\n            np.array([1])\n            >>>\n            # Get multiple samples from ``'foo'`` (returns a list of arrays, in order\n            # of input keys)\n            >>> dset[['foo', '1'], ['foo', '2'],  ['foo', '324']]\n            [np.array([1]), np.ndarray([2]), np.ndarray([324])]\n            >>>\n            # Get sample from multiple columns, column/data returned is ordered\n            # in same manner as input of func.\n            >>> dset[['foo', '1'], ['bar', '1'],  ['baz', '1']]\n            [np.array([1]), np.ndarray([1, 1]), np.ndarray([1, 1, 1])]\n            >>>\n            # Get multiple samples from multiple columns\\\n            >>> keys = [(col, str(samp)) for samp in range(2) for col in ['foo', 'bar']]\n            >>> keys\n            [('foo', '0'), ('bar', '0'), ('foo', '1'), ('bar', '1')]\n            >>> dset[keys]\n            [np.array([1]), np.array([1, 1]), np.array([2]), np.array([2, 2])]\n\n        Arbitrary column layouts are supported by simply adding additional members\n        to the keys for each piece of data. For example, getting data from a column\n        with a nested layout:\n\n            >>> dset['nested_col', 'sample_1', 'subsample_0']\n            np.array([1, 0])\n            >>>\n            # a sample accessor object can be retrieved at will...\n            >>> dset['nested_col', 'sample_1']\n            <class 'FlatSubsampleReader'>(column_name='nested_col', sample_name='sample_1')\n            >>>\n            # to get all subsamples in a nested sample use the Ellipsis operator\n            >>> dset['nested_col', 'sample_1', ...]\n            {'subsample_0': np.array([1, 0]),\n             'subsample_1': np.array([1, 1]),\n             ...\n             'subsample_n': np.array([1, 255])}\n\n        Retrieval of data from different column types can be mixed and combined\n        as desired. 
\n        as desired. For example, retrieving data from both flat and nested columns\n        simultaneously:\n\n            >>> dset[('nested_col', 'sample_1', '0'), ('foo', '0')]\n            [np.array([1, 0]), np.array([0])]\n            >>> dset[('nested_col', 'sample_1', ...), ('foo', '0')]\n            [{'subsample_0': np.array([1, 0]), 'subsample_1': np.array([1, 1])},\n             np.array([0])]\n            >>> dset[('foo', '0'), ('nested_col', 'sample_1')]\n            [np.array([0]),\n             <class 'FlatSubsampleReader'>(column_name='nested_col', sample_name='sample_1')]\n\n        If a column or data key does not exist, then this method will raise a KeyError.\n        As an alternative, missing keys can be gracefully handled by calling :meth:`get()`\n        instead. That method does not (by default) raise an error if a key is missing.\n        Instead, a (configurable) default value is simply inserted in its place.\n\n            >>> dset['foo', 'DOES_NOT_EXIST']\n            -------------------------------------------------------------------\n            KeyError                           Traceback (most recent call last)\n            <ipython-input-40-731e6ea62fb8> in <module>\n            ----> 1 res = co['foo', 'DOES_NOT_EXIST']\n            KeyError: 'DOES_NOT_EXIST'\n\n        Parameters\n        ----------\n        index\n            column name, sample key(s), or sequence of list/tuple of column name,\n            sample key(s) which should be retrieved in the operation.\n\n            Please see the detailed explanation above for a full description of the\n            accepted argument format / result types.\n\n        Returns\n        -------\n        :class:`~.columns.column.Columns`\n            single column parameter, no samples specified\n        Any\n            Single column specified, single sample key specified\n        List[Any]\n            arbitrary columns / multiple samples; array data for each sample is\n            returned in the same order the sample keys are received.\n        \"\"\"\n        # not using kwargs since this could be in a tight loop.\n        # kwargs: default-None, except_missing=True\n        return self._get_in(index, None, True)\n\n    def get(self, keys, default=None, except_missing=False):\n        \"\"\"View of sample data across columns gracefully handling missing sample keys.\n\n        Please see :meth:`__getitem__()` for a full description. This method is\n        identical with a single exception: if a sample key is not present in a\n        column, this method will place a null ``None`` value in its return\n        slot rather than throwing a ``KeyError`` like the dict style access\n        does.
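\n\n        For example (illustrative sketch, reusing the hypothetical ``'foo'``\n        column from the examples above):\n\n            >>> dset.get(('foo', 'DOES_NOT_EXIST'))\n            None\n            >>> dset.get(('foo', 'DOES_NOT_EXIST'), default=np.array([0]))\n            np.array([0])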
\n\n        Parameters\n        ----------\n        keys\n            sequence of column name (and optionally sample key(s)) or sequence of\n            list/tuple of column name, sample key(s) which should be retrieved in\n            the operation.\n\n            Please see the detailed explanation in :meth:`__getitem__()` for a full\n            description of the accepted argument format / result types.\n\n        default: Any, optional\n            default value to insert in results for the case where some column\n            name / sample key is not found, and the `except_missing` parameter\n            is set to False.\n\n        except_missing: bool, optional\n            If False, will not throw exceptions on missing sample key value.\n            Will raise KeyError if True and missing key found.\n\n        Returns\n        -------\n        :class:`~.columns.column.Columns`\n            single column parameter, no samples specified\n        Any\n            Single column specified, single sample key specified\n        List[Any]\n            arbitrary columns / multiple samples; array data for each sample is\n            returned in the same order the sample keys are received.\n        \"\"\"\n        return self._get_in(keys, default, except_missing)\n\n    def _get_in(self, keys, default=None, except_missing=False,\n                *, _EXCEPTION_CLASSES=(KeyError, IndexError, TypeError)):\n        \"\"\"Internal method to get data from columns within a nested set of dicts.\n\n        Parameters\n        ----------\n        keys\n            sequence of column name (and optionally sample key(s)) or sequence of\n            list/tuple of column name, sample key(s) which should be retrieved in\n            the operation.\n\n            Please see the detailed explanation in :meth:`__getitem__()` for a full\n            description of the accepted argument format / result types.\n\n        default: Any, optional\n            default value to insert in results for the case where some column\n            name / sample key is not found, and the `except_missing` parameter\n            is set to False.\n\n        except_missing: bool, optional\n            If False, will not throw exceptions on missing sample key value.\n            Will raise KeyError if True and missing key found.\n\n        Returns\n        -------\n        Any\n            Single column specified, single sample key specified\n        List[Any]\n            arbitrary columns / multiple samples; array data for each sample is\n            returned in the same order the sample keys are received.\n        \"\"\"\n        with ExitStack() as stack:\n            if not self._is_conman:\n                stack.enter_context(self)\n\n            if isinstance(keys, str):\n                return self.columns[keys]\n\n            _COLUMNS = self._columns\n            if len(keys) >= 2 and any([isinstance(k, (list, tuple)) for k in keys]):\n                res = []\n                for key in keys:\n                    try:\n                        tmp = reduce(op_getitem, key, _COLUMNS)\n                        res.append(tmp)\n                    except _EXCEPTION_CLASSES:\n                        if except_missing:\n                            raise\n                        res.append(default)\n                return res\n            else:\n                try:\n                    return reduce(op_getitem, keys, _COLUMNS)\n                except _EXCEPTION_CLASSES:\n                    if except_missing:\n                        raise\n                    return default\n"
  },
  {
    "path": "src/hangar/mixins/recorditer.py",
    "content": "from typing import Iterable, Union, Tuple\nimport lmdb\n\n\nclass CursorRangeIterator:\n\n    @staticmethod\n    def cursor_range_iterator(datatxn: lmdb.Transaction, startRangeKey: bytes, keys: bool, values: bool\n                              ) -> Iterable[Union[Tuple[bytes], Tuple[bytes, bytes]]]:\n        \"\"\"Common method used to implement cursor range iterators\n\n        Parameters\n        ----------\n        datatxn : lmdb.Transaction\n            open database transaction to read values from\n        startRangeKey : bytes\n            range in which to iterate cursor over until end of db or out of\n            lexicographic range.\n        keys : bool, optional\n            If True, yield metadata keys encountered, if False only values\n            are returned. By default, True.\n        values : bool, optional\n            If True, yield metadata hash values encountered, if False only\n            keys are returned. By default, True.\n\n        Yields\n        ------\n        Iterable[Union[Tuple[bytes], Tuple[bytes, bytes]]]:\n            db keys or key/value tuple\n        \"\"\"\n        len_RangeKey = len(startRangeKey)\n        with datatxn.cursor() as cursor:\n            rangeItemsExist = cursor.set_range(startRangeKey)\n            if not rangeItemsExist:\n                # break out prematurely in the case where no matching items exist.\n                # Important to not disrupt callers who may expect to recieves some\n                # iterable for processing.\n                return iter([])\n\n            # divide loop into returned type sections as perf optimization\n            # (rather then if/else checking on every iteration of loop)\n            if keys and not values:\n                while rangeItemsExist:\n                    recKey = cursor.key()\n                    if recKey[:len_RangeKey] == startRangeKey:\n                        yield recKey\n                        rangeItemsExist = cursor.next()\n                        continue\n                    else:\n                        rangeItemsExist = False\n            elif values and not keys:\n                while rangeItemsExist:\n                    recKey, recVal = cursor.item()\n                    if recKey[:len_RangeKey] == startRangeKey:\n                        yield recVal\n                        rangeItemsExist = cursor.next()\n                        continue\n                    else:\n                        rangeItemsExist = False\n            elif keys and values:\n                while rangeItemsExist:\n                    recKey, recVal = cursor.item()\n                    if recKey[:len_RangeKey] == startRangeKey:\n                        yield (recKey, recVal)\n                        rangeItemsExist = cursor.next()\n                        continue\n                    else:\n                        rangeItemsExist = False\n            else:  # pragma: no cover\n                raise RuntimeError(f'Internal hangar error while iterating cursor records for '\n                                   f' {startRangeKey}. one of [`keys`, `values`] must be True.')\n"
  },
  {
    "path": "src/hangar/op_state.py",
    "content": "import types\nimport sys\n\nimport wrapt\n\n\n@wrapt.decorator\ndef writer_checkout_only(wrapped, instance, args, kwargs) -> types.MethodType:\n    \"\"\"Only allow a method to be called in a write-enable checkout.\n\n    Parameters\n    ----------\n    wrapped\n        bound method which is being called\n    instance\n        class object being operated on ie. ``instance is self``\n        in both (equality and identify).\n    args\n        argument list passed to the method\n    kwargs\n        keyword args dict passed to the method.\n\n    Returns\n    -------\n    types.MethodType\n        If instance._mode == 'a' (write enabled checkout) then\n        operation is allowed and pass through args and kwargs\n        to the method as specified.\n\n    Raises\n    ------\n    PermissionError\n        If the checkout is opened in read-only mode, then deny\n        ability to call and raise error explaining why to user.\n    \"\"\"\n    try:\n        if instance._mode == 'a':  # user facing classes hide attribute\n            return wrapped(*args, **kwargs)\n        else:\n            err = (f'Method \"{wrapped.__func__.__name__}\" '\n                   f'cannot be called in a read-only checkout.')\n            raise PermissionError(err) from None\n\n    except AttributeError:\n        if instance.mode == 'a':  # internal classes don't hide attribute\n            return wrapped(*args, **kwargs)\n        else:\n            err = (f'Method \"{wrapped.__func__.__name__}\" '\n                   f'cannot be called in a read-only checkout.')\n            raise PermissionError(err) from None\n\n\n@wrapt.decorator\ndef reader_checkout_only(wrapped, instance, args, kwargs) -> types.MethodType:\n    \"\"\"Only allow a method to be called in a read-only checkout.\n\n    Parameters\n    ----------\n    wrapped\n        bound method which is being called\n    instance\n        class object being operated on ie. 
\n    \"\"\"\n    try:\n        if instance._mode == 'a':  # user facing classes hide attribute\n            return wrapped(*args, **kwargs)\n        else:\n            err = (f'Method \"{wrapped.__func__.__name__}\" '\n                   f'cannot be called in a read-only checkout.')\n            raise PermissionError(err) from None\n\n    except AttributeError:\n        if instance.mode == 'a':  # internal classes don't hide attribute\n            return wrapped(*args, **kwargs)\n        else:\n            err = (f'Method \"{wrapped.__func__.__name__}\" '\n                   f'cannot be called in a read-only checkout.')\n            raise PermissionError(err) from None\n\n\n@wrapt.decorator\ndef reader_checkout_only(wrapped, instance, args, kwargs) -> types.MethodType:\n    \"\"\"Only allow a method to be called in a read-only checkout.\n\n    Parameters\n    ----------\n    wrapped\n        bound method which is being called\n    instance\n        class object being operated on, i.e. ``instance is self``\n        in both (equality and identity).\n    args\n        argument list passed to the method\n    kwargs\n        keyword args dict passed to the method.\n\n    Returns\n    -------\n    types.MethodType\n        If instance._mode == 'r' (read-only checkout) then the\n        operation is allowed and args and kwargs are passed through\n        to the method as specified.\n\n    Raises\n    ------\n    PermissionError\n        If the checkout is opened in write-enabled mode, the call is denied\n        and an error explaining why is raised.\n    \"\"\"\n    try:\n        if instance._mode == 'r':  # user facing classes hide attribute\n            return wrapped(*args, **kwargs)\n        else:\n            err = (f'Method \"{wrapped.__func__.__name__}\" '\n                   f'cannot be called in a write-enabled checkout.')\n            raise PermissionError(err) from None\n\n    except AttributeError:\n        if instance.mode == 'r':  # internal classes don't hide attribute\n            return wrapped(*args, **kwargs)\n        else:\n            err = (f'Method \"{wrapped.__func__.__name__}\" '\n                   f'cannot be called in a write-enabled checkout.')\n            raise PermissionError(err) from None\n\n\ndef tb_params_last_called(tb: types.TracebackType) -> dict:\n    \"\"\"Get parameters of the last function called before an exception was thrown.\n\n    Parameters\n    ----------\n    tb : types.TracebackType\n        traceback object returned as the third item from sys.exc_info()\n        corresponding to an exception raised in the last stack frame.\n\n    Returns\n    -------\n    dict\n        parameters passed to the last function called before the exception was\n        thrown.\n    \"\"\"\n    while tb.tb_next:\n        tb = tb.tb_next\n    frame = tb.tb_frame\n    code = frame.f_code\n    argcount = code.co_argcount\n    if code.co_flags & 4:  # *args\n        argcount += 1\n    if code.co_flags & 8:  # **kwargs\n        argcount += 1\n    names = code.co_varnames[:argcount]\n    params = {}\n    for name in names:\n        params[name] = frame.f_locals.get(name, '<deleted>')\n    return params\n\n\ndef report_corruption_risk_on_parsing_error(func):\n    \"\"\"Decorator adding try/except handling non-explicit exceptions.\n\n    Explicitly raised RuntimeErrors generally point to corrupted data\n    identified by a cryptographic hash mismatch. However, in order to get to\n    the point where such quantities can be processed, a non-trivial amount of\n    parsing machinery must be run. Should any error be thrown in the parse\n    machinery due to corrupted values, this method raises the exception in a\n    useful form; providing traceback context, likely root cause (displayed to\n    users), and the offending arguments passed to the function which threw the\n    error.\n    \"\"\"\n    def wrapped(*args, **kwargs):\n        try:\n            # propagate the wrapped function's return value to the caller\n            return func(*args, **kwargs)\n        except RuntimeError as e:\n            raise e\n        except Exception as e:\n            raise RuntimeError(\n                f'Corruption detected during {func.__name__}. Most likely this is the '\n                f'result of unparsable record values. Exception msg `{str(e)}`. Params '\n                f'`{tb_params_last_called(sys.exc_info()[2])}`') from e\n    return wrapped\n"
  },
  {
    "path": "src/hangar/optimized_utils.pxd",
    "content": "\"\"\"\nPortions of this code have been taken and modified from the \"cytoolz\" project.\n\nURL:      https://github.com/pytoolz/cytoolz\nFile:     cytoolz/dicttoolz.pyd\nCommit:   b66732f7f51937e85f5112481baf9db9c97b2ad2\nAccessed: 05 APR 2020\n\nCyToolz License\n-------------------------------------------------------------------------------\nLicense: New BSD\nURL:     https://github.com/pytoolz/cytoolz/blob/b66732f7f51937e85f5112481baf9db9c97b2ad2/LICENSE.txt\n\"\"\"\nfrom cpython.ref cimport PyObject\n\ncdef class SizedDict(dict):\n    cdef public int _maxsize\n    cdef public object _stack\n    cdef public dict _data\n    cdef public int _stack_size\n\ncpdef object is_iterable(object x)\n\ncpdef object is_ordered_sequence(object x)\n\ncpdef int find_next_prime(int N)\n\nctypedef int (*f_map_next)(object p, Py_ssize_t *ppos, PyObject* *pkey, PyObject* *pval) except -1\n\n# utility functions to perform iteration over dicts or generic mapping\ncdef class _iter_mapping:\n    cdef object it\n    cdef object cur\n\ncdef f_map_next get_map_iter(object d, PyObject* *ptr) except NULL\n\ncdef int PyMapping_Next(object p, Py_ssize_t *ppos, PyObject* *pkey, PyObject* *pval) except -1\n\ncpdef object valfilter(object predicate, object d, object factory=*)\n\ncpdef object valfilterfalse(object predicate, object d, object factory=*)\n"
  },
  {
    "path": "src/hangar/optimized_utils.pyx",
    "content": "\"\"\"\nPortions of this code have been taken and modified from the \"cytoolz\" project.\n\nURL:      https://github.com/pytoolz/cytoolz\nFile:     cytoolz/dicttoolz.pyx\nCommit:   b66732f7f51937e85f5112481baf9db9c97b2ad2\nAccessed: 05 APR 2020\n\nCyToolz License\n-------------------------------------------------------------------------------\nLicense: New BSD\nURL:     https://github.com/pytoolz/cytoolz/blob/b66732f7f51937e85f5112481baf9db9c97b2ad2/LICENSE.txt\n\"\"\"\nfrom cpython.dict cimport PyDict_CheckExact\nfrom cpython.ref cimport PyObject, Py_DECREF, Py_INCREF, Py_XDECREF\n\n# Locally defined bindings that differ from `cython.cpython` bindings\nfrom .external_cpython cimport PyDict_Next_Compat, PtrIter_Next\nfrom collections import deque\n\n\n__all__ = ['valfilter', 'valfilterfalse', 'find_next_prime', 'is_iterable',\n           'is_ordered_sequence', 'SizedDict']\n\n\ncdef class SizedDict:\n    \"\"\"Sized dictionary\"\"\"\n\n    def __init__(self, int maxsize=1000):\n        self._data = dict()\n        self._maxsize = maxsize\n        self._stack = deque()\n        self._stack_size = 0\n\n    @property\n    def maxsize(self):\n        return self._maxsize\n\n    def __repr__(self):\n        return repr(self._data)\n\n    def __contains__(self, key):\n        \"\"\"Return True if d has a key key, else False.\"\"\"\n        cdef bint res\n        res = key in self._data\n        return res\n\n    def __getitem__(self, key):\n        \"\"\"Return the item of d with key key. Raises a KeyError if key\n        is not in the map.\n        \"\"\"\n        return self._data[key]\n\n    def get(self, key, default=None):\n        \"\"\"Return the value for key if key is in the dictionary, else default.\n\n        If default is not given, it defaults to None, so that this method\n        never raises a KeyError.\n        \"\"\"\n        return self._data.get(key, default)\n\n    def __len__(self):\n        \"\"\"Return the number of items in the dictionary d.\n        \"\"\"\n        return self._stack_size\n\n    def __iter__(self):\n        \"\"\"Return an iterator over the keys of the dictionary.\n\n        This is a shortcut for iter(d.keys()).\n        \"\"\"\n        return iter(self.keys())\n\n    def __setitem__(self, key, value):\n        \"\"\"Set d[key] to value\n        \"\"\"\n        if self._stack_size >= self._maxsize:\n            k_pop = self._stack.popleft()\n            del self._data[k_pop]\n            self._stack_size = self._stack_size - 1\n        self._stack.append(key)\n        self._data[key] = value\n        self._stack_size = self._stack_size + 1\n\n    def __delitem__(self, key):\n        \"\"\"Remove d[key] from d. 
Raises a KeyError if key is not in the map.\n        \"\"\"\n        del self._data[key]\n        self._stack.remove(key)\n        self._stack_size = self._stack_size - 1\n\n    def keys(self):\n        \"\"\"Return a new view of the dictionary’s keys.\"\"\"\n        return self._data.keys()\n\n    def values(self):\n        \"\"\"Return a new view of the dictionary’s values.\"\"\"\n        return self._data.values()\n\n    def items(self):\n        \"\"\"Return a new view of the dictionary’s items (``(key, value)`` pairs).\n        \"\"\"\n        return self._data.items()\n\n    def clear(self):\n        \"\"\"Remove all items from the dictionary.\n        \"\"\"\n        self._stack.clear()\n        self._data.clear()\n        self._stack_size = 0\n\n    def pop(self, key, default=None):\n        \"\"\"If key is in the dictionary, remove it and return its value,\n        else return default.\n\n        If default is not given and key is not in the dictionary, a KeyError is raised.\n        \"\"\"\n        cdef bint has_default\n\n        has_default = default is not None\n        if key in self._data:\n            val = self._data.pop(key)\n            self._stack.remove(key)\n            self._stack_size = self._stack_size - 1\n        elif has_default:\n            val = default\n        else:\n            raise KeyError(key)\n        return val\n\n    def popitem(self):\n        \"\"\"Remove and return a (key, value) pair from the dictionary.\n        Pairs are returned in LIFO order.\n\n        popitem() is useful to destructively iterate over a dictionary,\n        as often used in set algorithms. If the dictionary is empty, calling\n        popitem() raises a KeyError.\n        \"\"\"\n        cdef object lifo_key, lifo_val\n        lifo_key = self._stack.pop()\n        lifo_val = self._data.pop(lifo_key)\n        self._stack_size = self._stack_size - 1\n        return lifo_key, lifo_val\n\n    def update(self, other):\n        \"\"\"Update the dictionary with the key/value pairs from other, overwriting\n        existing keys. Return None.\n\n        update() accepts either another dictionary object or an iterable of\n        key/value pairs (as tuples or other iterables of length two). Keyword\n        arguments are not supported by this implementation.\n        \"\"\"\n        if not isinstance(other, dict):\n            other = dict(other)\n        for k, v in other.items():\n            self[k] = v\n\n    def setdefault(self, key, default=None):\n        \"\"\"If key is in the dictionary, return its value. If not, insert key\n        with a value of default and return default. default defaults to None.\n        \"\"\"\n        try:\n            return self._data[key]\n        except KeyError:\n            self[key] = default\n            return default\n\n\ncpdef object is_iterable(object x):\n    \"\"\"Is x iterable?\n\n    >>> is_iterable([1, 2, 3])\n    True\n    >>> is_iterable('abc')\n    True\n    >>> is_iterable(5)\n    False\n    \"\"\"\n    try:\n        iter(x)\n        return True\n    except TypeError:\n        pass\n    return False\n\n\ncpdef object is_ordered_sequence(object x):\n    \"\"\"Is x an ordered sequence? (list, tuple)\n\n    >>> is_ordered_sequence([1, 2, 3])\n    True\n    >>> is_ordered_sequence('abc')\n    False\n    >>> is_ordered_sequence({4, '3', 2})\n    False\n    \"\"\"\n    if isinstance(x, list) or isinstance(x, tuple):\n        return True\n    return False\n\n\ncdef bint _is_prime(int n):\n    cdef int i\n\n    if n % 2 == 0:\n        return False\n    i = 3\n    while i * i <= n:\n        if n % i != 0:\n            i += 2\n        else:\n            return False\n    return True\n\n\ncpdef int find_next_prime(int N):\n    \"\"\"Find next prime >= N\n\n    Parameters\n    ----------\n    N : int\n        Starting point to find the next prime >= N.\n\n    Returns\n    -------\n    int\n        the next prime found after the number N\n    \"\"\"\n\n    if N < 3:\n        return 2\n    if N % 2 == 0:\n        N += 1\n    for n in range(N, 2 * N, 2):\n        if _is_prime(n):\n            return n\n\n\ncdef class _iter_mapping:\n    \"\"\" Keep a handle on the current item to prevent memory clean up too early\"\"\"\n    def __cinit__(self, object it):\n        self.it = it\n        self.cur = None\n\n    def __iter__(self):\n        return self\n\n    def __next__(self):\n        self.cur = next(self.it)\n        return self.cur\n\n\ncdef int PyMapping_Next(object p, Py_ssize_t *ppos, PyObject* *pkey, PyObject* *pval) except -1:\n    \"\"\"Mimic \"PyDict_Next\" interface, but for any mapping\"\"\"\n    cdef PyObject *obj\n    obj = PtrIter_Next(p)\n    if obj is NULL:\n        return 0\n    pkey[0] = <PyObject*>(<object>obj)[0]\n    pval[0] = <PyObject*>(<object>obj)[1]\n    Py_XDECREF(obj)  # removing this results in memory leak\n    return 1\n\n\ncdef f_map_next get_map_iter(object d, PyObject* *ptr) except NULL:\n    \"\"\"Return function pointer to perform iteration over object returned in ptr.\n    The returned function signature matches \"PyDict_Next\".  If ``d`` is a dict,\n    then the returned function *is* PyDict_Next, so iteration will be very fast.\n    The object returned through ``ptr`` needs to have its reference count\n    reduced by one once the caller \"owns\" the object.\n    This function lets us control exactly how iteration should be performed\n    over a given mapping.  The current rules are:\n    1) If ``d`` is exactly a dict, use PyDict_Next\n    2) If ``d`` is subtype of dict, use PyMapping_Next.  This lets the user\n       control the order of iteration, such as for OrderedDict.\n    3) If using PyMapping_Next, iterate using ``iteritems`` if possible,\n       otherwise iterate using ``items``.\n    \"\"\"\n    cdef object val\n    cdef f_map_next rv\n    if PyDict_CheckExact(d):\n        val = d\n        rv = &PyDict_Next_Compat\n    elif hasattr(d, 'iteritems'):\n        val = _iter_mapping(iter(d.iteritems()))\n        rv = &PyMapping_Next\n    else:\n        val = _iter_mapping(iter(d.items()))\n        rv = &PyMapping_Next\n    Py_INCREF(val)\n    ptr[0] = <PyObject*>val\n    return rv\n\n\ncpdef object valfilter(object predicate, object d, object factory=dict):\n    \"\"\"Filter items in dictionary by value\n\n    >>> iseven = lambda x: x % 2 == 0\n    >>> d = {1: 2, 2: 3, 3: 4, 4: 5}\n    >>> valfilter(iseven, d)\n    {1: 2, 3: 4}\n\n    See Also:\n        valfilterfalse\n    \"\"\"\n    cdef:\n        object rv\n        f_map_next f\n        PyObject *obj\n        PyObject *pkey\n        PyObject *pval\n        Py_ssize_t pos = 0\n\n    rv = factory()\n    f = get_map_iter(d, &obj)\n    d = <object>obj\n    Py_DECREF(d)\n    while f(d, &pos, &pkey, &pval):\n        if predicate(<object>pval):\n            rv[<object>pkey] = <object>pval\n    return rv\n\n\ncpdef object valfilterfalse(object predicate, object d, object factory=dict):\n    \"\"\" Filter items in dictionary by values which are false.\n\n    >>> iseven = lambda x: x % 2 == 0\n    >>> d = {1: 2, 2: 3, 3: 4, 4: 5}\n    >>> valfilterfalse(iseven, d)\n    {2: 3, 4: 5}\n\n    See Also:\n        valfilter\n    \"\"\"\n    cdef:\n        object rv\n        f_map_next f\n        PyObject *obj\n        PyObject *pkey\n        PyObject *pval\n        Py_ssize_t pos = 0\n\n    rv = factory()\n    f = get_map_iter(d, &obj)\n    d = <object>obj\n    Py_DECREF(d)\n    while f(d, &pos, &pkey, &pval):\n        if not predicate(<object>pval):\n            rv[<object>pkey] = <object>pval\n    return rv\n
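\n\n# Illustrative sketch only (not part of the module API): exercises SizedDict's\n# oldest-first eviction and the valfilter/valfilterfalse pair defined above.\n# The keys and values used here are arbitrary.\nif __name__ == '__main__':  # pragma: no cover\n    sd = SizedDict(maxsize=2)\n    sd['a'] = 1\n    sd['b'] = 2\n    sd['c'] = 3              # exceeds maxsize; evicts the oldest key 'a'\n    print(list(sd.keys()))   # -> ['b', 'c']\n\n    iseven = lambda x: x % 2 == 0\n    d = {1: 2, 2: 3, 3: 4}\n    print(valfilter(iseven, d))       # -> {1: 2, 3: 4}\n    print(valfilterfalse(iseven, d))  # -> {2: 3}\n"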
  },
  {
    "path": "src/hangar/records/__init__.py",
    "content": "from .hashmachine import hash_func_from_tcode\nfrom .column_parsers import *\nfrom .recordstructs import (\n    CompatibleData,\n    ColumnSchemaKey,\n    FlatColumnDataKey,\n    NestedColumnDataKey,\n    DataRecordVal,\n)\n\n\n__all__ = column_parsers.__all__ + [\n    'hash_func_from_tcode',\n    'CompatibleData',\n    'ColumnSchemaKey',\n    'FlatColumnDataKey',\n    'NestedColumnDataKey',\n    'DataRecordVal',\n]\n"
  },
  {
    "path": "src/hangar/records/column_parsers.pyx",
    "content": "from .recordstructs cimport ColumnSchemaKey, \\\n    FlatColumnDataKey, \\\n    NestedColumnDataKey, \\\n    DataRecordVal\n\nfrom .recordstructs import ColumnSchemaKey, \\\n    FlatColumnDataKey, \\\n    NestedColumnDataKey, \\\n    DataRecordVal\n\nimport ast\n\n__all__ = [\n    'schema_record_count_start_range_key',\n    'schema_db_key_from_column',\n    'schema_db_range_key_from_column_unknown_layout',\n    'schema_column_record_from_db_key',\n    'schema_hash_record_db_val_from_spec',\n    'schema_spec_from_db_val',\n    'schema_hash_db_key_from_digest',\n    'data_record_digest_val_from_db_val',\n    'data_record_db_val_from_digest',\n    'flat_data_column_record_start_range_key',\n    'flat_data_db_key_from_names',\n    'flat_data_record_from_db_key',\n    'nested_data_column_record_start_range_key',\n    'nested_data_db_key_from_names',\n    'nested_data_record_from_db_key',\n    'dynamic_layout_data_record_from_db_key',\n    'dynamic_layout_data_record_db_start_range_key',\n    'dynamic_layout_data_record_db_key_from_names',\n    'hash_schema_db_key_from_raw_key',\n    'hash_data_db_key_from_raw_key',\n    'hash_schema_raw_key_from_db_key',\n    'hash_data_raw_key_from_db_key',\n    'schema_record_db_val_from_digest',\n]\n\n\n# ----------------------- Schema Record Parsers -------------------------------\n\n\ncpdef bytes schema_record_count_start_range_key():\n    return 's:'.encode()\n\n\ncpdef bytes schema_db_key_from_column(str column, str layout):\n    \"\"\"column schema db formated key from name and layout.\n\n    Parameters\n    ----------\n    column: str\n        name of the column\n    layout: str\n        layout of the column schema ('flat', 'nested', etc.)\n    \"\"\"\n    cdef str serial\n\n    if layout == 'flat':\n        serial = f's:{column}:f'\n    elif layout == 'nested':\n        serial = f's:{column}:n'\n    else:\n        raise ValueError(f'layout {layout} not valid')\n    return serial.encode()\n\n\ncpdef bytes schema_db_range_key_from_column_unknown_layout(str column):\n    \"\"\"Find a cursor range key which will select a column schema key.\n    \n    Due to how information is appended onto the end of the schema db key,\n    there is no need to know the column_layout or schema_digest to uniquely\n    identify a column's schema record. set the cursor range and query the full\n    key value (passed into a seperate parser) to recieve the column_layout \n    and schema_digest / hash type code. 
The schema spec is accessed at the \n    record value, or in the hash db under the corresponding schema_digest key.\n    \n    Parameters\n    ----------\n    column: str\n        name of the column to query.\n    \"\"\"\n    cdef str serial\n\n    serial = f's:{column}:'\n    return serial.encode()\n\n\ncpdef ColumnSchemaKey schema_column_record_from_db_key(bytes raw):\n    cdef str serial, column, layout\n\n    serial = raw.decode()\n    _, column, layout = serial.split(':')\n    if layout == 'f':\n        layout = 'flat'\n    elif layout == 'n':\n        layout = 'nested'\n    else:\n        raise ValueError(f'layout unknown for serial key {serial}')\n    return ColumnSchemaKey(column, layout)\n\n\ncpdef bytes schema_hash_record_db_val_from_spec(dict schema):\n    cdef str serial\n\n    serial = repr(schema).replace(' ', '')\n    return serial.encode()\n\n\ncpdef dict schema_spec_from_db_val(bytes raw):\n    cdef str serialized\n    cdef dict schema\n\n    serialized = raw.decode()\n    schema = ast.literal_eval(serialized)\n    return schema\n\n\ncpdef bytes schema_hash_db_key_from_digest(str digest):\n    return f's:{digest}'.encode()\n\n\ncpdef bytes schema_record_db_val_from_digest(str digest):\n    return digest.encode()\n\n\n# -------------------- Data Digest Record Value Parser -------------------------\n\n\ncpdef DataRecordVal data_record_digest_val_from_db_val(bytes raw):\n    \"\"\"Convert and split a lmdb record value into data record val struct\n    \"\"\"\n    cdef str serial\n\n    serial = raw.decode()\n    return DataRecordVal(serial)\n\n\ncpdef bytes data_record_db_val_from_digest(str digest):\n    \"\"\"convert a data digest value spec into the appropriate lmdb record value\n    \"\"\"\n    return digest.encode()\n\n\n# -------------------------- flat parser --------------------------------------\n\n\ncpdef bytes flat_data_column_record_start_range_key(str column):\n    cdef str serial\n\n    serial = f'f:{column}:'\n    return serial.encode()\n\n\ncpdef bytes flat_data_db_key_from_names(str column, sample):\n    cdef str serial\n\n    if isinstance(sample, int):\n        serial = f'f:{column}:#{sample}'\n    else:\n        serial = f'f:{column}:{sample}'\n    return serial.encode()\n\n\ncpdef FlatColumnDataKey flat_data_record_from_db_key(bytes raw):\n    cdef str serial, column, sample\n\n    serial = raw.decode()\n    _, column, sample = serial.split(':')\n    return FlatColumnDataKey(column, sample)\n\n\n# -------------------------- nested parser ------------------------------------\n\n\ncpdef bytes nested_data_column_record_start_range_key(str column):\n    cdef str serial\n\n    serial = f'n:{column}:'\n    return serial.encode()\n\n\ncpdef bytes nested_data_db_key_from_names(str column, sample, subsample):\n    cdef str serial\n\n    if isinstance(sample, int):\n        sample = f'#{sample}'\n    if isinstance(subsample, int):\n        subsample = f'#{subsample}'\n    serial = f'n:{column}:{sample}:{subsample}'\n    return serial.encode()\n\n\ncpdef NestedColumnDataKey nested_data_record_from_db_key(bytes raw):\n    cdef str serial, column, sample, subsample\n\n    serial = raw.decode()\n    _, column, sample, subsample = serial.split(':')\n    return NestedColumnDataKey(column, sample, subsample)\n\n\n# ----------------------- dynamic parser selection ----------------------------\n\n\ncpdef object dynamic_layout_data_record_from_db_key(bytes raw):\n    if raw[0:2] == b'f:':\n        res = flat_data_record_from_db_key(raw)\n    elif raw[0:2] == b'n:':\n        
res = nested_data_record_from_db_key(raw)\n    elif raw[0:2] == b's:':\n        res = schema_column_record_from_db_key(raw)\n    else:\n        raise ValueError(raw)\n    return res\n\n\ncpdef bytes dynamic_layout_data_record_db_start_range_key(ColumnSchemaKey column_record):\n    cdef bytes res\n\n    if column_record.layout == 'flat':\n        res = flat_data_column_record_start_range_key(column_record.column)\n    elif column_record.layout == 'nested':\n        res = nested_data_column_record_start_range_key(column_record.column)\n    else:\n        raise ValueError(column_record)\n    return res\n\n\ndef dynamic_layout_data_record_db_key_from_names(layout, column, *sample):\n    if layout == 'flat':\n        db_key = flat_data_db_key_from_names(column, sample[0])\n    elif layout == 'nested':\n        db_key = nested_data_db_key_from_names(column, sample[0], sample[1])\n    else:\n        raise ValueError(layout)\n    return db_key\n\n\n\n#\n# Data Hash parsing functions used to convert db key/val to raw pyhon obj\n# -----------------------------------------------------------------------\n\n\ncpdef bytes hash_record_count_start_range_key():\n    return 'h:'.encode()\n\n\ncpdef bytes hash_schema_db_key_from_raw_key(str schema_hash):\n    return f's:{schema_hash}'.encode()\n\n\ncpdef bytes hash_data_db_key_from_raw_key(str data_hash):\n    return f'h:{data_hash}'.encode()\n\n\ncpdef str hash_schema_raw_key_from_db_key(bytes db_key):\n    return db_key[2:].decode()\n\n\ncpdef str hash_data_raw_key_from_db_key(bytes db_key):\n    return db_key[2:].decode()\n"
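\n\n# Illustrative sketch only: round-trips a nested-layout sample key through the\n# parsers above; the column/sample names are arbitrary examples.\nif __name__ == '__main__':  # pragma: no cover\n    key = nested_data_db_key_from_names('train_images', 0, 'left')\n    print(key)  # -> b'n:train_images:#0:left' (int keys get a '#' prefix)\n    print(dynamic_layout_data_record_from_db_key(key))  # NestedColumnDataKey\n"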
  },
  {
    "path": "src/hangar/records/commiting.py",
    "content": "import configparser\nimport os\nimport shutil\nimport tempfile\nimport time\nfrom contextlib import contextmanager, closing\nfrom pathlib import Path\n\nimport lmdb\n\nfrom .heads import (\n    get_branch_head_commit,\n    get_staging_branch_head,\n    set_branch_head_commit,\n    set_staging_branch_head,\n)\nfrom .parsing import (\n    cmt_final_digest,\n    commit_parent_db_key_from_raw_key,\n    commit_parent_db_val_from_raw_val,\n    commit_parent_raw_key_from_db_key,\n    commit_parent_raw_val_from_db_val,\n    commit_ref_db_key_from_raw_key,\n    commit_ref_db_val_from_raw_val,\n    commit_ref_raw_val_from_db_val,\n    commit_spec_db_key_from_raw_key,\n    commit_spec_db_val_from_raw_val,\n    commit_spec_raw_val_from_db_val,\n    DigestAndBytes,\n)\nfrom ..constants import (\n    CONFIG_USER_NAME,\n    DIR_DATA_REMOTE,\n    DIR_DATA_STAGE,\n    DIR_DATA_STORE,\n    LMDB_SETTINGS,\n    SEP_KEY,\n)\nfrom ..txnctx import TxnRegister\n\n\"\"\"\nReading commit specifications and parents.\n------------------------------------------\n\"\"\"\n\n\ndef expand_short_commit_digest(refenv: lmdb.Environment, commit_hash: str) -> str:\n    \"\"\"Find the a full commit hash from a short version provided by the user\n\n    Parameters\n    ----------\n    refenv : lmdb.Environment\n        db where the commit references are stored\n    commit_hash : str\n        short commit hash to search for in the repository\n\n    Returns\n    -------\n    str\n        full commit hash if short maps to a unique digest in the repo history\n\n    Raises\n    ------\n    KeyError\n        If the short commit hash can reference two full commit digests\n    KeyError\n        if no expanded commit digest is found starting with the short version.\n    \"\"\"\n    reftxn = TxnRegister().begin_reader_txn(refenv)\n    commitParentStart = commit_parent_db_key_from_raw_key(commit_hash)\n    with reftxn.cursor() as cursor:\n        shortHashExists = cursor.set_range(commitParentStart)\n        if shortHashExists is True:\n            commitKey = cursor.key()\n            commit_key = commit_parent_raw_key_from_db_key(commitKey)\n            cursor.next()\n            cursor.next()\n            nextHashExist = cursor.next()\n            if nextHashExist is False:\n                return commit_key\n            nextCommitKey = cursor.key()\n            next_commit_key = commit_parent_raw_key_from_db_key(nextCommitKey)\n            if next_commit_key.startswith(commit_hash) is True:\n                raise KeyError(f'Non unique short commit hash: {commit_hash}')\n            else:\n                return commit_key\n        else:\n            raise KeyError(f'No matching commit hash found starting with: {commit_hash}')\n\n\ndef check_commit_hash_in_history(refenv, commit_hash):\n    \"\"\"Check if a commit hash exists in the repository history\n\n    Parameters\n    ----------\n    refenv : lmdb.Environment\n        refenv where the commit history is stored\n    commit_hash : str\n        hash of the commit to check for existence\n\n    Returns\n    -------\n    bool\n        True if exists, otherwise False\n    \"\"\"\n    reftxn = TxnRegister().begin_reader_txn(refenv)\n    try:\n        commitParentKey = commit_parent_db_key_from_raw_key(commit_hash)\n        commitParentVal = reftxn.get(commitParentKey, default=False)\n        isCommitInHistory = True if commitParentVal is not False else False\n    finally:\n        TxnRegister().abort_reader_txn(refenv)\n    return isCommitInHistory\n\n\ndef 
get_commit_spec(refenv, commit_hash):\n    \"\"\"Get the commit specifications of a particular hash.\n\n    Parameters\n    ----------\n    refenv : lmdb.Environment\n        refenv where the specs are stored\n    commit_hash : str\n        commit hash to query\n\n    Returns\n    -------\n    namedtuple\n        named tuple with all the commit specs included\n\n    Raises\n    ------\n    ValueError\n        if no commit exists with the provided hash\n    \"\"\"\n    reftxn = TxnRegister().begin_reader_txn(refenv)\n    try:\n        parentCommitSpecKey = commit_spec_db_key_from_raw_key(commit_hash)\n        parentCommitSpecVal = reftxn.get(parentCommitSpecKey, default=False)\n    finally:\n        TxnRegister().abort_reader_txn(refenv)\n\n    if parentCommitSpecVal is False:\n        raise ValueError(f'No commit exists with the hash: {commit_hash}')\n\n    parentCommitSpec = commit_spec_raw_val_from_db_val(parentCommitSpecVal)\n    return parentCommitSpec.user_spec\n\n\ndef get_commit_ancestors(refenv, commit_hash):\n    \"\"\"find the ancestors of a particular commit hash.\n\n    Parameters\n    ----------\n    refenv : lmdb.Environment\n        lmdb environment where the commit refs are stored\n    commit_hash : string\n        commit hash to find the ancestors for\n\n    Returns\n    -------\n    namedtuple\n        Namedtuple describing is_merge_commit, master_ancestor, &\n        child_ancestor (in the even of merge commit)\n\n    Raises\n    ------\n    ValueError\n        if no commit exists with the provided hash\n    \"\"\"\n\n    reftxn = TxnRegister().begin_reader_txn(refenv)\n    try:\n        parentCommitKey = commit_parent_db_key_from_raw_key(commit_hash)\n        parentCommitVal = reftxn.get(parentCommitKey, default=False)\n    finally:\n        TxnRegister().abort_reader_txn(refenv)\n\n    if parentCommitVal is False:\n        raise ValueError(f'No commit exists with the hash: {commit_hash}')\n\n    parentCommitAncestors = commit_parent_raw_val_from_db_val(parentCommitVal)\n    return parentCommitAncestors.ancestor_spec\n\n\ndef get_commit_ancestors_graph(refenv, starting_commit):\n    \"\"\"returns a DAG of all commits starting at some hash pointing to the repo root.\n\n    Parameters\n    ----------\n    refenv : lmdb.Environment\n        lmdb environment where the commit refs are stored\n    starting_commit : string\n        commit hash to start creating the DAG from\n\n    Returns\n    -------\n    dict\n        a dictionary where each key is a commit hash encountered along the way,\n        and it's value is a list containing either one or two elements which\n        identify the child commits of that parent hash.\n    \"\"\"\n    parent_commit = starting_commit\n    commit_graph = {}\n    seen = set(starting_commit)\n    more_work = []\n    end_commit = False\n\n    if parent_commit == '':\n        end_commit = True\n\n    while end_commit is not True:\n        childCommit = get_commit_ancestors(refenv, parent_commit)\n\n        if (childCommit.master_ancestor == '') or (childCommit.master_ancestor in seen):\n            end_commit = True\n            commit_graph[parent_commit] = [childCommit.master_ancestor]\n            if len(more_work) != 0:\n                master_commit = more_work.pop(0)\n                end_commit = False\n            else:\n                continue\n\n        elif childCommit.is_merge_commit is True:\n            master_commit = childCommit.master_ancestor\n            dev_commit = childCommit.dev_ancestor\n            
more_work.append(dev_commit)\n            commit_graph[parent_commit] = [master_commit, dev_commit]\n            seen.add(master_commit)\n            seen.add(dev_commit)\n\n        else:\n            master_commit = childCommit.master_ancestor\n            commit_graph[parent_commit] = [master_commit]\n            seen.add(master_commit)\n\n        parent_commit = master_commit\n\n    return commit_graph\n\n\n\"\"\"\nMethods for reading packed commit data and reconstructing an unpacked format.\n-----------------------------------------------------------------------------\n\"\"\"\n\n\ndef get_commit_ref(refenv, commit_hash):\n    \"\"\"Read the commit data record references from a specific commit.\n\n    This only returns a list of tuples with binary encoded key/value pairs.\n\n    Parameters\n    ----------\n    refenv : lmdb.Environment`\n        lmdb environment where the references are stored\n    commit_hash : string\n        hash of the commit to retrieve.\n\n    Returns\n    -------\n    tuple\n        tuple of tuples containing encoded key/value pairs of the data\n        records\n\n    Raises\n    ------\n    ValueError\n        if no commit exists with the provided hash\n    \"\"\"\n    reftxn = TxnRegister().begin_reader_txn(refenv)\n    try:\n        cmtRefKey = commit_ref_db_key_from_raw_key(commit_hash)\n        cmtSpecKey = commit_spec_db_key_from_raw_key(commit_hash)\n        cmtParentKey = commit_parent_db_key_from_raw_key(commit_hash)\n\n        cmtRefVal = reftxn.get(cmtRefKey, default=False)\n        cmtSpecVal = reftxn.get(cmtSpecKey, default=False)\n        cmtParentVal = reftxn.get(cmtParentKey, default=False)\n    except lmdb.BadValsizeError:\n        raise ValueError(f'No commit exists with the hash: {commit_hash}')\n    finally:\n        TxnRegister().abort_reader_txn(refenv)\n\n    if (cmtRefVal is False) or (cmtSpecVal is False) or (cmtParentVal is False):\n        raise ValueError(f'No commit exists with the hash: {commit_hash}')\n\n    commitRefs = commit_ref_raw_val_from_db_val(cmtRefVal)\n    commitSpecs = commit_spec_raw_val_from_db_val(cmtSpecVal)\n    commitParent = commit_parent_raw_val_from_db_val(cmtParentVal)\n\n    calculatedDigest = cmt_final_digest(\n        parent_digest=commitParent.digest,\n        spec_digest=commitSpecs.digest,\n        refs_digest=commitRefs.digest)\n\n    if calculatedDigest != commit_hash:\n        raise IOError(\n            f'Data Corruption Detected. On retrieval of stored references for '\n            f'commit_hash: {commit_hash} validation of commit record/contents '\n            f'integrity failed. Calculated digest: {calculatedDigest} != '\n            f'expected: {commit_hash}. Please alert the Hangar development team to '\n            f'this error if possible.')\n\n    return commitRefs.db_kvs\n\n\ndef unpack_commit_ref(refenv, cmtrefenv, commit_hash):\n    \"\"\"unpack a commit record ref into a new key/val db for reader checkouts.\n\n    This method also validates that the record data (parent, spec, and refs)\n    have not been corrupted on disk (ie)\n\n    Parameters\n    ----------\n    refenv : lmdb.Environment\n        environment handle open for reading in the refenv\n    cmtrefenv : lmdb.Environment\n        environment handle open for writing on disk. 
this db must be empty.\n    commit_hash : str\n        hash of the commit to read in from refs and unpack in a checkout.\n    \"\"\"\n\n    commitRefs = get_commit_ref(refenv=refenv, commit_hash=commit_hash)\n    cmttxn = TxnRegister().begin_writer_txn(cmtrefenv)\n    try:\n        with cmttxn.cursor() as cursor:\n            cursor.first()\n            cursor.putmulti(commitRefs, append=True)\n        try:\n            cursor.close()\n        except Exception as e:\n            msg = f'could not close cursor cmttxn {cmttxn} commit_hash {commit_hash}'\n            e.args = (*e.args, msg)\n            raise e\n    finally:\n        TxnRegister().commit_writer_txn(cmtrefenv)\n\n    return\n\n\n@contextmanager\ndef tmp_cmt_env(refenv: lmdb.Environment, commit_hash: str):\n    \"\"\"create temporary unpacked lmdb environment from compressed structure\n\n    Parameters\n    ----------\n    refenv : lmdb.Environment\n        lmdb environment where the commit refs are stored\n    commit_hash : str\n        hash of the commit to get the contents of\n\n    Returns\n    -------\n    lmdb.Environment\n        environment with all db contents from ``commit`` unpacked\n    \"\"\"\n    with tempfile.TemporaryDirectory() as tmpdir:\n        tmpDF = os.path.join(tmpdir, 'test.lmdb')\n        with closing(\n                lmdb.open(tmpDF, sync=False, writemap=True, **LMDB_SETTINGS)\n        ) as tmpDB:\n            unpack_commit_ref(refenv, tmpDB, commit_hash)\n            yield tmpDB\n\n\n\"\"\"\nMethods to write new commits\n----------------------------\n\nThe functions below act to:\n    - Reading and formatting all record data from the staging area.\n    - Determining the ancestor(s) of the new commit\n    - Specify commit details (message, time, committer-info, etc.)\n    - Coordinate record hashing\n    - Write the commit record\n    - Update the branch head to point to the new commit hash\n\"\"\"\n\n# ---------------- Functions to format the written values of a commit --------------------\n\n\ndef _commit_ancestors(branchenv: lmdb.Environment,\n                      *,\n                      is_merge_commit: bool = False,\n                      master_branch: str = '',\n                      dev_branch: str = '') -> DigestAndBytes:\n    \"\"\"Format the commit parent db value, finding HEAD commits automatically.\n\n    This method handles formatting for both regular & merge commits through the\n    the keyword only arguments.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        Lmdb environment where branch data is located. 
If not merge commit, head\n        branch and commit will be found.\n    is_merge_commit : bool, optional\n        If this is a merge commit or now, defaults to False\n    master_branch : str, optional\n        If merge commit, the master branch name must be specified, and the\n        branch HEAD commit hash will be determined automatically, defaults to ''\n    dev_branch : str, optional\n        If merge commit, the dev branch name must be specified, and the branch\n        HEAD commit hash will be determined automatically, defaults to ''\n\n    Returns\n    -------\n    DigestAndBytes\n        Commit parent db value and digest of commit parent val formatted\n        appropriately based on the repo state and any specified arguments.\n    \"\"\"\n    if not is_merge_commit:\n        masterBranch = get_staging_branch_head(branchenv)\n        master_ancestor = get_branch_head_commit(branchenv, masterBranch)\n        dev_ancestor = ''\n    else:\n        master_ancestor = get_branch_head_commit(branchenv, master_branch)\n        dev_ancestor = get_branch_head_commit(branchenv, dev_branch)\n\n    commitParentVal = commit_parent_db_val_from_raw_val(\n        master_ancestor=master_ancestor,\n        dev_ancestor=dev_ancestor,\n        is_merge_commit=is_merge_commit)\n\n    return commitParentVal\n\n\ndef _commit_spec(message: str, user: str, email: str) -> DigestAndBytes:\n    \"\"\"Format the commit specification according to the supplied username and email.\n\n    This method currently only acts as a pass through to the parsing options\n    (with time filled in).\n\n    Parameters\n    ----------\n    message : string\n        Commit message sent in by the user.\n    user : str, optional\n        Name of the committer\n    email : str, optional\n        Email of the committer\n\n    Returns\n    -------\n    DigestAndBytes\n        Formatted value for the specification field of the commit and digest of\n        spec.\n    \"\"\"\n    spec_db = commit_spec_db_val_from_raw_val(commit_time=time.time(),\n                                              commit_message=message,\n                                              commit_user=user,\n                                              commit_email=email)\n    return spec_db\n\n\ndef _commit_ref(stageenv: lmdb.Environment) -> DigestAndBytes:\n    \"\"\"Query and format all staged data records, and format it for ref storage.\n\n    Parameters\n    ----------\n    stageenv : lmdb.Environment\n        lmdb environment where the staged record data is actually stored.\n\n    Returns\n    -------\n    DigestAndBytes\n        Serialized and compressed version of all staged record data along with\n        digest of commit refs.\n    \"\"\"\n    from .queries import RecordQuery  # needed to avoid cyclic import\n\n    querys = RecordQuery(dataenv=stageenv)\n    allRecords = tuple(querys._traverse_all_records())\n    res = commit_ref_db_val_from_raw_val(allRecords)\n    return res\n\n\n# -------------------- Format ref k/v pairs and write the commit to disk ----------------\n\n\ndef commit_records(message, branchenv, stageenv, refenv, repo_path: Path,\n                   *, is_merge_commit=False, merge_master=None, merge_dev=None):\n    \"\"\"Commit all staged records to the repository, updating branch HEAD as needed.\n\n    This method is intended to work for both merge commits as well as regular\n    ancestor commits.\n\n    Parameters\n    ----------\n    message : string\n        Message the user associates with what has been added, removed, or\n        
changed in this commit. Must not be empty.\n    branchenv : lmdb.Environment\n        lmdb environment where branch records are stored.\n    stageenv : lmdb.Environment\n        lmdb environment where the staged data records are stored in\n        uncompressed format.\n    refenv : lmdb.Environment\n        lmdb environment where the commit ref records are stored.\n    repo_path : Path\n        path to the hangar repository on disk\n    is_merge_commit : bool, optional\n        Is the commit a merge commit or not? defaults to False\n    merge_master : string, optional\n        If merge commit, specify the name of the master branch, defaults to None\n    merge_dev : string, optional\n        If merge commit, specify the name of the dev branch, defaults to None\n\n    Returns\n    -------\n    string\n        Commit hash of the newly added commit\n    \"\"\"\n    cmtParent = _commit_ancestors(branchenv=branchenv,\n                                  is_merge_commit=is_merge_commit,\n                                  master_branch=merge_master,\n                                  dev_branch=merge_dev)\n\n    user_info_pth = Path(repo_path, CONFIG_USER_NAME)\n    CFG = configparser.ConfigParser()\n    CFG.read(user_info_pth)\n\n    USER_NAME = CFG['USER']['name']\n    USER_EMAIL = CFG['USER']['email']\n    if (USER_NAME is None) or (USER_EMAIL is None):\n        raise RuntimeError(f'Username and Email are required. Please configure.')\n\n    cmtSpec = _commit_spec(message=message, user=USER_NAME, email=USER_EMAIL)\n    cmtRefs = _commit_ref(stageenv=stageenv)\n\n    commit_hash = cmt_final_digest(parent_digest=cmtParent.digest,\n                                   spec_digest=cmtSpec.digest,\n                                   refs_digest=cmtRefs.digest)\n\n    commitSpecKey = commit_spec_db_key_from_raw_key(commit_hash)\n    commitParentKey = commit_parent_db_key_from_raw_key(commit_hash)\n    commitRefKey = commit_ref_db_key_from_raw_key(commit_hash)\n\n    reftxn = TxnRegister().begin_writer_txn(refenv)\n    try:\n        reftxn.put(commitSpecKey, cmtSpec.raw, overwrite=False)\n        reftxn.put(commitParentKey, cmtParent.raw, overwrite=False)\n        reftxn.put(commitRefKey, cmtRefs.raw, overwrite=False)\n    finally:\n        TxnRegister().commit_writer_txn(refenv)\n\n    # possible separate function\n    move_process_data_to_store(repo_path)\n    if is_merge_commit is False:\n        headBranchName = get_staging_branch_head(branchenv)\n        set_branch_head_commit(branchenv, headBranchName, commit_hash)\n    else:\n        set_staging_branch_head(branchenv=branchenv, branch_name=merge_master)\n        set_branch_head_commit(branchenv, merge_master, commit_hash)\n\n    return commit_hash\n\n\n# --------------------- staging setup, may need to move this elsewhere ------------------\n\n\ndef replace_staging_area_with_commit(refenv, stageenv, commit_hash):\n    \"\"\"DANGER ZONE: Delete the stage db and replace it with a copy of a commit environment.\n\n    .. warning::\n\n        In the current implementation, this method will not validate that it is safe\n        to do this operation. 
All validation logic must be handled upstream.\n\n    Parameters\n    ----------\n    refenv : [type]\n        lmdb environment opened to the long term storage commit env\n    stageenv : lmdb.Environment\n        lmdb environment opened to the staging area.\n    commit_hash : str\n        commit hash to read from the refenv and replace the stage contents with.\n    \"\"\"\n    stagetxn = TxnRegister().begin_writer_txn(stageenv)\n    with stagetxn.cursor() as cursor:\n        positionExists = cursor.first()\n        while positionExists:\n            positionExists = cursor.delete()\n    cursor.close()\n    TxnRegister().commit_writer_txn(stageenv)\n\n    unpack_commit_ref(refenv=refenv, cmtrefenv=stageenv, commit_hash=commit_hash)\n    return\n\n\ndef replace_staging_area_with_refs(stageenv, sorted_content):\n    \"\"\"DANGER ZONE: Delete all stage db records and replace it with specified data.\n\n    .. warning::\n\n        In the current implementation, this method will not validate that it is safe\n        to do this operation. All validation logic must be handled upstream.\n\n    Parameters\n    ----------\n    stageenv : lmdb.Environment\n        staging area db to replace all data in.\n    sorted_content : iterable of tuple\n        iterable containing two-tuple of byte encoded record data to place in\n        the stageenv db. index 0 -> db key; index 1 -> db val, it is assumed\n        that the order of the tuples is lexicographically sorted by index 0\n        values, if not, this will result in unknown behavior.\n    \"\"\"\n    stagetxn = TxnRegister().begin_writer_txn(stageenv)\n    with stagetxn.cursor() as cursor:\n        positionExists = cursor.first()\n        while positionExists:\n            positionExists = cursor.delete()\n    cursor.close()\n    TxnRegister().commit_writer_txn(stageenv)\n\n    cmttxn = TxnRegister().begin_writer_txn(stageenv)\n    try:\n        with cmttxn.cursor() as cursor:\n            cursor.first()\n            cursor.putmulti(sorted_content, append=True)\n        cursor.close()\n    finally:\n        TxnRegister().commit_writer_txn(stageenv)\n\n\ndef move_process_data_to_store(repo_path: Path, *, remote_operation: bool = False):\n    \"\"\"Move symlinks to hdf5 files from process directory to store directory\n\n    In process writes never directly access files in the data directory.\n    Instead, when the file is created is is symlinked to either the remote data\n    or stage data directory. All access is handled through this intermediate\n    symlink in order to prevent any ability to overwrite (even if there are\n    major errors in the hash records). Once the write operation is packed in\n    the staging or remote area, this method is called to move the symlinks from\n    the write enabled directory to the (read only, fully-committed) storage\n    dir.\n\n    Parameters\n    ----------\n    repo_path : Path\n        path to the repository on dir\n    remote_operation : bool, optional\n        If this operation is occurring from a remote fetch operation. 
(the\n        default is False, which means that all changes will occur in the\n        staging area)\n\n    \"\"\"\n    store_dir = Path(repo_path, DIR_DATA_STORE)\n\n    type_dir = DIR_DATA_REMOTE if remote_operation else DIR_DATA_STAGE\n    process_dir = Path(repo_path, type_dir)\n\n    store_fps = []\n    for be_pth in process_dir.iterdir():\n        if be_pth.is_dir():\n            for fpth in be_pth.iterdir():\n                if fpth.is_file() and not fpth.stem.startswith('.'):\n                    store_fps.append(store_dir.joinpath(be_pth.name, fpth.name))\n\n    for fpth in store_fps:\n        if not fpth.parent.is_dir():\n            fpth.parent.mkdir()\n        fpth.touch()\n\n    # reset before releasing control.\n    shutil.rmtree(process_dir)\n    process_dir.mkdir(exist_ok=False)\n\n\ndef list_all_commits(refenv):\n    \"\"\"returns a list of all commits stored in the repository\n\n    Parameters\n    ----------\n    refenv : lmdb.Environment\n        db where all commit data is stored\n\n    Returns\n    -------\n    list\n        list of all commit digests.\n    \"\"\"\n    refTxn = TxnRegister().begin_reader_txn(refenv)\n    try:\n        commits = set()\n        with refTxn.cursor() as cursor:\n            cursor.first()\n            for k in cursor.iternext(keys=True, values=False):\n                commitKey, *_ = k.decode().split(SEP_KEY)\n                commits.add(commitKey)\n            cursor.close()\n    finally:\n        TxnRegister().abort_reader_txn(refenv)\n\n    return list(commits)\n\n\ndef number_commits_recorded(refenv) -> int:\n    \"\"\"Returns the total number of commits made across all history.\n    \"\"\"\n    return len(list_all_commits(refenv))\n\n"
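\n# Illustrative usage sketch only; the env handles and message below are\n# hypothetical stand-ins for an initialized repository's lmdb environments.\n#\n#   cmtParent = _commit_ancestors(branchenv=branchenv)\n#   cmtSpec = _commit_spec(message='first commit', user='monkey', email='mk@moon.io')\n#   cmtRefs = _commit_ref(stageenv=stageenv)\n#   commit_hash = cmt_final_digest(parent_digest=cmtParent.digest,\n#                                  spec_digest=cmtSpec.digest,\n#                                  refs_digest=cmtRefs.digest)\n#\n# `commit_records()` performs exactly this composition before writing the\n# three records and updating the branch head.\n"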
  },
  {
    "path": "src/hangar/records/hashmachine.pyx",
    "content": "import array\nfrom cpython cimport array\n\nimport numpy as np\nfrom hashlib import blake2b\n\n\ncpdef str hash_type_code_from_digest(str digest):\n    return digest[0]\n\n\ncpdef object hash_func_from_tcode(str tcode):\n    if tcode == '0':\n        return ndarray_hasher_tcode_0\n    elif tcode == '1':\n        return schema_hasher_tcode_1\n    elif tcode == '2':\n        return pystr_hasher_tcode_2\n    elif tcode == '3':\n        return pybytes_hasher_tcode_3\n    else:\n        raise ValueError(f'unknown hash function type code. tcode: {tcode}')\n\n\n# ---------------------------- numpy ndarray data ------------------------------\n\n\ncdef bytes ser_int_list(list lst):\n    cdef Py_ssize_t n=len(lst)\n    cdef array.array res=array.array('i')\n\n    array.resize(res, n)  #preallocate memory\n    for i in range(n):\n        # lst.__get__() needs Python-Integer, so let i\n        # be a python-integer (not cdef)\n        res.data.as_ints[i] = lst[i]\n    return res.data.as_chars[:n*sizeof(int)]\n\n\ndef ndarray_hasher_tcode_0(array not None):\n    \"\"\"Generate the hex digest of some array data.\n\n    This method hashes the concatenation of both array data bytes as well as a\n    binary struct with the array shape and dtype num packed in. This is in\n    order to avoid hash collisions where an array can have the same bytes, but\n    different shape. an example of a collision is: np.zeros((10, 10, 10)) and\n    np.zeros((1000,))\n\n    Parameters\n    ----------\n    array : np.ndarray\n        array data to take the hash of\n\n    Returns\n    -------\n    str\n        hex digest of the array data with typecode prepended by '{tcode}='.\n    \"\"\"\n    cdef str digest\n    cdef bytes other_info\n    cdef list shape = []\n\n    shape = list(array.shape)\n    shape.append(array.dtype.num)\n    other_info = ser_int_list(shape)\n\n    hasher = blake2b(array, digest_size=20)\n    hasher.update(other_info)\n    digest = hasher.hexdigest()\n    return f'0={digest}'\n\n\n# ------------------------------ Schema ---------------------------------------\n\n\ndef _make_hashable(o):\n    \"\"\"Sort container object and deterministically output frozen representation\n    \"\"\"\n    if isinstance(o, (tuple, list)):\n        return tuple((_make_hashable(e) for e in o))\n\n    if isinstance(o, dict):\n        return tuple(sorted((k, _make_hashable(v)) for k, v in o.items()))\n\n    if isinstance(o, (set, frozenset)):\n        return tuple(sorted(_make_hashable(e) for e in o))\n    return o\n\n\ncpdef str schema_hasher_tcode_1(dict schema):\n    \"\"\"Generate the schema hash for some schema specification\n\n    Parameters\n    ----------\n    schema : dict\n        dict representation of the schema spec.\n\n    Returns\n    -------\n    str\n        hex digest of this information with typecode prepended by '{tcode}='.\n    \"\"\"\n    cdef bytes serialized\n    cdef str digest, res\n\n    frozenschema = _make_hashable(schema)\n    serialized = repr(frozenschema).encode()\n    digest = blake2b(serialized, digest_size=6).hexdigest()\n    res = f'1={digest}'\n    return res\n\n\n# --------------------------- string type data ----------------------------------------\n\n\ncpdef str pystr_hasher_tcode_2(str value):\n    \"\"\"Generate the hash digest of some str value\n\n    Parameters\n    ----------\n    value : str\n        data value to hash\n\n    Returns\n    -------\n    str\n        hex digest of the data value with typecode prepended by '{tcode}='.\n    \"\"\"\n    cdef bytes raw\n   
 cdef str digest, res\n\n    raw = value.encode()\n    digest = blake2b(raw, digest_size=20).hexdigest()\n    res = f'2={digest}'\n    return res\n\n\n\n# --------------------------- bytes type data ----------------------------------------\n\n\ncpdef str pybytes_hasher_tcode_3(bytes value):\n    \"\"\"Generate the hash digest of some bytes value\n\n    Parameters\n    ----------\n    value : bytes\n        data value to hash\n\n    Returns\n    -------\n    str\n        hex digest of the data value with typecode prepended by '{tcode}='.\n    \"\"\"\n    cdef str digest, res\n\n    digest = blake2b(value, digest_size=20).hexdigest()\n    res = f'3={digest}'\n    return res\n"
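\n\n# Illustrative sketch only: demonstrates that the tcode prefix selects the\n# hasher, and that identical bytes with different shapes hash differently\n# because the shape/dtype tail is mixed into the digest.\nif __name__ == '__main__':  # pragma: no cover\n    a = np.zeros((10, 10, 10))\n    b = np.zeros((1000,))\n    assert a.tobytes() == b.tobytes()\n    assert ndarray_hasher_tcode_0(a) != ndarray_hasher_tcode_0(b)\n    print(hash_func_from_tcode('2')('hello'))  # -> '2=' + 40 hex characters\n"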
  },
  {
    "path": "src/hangar/records/hashs.py",
    "content": "from pathlib import Path\nfrom typing import Iterable, List, Tuple, Union, Set\n\nimport lmdb\n\nfrom .column_parsers import (\n    hash_record_count_start_range_key,\n    hash_schema_raw_key_from_db_key,\n    hash_data_raw_key_from_db_key,\n    schema_hash_db_key_from_digest,\n    schema_spec_from_db_val,\n    schema_record_count_start_range_key\n)\nfrom ..backends import BACKEND_ACCESSOR_MAP, backend_decoder\nfrom ..txnctx import TxnRegister\nfrom ..mixins import CursorRangeIterator\nfrom ..utils import ilen\n\n\nclass HashQuery(CursorRangeIterator):\n    \"\"\"Traverse and query contents contained in ``hashenv`` db\n\n    These methods operate on the database which store the mapping of some data\n    digest to it's location on disk (or value in the case of metadata and\n    schemas). These databases are not specific to a particular commit; the\n    records are for every piece of data stored in every commit across history.\n\n    There are relatively few procedures which require traversal and mapping\n    across data records in this manner. The two most notable use cases are:\n\n        1. Remote client-server negotiation operations\n        2. Verifying the integrity of a repositories historical provenance, commit\n        contents, and data stored on disk.\n    \"\"\"\n\n    def __init__(self, hashenv: lmdb.Environment):\n        self._hashenv = hashenv\n\n    # ------------------ traversing the unpacked records ----------------------\n\n    def _traverse_all_hash_records(self, keys: bool = True, values: bool = True\n                                   ) -> Iterable[Union[bytes, Tuple[bytes, bytes]]]:\n        \"\"\"PUll out all binary encoded data hash records.\n\n        Parameters\n        ----------\n        keys : bool, optional\n            if True, returns keys, by default True\n        values : bool, optional\n            if True, return values, by default True\n\n        Yields\n        -------\n        Union[bytes, Tuple[bytes, bytes]]\n            Iterable of schema record keys, values, or items tuple\n        \"\"\"\n        startHashRangeKey = hash_record_count_start_range_key()\n        try:\n            hashtxn = TxnRegister().begin_reader_txn(self._hashenv)\n            yield from self.cursor_range_iterator(hashtxn, startHashRangeKey, keys, values)\n        finally:\n            TxnRegister().abort_reader_txn(self._hashenv)\n\n    def _traverse_all_schema_records(self, keys: bool = True, values: bool = True\n                                     ) -> Iterable[Union[bytes, Tuple[bytes, bytes]]]:\n        \"\"\"Pull out all binary encoded schema hash records.\n\n        Parameters\n        ----------\n        keys : bool, optional\n            if True, returns keys, by default True\n        values : bool, optional\n            if True, return values, by default True\n\n        Yields\n        -------\n        Union[bytes, Tuple[bytes, bytes]]\n            Iterable of schema record keys, values, or items tuple\n        \"\"\"\n        startSchemaRangeKey = schema_record_count_start_range_key()\n        try:\n            hashtxn = TxnRegister().begin_reader_txn(self._hashenv)\n            yield from self.cursor_range_iterator(hashtxn, startSchemaRangeKey, keys, values)\n        finally:\n            TxnRegister().abort_reader_txn(self._hashenv)\n\n    def list_all_hash_keys_raw(self) -> List[str]:\n        recs = self._traverse_all_hash_records(keys=True, values=False)\n        return list(map(hash_data_raw_key_from_db_key, recs))\n\n    def 
gen_all_hash_keys_db(self) -> Iterable[bytes]:\n        return self._traverse_all_hash_records(keys=True, values=False)\n\n    def intersect_keys_db(self, other: Set[bytes]):\n        \"\"\"Set intersection of provided keys and those contained in the database.\n\n        Parameters\n        ----------\n        other: Set[bytes]\n            Set of db formated keys to intersect with keys of the lmdb environment.\n\n        Returns\n        -------\n        Set[bytes]\n            intersection of input with the keys existing in the lmdb environment.\n        \"\"\"\n        res = []\n        hashtxn = TxnRegister().begin_reader_txn(self._hashenv)\n        try:\n            with hashtxn.cursor() as cur:\n                # sort input sequence to reduce time spent moving cursor.\n                for key in sorted(other):\n                    if cur.set_key(key):\n                        res.append(key)\n        finally:\n            TxnRegister().abort_reader_txn(self._hashenv)\n        return set(res)\n\n    def list_all_schema_digests(self) -> List[str]:\n        recs = self._traverse_all_schema_records(keys=True, values=False)\n        return list(map(hash_schema_raw_key_from_db_key, recs))\n\n    def gen_all_schema_keys_db(self) -> Iterable[bytes]:\n        return self._traverse_all_schema_records(keys=True, values=False)\n\n    def num_data_records(self) -> int:\n        \"\"\"Total count of all data digests / backends specs stored over all repo history.\n        \"\"\"\n        num_total = self._hashenv.stat()['entries']\n        remaining = num_total - self.num_schema_records()\n        return remaining\n\n    def num_schema_records(self) -> int:\n        \"\"\"Total count of schema digests / spec defs stored over all repo history.\n        \"\"\"\n        return ilen(self._traverse_all_schema_records(keys=True, values=False))\n\n    def gen_all_data_digests_and_parsed_backend_specs(self):\n        for dbk, dbv in self._traverse_all_hash_records(keys=True, values=True):\n            rawk = hash_data_raw_key_from_db_key(dbk)\n            rawv = backend_decoder(dbv)\n            yield (rawk, rawv)\n\n    def gen_all_schema_digests_and_parsed_specs(self) -> Iterable[Tuple[str, dict]]:\n        for dbk, dbv in self._traverse_all_schema_records(keys=True, values=True):\n            rawk = hash_schema_raw_key_from_db_key(dbk)\n            rawv = schema_spec_from_db_val(dbv)\n            yield (rawk, rawv)\n\n    def get_schema_digest_spec(self, digest) -> dict:\n        schemaHashKey = schema_hash_db_key_from_digest(digest)\n        try:\n            hashtxn = TxnRegister().begin_reader_txn(self._hashenv)\n            schemaSpecVal = hashtxn.get(schemaHashKey)\n        finally:\n            TxnRegister().abort_reader_txn(self._hashenv)\n\n        schema_spec = schema_spec_from_db_val(schemaSpecVal)\n        return schema_spec\n\n\ndef backends_remove_in_process_data(repo_path: Path, *, remote_operation: bool = False):\n    \"\"\"DANGER! 
Permanently delete uncommitted data files/links for stage or remote area.\n\n    This searches each backend accessors staged (or remote) folder structure for\n    files, and if any are present the symlinks in stagedir and backing data\n    files in datadir are removed.\n\n    Parameters\n    ----------\n    repo_path : Path\n        path to the repository on disk\n    remote_operation : optional, kwarg only, bool\n        If true, modify contents of the remote_dir, if false (default) modify\n        contents of the staging directory.\n    \"\"\"\n    for backend, accesor in BACKEND_ACCESSOR_MAP.items():\n        if accesor is not None:\n            accesor.delete_in_process_data(repo_path=repo_path,\n                                           remote_operation=remote_operation)\n\n\ndef clear_stage_hash_records(stagehashenv):\n    \"\"\"Drop all records in the stagehashenv db\n\n    This operation should be performed anytime a reset of the staging area is\n    performed (including for commits, merges, and checkouts)\n\n    Parameters\n    ----------\n    stagehashenv : lmdb.Environment\n        db where staged data hash additions are recorded\n    \"\"\"\n    stagehashtxn = TxnRegister().begin_writer_txn(stagehashenv)\n    with stagehashtxn.cursor() as cursor:\n        positionExists = cursor.first()\n        while positionExists:\n            positionExists = cursor.delete()\n    cursor.close()\n    TxnRegister().commit_writer_txn(stagehashenv)\n\n\ndef remove_stage_hash_records_from_hashenv(hashenv, stagehashenv):\n    \"\"\"Remove references to data additions during a hard reset\n\n    For every hash record in stagehashenv, remove the corresponding k/v pair\n    from the hashenv db. This is a dangerous operation if the stagehashenv was\n    not appropriately constructed!!!\n\n    Parameters\n    ----------\n    hashenv : lmdb.Environment\n        db where all the permanent hash records are stored\n    stagehashenv : lmdb.Environment\n        db where all the staged hash records to be removed are stored.\n    \"\"\"\n    stageHashKeys = HashQuery(stagehashenv).gen_all_hash_keys_db()\n    hashtxn = TxnRegister().begin_writer_txn(hashenv)\n    for hashKey in stageHashKeys:\n        hashtxn.delete(hashKey)\n    TxnRegister().commit_writer_txn(hashenv)\n"
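\n\n# Illustrative usage sketch only; `hashenv` is a hypothetical stand-in for an\n# already-opened lmdb.Environment containing repository hash records.\n#\n#   hq = HashQuery(hashenv)\n#   print(hq.num_data_records(), hq.num_schema_records())\n#   for digest, spec in hq.gen_all_schema_digests_and_parsed_specs():\n#       print(digest, spec)\n"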
  },
  {
    "path": "src/hangar/records/heads.py",
    "content": "import warnings\nfrom collections import defaultdict\nfrom typing import NamedTuple\n\nimport lmdb\n\nfrom .parsing import (\n    remote_db_key_from_raw_key,\n    remote_db_val_from_raw_val,\n    remote_raw_key_from_db_key,\n    remote_raw_val_from_db_val,\n    repo_branch_head_db_key_from_raw_key,\n    repo_branch_head_db_val_from_raw_val,\n    repo_branch_head_raw_key_from_db_key,\n    repo_branch_head_raw_val_from_db_val,\n    repo_head_db_key,\n    repo_head_db_val_from_raw_val,\n    repo_head_raw_val_from_db_val,\n    repo_writer_lock_db_key,\n    repo_writer_lock_db_val_from_raw_val,\n    repo_writer_lock_force_release_sentinal,\n    repo_writer_lock_sentinal_db_val,\n)\nfrom ..constants import K_REMOTES, K_BRANCH\nfrom ..txnctx import TxnRegister\n\n\nclass BranchHead(NamedTuple):\n    name: str\n    digest: str\n\n\n\"\"\"\nWrite operation enabled lock methods\n------------------------------------\n\nAny operation which wants to interact with the main storage services in a\nwrite-enabled way must acquire a lock to perform the operation. See docstrings\nbelow for more info\n\"\"\"\n\n\ndef writer_lock_held(branchenv):\n    \"\"\"Check to see if the writer lock is free before attempting to acquire it.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        lmdb environment where the writer lock is stored\n\n    Returns\n    -------\n    bool\n        True if the lock is available to take, False if it is currently held.\n    \"\"\"\n    writerLockKey = repo_writer_lock_db_key()\n    writerLockSentinalVal = repo_writer_lock_sentinal_db_val()\n    branchtxn = TxnRegister().begin_reader_txn(branchenv)\n    try:\n        currentWriterLockVal = branchtxn.get(writerLockKey)\n        if currentWriterLockVal == writerLockSentinalVal:\n            lockAvailable = True\n        elif currentWriterLockVal is None:\n            # on first initialization, writer lock key/val is not set.\n            lockAvailable = True\n        else:\n            lockAvailable = False\n    finally:\n        TxnRegister().abort_reader_txn(branchenv)\n    return lockAvailable\n\n\ndef acquire_writer_lock(branchenv, writer_uuid):\n    \"\"\"Attempt to acquire the writer lock for a write-enabled checkout object.\n\n    If the writer_uuid matches the recorded value, or the lock is available (or\n    uninitialized entirely in the case of a brand-new repository), the lock will\n    be updated with the requested uuid, and no other write-enabled checkout can\n    be started until it is either released, or a force reset is performed (in\n    the event of a system crash or user error.)\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        lmdb environment where the writer lock is stored\n    writer_uuid : str\n        uuid generated when a write enabled checkout instance starts\n\n    Returns\n    -------\n    bool\n        success of the operation, which will be validated by the writer class as\n        a safety net incase the upstream in the event some user code tries to\n        catch the exception.Z\n\n    Raises\n    ------\n    PermissionError\n        If the lock can not be acquired\n\n    \"\"\"\n    writerLockKey = repo_writer_lock_db_key()\n    writerLockSentinalVal = repo_writer_lock_sentinal_db_val()\n    requestWriterLockVal = repo_writer_lock_db_val_from_raw_val(writer_uuid)\n\n    branchtxn = TxnRegister().begin_writer_txn(branchenv)\n    try:\n        currentWriterLockVal = branchtxn.get(writerLockKey)\n        if currentWriterLockVal == 
requestWriterLockVal:\n            success = True\n        elif currentWriterLockVal == writerLockSentinalVal:\n            branchtxn.put(writerLockKey, requestWriterLockVal)\n            success = True\n        elif currentWriterLockVal is None:\n            # on first initialization, writer lock key/val is not set.\n            branchtxn.put(writerLockKey, requestWriterLockVal)\n            success = True\n        else:\n            err = 'Cannot acquire the writer lock. Only one instance of a writer checkout '\\\n                  'can be active at a time. If the last checkout of this repository did '\\\n                  'not properly close, or a crash occurred, the lock must be manually freed '\\\n                  'before another writer can be instantiated.'\n            raise PermissionError(err)\n    finally:\n        TxnRegister().commit_writer_txn(branchenv)\n\n    return success\n\n\ndef release_writer_lock(branchenv, writer_uuid):\n    \"\"\"Internal method to release a writer lock held by a specified uuid.\n\n    This method also accepts the force-release sentinel passed by a caller in\n    the writer_uuid field. If the writer_uuid does not match the lock value (and\n    the force sentinel is not used), then a RuntimeError will be raised and a\n    no-op performed\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        lmdb environment where the lock key/val lives\n    writer_uuid : str\n        uuid of the requested releaser\n\n    Returns\n    -------\n    bool\n        True if the operation was successful, False otherwise.\n\n    Raises\n    ------\n    RuntimeError\n        if the requesting uuid does not match the lock value.\n    \"\"\"\n    writerLockKey = repo_writer_lock_db_key()\n    forceReleaseSentinal = repo_writer_lock_force_release_sentinal()\n    lockSentinalVal = repo_writer_lock_sentinal_db_val()\n    requestWriterLockVal = repo_writer_lock_db_val_from_raw_val(writer_uuid)\n\n    txn = TxnRegister().begin_writer_txn(branchenv)\n    try:\n        currentLockVal = txn.get(writerLockKey)\n        if writer_uuid == forceReleaseSentinal:\n            warnings.warn('Writer lock successfully force released.', ResourceWarning)\n            txn.put(writerLockKey, lockSentinalVal)\n            success = True\n        elif currentLockVal == requestWriterLockVal:\n            txn.put(writerLockKey, lockSentinalVal)\n            success = True\n        elif currentLockVal == lockSentinalVal:\n            warnings.warn('The lock is already available, no release is necessary.', UserWarning)\n            success = True\n        else:\n            err = f'FATAL ERROR Requested release of writer lock: {currentLockVal} by '\\\n                  f'non-valid requestor: {requestWriterLockVal} -- How did this happen?'\n            raise RuntimeError(err)\n    finally:\n        TxnRegister().commit_writer_txn(branchenv)\n\n    return success\n\n\n\"\"\"\nMethods to interact with the branch head records\n------------------------------------------------\n\n.. 
todo::\n   Need a delete branch operation.\n\"\"\"\n\n# ---------------- branch creation and deletion operations ------------------------------\n\n\ndef create_branch(branchenv, name, base_commit) -> BranchHead:\n    \"\"\"Internal operations used to create a branch.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        lmdb environment of the branch db\n    name : str\n        Name of the branch to create. If a branch with this name exists, no\n        operation will occur and a `ValueError` will be thrown.\n    base_commit : str\n        The commit to start this branch from.\n\n    Returns\n    -------\n    BranchHead\n        NamedTuple[str, str] with fields for `name` and `digest` of the branch\n        created (if the operation was successful)\n\n    Raises\n    ------\n    ValueError\n        If the branch already exists, no-op and raise this.\n    RuntimeError\n        If the repository does not have at least one commit on the `default`\n        (ie. `master`) branch.\n    \"\"\"\n    if base_commit is None:\n        headBranch = get_staging_branch_head(branchenv)\n        base_commit = get_branch_head_commit(branchenv, headBranch)\n        if (headBranch == 'master') and (base_commit == ''):\n            msg = 'At least one commit must be made in the repository on the `default` '\\\n                  '(`master`) branch before new branches can be created'\n            raise RuntimeError(msg)\n\n    branchHeadKey = repo_branch_head_db_key_from_raw_key(name)\n    branchHeadVal = repo_branch_head_db_val_from_raw_val(base_commit)\n\n    branchtxn = TxnRegister().begin_writer_txn(branchenv)\n    try:\n        success = branchtxn.put(branchHeadKey, branchHeadVal, overwrite=False)\n        if success is False:\n            err = f'A branch with the name {name} already exists, please specify '\\\n                  f'a different name or delete the branch.'\n            raise ValueError(err)\n    finally:\n        TxnRegister().commit_writer_txn(branchenv)\n\n    return BranchHead(name=name, digest=base_commit)\n\n\ndef remove_branch(branchenv: lmdb.Environment,\n                  refenv: lmdb.Environment,\n                  name: str,\n                  *,\n                  force_delete: bool = False) -> BranchHead:\n    \"\"\"Remove a branch head pointer after verifying validity and safety\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        db containing the branch head specs\n    refenv : lmdb.Environment\n        db containing the commit refs\n    name : str\n        name of the branch which should be deleted.\n    force_delete : bool, optional\n        If True, remove the branch pointer even if the changes are un-merged in\n        other branch histories. 
By default, False.\n\n    Returns\n    -------\n    BranchHead\n        NamedTuple[str, str] with fields for `name` and `digest` of the branch\n        pointer deleted.\n\n    Raises\n    ------\n    ValueError\n        If a branch with the provided name does not exist locally\n    PermissionError\n        If removal of the branch would result in a repository with zero local\n        branches.\n    PermissionError\n        If a write enabled checkout is holding the writer-lock at time of this\n        call.\n    PermissionError\n        If the branch to be removed was the last branch used in a write-enabled\n        checkout, and its contents form the base of the staging area.\n    RuntimeError\n        If the branch has not been fully merged into other branch histories,\n        and ``force_delete`` option is not ``True``.\n    \"\"\"\n    from .commiting import get_commit_ancestors_graph\n\n    all_branches = get_branch_names(branchenv)\n    alive_branches = [x for x in all_branches if '/' not in x]  # exclude remotes\n    if name not in alive_branches:\n        raise ValueError(f'Branch: {name} does not exist')\n\n    alive_branches.remove(name)\n    if len(alive_branches) == 0:\n        msg = f'Not allowed to remove all branches from a repository! '\\\n              f'Operation aborted without completing removal of branch: {name}'\n        raise PermissionError(msg)\n\n    if writer_lock_held(branchenv) is False:\n        msg = f'Cannot remove branch when a `write-enabled` checkout is active. '\\\n              f'Re-run after committing/closing the writer.'\n        raise PermissionError(msg)\n\n    staging_branch = get_staging_branch_head(branchenv)\n    if staging_branch == name:\n        msg = f'Branch: {name} cannot be deleted while acting as the base for '\\\n              f'contents of the staging area. Re-run after checking out a '\\\n              f'different branch in `write` mode.'\n        raise PermissionError(msg)\n\n    HEAD = get_branch_head_commit(branchenv, name)\n    if not force_delete:\n        for branch in alive_branches:\n            b_head = get_branch_head_commit(branchenv, branch)\n            b_ancestors = get_commit_ancestors_graph(refenv, starting_commit=b_head)\n            if HEAD in b_ancestors:\n                break\n        else:  # N.B. for-else conditional (ie. \"no break\")\n            msg = f'The branch {name} is not fully merged. 
If you are sure '\\\n                  f'you want to delete it, re-run with the force_delete parameter set'\n            raise RuntimeError(msg)\n\n    branchtxn = TxnRegister().begin_writer_txn(branchenv)\n    try:\n        branchHeadKey = repo_branch_head_db_key_from_raw_key(name)\n        branchtxn.delete(branchHeadKey)\n    finally:\n        TxnRegister().commit_writer_txn(branchenv)\n\n    return BranchHead(name=name, digest=HEAD)\n\n\n# ------------- set and get with staging area HEAD branch name --------------------------\n\n\ndef get_staging_branch_head(branchenv):\n    \"\"\"Get the name of the current staging area HEAD branch\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        lmdb environment for the branch references\n\n    Returns\n    -------\n    str\n        name of the staging HEAD branch\n    \"\"\"\n    headKey = repo_head_db_key()\n    txn = TxnRegister().begin_reader_txn(branchenv)\n    try:\n        headBranchVal = txn.get(headKey)\n    finally:\n        TxnRegister().abort_reader_txn(branchenv)\n    headBranch = repo_head_raw_val_from_db_val(headBranchVal)\n    return headBranch\n\n\ndef set_staging_branch_head(branchenv, branch_name):\n    \"\"\"Set the writer HEAD to a branch name. Does not modify staging area contents.\n\n    A writer-checkout must specify a branch name to use as its ancestor. We do\n    not allow a writer (or staging area) to exist in a \"Detached HEAD\" state. In\n    order to make modifications starting from a specific commit, the user must\n    create a branch with that commit hash as the specified \"base\".\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        lmdb environment of the branch db.\n    branch_name : str\n        name of the branch to checkout.\n\n    Returns\n    -------\n    bool\n        True if the operation was successful.\n\n    Raises\n    ------\n    ValueError\n        If the specified branch name does not exist.\n    \"\"\"\n    headKey = repo_head_db_key()\n    requestedHeadVal = repo_head_db_val_from_raw_val(branch_name)\n    requestedBranchKey = repo_branch_head_db_key_from_raw_key(branch_name)\n\n    branchtxn = TxnRegister().begin_writer_txn(branchenv)\n    try:\n        branchNameExists = branchtxn.get(requestedBranchKey, default=False)\n        if branchNameExists is False:\n            err = f'No branch with the name: {branch_name} exists, no-op performed'\n            raise ValueError(err)\n        else:\n            branchtxn.put(headKey, requestedHeadVal)\n            success = True\n    finally:\n        TxnRegister().commit_writer_txn(branchenv)\n\n    return success\n\n\n# ------------- get and set a named branch HEAD commit hash -----------------------------\n\n\ndef get_branch_head_commit(branchenv, branch_name):\n    \"\"\"Find the commit hash which corresponds to the HEAD of a particular branch.\n\n    Parameters\n    ----------\n    branchenv: lmdb.Environment\n        lmdb environment for the branch spec\n    branch_name: str\n        name of the branch to find the head commit hash for\n\n    Returns\n    -------\n    str\n        the commit hash of the branch head\n\n    Raises\n    ------\n    ValueError\n        if `branch_name` does not exist in the repository\n    \"\"\"\n    requestedBranchKey = repo_branch_head_db_key_from_raw_key(branch_name)\n    branchtxn = TxnRegister().begin_reader_txn(branchenv)\n    try:\n        branchNameVal = branchtxn.get(requestedBranchKey, default=False)\n        if branchNameVal is False:\n            err = f'branch with 
name: {branch_name} does not exist. Cannot get head.'\n            raise ValueError(err)\n    finally:\n        TxnRegister().abort_reader_txn(branchenv)\n\n    commit_hash = repo_branch_head_raw_val_from_db_val(branchNameVal)\n    return commit_hash\n\n\ndef set_branch_head_commit(branchenv, branch_name, commit_hash):\n    \"\"\"Update an existing branch HEAD to point to a new commit hash.\n\n    Does not update stage or refenv contents. If the current HEAD of the branch\n    == the new commit hash, no operation will occur and an exception will be\n    thrown.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        lmdb environment where the branch records are kept\n    branch_name : string\n        Name of the branch to update the HEAD commit of\n    commit_hash : string\n        Commit hash to update the branch HEAD to point to.\n\n    Returns\n    -------\n    string\n        Commit hash of the new branch head if the operation was successful.\n\n    Raises\n    ------\n    ValueError\n        If the current HEAD is the same as the new commit hash.\n    \"\"\"\n    currentHeadCommit = get_branch_head_commit(branchenv=branchenv, branch_name=branch_name)\n    if currentHeadCommit == commit_hash:\n        err = f'Current branch: {branch_name} HEAD: {currentHeadCommit} is same as the '\\\n              f'requested updated HEAD: {commit_hash}, no-op performed'\n        raise ValueError(err)\n\n    branchtxn = TxnRegister().begin_writer_txn(branchenv)\n    try:\n        branchHeadKey = repo_branch_head_db_key_from_raw_key(branch_name)\n        branchHeadVal = repo_branch_head_db_val_from_raw_val(commit_hash)\n        branchtxn.put(branchHeadKey, branchHeadVal)\n    finally:\n        TxnRegister().commit_writer_txn(branchenv)\n\n    return commit_hash\n\n\ndef get_branch_names(branchenv):\n    \"\"\"Get a list of all branches in the repository.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        lmdb environment storing the branch records.\n\n    Returns\n    -------\n    list of str\n        list of branch names active in the repository.\n    \"\"\"\n    branchStartKey = K_BRANCH.encode()  # TODO: This is odd, why??\n    branchNames = []\n    branchTxn = TxnRegister().begin_reader_txn(branchenv)\n    try:\n        with branchTxn.cursor() as cursor:\n            cursor.first()\n            branchRangeExists = cursor.set_range(branchStartKey)\n            while branchRangeExists:\n                branchKey = cursor.key()\n                if branchKey.startswith(branchStartKey):\n                    name = repo_branch_head_raw_key_from_db_key(branchKey)\n                    branchNames.append(name)\n                    branchRangeExists = cursor.next()\n                else:\n                    branchRangeExists = False\n    finally:\n        TxnRegister().abort_reader_txn(branchenv)\n\n    return branchNames\n\n\ndef commit_hash_to_branch_name_map(branchenv: lmdb.Environment) -> dict:\n    \"\"\"Determine branch names which map to commit hashes\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        db where the branch references are stored\n\n    Returns\n    -------\n    dict\n        keys are commit hash strings, values are lists of branch names (strings)\n        whose HEADs are at the key commit\n    \"\"\"\n    outMap = defaultdict(list)\n    branchNames = get_branch_names(branchenv=branchenv)\n    for branchName in branchNames:\n        branchHEAD = get_branch_head_commit(branchenv=branchenv, 
branch_name=branchName)\n        outMap[branchHEAD].append(branchName)\n\n    return outMap\n\n\n# ----------------------------- Remotes ---------------------------------------\n\n\ndef add_remote(branchenv: lmdb.Environment, name: str, address: str) -> bool:\n    \"\"\"Add a remote server reference to the repository.\n\n    This method does not check that the remote is actually accessible, rather it\n    just records the reference. If a remote with the same name already exists,\n    no change will occur.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        db where the branch (and remote) references are stored.\n    name : str\n        name of the remote to add the address for\n    address : str\n        IP:PORT where the remote server can be accessed\n\n    Returns\n    -------\n    bool\n        True if the new reference was saved, False if not.\n    \"\"\"\n    dbKey = remote_db_key_from_raw_key(name)\n    dbVal = remote_db_val_from_raw_val(address)\n\n    branchTxn = TxnRegister().begin_writer_txn(branchenv)\n    try:\n        succ = branchTxn.put(dbKey, dbVal, overwrite=False)\n    finally:\n        TxnRegister().commit_writer_txn(branchenv)\n\n    return succ\n\n\ndef get_remote_address(branchenv: lmdb.Environment, name: str) -> str:\n    \"\"\"Retrieve the IP:PORT of the remote server for a given name\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        db where the branch (and remote) references are stored\n    name : str\n        name of the remote to fetch\n\n    Raises\n    ------\n    KeyError\n        if a remote with the provided name does not exist\n\n    Returns\n    -------\n    str\n        IP:PORT of the recorded remote server.\n    \"\"\"\n    dbKey = remote_db_key_from_raw_key(name)\n    branchTxn = TxnRegister().begin_reader_txn(branchenv)\n    try:\n        dbVal = branchTxn.get(dbKey, default=False)\n    finally:\n        TxnRegister().abort_reader_txn(branchenv)\n\n    if dbVal is False:\n        msg = f'No remote with the name: {name} exists in the repo.'\n        raise KeyError(msg)\n    else:\n        remote_address = remote_raw_val_from_db_val(dbVal)\n        return remote_address\n\n\ndef remove_remote(branchenv: lmdb.Environment, name: str) -> str:\n    \"\"\"Remove a remote reference with the provided name.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        db where the branch (and remote) records are stored.\n    name : str\n        name of the remote to remove from the repo\n\n    Raises\n    ------\n    ValueError\n        if a remote with the provided name does not exist\n\n    Returns\n    -------\n    str\n        IP:PORT of the remote with provided name (which was removed)\n    \"\"\"\n    dbKey = remote_db_key_from_raw_key(name)\n    branchTxn = TxnRegister().begin_writer_txn(branchenv)\n    try:\n        dbVal = branchTxn.pop(dbKey)\n    finally:\n        TxnRegister().commit_writer_txn(branchenv)\n\n    if dbVal is None:\n        msg = f'No remote with the name: {name} exists in the repo.'\n        raise ValueError(msg)\n\n    remote_address = remote_raw_val_from_db_val(dbVal)\n    return remote_address\n\n\ndef get_remote_names(branchenv):\n    \"\"\"Get a list of all remotes in the repository.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        lmdb environment storing the branch records.\n\n    Returns\n    -------\n    list of str\n        list of remote names active in the repository.\n    \"\"\"\n    remoteStartKey = K_REMOTES.encode()\n    
remoteNames = []\n    branchTxn = TxnRegister().begin_reader_txn(branchenv)\n    try:\n        with branchTxn.cursor() as cursor:\n            cursor.first()\n            remoteRangeExists = cursor.set_range(remoteStartKey)\n            while remoteRangeExists:\n                remoteKey = cursor.key()\n                if remoteKey.startswith(remoteStartKey):\n                    name = remote_raw_key_from_db_key(remoteKey)\n                    remoteNames.append(name)\n                    remoteRangeExists = cursor.next()\n                else:\n                    remoteRangeExists = False\n    finally:\n        TxnRegister().abort_reader_txn(branchenv)\n\n    return remoteNames\n"
  },
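A minimal usage sketch of the writer-lock helpers defined in heads.py above. The lmdb path and standalone setup here are hypothetical; real code obtains `branchenv` from Hangar's own environment initialization rather than opening lmdb directly.

import uuid

import lmdb

from hangar.records import heads

# hypothetical standalone setup; hangar normally supplies this environment
branchenv = lmdb.open('/tmp/demo-branchdb')
writer_uuid = uuid.uuid4().hex

# note: writer_lock_held() returns True when the lock is *free* to take
if heads.writer_lock_held(branchenv):
    heads.acquire_writer_lock(branchenv, writer_uuid)
    try:
        pass  # write-enabled operations would run here
    finally:
        heads.release_writer_lock(branchenv, writer_uuid)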
  {
    "path": "src/hangar/records/parsing.py",
    "content": "import json\nfrom hashlib import blake2b\nfrom itertools import cycle\nfrom random import randint\nfrom time import perf_counter, sleep\nfrom typing import Union, NamedTuple, Tuple, Iterable\n\nimport blosc\n\nfrom ..constants import (\n    CMT_DIGEST_JOIN_KEY,\n    CMT_KV_JOIN_KEY,\n    CMT_REC_JOIN_KEY,\n    K_BRANCH,\n    K_HEAD,\n    K_REMOTES,\n    K_VERSION,\n    K_WLOCK,\n    SEP_CMT,\n    SEP_KEY,\n    WLOCK_SENTINAL,\n)\nfrom .._version import parse as version_parse\nfrom .._version import Version\n\n\ncycle_list = (str(c).rjust(4, '0') for c in range(9_999))\nNAME_CYCLER = cycle(cycle_list)\nRANDOM_NAME_SEED = str(randint(0, 999_999_999)).rjust(0, '0')\nperf_counter()  # call to init monotonic start point\n\n\ndef generate_sample_name() -> str:\n    ncycle = next(NAME_CYCLER)\n    if ncycle == '0000':\n        sleep(0.001)\n\n    sec, subsec = str(perf_counter()).split('.')\n    name = f'{RANDOM_NAME_SEED}{sec.rjust(6, \"0\")}{subsec.ljust(9, \"0\")}{ncycle}'\n    return name\n\n\n\"\"\"\nParsing functions used to deal with repository state The parsers defined in this\nsection handle repo/branch records.\n\nMethods working with repository version specifiers\n--------------------------------------------------\n\"\"\"\n\n\ndef repo_version_raw_spec_from_raw_string(v_str: str) -> Version:\n    \"\"\"Convert from user facing string representation to Version object\n    \"\"\"\n    return version_parse(v_str)\n\n\n# ------------------------- db version key is fixed -----------------\n\n\ndef repo_version_db_key() -> bytes:\n    \"\"\"The db formated key which version information can be accessed at\n\n    Returns\n    -------\n    bytes\n        db formatted key to use to get/set the repository software version.\n    \"\"\"\n    return K_VERSION.encode()\n\n\n# ------------------------ raw -> db --------------------------------\n\n\ndef repo_version_db_val_from_raw_val(v_spec: Version) -> bytes:\n    \"\"\"determine repository version db specifier from version spec.\n\n    Parameters\n    ----------\n    v_spec : Version\n        This class abstracts handling of a project’s versions. A Version\n        instance is comparison aware and can be compared and sorted using the\n        standard Python interfaces.\n\n    Returns\n    -------\n    bytes\n        db formatted specification of version\n    \"\"\"\n    return str(v_spec).encode()\n\n\n# ---------------------------- db -> raw ----------------------------\n\n\ndef repo_version_raw_val_from_db_val(db_val: bytes) -> Version:\n    \"\"\"determine software version of hangar repository is written for.\n\n    Parameters\n    ----------\n    db_val : bytes\n        db formatted specification of version string\n\n    Returns\n    -------\n    Version\n        This class abstracts handling of a project’s versions. 
A Version\n        instance is comparison aware and can be compared and sorted using the\n        standard Python interfaces.\n    \"\"\"\n    db_str = db_val.decode()\n    return version_parse(db_str)\n\n\n\"\"\"\nMethods working with writer HEAD branch name\n--------------------------------------------\n\"\"\"\n\n# --------------------- db HEAD key is fixed ------------------------\n\n\ndef repo_head_db_key() -> bytes:\n    \"\"\"db_key of the head staging branch name.\n\n    Returns\n    -------\n    bytestring\n        lmdb key to query while looking up the head staging branch name\n    \"\"\"\n    return K_HEAD.encode()\n\n\n# --------------------- raw -> db -----------------------------------\n\n\ndef repo_head_db_val_from_raw_val(branch_name: str) -> bytes:\n    return f'{K_BRANCH}{branch_name}'.encode()\n\n\n# --------------------- db -> raw -----------------------------------\n\ndef repo_head_raw_val_from_db_val(db_val: bytes) -> str:\n    return db_val.decode()[len(K_BRANCH):]\n\n\n\"\"\"\nMethods working with branch names / head commit values\n------------------------------------------------------\n\"\"\"\n\n# ---------------------- raw -> db --------------------------------\n\n\ndef repo_branch_head_db_key_from_raw_key(branch_name: str) -> bytes:\n    return f'{K_BRANCH}{branch_name}'.encode()\n\n\ndef repo_branch_head_db_val_from_raw_val(commit_hash: str) -> bytes:\n    return f'{commit_hash}'.encode()\n\n\n# ---------------- db -> raw -----------------------------------\n\n\ndef repo_branch_head_raw_key_from_db_key(db_key: bytes) -> str:\n    return db_key.decode()[len(K_BRANCH):]\n\n\ndef repo_branch_head_raw_val_from_db_val(db_val: bytes) -> str:\n    try:\n        commit_hash = db_val.decode()\n    except AttributeError:\n        commit_hash = ''\n    return commit_hash\n\n\n\"\"\"\nMethods working with writer lock key/values\n-------------------------------------------\n\"\"\"\n\n# ------------------- db key for lock is fixed -------------------\n\n\ndef repo_writer_lock_db_key() -> bytes:\n    return K_WLOCK.encode()\n\n\ndef repo_writer_lock_sentinal_db_val() -> bytes:\n    return WLOCK_SENTINAL.encode()\n\n\ndef repo_writer_lock_force_release_sentinal() -> str:\n    return 'FORCE_RELEASE'\n\n\n# ------------------------- raw -> db ------------------------------\n\n\ndef repo_writer_lock_db_val_from_raw_val(lock_uuid: str) -> bytes:\n    return f'{lock_uuid}'.encode()\n\n\n# -------------------------- db -> raw ------------------------------\n\n\ndef repo_writer_lock_raw_val_from_db_val(db_val: bytes) -> str:\n    return db_val.decode()\n\n\n# -------------------- Remote Work --------------------------------------------\n\n\ndef remote_db_key_from_raw_key(remote_name: str) -> bytes:\n    \"\"\"Get the remote db key val for a remote name\n\n    Parameters\n    ----------\n    remote_name : str\n        name of the remote location\n\n    Returns\n    -------\n    bytes\n        db key allowing access to address value at the name of the remote\n    \"\"\"\n    return f'{K_REMOTES}{remote_name}'.encode()\n\n\ndef remote_raw_key_from_db_key(db_key: bytes, *, _SPLT=len(K_REMOTES)) -> str:\n    \"\"\"Get the remote name from a remote db key\n\n    Parameters\n    ----------\n    db_key : bytes\n        db key of the remote\n\n    Returns\n    -------\n    str\n        name of the remote\n    \"\"\"\n    return db_key.decode()[_SPLT:]\n\n\ndef remote_db_val_from_raw_val(grpc_address: str) -> bytes:\n    \"\"\"Format a remote db value from its grpc address string\n\n    
Parameters\n    ----------\n    grpc_address : str\n        IP:PORT where the grpc server can be accessed\n\n    Returns\n    -------\n    bytes\n        formatted representation of the grpc address suitable for storage in lmdb.\n    \"\"\"\n    return grpc_address.encode()\n\n\ndef remote_raw_val_from_db_val(db_val: bytes) -> str:\n    \"\"\"Retrieve the address where a grpc server is running from a remote db value\n\n    Parameters\n    ----------\n    db_val : bytes\n        db value assigned to the desired remote name\n\n    Returns\n    -------\n    str\n        IP:PORT where the grpc server can be accessed.\n    \"\"\"\n    return db_val.decode()\n\n\n\"\"\"\nCommit Parsing Methods\n-----------------------\n\nThe parsers defined in this section handle commit (ref) records\n\"\"\"\n\n\nclass CommitAncestorSpec(NamedTuple):\n    is_merge_commit: bool\n    master_ancestor: str\n    dev_ancestor: str\n\n\nclass CommitUserSpec(NamedTuple):\n    commit_time: float\n    commit_message: str\n    commit_user: str\n    commit_email: str\n\n\nclass DigestAndUserSpec(NamedTuple):\n    digest: str\n    user_spec: CommitUserSpec\n\n\nclass DigestAndAncestorSpec(NamedTuple):\n    digest: str\n    ancestor_spec: CommitAncestorSpec\n\n\nclass DigestAndBytes(NamedTuple):\n    digest: str\n    raw: bytes\n\n\nclass DigestAndDbRefs(NamedTuple):\n    digest: str\n    db_kvs: Union[Tuple, Tuple[Tuple[bytes, bytes]]]\n\n\ndef _hash_func(recs: bytes) -> str:\n    \"\"\"hash serialized db formatted k/v pair data.\n\n    Parameters\n    ----------\n    recs : bytes\n        serialized data to calculate the digest of\n\n    Returns\n    -------\n    str\n        hexdigest of the input data\n    \"\"\"\n    return blake2b(recs, digest_size=20).hexdigest()\n\n\ndef cmt_final_digest(parent_digest: str, spec_digest: str, refs_digest: str,\n                     *, tcode: str = 'a') -> str:\n    \"\"\"Determine digest of commit based on digests of its parent, specs, and refs.\n\n    Parameters\n    ----------\n    parent_digest : str\n        digest of the parent value\n    spec_digest : str\n        digest of the user spec value\n    refs_digest : str\n        digest of the data record values\n    tcode : str, optional, kwarg-only\n        hash calculation type code. Included to allow future updates to change\n        hashing algorithm, kwarg-only, by default 'a'\n\n    Returns\n    -------\n    str\n        digest of the commit with typecode prepended by '{tcode}='.\n    \"\"\"\n    if tcode == 'a':\n        sorted_digests = sorted([parent_digest, spec_digest, refs_digest])\n        joined_bytes = CMT_DIGEST_JOIN_KEY.join(sorted_digests).encode()\n        rawDigest = _hash_func(joined_bytes)\n        digest = f'a={rawDigest}'\n    else:\n        raise ValueError(\n            f'Invalid commit reference type code {tcode}. 
If encountered during '\n            f'normal operation, please report to hangar development team.')\n    return digest\n\n\n\"\"\"\nCommit Parent (ancestor) Lookup methods\n---------------------------------------\n\"\"\"\n\n# ------------------------- raw -> db -----------------------------------------\n\n\ndef commit_parent_db_key_from_raw_key(commit_hash: str) -> bytes:\n    return f'{commit_hash}'.encode()\n\n\ndef commit_parent_db_val_from_raw_val(master_ancestor: str,\n                                      dev_ancestor: str = '',\n                                      is_merge_commit: bool = False) -> DigestAndBytes:\n    if is_merge_commit:\n        str_val = f'{master_ancestor}{SEP_CMT}{dev_ancestor}'\n    else:\n        str_val = f'{master_ancestor}'\n    db_val = str_val.encode()\n    digest = _hash_func(db_val)\n    return DigestAndBytes(digest=digest, raw=db_val)\n\n\n# ------------------------------- db -> raw -----------------------------------\n\n\ndef commit_parent_raw_key_from_db_key(db_key: bytes) -> str:\n    return db_key.decode()\n\n\ndef commit_parent_raw_val_from_db_val(db_val: bytes) -> DigestAndAncestorSpec:\n    \"\"\"Parse the value of a commit's parent field to find its ancestors\n\n    Parameters\n    ----------\n    db_val : bytes\n        Lmdb value of the commit parent field.\n\n    Returns\n    -------\n    DigestAndAncestorSpec\n        `digest` of data written to disk and `ancestor_spec`, Namedtuple\n        containing fields for `is_merge_commit`, `master_ancestor`, and\n        `dev_ancestor`\n    \"\"\"\n    parentValDigest = _hash_func(db_val)\n\n    commit_str = db_val.decode()\n    commit_ancestors = commit_str.split(SEP_CMT)\n    if len(commit_ancestors) == 1:\n        is_merge_commit = False\n        master_ancestor = commit_ancestors[0]\n        dev_ancestor = ''\n    else:\n        is_merge_commit = True\n        master_ancestor = commit_ancestors[0]\n        dev_ancestor = commit_ancestors[1]\n\n    ancestorSpec = CommitAncestorSpec(is_merge_commit, master_ancestor, dev_ancestor)\n    return DigestAndAncestorSpec(digest=parentValDigest, ancestor_spec=ancestorSpec)\n\n\n\"\"\"\nCommit reference key and values.\n--------------------------------\n\"\"\"\n\n\ndef commit_ref_db_key_from_raw_key(commit_hash: str) -> bytes:\n    return f'{commit_hash}{SEP_KEY}ref'.encode()\n\n\ndef _commit_ref_joined_kv_digest(joined_db_kvs: Iterable[bytes]) -> str:\n    \"\"\"reproducibly calculate digest from iterable of joined record k/v pairs.\n\n    First calculate the digest of each element in the input iterable. As these\n    elements contain the record type (meta key, column name, sample key) as\n    well as the data hash digest, any modification of any reference record will\n    result in a different digest for that element. 
Then join all elements into a\n    single serialized bytestring.\n\n    The output of this method is the hash digest of the serialized bytestring.\n\n    Parameters\n    ----------\n    joined_db_kvs : Iterable[bytes]\n        list or tuple of bytes where each element is the joining of kv pairs\n        from the full commit references\n\n    Returns\n    -------\n    str\n        calculated digest of the commit ref record component\n    \"\"\"\n    kv_digests = map(_hash_func, joined_db_kvs)\n    joined_digests = CMT_DIGEST_JOIN_KEY.join(kv_digests).encode()\n    res = _hash_func(joined_digests)\n    return res\n\n\ndef commit_ref_db_val_from_raw_val(db_kvs: Iterable[Tuple[bytes, bytes]]) -> DigestAndBytes:\n    \"\"\"serialize and compress a list of db_key/db_value pairs for commit storage\n\n    Parameters\n    ----------\n    db_kvs : Iterable[Tuple[bytes, bytes]]\n        Iterable collection of binary encoded db_key/db_val pairs.\n\n    Returns\n    -------\n    DigestAndBytes\n        `raw` serialized and compressed representation of the object. `digest`\n        digest of the joined db kvs.\n    \"\"\"\n    joined = tuple(map(CMT_KV_JOIN_KEY.join, db_kvs))\n    refDigest = _commit_ref_joined_kv_digest(joined)\n    pck = CMT_REC_JOIN_KEY.join(joined)\n    raw = blosc.compress(pck, typesize=1, clevel=8, shuffle=blosc.NOSHUFFLE, cname='zstd')\n    return DigestAndBytes(digest=refDigest, raw=raw)\n\n\ndef commit_ref_raw_val_from_db_val(commit_db_val: bytes) -> DigestAndDbRefs:\n    \"\"\"Load and decompress a commit ref db_val into python object memory.\n\n    Parameters\n    ----------\n    commit_db_val : bytes\n        Serialized and compressed representation of commit refs.\n\n    Returns\n    -------\n    DigestAndDbRefs\n        `digest` of the unpacked commit refs if desired for verification. `db_kvs`\n        Iterable of binary encoded key/value pairs making up the repo state at the\n        time of that commit. key/value pairs are already in sorted order.\n    \"\"\"\n    uncomp_db_raw = blosc.decompress(commit_db_val)\n    # if a commit has nothing in it (completely empty), the return from query == ()\n    # the stored data is b'' from which the hash is calculated. 
We manually set these\n    # values as the expected unpacking routine will not work correctly.\n    if uncomp_db_raw == b'':\n        refsDigest = _hash_func(b'')\n        raw_db_kv_list = ()\n    else:\n        raw_joined_kvs_list = uncomp_db_raw.split(CMT_REC_JOIN_KEY)\n        refsDigest = _commit_ref_joined_kv_digest(raw_joined_kvs_list)\n        raw_db_kv_list = tuple(map(tuple, map(bytes.split, raw_joined_kvs_list)))\n\n    return DigestAndDbRefs(digest=refsDigest, db_kvs=raw_db_kv_list)\n\n\n\"\"\"\nCommit spec reference keys and values\n-------------------------------------\n\"\"\"\n\n\ndef commit_spec_db_key_from_raw_key(commit_hash: str) -> bytes:\n    return f'{commit_hash}{SEP_KEY}spec'.encode()\n\n\ndef commit_spec_db_val_from_raw_val(commit_time: float, commit_message: str,\n                                    commit_user: str,\n                                    commit_email: str) -> DigestAndBytes:\n    \"\"\"Serialize a commit specification from user values to a db store value\n\n    Parameters\n    ----------\n    commit_time : float\n        time since unix epoch that the commit was made\n    commit_message : str\n        user specified commit message to attach to the record\n    commit_user : str\n        globally configured user name of the repository committer\n    commit_email : str\n        globally configured user email of the repository committer\n\n    Returns\n    -------\n    DigestAndBytes\n        Two tuple containing ``digest`` and ``raw`` compressed binary encoded\n        serialization of commit spec\n    \"\"\"\n    spec_dict = {\n        'commit_time': commit_time,\n        'commit_message': commit_message,\n        'commit_user': commit_user,\n        'commit_email': commit_email,\n    }\n\n    db_spec_val = json.dumps(spec_dict, separators=(',', ':')).encode()\n    digest = _hash_func(db_spec_val)\n    comp_raw = blosc.compress(\n        db_spec_val, typesize=8, clevel=9, shuffle=blosc.SHUFFLE, cname='zlib')\n    return DigestAndBytes(digest=digest, raw=comp_raw)\n\n\ndef commit_spec_raw_val_from_db_val(db_val: bytes) -> DigestAndUserSpec:\n    uncompressed_db_val = blosc.decompress(db_val)\n    digest = _hash_func(uncompressed_db_val)\n    commit_spec = json.loads(uncompressed_db_val)\n    user_spec = CommitUserSpec(**commit_spec)\n    return DigestAndUserSpec(digest=digest, user_spec=user_spec)\n"
  },
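A short sketch of how the three component digests combine into a commit digest via cmt_final_digest() in parsing.py above; the ancestor hash, user details, and record k/v pair below are made-up placeholder values.

from hangar.records import parsing

# each *_db_val_from_raw_val helper returns a DigestAndBytes namedtuple
parent = parsing.commit_parent_db_val_from_raw_val('aaaa1111')  # placeholder ancestor hash
spec = parsing.commit_spec_db_val_from_raw_val(
    commit_time=1577836800.0, commit_message='initial commit',
    commit_user='jane doe', commit_email='jane@example.com')
refs = parsing.commit_ref_db_val_from_raw_val([(b'key', b'digest')])  # placeholder record kv

# only the digest fields are combined; the raw bytes are stored separately
commit_digest = parsing.cmt_final_digest(parent.digest, spec.digest, refs.digest)
print(commit_digest)  # 'a=' followed by a 40 character blake2b hexdigest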
  {
    "path": "src/hangar/records/queries.py",
    "content": "from typing import Dict, Iterable, Iterator, List, Set, Tuple, Union, Sequence\n\nimport lmdb\n\nfrom .column_parsers import (\n    data_record_digest_val_from_db_val,\n    dynamic_layout_data_record_db_start_range_key,\n    dynamic_layout_data_record_from_db_key,\n    dynamic_layout_data_record_db_key_from_names,\n    schema_column_record_from_db_key,\n    schema_db_range_key_from_column_unknown_layout,\n    schema_record_count_start_range_key,\n)\nfrom .recordstructs import (\n    FlatColumnDataKey,\n    NestedColumnDataKey,\n    DataRecordVal,\n)\nfrom ..txnctx import TxnRegister\nfrom ..utils import ilen\nfrom ..mixins import CursorRangeIterator\n\nRawDataTuple = Tuple[Union[FlatColumnDataKey, NestedColumnDataKey], DataRecordVal]\nKeyType = Union[str, int]\n\nclass RecordQuery(CursorRangeIterator):\n\n    def __init__(self, dataenv: lmdb.Environment):\n        self._dataenv = dataenv\n\n# ------------------ traversing the unpacked records ------------------------------------\n\n    def _traverse_all_records(self) -> Iterator[Tuple[bytes, bytes]]:\n        \"\"\"Pull out all records in the database as a tuple of binary encoded\n\n        Returns\n        -------\n        list of tuples of bytes\n            list type stack of tuples with each db_key, db_val pair\n        \"\"\"\n        try:\n            datatxn = TxnRegister().begin_reader_txn(self._dataenv)\n            with datatxn.cursor() as cursor:\n                cursor.first()\n                for db_kv in cursor.iternext(keys=True, values=True):\n                    yield db_kv\n        finally:\n            TxnRegister().abort_reader_txn(self._dataenv)\n\n    def _traverse_column_schema_records(self, keys: bool = True, values: bool = True\n                                        ) -> Iterable[Union[Tuple[bytes], Tuple[bytes, bytes]]]:\n        \"\"\"Internal method to traverse all schema records and pull out k/v db pairs.\n\n        Parameters\n        ----------\n        keys : bool, optional\n            If True, yield metadata keys encountered, if False only values are returned.\n            By default, True.\n        values : bool, optional\n            If True, yield metadata hash values encountered, if False only keys are returned.\n            By default, True.\n\n        Yields\n        ------\n        Iterable[Union[Tuple[bytes], Tuple[bytes, bytes]]]:\n            db schema keys and db_values\n        \"\"\"\n        startSchemaRangeKey = schema_record_count_start_range_key()\n        try:\n            datatxn = TxnRegister().begin_reader_txn(self._dataenv)\n            yield from self.cursor_range_iterator(datatxn, startSchemaRangeKey, keys, values)\n        finally:\n            TxnRegister().abort_reader_txn(self._dataenv)\n\n    def _traverse_column_data_records(self,\n                                      column_name: str,\n                                      *,\n                                      keys: bool = True,\n                                      values: bool = True) -> Iterable[Union[bytes, Tuple[bytes, bytes]]]:\n        \"\"\"Internal method to traverse column data records and get keys/db_values\n\n        The column name is required because this method controls the cursor\n        movement by first setting it's position on the column record count\n        key, reading it's value \"N\" and then sequentially pulling records out of\n        the db for N loops.\n\n        Parameters\n        ----------\n        column_name : str\n            name of the column to traverse records 
for.\n        keys : bool, optional\n            If True, yield data record keys encountered, if False only values are returned.\n            By default, True.\n        values : bool, optional\n            If True, yield data record hash values encountered, if False only keys are returned.\n            By default, True.\n\n        Yields\n        ------\n        Iterable[Union[bytes, Tuple[bytes, bytes]]]:\n            db_key/db_value pairs for each record traversed\n\n        Raises\n        ------\n        KeyError\n            if no column exists with the requested name.\n        \"\"\"\n        try:\n            datatxn = TxnRegister().begin_reader_txn(self._dataenv)\n            schemaColumnRangeKey = schema_db_range_key_from_column_unknown_layout(column_name)\n            with datatxn.cursor() as cur:\n                if not cur.set_range(schemaColumnRangeKey):\n                    raise KeyError(f'Traversal of commit references failed. '\n                                   f'No column named `{column_name}` exists.')\n                schemaColumnKey = cur.key()\n            column_record = schema_column_record_from_db_key(schemaColumnKey)\n            startRangeKey = dynamic_layout_data_record_db_start_range_key(column_record)\n            yield from self.cursor_range_iterator(datatxn, startRangeKey, keys, values)\n        finally:\n            TxnRegister().abort_reader_txn(self._dataenv)\n\n# ------------------------- process columns --------------------------------------------\n\n    def column_names(self) -> List[str]:\n        \"\"\"Find all named columns in the checkout\n\n        Returns\n        -------\n        List[str]\n            list of all column names\n        \"\"\"\n        recs = self._traverse_column_schema_records(keys=True, values=False)\n        column_recs = map(schema_column_record_from_db_key, recs)\n        return [x.column for x in column_recs]\n\n    def column_count(self) -> int:\n        \"\"\"Return number of columns/schemas in the commit\n\n        Returns\n        -------\n        int\n            number of columns\n        \"\"\"\n        return ilen(self._traverse_column_schema_records(keys=True, values=False))\n\n    def data_hashes(self) -> List[str]:\n        \"\"\"Find all data hashes contained within all columns\n\n        Note: this method does not deduplicate values\n\n        Returns\n        -------\n        List[str]\n            all hash values for all data pieces in the commit\n        \"\"\"\n        all_hashes = []\n        columns = self.column_names()\n        for column in columns:\n            recs = self._traverse_column_data_records(column, keys=False, values=True)\n            data_rec = map(data_record_digest_val_from_db_val, recs)\n            data_val_rec = [x.digest for x in data_rec]\n            all_hashes.extend(data_val_rec)\n        return all_hashes\n\n# ------------------------ process column data records ----------------------\n\n    def column_data_records(self, column_name: str) -> Iterable[RawDataTuple]:\n        \"\"\"Returns the raw data record key and record values for a specific column.\n\n        Parameters\n        ----------\n        column_name : str\n            name of the column to pull records for\n\n        Yields\n        ------\n        Iterable[RawDataTuple]\n            generator of key and value data record specs\n        \"\"\"\n        for data_key, data_val in self._traverse_column_data_records(column_name):\n            data_rec_key = dynamic_layout_data_record_from_db_key(data_key)\n            
data_rec_val = data_record_digest_val_from_db_val(data_val)\n            yield (data_rec_key, data_rec_val)\n\n    def column_data_hashes(self, column_name: str) -> Set[DataRecordVal]:\n        \"\"\"Find all data hashes contained within a particular column\n\n        Note: this method does not remove any duplicates which may be present;\n        if dedup is required, process it downstream\n\n        Parameters\n        ----------\n        column_name : str\n            name of the column to find the hashes contained in\n\n        Returns\n        -------\n        Set[DataRecordVal]\n            all hash values for all data pieces in the column\n        \"\"\"\n        recs = self._traverse_column_data_records(column_name, keys=False, values=True)\n        return set(map(data_record_digest_val_from_db_val, recs))\n\n    def column_data_count(self, column_name: str) -> int:\n        \"\"\"Return the number of samples in a column with the provided name\n\n        Parameters\n        ----------\n        column_name : str\n            name of the column to query\n\n        Returns\n        -------\n        int\n            number of samples in the column with given name\n        \"\"\"\n        recs = self._traverse_column_data_records(column_name, keys=True, values=False)\n        return ilen(recs)  # regular len method not defined for generator iterable\n\n# ------------------------- process schema ----------------------------------------------\n\n    def schema_specs(self):\n        \"\"\"Return all schema specs defined by all columns.\n\n        Returns\n        -------\n        dict\n            dict of column spec key and digest for each column schema\n        \"\"\"\n        recs = {}\n        for schema_key, schema_val in self._traverse_column_schema_records():\n            schema_record = schema_column_record_from_db_key(schema_key)\n            schema_val = data_record_digest_val_from_db_val(schema_val)\n            recs[schema_record] = schema_val\n        return recs\n\n    def schema_hashes(self) -> List[str]:\n        \"\"\"Find all schema hashes inside of a commit\n\n        Returns\n        -------\n        List[str]\n            list of all schema hash digests in the commit\n        \"\"\"\n        all_schema_hashes = []\n        for schema_rec_val in self._traverse_column_schema_records(keys=False, values=True):\n            digest = data_record_digest_val_from_db_val(schema_rec_val)\n            all_schema_hashes.append(digest.digest)\n        return all_schema_hashes\n\n    def data_hash_to_schema_hash(self) -> Dict[str, str]:\n        \"\"\"For all hashes in the commit, map sample hash to schema hash.\n\n        Returns\n        -------\n        Dict[str, str]\n            mapping of sample hash to aset_schema_hash\n        \"\"\"\n        odict = {}\n        aset_names = self.column_names()\n        aset_schema_specs = self.schema_specs()\n        col_names_schema_digests = {k.column: v.digest for k, v in aset_schema_specs.items()}\n        for asetn in aset_names:\n            aset_hash_vals = self.column_data_hashes(asetn)\n            aset_schema_hash = col_names_schema_digests[asetn]\n            for aset_hash_val in aset_hash_vals:\n                odict[aset_hash_val.digest] = aset_schema_hash\n        return odict\n\n    def column_schema_layout(self, column: str) -> str:\n        \"\"\"Return the column schema layout for a column name\n\n        Parameters\n        ----------\n        column: str\n            name of the column to query\n\n        Returns\n   
     -------\n        str\n            One of the valid column layout types (ie. `flat`, `nested`, etc.)\n        \"\"\"\n        for schema_key in self._traverse_column_schema_records(values=False):\n            schema_record = schema_column_record_from_db_key(schema_key)\n            if schema_record.column == column:\n                return schema_record.layout\n"
  },
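A hypothetical read-only walk over a commit's records using the RecordQuery class from queries.py above; `dataenv` stands in for an lmdb.Environment holding unpacked commit records, which a real caller would obtain from a checkout's internals.

from hangar.records.queries import RecordQuery

def describe_commit(dataenv):
    # dataenv: lmdb.Environment with unpacked commit records (assumed)
    query = RecordQuery(dataenv)
    print(f'columns: {query.column_count()}')
    for name in query.column_names():
        layout = query.column_schema_layout(name)  # 'flat' or 'nested'
        count = query.column_data_count(name)
        print(f'  {name}: layout={layout}, samples={count}')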
  {
    "path": "src/hangar/records/recordstructs.pxd",
    "content": "# header file for record containers\n\ncdef class CompatibleData:\n    cdef readonly bint compatible\n    cdef readonly str reason\n\n\ncdef class ColumnSchemaKey:\n    cdef readonly str column\n    cdef readonly str layout\n\n\ncdef class FlatColumnDataKey:\n    cdef readonly str column\n    cdef str _sample\n    cdef bint _s_int\n\n\ncdef class NestedColumnDataKey:\n    cdef readonly str column\n    cdef str _sample, _subsample\n    cdef bint _s_int, _ss_int\n\n\ncdef class DataRecordVal:\n    cdef readonly str digest\n"
  },
  {
    "path": "src/hangar/records/recordstructs.pyx",
    "content": "\ncdef class CompatibleData:\n    \"\"\"Bool recording if data `compatible` and if False the rejection `reason`.\n    \"\"\"\n\n    def __init__(self, bint compatible, str reason):\n        self.compatible = compatible\n        self.reason = reason\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'compatible={self.compatible}, '\n                f'reason=\"{self.reason}\")')\n\n    def __iter__(self):\n        for attr in ['compatible', 'reason']:\n            yield getattr(self, attr)\n\n    def __eq__(self, other):\n        return (isinstance(other, self.__class__) and\n                self.compatible == other.compatible and\n                self.reason == other.reason)\n\n    def __hash__(self):\n        return hash((self.__class__, self.compatible, self.reason))\n\n\ncdef class ColumnSchemaKey:\n    \"\"\"Record listing `column` name and `layout` type.\n    \"\"\"\n\n    def __init__(self, str column, str layout):\n        self.column = column\n        self.layout = layout\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'column=\"{self.column}\", '\n                f'layout=\"{self.layout}\")')\n\n    def __iter__(self):\n        for attr in ['column', 'layout']:\n            yield getattr(self, attr)\n\n    def __eq__(self, other):\n        return (isinstance(other, self.__class__) and\n                self.column == other.column and\n                self.layout == other.layout)\n\n    def __hash__(self):\n        return hash((self.__class__, self.column, self.layout))\n\n\ncdef class FlatColumnDataKey:\n    \"\"\"Record listing `column` & `sample` name along with `layout` property\n    \"\"\"\n\n    def __init__(self, str column, str sample):\n        self.column = column\n        self._sample = sample\n        self._s_int = True if sample[0] == '#' else False\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'column=\"{self.column}\", '\n                f'sample={f\"{self.sample if self._s_int else repr(self.sample)}\"})')\n\n    def __iter__(self):\n        for attr in ['column', 'sample']:\n            yield getattr(self, attr)\n\n    def __eq__(self, other):\n        return (isinstance(other, self.__class__) and\n                self.column == other.column and\n                self.sample == other.sample)\n\n    def __hash__(self):\n        return hash((self.__class__, self.column, self.sample))\n\n    @property\n    def sample(self):\n        if self._s_int:\n            return int(self._sample[1:])\n        else:\n            return self._sample\n\n    @property\n    def layout(self):\n        return 'flat'\n\n\ncdef class NestedColumnDataKey:\n    \"\"\"Record listing `column`, `sample`, & `subsample` name along with `layout` property\n    \"\"\"\n\n    def __init__(self, str column, str sample, str subsample):\n        self.column = column\n        self._sample = sample\n        self._subsample = subsample\n        self._s_int = True if sample[0] == '#' else False\n        self._ss_int = True if subsample[0] == '#' else False\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'column=\"{self.column}\", '\n                f'sample={f\"{self.sample if self._s_int else repr(self.sample)}\"}, '\n                f'subsample={f\"{self.subsample if self._ss_int else repr(self.subsample)}\"})')\n\n    def __iter__(self):\n        for attr in ['column', 'sample', 'subsample']:\n            yield 
getattr(self, attr)\n\n    def __eq__(self, other):\n        return (isinstance(other, self.__class__) and\n                self.column == other.column and\n                self.sample == other.sample and\n                self.subsample == other.subsample)\n\n    def __hash__(self):\n        return hash((self.__class__, self.column, self.sample, self.subsample))\n\n    @property\n    def sample(self):\n        if self._s_int:\n            return int(self._sample[1:])\n        else:\n            return self._sample\n\n    @property\n    def subsample(self):\n        if self._ss_int:\n            return int(self._subsample[1:])\n        else:\n            return self._subsample\n\n    @property\n    def layout(self):\n        return 'nested'\n\n\ncdef class DataRecordVal:\n\n    def __init__(self, str digest):\n        self.digest = digest\n\n    def __repr__(self):\n        return (f'{self.__class__.__name__}('\n                f'digest={repr(self.digest)})')\n\n    def __iter__(self):\n        for attr in ['digest']:\n            yield getattr(self, attr)\n\n    def __eq__(self, other):\n        return (isinstance(other, self.__class__)\n                and self.digest == other.digest)\n\n    def __hash__(self):\n        return hash((self.__class__, self.digest))\n"
  },
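A small sketch (column and sample values invented) of the '#' prefix convention used by the key containers in recordstructs.pyx above: sample names stored with a leading '#' are decoded back to int by the `sample` property.

from hangar.records.recordstructs import FlatColumnDataKey

str_key = FlatColumnDataKey('images', 'cat')
int_key = FlatColumnDataKey('images', '#42')  # '#' prefix marks an int key

assert str_key.sample == 'cat'
assert int_key.sample == 42  # decoded back to int by the property
assert int_key.layout == 'flat'
assert tuple(str_key) == ('images', 'cat')  # __iter__ yields column, sample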
  {
    "path": "src/hangar/records/summarize.py",
    "content": "from pathlib import Path\nimport time\nfrom io import StringIO\n\nimport lmdb\n\nfrom .commiting import (\n    get_commit_ancestors_graph,\n    get_commit_spec,\n    tmp_cmt_env,\n)\nfrom .heads import (\n    get_staging_branch_head,\n    get_branch_head_commit,\n    commit_hash_to_branch_name_map,\n)\nfrom .queries import RecordQuery\nfrom .hashs import HashQuery\nfrom ..diff import DiffOut, Changes\nfrom ..txnctx import TxnRegister\nfrom ..utils import format_bytes, file_size, folder_size, unique_everseen\nfrom ..diagnostics import graphing\n\n\ndef log(branchenv: lmdb.Environment,\n        refenv: lmdb.Environment,\n        branch: str = None,\n        commit: str = None,\n        *,\n        return_contents: bool = False,\n        show_time: bool = False,\n        show_user: bool = False):\n    \"\"\"Displays a pretty printed commit log graph to the terminal.\n\n    .. note::\n\n        For programatic access, the return_contents value can be set to true\n        which will retrieve relevant commit specifications as dictionary\n        elements.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        db storing information on named branch HEADS\n    refenv : lmdb.Environment\n        db storing full commit history refs (compressed).\n    branch : str, optional\n        The name of the branch to start the log process from. (Default value\n        = None)\n    commit : str, optional\n        The commit hash to start the log process from. (Default value = None)\n    return_contents : bool, optional, kwarg only\n        If true, return the commit graph specifications in a dictionary\n        suitable for programatic access/evaluation.\n    show_time : bool, optional, kwarg only\n        If true and return_contents is False, show the time of each commit\n        on the printed log graph\n    show_user : bool, optional, kwarg only\n        If true and return_contents is False, show the committer of each\n        commit on the printed log graph\n    Returns\n    -------\n    Optional[dict]\n        Dict containing the commit ancestor graph, and all specifications.\n    \"\"\"\n    res = list_history(\n        refenv=refenv,\n        branchenv=branchenv,\n        branch_name=branch,\n        commit_hash=commit)\n    branchMap = dict(commit_hash_to_branch_name_map(branchenv=branchenv))\n\n    if return_contents:\n        for digest in list(branchMap.keys()):\n            if digest not in res['order']:\n                del branchMap[digest]\n        res['branch_heads'] = branchMap\n        return res\n    else:\n        g = graphing.Graph()\n        g.show_nodes(dag=res['ancestors'],\n                     spec=res['specs'],\n                     branch=branchMap,\n                     start=res['head'],\n                     order=res['order'],\n                     show_time=show_time,\n                     show_user=show_user)\n\n\ndef list_history(refenv, branchenv, branch_name=None, commit_hash=None):\n    \"\"\"Traverse commit history to specifying ancestor DAG and all ancestor specs.\n\n    Parameters\n    ----------\n    refenv : lmdb.Environment\n        environment containing all repository commit data.\n    branchenv : lmdb.Environment\n        environment containing the current staging head branch and branch head\n        commit hashes\n    branch_name : string, optional\n        if specified, get the history starting at the head commit of this named\n        branch (the default is None, which will use the `commit_hash` arg if\n        available, 
or staging area head)\n    commit_hash : string, optional\n        if specified, get the history starting at this specific commit,\n        overrides branch name if both are specified (the default is `None`,\n        which will use the branch_name arg if available, or staging area head)\n\n    Returns\n    -------\n    dict\n        dict containing information about the repo history. Specifies fields for\n        `head`, `ancestors` (DAG of commits), and `specs` of each commit, as well\n        as the traversal `order` encountered.\n    \"\"\"\n\n    if commit_hash is not None:\n        head_commit = commit_hash\n    elif branch_name is not None:\n        head_commit = get_branch_head_commit(branchenv=branchenv, branch_name=branch_name)\n    else:\n        head_branch = get_staging_branch_head(branchenv)\n        head_commit = get_branch_head_commit(branchenv, head_branch)\n\n    ancestors = get_commit_ancestors_graph(\n        refenv=refenv, starting_commit=head_commit)\n\n    commitSpecs = {}\n    for commit in ancestors.keys():\n        commitSpecs[commit] = dict(get_commit_spec(refenv, commit_hash=commit)._asdict())\n\n    cmtTimeSorter = [(k, v['commit_time']) for k, v in commitSpecs.items()]\n    cmtTimeSorter.sort(key=lambda t: t[1], reverse=True)\n    showparentsOrder = [x[0] for x in cmtTimeSorter]\n\n    res = {\n        'head': head_commit,\n        'ancestors': ancestors,\n        'specs': commitSpecs,\n        'order': showparentsOrder,\n    }\n    return res\n\n\ndef details(env: lmdb.Environment, line_limit=100, line_length=100) -> StringIO:  # pragma: no cover\n    \"\"\"Format the details of an lmdb environment into a string buffer\n\n    Parameters\n    ----------\n    env : lmdb.Environment\n        environment handle to print records of\n    line_limit : int, optional\n        limit on the number of record lines printed, by default 100\n    line_length : int, optional\n        limit on the amount of text printed per line, by default 100\n\n    Returns\n    -------\n    StringIO\n        buffer containing detail data.\n    \"\"\"\n    buf = StringIO()\n    buf.write('\\n======================\\n')\n    buf.write(f'{Path(env.path()).name}\\n')\n    try:\n        buf.write(f'File Size: {format_bytes(file_size(Path(env.path())))}\\n')\n    except FileNotFoundError:\n        pass\n    buf.write('======================\\n\\n')\n    txn = TxnRegister().begin_reader_txn(env)\n    entries = txn.stat()['entries'] - 10\n    with txn.cursor() as cursor:\n        count, once = 0, False\n        for key, value in cursor:\n            if (count >= line_limit) and (count < entries):\n                count += 1\n                if (once is False) and (count < entries):\n                    once = True\n                    buf.write('...\\n...\\n...\\n')\n                continue\n            else:\n                if len(value) >= line_length:\n                    buf.write(f'{key} long binary\\n')\n                else:\n                    buf.write(f'{key} {value}\\n')\n            count += 1\n    TxnRegister().abort_reader_txn(env)\n    return buf\n\n\ndef summary(env, *, branch='', commit='') -> StringIO:\n    \"\"\"Summary of data set stored in repository.\n\n    Parameters\n    ----------\n    env : :class:`..context.Environments`\n        class which contains all of the lmdb environments pre-initialized for use.\n    commit : str\n        commit hash to query. 
if left empty, HEAD commit is used (Default value = '')\n    branch : str\n        branch name to query, if left empty, HEAD will be used. (Default value = '')\n\n    Returns\n    -------\n    StringIO\n        buffer formatting the contents of the commit ref at the queried commit.\n    \"\"\"\n    if commit != '':\n        cmt = commit\n    elif branch != '':\n        cmt = get_branch_head_commit(env.branchenv, branch)\n    else:\n        headBranch = get_staging_branch_head(env.branchenv)\n        cmt = get_branch_head_commit(env.branchenv, headBranch)\n\n    if cmt == '':\n        buf = StringIO()\n        buf.write('No commits made')\n        return buf\n    spec = get_commit_spec(env.refenv, cmt)._asdict()\n\n    def _schema_digest_spec_dict(hashenv, digest):\n        hq = HashQuery(hashenv)\n        res = hq.get_schema_digest_spec(digest)\n        return res\n\n    with tmp_cmt_env(env.refenv, cmt) as cmtrefenv:\n        query = RecordQuery(cmtrefenv)\n\n        nbytes = folder_size(env.repo_path, recurse=True)\n        humanBytes = format_bytes(nbytes)\n        buf = StringIO()\n        buf.write(f'Summary of Contents Contained in Data Repository \\n')\n        buf.write(f' \\n')\n        buf.write(f'================== \\n')\n        buf.write(f'| Repository Info \\n')\n        buf.write(f'|----------------- \\n')\n        buf.write(f'|  Base Directory: {str(env.repo_path.parent)} \\n')\n        buf.write(f'|  Disk Usage: {humanBytes} \\n')\n        buf.write(f' \\n')\n\n        buf.write(f'=================== \\n')\n        buf.write(f'| Commit Details \\n')\n        buf.write(f'------------------- \\n')\n        buf.write(f'|  Commit: {cmt} \\n')\n        buf.write(f'|  Created: {time.asctime(time.gmtime(spec[\"commit_time\"]))} \\n')\n        buf.write(f'|  By: {spec[\"commit_user\"]} \\n')\n        buf.write(f'|  Email: {spec[\"commit_email\"]} \\n')\n        buf.write(f'|  Message: {spec[\"commit_message\"]} \\n')\n        buf.write(f' \\n')\n        buf.write(f'================== \\n')\n        buf.write(f'| DataSets \\n')\n        buf.write(f'|----------------- \\n')\n\n        buf.write(f'|  Number of Named Columns: {query.column_count()} \\n')\n        for asetn, asetnSchema in query.schema_specs().items():\n            buf.write(f'|\\n')\n            buf.write(f'|  * Column Name: {asetn} \\n')\n            buf.write(f'|    Num Data Pieces: {query.column_data_count(asetn.column)} \\n')\n\n            buf.write(f'|    Details: \\n')\n            schema_dict = _schema_digest_spec_dict(env.hashenv, asetnSchema.digest)\n            for k, v in schema_dict.items():\n                buf.write(f'|    - {k}: {v} \\n')\n\n    return buf\n\n\ndef status(hashenv: lmdb.Environment, branch_name: str, diff: DiffOut) -> StringIO:\n    \"\"\"Format human readable string buffer of changes in a staging area\n\n    Parameters\n    ----------\n    hashenv : lmdb.Environment\n        hashenv to pull useful schema spec info from.\n    branch_name : str\n        Name of the branch the diff is from.\n    diff : DiffOut\n        diff struct tuple returned from standard diff tool.\n\n    Returns\n    -------\n    StringIO\n        Buffer containing human readable printable string of change summary\n    \"\"\"\n    def _schema_digest_spec_dict(digest):\n        hq = HashQuery(hashenv)\n        res = hq.get_schema_digest_spec(digest)\n        return res\n\n    def _diff_info(df: Changes) -> StringIO:\n        \"\"\"Format buffer for each of `ADDED`, `DELETED`, `MUTATED` changes\n        \"\"\"\n        buf = StringIO()\n        buf.write(f'|---------- \\n')\n        buf.write(f'| Schema: {len(df.schema)} \\n')\n        for k, v in df.schema.items():\n            digest = v.digest\n            buf.write(f'|  - \"{k.column}\": \\n')\n            buf.write(f'|       digest=\"{digest}\" \\n')\n            schema_spec = _schema_digest_spec_dict(digest)\n            for schema_key, schema_val in schema_spec.items():\n                buf.write(f'|       {schema_key}: {schema_val} \\n')\n\n        buf.write('|---------- \\n')\n        buf.write(f'| Samples: {len(df.samples)} \\n')\n        unique = unique_everseen(df.samples, lambda x: x.column)\n        for u in unique:\n            un = u.column\n            count = sum((1 for k in df.samples if k.column == un))\n            buf.write(f'|  - \"{un}\": {count} \\n')\n        buf.write(' \\n')\n        return buf\n\n    buf = StringIO()\n    buf.write('============ \\n')\n    buf.write(f'| Branch: {branch_name} \\n')\n    buf.write(' \\n')\n    for changes, changeType in zip(diff, diff.__annotations__.keys()):\n        buf.write('============ \\n')\n        buf.write(f'| {changeType.upper()} \\n')\n        change_buf = _diff_info(changes)\n        buf.write(change_buf.getvalue())\n    return buf\n"
  },
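  {
    "path": "examples/list_history_order_sketch.py",
    "content": "\"\"\"Hypothetical example file (not part of the Hangar source tree).\n\nA minimal, self-contained sketch of how ``list_history`` in\n``src/hangar/records/summarize.py`` computes its newest-first ``order`` list:\ncommit specs are sorted by their ``commit_time`` field in descending order.\nThe digests and timestamps below are made up for illustration.\n\"\"\"\n\ncommitSpecs = {\n    'digest-a': {'commit_time': 1590000000.0},\n    'digest-b': {'commit_time': 1590000100.0},\n    'digest-c': {'commit_time': 1589999900.0},\n}\n\n# identical to the ``cmtTimeSorter`` logic inside ``list_history``\ncmtTimeSorter = [(k, v['commit_time']) for k, v in commitSpecs.items()]\ncmtTimeSorter.sort(key=lambda t: t[1], reverse=True)\norder = [x[0] for x in cmtTimeSorter]\n\nassert order == ['digest-b', 'digest-a', 'digest-c']\n"
  },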
  {
    "path": "src/hangar/records/vcompat.py",
    "content": "from pathlib import Path\n\nimport lmdb\n\nfrom .parsing import (\n    repo_version_db_key,\n    repo_version_db_val_from_raw_val,\n    repo_version_raw_spec_from_raw_string,\n    repo_version_raw_val_from_db_val,\n)\nfrom .._version import Version\nfrom ..constants import LMDB_SETTINGS, LMDB_BRANCH_NAME\nfrom ..txnctx import TxnRegister\nfrom ..utils import pairwise\n\n\ndef set_repository_software_version(branchenv: lmdb.Environment,\n                                    ver_str: str,\n                                    *,\n                                    overwrite: bool = False) -> bool:\n    \"\"\"Write the repository software version to a particular value\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        db where the head, branch, and version specs are stored\n    ver_str : str\n        semantic version style string representing version (ie. \"0.1.0\",\n        \"1.2.1\", etc)\n    overwrite : bool, optional\n        If True, replace current value with new value; If False, do not\n        overwrite if this key exists, by default False\n\n    Returns\n    -------\n    bool\n        True if successful, False otherwise\n    \"\"\"\n    versionKey = repo_version_db_key()\n    ver_spec = repo_version_raw_spec_from_raw_string(v_str=ver_str)\n    versionVal = repo_version_db_val_from_raw_val(v_spec=ver_spec)\n    branchTxn = TxnRegister().begin_writer_txn(branchenv)\n    try:\n        success = branchTxn.put(versionKey, versionVal, overwrite=overwrite)\n    finally:\n        TxnRegister().commit_writer_txn(branchenv)\n    return success\n\n\ndef get_repository_software_version_spec(branchenv: lmdb.Environment) -> Version:\n    \"\"\"Get the repository version specification tuple.\n\n    Parameters\n    ----------\n    branchenv : lmdb.Environment\n        db where the head, branch, and version specs are stored\n\n    Returns\n    -------\n    Version\n        This class abstracts handling of a project’s versions. A Version\n        instance is comparison aware and can be compared and sorted using the\n        standard Python interfaces.\n\n    Raises\n    ------\n    KeyError\n        If no version key is set for the repository\n    \"\"\"\n    versionKey = repo_version_db_key()\n    branchTxn = TxnRegister().begin_reader_txn(branchenv)\n    try:\n        versionVal = branchTxn.get(versionKey, default=False)\n    finally:\n        TxnRegister().abort_reader_txn(branchenv)\n\n    if versionVal is False:\n        raise KeyError('No version string is set for the repository')\n    else:\n        version_val = repo_version_raw_val_from_db_val(versionVal)\n        return version_val\n\n\n\"\"\"\nInitial checking of repository versions\n---------------------------------------\n\"\"\"\n\n\ndef startup_check_repo_version(repo_path: Path) -> Version:\n    \"\"\"Determine repo version without having to have Environments ctx opened.\n\n    Parameters\n    ----------\n    repo_path : Path\n        path to the repository directory on disk\n\n    Returns\n    -------\n    Version\n        This class abstracts handling of a project’s versions. 
A Version\n        instance is comparison aware and can be compared and sorted using the\n        standard Python interfaces.\n\n    Raises\n    ------\n    RuntimeError\n        If for whatever reason, the branch file does not exist on disk.\n        Execution should not reach this point.\n    \"\"\"\n    brch_fp = repo_path.joinpath(LMDB_BRANCH_NAME)\n    if not brch_fp.is_file():\n        msg = f'Hangar Internal Error, startup_check_repo_version did not find '\\\n              f'brch db at: {brch_fp}. Execution should never reach this point. '\\\n              f'Please report this error to Hangar developers.'\n        raise RuntimeError(msg)\n\n    branchenv = lmdb.open(path=str(brch_fp), readonly=True, create=False, **LMDB_SETTINGS)\n    spec = get_repository_software_version_spec(branchenv=branchenv)\n    branchenv.close()\n    return spec\n\n\nincompatible_changes_after = [\n    Version('0.2.0'),\n    Version('0.3.0'),\n    Version('0.4.0'),\n    Version('0.5.0.dev0'),\n    Version('0.5.0.dev1'),\n]\n\n\ndef is_repo_software_version_compatible(repo_v: Version, curr_v: Version) -> bool:\n    \"\"\"Determine if the repo on disk and the current Hangar versions are compatible.\n\n    Parameters\n    ----------\n    repo_v : Version\n        repository software written version.\n    curr_v : Version\n        currently active software version specification\n\n    Returns\n    -------\n    bool\n        True if compatible, False if not.\n    \"\"\"\n    for start, end in pairwise(incompatible_changes_after):\n        if (repo_v >= start) and (repo_v < end):\n            if (curr_v < start) or (curr_v >= end):\n                return False\n            elif (curr_v >= start) and (curr_v < end):\n                return True\n    if (repo_v >= end) and (curr_v < end):\n        return False\n    return True\n"
  },
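  {
    "path": "examples/version_compat_sketch.py",
    "content": "\"\"\"Hypothetical example file (not part of the Hangar source tree).\n\nA minimal sketch of the range check performed by\n``is_repo_software_version_compatible`` in ``src/hangar/records/vcompat.py``:\nrepo and software versions are compatible only when both fall between the\nsame pair of adjacent incompatibility boundaries. Plain tuples stand in for\nthe real ``hangar._version.Version`` objects, and ``pairwise`` is the\nstandard itertools recipe used by ``hangar.utils``.\n\"\"\"\nfrom itertools import tee\n\n\ndef pairwise(iterable):\n    # s -> (s0, s1), (s1, s2), (s2, s3), ...\n    a, b = tee(iterable)\n    next(b, None)\n    return zip(a, b)\n\n\n# stand-ins for Version('0.2.0'), Version('0.3.0'), ...\nboundaries = [(0, 2, 0), (0, 3, 0), (0, 4, 0), (0, 5, 0)]\n\n\ndef is_compatible(repo_v, curr_v):\n    for start, end in pairwise(boundaries):\n        if start <= repo_v < end:\n            # both versions must sit inside the same window\n            return start <= curr_v < end\n    # ``end`` is the final boundary after the loop, as in the original code\n    if repo_v >= end and curr_v < end:\n        return False\n    return True\n\n\nassert is_compatible((0, 3, 1), (0, 3, 9))\nassert not is_compatible((0, 2, 0), (0, 4, 0))\n"
  },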
  {
    "path": "src/hangar/remote/__init__.py",
    "content": ""
  },
  {
    "path": "src/hangar/remote/chunks.py",
    "content": "import math\nimport struct\nfrom io import BytesIO\nfrom typing import NamedTuple, List, Union, Tuple, Iterable\n\nimport blosc\nimport numpy as np\n\nfrom . import hangar_service_pb2\nfrom ..utils import set_blosc_nthreads\n\nset_blosc_nthreads()\n\n\ndef chunk_bytes(bytesData, *, chunkSize: int = 32_000) -> Iterable[bytes]:\n    \"\"\"Slice a bytestring into subelements and store the data in a list\n\n    Arguments\n    ---------\n        bytesData : bytes\n            bytestring buffer of the array data\n        chunkSize : int, optional, kwarg-only\n            number of bytes which each chunk should be split into.\n\n    Yields\n    ------\n    bytes\n        data split into 32kb chunk sizes.\n    \"\"\"\n    numIters = math.ceil(len(bytesData) / chunkSize)\n    currentStart = 0\n    currentEnd = chunkSize\n    for i in range(numIters):\n        yield bytesData[currentStart:currentEnd]\n        currentStart += chunkSize\n        currentEnd += chunkSize\n\n\ndef clientCommitChunkedIterator(commit: str, parentVal: bytes, specVal: bytes,\n                                refVal: bytes) -> hangar_service_pb2.PushCommitRequest:\n    \"\"\"Generator splitting commit specs into chunks sent from client to server\n\n    Parameters\n    ----------\n    commit : str\n        commit hash which is being sent\n    parentVal : bytes\n        bytes representing the commits immediate parents\n    specVal : bytes\n        bytes representing the commit message/user specifications\n    refVal : bytes\n        bytes containing all records stored in the repository\n\n    Yields\n    ------\n    hangar_service_pb2.PushCommitRequest\n        Chunked generator of the PushCommitRequest protobuf.\n    \"\"\"\n    commit_proto = hangar_service_pb2.CommitRecord(\n        parent=parentVal,\n        spec=specVal)\n    byteSize = len(refVal)\n    chunkIterator = chunk_bytes(refVal)\n    for refChunk in chunkIterator:\n        commit_proto.ref = refChunk\n        request = hangar_service_pb2.PushCommitRequest(\n            commit=commit,\n            total_byte_size=byteSize,\n            record=commit_proto)\n        yield request\n\n\ndef tensorChunkedIterator(buf, uncomp_nbytes, pb2_request,\n                          *,\n                          err=None, chunkSize: int = 32_000):\n\n    compBytes = blosc.compress(\n        buf, clevel=3, cname='blosclz', shuffle=blosc.NOSHUFFLE)\n\n    request = pb2_request(\n        comp_nbytes=len(compBytes),\n        uncomp_nbytes=uncomp_nbytes,\n        error=err)\n\n    chunkIterator = chunk_bytes(compBytes, chunkSize=chunkSize)\n    for dchunk in chunkIterator:\n        request.raw_data = dchunk\n        yield request\n\n\ndef missingHashIterator(commit, hash_bytes, err, pb2_func):\n    comp_bytes = blosc.compress(\n        hash_bytes, cname='zlib', clevel=3, typesize=1, shuffle=blosc.SHUFFLE)\n\n    rpc_method = pb2_func(\n        commit=commit,\n        total_byte_size=len(comp_bytes),\n        error=err)\n\n    chunkIterator = chunk_bytes(comp_bytes)\n    for bchunk in chunkIterator:\n        rpc_method.hashs = bchunk\n        yield rpc_method\n\n\ndef missingHashRequestIterator(commit, hash_bytes, pb2_func):\n    comp_bytes = blosc.compress(\n        hash_bytes, cname='zlib', clevel=3, typesize=1, shuffle=blosc.SHUFFLE)\n\n    rpc_method = pb2_func(\n        commit=commit,\n        total_byte_size=len(comp_bytes))\n\n    chunkIterator = chunk_bytes(comp_bytes)\n    for bchunk in chunkIterator:\n        rpc_method.hashs = bchunk\n        yield 
rpc_method\n\n\n# ------------------------ serialization formats -------------------------\n\n\nclass DataIdent(NamedTuple):\n    digest: str\n    schema: str\n\n\nclass DataRecord(NamedTuple):\n    data: Union[np.ndarray, str, bytes]\n    digest: str\n    schema: str\n\n\ndef _serialize_arr(arr: np.ndarray) -> bytes:\n    \"\"\"\n    array serialized to raw bytes in the numpy ``.npy`` format (via np.save)\n    \"\"\"\n    buf = BytesIO()\n    np.save(buf, arr, allow_pickle=False, fix_imports=False)\n    raw = buf.getvalue()\n    return raw\n\n\ndef _deserialize_arr(raw: bytes) -> np.ndarray:\n    buf = BytesIO(initial_bytes=raw)\n    buf.seek(0)\n    arr = np.load(buf, allow_pickle=False, fix_imports=False)\n    return arr\n\n\ndef _serialize_str(data: str) -> bytes:\n    \"\"\"\n    data_bytes\n    \"\"\"\n    return data.encode()\n\n\ndef _deserialize_str(raw: bytes) -> str:\n    return raw.decode()\n\n\ndef _serialize_bytes(data: bytes) -> bytes:\n    \"\"\"\n    data_bytes\n    \"\"\"\n    return data\n\n\ndef _deserialize_bytes(data: bytes) -> bytes:\n    return data\n\n\ndef serialize_ident(digest: str, schema: str) -> bytes:\n    \"\"\"\n    len_digest len_schema digest_str schema_str\n    \"\"\"\n    raw = struct.pack(\n        f'<hh{len(digest)}s{len(schema)}s',\n        len(digest), len(schema), digest.encode(), schema.encode()\n    )\n    return raw\n\n\ndef deserialize_ident(raw: bytes) -> DataIdent:\n    digestLen, schemaLen = struct.unpack('<hh', raw[:4])\n    rawdigest, rawschema = struct.unpack(f'<{digestLen}s{schemaLen}s', raw[4:])\n    digest = rawdigest.decode()\n    schema = rawschema.decode()\n    return DataIdent(digest, schema)\n\n\ndef serialize_data(data: Union[np.ndarray, str, bytes]) -> Tuple[int, bytes]:\n    if isinstance(data, np.ndarray):\n        return (0, _serialize_arr(data))\n    elif isinstance(data, str):\n        return (2, _serialize_str(data))\n    elif isinstance(data, bytes):\n        return (3, _serialize_bytes(data))\n    else:\n        raise TypeError(type(data))\n\n\ndef deserialize_data(dtype_code: int, raw_data: bytes) -> Union[np.ndarray, str, bytes]:\n    if dtype_code == 0:\n        return _deserialize_arr(raw_data)\n    elif dtype_code == 2:\n        return _deserialize_str(raw_data)\n    elif dtype_code == 3:\n        return _deserialize_bytes(raw_data)\n    else:\n        raise ValueError(f'dtype_code unknown {dtype_code}')\n\n\ndef serialize_record(data: Union[np.ndarray, str, bytes], digest: str, schema: str) -> bytes:\n    \"\"\"\n    dtype_code len_raw_ident len_raw_data raw_ident raw_data\n    \"\"\"\n    dtype_code, raw_data = serialize_data(data)\n    raw_ident = serialize_ident(digest, schema)\n    raw = struct.pack(\n        f'<b2Q{len(raw_ident)}s{len(raw_data)}s',\n        dtype_code, len(raw_ident), len(raw_data), raw_ident, raw_data\n    )\n    return raw\n\n\ndef deserialize_record(raw: bytes) -> DataRecord:\n    identStart = 17  # 1 + 2 * 8 bytes\n    dtype_code, identLen, dataLen = struct.unpack('<b2Q', raw[:identStart])\n    identEnd = identStart + identLen\n    arrEnd = identEnd + dataLen\n    arr = deserialize_data(dtype_code, raw[identEnd:arrEnd])\n    ident = deserialize_ident(raw[identStart:identEnd])\n    return DataRecord(arr, ident.digest, ident.schema)\n\n\ndef serialize_record_pack(records: List[bytes]) -> bytes:\n    \"\"\"\n    num_records len_rec1 raw_rec1 len_rec2 raw_rec2 ... 
len_recN raw_recN\n    \"\"\"\n    raw_num_records = struct.pack(f'<i', len(records))\n    raw_records = [b''.join([struct.pack(f'<Q', len(rec)), rec]) for rec in records]\n    return b''.join([raw_num_records, *raw_records])\n\n\ndef deserialize_record_pack(raw: bytes) -> List[bytes]:\n    numRecords = struct.unpack(f'<i', raw[:4])[0]\n    cursorPos, recs = 4, []\n    for i in range(numRecords):\n        lenRec = struct.unpack(f'<Q', raw[cursorPos:cursorPos+8])[0]\n        recs.append(raw[cursorPos+8:cursorPos+8+lenRec])\n        cursorPos += (8 + lenRec)\n    return recs\n"
  },
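  {
    "path": "examples/record_pack_roundtrip_sketch.py",
    "content": "\"\"\"Hypothetical example file (not part of the Hangar source tree).\n\nRound-trips the record-pack wire format defined in\n``src/hangar/remote/chunks.py``: a little-endian int32 record count followed\nby uint64 length-prefixed records. The two functions are copied from that\nmodule so the sketch runs standalone.\n\"\"\"\nimport struct\nfrom typing import List\n\n\ndef serialize_record_pack(records: List[bytes]) -> bytes:\n    raw_num_records = struct.pack('<i', len(records))\n    raw_records = [b''.join([struct.pack('<Q', len(rec)), rec]) for rec in records]\n    return b''.join([raw_num_records, *raw_records])\n\n\ndef deserialize_record_pack(raw: bytes) -> List[bytes]:\n    numRecords = struct.unpack('<i', raw[:4])[0]\n    cursorPos, recs = 4, []\n    for _ in range(numRecords):\n        lenRec = struct.unpack('<Q', raw[cursorPos:cursorPos + 8])[0]\n        recs.append(raw[cursorPos + 8:cursorPos + 8 + lenRec])\n        cursorPos += (8 + lenRec)\n    return recs\n\n\npayload = [b'first record', b'second record', b'']\nassert deserialize_record_pack(serialize_record_pack(payload)) == payload\n"
  },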
  {
    "path": "src/hangar/remote/client.py",
    "content": "import concurrent.futures\nimport logging\nimport os\nimport tempfile\nimport time\nfrom threading import Lock\nfrom typing import Tuple, Sequence, List, Iterable, TYPE_CHECKING\n\nimport blosc\nimport grpc\nimport lmdb\nfrom tqdm import tqdm\n\nfrom . import chunks, hangar_service_pb2, hangar_service_pb2_grpc\nfrom .header_manipulator_client_interceptor import header_adder_interceptor\nfrom .. import constants as c\nfrom ..backends import BACKEND_ACCESSOR_MAP, backend_decoder\nfrom ..context import Environments\nfrom ..records import commiting, hashs, hash_data_db_key_from_raw_key, queries, summarize\nfrom ..records.hashmachine import hash_func_from_tcode\nfrom ..txnctx import TxnRegister\nfrom ..utils import set_blosc_nthreads, calc_num_threadpool_workers\n\nif TYPE_CHECKING:\n    from .content import DataWriter\n\n\nset_blosc_nthreads()\n\nlogger = logging.getLogger(__name__)\n\n\nclass HangarClient(object):\n    \"\"\"Client which connects and handles data transfer to the hangar server.\n\n    Parameters\n    ----------\n    envs : Environments\n        environment handles to manage all required calls to the local\n        repostory state.\n    address : str\n        IP:PORT where the hangar server can be reached.\n    auth_username : str, optional, kwarg-only\n        credentials to use for authentication.\n    auth_password : str, optional, kwarg-only, by default ''.\n        credentials to use for authentication, by default ''.\n    wait_for_ready : bool, optional, kwarg-only, be default True.\n        If the client should wait before erring for a short period of time\n        while a server is `UNAVAILABLE`, typically due to it just starting up\n        at the time the connection was made\n    wait_for_ready_timeout : float, optional, kwarg-only, by default 5.\n        If `wait_for_ready` is True, the time in seconds which the client should\n        wait before raising an error. 
Must be positive value (greater than 0)\n    \"\"\"\n\n    def __init__(self,\n                 envs: Environments,\n                 address: str,\n                 *,\n                 auth_username: str = '',\n                 auth_password: str = '',\n                 wait_for_ready: bool = True,\n                 wait_for_ready_timeout: float = 5):\n\n        self.env: Environments = envs\n        self.address: str = address\n        self.wait_ready: bool = wait_for_ready\n        self.wait_ready_timeout: float = abs(wait_for_ready_timeout + 0.001)\n        self.data_writer_lock = Lock()\n\n        self.channel: grpc.Channel = None\n        self.stub: hangar_service_pb2_grpc.HangarServiceStub = None\n        self.header_adder_int = header_adder_interceptor(auth_username, auth_password)\n\n        self.cfg: dict = {}\n        self._rFs: BACKEND_ACCESSOR_MAP = {}\n\n        for backend, accessor in BACKEND_ACCESSOR_MAP.items():\n            if accessor is not None:\n                self._rFs[backend] = accessor(\n                    repo_path=self.env.repo_path,\n                    schema_shape=None,\n                    schema_dtype=None)\n                self._rFs[backend].open(mode='r')\n\n        self._setup_client_channel_config()\n\n    def _setup_client_channel_config(self):\n        \"\"\"get grpc client configuration from server and setup channel and stub for use.\n        \"\"\"\n        tmp_insec_channel = grpc.insecure_channel(self.address)\n        tmp_channel = grpc.intercept_channel(tmp_insec_channel, self.header_adder_int)\n        tmp_stub = hangar_service_pb2_grpc.HangarServiceStub(tmp_channel)\n        t_init, t_tot = time.time(), 0\n        while t_tot < self.wait_ready_timeout:\n            try:\n                request = hangar_service_pb2.GetClientConfigRequest()\n                response = tmp_stub.GetClientConfig(request)\n                self.cfg['push_max_nbytes'] = int(response.config['push_max_nbytes'])\n                self.cfg['optimization_target'] = response.config['optimization_target']\n\n                enable_compression = response.config['enable_compression']\n                if enable_compression == 'NoCompression':\n                    compression_val = grpc.Compression.NoCompression\n                elif enable_compression == 'Deflate':\n                    compression_val = grpc.Compression.Deflate\n                elif enable_compression == 'Gzip':\n                    compression_val = grpc.Compression.Gzip\n                else:\n                    compression_val = grpc.Compression.NoCompression\n                self.cfg['enable_compression'] = compression_val\n\n            except grpc.RpcError as err:\n                if not (err.code() == grpc.StatusCode.UNAVAILABLE) and (self.wait_ready is True):\n                    logger.error(err)\n                    raise err\n            else:\n                break\n            time.sleep(0.05)\n            t_tot = time.time() - t_init\n        else:\n            err = ConnectionError(f'Server did not connect after: {self.wait_ready_timeout} sec.')\n            logger.error(err)\n            raise err\n\n        tmp_channel.close()\n        tmp_insec_channel.close()\n        configured_channel = grpc.insecure_channel(\n            self.address,\n            options=[\n                ('grpc.optimization_target', self.cfg['optimization_target']),\n                (\"grpc.keepalive_time_ms\", 1000 * 60 * 1),\n                (\"grpc.keepalive_timeout_ms\", 1000 * 10),\n                
(\"grpc.http2_min_sent_ping_interval_without_data_ms\", 1000 * 10),\n                (\"grpc.http2_max_pings_without_data\", 0),\n                (\"grpc.keepalive_permit_without_calls\", 1),\n            ],\n            compression=self.cfg['enable_compression'])\n        self.channel = grpc.intercept_channel(configured_channel, self.header_adder_int)\n        self.stub = hangar_service_pb2_grpc.HangarServiceStub(self.channel)\n\n    def close(self):\n        \"\"\"Close reader file handles and the GRPC channel connection, invalidating this instance.\n        \"\"\"\n        for backend_accessor in self._rFs.values():\n            backend_accessor.close()\n        self.channel.close()\n\n    def ping_pong(self) -> str:\n        \"\"\"Ping server to ensure that connection is working\n\n        Returns\n        -------\n        str\n            Should be value 'PONG'\n        \"\"\"\n        request = hangar_service_pb2.PingRequest()\n        response: hangar_service_pb2.PingReply = self.stub.PING(request)\n        return response.result\n\n    def push_branch_record(self, name: str, head: str\n                           ) -> hangar_service_pb2.PushBranchRecordReply:\n        \"\"\"Create a branch (if new) or update the server branch HEAD to new commit.\n\n        Parameters\n        ----------\n        name : str\n            branch name to be pushed\n        head : str\n            commit hash to update the server head to\n\n        Returns\n        -------\n        hangar_service_pb2.PushBranchRecordReply\n            code indicating success, message with human readable info\n        \"\"\"\n        rec = hangar_service_pb2.BranchRecord(name=name, commit=head)\n        request = hangar_service_pb2.PushBranchRecordRequest(rec=rec)\n        response = self.stub.PushBranchRecord(request)\n        return response\n\n    def fetch_branch_record(self, name: str\n                            ) -> hangar_service_pb2.FetchBranchRecordReply:\n        \"\"\"Get the latest head commit the server knows about for a given branch\n\n        Parameters\n        ----------\n        name : str\n            name of the branch to query on the server\n\n        Returns\n        -------\n        hangar_service_pb2.FetchBranchRecordReply\n            rec containing name and head commit if branch exists, along with\n            standard error proto if it does not exist on the server.\n        \"\"\"\n        rec = hangar_service_pb2.BranchRecord(name=name)\n        request = hangar_service_pb2.FetchBranchRecordRequest(rec=rec)\n        response = self.stub.FetchBranchRecord(request)\n        return response\n\n    def push_commit_record(self, commit: str, parentVal: bytes, specVal: bytes,\n                           refVal: bytes\n                           ) -> hangar_service_pb2.PushBranchRecordReply:\n        \"\"\"Push a new commit reference to the server.\n\n        Parameters\n        ----------\n        commit : str\n            hash digest of the commit to send\n        parentVal : bytes\n            lmdb ref parentVal of the commit\n        specVal : bytes\n            lmdb ref specVal of the commit\n        refVal : bytes\n            lmdb ref refVal of the commit\n\n        Returns\n        -------\n        hangar_service_pb2.PushBranchRecordReply\n            standard error proto\n        \"\"\"\n        cIter = chunks.clientCommitChunkedIterator(commit=commit,\n                                                   parentVal=parentVal,\n                                                   
specVal=specVal,\n                                                   refVal=refVal)\n        response = self.stub.PushCommit(cIter)\n        return response\n\n    def fetch_commit_record(self, commit: str) -> Tuple[str, bytes, bytes, bytes]:\n        \"\"\"get the refs for a commit digest\n\n        Parameters\n        ----------\n        commit : str\n            digest of the commit to retrieve the references for\n\n        Returns\n        -------\n        Tuple[str, bytes, bytes, bytes]\n            ['commit hash', 'parentVal', 'specVal', 'refVal']\n        \"\"\"\n        request = hangar_service_pb2.FetchCommitRequest(commit=commit)\n        replies = self.stub.FetchCommit(request)\n        for idx, reply in enumerate(replies):\n            if idx == 0:\n                refVal = bytearray(reply.total_byte_size)\n                specVal = reply.record.spec\n                parentVal = reply.record.parent\n                offset = 0\n            size = len(reply.record.ref)\n            refVal[offset: offset + size] = reply.record.ref\n            offset += size\n\n        if reply.error.code != 0:\n            logger.error(reply.error)\n            return False\n        return (commit, parentVal, specVal, refVal)\n\n    def fetch_schema(self, schema_hash: str) -> Tuple[str, bytes]:\n        \"\"\"get the schema specification for a schema hash\n\n        Parameters\n        ----------\n        schema_hash : str\n            schema hash to retrieve from the server\n\n        Returns\n        -------\n        Tuple[str, bytes]\n            ['schema hash', 'schemaVal']\n        \"\"\"\n        schema_rec = hangar_service_pb2.SchemaRecord(digest=schema_hash)\n        request = hangar_service_pb2.FetchSchemaRequest(rec=schema_rec)\n        reply = self.stub.FetchSchema(request)\n        if reply.error.code != 0:\n            logger.error(reply.error)\n            return False\n\n        schemaVal = reply.rec.blob\n        return (schema_hash, schemaVal)\n\n    def push_schema(self, schema_hash: str,\n                    schemaVal: bytes) -> hangar_service_pb2.PushSchemaReply:\n        \"\"\"push a schema hash record to the remote server\n\n        Parameters\n        ----------\n        schema_hash : str\n            hash digest of the schema being sent\n        schemaVal : bytes\n            ref value of the schema representation\n\n        Returns\n        -------\n        hangar_service_pb2.PushSchemaReply\n            standard error proto indicating success\n        \"\"\"\n        rec = hangar_service_pb2.SchemaRecord(digest=schema_hash,\n                                              blob=schemaVal)\n        request = hangar_service_pb2.PushSchemaRequest(rec=rec)\n        response = self.stub.PushSchema(request)\n        return response\n\n    def fetch_data(\n            self,\n            origins: Sequence[hangar_service_pb2.DataOriginReply],\n            datawriter_cm: 'DataWriter',\n            schema: str,\n            pbar: 'tqdm'\n    ) -> Sequence[str]:\n        \"\"\"Fetch data hash digests for a particular schema.\n\n        As the total size of the data to be transferred isn't known before this\n        operation occurs, if more tensor data digests are requested then the\n        Client is configured to allow in memory at a time, only a portion of the\n        requested digests will actually be materialized. 
The received digests\n        are listed as the return value of this function; be sure to check that\n        all requested digests have been received!\n\n        Parameters\n        ----------\n        origins : Sequence[hangar_service_pb2.DataOriginReply]\n        datawriter_cm : 'DataWriter'\n        schema : str\n        pbar : 'tqdm'\n\n        Returns\n        -------\n        Sequence[str]\n\n        Raises\n        ------\n        RuntimeError\n            if received digest != requested or what was reported to be sent.\n\n        Notes\n        -----\n        Each piece of content received is written through the ``datawriter_cm``\n        context manager, roughly equivalent to::\n\n            _ = datawriter_cm.data(schema, data_digest=returned_digest, data=returned_data)\n        \"\"\"\n\n        def fetch_write_data_parallel(\n                pb: 'hangar_service_pb2.DataOriginReply',\n                dw_cm: 'DataWriter',\n                schema: str,\n                lock: 'Lock'\n        ) -> str:\n            requested_uri = pb.uri\n            request = hangar_service_pb2.FetchDataRequest(uri=requested_uri)\n            replies = self.stub.FetchData(request)\n            for idx, reply in enumerate(replies):\n                if idx == 0:\n                    dBytes = bytearray(reply.nbytes)\n                    offset = 0\n                    if reply.uri != requested_uri:\n                        raise ValueError(f'requested uri: {requested_uri}, returned: {reply.uri}')\n                size = len(reply.raw_data)\n                if size > 0:\n                    dBytes[offset:offset + size] = reply.raw_data\n                    offset += size\n\n            if pb.compression is True:\n                codex = pb.compression_opts['id']\n                if codex == 'blosc':\n                    returned_raw = blosc.decompress(dBytes)\n                else:\n                    raise ValueError(f'compression id: {codex}')\n            else:\n                returned_raw = dBytes\n\n            dtype_code = pb.data_type\n            returned_data = chunks.deserialize_data(dtype_code, returned_raw)\n            hash_func = hash_func_from_tcode(str(dtype_code))\n            received_hash = hash_func(returned_data)\n            if received_hash != pb.digest:\n                raise RuntimeError(f'MANGLED! 
got: {received_hash} != requested: {pb.digest}')\n            with lock:\n                written_digest = dw_cm.data(\n                    schema, data_digest=received_hash, data=returned_data)\n            return written_digest\n\n        saved_digests = []\n        nWorkers = calc_num_threadpool_workers()\n        with concurrent.futures.ThreadPoolExecutor(max_workers=nWorkers) as executor:\n            futures = [executor.submit(fetch_write_data_parallel,\n                pb, datawriter_cm, schema, self.data_writer_lock) for pb in origins]\n            for future in concurrent.futures.as_completed(futures):\n                saved_digests.append(future.result())\n                pbar.update(1)\n        return saved_digests\n\n    def fetch_data_origin(self, digests: Sequence[str]) -> List[hangar_service_pb2.DataOriginReply]:\n\n        def origin_request_iter(digests: Sequence[str]):\n            for digest in digests:\n                yield hangar_service_pb2.DataOriginRequest(digest=digest)\n\n        requestIter = origin_request_iter(digests)\n        replies = self.stub.FetchFindDataOrigin(requestIter)\n\n        output = []\n        for reply in replies:\n            output.append(reply)\n        return output\n\n    def push_find_data_origin(self, digests):\n        try:\n            specs = []\n            hashTxn = TxnRegister().begin_reader_txn(self.env.hashenv)\n            for digest in digests:\n                hashKey = hash_data_db_key_from_raw_key(digest)\n                hashVal = hashTxn.get(hashKey, default=False)\n                if not hashVal:\n                    raise KeyError(f'No hash record with key: {hashKey}')\n                be_loc = backend_decoder(hashVal)\n                specs.append((digest, be_loc))\n        finally:\n            TxnRegister().abort_reader_txn(self.env.hashenv)\n\n    def push_data_begin_context(self):\n        request = hangar_service_pb2.PushBeginContextRequest()\n        reply = self.stub.PushBeginContext(request)\n        return reply\n\n    def push_data_end_context(self):\n        request = hangar_service_pb2.PushEndContextRequest()\n        reply = self.stub.PushEndContext(request)\n        return reply\n\n    def push_data(self, schema_hash: str, digests: Sequence[str],\n                  pbar: tqdm = None) -> hangar_service_pb2.PushDataReply:\n        \"\"\"Given a schema and digest list, read the data and send to the server\n\n        Parameters\n        ----------\n        schema_hash : str\n            hash of the digest schemas\n        digests : Sequence[str]\n            iterable of digests to be read in and sent to the server\n        pbar : tqdm, optional\n            progress bar instance to be updated as the operation occurs, by default None\n\n        Returns\n        -------\n        hangar_service_pb2.PushDataReply\n            standard error proto indicating success\n\n        Raises\n        ------\n        KeyError\n            if one of the input digests does not exist on the client\n        rpc_error\n            if the server received corrupt data\n        \"\"\"\n        CONFIG_COMPRESSION_IS_DESIRED = True\n        try:\n            specs = {}\n            request_stack = []\n            hashTxn = TxnRegister().begin_reader_txn(self.env.hashenv)\n            for digest in digests:\n                hashKey = hash_data_db_key_from_raw_key(digest)\n                hashVal = hashTxn.get(hashKey, default=False)\n                if not hashVal:\n                    raise KeyError(f'No hash record with key: 
{hashKey}')\n\n                be_loc = backend_decoder(hashVal)\n                specs[digest] = be_loc  # saving for later so no recompute cost\n\n                if be_loc.backend in ['01', '00', '10']:\n                    dtype = hangar_service_pb2.DataType.NP_ARRAY\n                elif be_loc.backend == '30':\n                    dtype = hangar_service_pb2.DataType.STR\n                elif be_loc.backend == '31':\n                    dtype = hangar_service_pb2.DataType.BYTES\n                else:\n                    raise TypeError(be_loc)\n\n                _request = hangar_service_pb2.PushFindDataOriginRequest(\n                    data_type=dtype,\n                    digest=digest,\n                    compression_is_desired=CONFIG_COMPRESSION_IS_DESIRED)\n                request_stack.append(_request)\n        finally:\n            TxnRegister().abort_reader_txn(self.env.hashenv)\n\n        def request_stack_iterator(request_stack):\n            for request in request_stack:\n                yield request\n\n        requestIter = request_stack_iterator(request_stack)\n        replies: Iterable[hangar_service_pb2.PushFindDataOriginReply]\n        replies = self.stub.PushFindDataOrigin(requestIter)\n\n        try:\n            for k in self._rFs.keys():\n                self._rFs[k].__enter__()\n\n            def push_request_iterator(raw, uri, data_type, schema_hash):\n                push_request = hangar_service_pb2.PushDataRequest(\n                    uri=uri,\n                    nbytes=len(raw),\n                    data_type=data_type,\n                    schema_hash=schema_hash)\n                for raw_chunk in chunks.chunk_bytes(raw):\n                    push_request.raw_data = raw_chunk\n                    yield push_request\n\n            def push_data_parallel(reply):\n                be_loc = specs[reply.digest]\n                data = self._rFs[be_loc.backend].read_data(be_loc)\n                _, raw_data = chunks.serialize_data(data)\n\n                if reply.compression_expected is True:\n                    compressed_record = blosc.compress(\n                        raw_data, clevel=3, cname='blosclz', shuffle=blosc.NOSHUFFLE)\n                else:\n                    compressed_record = raw_data\n\n                if be_loc.backend in ['01', '00', '10']:\n                    dtype = hangar_service_pb2.DataType.NP_ARRAY\n                elif be_loc.backend == '30':\n                    dtype = hangar_service_pb2.DataType.STR\n                elif be_loc.backend == '31':\n                    dtype = hangar_service_pb2.DataType.BYTES\n                else:\n                    raise TypeError(be_loc)\n\n                pushDataIter = push_request_iterator(compressed_record, reply.uri, dtype, schema_hash)\n                push_data_response = self.stub.PushData(pushDataIter)\n                return push_data_response\n\n            nWorkers = calc_num_threadpool_workers()\n            with concurrent.futures.ThreadPoolExecutor(max_workers=nWorkers) as executor:\n                push_futures = tuple((executor.submit(push_data_parallel, reply) for reply in replies))\n                for future in concurrent.futures.as_completed(push_futures):\n                    _ = future.result()\n                    pbar.update(1)\n\n        except grpc.RpcError as rpc_error:\n            logger.error(rpc_error)\n            raise rpc_error\n\n        finally:\n            for k in self._rFs.keys():\n                self._rFs[k].__exit__()\n\n    def 
fetch_find_missing_commits(self, branch_name):\n\n        c_commits = commiting.list_all_commits(self.env.refenv)\n        branch_rec = hangar_service_pb2.BranchRecord(name=branch_name)\n        request = hangar_service_pb2.FindMissingCommitsRequest()\n        request.commits.extend(c_commits)\n        request.branch.CopyFrom(branch_rec)\n        reply = self.stub.FetchFindMissingCommits(request)\n        return reply\n\n    def push_find_missing_commits(self, branch_name):\n        branch_commits = summarize.list_history(\n            refenv=self.env.refenv,\n            branchenv=self.env.branchenv,\n            branch_name=branch_name)\n        branch_rec = hangar_service_pb2.BranchRecord(\n            name=branch_name, commit=branch_commits['head'])\n\n        request = hangar_service_pb2.FindMissingCommitsRequest()\n        request.commits.extend(branch_commits['order'])\n        request.branch.CopyFrom(branch_rec)\n        reply = self.stub.PushFindMissingCommits(request)\n        return reply\n\n    def fetch_find_missing_hash_records(self, commit):\n\n        all_hashs = hashs.HashQuery(self.env.hashenv).list_all_hash_keys_raw()\n        all_hashs_raw = [chunks.serialize_ident(digest, '') for digest in all_hashs]\n        raw_pack = chunks.serialize_record_pack(all_hashs_raw)\n        pb2_func = hangar_service_pb2.FindMissingHashRecordsRequest\n        cIter = chunks.missingHashRequestIterator(commit, raw_pack, pb2_func)\n        responses = self.stub.FetchFindMissingHashRecords(cIter)\n        for idx, response in enumerate(responses):\n            if idx == 0:\n                hBytes, offset = bytearray(response.total_byte_size), 0\n            size = len(response.hashs)\n            hBytes[offset: offset + size] = response.hashs\n            offset += size\n\n        uncompBytes = blosc.decompress(hBytes)\n        raw_idents = chunks.deserialize_record_pack(uncompBytes)\n        idents = [chunks.deserialize_ident(raw) for raw in raw_idents]\n        return idents\n\n    def push_find_missing_hash_records(self, commit, tmpDB: lmdb.Environment = None):\n\n        if tmpDB is None:\n            with tempfile.TemporaryDirectory() as tempD:\n                tmpDF = os.path.join(tempD, 'test.lmdb')\n                tmpDB = lmdb.open(path=tmpDF, **c.LMDB_SETTINGS)\n                commiting.unpack_commit_ref(self.env.refenv, tmpDB, commit)\n                c_hashs_schemas = queries.RecordQuery(tmpDB).data_hash_to_schema_hash()\n                c_hashes = list(set(c_hashs_schemas.keys()))\n                tmpDB.close()\n        else:\n            c_hashs_schemas = queries.RecordQuery(tmpDB).data_hash_to_schema_hash()\n            c_hashes = list(set(c_hashs_schemas.keys()))\n\n        c_hashs_raw = [chunks.serialize_ident(digest, '') for digest in c_hashes]\n        raw_pack = chunks.serialize_record_pack(c_hashs_raw)\n        pb2_func = hangar_service_pb2.FindMissingHashRecordsRequest\n        cIter = chunks.missingHashRequestIterator(commit, raw_pack, pb2_func)\n\n        responses = self.stub.PushFindMissingHashRecords(cIter)\n        for idx, response in enumerate(responses):\n            if idx == 0:\n                hBytes, offset = bytearray(response.total_byte_size), 0\n            size = len(response.hashs)\n            hBytes[offset: offset + size] = response.hashs\n            offset += size\n\n        uncompBytes = blosc.decompress(hBytes)\n        s_missing_raw = chunks.deserialize_record_pack(uncompBytes)\n        s_mis_hsh = [chunks.deserialize_ident(raw).digest for raw 
in s_missing_raw]\n        s_mis_hsh_sch = [(s_hsh, c_hashs_schemas[s_hsh]) for s_hsh in s_mis_hsh]\n        return s_mis_hsh_sch\n\n    def fetch_find_missing_schemas(self, commit):\n        c_schemaset = set(hashs.HashQuery(self.env.hashenv).list_all_schema_digests())\n        c_schemas = list(c_schemaset)\n\n        request = hangar_service_pb2.FindMissingSchemasRequest()\n        request.commit = commit\n        request.schema_digests.extend(c_schemas)\n\n        response = self.stub.FetchFindMissingSchemas(request)\n        return response\n\n    def push_find_missing_schemas(self, commit, tmpDB: lmdb.Environment = None):\n\n        if tmpDB is None:\n            with tempfile.TemporaryDirectory() as tempD:\n                tmpDF = os.path.join(tempD, 'test.lmdb')\n                tmpDB = lmdb.open(path=tmpDF, **c.LMDB_SETTINGS)\n                commiting.unpack_commit_ref(self.env.refenv, tmpDB, commit)\n                c_schemaset = set(queries.RecordQuery(tmpDB).schema_hashes())\n                c_schemas = list(c_schemaset)\n                tmpDB.close()\n        else:\n            c_schemaset = set(queries.RecordQuery(tmpDB).schema_hashes())\n            c_schemas = list(c_schemaset)\n\n        request = hangar_service_pb2.FindMissingSchemasRequest()\n        request.commit = commit\n        request.schema_digests.extend(c_schemas)\n\n        response = self.stub.PushFindMissingSchemas(request)\n        return response\n"
  },
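  {
    "path": "examples/chunk_reassembly_sketch.py",
    "content": "\"\"\"Hypothetical example file (not part of the Hangar source tree).\n\nA minimal sketch of the chunk-reassembly pattern used throughout\n``HangarClient`` (e.g. ``fetch_find_missing_hash_records``): the first reply\nin a stream carries the total byte size, after which every chunk is copied\ninto a preallocated bytearray at a running offset. ``Reply`` is a stand-in\nfor the real protobuf message.\n\"\"\"\nfrom collections import namedtuple\n\nReply = namedtuple('Reply', ['total_byte_size', 'hashs'])\n\n\ndef reassemble(replies):\n    hBytes, offset = bytearray(0), 0\n    for idx, response in enumerate(replies):\n        if idx == 0:\n            # the first reply tells us how much room to preallocate\n            hBytes, offset = bytearray(response.total_byte_size), 0\n        size = len(response.hashs)\n        hBytes[offset: offset + size] = response.hashs\n        offset += size\n    return bytes(hBytes)\n\n\ndata = b'0123456789abcdef'\nstream = [Reply(len(data), data[i:i + 5]) for i in range(0, len(data), 5)]\nassert reassemble(stream) == data\n"
  },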
  {
    "path": "src/hangar/remote/config_server.ini",
    "content": "[SERVER_GRPC]\nchannel_address = [::]:50051\nmax_thread_pool_workers = 200\nmax_concurrent_rpcs = 100\nenable_compression = NoCompression\noptimization_target = blend\nfetch_max_nbytes = 500_000_000\n\n[SERVER_ADMIN]\nrestrict_push = 0\nusername = --none--\npassword = --none--\n\n[CLIENT_GRPC]\nenable_compression = NoCompression\noptimization_target = blend\npush_max_nbytes = 600_000_000\n"
  },
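  {
    "path": "examples/read_client_config_sketch.py",
    "content": "\"\"\"Hypothetical example file (not part of the Hangar source tree).\n\nA minimal sketch of reading ``config_server.ini`` style values with the\nstandard-library ``configparser``. Note that ``int()`` accepts the\nunderscore-grouped numerals used in the file (e.g. ``600_000_000``), which is\nwhy the client can call ``int(response.config['push_max_nbytes'])`` directly.\n\"\"\"\nimport configparser\n\nini_text = '''\n[CLIENT_GRPC]\nenable_compression = NoCompression\noptimization_target = blend\npush_max_nbytes = 600_000_000\n'''\n\ncfg = configparser.ConfigParser()\ncfg.read_string(ini_text)\n\npush_max = int(cfg['CLIENT_GRPC']['push_max_nbytes'])\nassert push_max == 600_000_000\nassert cfg['CLIENT_GRPC']['enable_compression'] == 'NoCompression'\n"
  },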
  {
    "path": "src/hangar/remote/content.py",
    "content": "from typing import NamedTuple, Union, Optional\n\nimport numpy as np\n\nfrom ..columns.constructors import open_file_handles, column_type_object_from_schema\nfrom ..context import Environments\nfrom ..records import (\n    parsing,\n    schema_spec_from_db_val,\n    hash_schema_db_key_from_raw_key,\n    hash_data_db_key_from_raw_key\n)\nfrom ..txnctx import TxnRegister\n\n\nclass ContentWriter(object):\n    \"\"\"Common methods to client & server which write content received.\n\n    These are special methods configured especially for remote operations.\n    They do not honor the public facing API or data write/read conventions\n    established for users or the rest of Hangar internals.\n\n    Parameters\n    ----------\n    envs\n        main hangar environment context object.\n    \"\"\"\n\n    def __init__(self, envs: Environments):\n\n        self.env: Environments = envs\n        self.txnctx: TxnRegister = TxnRegister()\n\n    def commit(self, commit: str, parentVal: bytes, specVal: bytes,\n               refVal: bytes) -> Union[str, bool]:\n        \"\"\"Write a commit record to the ref db\n\n        Parameters\n        ----------\n        commit\n            commit hash to write\n        parentVal\n            db formatted representation of commit parents\n        specVal\n            db formatted representation of the commit specs\n        refVal\n            db formated representation of commit record contents\n\n        Returns\n        -------\n        str or False\n            Commit hash if operation was successful.\n\n            False if the commit hash existed in the db previously and\n            no records were written.\n        \"\"\"\n        commitSpecKey = parsing.commit_spec_db_key_from_raw_key(commit)\n        commitParentKey = parsing.commit_parent_db_key_from_raw_key(commit)\n        commitRefKey = parsing.commit_ref_db_key_from_raw_key(commit)\n        refTxn = self.txnctx.begin_writer_txn(self.env.refenv)\n        try:\n            cmtParExists = refTxn.put(commitParentKey, parentVal, overwrite=False)\n            cmtRefExists = refTxn.put(commitRefKey, refVal, overwrite=False)\n            cmtSpcExists = refTxn.put(commitSpecKey, specVal, overwrite=False)\n        finally:\n            self.txnctx.commit_writer_txn(self.env.refenv)\n\n        ret = False if not all([cmtParExists, cmtRefExists, cmtSpcExists]) else commit\n        return ret\n\n    def schema(self, schema_hash: str, schemaVal: bytes) -> Union[str, bool]:\n        \"\"\"Write a column schema hash specification record to the db\n\n        Parameters\n        ----------\n        schema_hash\n            schema hash being written\n        schemaVal\n            db formatted representation of schema specification\n\n        Returns\n        -------\n        str or False\n            schema_hash written if operation was successful.\n\n            False if the schema_hash existed in db and no records written.\n        \"\"\"\n        schemaKey = hash_schema_db_key_from_raw_key(schema_hash)\n        hashTxn = self.txnctx.begin_writer_txn(self.env.hashenv)\n        try:\n            schemaExists = hashTxn.put(schemaKey, schemaVal, overwrite=False)\n        finally:\n            self.txnctx.commit_writer_txn(self.env.hashenv)\n\n        ret = False if not schemaExists else schema_hash\n        return ret\n\n\nclass DataWriter:\n\n    def __init__(self, envs):\n\n        self.env: Environments = envs\n        self.txnctx: TxnRegister = TxnRegister()\n\n        self._schema_hash_be_accessors = 
{}\n        self._schema_hash_objects = {}\n        self._is_cm = False\n\n    def __enter__(self):\n        self._is_cm = True\n        self.hashTxn = self.txnctx.begin_writer_txn(self.env.hashenv)\n        return self\n\n    def __exit__(self, *exc):\n        for be in self._schema_hash_be_accessors.values():\n            be.close()\n        self.txnctx.commit_writer_txn(self.env.hashenv)\n        self._schema_hash_be_accessors.clear()\n        self._schema_hash_objects.clear()\n        self._is_cm = False\n        self.hashTxn = None\n\n    @property\n    def is_cm(self):\n        return self._is_cm\n\n    def _open_new_backend(self, schema):\n        be_accessor = open_file_handles(backends=[schema.backend],\n                                        path=self.env.repo_path,\n                                        mode='a',\n                                        schema=schema,\n                                        remote_operation=True)[schema.backend]\n        self._schema_hash_be_accessors[schema.schema_hash_digest()] = be_accessor\n\n    def _get_schema_object(self, schema_hash):\n        schemaKey = hash_schema_db_key_from_raw_key(schema_hash)\n        schemaVal = self.hashTxn.get(schemaKey)\n\n        schema_val = schema_spec_from_db_val(schemaVal)\n        schema = column_type_object_from_schema(schema_val)\n\n        if schema_hash != schema.schema_hash_digest():\n            raise RuntimeError(schema.__dict__)\n\n        self._schema_hash_objects[schema_hash] = schema\n        return schema\n\n    def _get_changed_schema_object(self, schema_hash, backend, backend_options):\n        import copy\n        if schema_hash in self._schema_hash_objects:\n            base_schema = copy.deepcopy(self._schema_hash_objects[schema_hash])\n        else:\n            base_schema = copy.deepcopy(self._get_schema_object(schema_hash))\n\n        base_schema.change_backend(backend, backend_options=backend_options)\n        changed_schema = self._schema_hash_objects.setdefault(base_schema.schema_hash_digest(), base_schema)\n        return changed_schema\n\n    def data(self,\n             schema_hash: str,\n             data_digest: str,\n             data: Union[str, int, np.ndarray],\n             backend: Optional[str] = None,\n             backend_options: Optional[dict] = None) -> str:\n        \"\"\"Write data content to the hash records database\n\n        Parameters\n        ----------\n        schema_hash\n            schema_hash currently being written\n        data_digest\n            digest to write\n        data\n            actual piece of data to write\n        backend\n            Manually specified backend code which will be used to record the\n            data records. 
If not specified (``None``), the default backend\n            recorded in the schema spec will be used, by default None\n        backend_options\n            dict specifying backend options to use\n\n        Returns\n        -------\n        str\n            data digest written by this method.\n        \"\"\"\n        if schema_hash not in self._schema_hash_objects:\n            self._get_schema_object(schema_hash)\n        schema = self._schema_hash_objects[schema_hash]\n        if (backend is not None) and ((backend != schema.backend) or (backend_options is not None)):\n            schema = self._get_changed_schema_object(schema_hash, backend, backend_options)\n\n        # Needed because changing the backend alters the schema hash digest\n        final_schema_hash = schema.schema_hash_digest()\n        if final_schema_hash not in self._schema_hash_be_accessors:\n            self._open_new_backend(schema)\n\n        be_accessor = self._schema_hash_be_accessors[final_schema_hash]\n        hashVal = be_accessor.write_data(data, remote_operation=True)\n        hashKey = hash_data_db_key_from_raw_key(data_digest)\n        self.hashTxn.put(hashKey, hashVal)\n        return data_digest\n\n\nRawCommitContent = NamedTuple('RawCommitContent', [('commit', str),\n                                                   ('cmtParentVal', bytes),\n                                                   ('cmtSpecVal', bytes),\n                                                   ('cmtRefVal', bytes)])\n\n\nclass ContentReader(object):\n    \"\"\"Common methods to client & server which read content.\n\n    These are special methods configured especially for remote operations.\n    They do not honor the public facing API or data write/read conventions\n    established for users or the rest of Hangar internals.\n\n    Parameters\n    ----------\n    envs : context.Environments\n        main hangar environment context object.\n    \"\"\"\n    def __init__(self, envs):\n\n        self.env: Environments = envs\n        self.txnctx: TxnRegister = TxnRegister()\n\n    def commit(self, commit: str) -> Union[RawCommitContent, bool]:\n        \"\"\"Read a commit with a given hash and get db formatted content\n\n        Parameters\n        ----------\n        commit\n            commit hash to read from the ref db\n\n        Returns\n        -------\n        namedtuple or False\n            namedtuple with typename = RawCommitContent field_names = ('commit',\n            'cmtParentVal', 'cmtSpecVal', 'cmtRefVal') if operation successful.\n\n            False if commit does not exist with provided digest.\n        \"\"\"\n        cmtRefKey = parsing.commit_ref_db_key_from_raw_key(commit)\n        cmtParentKey = parsing.commit_parent_db_key_from_raw_key(commit)\n        cmtSpecKey = parsing.commit_spec_db_key_from_raw_key(commit)\n\n        reftxn = self.txnctx.begin_reader_txn(self.env.refenv)\n        try:\n            cmtRefVal = reftxn.get(cmtRefKey, default=False)\n            cmtParentVal = reftxn.get(cmtParentKey, default=False)\n            cmtSpecVal = reftxn.get(cmtSpecKey, default=False)\n        finally:\n            self.txnctx.abort_reader_txn(self.env.refenv)\n\n        ret = RawCommitContent(commit, cmtParentVal, cmtSpecVal, cmtRefVal)\n\n        if not all(ret) and not isinstance(ret.cmtParentVal, bytes):\n            return False\n        else:\n            return ret\n\n    def schema(self, schema_hash: str) -> Union[bytes, bool]:\n        \"\"\"Read db formatted schema val for a schema hash\n\n        
Parameters\n        ----------\n        schema_hash\n            schema hash to look up\n\n        Returns\n        -------\n        bytes or False\n            db formatted representation of schema bytes if schema_hash exists\n\n            False if the schema_hash does not exist in the db.\n        \"\"\"\n        schemaKey = hash_schema_db_key_from_raw_key(schema_hash)\n        hashTxn = self.txnctx.begin_reader_txn(self.env.hashenv)\n        try:\n            schemaVal = hashTxn.get(schemaKey, default=False)\n        finally:\n            self.txnctx.abort_reader_txn(self.env.hashenv)\n\n        ret = False if not schemaVal else schemaVal\n        return ret\n"
  },
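  {
    "path": "examples/datawriter_pattern_sketch.py",
    "content": "\"\"\"Hypothetical example file (not part of the Hangar source tree).\n\nA minimal sketch of the resource pattern used by ``DataWriter`` in\n``src/hangar/remote/content.py``: a context manager which lazily opens one\nbackend accessor per schema and closes them all on exit. ``FakeAccessor``\nand ``Writer`` are illustrative stand-ins, not Hangar classes.\n\"\"\"\n\n\nclass FakeAccessor:\n    def __init__(self, schema_hash):\n        self.schema_hash = schema_hash\n        self.closed = False\n\n    def write_data(self, data):\n        return f'wrote {len(data)} bytes for schema {self.schema_hash}'\n\n    def close(self):\n        self.closed = True\n\n\nclass Writer:\n    def __init__(self):\n        self._accessors = {}\n\n    def __enter__(self):\n        return self\n\n    def __exit__(self, *exc):\n        # close every accessor opened during the context, as DataWriter does\n        for be in self._accessors.values():\n            be.close()\n        self._accessors.clear()\n\n    def data(self, schema_hash, data):\n        if schema_hash not in self._accessors:\n            # opened once on first use, then reused for later writes\n            self._accessors[schema_hash] = FakeAccessor(schema_hash)\n        return self._accessors[schema_hash].write_data(data)\n\n\nwith Writer() as w:\n    assert w.data('abc123', b'payload') == 'wrote 7 bytes for schema abc123'\n    w.data('abc123', b'more')\n    assert len(w._accessors) == 1\n"
  },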
  {
    "path": "src/hangar/remote/hangar_service.proto",
    "content": "syntax = \"proto3\";\n\npackage hangar;\noption optimize_for = SPEED;\n\n\nservice HangarService {\n\n    rpc PING (PingRequest) returns (PingReply) {}\n    rpc GetClientConfig (GetClientConfigRequest) returns (GetClientConfigReply) {}\n\n    rpc FetchBranchRecord (FetchBranchRecordRequest) returns (FetchBranchRecordReply) {}\n    rpc FetchData (FetchDataRequest) returns (stream FetchDataReply) {}\n    rpc FetchCommit (FetchCommitRequest) returns (stream FetchCommitReply) {}\n    rpc FetchSchema (FetchSchemaRequest) returns (FetchSchemaReply) {}\n\n    rpc PushBranchRecord (PushBranchRecordRequest) returns (PushBranchRecordReply) {}\n    rpc PushData (stream PushDataRequest) returns (PushDataReply) {}\n    rpc PushCommit (stream PushCommitRequest) returns (PushCommitReply) {}\n    rpc PushSchema (PushSchemaRequest) returns (PushSchemaReply) {}\n\n    rpc FetchFindMissingCommits (FindMissingCommitsRequest) returns (FindMissingCommitsReply) {}\n    rpc FetchFindMissingHashRecords (stream FindMissingHashRecordsRequest) returns (stream FindMissingHashRecordsReply) {}\n    rpc FetchFindMissingSchemas (FindMissingSchemasRequest) returns (FindMissingSchemasReply) {}\n\n    rpc PushFindMissingCommits (FindMissingCommitsRequest) returns (FindMissingCommitsReply) {}\n    rpc PushFindMissingHashRecords (stream FindMissingHashRecordsRequest) returns (stream FindMissingHashRecordsReply) {}\n    rpc PushFindMissingSchemas (FindMissingSchemasRequest) returns (FindMissingSchemasReply) {}\n\n    rpc FetchFindDataOrigin (stream DataOriginRequest) returns (stream DataOriginReply) {}\n    rpc PushFindDataOrigin (stream PushFindDataOriginRequest) returns (stream PushFindDataOriginReply) {}\n    rpc PushBeginContext (PushBeginContextRequest) returns (PushBeginContextReply) {}\n    rpc PushEndContext (PushEndContextRequest) returns (PushEndContextReply) {}\n}\n\n\n/*\n-------------------------------------------------------------------------------\n| Common Formats for Data and Records\n-------------------------------------------------------------------------------\n*/\n\n\nmessage PushBeginContextRequest {\n    // TODO: make this field actually do something\n    string client_uuid = 1;\n}\nmessage PushBeginContextReply {\n    ErrorProto err = 1;\n}\n\n\nmessage PushEndContextRequest {\n    // TODO: make this field actually do something\n    string client_uuid = 1;\n}\nmessage PushEndContextReply {\n    ErrorProto err = 1;\n}\n\n\n\nmessage ErrorProto {\n    // binary indicator of success. 1: success, 0: failed\n    int64 code = 1;\n    // string response indicating success. 
'OK': success, 'ERROR': failed\n    string message = 2;\n}\n\n\n
message BranchRecord {\n    // name of the branch\n    string name = 1;\n    // branch head commit hash\n    string commit = 2;\n}\n\nmessage HashRecord {\n    // specific hash algorithm used to calculate the digest\n    string type = 1;\n    // (hex)digest of the hash record\n    string digest = 2;\n}\n\nmessage CommitRecord {\n    // parent hash(es) of the commit in same format as local store\n    bytes parent = 1;\n    // compressed record reference contents of the commit\n    bytes ref = 2;\n    // metadata attached to the commit record (username, email, message, time, etc.)\n    bytes spec = 3;\n}\n\n\nmessage SchemaRecord {\n    // hash of the schema def\n    string digest = 1;\n    // encoded schema val to be sent\n    bytes blob = 2;\n}\n\nmessage DataOriginRequest {\n    string digest = 1;\n}\n\n\n
enum DataLocation {\n    // Server Side Local Disk\n    REMOTE_SERVER = 0;\n    // Minio Instance\n    MINIO = 1;\n    // AWS S3\n    S3 = 2;\n    // Google Cloud Store\n    GCS = 3;\n    // Azure Blob Store\n    ABS = 4;\n}\n\nenum DataType {\n    NP_ARRAY = 0;\n    SCHEMA = 1;  // Not sure whether we intend to share this or not.\n    STR = 2;\n    BYTES = 3;\n}\n\n\nmessage DataOriginReply {\n    DataLocation location = 1;\n    DataType data_type = 2;\n    string digest = 3;\n    string uri = 4;\n    bool compression = 5;\n    map<string, string> compression_opts = 6;\n}\n\n\nmessage PushFindDataOriginRequest {\n    DataType data_type = 1;\n    string digest = 2;\n    bool compression_is_desired = 3;\n}\n\nmessage PushFindDataOriginReply {\n    string digest = 1;\n    DataLocation location = 2;\n    string uri = 3;\n    bool compression_expected = 5;\n    map<string, string> compression_opts_expected = 6;\n}\n\n\n\n
/*\n-------------------------------------------------------------------------------\n| Client Config from Server\n-------------------------------------------------------------------------------\n*/\n\nmessage PingRequest {}\n\nmessage PingReply {\n    string result = 1;\n}\n\n\nmessage GetClientConfigRequest {}\n\nmessage GetClientConfigReply {\n    // dictionary style map of the config options\n    map<string, string> config = 1;\n    // success or not\n    ErrorProto error = 2;\n}\n\n\n
/*\n-------------------------------------------------------------------------------\n| Fetching Data and Records\n-------------------------------------------------------------------------------\n*/\n\n\nmessage FetchBranchRecordRequest {\n    // name of the branch to fetch\n    BranchRecord rec = 1;\n}\nmessage FetchBranchRecordReply {\n    // record result of the branch\n    BranchRecord rec = 1;\n    // success or not\n    ErrorProto error = 2;\n}\n\n\nmessage FetchDataRequest {\n    string uri = 1;\n//    bytes raw_data = 1;\n//    // total size of the split tensorprotos\n//    int64 comp_nbytes = 2;\n//    // total size of the uncompressed raw data\n//    int64 uncomp_nbytes = 3;\n//    // string schema_hash = 4;\n//    ErrorProto error = 4;\n}\n\n\nmessage FetchDataReply {\n    string uri = 1;\n    // data container for the tensor\n    bytes raw_data = 2;\n    // total number of bytes\n    int64 nbytes = 3;\n//    // total size of the split tensorprotos\n//    int64 comp_nbytes = 2;\n//    // total size of the uncompressed raw data\n//    int64 uncomp_nbytes = 3;\n    // success or not\n    ErrorProto error = 4;\n}\n\n\n\n
message FetchCommitRequest {\n    // (hex)digest of the commit to fetch references to\n    string commit = 1;\n}\nmessage FetchCommitReply {\n    // (hex)digest hash of the commit record\n    string commit = 1;\n    // total size in bytes\n    int64 total_byte_size = 2;\n    // data\n    CommitRecord record = 3;\n    // success or not\n    ErrorProto error = 4;\n}\n\n\nmessage FetchSchemaRequest {\n    // schema record spec with hash specified\n    SchemaRecord rec = 1;\n}\nmessage FetchSchemaReply {\n    // schema record spec\n    SchemaRecord rec = 1;\n    // success or not\n    ErrorProto error = 2;\n}\n\n\n
/*\n-------------------------------------------------------------------------------\n| Pushing Data and Records\n-------------------------------------------------------------------------------\n*/\n\n\nmessage PushBranchRecordRequest {\n    // branch record to push\n    BranchRecord rec = 1;\n}\nmessage PushBranchRecordReply {\n    // success or not\n    ErrorProto error = 1;\n}\n\nmessage PushDataRequest {\n    string uri = 1;\n    // data container for the tensor\n    bytes raw_data = 2;\n    // total number of bytes\n    int64 nbytes = 3;\n    // data type of the contents\n    DataType data_type = 4;\n    // TODO: Remove need for schema hash\n    string schema_hash = 5;\n}\nmessage PushDataReply {\n    // success or not\n    ErrorProto error = 1;\n}\n\n\nmessage PushCommitRequest {\n    // (hex)digest hash of the commit record\n    string commit = 1;\n    // total size in bytes\n    int64 total_byte_size = 2;\n    // data\n    CommitRecord record = 3;\n}\nmessage PushCommitReply {\n    // success or not\n    ErrorProto error = 1;\n}\n\nmessage PushSchemaRequest {\n    SchemaRecord rec = 1;\n}\nmessage PushSchemaReply {\n    // SchemaRecord rec = 1;\n    ErrorProto error = 1;\n}\n\n\n\n
/*\n-------------------------------------------------------------------------------\n| Finding Missing or Outdated Records (Fetch & Push)\n-------------------------------------------------------------------------------\n*/\n\nmessage FindMissingCommitsRequest {\n    // list of commits existing on one side\n    repeated string commits = 1;\n    // branch to query\n    BranchRecord branch = 2;\n}\nmessage FindMissingCommitsReply {\n    // list of commits existing on one side but not the other in requested branch.\n    repeated string commits = 1;\n    // branch to query\n    BranchRecord branch = 2;\n    // success or not\n    ErrorProto error = 3;\n}\n\n\nmessage FindMissingHashRecordsRequest {\n    // commit hash to check hash records for\n    string commit = 1;\n    // all hashes existing on a side\n    bytes hashs = 2;\n    // total byte size\n    int64 total_byte_size = 3;\n}\nmessage FindMissingHashRecordsReply {\n    // commit hash specified\n    string commit = 1;\n    // all hashes existing on a side\n    bytes hashs = 2;\n    // total byte size\n    int64 total_byte_size = 3;\n    // success or not\n    ErrorProto error = 4;\n}\n\n\n\nmessage FindMissingSchemasRequest {\n    // commit hash specified\n    string commit = 1;\n    // schema records on that side\n    repeated string schema_digests = 2;\n}\nmessage FindMissingSchemasReply {\n    // commit hash specified\n    string commit = 1;\n    // schema records on that side\n    repeated string schema_digests = 2;\n    // success or not\n    ErrorProto error = 3;\n}\n"
  },
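  {
    "path": "examples/sketch_ping_client.py",
    "content": "\"\"\"Minimal sketch of calling the unary PING rpc from hangar_service.proto.\n\nEverything here is an assumption for illustration, not a Hangar default: this\nexample file, the server address, and the import path of the generated\nmodules. The stub class name HangarServiceStub follows the standard grpc\npython codegen convention for 'service HangarService'.\n\"\"\"\nimport grpc\n\nfrom hangar.remote import hangar_service_pb2, hangar_service_pb2_grpc\n\n\ndef ping(address: str = 'localhost:50051') -> str:\n    # open an insecure channel, issue a single PING, return PingReply.result\n    with grpc.insecure_channel(address) as channel:\n        stub = hangar_service_pb2_grpc.HangarServiceStub(channel)\n        reply = stub.PING(hangar_service_pb2.PingRequest())\n    return reply.result\n\n\nif __name__ == '__main__':\n    print(ping())\n"
  },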
  {
    "path": "src/hangar/remote/hangar_service_pb2.py",
    "content": "# -*- coding: utf-8 -*-\n# Generated by the protocol buffer compiler.  DO NOT EDIT!\n# source: hangar_service.proto\n\nfrom google.protobuf.internal import enum_type_wrapper\nfrom google.protobuf import descriptor as _descriptor\nfrom google.protobuf import message as _message\nfrom google.protobuf import reflection as _reflection\nfrom google.protobuf import symbol_database as _symbol_database\n# @@protoc_insertion_point(imports)\n\n_sym_db = _symbol_database.Default()\n\n\n\n\nDESCRIPTOR = _descriptor.FileDescriptor(\n  name='hangar_service.proto',\n  package='hangar',\n  syntax='proto3',\n  serialized_options=b'H\\001',\n  serialized_pb=b'\\n\\x14hangar_service.proto\\x12\\x06hangar\\\".\\n\\x17PushBeginContextRequest\\x12\\x13\\n\\x0b\\x63lient_uuid\\x18\\x01 \\x01(\\t\\\"8\\n\\x15PushBeginContextReply\\x12\\x1f\\n\\x03\\x65rr\\x18\\x01 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\",\\n\\x15PushEndContextRequest\\x12\\x13\\n\\x0b\\x63lient_uuid\\x18\\x01 \\x01(\\t\\\"6\\n\\x13PushEndContextReply\\x12\\x1f\\n\\x03\\x65rr\\x18\\x01 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"+\\n\\nErrorProto\\x12\\x0c\\n\\x04\\x63ode\\x18\\x01 \\x01(\\x03\\x12\\x0f\\n\\x07message\\x18\\x02 \\x01(\\t\\\",\\n\\x0c\\x42ranchRecord\\x12\\x0c\\n\\x04name\\x18\\x01 \\x01(\\t\\x12\\x0e\\n\\x06\\x63ommit\\x18\\x02 \\x01(\\t\\\"*\\n\\nHashRecord\\x12\\x0c\\n\\x04type\\x18\\x01 \\x01(\\t\\x12\\x0e\\n\\x06\\x64igest\\x18\\x02 \\x01(\\t\\\"9\\n\\x0c\\x43ommitRecord\\x12\\x0e\\n\\x06parent\\x18\\x01 \\x01(\\x0c\\x12\\x0b\\n\\x03ref\\x18\\x02 \\x01(\\x0c\\x12\\x0c\\n\\x04spec\\x18\\x03 \\x01(\\x0c\\\",\\n\\x0cSchemaRecord\\x12\\x0e\\n\\x06\\x64igest\\x18\\x01 \\x01(\\t\\x12\\x0c\\n\\x04\\x62lob\\x18\\x02 \\x01(\\x0c\\\"#\\n\\x11\\x44\\x61taOriginRequest\\x12\\x0e\\n\\x06\\x64igest\\x18\\x01 \\x01(\\t\\\"\\x90\\x02\\n\\x0f\\x44\\x61taOriginReply\\x12&\\n\\x08location\\x18\\x01 \\x01(\\x0e\\x32\\x14.hangar.DataLocation\\x12#\\n\\tdata_type\\x18\\x02 \\x01(\\x0e\\x32\\x10.hangar.DataType\\x12\\x0e\\n\\x06\\x64igest\\x18\\x03 \\x01(\\t\\x12\\x0b\\n\\x03uri\\x18\\x04 \\x01(\\t\\x12\\x13\\n\\x0b\\x63ompression\\x18\\x05 \\x01(\\x08\\x12\\x46\\n\\x10\\x63ompression_opts\\x18\\x06 \\x03(\\x0b\\x32,.hangar.DataOriginReply.CompressionOptsEntry\\x1a\\x36\\n\\x14\\x43ompressionOptsEntry\\x12\\x0b\\n\\x03key\\x18\\x01 \\x01(\\t\\x12\\r\\n\\x05value\\x18\\x02 \\x01(\\t:\\x02\\x38\\x01\\\"p\\n\\x19PushFindDataOriginRequest\\x12#\\n\\tdata_type\\x18\\x01 \\x01(\\x0e\\x32\\x10.hangar.DataType\\x12\\x0e\\n\\x06\\x64igest\\x18\\x02 \\x01(\\t\\x12\\x1e\\n\\x16\\x63ompression_is_desired\\x18\\x03 \\x01(\\x08\\\"\\x9d\\x02\\n\\x17PushFindDataOriginReply\\x12\\x0e\\n\\x06\\x64igest\\x18\\x01 \\x01(\\t\\x12&\\n\\x08location\\x18\\x02 \\x01(\\x0e\\x32\\x14.hangar.DataLocation\\x12\\x0b\\n\\x03uri\\x18\\x03 \\x01(\\t\\x12\\x1c\\n\\x14\\x63ompression_expected\\x18\\x05 \\x01(\\x08\\x12_\\n\\x19\\x63ompression_opts_expected\\x18\\x06 \\x03(\\x0b\\x32<.hangar.PushFindDataOriginReply.CompressionOptsExpectedEntry\\x1a>\\n\\x1c\\x43ompressionOptsExpectedEntry\\x12\\x0b\\n\\x03key\\x18\\x01 \\x01(\\t\\x12\\r\\n\\x05value\\x18\\x02 \\x01(\\t:\\x02\\x38\\x01\\\"\\r\\n\\x0bPingRequest\\\"\\x1b\\n\\tPingReply\\x12\\x0e\\n\\x06result\\x18\\x01 \\x01(\\t\\\"\\x18\\n\\x16GetClientConfigRequest\\\"\\xa2\\x01\\n\\x14GetClientConfigReply\\x12\\x38\\n\\x06\\x63onfig\\x18\\x01 \\x03(\\x0b\\x32(.hangar.GetClientConfigReply.ConfigEntry\\x12!\\n\\x05\\x65rror\\x18\\x02 
\\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\x1a-\\n\\x0b\\x43onfigEntry\\x12\\x0b\\n\\x03key\\x18\\x01 \\x01(\\t\\x12\\r\\n\\x05value\\x18\\x02 \\x01(\\t:\\x02\\x38\\x01\\\"=\\n\\x18\\x46\\x65tchBranchRecordRequest\\x12!\\n\\x03rec\\x18\\x01 \\x01(\\x0b\\x32\\x14.hangar.BranchRecord\\\"^\\n\\x16\\x46\\x65tchBranchRecordReply\\x12!\\n\\x03rec\\x18\\x01 \\x01(\\x0b\\x32\\x14.hangar.BranchRecord\\x12!\\n\\x05\\x65rror\\x18\\x02 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"\\x1f\\n\\x10\\x46\\x65tchDataRequest\\x12\\x0b\\n\\x03uri\\x18\\x01 \\x01(\\t\\\"b\\n\\x0e\\x46\\x65tchDataReply\\x12\\x0b\\n\\x03uri\\x18\\x01 \\x01(\\t\\x12\\x10\\n\\x08raw_data\\x18\\x02 \\x01(\\x0c\\x12\\x0e\\n\\x06nbytes\\x18\\x03 \\x01(\\x03\\x12!\\n\\x05\\x65rror\\x18\\x04 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"$\\n\\x12\\x46\\x65tchCommitRequest\\x12\\x0e\\n\\x06\\x63ommit\\x18\\x01 \\x01(\\t\\\"\\x84\\x01\\n\\x10\\x46\\x65tchCommitReply\\x12\\x0e\\n\\x06\\x63ommit\\x18\\x01 \\x01(\\t\\x12\\x17\\n\\x0ftotal_byte_size\\x18\\x02 \\x01(\\x03\\x12$\\n\\x06record\\x18\\x03 \\x01(\\x0b\\x32\\x14.hangar.CommitRecord\\x12!\\n\\x05\\x65rror\\x18\\x04 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"7\\n\\x12\\x46\\x65tchSchemaRequest\\x12!\\n\\x03rec\\x18\\x01 \\x01(\\x0b\\x32\\x14.hangar.SchemaRecord\\\"X\\n\\x10\\x46\\x65tchSchemaReply\\x12!\\n\\x03rec\\x18\\x01 \\x01(\\x0b\\x32\\x14.hangar.SchemaRecord\\x12!\\n\\x05\\x65rror\\x18\\x02 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"<\\n\\x17PushBranchRecordRequest\\x12!\\n\\x03rec\\x18\\x01 \\x01(\\x0b\\x32\\x14.hangar.BranchRecord\\\":\\n\\x15PushBranchRecordReply\\x12!\\n\\x05\\x65rror\\x18\\x01 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"z\\n\\x0fPushDataRequest\\x12\\x0b\\n\\x03uri\\x18\\x01 \\x01(\\t\\x12\\x10\\n\\x08raw_data\\x18\\x02 \\x01(\\x0c\\x12\\x0e\\n\\x06nbytes\\x18\\x03 \\x01(\\x03\\x12#\\n\\tdata_type\\x18\\x04 \\x01(\\x0e\\x32\\x10.hangar.DataType\\x12\\x13\\n\\x0bschema_hash\\x18\\x05 \\x01(\\t\\\"2\\n\\rPushDataReply\\x12!\\n\\x05\\x65rror\\x18\\x01 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"b\\n\\x11PushCommitRequest\\x12\\x0e\\n\\x06\\x63ommit\\x18\\x01 \\x01(\\t\\x12\\x17\\n\\x0ftotal_byte_size\\x18\\x02 \\x01(\\x03\\x12$\\n\\x06record\\x18\\x03 \\x01(\\x0b\\x32\\x14.hangar.CommitRecord\\\"4\\n\\x0fPushCommitReply\\x12!\\n\\x05\\x65rror\\x18\\x01 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"6\\n\\x11PushSchemaRequest\\x12!\\n\\x03rec\\x18\\x01 \\x01(\\x0b\\x32\\x14.hangar.SchemaRecord\\\"4\\n\\x0fPushSchemaReply\\x12!\\n\\x05\\x65rror\\x18\\x01 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"R\\n\\x19\\x46indMissingCommitsRequest\\x12\\x0f\\n\\x07\\x63ommits\\x18\\x01 \\x03(\\t\\x12$\\n\\x06\\x62ranch\\x18\\x02 \\x01(\\x0b\\x32\\x14.hangar.BranchRecord\\\"s\\n\\x17\\x46indMissingCommitsReply\\x12\\x0f\\n\\x07\\x63ommits\\x18\\x01 \\x03(\\t\\x12$\\n\\x06\\x62ranch\\x18\\x02 \\x01(\\x0b\\x32\\x14.hangar.BranchRecord\\x12!\\n\\x05\\x65rror\\x18\\x03 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"W\\n\\x1d\\x46indMissingHashRecordsRequest\\x12\\x0e\\n\\x06\\x63ommit\\x18\\x01 \\x01(\\t\\x12\\r\\n\\x05hashs\\x18\\x02 \\x01(\\x0c\\x12\\x17\\n\\x0ftotal_byte_size\\x18\\x03 \\x01(\\x03\\\"x\\n\\x1b\\x46indMissingHashRecordsReply\\x12\\x0e\\n\\x06\\x63ommit\\x18\\x01 \\x01(\\t\\x12\\r\\n\\x05hashs\\x18\\x02 \\x01(\\x0c\\x12\\x17\\n\\x0ftotal_byte_size\\x18\\x03 \\x01(\\x03\\x12!\\n\\x05\\x65rror\\x18\\x04 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto\\\"C\\n\\x19\\x46indMissingSchemasRequest\\x12\\x0e\\n\\x06\\x63ommit\\x18\\x01 \\x01(\\t\\x12\\x16\\n\\x0eschema_digests\\x18\\x02 
\\x03(\\t\\\"d\\n\\x17\\x46indMissingSchemasReply\\x12\\x0e\\n\\x06\\x63ommit\\x18\\x01 \\x01(\\t\\x12\\x16\\n\\x0eschema_digests\\x18\\x02 \\x03(\\t\\x12!\\n\\x05\\x65rror\\x18\\x03 \\x01(\\x0b\\x32\\x12.hangar.ErrorProto*F\\n\\x0c\\x44\\x61taLocation\\x12\\x11\\n\\rREMOTE_SERVER\\x10\\x00\\x12\\t\\n\\x05MINIO\\x10\\x01\\x12\\x06\\n\\x02S3\\x10\\x02\\x12\\x07\\n\\x03GCS\\x10\\x03\\x12\\x07\\n\\x03\\x41\\x42S\\x10\\x04*8\\n\\x08\\x44\\x61taType\\x12\\x0c\\n\\x08NP_ARRAY\\x10\\x00\\x12\\n\\n\\x06SCHEMA\\x10\\x01\\x12\\x07\\n\\x03STR\\x10\\x02\\x12\\t\\n\\x05\\x42YTES\\x10\\x03\\x32\\x9a\\r\\n\\rHangarService\\x12\\x30\\n\\x04PING\\x12\\x13.hangar.PingRequest\\x1a\\x11.hangar.PingReply\\\"\\x00\\x12Q\\n\\x0fGetClientConfig\\x12\\x1e.hangar.GetClientConfigRequest\\x1a\\x1c.hangar.GetClientConfigReply\\\"\\x00\\x12W\\n\\x11\\x46\\x65tchBranchRecord\\x12 .hangar.FetchBranchRecordRequest\\x1a\\x1e.hangar.FetchBranchRecordReply\\\"\\x00\\x12\\x41\\n\\tFetchData\\x12\\x18.hangar.FetchDataRequest\\x1a\\x16.hangar.FetchDataReply\\\"\\x00\\x30\\x01\\x12G\\n\\x0b\\x46\\x65tchCommit\\x12\\x1a.hangar.FetchCommitRequest\\x1a\\x18.hangar.FetchCommitReply\\\"\\x00\\x30\\x01\\x12\\x45\\n\\x0b\\x46\\x65tchSchema\\x12\\x1a.hangar.FetchSchemaRequest\\x1a\\x18.hangar.FetchSchemaReply\\\"\\x00\\x12T\\n\\x10PushBranchRecord\\x12\\x1f.hangar.PushBranchRecordRequest\\x1a\\x1d.hangar.PushBranchRecordReply\\\"\\x00\\x12>\\n\\x08PushData\\x12\\x17.hangar.PushDataRequest\\x1a\\x15.hangar.PushDataReply\\\"\\x00(\\x01\\x12\\x44\\n\\nPushCommit\\x12\\x19.hangar.PushCommitRequest\\x1a\\x17.hangar.PushCommitReply\\\"\\x00(\\x01\\x12\\x42\\n\\nPushSchema\\x12\\x19.hangar.PushSchemaRequest\\x1a\\x17.hangar.PushSchemaReply\\\"\\x00\\x12_\\n\\x17\\x46\\x65tchFindMissingCommits\\x12!.hangar.FindMissingCommitsRequest\\x1a\\x1f.hangar.FindMissingCommitsReply\\\"\\x00\\x12o\\n\\x1b\\x46\\x65tchFindMissingHashRecords\\x12%.hangar.FindMissingHashRecordsRequest\\x1a#.hangar.FindMissingHashRecordsReply\\\"\\x00(\\x01\\x30\\x01\\x12_\\n\\x17\\x46\\x65tchFindMissingSchemas\\x12!.hangar.FindMissingSchemasRequest\\x1a\\x1f.hangar.FindMissingSchemasReply\\\"\\x00\\x12^\\n\\x16PushFindMissingCommits\\x12!.hangar.FindMissingCommitsRequest\\x1a\\x1f.hangar.FindMissingCommitsReply\\\"\\x00\\x12n\\n\\x1aPushFindMissingHashRecords\\x12%.hangar.FindMissingHashRecordsRequest\\x1a#.hangar.FindMissingHashRecordsReply\\\"\\x00(\\x01\\x30\\x01\\x12^\\n\\x16PushFindMissingSchemas\\x12!.hangar.FindMissingSchemasRequest\\x1a\\x1f.hangar.FindMissingSchemasReply\\\"\\x00\\x12O\\n\\x13\\x46\\x65tchFindDataOrigin\\x12\\x19.hangar.DataOriginRequest\\x1a\\x17.hangar.DataOriginReply\\\"\\x00(\\x01\\x30\\x01\\x12^\\n\\x12PushFindDataOrigin\\x12!.hangar.PushFindDataOriginRequest\\x1a\\x1f.hangar.PushFindDataOriginReply\\\"\\x00(\\x01\\x30\\x01\\x12T\\n\\x10PushBeginContext\\x12\\x1f.hangar.PushBeginContextRequest\\x1a\\x1d.hangar.PushBeginContextReply\\\"\\x00\\x12N\\n\\x0ePushEndContext\\x12\\x1d.hangar.PushEndContextRequest\\x1a\\x1b.hangar.PushEndContextReply\\\"\\x00\\x42\\x02H\\x01\\x62\\x06proto3'\n)\n\n_DATALOCATION = _descriptor.EnumDescriptor(\n  name='DataLocation',\n  full_name='hangar.DataLocation',\n  filename=None,\n  file=DESCRIPTOR,\n  values=[\n    _descriptor.EnumValueDescriptor(\n      name='REMOTE_SERVER', index=0, number=0,\n      serialized_options=None,\n      type=None),\n    _descriptor.EnumValueDescriptor(\n      name='MINIO', index=1, number=1,\n      serialized_options=None,\n      type=None),\n    _descriptor.EnumValueDescriptor(\n     
 name='S3', index=2, number=2,\n      serialized_options=None,\n      type=None),\n    _descriptor.EnumValueDescriptor(\n      name='GCS', index=3, number=3,\n      serialized_options=None,\n      type=None),\n    _descriptor.EnumValueDescriptor(\n      name='ABS', index=4, number=4,\n      serialized_options=None,\n      type=None),\n  ],\n  containing_type=None,\n  serialized_options=None,\n  serialized_start=3186,\n  serialized_end=3256,\n)\n_sym_db.RegisterEnumDescriptor(_DATALOCATION)\n\nDataLocation = enum_type_wrapper.EnumTypeWrapper(_DATALOCATION)\n_DATATYPE = _descriptor.EnumDescriptor(\n  name='DataType',\n  full_name='hangar.DataType',\n  filename=None,\n  file=DESCRIPTOR,\n  values=[\n    _descriptor.EnumValueDescriptor(\n      name='NP_ARRAY', index=0, number=0,\n      serialized_options=None,\n      type=None),\n    _descriptor.EnumValueDescriptor(\n      name='SCHEMA', index=1, number=1,\n      serialized_options=None,\n      type=None),\n    _descriptor.EnumValueDescriptor(\n      name='STR', index=2, number=2,\n      serialized_options=None,\n      type=None),\n    _descriptor.EnumValueDescriptor(\n      name='BYTES', index=3, number=3,\n      serialized_options=None,\n      type=None),\n  ],\n  containing_type=None,\n  serialized_options=None,\n  serialized_start=3258,\n  serialized_end=3314,\n)\n_sym_db.RegisterEnumDescriptor(_DATATYPE)\n\nDataType = enum_type_wrapper.EnumTypeWrapper(_DATATYPE)\nREMOTE_SERVER = 0\nMINIO = 1\nS3 = 2\nGCS = 3\nABS = 4\nNP_ARRAY = 0\nSCHEMA = 1\nSTR = 2\nBYTES = 3\n\n\n\n_PUSHBEGINCONTEXTREQUEST = _descriptor.Descriptor(\n  name='PushBeginContextRequest',\n  full_name='hangar.PushBeginContextRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='client_uuid', full_name='hangar.PushBeginContextRequest.client_uuid', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=32,\n  serialized_end=78,\n)\n\n\n_PUSHBEGINCONTEXTREPLY = _descriptor.Descriptor(\n  name='PushBeginContextReply',\n  full_name='hangar.PushBeginContextReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='err', full_name='hangar.PushBeginContextReply.err', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=80,\n  serialized_end=136,\n)\n\n\n_PUSHENDCONTEXTREQUEST = _descriptor.Descriptor(\n  name='PushEndContextRequest',\n  full_name='hangar.PushEndContextRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='client_uuid', full_name='hangar.PushEndContextRequest.client_uuid', index=0,\n      number=1, 
type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=138,\n  serialized_end=182,\n)\n\n\n_PUSHENDCONTEXTREPLY = _descriptor.Descriptor(\n  name='PushEndContextReply',\n  full_name='hangar.PushEndContextReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='err', full_name='hangar.PushEndContextReply.err', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=184,\n  serialized_end=238,\n)\n\n\n_ERRORPROTO = _descriptor.Descriptor(\n  name='ErrorProto',\n  full_name='hangar.ErrorProto',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='code', full_name='hangar.ErrorProto.code', index=0,\n      number=1, type=3, cpp_type=2, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='message', full_name='hangar.ErrorProto.message', index=1,\n      number=2, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=240,\n  serialized_end=283,\n)\n\n\n_BRANCHRECORD = _descriptor.Descriptor(\n  name='BranchRecord',\n  full_name='hangar.BranchRecord',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='name', full_name='hangar.BranchRecord.name', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='commit', full_name='hangar.BranchRecord.commit', index=1,\n      number=2, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  
extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=285,\n  serialized_end=329,\n)\n\n\n_HASHRECORD = _descriptor.Descriptor(\n  name='HashRecord',\n  full_name='hangar.HashRecord',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='type', full_name='hangar.HashRecord.type', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='digest', full_name='hangar.HashRecord.digest', index=1,\n      number=2, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=331,\n  serialized_end=373,\n)\n\n\n_COMMITRECORD = _descriptor.Descriptor(\n  name='CommitRecord',\n  full_name='hangar.CommitRecord',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='parent', full_name='hangar.CommitRecord.parent', index=0,\n      number=1, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\",\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='ref', full_name='hangar.CommitRecord.ref', index=1,\n      number=2, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\",\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='spec', full_name='hangar.CommitRecord.spec', index=2,\n      number=3, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\",\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=375,\n  serialized_end=432,\n)\n\n\n_SCHEMARECORD = _descriptor.Descriptor(\n  name='SchemaRecord',\n  full_name='hangar.SchemaRecord',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='digest', full_name='hangar.SchemaRecord.digest', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='blob', full_name='hangar.SchemaRecord.blob', index=1,\n      number=2, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\",\n      
message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=434,\n  serialized_end=478,\n)\n\n\n_DATAORIGINREQUEST = _descriptor.Descriptor(\n  name='DataOriginRequest',\n  full_name='hangar.DataOriginRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='digest', full_name='hangar.DataOriginRequest.digest', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=480,\n  serialized_end=515,\n)\n\n\n_DATAORIGINREPLY_COMPRESSIONOPTSENTRY = _descriptor.Descriptor(\n  name='CompressionOptsEntry',\n  full_name='hangar.DataOriginReply.CompressionOptsEntry',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='key', full_name='hangar.DataOriginReply.CompressionOptsEntry.key', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='value', full_name='hangar.DataOriginReply.CompressionOptsEntry.value', index=1,\n      number=2, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=b'8\\001',\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=736,\n  serialized_end=790,\n)\n\n_DATAORIGINREPLY = _descriptor.Descriptor(\n  name='DataOriginReply',\n  full_name='hangar.DataOriginReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='location', full_name='hangar.DataOriginReply.location', index=0,\n      number=1, type=14, cpp_type=8, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='data_type', full_name='hangar.DataOriginReply.data_type', index=1,\n      number=2, type=14, cpp_type=8, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='digest', full_name='hangar.DataOriginReply.digest', index=2,\n      number=3, type=9, 
cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='uri', full_name='hangar.DataOriginReply.uri', index=3,\n      number=4, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='compression', full_name='hangar.DataOriginReply.compression', index=4,\n      number=5, type=8, cpp_type=7, label=1,\n      has_default_value=False, default_value=False,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='compression_opts', full_name='hangar.DataOriginReply.compression_opts', index=5,\n      number=6, type=11, cpp_type=10, label=3,\n      has_default_value=False, default_value=[],\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[_DATAORIGINREPLY_COMPRESSIONOPTSENTRY, ],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=518,\n  serialized_end=790,\n)\n\n\n_PUSHFINDDATAORIGINREQUEST = _descriptor.Descriptor(\n  name='PushFindDataOriginRequest',\n  full_name='hangar.PushFindDataOriginRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='data_type', full_name='hangar.PushFindDataOriginRequest.data_type', index=0,\n      number=1, type=14, cpp_type=8, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='digest', full_name='hangar.PushFindDataOriginRequest.digest', index=1,\n      number=2, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='compression_is_desired', full_name='hangar.PushFindDataOriginRequest.compression_is_desired', index=2,\n      number=3, type=8, cpp_type=7, label=1,\n      has_default_value=False, default_value=False,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=792,\n  serialized_end=904,\n)\n\n\n_PUSHFINDDATAORIGINREPLY_COMPRESSIONOPTSEXPECTEDENTRY = _descriptor.Descriptor(\n  name='CompressionOptsExpectedEntry',\n  full_name='hangar.PushFindDataOriginReply.CompressionOptsExpectedEntry',\n  filename=None,\n  
file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='key', full_name='hangar.PushFindDataOriginReply.CompressionOptsExpectedEntry.key', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='value', full_name='hangar.PushFindDataOriginReply.CompressionOptsExpectedEntry.value', index=1,\n      number=2, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=b'8\\001',\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1130,\n  serialized_end=1192,\n)\n\n_PUSHFINDDATAORIGINREPLY = _descriptor.Descriptor(\n  name='PushFindDataOriginReply',\n  full_name='hangar.PushFindDataOriginReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='digest', full_name='hangar.PushFindDataOriginReply.digest', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='location', full_name='hangar.PushFindDataOriginReply.location', index=1,\n      number=2, type=14, cpp_type=8, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='uri', full_name='hangar.PushFindDataOriginReply.uri', index=2,\n      number=3, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='compression_expected', full_name='hangar.PushFindDataOriginReply.compression_expected', index=3,\n      number=5, type=8, cpp_type=7, label=1,\n      has_default_value=False, default_value=False,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='compression_opts_expected', full_name='hangar.PushFindDataOriginReply.compression_opts_expected', index=4,\n      number=6, type=11, cpp_type=10, label=3,\n      has_default_value=False, default_value=[],\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[_PUSHFINDDATAORIGINREPLY_COMPRESSIONOPTSEXPECTEDENTRY, ],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  
oneofs=[\n  ],\n  serialized_start=907,\n  serialized_end=1192,\n)\n\n\n_PINGREQUEST = _descriptor.Descriptor(\n  name='PingRequest',\n  full_name='hangar.PingRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1194,\n  serialized_end=1207,\n)\n\n\n_PINGREPLY = _descriptor.Descriptor(\n  name='PingReply',\n  full_name='hangar.PingReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='result', full_name='hangar.PingReply.result', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1209,\n  serialized_end=1236,\n)\n\n\n_GETCLIENTCONFIGREQUEST = _descriptor.Descriptor(\n  name='GetClientConfigRequest',\n  full_name='hangar.GetClientConfigRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1238,\n  serialized_end=1262,\n)\n\n\n_GETCLIENTCONFIGREPLY_CONFIGENTRY = _descriptor.Descriptor(\n  name='ConfigEntry',\n  full_name='hangar.GetClientConfigReply.ConfigEntry',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='key', full_name='hangar.GetClientConfigReply.ConfigEntry.key', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='value', full_name='hangar.GetClientConfigReply.ConfigEntry.value', index=1,\n      number=2, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=b'8\\001',\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1382,\n  serialized_end=1427,\n)\n\n_GETCLIENTCONFIGREPLY = _descriptor.Descriptor(\n  name='GetClientConfigReply',\n  full_name='hangar.GetClientConfigReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='config', full_name='hangar.GetClientConfigReply.config', index=0,\n      number=1, type=11, cpp_type=10, label=3,\n      has_default_value=False, default_value=[],\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n   
 _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.GetClientConfigReply.error', index=1,\n      number=2, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[_GETCLIENTCONFIGREPLY_CONFIGENTRY, ],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1265,\n  serialized_end=1427,\n)\n\n\n_FETCHBRANCHRECORDREQUEST = _descriptor.Descriptor(\n  name='FetchBranchRecordRequest',\n  full_name='hangar.FetchBranchRecordRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='rec', full_name='hangar.FetchBranchRecordRequest.rec', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1429,\n  serialized_end=1490,\n)\n\n\n_FETCHBRANCHRECORDREPLY = _descriptor.Descriptor(\n  name='FetchBranchRecordReply',\n  full_name='hangar.FetchBranchRecordReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='rec', full_name='hangar.FetchBranchRecordReply.rec', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.FetchBranchRecordReply.error', index=1,\n      number=2, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1492,\n  serialized_end=1586,\n)\n\n\n_FETCHDATAREQUEST = _descriptor.Descriptor(\n  name='FetchDataRequest',\n  full_name='hangar.FetchDataRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='uri', full_name='hangar.FetchDataRequest.uri', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1588,\n  serialized_end=1619,\n)\n\n\n_FETCHDATAREPLY = _descriptor.Descriptor(\n  name='FetchDataReply',\n  
full_name='hangar.FetchDataReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='uri', full_name='hangar.FetchDataReply.uri', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='raw_data', full_name='hangar.FetchDataReply.raw_data', index=1,\n      number=2, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\",\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='nbytes', full_name='hangar.FetchDataReply.nbytes', index=2,\n      number=3, type=3, cpp_type=2, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.FetchDataReply.error', index=3,\n      number=4, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1621,\n  serialized_end=1719,\n)\n\n\n_FETCHCOMMITREQUEST = _descriptor.Descriptor(\n  name='FetchCommitRequest',\n  full_name='hangar.FetchCommitRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='commit', full_name='hangar.FetchCommitRequest.commit', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1721,\n  serialized_end=1757,\n)\n\n\n_FETCHCOMMITREPLY = _descriptor.Descriptor(\n  name='FetchCommitReply',\n  full_name='hangar.FetchCommitReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='commit', full_name='hangar.FetchCommitReply.commit', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='total_byte_size', full_name='hangar.FetchCommitReply.total_byte_size', index=1,\n      number=2, type=3, cpp_type=2, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, 
extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='record', full_name='hangar.FetchCommitReply.record', index=2,\n      number=3, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.FetchCommitReply.error', index=3,\n      number=4, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1760,\n  serialized_end=1892,\n)\n\n\n_FETCHSCHEMAREQUEST = _descriptor.Descriptor(\n  name='FetchSchemaRequest',\n  full_name='hangar.FetchSchemaRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='rec', full_name='hangar.FetchSchemaRequest.rec', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1894,\n  serialized_end=1949,\n)\n\n\n_FETCHSCHEMAREPLY = _descriptor.Descriptor(\n  name='FetchSchemaReply',\n  full_name='hangar.FetchSchemaReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='rec', full_name='hangar.FetchSchemaReply.rec', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.FetchSchemaReply.error', index=1,\n      number=2, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=1951,\n  serialized_end=2039,\n)\n\n\n_PUSHBRANCHRECORDREQUEST = _descriptor.Descriptor(\n  name='PushBranchRecordRequest',\n  full_name='hangar.PushBranchRecordRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='rec', full_name='hangar.PushBranchRecordRequest.rec', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, 
extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2041,\n  serialized_end=2101,\n)\n\n\n_PUSHBRANCHRECORDREPLY = _descriptor.Descriptor(\n  name='PushBranchRecordReply',\n  full_name='hangar.PushBranchRecordReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.PushBranchRecordReply.error', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2103,\n  serialized_end=2161,\n)\n\n\n_PUSHDATAREQUEST = _descriptor.Descriptor(\n  name='PushDataRequest',\n  full_name='hangar.PushDataRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='uri', full_name='hangar.PushDataRequest.uri', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='raw_data', full_name='hangar.PushDataRequest.raw_data', index=1,\n      number=2, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\",\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='nbytes', full_name='hangar.PushDataRequest.nbytes', index=2,\n      number=3, type=3, cpp_type=2, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='data_type', full_name='hangar.PushDataRequest.data_type', index=3,\n      number=4, type=14, cpp_type=8, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='schema_hash', full_name='hangar.PushDataRequest.schema_hash', index=4,\n      number=5, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2163,\n  serialized_end=2285,\n)\n\n\n_PUSHDATAREPLY = _descriptor.Descriptor(\n  name='PushDataReply',\n  full_name='hangar.PushDataReply',\n  
filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.PushDataReply.error', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2287,\n  serialized_end=2337,\n)\n\n\n_PUSHCOMMITREQUEST = _descriptor.Descriptor(\n  name='PushCommitRequest',\n  full_name='hangar.PushCommitRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='commit', full_name='hangar.PushCommitRequest.commit', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='total_byte_size', full_name='hangar.PushCommitRequest.total_byte_size', index=1,\n      number=2, type=3, cpp_type=2, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='record', full_name='hangar.PushCommitRequest.record', index=2,\n      number=3, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2339,\n  serialized_end=2437,\n)\n\n\n_PUSHCOMMITREPLY = _descriptor.Descriptor(\n  name='PushCommitReply',\n  full_name='hangar.PushCommitReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.PushCommitReply.error', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2439,\n  serialized_end=2491,\n)\n\n\n_PUSHSCHEMAREQUEST = _descriptor.Descriptor(\n  name='PushSchemaRequest',\n  full_name='hangar.PushSchemaRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='rec', full_name='hangar.PushSchemaRequest.rec', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, 
extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2493,\n  serialized_end=2547,\n)\n\n\n_PUSHSCHEMAREPLY = _descriptor.Descriptor(\n  name='PushSchemaReply',\n  full_name='hangar.PushSchemaReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.PushSchemaReply.error', index=0,\n      number=1, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2549,\n  serialized_end=2601,\n)\n\n\n_FINDMISSINGCOMMITSREQUEST = _descriptor.Descriptor(\n  name='FindMissingCommitsRequest',\n  full_name='hangar.FindMissingCommitsRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='commits', full_name='hangar.FindMissingCommitsRequest.commits', index=0,\n      number=1, type=9, cpp_type=9, label=3,\n      has_default_value=False, default_value=[],\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='branch', full_name='hangar.FindMissingCommitsRequest.branch', index=1,\n      number=2, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2603,\n  serialized_end=2685,\n)\n\n\n_FINDMISSINGCOMMITSREPLY = _descriptor.Descriptor(\n  name='FindMissingCommitsReply',\n  full_name='hangar.FindMissingCommitsReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='commits', full_name='hangar.FindMissingCommitsReply.commits', index=0,\n      number=1, type=9, cpp_type=9, label=3,\n      has_default_value=False, default_value=[],\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='branch', full_name='hangar.FindMissingCommitsReply.branch', index=1,\n      number=2, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.FindMissingCommitsReply.error', index=2,\n      number=3, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, 
containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2687,\n  serialized_end=2802,\n)\n\n\n_FINDMISSINGHASHRECORDSREQUEST = _descriptor.Descriptor(\n  name='FindMissingHashRecordsRequest',\n  full_name='hangar.FindMissingHashRecordsRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='commit', full_name='hangar.FindMissingHashRecordsRequest.commit', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='hashs', full_name='hangar.FindMissingHashRecordsRequest.hashs', index=1,\n      number=2, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\",\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='total_byte_size', full_name='hangar.FindMissingHashRecordsRequest.total_byte_size', index=2,\n      number=3, type=3, cpp_type=2, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2804,\n  serialized_end=2891,\n)\n\n\n_FINDMISSINGHASHRECORDSREPLY = _descriptor.Descriptor(\n  name='FindMissingHashRecordsReply',\n  full_name='hangar.FindMissingHashRecordsReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='commit', full_name='hangar.FindMissingHashRecordsReply.commit', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='hashs', full_name='hangar.FindMissingHashRecordsReply.hashs', index=1,\n      number=2, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\",\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='total_byte_size', full_name='hangar.FindMissingHashRecordsReply.total_byte_size', index=2,\n      number=3, type=3, cpp_type=2, label=1,\n      has_default_value=False, default_value=0,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.FindMissingHashRecordsReply.error', index=3,\n      number=4, type=11, 
cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=2893,\n  serialized_end=3013,\n)\n\n\n_FINDMISSINGSCHEMASREQUEST = _descriptor.Descriptor(\n  name='FindMissingSchemasRequest',\n  full_name='hangar.FindMissingSchemasRequest',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='commit', full_name='hangar.FindMissingSchemasRequest.commit', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='schema_digests', full_name='hangar.FindMissingSchemasRequest.schema_digests', index=1,\n      number=2, type=9, cpp_type=9, label=3,\n      has_default_value=False, default_value=[],\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=3015,\n  serialized_end=3082,\n)\n\n\n_FINDMISSINGSCHEMASREPLY = _descriptor.Descriptor(\n  name='FindMissingSchemasReply',\n  full_name='hangar.FindMissingSchemasReply',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='commit', full_name='hangar.FindMissingSchemasReply.commit', index=0,\n      number=1, type=9, cpp_type=9, label=1,\n      has_default_value=False, default_value=b\"\".decode('utf-8'),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='schema_digests', full_name='hangar.FindMissingSchemasReply.schema_digests', index=1,\n      number=2, type=9, cpp_type=9, label=3,\n      has_default_value=False, default_value=[],\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='error', full_name='hangar.FindMissingSchemasReply.error', index=2,\n      number=3, type=11, cpp_type=10, label=1,\n      has_default_value=False, default_value=None,\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=3084,\n  serialized_end=3184,\n)\n\n_PUSHBEGINCONTEXTREPLY.fields_by_name['err'].message_type = _ERRORPROTO\n_PUSHENDCONTEXTREPLY.fields_by_name['err'].message_type = _ERRORPROTO\n_DATAORIGINREPLY_COMPRESSIONOPTSENTRY.containing_type = 
_DATAORIGINREPLY\n_DATAORIGINREPLY.fields_by_name['location'].enum_type = _DATALOCATION\n_DATAORIGINREPLY.fields_by_name['data_type'].enum_type = _DATATYPE\n_DATAORIGINREPLY.fields_by_name['compression_opts'].message_type = _DATAORIGINREPLY_COMPRESSIONOPTSENTRY\n_PUSHFINDDATAORIGINREQUEST.fields_by_name['data_type'].enum_type = _DATATYPE\n_PUSHFINDDATAORIGINREPLY_COMPRESSIONOPTSEXPECTEDENTRY.containing_type = _PUSHFINDDATAORIGINREPLY\n_PUSHFINDDATAORIGINREPLY.fields_by_name['location'].enum_type = _DATALOCATION\n_PUSHFINDDATAORIGINREPLY.fields_by_name['compression_opts_expected'].message_type = _PUSHFINDDATAORIGINREPLY_COMPRESSIONOPTSEXPECTEDENTRY\n_GETCLIENTCONFIGREPLY_CONFIGENTRY.containing_type = _GETCLIENTCONFIGREPLY\n_GETCLIENTCONFIGREPLY.fields_by_name['config'].message_type = _GETCLIENTCONFIGREPLY_CONFIGENTRY\n_GETCLIENTCONFIGREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_FETCHBRANCHRECORDREQUEST.fields_by_name['rec'].message_type = _BRANCHRECORD\n_FETCHBRANCHRECORDREPLY.fields_by_name['rec'].message_type = _BRANCHRECORD\n_FETCHBRANCHRECORDREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_FETCHDATAREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_FETCHCOMMITREPLY.fields_by_name['record'].message_type = _COMMITRECORD\n_FETCHCOMMITREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_FETCHSCHEMAREQUEST.fields_by_name['rec'].message_type = _SCHEMARECORD\n_FETCHSCHEMAREPLY.fields_by_name['rec'].message_type = _SCHEMARECORD\n_FETCHSCHEMAREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_PUSHBRANCHRECORDREQUEST.fields_by_name['rec'].message_type = _BRANCHRECORD\n_PUSHBRANCHRECORDREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_PUSHDATAREQUEST.fields_by_name['data_type'].enum_type = _DATATYPE\n_PUSHDATAREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_PUSHCOMMITREQUEST.fields_by_name['record'].message_type = _COMMITRECORD\n_PUSHCOMMITREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_PUSHSCHEMAREQUEST.fields_by_name['rec'].message_type = _SCHEMARECORD\n_PUSHSCHEMAREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_FINDMISSINGCOMMITSREQUEST.fields_by_name['branch'].message_type = _BRANCHRECORD\n_FINDMISSINGCOMMITSREPLY.fields_by_name['branch'].message_type = _BRANCHRECORD\n_FINDMISSINGCOMMITSREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_FINDMISSINGHASHRECORDSREPLY.fields_by_name['error'].message_type = _ERRORPROTO\n_FINDMISSINGSCHEMASREPLY.fields_by_name['error'].message_type = _ERRORPROTO\nDESCRIPTOR.message_types_by_name['PushBeginContextRequest'] = _PUSHBEGINCONTEXTREQUEST\nDESCRIPTOR.message_types_by_name['PushBeginContextReply'] = _PUSHBEGINCONTEXTREPLY\nDESCRIPTOR.message_types_by_name['PushEndContextRequest'] = _PUSHENDCONTEXTREQUEST\nDESCRIPTOR.message_types_by_name['PushEndContextReply'] = _PUSHENDCONTEXTREPLY\nDESCRIPTOR.message_types_by_name['ErrorProto'] = _ERRORPROTO\nDESCRIPTOR.message_types_by_name['BranchRecord'] = _BRANCHRECORD\nDESCRIPTOR.message_types_by_name['HashRecord'] = _HASHRECORD\nDESCRIPTOR.message_types_by_name['CommitRecord'] = _COMMITRECORD\nDESCRIPTOR.message_types_by_name['SchemaRecord'] = _SCHEMARECORD\nDESCRIPTOR.message_types_by_name['DataOriginRequest'] = _DATAORIGINREQUEST\nDESCRIPTOR.message_types_by_name['DataOriginReply'] = _DATAORIGINREPLY\nDESCRIPTOR.message_types_by_name['PushFindDataOriginRequest'] = _PUSHFINDDATAORIGINREQUEST\nDESCRIPTOR.message_types_by_name['PushFindDataOriginReply'] = 
_PUSHFINDDATAORIGINREPLY\nDESCRIPTOR.message_types_by_name['PingRequest'] = _PINGREQUEST\nDESCRIPTOR.message_types_by_name['PingReply'] = _PINGREPLY\nDESCRIPTOR.message_types_by_name['GetClientConfigRequest'] = _GETCLIENTCONFIGREQUEST\nDESCRIPTOR.message_types_by_name['GetClientConfigReply'] = _GETCLIENTCONFIGREPLY\nDESCRIPTOR.message_types_by_name['FetchBranchRecordRequest'] = _FETCHBRANCHRECORDREQUEST\nDESCRIPTOR.message_types_by_name['FetchBranchRecordReply'] = _FETCHBRANCHRECORDREPLY\nDESCRIPTOR.message_types_by_name['FetchDataRequest'] = _FETCHDATAREQUEST\nDESCRIPTOR.message_types_by_name['FetchDataReply'] = _FETCHDATAREPLY\nDESCRIPTOR.message_types_by_name['FetchCommitRequest'] = _FETCHCOMMITREQUEST\nDESCRIPTOR.message_types_by_name['FetchCommitReply'] = _FETCHCOMMITREPLY\nDESCRIPTOR.message_types_by_name['FetchSchemaRequest'] = _FETCHSCHEMAREQUEST\nDESCRIPTOR.message_types_by_name['FetchSchemaReply'] = _FETCHSCHEMAREPLY\nDESCRIPTOR.message_types_by_name['PushBranchRecordRequest'] = _PUSHBRANCHRECORDREQUEST\nDESCRIPTOR.message_types_by_name['PushBranchRecordReply'] = _PUSHBRANCHRECORDREPLY\nDESCRIPTOR.message_types_by_name['PushDataRequest'] = _PUSHDATAREQUEST\nDESCRIPTOR.message_types_by_name['PushDataReply'] = _PUSHDATAREPLY\nDESCRIPTOR.message_types_by_name['PushCommitRequest'] = _PUSHCOMMITREQUEST\nDESCRIPTOR.message_types_by_name['PushCommitReply'] = _PUSHCOMMITREPLY\nDESCRIPTOR.message_types_by_name['PushSchemaRequest'] = _PUSHSCHEMAREQUEST\nDESCRIPTOR.message_types_by_name['PushSchemaReply'] = _PUSHSCHEMAREPLY\nDESCRIPTOR.message_types_by_name['FindMissingCommitsRequest'] = _FINDMISSINGCOMMITSREQUEST\nDESCRIPTOR.message_types_by_name['FindMissingCommitsReply'] = _FINDMISSINGCOMMITSREPLY\nDESCRIPTOR.message_types_by_name['FindMissingHashRecordsRequest'] = _FINDMISSINGHASHRECORDSREQUEST\nDESCRIPTOR.message_types_by_name['FindMissingHashRecordsReply'] = _FINDMISSINGHASHRECORDSREPLY\nDESCRIPTOR.message_types_by_name['FindMissingSchemasRequest'] = _FINDMISSINGSCHEMASREQUEST\nDESCRIPTOR.message_types_by_name['FindMissingSchemasReply'] = _FINDMISSINGSCHEMASREPLY\nDESCRIPTOR.enum_types_by_name['DataLocation'] = _DATALOCATION\nDESCRIPTOR.enum_types_by_name['DataType'] = _DATATYPE\n_sym_db.RegisterFileDescriptor(DESCRIPTOR)\n\nPushBeginContextRequest = _reflection.GeneratedProtocolMessageType('PushBeginContextRequest', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHBEGINCONTEXTREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushBeginContextRequest)\n  })\n_sym_db.RegisterMessage(PushBeginContextRequest)\n\nPushBeginContextReply = _reflection.GeneratedProtocolMessageType('PushBeginContextReply', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHBEGINCONTEXTREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushBeginContextReply)\n  })\n_sym_db.RegisterMessage(PushBeginContextReply)\n\nPushEndContextRequest = _reflection.GeneratedProtocolMessageType('PushEndContextRequest', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHENDCONTEXTREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushEndContextRequest)\n  })\n_sym_db.RegisterMessage(PushEndContextRequest)\n\nPushEndContextReply = _reflection.GeneratedProtocolMessageType('PushEndContextReply', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHENDCONTEXTREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushEndContextReply)\n  
})\n_sym_db.RegisterMessage(PushEndContextReply)\n\nErrorProto = _reflection.GeneratedProtocolMessageType('ErrorProto', (_message.Message,), {\n  'DESCRIPTOR' : _ERRORPROTO,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.ErrorProto)\n  })\n_sym_db.RegisterMessage(ErrorProto)\n\nBranchRecord = _reflection.GeneratedProtocolMessageType('BranchRecord', (_message.Message,), {\n  'DESCRIPTOR' : _BRANCHRECORD,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.BranchRecord)\n  })\n_sym_db.RegisterMessage(BranchRecord)\n\nHashRecord = _reflection.GeneratedProtocolMessageType('HashRecord', (_message.Message,), {\n  'DESCRIPTOR' : _HASHRECORD,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.HashRecord)\n  })\n_sym_db.RegisterMessage(HashRecord)\n\nCommitRecord = _reflection.GeneratedProtocolMessageType('CommitRecord', (_message.Message,), {\n  'DESCRIPTOR' : _COMMITRECORD,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.CommitRecord)\n  })\n_sym_db.RegisterMessage(CommitRecord)\n\nSchemaRecord = _reflection.GeneratedProtocolMessageType('SchemaRecord', (_message.Message,), {\n  'DESCRIPTOR' : _SCHEMARECORD,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.SchemaRecord)\n  })\n_sym_db.RegisterMessage(SchemaRecord)\n\nDataOriginRequest = _reflection.GeneratedProtocolMessageType('DataOriginRequest', (_message.Message,), {\n  'DESCRIPTOR' : _DATAORIGINREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.DataOriginRequest)\n  })\n_sym_db.RegisterMessage(DataOriginRequest)\n\nDataOriginReply = _reflection.GeneratedProtocolMessageType('DataOriginReply', (_message.Message,), {\n\n  'CompressionOptsEntry' : _reflection.GeneratedProtocolMessageType('CompressionOptsEntry', (_message.Message,), {\n    'DESCRIPTOR' : _DATAORIGINREPLY_COMPRESSIONOPTSENTRY,\n    '__module__' : 'hangar_service_pb2'\n    # @@protoc_insertion_point(class_scope:hangar.DataOriginReply.CompressionOptsEntry)\n    })\n  ,\n  'DESCRIPTOR' : _DATAORIGINREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.DataOriginReply)\n  })\n_sym_db.RegisterMessage(DataOriginReply)\n_sym_db.RegisterMessage(DataOriginReply.CompressionOptsEntry)\n\nPushFindDataOriginRequest = _reflection.GeneratedProtocolMessageType('PushFindDataOriginRequest', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHFINDDATAORIGINREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushFindDataOriginRequest)\n  })\n_sym_db.RegisterMessage(PushFindDataOriginRequest)\n\nPushFindDataOriginReply = _reflection.GeneratedProtocolMessageType('PushFindDataOriginReply', (_message.Message,), {\n\n  'CompressionOptsExpectedEntry' : _reflection.GeneratedProtocolMessageType('CompressionOptsExpectedEntry', (_message.Message,), {\n    'DESCRIPTOR' : _PUSHFINDDATAORIGINREPLY_COMPRESSIONOPTSEXPECTEDENTRY,\n    '__module__' : 'hangar_service_pb2'\n    # @@protoc_insertion_point(class_scope:hangar.PushFindDataOriginReply.CompressionOptsExpectedEntry)\n    })\n  ,\n  'DESCRIPTOR' : _PUSHFINDDATAORIGINREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushFindDataOriginReply)\n  })\n_sym_db.RegisterMessage(PushFindDataOriginReply)\n_sym_db.RegisterMessage(PushFindDataOriginReply.CompressionOptsExpectedEntry)\n\nPingRequest = 
_reflection.GeneratedProtocolMessageType('PingRequest', (_message.Message,), {\n  'DESCRIPTOR' : _PINGREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PingRequest)\n  })\n_sym_db.RegisterMessage(PingRequest)\n\nPingReply = _reflection.GeneratedProtocolMessageType('PingReply', (_message.Message,), {\n  'DESCRIPTOR' : _PINGREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PingReply)\n  })\n_sym_db.RegisterMessage(PingReply)\n\nGetClientConfigRequest = _reflection.GeneratedProtocolMessageType('GetClientConfigRequest', (_message.Message,), {\n  'DESCRIPTOR' : _GETCLIENTCONFIGREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.GetClientConfigRequest)\n  })\n_sym_db.RegisterMessage(GetClientConfigRequest)\n\nGetClientConfigReply = _reflection.GeneratedProtocolMessageType('GetClientConfigReply', (_message.Message,), {\n\n  'ConfigEntry' : _reflection.GeneratedProtocolMessageType('ConfigEntry', (_message.Message,), {\n    'DESCRIPTOR' : _GETCLIENTCONFIGREPLY_CONFIGENTRY,\n    '__module__' : 'hangar_service_pb2'\n    # @@protoc_insertion_point(class_scope:hangar.GetClientConfigReply.ConfigEntry)\n    })\n  ,\n  'DESCRIPTOR' : _GETCLIENTCONFIGREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.GetClientConfigReply)\n  })\n_sym_db.RegisterMessage(GetClientConfigReply)\n_sym_db.RegisterMessage(GetClientConfigReply.ConfigEntry)\n\nFetchBranchRecordRequest = _reflection.GeneratedProtocolMessageType('FetchBranchRecordRequest', (_message.Message,), {\n  'DESCRIPTOR' : _FETCHBRANCHRECORDREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FetchBranchRecordRequest)\n  })\n_sym_db.RegisterMessage(FetchBranchRecordRequest)\n\nFetchBranchRecordReply = _reflection.GeneratedProtocolMessageType('FetchBranchRecordReply', (_message.Message,), {\n  'DESCRIPTOR' : _FETCHBRANCHRECORDREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FetchBranchRecordReply)\n  })\n_sym_db.RegisterMessage(FetchBranchRecordReply)\n\nFetchDataRequest = _reflection.GeneratedProtocolMessageType('FetchDataRequest', (_message.Message,), {\n  'DESCRIPTOR' : _FETCHDATAREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FetchDataRequest)\n  })\n_sym_db.RegisterMessage(FetchDataRequest)\n\nFetchDataReply = _reflection.GeneratedProtocolMessageType('FetchDataReply', (_message.Message,), {\n  'DESCRIPTOR' : _FETCHDATAREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FetchDataReply)\n  })\n_sym_db.RegisterMessage(FetchDataReply)\n\nFetchCommitRequest = _reflection.GeneratedProtocolMessageType('FetchCommitRequest', (_message.Message,), {\n  'DESCRIPTOR' : _FETCHCOMMITREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FetchCommitRequest)\n  })\n_sym_db.RegisterMessage(FetchCommitRequest)\n\nFetchCommitReply = _reflection.GeneratedProtocolMessageType('FetchCommitReply', (_message.Message,), {\n  'DESCRIPTOR' : _FETCHCOMMITREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FetchCommitReply)\n  })\n_sym_db.RegisterMessage(FetchCommitReply)\n\nFetchSchemaRequest = _reflection.GeneratedProtocolMessageType('FetchSchemaRequest', (_message.Message,), {\n  'DESCRIPTOR' : _FETCHSCHEMAREQUEST,\n  '__module__' : 
'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FetchSchemaRequest)\n  })\n_sym_db.RegisterMessage(FetchSchemaRequest)\n\nFetchSchemaReply = _reflection.GeneratedProtocolMessageType('FetchSchemaReply', (_message.Message,), {\n  'DESCRIPTOR' : _FETCHSCHEMAREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FetchSchemaReply)\n  })\n_sym_db.RegisterMessage(FetchSchemaReply)\n\nPushBranchRecordRequest = _reflection.GeneratedProtocolMessageType('PushBranchRecordRequest', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHBRANCHRECORDREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushBranchRecordRequest)\n  })\n_sym_db.RegisterMessage(PushBranchRecordRequest)\n\nPushBranchRecordReply = _reflection.GeneratedProtocolMessageType('PushBranchRecordReply', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHBRANCHRECORDREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushBranchRecordReply)\n  })\n_sym_db.RegisterMessage(PushBranchRecordReply)\n\nPushDataRequest = _reflection.GeneratedProtocolMessageType('PushDataRequest', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHDATAREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushDataRequest)\n  })\n_sym_db.RegisterMessage(PushDataRequest)\n\nPushDataReply = _reflection.GeneratedProtocolMessageType('PushDataReply', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHDATAREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushDataReply)\n  })\n_sym_db.RegisterMessage(PushDataReply)\n\nPushCommitRequest = _reflection.GeneratedProtocolMessageType('PushCommitRequest', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHCOMMITREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushCommitRequest)\n  })\n_sym_db.RegisterMessage(PushCommitRequest)\n\nPushCommitReply = _reflection.GeneratedProtocolMessageType('PushCommitReply', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHCOMMITREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushCommitReply)\n  })\n_sym_db.RegisterMessage(PushCommitReply)\n\nPushSchemaRequest = _reflection.GeneratedProtocolMessageType('PushSchemaRequest', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHSCHEMAREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushSchemaRequest)\n  })\n_sym_db.RegisterMessage(PushSchemaRequest)\n\nPushSchemaReply = _reflection.GeneratedProtocolMessageType('PushSchemaReply', (_message.Message,), {\n  'DESCRIPTOR' : _PUSHSCHEMAREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.PushSchemaReply)\n  })\n_sym_db.RegisterMessage(PushSchemaReply)\n\nFindMissingCommitsRequest = _reflection.GeneratedProtocolMessageType('FindMissingCommitsRequest', (_message.Message,), {\n  'DESCRIPTOR' : _FINDMISSINGCOMMITSREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FindMissingCommitsRequest)\n  })\n_sym_db.RegisterMessage(FindMissingCommitsRequest)\n\nFindMissingCommitsReply = _reflection.GeneratedProtocolMessageType('FindMissingCommitsReply', (_message.Message,), {\n  'DESCRIPTOR' : _FINDMISSINGCOMMITSREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FindMissingCommitsReply)\n  
})\n_sym_db.RegisterMessage(FindMissingCommitsReply)\n\nFindMissingHashRecordsRequest = _reflection.GeneratedProtocolMessageType('FindMissingHashRecordsRequest', (_message.Message,), {\n  'DESCRIPTOR' : _FINDMISSINGHASHRECORDSREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FindMissingHashRecordsRequest)\n  })\n_sym_db.RegisterMessage(FindMissingHashRecordsRequest)\n\nFindMissingHashRecordsReply = _reflection.GeneratedProtocolMessageType('FindMissingHashRecordsReply', (_message.Message,), {\n  'DESCRIPTOR' : _FINDMISSINGHASHRECORDSREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FindMissingHashRecordsReply)\n  })\n_sym_db.RegisterMessage(FindMissingHashRecordsReply)\n\nFindMissingSchemasRequest = _reflection.GeneratedProtocolMessageType('FindMissingSchemasRequest', (_message.Message,), {\n  'DESCRIPTOR' : _FINDMISSINGSCHEMASREQUEST,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FindMissingSchemasRequest)\n  })\n_sym_db.RegisterMessage(FindMissingSchemasRequest)\n\nFindMissingSchemasReply = _reflection.GeneratedProtocolMessageType('FindMissingSchemasReply', (_message.Message,), {\n  'DESCRIPTOR' : _FINDMISSINGSCHEMASREPLY,\n  '__module__' : 'hangar_service_pb2'\n  # @@protoc_insertion_point(class_scope:hangar.FindMissingSchemasReply)\n  })\n_sym_db.RegisterMessage(FindMissingSchemasReply)\n\n\nDESCRIPTOR._options = None\n_DATAORIGINREPLY_COMPRESSIONOPTSENTRY._options = None\n_PUSHFINDDATAORIGINREPLY_COMPRESSIONOPTSEXPECTEDENTRY._options = None\n_GETCLIENTCONFIGREPLY_CONFIGENTRY._options = None\n\n_HANGARSERVICE = _descriptor.ServiceDescriptor(\n  name='HangarService',\n  full_name='hangar.HangarService',\n  file=DESCRIPTOR,\n  index=0,\n  serialized_options=None,\n  serialized_start=3317,\n  serialized_end=5007,\n  methods=[\n  _descriptor.MethodDescriptor(\n    name='PING',\n    full_name='hangar.HangarService.PING',\n    index=0,\n    containing_service=None,\n    input_type=_PINGREQUEST,\n    output_type=_PINGREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='GetClientConfig',\n    full_name='hangar.HangarService.GetClientConfig',\n    index=1,\n    containing_service=None,\n    input_type=_GETCLIENTCONFIGREQUEST,\n    output_type=_GETCLIENTCONFIGREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='FetchBranchRecord',\n    full_name='hangar.HangarService.FetchBranchRecord',\n    index=2,\n    containing_service=None,\n    input_type=_FETCHBRANCHRECORDREQUEST,\n    output_type=_FETCHBRANCHRECORDREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='FetchData',\n    full_name='hangar.HangarService.FetchData',\n    index=3,\n    containing_service=None,\n    input_type=_FETCHDATAREQUEST,\n    output_type=_FETCHDATAREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='FetchCommit',\n    full_name='hangar.HangarService.FetchCommit',\n    index=4,\n    containing_service=None,\n    input_type=_FETCHCOMMITREQUEST,\n    output_type=_FETCHCOMMITREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='FetchSchema',\n    full_name='hangar.HangarService.FetchSchema',\n    index=5,\n    containing_service=None,\n    input_type=_FETCHSCHEMAREQUEST,\n    output_type=_FETCHSCHEMAREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='PushBranchRecord',\n    
full_name='hangar.HangarService.PushBranchRecord',\n    index=6,\n    containing_service=None,\n    input_type=_PUSHBRANCHRECORDREQUEST,\n    output_type=_PUSHBRANCHRECORDREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='PushData',\n    full_name='hangar.HangarService.PushData',\n    index=7,\n    containing_service=None,\n    input_type=_PUSHDATAREQUEST,\n    output_type=_PUSHDATAREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='PushCommit',\n    full_name='hangar.HangarService.PushCommit',\n    index=8,\n    containing_service=None,\n    input_type=_PUSHCOMMITREQUEST,\n    output_type=_PUSHCOMMITREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='PushSchema',\n    full_name='hangar.HangarService.PushSchema',\n    index=9,\n    containing_service=None,\n    input_type=_PUSHSCHEMAREQUEST,\n    output_type=_PUSHSCHEMAREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='FetchFindMissingCommits',\n    full_name='hangar.HangarService.FetchFindMissingCommits',\n    index=10,\n    containing_service=None,\n    input_type=_FINDMISSINGCOMMITSREQUEST,\n    output_type=_FINDMISSINGCOMMITSREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='FetchFindMissingHashRecords',\n    full_name='hangar.HangarService.FetchFindMissingHashRecords',\n    index=11,\n    containing_service=None,\n    input_type=_FINDMISSINGHASHRECORDSREQUEST,\n    output_type=_FINDMISSINGHASHRECORDSREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='FetchFindMissingSchemas',\n    full_name='hangar.HangarService.FetchFindMissingSchemas',\n    index=12,\n    containing_service=None,\n    input_type=_FINDMISSINGSCHEMASREQUEST,\n    output_type=_FINDMISSINGSCHEMASREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='PushFindMissingCommits',\n    full_name='hangar.HangarService.PushFindMissingCommits',\n    index=13,\n    containing_service=None,\n    input_type=_FINDMISSINGCOMMITSREQUEST,\n    output_type=_FINDMISSINGCOMMITSREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='PushFindMissingHashRecords',\n    full_name='hangar.HangarService.PushFindMissingHashRecords',\n    index=14,\n    containing_service=None,\n    input_type=_FINDMISSINGHASHRECORDSREQUEST,\n    output_type=_FINDMISSINGHASHRECORDSREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='PushFindMissingSchemas',\n    full_name='hangar.HangarService.PushFindMissingSchemas',\n    index=15,\n    containing_service=None,\n    input_type=_FINDMISSINGSCHEMASREQUEST,\n    output_type=_FINDMISSINGSCHEMASREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='FetchFindDataOrigin',\n    full_name='hangar.HangarService.FetchFindDataOrigin',\n    index=16,\n    containing_service=None,\n    input_type=_DATAORIGINREQUEST,\n    output_type=_DATAORIGINREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='PushFindDataOrigin',\n    full_name='hangar.HangarService.PushFindDataOrigin',\n    index=17,\n    containing_service=None,\n    input_type=_PUSHFINDDATAORIGINREQUEST,\n    output_type=_PUSHFINDDATAORIGINREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='PushBeginContext',\n    full_name='hangar.HangarService.PushBeginContext',\n    index=18,\n    containing_service=None,\n    
input_type=_PUSHBEGINCONTEXTREQUEST,\n    output_type=_PUSHBEGINCONTEXTREPLY,\n    serialized_options=None,\n  ),\n  _descriptor.MethodDescriptor(\n    name='PushEndContext',\n    full_name='hangar.HangarService.PushEndContext',\n    index=19,\n    containing_service=None,\n    input_type=_PUSHENDCONTEXTREQUEST,\n    output_type=_PUSHENDCONTEXTREPLY,\n    serialized_options=None,\n  ),\n])\n_sym_db.RegisterServiceDescriptor(_HANGARSERVICE)\n\nDESCRIPTOR.services_by_name['HangarService'] = _HANGARSERVICE\n\n# @@protoc_insertion_point(module_scope)\n"
  },
  {
    "path": "src/hangar/remote/hangar_service_pb2.pyi",
    "content": "# @generated by generate_proto_mypy_stubs.py.  Do not edit!\nimport sys\nfrom google.protobuf.descriptor import (\n    Descriptor as google___protobuf___descriptor___Descriptor,\n    EnumDescriptor as google___protobuf___descriptor___EnumDescriptor,\n    FileDescriptor as google___protobuf___descriptor___FileDescriptor,\n)\n\nfrom google.protobuf.internal.containers import (\n    RepeatedScalarFieldContainer as google___protobuf___internal___containers___RepeatedScalarFieldContainer,\n)\n\nfrom google.protobuf.internal.enum_type_wrapper import (\n    _EnumTypeWrapper as google___protobuf___internal___enum_type_wrapper____EnumTypeWrapper,\n)\n\nfrom google.protobuf.message import (\n    Message as google___protobuf___message___Message,\n)\n\nfrom typing import (\n    Iterable as typing___Iterable,\n    Mapping as typing___Mapping,\n    MutableMapping as typing___MutableMapping,\n    NewType as typing___NewType,\n    Optional as typing___Optional,\n    Text as typing___Text,\n    cast as typing___cast,\n)\n\nfrom typing_extensions import (\n    Literal as typing_extensions___Literal,\n)\n\n\nbuiltin___bool = bool\nbuiltin___bytes = bytes\nbuiltin___float = float\nbuiltin___int = int\n\n\nDESCRIPTOR: google___protobuf___descriptor___FileDescriptor = ...\n\nDataLocationValue = typing___NewType('DataLocationValue', builtin___int)\ntype___DataLocationValue = DataLocationValue\nDataLocation: _DataLocation\nclass _DataLocation(google___protobuf___internal___enum_type_wrapper____EnumTypeWrapper[DataLocationValue]):\n    DESCRIPTOR: google___protobuf___descriptor___EnumDescriptor = ...\n    REMOTE_SERVER = typing___cast(DataLocationValue, 0)\n    MINIO = typing___cast(DataLocationValue, 1)\n    S3 = typing___cast(DataLocationValue, 2)\n    GCS = typing___cast(DataLocationValue, 3)\n    ABS = typing___cast(DataLocationValue, 4)\nREMOTE_SERVER = typing___cast(DataLocationValue, 0)\nMINIO = typing___cast(DataLocationValue, 1)\nS3 = typing___cast(DataLocationValue, 2)\nGCS = typing___cast(DataLocationValue, 3)\nABS = typing___cast(DataLocationValue, 4)\ntype___DataLocation = DataLocation\n\nDataTypeValue = typing___NewType('DataTypeValue', builtin___int)\ntype___DataTypeValue = DataTypeValue\nDataType: _DataType\nclass _DataType(google___protobuf___internal___enum_type_wrapper____EnumTypeWrapper[DataTypeValue]):\n    DESCRIPTOR: google___protobuf___descriptor___EnumDescriptor = ...\n    NP_ARRAY = typing___cast(DataTypeValue, 0)\n    SCHEMA = typing___cast(DataTypeValue, 1)\n    STR = typing___cast(DataTypeValue, 2)\n    BYTES = typing___cast(DataTypeValue, 3)\nNP_ARRAY = typing___cast(DataTypeValue, 0)\nSCHEMA = typing___cast(DataTypeValue, 1)\nSTR = typing___cast(DataTypeValue, 2)\nBYTES = typing___cast(DataTypeValue, 3)\ntype___DataType = DataType\n\nclass PushBeginContextRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    client_uuid: typing___Text = ...\n\n    def __init__(self,\n        *,\n        client_uuid : typing___Optional[typing___Text] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"client_uuid\",b\"client_uuid\"]) -> None: ...\ntype___PushBeginContextRequest = PushBeginContextRequest\n\nclass PushBeginContextReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def err(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        err : 
typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"err\",b\"err\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"err\",b\"err\"]) -> None: ...\ntype___PushBeginContextReply = PushBeginContextReply\n\nclass PushEndContextRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    client_uuid: typing___Text = ...\n\n    def __init__(self,\n        *,\n        client_uuid : typing___Optional[typing___Text] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"client_uuid\",b\"client_uuid\"]) -> None: ...\ntype___PushEndContextRequest = PushEndContextRequest\n\nclass PushEndContextReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def err(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        err : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"err\",b\"err\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"err\",b\"err\"]) -> None: ...\ntype___PushEndContextReply = PushEndContextReply\n\nclass ErrorProto(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    code: builtin___int = ...\n    message: typing___Text = ...\n\n    def __init__(self,\n        *,\n        code : typing___Optional[builtin___int] = None,\n        message : typing___Optional[typing___Text] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"code\",b\"code\",u\"message\",b\"message\"]) -> None: ...\ntype___ErrorProto = ErrorProto\n\nclass BranchRecord(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    name: typing___Text = ...\n    commit: typing___Text = ...\n\n    def __init__(self,\n        *,\n        name : typing___Optional[typing___Text] = None,\n        commit : typing___Optional[typing___Text] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"commit\",b\"commit\",u\"name\",b\"name\"]) -> None: ...\ntype___BranchRecord = BranchRecord\n\nclass HashRecord(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    type: typing___Text = ...\n    digest: typing___Text = ...\n\n    def __init__(self,\n        *,\n        type : typing___Optional[typing___Text] = None,\n        digest : typing___Optional[typing___Text] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"digest\",b\"digest\",u\"type\",b\"type\"]) -> None: ...\ntype___HashRecord = HashRecord\n\nclass CommitRecord(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    parent: builtin___bytes = ...\n    ref: builtin___bytes = ...\n    spec: builtin___bytes = ...\n\n    def __init__(self,\n        *,\n        parent : typing___Optional[builtin___bytes] = None,\n        ref : typing___Optional[builtin___bytes] = None,\n        spec : typing___Optional[builtin___bytes] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: 
typing_extensions___Literal[u\"parent\",b\"parent\",u\"ref\",b\"ref\",u\"spec\",b\"spec\"]) -> None: ...\ntype___CommitRecord = CommitRecord\n\nclass SchemaRecord(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    digest: typing___Text = ...\n    blob: builtin___bytes = ...\n\n    def __init__(self,\n        *,\n        digest : typing___Optional[typing___Text] = None,\n        blob : typing___Optional[builtin___bytes] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"blob\",b\"blob\",u\"digest\",b\"digest\"]) -> None: ...\ntype___SchemaRecord = SchemaRecord\n\nclass DataOriginRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    digest: typing___Text = ...\n\n    def __init__(self,\n        *,\n        digest : typing___Optional[typing___Text] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"digest\",b\"digest\"]) -> None: ...\ntype___DataOriginRequest = DataOriginRequest\n\nclass DataOriginReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    class CompressionOptsEntry(google___protobuf___message___Message):\n        DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n        key: typing___Text = ...\n        value: typing___Text = ...\n\n        def __init__(self,\n            *,\n            key : typing___Optional[typing___Text] = None,\n            value : typing___Optional[typing___Text] = None,\n            ) -> None: ...\n        def ClearField(self, field_name: typing_extensions___Literal[u\"key\",b\"key\",u\"value\",b\"value\"]) -> None: ...\n    type___CompressionOptsEntry = CompressionOptsEntry\n\n    location: type___DataLocationValue = ...\n    data_type: type___DataTypeValue = ...\n    digest: typing___Text = ...\n    uri: typing___Text = ...\n    compression: builtin___bool = ...\n\n    @property\n    def compression_opts(self) -> typing___MutableMapping[typing___Text, typing___Text]: ...\n\n    def __init__(self,\n        *,\n        location : typing___Optional[type___DataLocationValue] = None,\n        data_type : typing___Optional[type___DataTypeValue] = None,\n        digest : typing___Optional[typing___Text] = None,\n        uri : typing___Optional[typing___Text] = None,\n        compression : typing___Optional[builtin___bool] = None,\n        compression_opts : typing___Optional[typing___Mapping[typing___Text, typing___Text]] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"compression\",b\"compression\",u\"compression_opts\",b\"compression_opts\",u\"data_type\",b\"data_type\",u\"digest\",b\"digest\",u\"location\",b\"location\",u\"uri\",b\"uri\"]) -> None: ...\ntype___DataOriginReply = DataOriginReply\n\nclass PushFindDataOriginRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    data_type: type___DataTypeValue = ...\n    digest: typing___Text = ...\n    compression_is_desired: builtin___bool = ...\n\n    def __init__(self,\n        *,\n        data_type : typing___Optional[type___DataTypeValue] = None,\n        digest : typing___Optional[typing___Text] = None,\n        compression_is_desired : typing___Optional[builtin___bool] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: 
typing_extensions___Literal[u\"compression_is_desired\",b\"compression_is_desired\",u\"data_type\",b\"data_type\",u\"digest\",b\"digest\"]) -> None: ...\ntype___PushFindDataOriginRequest = PushFindDataOriginRequest\n\nclass PushFindDataOriginReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    class CompressionOptsExpectedEntry(google___protobuf___message___Message):\n        DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n        key: typing___Text = ...\n        value: typing___Text = ...\n\n        def __init__(self,\n            *,\n            key : typing___Optional[typing___Text] = None,\n            value : typing___Optional[typing___Text] = None,\n            ) -> None: ...\n        def ClearField(self, field_name: typing_extensions___Literal[u\"key\",b\"key\",u\"value\",b\"value\"]) -> None: ...\n    type___CompressionOptsExpectedEntry = CompressionOptsExpectedEntry\n\n    digest: typing___Text = ...\n    location: type___DataLocationValue = ...\n    uri: typing___Text = ...\n    compression_expected: builtin___bool = ...\n\n    @property\n    def compression_opts_expected(self) -> typing___MutableMapping[typing___Text, typing___Text]: ...\n\n    def __init__(self,\n        *,\n        digest : typing___Optional[typing___Text] = None,\n        location : typing___Optional[type___DataLocationValue] = None,\n        uri : typing___Optional[typing___Text] = None,\n        compression_expected : typing___Optional[builtin___bool] = None,\n        compression_opts_expected : typing___Optional[typing___Mapping[typing___Text, typing___Text]] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"compression_expected\",b\"compression_expected\",u\"compression_opts_expected\",b\"compression_opts_expected\",u\"digest\",b\"digest\",u\"location\",b\"location\",u\"uri\",b\"uri\"]) -> None: ...\ntype___PushFindDataOriginReply = PushFindDataOriginReply\n\nclass PingRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    def __init__(self,\n        ) -> None: ...\ntype___PingRequest = PingRequest\n\nclass PingReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    result: typing___Text = ...\n\n    def __init__(self,\n        *,\n        result : typing___Optional[typing___Text] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"result\",b\"result\"]) -> None: ...\ntype___PingReply = PingReply\n\nclass GetClientConfigRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    def __init__(self,\n        ) -> None: ...\ntype___GetClientConfigRequest = GetClientConfigRequest\n\nclass GetClientConfigReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    class ConfigEntry(google___protobuf___message___Message):\n        DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n        key: typing___Text = ...\n        value: typing___Text = ...\n\n        def __init__(self,\n            *,\n            key : typing___Optional[typing___Text] = None,\n            value : typing___Optional[typing___Text] = None,\n            ) -> None: ...\n        def ClearField(self, field_name: typing_extensions___Literal[u\"key\",b\"key\",u\"value\",b\"value\"]) -> None: ...\n    
type___ConfigEntry = ConfigEntry\n\n\n    @property\n    def config(self) -> typing___MutableMapping[typing___Text, typing___Text]: ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        config : typing___Optional[typing___Mapping[typing___Text, typing___Text]] = None,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"config\",b\"config\",u\"error\",b\"error\"]) -> None: ...\ntype___GetClientConfigReply = GetClientConfigReply\n\nclass FetchBranchRecordRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def rec(self) -> type___BranchRecord: ...\n\n    def __init__(self,\n        *,\n        rec : typing___Optional[type___BranchRecord] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"rec\",b\"rec\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"rec\",b\"rec\"]) -> None: ...\ntype___FetchBranchRecordRequest = FetchBranchRecordRequest\n\nclass FetchBranchRecordReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def rec(self) -> type___BranchRecord: ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        rec : typing___Optional[type___BranchRecord] = None,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\",u\"rec\",b\"rec\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\",u\"rec\",b\"rec\"]) -> None: ...\ntype___FetchBranchRecordReply = FetchBranchRecordReply\n\nclass FetchDataRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    uri: typing___Text = ...\n\n    def __init__(self,\n        *,\n        uri : typing___Optional[typing___Text] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"uri\",b\"uri\"]) -> None: ...\ntype___FetchDataRequest = FetchDataRequest\n\nclass FetchDataReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    uri: typing___Text = ...\n    raw_data: builtin___bytes = ...\n    nbytes: builtin___int = ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        uri : typing___Optional[typing___Text] = None,\n        raw_data : typing___Optional[builtin___bytes] = None,\n        nbytes : typing___Optional[builtin___int] = None,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\",u\"nbytes\",b\"nbytes\",u\"raw_data\",b\"raw_data\",u\"uri\",b\"uri\"]) -> None: ...\ntype___FetchDataReply = FetchDataReply\n\nclass FetchCommitRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    commit: 
typing___Text = ...\n\n    def __init__(self,\n        *,\n        commit : typing___Optional[typing___Text] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"commit\",b\"commit\"]) -> None: ...\ntype___FetchCommitRequest = FetchCommitRequest\n\nclass FetchCommitReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    commit: typing___Text = ...\n    total_byte_size: builtin___int = ...\n\n    @property\n    def record(self) -> type___CommitRecord: ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        commit : typing___Optional[typing___Text] = None,\n        total_byte_size : typing___Optional[builtin___int] = None,\n        record : typing___Optional[type___CommitRecord] = None,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\",u\"record\",b\"record\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"commit\",b\"commit\",u\"error\",b\"error\",u\"record\",b\"record\",u\"total_byte_size\",b\"total_byte_size\"]) -> None: ...\ntype___FetchCommitReply = FetchCommitReply\n\nclass FetchSchemaRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def rec(self) -> type___SchemaRecord: ...\n\n    def __init__(self,\n        *,\n        rec : typing___Optional[type___SchemaRecord] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"rec\",b\"rec\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"rec\",b\"rec\"]) -> None: ...\ntype___FetchSchemaRequest = FetchSchemaRequest\n\nclass FetchSchemaReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def rec(self) -> type___SchemaRecord: ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        rec : typing___Optional[type___SchemaRecord] = None,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\",u\"rec\",b\"rec\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\",u\"rec\",b\"rec\"]) -> None: ...\ntype___FetchSchemaReply = FetchSchemaReply\n\nclass PushBranchRecordRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def rec(self) -> type___BranchRecord: ...\n\n    def __init__(self,\n        *,\n        rec : typing___Optional[type___BranchRecord] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"rec\",b\"rec\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"rec\",b\"rec\"]) -> None: ...\ntype___PushBranchRecordRequest = PushBranchRecordRequest\n\nclass PushBranchRecordReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: 
...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> None: ...\ntype___PushBranchRecordReply = PushBranchRecordReply\n\nclass PushDataRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    uri: typing___Text = ...\n    raw_data: builtin___bytes = ...\n    nbytes: builtin___int = ...\n    data_type: type___DataTypeValue = ...\n    schema_hash: typing___Text = ...\n\n    def __init__(self,\n        *,\n        uri : typing___Optional[typing___Text] = None,\n        raw_data : typing___Optional[builtin___bytes] = None,\n        nbytes : typing___Optional[builtin___int] = None,\n        data_type : typing___Optional[type___DataTypeValue] = None,\n        schema_hash : typing___Optional[typing___Text] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"data_type\",b\"data_type\",u\"nbytes\",b\"nbytes\",u\"raw_data\",b\"raw_data\",u\"schema_hash\",b\"schema_hash\",u\"uri\",b\"uri\"]) -> None: ...\ntype___PushDataRequest = PushDataRequest\n\nclass PushDataReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> None: ...\ntype___PushDataReply = PushDataReply\n\nclass PushCommitRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    commit: typing___Text = ...\n    total_byte_size: builtin___int = ...\n\n    @property\n    def record(self) -> type___CommitRecord: ...\n\n    def __init__(self,\n        *,\n        commit : typing___Optional[typing___Text] = None,\n        total_byte_size : typing___Optional[builtin___int] = None,\n        record : typing___Optional[type___CommitRecord] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"record\",b\"record\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"commit\",b\"commit\",u\"record\",b\"record\",u\"total_byte_size\",b\"total_byte_size\"]) -> None: ...\ntype___PushCommitRequest = PushCommitRequest\n\nclass PushCommitReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> None: ...\ntype___PushCommitReply = PushCommitReply\n\nclass PushSchemaRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def rec(self) -> type___SchemaRecord: ...\n\n    def __init__(self,\n        *,\n        rec : typing___Optional[type___SchemaRecord] = None,\n        ) -> None: ...\n    def 
HasField(self, field_name: typing_extensions___Literal[u\"rec\",b\"rec\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"rec\",b\"rec\"]) -> None: ...\ntype___PushSchemaRequest = PushSchemaRequest\n\nclass PushSchemaReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> None: ...\ntype___PushSchemaReply = PushSchemaReply\n\nclass FindMissingCommitsRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    commits: google___protobuf___internal___containers___RepeatedScalarFieldContainer[typing___Text] = ...\n\n    @property\n    def branch(self) -> type___BranchRecord: ...\n\n    def __init__(self,\n        *,\n        commits : typing___Optional[typing___Iterable[typing___Text]] = None,\n        branch : typing___Optional[type___BranchRecord] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"branch\",b\"branch\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"branch\",b\"branch\",u\"commits\",b\"commits\"]) -> None: ...\ntype___FindMissingCommitsRequest = FindMissingCommitsRequest\n\nclass FindMissingCommitsReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    commits: google___protobuf___internal___containers___RepeatedScalarFieldContainer[typing___Text] = ...\n\n    @property\n    def branch(self) -> type___BranchRecord: ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        commits : typing___Optional[typing___Iterable[typing___Text]] = None,\n        branch : typing___Optional[type___BranchRecord] = None,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"branch\",b\"branch\",u\"error\",b\"error\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"branch\",b\"branch\",u\"commits\",b\"commits\",u\"error\",b\"error\"]) -> None: ...\ntype___FindMissingCommitsReply = FindMissingCommitsReply\n\nclass FindMissingHashRecordsRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    commit: typing___Text = ...\n    hashs: builtin___bytes = ...\n    total_byte_size: builtin___int = ...\n\n    def __init__(self,\n        *,\n        commit : typing___Optional[typing___Text] = None,\n        hashs : typing___Optional[builtin___bytes] = None,\n        total_byte_size : typing___Optional[builtin___int] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"commit\",b\"commit\",u\"hashs\",b\"hashs\",u\"total_byte_size\",b\"total_byte_size\"]) -> None: ...\ntype___FindMissingHashRecordsRequest = FindMissingHashRecordsRequest\n\nclass FindMissingHashRecordsReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    commit: typing___Text 
= ...\n    hashs: builtin___bytes = ...\n    total_byte_size: builtin___int = ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        commit : typing___Optional[typing___Text] = None,\n        hashs : typing___Optional[builtin___bytes] = None,\n        total_byte_size : typing___Optional[builtin___int] = None,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"commit\",b\"commit\",u\"error\",b\"error\",u\"hashs\",b\"hashs\",u\"total_byte_size\",b\"total_byte_size\"]) -> None: ...\ntype___FindMissingHashRecordsReply = FindMissingHashRecordsReply\n\nclass FindMissingSchemasRequest(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    commit: typing___Text = ...\n    schema_digests: google___protobuf___internal___containers___RepeatedScalarFieldContainer[typing___Text] = ...\n\n    def __init__(self,\n        *,\n        commit : typing___Optional[typing___Text] = None,\n        schema_digests : typing___Optional[typing___Iterable[typing___Text]] = None,\n        ) -> None: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"commit\",b\"commit\",u\"schema_digests\",b\"schema_digests\"]) -> None: ...\ntype___FindMissingSchemasRequest = FindMissingSchemasRequest\n\nclass FindMissingSchemasReply(google___protobuf___message___Message):\n    DESCRIPTOR: google___protobuf___descriptor___Descriptor = ...\n    commit: typing___Text = ...\n    schema_digests: google___protobuf___internal___containers___RepeatedScalarFieldContainer[typing___Text] = ...\n\n    @property\n    def error(self) -> type___ErrorProto: ...\n\n    def __init__(self,\n        *,\n        commit : typing___Optional[typing___Text] = None,\n        schema_digests : typing___Optional[typing___Iterable[typing___Text]] = None,\n        error : typing___Optional[type___ErrorProto] = None,\n        ) -> None: ...\n    def HasField(self, field_name: typing_extensions___Literal[u\"error\",b\"error\"]) -> builtin___bool: ...\n    def ClearField(self, field_name: typing_extensions___Literal[u\"commit\",b\"commit\",u\"error\",b\"error\",u\"schema_digests\",b\"schema_digests\"]) -> None: ...\ntype___FindMissingSchemasReply = FindMissingSchemasReply\n"
  },
  {
    "path": "src/hangar/remote/hangar_service_pb2_grpc.py",
    "content": "# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!\nimport grpc\n\nfrom . import hangar_service_pb2 as hangar__service__pb2\n\n\nclass HangarServiceStub(object):\n    \"\"\"Missing associated documentation comment in .proto file\"\"\"\n\n    def __init__(self, channel):\n        \"\"\"Constructor.\n\n        Args:\n            channel: A grpc.Channel.\n        \"\"\"\n        self.PING = channel.unary_unary(\n                '/hangar.HangarService/PING',\n                request_serializer=hangar__service__pb2.PingRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.PingReply.FromString,\n                )\n        self.GetClientConfig = channel.unary_unary(\n                '/hangar.HangarService/GetClientConfig',\n                request_serializer=hangar__service__pb2.GetClientConfigRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.GetClientConfigReply.FromString,\n                )\n        self.FetchBranchRecord = channel.unary_unary(\n                '/hangar.HangarService/FetchBranchRecord',\n                request_serializer=hangar__service__pb2.FetchBranchRecordRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.FetchBranchRecordReply.FromString,\n                )\n        self.FetchData = channel.unary_stream(\n                '/hangar.HangarService/FetchData',\n                request_serializer=hangar__service__pb2.FetchDataRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.FetchDataReply.FromString,\n                )\n        self.FetchCommit = channel.unary_stream(\n                '/hangar.HangarService/FetchCommit',\n                request_serializer=hangar__service__pb2.FetchCommitRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.FetchCommitReply.FromString,\n                )\n        self.FetchSchema = channel.unary_unary(\n                '/hangar.HangarService/FetchSchema',\n                request_serializer=hangar__service__pb2.FetchSchemaRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.FetchSchemaReply.FromString,\n                )\n        self.PushBranchRecord = channel.unary_unary(\n                '/hangar.HangarService/PushBranchRecord',\n                request_serializer=hangar__service__pb2.PushBranchRecordRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.PushBranchRecordReply.FromString,\n                )\n        self.PushData = channel.stream_unary(\n                '/hangar.HangarService/PushData',\n                request_serializer=hangar__service__pb2.PushDataRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.PushDataReply.FromString,\n                )\n        self.PushCommit = channel.stream_unary(\n                '/hangar.HangarService/PushCommit',\n                request_serializer=hangar__service__pb2.PushCommitRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.PushCommitReply.FromString,\n                )\n        self.PushSchema = channel.unary_unary(\n                '/hangar.HangarService/PushSchema',\n                request_serializer=hangar__service__pb2.PushSchemaRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.PushSchemaReply.FromString,\n                )\n        self.FetchFindMissingCommits = channel.unary_unary(\n                
'/hangar.HangarService/FetchFindMissingCommits',\n                request_serializer=hangar__service__pb2.FindMissingCommitsRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.FindMissingCommitsReply.FromString,\n                )\n        self.FetchFindMissingHashRecords = channel.stream_stream(\n                '/hangar.HangarService/FetchFindMissingHashRecords',\n                request_serializer=hangar__service__pb2.FindMissingHashRecordsRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.FindMissingHashRecordsReply.FromString,\n                )\n        self.FetchFindMissingSchemas = channel.unary_unary(\n                '/hangar.HangarService/FetchFindMissingSchemas',\n                request_serializer=hangar__service__pb2.FindMissingSchemasRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.FindMissingSchemasReply.FromString,\n                )\n        self.PushFindMissingCommits = channel.unary_unary(\n                '/hangar.HangarService/PushFindMissingCommits',\n                request_serializer=hangar__service__pb2.FindMissingCommitsRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.FindMissingCommitsReply.FromString,\n                )\n        self.PushFindMissingHashRecords = channel.stream_stream(\n                '/hangar.HangarService/PushFindMissingHashRecords',\n                request_serializer=hangar__service__pb2.FindMissingHashRecordsRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.FindMissingHashRecordsReply.FromString,\n                )\n        self.PushFindMissingSchemas = channel.unary_unary(\n                '/hangar.HangarService/PushFindMissingSchemas',\n                request_serializer=hangar__service__pb2.FindMissingSchemasRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.FindMissingSchemasReply.FromString,\n                )\n        self.FetchFindDataOrigin = channel.stream_stream(\n                '/hangar.HangarService/FetchFindDataOrigin',\n                request_serializer=hangar__service__pb2.DataOriginRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.DataOriginReply.FromString,\n                )\n        self.PushFindDataOrigin = channel.stream_stream(\n                '/hangar.HangarService/PushFindDataOrigin',\n                request_serializer=hangar__service__pb2.PushFindDataOriginRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.PushFindDataOriginReply.FromString,\n                )\n        self.PushBeginContext = channel.unary_unary(\n                '/hangar.HangarService/PushBeginContext',\n                request_serializer=hangar__service__pb2.PushBeginContextRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.PushBeginContextReply.FromString,\n                )\n        self.PushEndContext = channel.unary_unary(\n                '/hangar.HangarService/PushEndContext',\n                request_serializer=hangar__service__pb2.PushEndContextRequest.SerializeToString,\n                response_deserializer=hangar__service__pb2.PushEndContextReply.FromString,\n                )\n\n\nclass HangarServiceServicer(object):\n    \"\"\"Missing associated documentation comment in .proto file\"\"\"\n\n    def PING(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n   
     context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def GetClientConfig(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def FetchBranchRecord(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def FetchData(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def FetchCommit(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def FetchSchema(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def PushBranchRecord(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def PushData(self, request_iterator, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def PushCommit(self, request_iterator, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def PushSchema(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def FetchFindMissingCommits(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def FetchFindMissingHashRecords(self, request_iterator, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def 
FetchFindMissingSchemas(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def PushFindMissingCommits(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def PushFindMissingHashRecords(self, request_iterator, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def PushFindMissingSchemas(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def FetchFindDataOrigin(self, request_iterator, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def PushFindDataOrigin(self, request_iterator, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def PushBeginContext(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n    def PushEndContext(self, request, context):\n        \"\"\"Missing associated documentation comment in .proto file\"\"\"\n        context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n        context.set_details('Method not implemented!')\n        raise NotImplementedError('Method not implemented!')\n\n\ndef add_HangarServiceServicer_to_server(servicer, server):\n    rpc_method_handlers = {\n            'PING': grpc.unary_unary_rpc_method_handler(\n                    servicer.PING,\n                    request_deserializer=hangar__service__pb2.PingRequest.FromString,\n                    response_serializer=hangar__service__pb2.PingReply.SerializeToString,\n            ),\n            'GetClientConfig': grpc.unary_unary_rpc_method_handler(\n                    servicer.GetClientConfig,\n                    request_deserializer=hangar__service__pb2.GetClientConfigRequest.FromString,\n                    response_serializer=hangar__service__pb2.GetClientConfigReply.SerializeToString,\n            ),\n            'FetchBranchRecord': grpc.unary_unary_rpc_method_handler(\n                    servicer.FetchBranchRecord,\n                    request_deserializer=hangar__service__pb2.FetchBranchRecordRequest.FromString,\n                    response_serializer=hangar__service__pb2.FetchBranchRecordReply.SerializeToString,\n            ),\n            
'FetchData': grpc.unary_stream_rpc_method_handler(\n                    servicer.FetchData,\n                    request_deserializer=hangar__service__pb2.FetchDataRequest.FromString,\n                    response_serializer=hangar__service__pb2.FetchDataReply.SerializeToString,\n            ),\n            'FetchCommit': grpc.unary_stream_rpc_method_handler(\n                    servicer.FetchCommit,\n                    request_deserializer=hangar__service__pb2.FetchCommitRequest.FromString,\n                    response_serializer=hangar__service__pb2.FetchCommitReply.SerializeToString,\n            ),\n            'FetchSchema': grpc.unary_unary_rpc_method_handler(\n                    servicer.FetchSchema,\n                    request_deserializer=hangar__service__pb2.FetchSchemaRequest.FromString,\n                    response_serializer=hangar__service__pb2.FetchSchemaReply.SerializeToString,\n            ),\n            'PushBranchRecord': grpc.unary_unary_rpc_method_handler(\n                    servicer.PushBranchRecord,\n                    request_deserializer=hangar__service__pb2.PushBranchRecordRequest.FromString,\n                    response_serializer=hangar__service__pb2.PushBranchRecordReply.SerializeToString,\n            ),\n            'PushData': grpc.stream_unary_rpc_method_handler(\n                    servicer.PushData,\n                    request_deserializer=hangar__service__pb2.PushDataRequest.FromString,\n                    response_serializer=hangar__service__pb2.PushDataReply.SerializeToString,\n            ),\n            'PushCommit': grpc.stream_unary_rpc_method_handler(\n                    servicer.PushCommit,\n                    request_deserializer=hangar__service__pb2.PushCommitRequest.FromString,\n                    response_serializer=hangar__service__pb2.PushCommitReply.SerializeToString,\n            ),\n            'PushSchema': grpc.unary_unary_rpc_method_handler(\n                    servicer.PushSchema,\n                    request_deserializer=hangar__service__pb2.PushSchemaRequest.FromString,\n                    response_serializer=hangar__service__pb2.PushSchemaReply.SerializeToString,\n            ),\n            'FetchFindMissingCommits': grpc.unary_unary_rpc_method_handler(\n                    servicer.FetchFindMissingCommits,\n                    request_deserializer=hangar__service__pb2.FindMissingCommitsRequest.FromString,\n                    response_serializer=hangar__service__pb2.FindMissingCommitsReply.SerializeToString,\n            ),\n            'FetchFindMissingHashRecords': grpc.stream_stream_rpc_method_handler(\n                    servicer.FetchFindMissingHashRecords,\n                    request_deserializer=hangar__service__pb2.FindMissingHashRecordsRequest.FromString,\n                    response_serializer=hangar__service__pb2.FindMissingHashRecordsReply.SerializeToString,\n            ),\n            'FetchFindMissingSchemas': grpc.unary_unary_rpc_method_handler(\n                    servicer.FetchFindMissingSchemas,\n                    request_deserializer=hangar__service__pb2.FindMissingSchemasRequest.FromString,\n                    response_serializer=hangar__service__pb2.FindMissingSchemasReply.SerializeToString,\n            ),\n            'PushFindMissingCommits': grpc.unary_unary_rpc_method_handler(\n                    servicer.PushFindMissingCommits,\n                    request_deserializer=hangar__service__pb2.FindMissingCommitsRequest.FromString,\n                    
response_serializer=hangar__service__pb2.FindMissingCommitsReply.SerializeToString,\n            ),\n            'PushFindMissingHashRecords': grpc.stream_stream_rpc_method_handler(\n                    servicer.PushFindMissingHashRecords,\n                    request_deserializer=hangar__service__pb2.FindMissingHashRecordsRequest.FromString,\n                    response_serializer=hangar__service__pb2.FindMissingHashRecordsReply.SerializeToString,\n            ),\n            'PushFindMissingSchemas': grpc.unary_unary_rpc_method_handler(\n                    servicer.PushFindMissingSchemas,\n                    request_deserializer=hangar__service__pb2.FindMissingSchemasRequest.FromString,\n                    response_serializer=hangar__service__pb2.FindMissingSchemasReply.SerializeToString,\n            ),\n            'FetchFindDataOrigin': grpc.stream_stream_rpc_method_handler(\n                    servicer.FetchFindDataOrigin,\n                    request_deserializer=hangar__service__pb2.DataOriginRequest.FromString,\n                    response_serializer=hangar__service__pb2.DataOriginReply.SerializeToString,\n            ),\n            'PushFindDataOrigin': grpc.stream_stream_rpc_method_handler(\n                    servicer.PushFindDataOrigin,\n                    request_deserializer=hangar__service__pb2.PushFindDataOriginRequest.FromString,\n                    response_serializer=hangar__service__pb2.PushFindDataOriginReply.SerializeToString,\n            ),\n            'PushBeginContext': grpc.unary_unary_rpc_method_handler(\n                    servicer.PushBeginContext,\n                    request_deserializer=hangar__service__pb2.PushBeginContextRequest.FromString,\n                    response_serializer=hangar__service__pb2.PushBeginContextReply.SerializeToString,\n            ),\n            'PushEndContext': grpc.unary_unary_rpc_method_handler(\n                    servicer.PushEndContext,\n                    request_deserializer=hangar__service__pb2.PushEndContextRequest.FromString,\n                    response_serializer=hangar__service__pb2.PushEndContextReply.SerializeToString,\n            ),\n    }\n    generic_handler = grpc.method_handlers_generic_handler(\n            'hangar.HangarService', rpc_method_handlers)\n    server.add_generic_rpc_handlers((generic_handler,))\n\n\n # This class is part of an EXPERIMENTAL API.\nclass HangarService(object):\n    \"\"\"Missing associated documentation comment in .proto file\"\"\"\n\n    @staticmethod\n    def PING(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/PING',\n            hangar__service__pb2.PingRequest.SerializeToString,\n            hangar__service__pb2.PingReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def GetClientConfig(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/GetClientConfig',\n            
hangar__service__pb2.GetClientConfigRequest.SerializeToString,\n            hangar__service__pb2.GetClientConfigReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def FetchBranchRecord(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/FetchBranchRecord',\n            hangar__service__pb2.FetchBranchRecordRequest.SerializeToString,\n            hangar__service__pb2.FetchBranchRecordReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def FetchData(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_stream(request, target, '/hangar.HangarService/FetchData',\n            hangar__service__pb2.FetchDataRequest.SerializeToString,\n            hangar__service__pb2.FetchDataReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def FetchCommit(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_stream(request, target, '/hangar.HangarService/FetchCommit',\n            hangar__service__pb2.FetchCommitRequest.SerializeToString,\n            hangar__service__pb2.FetchCommitReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def FetchSchema(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/FetchSchema',\n            hangar__service__pb2.FetchSchemaRequest.SerializeToString,\n            hangar__service__pb2.FetchSchemaReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def PushBranchRecord(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/PushBranchRecord',\n            hangar__service__pb2.PushBranchRecordRequest.SerializeToString,\n            hangar__service__pb2.PushBranchRecordReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    
@staticmethod\n    def PushData(request_iterator,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.stream_unary(request_iterator, target, '/hangar.HangarService/PushData',\n            hangar__service__pb2.PushDataRequest.SerializeToString,\n            hangar__service__pb2.PushDataReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def PushCommit(request_iterator,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.stream_unary(request_iterator, target, '/hangar.HangarService/PushCommit',\n            hangar__service__pb2.PushCommitRequest.SerializeToString,\n            hangar__service__pb2.PushCommitReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def PushSchema(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/PushSchema',\n            hangar__service__pb2.PushSchemaRequest.SerializeToString,\n            hangar__service__pb2.PushSchemaReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def FetchFindMissingCommits(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/FetchFindMissingCommits',\n            hangar__service__pb2.FindMissingCommitsRequest.SerializeToString,\n            hangar__service__pb2.FindMissingCommitsReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def FetchFindMissingHashRecords(request_iterator,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.stream_stream(request_iterator, target, '/hangar.HangarService/FetchFindMissingHashRecords',\n            hangar__service__pb2.FindMissingHashRecordsRequest.SerializeToString,\n            hangar__service__pb2.FindMissingHashRecordsReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def FetchFindMissingSchemas(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            
call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/FetchFindMissingSchemas',\n            hangar__service__pb2.FindMissingSchemasRequest.SerializeToString,\n            hangar__service__pb2.FindMissingSchemasReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def PushFindMissingCommits(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/PushFindMissingCommits',\n            hangar__service__pb2.FindMissingCommitsRequest.SerializeToString,\n            hangar__service__pb2.FindMissingCommitsReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def PushFindMissingHashRecords(request_iterator,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.stream_stream(request_iterator, target, '/hangar.HangarService/PushFindMissingHashRecords',\n            hangar__service__pb2.FindMissingHashRecordsRequest.SerializeToString,\n            hangar__service__pb2.FindMissingHashRecordsReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def PushFindMissingSchemas(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/PushFindMissingSchemas',\n            hangar__service__pb2.FindMissingSchemasRequest.SerializeToString,\n            hangar__service__pb2.FindMissingSchemasReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def FetchFindDataOrigin(request_iterator,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.stream_stream(request_iterator, target, '/hangar.HangarService/FetchFindDataOrigin',\n            hangar__service__pb2.DataOriginRequest.SerializeToString,\n            hangar__service__pb2.DataOriginReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def PushFindDataOrigin(request_iterator,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            
wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.stream_stream(request_iterator, target, '/hangar.HangarService/PushFindDataOrigin',\n            hangar__service__pb2.PushFindDataOriginRequest.SerializeToString,\n            hangar__service__pb2.PushFindDataOriginReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def PushBeginContext(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/PushBeginContext',\n            hangar__service__pb2.PushBeginContextRequest.SerializeToString,\n            hangar__service__pb2.PushBeginContextReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n\n    @staticmethod\n    def PushEndContext(request,\n            target,\n            options=(),\n            channel_credentials=None,\n            call_credentials=None,\n            compression=None,\n            wait_for_ready=None,\n            timeout=None,\n            metadata=None):\n        return grpc.experimental.unary_unary(request, target, '/hangar.HangarService/PushEndContext',\n            hangar__service__pb2.PushEndContextRequest.SerializeToString,\n            hangar__service__pb2.PushEndContextReply.FromString,\n            options, channel_credentials,\n            call_credentials, compression, wait_for_ready, timeout, metadata)\n"
  },
  {
    "path": "src/hangar/remote/header_manipulator_client_interceptor.py",
    "content": "\"\"\"Interceptor that adds headers to outgoing requests\n\nPortions of this code have been taken and modified from the \"gRPC\" project.\n\nURL:      https://github.com/grpc/grpc/\nFile:     examples/python/interceptors/default_value/header_manipulator_client_interceptor.py\nCommit:   87cd994b0477e98c976e7b321b3c1f52666ab5e0\nAccessed: 23 APR 2019\n\ngRPC License\n-------------------------------------------------------------------------------\nCopyright 2017 gRPC authors.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use\nthis file except in compliance with the License. You may obtain a copy of the\nLicense at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed\nunder the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR\nCONDITIONS OF ANY KIND, either express or implied. See the License for the\nspecific language governing permissions and limitations under the License.\n\"\"\"\nfrom collections import namedtuple\n\nimport grpc\n\n\nclass _GenericClientInterceptor(\n        grpc.UnaryUnaryClientInterceptor, grpc.UnaryStreamClientInterceptor,\n        grpc.StreamUnaryClientInterceptor, grpc.StreamStreamClientInterceptor):\n    \"\"\"Base class for interceptors that operate on all RPC types.\"\"\"\n\n    def __init__(self, interceptor_function):\n        self._fn = interceptor_function\n\n    def intercept_unary_unary(self, continuation, client_call_details, request):\n        new_details, new_request_iterator, postprocess = self._fn(\n            client_call_details, iter((request,)), False, False)\n        response = continuation(new_details, next(new_request_iterator))\n        return postprocess(response) if postprocess else response\n\n    def intercept_unary_stream(self, continuation, client_call_details,\n                               request):\n        new_details, new_request_iterator, postprocess = self._fn(\n            client_call_details, iter((request,)), False, True)\n        response_it = continuation(new_details, next(new_request_iterator))\n        return postprocess(response_it) if postprocess else response_it\n\n    def intercept_stream_unary(self, continuation, client_call_details,\n                               request_iterator):\n        new_details, new_request_iterator, postprocess = self._fn(\n            client_call_details, request_iterator, True, False)\n        response = continuation(new_details, new_request_iterator)\n        return postprocess(response) if postprocess else response\n\n    def intercept_stream_stream(self, continuation, client_call_details,\n                                request_iterator):\n        new_details, new_request_iterator, postprocess = self._fn(\n            client_call_details, request_iterator, True, True)\n        response_it = continuation(new_details, new_request_iterator)\n        return postprocess(response_it) if postprocess else response_it\n\n\ndef create_client_interceptor(intercept_call):\n    return _GenericClientInterceptor(intercept_call)\n\n\nclass _ClientCallDetails(\n        namedtuple(\n            typename='_ClientCallDetails',\n            field_names=('method', 'timeout', 'metadata', 'credentials')),\n        grpc.ClientCallDetails):\n    pass\n\n\ndef header_adder_interceptor(header, value):\n    \"\"\"Interceptor that adds headers to outgoing requests.\"\"\"\n\n    def intercept_call(client_call_details, request_iterator, request_streaming,\n                
       response_streaming):\n        metadata = []\n        if client_call_details.metadata is not None:\n            metadata = list(client_call_details.metadata)\n\n        if (header != '') and (value != ''):\n            metadata.append((header, value))\n        client_call_details = _ClientCallDetails(\n            client_call_details.method,\n            client_call_details.timeout,\n            metadata,\n            client_call_details.credentials)\n        return client_call_details, request_iterator, None\n\n    return create_client_interceptor(intercept_call)\n
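\n\n# Usage sketch (an assumption for illustration, not part of the original\n# module): the interceptor attaches to a client channel via\n# grpc.intercept_channel; the header name and token below are hypothetical.\n#\n#   channel = grpc.insecure_channel('localhost:50051')\n#   channel = grpc.intercept_channel(\n#       channel, header_adder_interceptor('hangar-push-auth', '<token>'))\n"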
  },
  {
    "path": "src/hangar/remote/request_header_validator_interceptor.py",
    "content": "\"\"\"Interceptor that ensures a specific header is present.\n\nPortions of this code have been taken and modified from the \"gRPC\" project.\n\nURL:      https://github.com/grpc/grpc/\nFile:     examples/python/interceptors/default_value/default_value_client_interceptor.py\nCommit:   6146151a4fe1e28921c12d1ae5635e113a24b9d7\nAccessed: 23 APR 2019\n\ngRPC License\n-------------------------------------------------------------------------------\nCopyright 2017 gRPC authors.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use\nthis file except in compliance with the License. You may obtain a copy of the\nLicense at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed\nunder the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR\nCONDITIONS OF ANY KIND, either express or implied. See the License for the\nspecific language governing permissions and limitations under the License.\n\"\"\"\nfrom os.path import split\n\nimport grpc\n\n\nSERVICE_METHOD_TYPES = {\n    'PING': 'uu',\n    'GetClientConfig': 'uu',\n    'FetchBranchRecord': 'uu',\n    'FetchData': 'us',\n    'FetchCommit': 'us',\n    'FetchSchema': 'uu',\n    'PushBranchRecord': 'uu',\n    'PushData': 'su',\n    'PushCommit': 'su',\n    'PushSchema': 'uu',\n    'FetchFindMissingCommits': 'uu',\n    'FetchFindMissingHashRecords': 'ss',\n    'FetchFindMissingSchemas': 'uu',\n    'PushFindMissingCommits': 'uu',\n    'PushFindMissingHashRecords': 'ss',\n    'PushFindMissingSchemas': 'uu',\n    'FetchFindDataOrigin': 'ss',\n    'PushFindDataOrigin': 'ss',\n    'PushBeginContext': 'uu',\n    'PushEndContext': 'uu',\n}\n\n\ndef _unary_unary_rpc_terminator(code, details):\n    def terminate(ignored_request, context):\n        context.abort(code, details)\n    return grpc.unary_unary_rpc_method_handler(terminate)\n\n\ndef _unary_stream_rpc_terminator(code, details):  # pragma: no cover\n    def terminate(ignored_request, context):\n        context.abort(code, details)\n    return grpc.unary_stream_rpc_method_handler(terminate)\n\n\ndef _stream_unary_rpc_terminator(code, details):  # pragma: no cover\n    def terminate(ignored_request, context):\n        context.abort(code, details)\n    return grpc.unary_stream_rpc_method_handler(terminate)\n\n\ndef _stream_stream_rpc_terminator(code, details):  # pragma: no cover\n    def terminate(ignored_request, context):\n        context.abort(code, details)\n    return grpc.stream_stream_rpc_method_handler(terminate)\n\n\ndef _select_rpc_terminator(intercepted_method):\n    method_type = SERVICE_METHOD_TYPES[intercepted_method]\n\n    if method_type == 'uu':\n        return _unary_unary_rpc_terminator\n    elif method_type == 'su':  # pragma: no cover\n        return _stream_unary_rpc_terminator\n    elif method_type == 'us':  # pragma: no cover\n        return _unary_stream_rpc_terminator\n    elif method_type == 'ss':  # pragma: no cover\n        return _stream_stream_rpc_terminator\n    else:                      # pragma: no cover\n        raise ValueError(f'unknown method type: {method_type} for service: {intercepted_method}')\n\n\nclass RequestHeaderValidatorInterceptor(grpc.ServerInterceptor):\n\n    def __init__(self, push_restricted, header, value, code, details):\n        self._push_restricted = push_restricted\n        self._header = header\n        self._value = value\n        self._code = code\n        self._details = details\n\n    def 
    def intercept_service(self, continuation, handler_call_details):\n        _, intercepted_method = split(handler_call_details.method)\n        print(f'intercepted method: {intercepted_method}')\n\n        if intercepted_method.startswith('Push') and self._push_restricted:\n            if (self._header, self._value) in handler_call_details.invocation_metadata:\n                return continuation(handler_call_details)\n            else:\n                return _select_rpc_terminator(intercepted_method)(self._code, self._details)\n        else:\n            return continuation(handler_call_details)\n"
  },
  {
    "path": "src/hangar/remote/server.py",
    "content": "import configparser\nimport os\nimport shutil\nimport tempfile\nimport traceback\nimport warnings\nfrom concurrent import futures\nfrom os.path import join as pjoin\nfrom pathlib import Path\nfrom pprint import pprint as pp\nfrom threading import Lock\nfrom typing import Union, Iterable\n\nimport blosc\nimport grpc\nimport lmdb\n\nfrom . import (\n    chunks,\n    hangar_service_pb2,\n    hangar_service_pb2_grpc,\n    request_header_validator_interceptor,\n)\nfrom .content import ContentWriter, DataWriter\nfrom .. import constants as c\nfrom ..backends import BACKEND_ACCESSOR_MAP, backend_decoder\nfrom ..context import Environments\nfrom ..records import (\n    commiting,\n    hashs,\n    heads,\n    parsing,\n    queries,\n    summarize,\n    hash_schema_db_key_from_raw_key,\n    hash_data_db_key_from_raw_key,\n)\nfrom ..records.hashmachine import hash_func_from_tcode\nfrom ..txnctx import TxnRegister\nfrom ..utils import set_blosc_nthreads\n\nset_blosc_nthreads()\n\n\ndef server_config(server_dir, *, create: bool = True) -> configparser.ConfigParser:\n    CFG = configparser.ConfigParser()\n    dst_dir = Path(server_dir)\n    dst_path = dst_dir.joinpath(c.CONFIG_SERVER_NAME)\n    if dst_path.is_file():\n        CFG.read(dst_path)\n        print(f'Found Config File at {dst_path}')\n    else:\n        if create:\n            dst_dir.mkdir(exist_ok=True)\n            print(f'Creating Server Config File in {dst_path}')\n            src_path = Path(os.path.dirname(__file__), c.CONFIG_SERVER_NAME)\n            shutil.copyfile(src_path, dst_path)\n            CFG.read(src_path)\n        else:\n            src_path = Path(os.path.dirname(__file__), c.CONFIG_SERVER_NAME)\n            CFG.read(src_path)\n    return CFG\n\n\ndef context_abort_with_exception_traceback(\n        context: grpc.ServicerContext,\n        exc: Exception,\n        status_code: grpc.StatusCode\n):\n    context.abort(\n        code=status_code,\n        details=(f'Exception Type: {type(exc)} \\n'\n                 f'Exception Message: {exc} \\n'\n                 f'Traceback: \\n {traceback.format_tb(exc.__traceback__)}'))\n\n\ndef context_abort_with_handled_error(\n        context: grpc.ServicerContext,\n        message: str, status_code:\n        grpc.StatusCode\n):\n    context.abort(code=status_code, details=message)\n\n\nclass HangarServer(hangar_service_pb2_grpc.HangarServiceServicer):\n\n    def __init__(self, repo_path: Union[str, bytes, Path], overwrite=False):\n\n        if isinstance(repo_path, (str, bytes)):\n            repo_path = Path(repo_path)\n\n        with warnings.catch_warnings():\n            warnings.simplefilter('ignore', UserWarning)\n            envs = Environments(pth=repo_path)\n        self.env: Environments = envs\n        self.data_writer_lock = Lock()\n        self.hash_reader_lock = Lock()\n\n        try:\n            self.env.init_repo(\n                user_name='SERVER_USER',\n                user_email='SERVER_USER@HANGAR.SERVER',\n                remove_old=overwrite)\n        except OSError:\n            pass\n\n        self._rFs = {}\n        for backend, accessor in BACKEND_ACCESSOR_MAP.items():\n            if accessor is not None:\n                self._rFs[backend] = accessor(\n                    repo_path=self.env.repo_path,\n                    schema_shape=None,\n                    schema_dtype=None)\n                self._rFs[backend].open(mode='r')\n\n        self.CFG = server_config(repo_path, create=True)\n        print(f'Server Started with Config:')\n    
    pp({k: dict(v) for k, v in self.CFG.items()})\n        self.txnregister = TxnRegister()\n        self.repo_path = self.env.repo_path\n        self.data_dir = pjoin(self.repo_path, c.DIR_DATA)\n        self.CW = ContentWriter(self.env)\n        self.DW = DataWriter(self.env)\n\n    def close(self):\n        for backend_accessor in self._rFs.values():\n            backend_accessor.close()\n        self.env._close_environments()\n\n    # -------------------- Client Config --------------------------------------\n\n    def PING(self, request, context):\n        \"\"\"Test function. PING -> PONG!\n        \"\"\"\n        reply = hangar_service_pb2.PingReply(result='PONG')\n        return reply\n\n    def GetClientConfig(self, request, context):\n        \"\"\"Return parameters to the client to set up channel options as desired by the server.\n        \"\"\"\n        clientCFG = self.CFG['CLIENT_GRPC']\n        push_max_nbytes = clientCFG['push_max_nbytes']\n        enable_compression = clientCFG['enable_compression']\n        optimization_target = clientCFG['optimization_target']\n\n        err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        reply = hangar_service_pb2.GetClientConfigReply(error=err)\n        reply.config['push_max_nbytes'] = push_max_nbytes\n        reply.config['enable_compression'] = enable_compression\n        reply.config['optimization_target'] = optimization_target\n        return reply\n\n    # -------------------- Branch Record --------------------------------------\n\n    def FetchBranchRecord(self, request, context):\n        \"\"\"Return the current HEAD commit of a particular branch\n        \"\"\"\n        branch_name = request.rec.name\n        try:\n            head = heads.get_branch_head_commit(self.env.branchenv, branch_name)\n            rec = hangar_service_pb2.BranchRecord(name=branch_name, commit=head)\n            err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n            reply = hangar_service_pb2.FetchBranchRecordReply(rec=rec, error=err)\n            return reply\n        except ValueError:\n            msg = f'BRANCH: {branch_name} DOES NOT EXIST ON SERVER.'\n            context_abort_with_handled_error(\n                context=context, message=msg, status_code=grpc.StatusCode.NOT_FOUND)\n            return\n\n    def PushBranchRecord(self, request, context):\n        \"\"\"Update the HEAD commit of a branch, creating the record if not previously existing.\n        \"\"\"\n        branch_name = request.rec.name\n        commit = request.rec.commit\n        branch_names = heads.get_branch_names(self.env.branchenv)\n        if branch_name not in branch_names:\n            heads.create_branch(self.env.branchenv, name=branch_name, base_commit=commit)\n            err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        else:\n            current_head = heads.get_branch_head_commit(self.env.branchenv, branch_name)\n            if current_head == commit:\n                msg = f'NO CHANGE TO BRANCH: {branch_name} WITH HEAD: {current_head}'\n                context_abort_with_handled_error(\n                    context=context, message=msg, status_code=grpc.StatusCode.ALREADY_EXISTS)\n                return\n            else:\n                heads.set_branch_head_commit(self.env.branchenv, branch_name, commit)\n                err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n\n        reply = hangar_service_pb2.PushBranchRecordReply(error=err)\n        return reply\n\n    # -------------------------- Commit 
    def FetchCommit(self, request, context):\n        \"\"\"Return raw data representing contents, spec, and parents of a commit hash.\n        \"\"\"\n        commit = request.commit\n        commitRefKey = parsing.commit_ref_db_key_from_raw_key(commit)\n        commitParentKey = parsing.commit_parent_db_key_from_raw_key(commit)\n        commitSpecKey = parsing.commit_spec_db_key_from_raw_key(commit)\n\n        reftxn = self.txnregister.begin_reader_txn(self.env.refenv)\n        try:\n            commitRefVal = reftxn.get(commitRefKey, default=False)\n            commitParentVal = reftxn.get(commitParentKey, default=False)\n            commitSpecVal = reftxn.get(commitSpecKey, default=False)\n        finally:\n            self.txnregister.abort_reader_txn(self.env.refenv)\n\n        if commitRefVal is False:\n            msg = f'COMMIT: {commit} DOES NOT EXIST ON SERVER'\n            context.set_details(msg)\n            context.set_code(grpc.StatusCode.NOT_FOUND)\n            err = hangar_service_pb2.ErrorProto(code=5, message=msg)\n            reply = hangar_service_pb2.FetchCommitReply(commit=commit, error=err)\n            yield reply\n            return  # PEP 479: raising StopIteration inside a generator is a RuntimeError\n        else:\n            raw_data_chunks = chunks.chunk_bytes(commitRefVal)\n            bsize = len(commitRefVal)\n            commit_proto = hangar_service_pb2.CommitRecord()\n            commit_proto.parent = commitParentVal\n            commit_proto.spec = commitSpecVal\n            reply = hangar_service_pb2.FetchCommitReply(commit=commit, total_byte_size=bsize)\n            for chunk in raw_data_chunks:\n                commit_proto.ref = chunk\n                reply.record.CopyFrom(commit_proto)\n                yield reply\n\n    def PushCommit(self, request_iterator, context):\n        \"\"\"Record the contents of a new commit sent to the server.\n\n        Will not overwrite data if a commit hash is already recorded on the server.\n        \"\"\"\n        for idx, request in enumerate(request_iterator):\n            if idx == 0:\n                commit = request.commit\n                refBytes, offset = bytearray(request.total_byte_size), 0\n                specVal = request.record.spec\n                parentVal = request.record.parent\n            size = len(request.record.ref)\n            refBytes[offset: offset + size] = request.record.ref\n            offset += size\n\n        digest = self.CW.commit(commit, parentVal, specVal, refBytes)\n        if not digest:\n            msg = f'COMMIT: {commit} ALREADY EXISTS'\n            context.set_code(grpc.StatusCode.ALREADY_EXISTS)\n            context.set_details(msg)\n            err = hangar_service_pb2.ErrorProto(code=6, message=msg)\n        else:\n            err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n            commiting.move_process_data_to_store(self.env.repo_path, remote_operation=True)\n\n        reply = hangar_service_pb2.PushCommitReply(error=err)\n        return reply\n\n    # --------------------- Schema Record -------------------------------------\n\n    def FetchSchema(self, request, context):\n        \"\"\"Return the raw byte specification of a particular schema with requested hash.\n        \"\"\"\n        schema_hash = request.rec.digest\n        schemaKey = hash_schema_db_key_from_raw_key(schema_hash)\n        hashTxn = self.txnregister.begin_reader_txn(self.env.hashenv)\n        try:\n            schemaExists = hashTxn.get(schemaKey, default=False)\n
            if schemaExists is not False:\n                print(f'found schema: {schema_hash}')\n                rec = hangar_service_pb2.SchemaRecord(digest=schema_hash, blob=schemaExists)\n                err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n            else:\n                print(f'not exists: {schema_hash}')\n                msg = f'SCHEMA HASH: {schema_hash} DOES NOT EXIST ON SERVER'\n                context.set_details(msg)\n                context.set_code(grpc.StatusCode.NOT_FOUND)\n                err = hangar_service_pb2.ErrorProto(code=5, message=msg)\n                rec = hangar_service_pb2.SchemaRecord(digest=schema_hash)\n        finally:\n            self.txnregister.abort_reader_txn(self.env.hashenv)\n\n        reply = hangar_service_pb2.FetchSchemaReply(rec=rec, error=err)\n        return reply\n\n    def PushSchema(self, request, context):\n        \"\"\"Add a new schema byte specification record.\n\n        Will not overwrite a schema hash which already exists on the server.\n        \"\"\"\n        schema_hash = request.rec.digest\n        schema_val = request.rec.blob\n\n        digest = self.CW.schema(schema_hash, schema_val)\n        if not digest:\n            print(f'exists: {schema_hash}')\n            msg = f'SCHEMA: {schema_hash} ALREADY EXISTS ON SERVER'\n            context.set_details(msg)\n            context.set_code(grpc.StatusCode.ALREADY_EXISTS)\n            err = hangar_service_pb2.ErrorProto(code=6, message=msg)\n        else:\n            print(f'created new: {schema_hash}')\n            err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        reply = hangar_service_pb2.PushSchemaReply(error=err)\n        return reply\n\n    # ---------------------------- Data ---------------------------------------\n\n    def FetchFindDataOrigin(self, request_iterator, context):\n        digests = []\n        for request in request_iterator:\n            digests.append(request.digest)\n\n        hashTxn = self.txnregister.begin_reader_txn(self.env.hashenv)\n        try:\n            for digest in digests:\n                hashKey = hash_data_db_key_from_raw_key(digest)\n                hashVal = hashTxn.get(hashKey, default=False)\n                if hashVal is False:\n                    msg = f'HASH DOES NOT EXIST: {hashKey}'\n                    context.set_details(msg)\n                    context.set_code(grpc.StatusCode.NOT_FOUND)\n                    err = hangar_service_pb2.ErrorProto(code=5, message=msg)\n                    reply = hangar_service_pb2.FetchDataReply(error=err)\n                    yield reply\n                    return  # PEP 479: do not raise StopIteration inside a generator\n                else:\n                    spec = backend_decoder(hashVal)\n                    if spec.backend in ['01', '00', '10']:\n                        dtype = hangar_service_pb2.DataType.NP_ARRAY\n                    elif spec.backend == '30':\n                        dtype = hangar_service_pb2.DataType.STR\n                    elif spec.backend == '31':\n                        dtype = hangar_service_pb2.DataType.BYTES\n                    else:\n                        raise TypeError(spec)\n\n                    response = hangar_service_pb2.DataOriginReply(\n                        location=hangar_service_pb2.DataLocation.REMOTE_SERVER,\n                        data_type=dtype,\n                        digest=digest,\n                        uri=digest,\n                        compression=True,\n                    )\n                    response.compression_opts['id'] = 'blosc'\n                    response.compression_opts['cname'] = 'blosclz'\n                    response.compression_opts['clevel'] = '3'\n                    yield response\n\n        finally:\n            self.txnregister.abort_reader_txn(self.env.hashenv)\n
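\n    # NOTE (illustrative): the compression options advertised above are\n    # advisory metadata only; ``FetchData`` below compresses its payload with\n    # ``blosc.compress(raw, clevel=3, cname='blosclz', shuffle=blosc.NOSHUFFLE)``,\n    # so a receiving client reverses it with a single\n    # ``blosc.decompress(payload)`` call. The ``clevel`` option travels as the\n    # string ``'3'`` because the proto map values are strings; consumers\n    # should cast it with ``int()`` before use.\n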
\n    def FetchData(self, request, context):\n        \"\"\"Return a packed byte representation of samples corresponding to a digest.\n\n        Please see comments below which explain why not all requests are\n        guaranteed to fully complete in one operation.\n\n        We receive a list of digests to send to the client. One consideration\n        we have is that there is no way to know how much memory will be used\n        when the data is read from disk. Samples are compressed against\n        each other before going over the wire, which means it's preferable to\n        read in as much as possible. However, since we don't want to overload\n        the client system when the binary blob is decompressed into individual\n        tensors, we set some maximum size which tensors can occupy when\n        uncompressed. When we receive a list of digests whose data size is in\n        excess of this limit, we just say sorry to the client, send the chunk\n        of digests/tensors off to them as is (incomplete), and request that\n        the client figure out what it still needs and ask us again.\n        \"\"\"\n        uri = request.uri\n        hashKey = hash_data_db_key_from_raw_key(uri)\n        try:\n            with self.hash_reader_lock:\n                hashTxn = self.txnregister.begin_reader_txn(self.env.hashenv)\n                hashVal = hashTxn.get(hashKey, default=False)\n                self.txnregister.abort_reader_txn(self.env.hashenv)\n        except Exception as e:\n            context_abort_with_exception_traceback(\n                context=context, exc=e, status_code=grpc.StatusCode.INTERNAL)\n            raise e\n\n        if hashVal is False:\n            exc = FileNotFoundError(f'request uri does not exist. 
URI: {uri}')\n            context_abort_with_exception_traceback(\n                context=context, exc=exc, status_code=grpc.StatusCode.NOT_FOUND)\n\n        spec = backend_decoder(hashVal)\n        data = self._rFs[spec.backend].read_data(spec)\n        dtype_code, raw_record = chunks.serialize_data(data)\n        compressed_record = blosc.compress(\n            raw_record, clevel=3, cname='blosclz', shuffle=blosc.NOSHUFFLE)\n\n        def replies_iterator(raw, uri, error_proto):\n            reply = hangar_service_pb2.FetchDataReply(\n                uri=uri,\n                nbytes=len(raw),\n                error=error_proto)\n            for raw_chunk in chunks.chunk_bytes(raw):\n                reply.raw_data = raw_chunk\n                yield reply\n\n        err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        repliesIter = replies_iterator(compressed_record, uri, err)\n        yield from repliesIter\n\n    def PushFindDataOrigin(\n            self,\n            request_iterator: Iterable[hangar_service_pb2.PushFindDataOriginRequest],\n            context\n    ) -> hangar_service_pb2.PushFindDataOriginReply:\n\n        CONFIG_SEND_LOCATION = hangar_service_pb2.DataLocation.REMOTE_SERVER\n\n        all_requests = [req for req in request_iterator]\n        for request in all_requests:\n            if request.compression_is_desired is True:\n                reply_compression_expected = True\n                if request.data_type == hangar_service_pb2.DataType.NP_ARRAY:\n                    reply_compression_opts_expected = {\n                        'id': 'blosc',\n                        'cname': 'blosclz',\n                        'clevel': '3'\n                    }\n                elif request.data_type == hangar_service_pb2.DataType.STR:\n                    reply_compression_opts_expected = {\n                        'id': 'blosc',\n                        'cname': 'zstd',\n                        'clevel': '3'\n                    }\n                elif request.data_type == hangar_service_pb2.DataType.BYTES:\n                    reply_compression_opts_expected = {\n                        'id': 'blosc',\n                        'cname': 'blosclz',\n                        'clevel': '3'\n                    }\n                else:\n                    raise TypeError(request)\n            else:\n                reply_compression_expected = False\n                reply_compression_opts_expected = {}\n\n            if CONFIG_SEND_LOCATION == hangar_service_pb2.DataLocation.REMOTE_SERVER:\n                reply_uri = request.digest\n            else:\n                raise RuntimeError(f'CONFIG_SEND_LOCATION: {CONFIG_SEND_LOCATION}')\n\n            reply = hangar_service_pb2.PushFindDataOriginReply(\n                digest=request.digest,\n                location=CONFIG_SEND_LOCATION,\n                uri=reply_uri,\n                compression_expected=reply_compression_expected,\n                compression_opts_expected=reply_compression_opts_expected,\n            )\n            yield reply\n\n    def PushBeginContext(self, request, context):\n        try:\n            self.DW.__enter__()\n        except Exception as e:\n            context.abort(\n                code=grpc.StatusCode.INTERNAL,\n                details=(f'Exception Type: {type(e)} \\n'\n                         f'Exception Message: {e} \\n'\n                         f'Traceback: \\n {traceback.format_tb(e.__traceback__)}')\n            )\n        err = hangar_service_pb2.ErrorProto(code=0, 
message='OK')\n        reply = hangar_service_pb2.PushBeginContextReply(err=err)\n        return reply\n\n    def PushEndContext(self, request, context):\n        try:\n            self.DW.__exit__()\n        except Exception as e:\n            context.abort(\n                code=grpc.StatusCode.INTERNAL,\n                details=(f'Exception Type: {type(e)} \\n'\n                         f'Exception Message: {e} \\n'\n                         f'Traceback: \\n {traceback.format_tb(e.__traceback__)}')\n            )\n        err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        reply = hangar_service_pb2.PushEndContextReply(err=err)\n        return reply\n\n    def PushData(\n            self,\n            request_iterator: Iterable[hangar_service_pb2.PushDataRequest],\n            context: grpc.ServicerContext\n    ) -> hangar_service_pb2.PushDataReply:\n        \"\"\"Receive compressed streams of binary data from the client.\n\n        In order to prevent errors or malicious behavior, the cryptographic hash\n        of every tensor is calculated and compared to what the client \"said\" it\n        is. If an error is detected, no sample in the entire stream will be\n        saved to disk.\n        \"\"\"\n\n        for idx, request in enumerate(request_iterator):\n            if idx == 0:\n                if not self.DW.is_cm:\n                    context.abort(\n                        code=grpc.StatusCode.FAILED_PRECONDITION,\n                        details='Attempt to push without opening context'\n                    )\n                uri = request.uri\n                dtype_code = request.data_type\n                schema_hash = request.schema_hash\n                dBytes = bytearray(request.nbytes)\n                offset = 0\n            size = len(request.raw_data)\n            dBytes[offset: offset + size] = request.raw_data\n            offset += size\n\n        # TODO: Handle expected vs required\n        uncompBytes = blosc.decompress(dBytes)\n\n        received_data = chunks.deserialize_data(dtype_code, uncompBytes)\n        hash_func = hash_func_from_tcode(str(dtype_code))\n        received_hash = hash_func(received_data)\n\n        # TODO: uri is not the correct name for this\n        if received_hash != uri:\n            context.abort(\n                code=grpc.StatusCode.DATA_LOSS,\n                details=f'HASH MANGLED, received: {received_hash} != expected digest: {uri}'\n            )\n        try:\n            with self.data_writer_lock:\n                _ = self.DW.data(schema_hash, data_digest=received_hash, data=received_data)  # returns saved_digests\n        except Exception as e:\n            context.abort(\n                code=grpc.StatusCode.INTERNAL,\n                details=(f'Exception Type: {type(e)} \\n'\n                         f'Exception Message: {e} \\n'\n                         f'Traceback: \\n {traceback.format_tb(e.__traceback__)}')\n            )\n        err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        reply = hangar_service_pb2.PushDataReply(error=err)\n        return reply\n
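\n    # NOTE (illustrative sketch): a client drives the three push RPCs above\n    # in a strict begin / stream / end order so the server-side ``DataWriter``\n    # context is always opened before data arrives and closed afterwards;\n    # ``stub`` and the request objects here are assumptions for illustration:\n    #\n    #     stub.PushBeginContext(begin_request)\n    #     try:\n    #         stub.PushData(iter(push_data_requests))  # chunked + blosc-compressed\n    #     finally:\n    #         stub.PushEndContext(end_request)\n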
\n    # ------------------------ Fetch Find Missing -----------------------------------\n\n    def FetchFindMissingCommits(self, request, context):\n        \"\"\"Determine commit digests existing on the server which are not present on the client.\n        \"\"\"\n        c_branch_name = request.branch.name\n        c_ordered_commits = request.commits\n\n        try:\n            s_history = summarize.list_history(\n                refenv=self.env.refenv,\n                branchenv=self.env.branchenv,\n                branch_name=c_branch_name)\n        except ValueError:\n            msg = f'BRANCH DOES NOT EXIST. Name: {c_branch_name}'\n            context.set_code(grpc.StatusCode.NOT_FOUND)\n            context.set_details(msg)\n            err = hangar_service_pb2.ErrorProto(code=5, message=msg)\n            reply = hangar_service_pb2.FindMissingCommitsReply(error=err)\n            return reply\n\n        s_orderset = set(s_history['order'])\n        c_orderset = set(c_ordered_commits)\n        c_missing = list(s_orderset.difference(c_orderset))   # only difference to PushFindMissingCommits\n\n        err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        brch = hangar_service_pb2.BranchRecord(name=c_branch_name, commit=s_history['head'])\n        reply = hangar_service_pb2.FindMissingCommitsReply(branch=brch, error=err)\n        if len(c_missing) > 0:\n            reply.commits.extend(c_missing)\n\n        return reply\n\n    def PushFindMissingCommits(self, request, context):\n        \"\"\"Determine commit digests existing on the client which are not present on the server.\n        \"\"\"\n        c_branch_name = request.branch.name\n        c_head_commit = request.branch.commit\n        c_ordered_commits = request.commits\n\n        s_commits = commiting.list_all_commits(self.env.refenv)\n        s_orderset = set(s_commits)\n        c_orderset = set(c_ordered_commits)\n        s_missing = list(c_orderset.difference(s_orderset))  # only difference to FetchFindMissingCommits\n\n        err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        brch = hangar_service_pb2.BranchRecord(name=c_branch_name, commit=c_head_commit)\n        reply = hangar_service_pb2.FindMissingCommitsReply(branch=brch, error=err)\n        if len(s_missing) > 0:\n            reply.commits.extend(s_missing)\n\n        return reply\n\n    def FetchFindMissingHashRecords(self, request_iterator, context):\n        \"\"\"Determine data tensor hash records existing on the server and not on the client.\n        \"\"\"\n        for idx, request in enumerate(request_iterator):\n            if idx == 0:\n                commit = request.commit\n                hBytes, offset = bytearray(request.total_byte_size), 0\n            size = len(request.hashs)\n            hBytes[offset: offset + size] = request.hashs\n            offset += size\n\n        uncompBytes = blosc.decompress(hBytes)\n        c_hashs_raw = chunks.deserialize_record_pack(uncompBytes)\n        c_hashset = set([chunks.deserialize_ident(raw).digest for raw in c_hashs_raw])\n\n        with tempfile.TemporaryDirectory() as tempD:\n            tmpDF = os.path.join(tempD, 'test.lmdb')\n            tmpDB = lmdb.open(path=tmpDF, **c.LMDB_SETTINGS)\n            commiting.unpack_commit_ref(self.env.refenv, tmpDB, commit)\n            s_hashes_schemas = queries.RecordQuery(tmpDB).data_hash_to_schema_hash()\n            s_hashes = set(s_hashes_schemas.keys())\n            tmpDB.close()\n\n        c_missing = list(s_hashes.difference(c_hashset))\n        
c_hash_schemas_raw = [chunks.serialize_ident(c_mis, s_hashes_schemas[c_mis]) for c_mis in c_missing]\n        raw_pack = chunks.serialize_record_pack(c_hash_schemas_raw)\n        err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        response_pb = hangar_service_pb2.FindMissingHashRecordsReply\n        cIter = chunks.missingHashIterator(commit, raw_pack, err, response_pb)\n        yield from cIter\n\n    def PushFindMissingHashRecords(self, request_iterator, context):\n        \"\"\"Determine data tensor hash records existing on the client and not on the server.\n        \"\"\"\n        for idx, request in enumerate(request_iterator):\n            if idx == 0:\n                commit = request.commit\n                hBytes, offset = bytearray(request.total_byte_size), 0\n            size = len(request.hashs)\n            hBytes[offset: offset + size] = request.hashs\n            offset += size\n\n        uncompBytes = blosc.decompress(hBytes)\n        c_hashs_raw = chunks.deserialize_record_pack(uncompBytes)\n        c_hashset = set([chunks.deserialize_ident(raw).digest for raw in c_hashs_raw])\n        s_hashset = set(hashs.HashQuery(self.env.hashenv).list_all_hash_keys_raw())\n        s_missing = c_hashset.difference(s_hashset)\n        s_hashs_raw = [chunks.serialize_ident(s_mis, '') for s_mis in s_missing]\n        raw_pack = chunks.serialize_record_pack(s_hashs_raw)\n\n        err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        response_pb = hangar_service_pb2.FindMissingHashRecordsReply\n        cIter = chunks.missingHashIterator(commit, raw_pack, err, response_pb)\n        yield from cIter\n\n    def FetchFindMissingSchemas(self, request, context):\n        \"\"\"Determine schema hash digest records existing on the server and not on the client.\n        \"\"\"\n        commit = request.commit\n        c_schemas = set(request.schema_digests)\n\n        with tempfile.TemporaryDirectory() as tempD:\n            tmpDF = os.path.join(tempD, 'test.lmdb')\n            tmpDB = lmdb.open(path=tmpDF, **c.LMDB_SETTINGS)\n            commiting.unpack_commit_ref(self.env.refenv, tmpDB, commit)\n            s_schemas = set(queries.RecordQuery(tmpDB).schema_hashes())\n            tmpDB.close()\n\n        c_missing = list(s_schemas.difference(c_schemas))\n        err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        reply = hangar_service_pb2.FindMissingSchemasReply(commit=commit, error=err)\n        reply.schema_digests.extend(c_missing)\n        return reply\n\n    def PushFindMissingSchemas(self, request, context):\n        \"\"\"Determine schema hash digest records existing on the client and not on the server.\n        \"\"\"\n        commit = request.commit\n        c_schemas = set(request.schema_digests)\n        s_schemas = set(hashs.HashQuery(self.env.hashenv).list_all_schema_digests())\n        s_missing = list(c_schemas.difference(s_schemas))\n\n        err = hangar_service_pb2.ErrorProto(code=0, message='OK')\n        reply = hangar_service_pb2.FindMissingSchemasReply(commit=commit, error=err)\n        reply.schema_digests.extend(s_missing)\n        return reply\n\n\ndef serve(hangar_path: str,\n          overwrite: bool = False,\n          *,\n          channel_address: str = None,\n          restrict_push: bool = None,\n          username: str = None,\n          password: str = None) -> tuple:\n    \"\"\"Start serving the GRPC server. 
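Should only be called once.\n\n    A typical invocation mirrors the ``__main__`` block at the bottom of this\n    module; a minimal sketch (the caller is responsible for starting the\n    returned ``grpc.Server``, and the path shown is illustrative):\n\n        >>> server, hangserv, address = serve('/path/to/repo')  # doctest: +SKIP\n        >>> server.start()  # doctest: +SKIP\n        >>> server.wait_for_termination()  # doctest: +SKIP\n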
\n    Raises:\n        OSError: if the requested ``channel_address`` cannot be bound\n            (i.e. the port is already in use).\n    \"\"\"\n\n    # ------------------- Configure Server ------------------------------------\n\n    server_dir = pjoin(hangar_path, c.DIR_HANGAR_SERVER)\n    CFG = server_config(server_dir, create=False)\n    serverCFG = CFG['SERVER_GRPC']\n    enable_compression = serverCFG['enable_compression']\n    if enable_compression == 'NoCompression':\n        compression_val = grpc.Compression.NoCompression\n    elif enable_compression == 'Deflate':\n        compression_val = grpc.Compression.Deflate\n    elif enable_compression == 'Gzip':\n        compression_val = grpc.Compression.Gzip\n    else:\n        compression_val = grpc.Compression.NoCompression\n\n    optimization_target = serverCFG['optimization_target']\n    if channel_address is None:\n        channel_address = serverCFG['channel_address']\n    max_thread_pool_workers = int(serverCFG['max_thread_pool_workers'])\n    max_concurrent_rpcs = int(serverCFG['max_concurrent_rpcs'])\n\n    adminCFG = CFG['SERVER_ADMIN']\n    if (restrict_push is None) and (username is None) and (password is None):\n        admin_restrict_push = bool(int(adminCFG['restrict_push']))\n        admin_username = adminCFG['username']\n        admin_password = adminCFG['password']\n    else:\n        admin_restrict_push = restrict_push\n        admin_username = username\n        admin_password = password\n    msg = 'PERMISSION ERROR: PUSH OPERATIONS RESTRICTED FOR CALLER'\n    code = grpc.StatusCode.PERMISSION_DENIED\n    interc = request_header_validator_interceptor.RequestHeaderValidatorInterceptor(\n        admin_restrict_push, admin_username, admin_password, code, msg)\n\n    # ---------------- Start the thread pool for the grpc server --------------\n\n    grpc_thread_pool = futures.ThreadPoolExecutor(\n        max_workers=max_thread_pool_workers,\n        thread_name_prefix='grpc_thread_pool')\n    server = grpc.server(\n        thread_pool=grpc_thread_pool,\n        maximum_concurrent_rpcs=max_concurrent_rpcs,\n        options=[('grpc.optimization_target', optimization_target)],\n        compression=compression_val,\n        interceptors=(interc,))\n\n    # ------------------- Start the GRPC server -------------------------------\n\n    hangserv = HangarServer(server_dir, overwrite)\n    hangar_service_pb2_grpc.add_HangarServiceServicer_to_server(hangserv, server)\n    port = server.add_insecure_port(channel_address)\n    if port == 0:\n        server.stop(0.1)\n        server.wait_for_termination(timeout=2)\n        raise OSError(f'Unable to bind port, address {channel_address} already in use.')\n    return (server, hangserv, channel_address)\n\n\nif __name__ == '__main__':\n    workdir = os.getcwd()\n    print(workdir)\n    serve(workdir)\n"
  },
  {
    "path": "src/hangar/remotes.py",
    "content": "import logging\nimport tempfile\nimport time\nimport warnings\nfrom collections import defaultdict\nfrom contextlib import closing\nfrom pathlib import Path\nfrom typing import (\n    List, NamedTuple, Optional, Sequence, Union, Tuple, Set, Dict\n)\n\nimport grpc\nimport lmdb\nfrom tqdm import tqdm\n\nfrom .backends import backend_decoder\nfrom .constants import LMDB_SETTINGS\nfrom .context import Environments\nfrom .records import hash_data_db_key_from_raw_key\nfrom .records import heads, queries, summarize\nfrom .records.commiting import (\n    check_commit_hash_in_history,\n    move_process_data_to_store,\n    unpack_commit_ref,\n)\nfrom .remote.client import HangarClient\nfrom .remote.content import ContentWriter, ContentReader, DataWriter\nfrom .txnctx import TxnRegister\nfrom .utils import is_suitable_user_key\n\nlogger = logging.getLogger(__name__)\n\nRemoteInfo = NamedTuple('RemoteInfo', [('name', str), ('address', str)])\n\nKeyType = Union[str, int]\n\n\nclass Remotes(object):\n    \"\"\"Class which governs access to remote interactor objects.\n\n    .. note::\n\n       The remote-server implementation is under heavy development, and is\n       likely to undergo changes in the future. While we intend to ensure\n       compatibility between software versions of Hangar repositories written\n       to disk, the API is likely to change. Please follow our process at:\n       https://www.github.com/tensorwerk/hangar-py\n\n    \"\"\"\n\n    def __init__(self, env: Environments):\n\n        self._env: Environments = env\n        self._repo_path: Path = self._env.repo_path\n        self._client: Optional[HangarClient] = None\n\n    def __verify_repo_initialized(self):\n        \"\"\"Internal method to verify repo initialized before operations occur\n\n        Raises\n        ------\n        RuntimeError\n            If the repository db environments have not been initialized at the\n            specified repo path.\n        \"\"\"\n        if not self._env.repo_is_initialized:\n            raise RuntimeError(\n                f'Path {self._repo_path} not Hangar Repo. Use `init_repo()` method')\n\n    def add(self, name: str, address: str) -> RemoteInfo:\n        \"\"\"Add a remote to the repository accessible by `name` at `address`.\n\n        Parameters\n        ----------\n        name\n            the name which should be used to refer to the remote server\n            (i.e. 'origin')\n        address\n            the IP:PORT where the hangar server is running\n\n        Returns\n        -------\n        RemoteInfo\n            Two-tuple containing (``name``, ``address``) of the remote added to\n            the client's server list.\n\n        Raises\n        ------\n        ValueError\n            If the provided name contains any non-ascii characters, or is\n            longer than 64 characters.\n        ValueError\n            If a remote with the provided name is already listed on this\n            client (the call is a no-op). In order to update a remote server\n            address, it must be removed and then re-added with the desired\n            address.\n        \"\"\"\n        self.__verify_repo_initialized()\n        if (not isinstance(name, str)) or (not is_suitable_user_key(name)):\n            raise ValueError(\n                f'Remote name {name} of type: {type(name)} invalid. Must be '\n                f'string with only alpha-numeric (or \".\" \"_\" \"-\") ascii characters. 
'\n                f'Must be <= 64 characters long.')\n\n        succ = heads.add_remote(self._env.branchenv, name=name, address=address)\n        if succ is False:\n            raise ValueError(f'No-Op: Remote named: {name} already exists.')\n        return RemoteInfo(name=name, address=address)\n\n    def remove(self, name: str) -> RemoteInfo:\n        \"\"\"Remove a remote repository from the branch records\n\n        Parameters\n        ----------\n        name\n            name of the remote to remove the reference to\n\n        Raises\n        ------\n        KeyError\n            If a remote with the provided name does not exist\n\n        Returns\n        -------\n        RemoteInfo\n            The channel address which was removed at the given remote name\n        \"\"\"\n        self.__verify_repo_initialized()\n        try:\n            address = heads.remove_remote(branchenv=self._env.branchenv, name=name)\n        except KeyError as e:\n            raise e\n        return RemoteInfo(name=name, address=address)\n\n    def list_all(self) -> List[RemoteInfo]:\n        \"\"\"List all remote names and addresses recorded in the client's repository.\n\n        Returns\n        -------\n        List[RemoteInfo]\n            list of namedtuple specifying (``name``, ``address``) for each\n            remote server recorded in the client repo.\n        \"\"\"\n        self.__verify_repo_initialized()\n        res = []\n        names = heads.get_remote_names(self._env.branchenv)\n        for name in names:\n            address = heads.get_remote_address(self._env.branchenv, name)\n            res.append(RemoteInfo(name=name, address=address))\n        return res\n\n    def ping(self, name: str) -> float:\n        \"\"\"Ping remote server and check the round trip time.\n\n        Parameters\n        ----------\n        name\n            name of the remote server to ping\n\n        Returns\n        -------\n        float\n            round trip time it took to ping the server after the connection was\n            established and requested client configuration was retrieved\n\n        Raises\n        ------\n        KeyError\n            If no remote with the provided name is recorded.\n        ConnectionError\n            If the remote server could not be reached.\n        \"\"\"\n        self.__verify_repo_initialized()\n        address = heads.get_remote_address(branchenv=self._env.branchenv, name=name)\n        self._client = HangarClient(envs=self._env, address=address)\n        with closing(self._client) as client:\n            client: HangarClient\n            start = time.time()\n            client.ping_pong()\n            elapsed = time.time() - start\n        return elapsed\n\n    def fetch(self, remote: str, branch: str) -> str:\n        \"\"\"Retrieve new commits made on a remote repository branch.\n\n        This is semantically identical to a `git fetch` command. Any new commits\n        along the branch will be retrieved, but placed on an isolated branch to\n        the local copy (ie. ``remote_name/branch_name``). In order to unify\n        histories, simply merge the remote branch into the local branch.\n\n        Parameters\n        ----------\n        remote\n            name of the remote repository to fetch from (ie. 
``origin``)\n        branch\n            name of the branch to fetch the commit references for.\n\n        Returns\n        -------\n        str\n            Name of the branch which stores the retrieved commits.\n        \"\"\"\n        self.__verify_repo_initialized()\n        address = heads.get_remote_address(self._env.branchenv, name=remote)\n        self._client = HangarClient(envs=self._env, address=address)\n        CW = ContentWriter(self._env)\n\n        with closing(self._client) as client:\n            client: HangarClient\n\n            # ----------------- setup / validate operations -------------------\n\n            try:\n                cHEAD = heads.get_branch_head_commit(self._env.branchenv, branch)\n            except ValueError:\n                # branch does not exist on local client\n                try:\n                    s_branch = client.fetch_branch_record(branch)\n                    sHEAD = s_branch.rec.commit\n                except grpc.RpcError as rpc_error:\n                    if rpc_error.code() == grpc.StatusCode.NOT_FOUND:\n                        # branch does not exist on remote\n                        logger.error(rpc_error.details())\n                    raise rpc_error\n            else:\n                c_bhistory = summarize.list_history(\n                    self._env.refenv, self._env.branchenv, branch_name=branch)\n                try:\n                    s_branch = client.fetch_branch_record(branch)\n                    sHEAD = s_branch.rec.commit\n                except grpc.RpcError as rpc_error:\n                    if rpc_error.code() == grpc.StatusCode.NOT_FOUND:\n                        # branch does not exist on remote\n                        logger.error(rpc_error.details())\n                    raise rpc_error\n\n                # verify histories are intact and should be synced\n                if sHEAD == cHEAD:\n                    warnings.warn(f'NoOp:  {sHEAD} == client HEAD {cHEAD}', UserWarning)\n                    return branch\n                elif sHEAD in c_bhistory['order']:\n                    warnings.warn(\n                        f'REJECTED: remote HEAD: {sHEAD} behind local: {cHEAD}', UserWarning)\n                    return branch\n\n            # ------------------- get data ------------------------------------\n\n            mCmtResponse = client.fetch_find_missing_commits(branch)\n            m_cmts = mCmtResponse.commits\n            for commit in tqdm(m_cmts, desc='fetching commit data refs'):\n                mSchemaResponse = client.fetch_find_missing_schemas(commit)\n                for schema in mSchemaResponse.schema_digests:\n                    schema_hash, schemaVal = client.fetch_schema(schema)\n                    CW.schema(schema_hash, schemaVal)\n                # Record missing data hash digests (does not get data itself)\n                m_hashes = client.fetch_find_missing_hash_records(commit)\n                m_schema_hash_map = defaultdict(list)\n                for digest, schema_hash in m_hashes:\n                    m_schema_hash_map[schema_hash].append((digest, schema_hash))\n\n                DW = DataWriter(self._env)\n                with DW as DW_CM:\n                    for schema_hash, m_digests_schemas in m_schema_hash_map.items():\n                        for data_digest, data_schema_hash in m_digests_schemas:\n                            DW_CM.data(schema_hash,\n                                       data_digest=data_digest,\n                                       
data=data_schema_hash,\n                                       backend='50')\n\n            # Get missing commit reference specification\n            for commit in tqdm(m_cmts, desc='fetching commit spec'):\n                cmt, parentVal, specVal, refVal = client.fetch_commit_record(commit)\n                CW.commit(cmt, parentVal, specVal, refVal)\n\n            # --------------------------- At completion -----------------------\n\n            # Update (or create) remote branch pointer with new HEAD commit\n            fetchBranchName = f'{remote}/{branch}'\n            try:\n                heads.create_branch(\n                    self._env.branchenv, name=fetchBranchName, base_commit=sHEAD)\n            except ValueError:\n                heads.set_branch_head_commit(\n                    self._env.branchenv, branch_name=fetchBranchName, commit_hash=sHEAD)\n\n            return fetchBranchName\n\n    def fetch_data_sample(self,\n                          remote: str,\n                          column: str,\n                          samples: Union[KeyType, Sequence[KeyType],\n                                         Sequence[Union[Tuple[KeyType, KeyType], Tuple[KeyType], KeyType]]],\n                          branch: Optional[str] = None,\n                          commit: Optional[str] = None) -> str:\n        \"\"\"Granular fetch data operation allowing selection of individual samples.\n\n        .. warning::\n\n            This is a specialized version of the :meth:`fetch_data` method, for\n            use in situations where some prior knowledge about the data is\n            available. Most users should prefer :meth:`fetch_data` over this\n            version.\n\n        In some cases, it may be desirable to only perform a fetch data operation\n        for some particular samples within a column (without needing to download any\n        other data contained in the column). This method allows for the granular\n        specification of keys to fetch in a certain column at the selected `branch` /\n        `commit` time point.\n\n        Parameters\n        ----------\n        remote\n            name of the remote server to pull data from\n        column\n            name of the column which data is being fetched from.\n        samples\n            Key, or sequence of sample keys to select.\n\n            *  Flat column layouts should provide just a single key, or flat sequence of\n               keys which will be fetched from the server. ie. `sample1` OR\n               [`sample1`, `sample2`, `sample3`, etc.]\n\n            *  Nested column layouts can provide tuples specifying `(sample, subsample)`\n               records to retrieve, tuples with an `Ellipsis` character in the `subsample`\n               index `(sample, ...)` (which will fetch all subsamples for the given sample),\n               or can provide lone sample keys in the sequences `sample` (which will also fetch\n               all subsamples listed under the sample) OR ANY COMBINATION of the above.\n
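\n            For instance, a hypothetical call against a nested column might pass\n            ``samples=['s1', ('s2', 'sub1'), ('s3', ...)]`` to select every\n            subsample of ``s1``, the single ``('s2', 'sub1')`` record, and every\n            subsample of ``s3`` (all keys shown are illustrative).\n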
        branch\n            branch head to operate on, either ``branch`` or ``commit`` argument must be\n            passed, but NOT both. Default is ``None``\n        commit\n            commit to operate on, either `branch` or `commit` argument must be passed,\n            but NOT both.\n\n        Returns\n        -------\n        str\n            On success, the commit hash which data was fetched into.\n        \"\"\"\n        self.__verify_repo_initialized()\n        address = heads.get_remote_address(branchenv=self._env.branchenv, name=remote)\n        self._client = HangarClient(envs=self._env, address=address)\n\n        # ----------------- setup / validate operations -----------------------\n\n        if all([branch, commit]):\n            raise ValueError('``branch`` and ``commit`` args cannot be set simultaneously')\n        if branch is not None:\n            cmt = heads.get_branch_head_commit(self._env.branchenv, branch_name=branch)\n        else:\n            cmt = commit\n            cmtExist = check_commit_hash_in_history(self._env.refenv, commit)\n            if not cmtExist:\n                raise ValueError(f'specified commit: {commit} does not exist in the repo.')\n\n        if not isinstance(samples, (list, tuple)):\n            samples = (samples,)\n\n        # ------------------ Determine which data to fetch --------------------\n\n        with tempfile.TemporaryDirectory() as tempD:\n            # share unpacked ref db between dependent methods\n            tmpDF = Path(tempD, 'test.lmdb')\n            tmpDB = lmdb.open(path=str(tmpDF), **LMDB_SETTINGS)\n            try:\n                with tmpDB.begin(write=True) as txn:\n                    with txn.cursor() as curs:\n                        notEmpty = curs.first()\n                        while notEmpty:\n                            notEmpty = curs.delete()\n                unpack_commit_ref(self._env.refenv, tmpDB, cmt)\n                recQuery = queries.RecordQuery(tmpDB)\n                selectedDataRecords = self._select_digests_fetch_data_sample(\n                    cmt=cmt, column=column, recQuery=recQuery, samples=samples\n                )\n            finally:\n                tmpDB.close()\n\n            m_schema_hash_map = self._form_missing_schema_digest_map(\n                selectedDataRecords=selectedDataRecords, hashenv=self._env.hashenv\n            )\n\n            # -------------------- download missing data --------------------------\n\n            DW = DataWriter(self._env)\n            total_data = sum(len(v) for v in m_schema_hash_map.values())\n            with closing(self._client) as client, \\\n                    tqdm(total=total_data, desc='fetching data') as pbar, \\\n                    DW as DW_CM:\n                client: HangarClient  # type hint\n                for schema in m_schema_hash_map.keys():\n                    hashes = set(m_schema_hash_map[schema])\n                    origins = client.fetch_data_origin(hashes)\n                    client.fetch_data(\n                        origins=origins,\n                        datawriter_cm=DW_CM,\n                        schema=schema,\n                        pbar=pbar)\n\n            move_process_data_to_store(self._repo_path, remote_operation=True)\n            return cmt\n\n    @staticmethod\n    def _select_digests_fetch_data_sample(\n            cmt: str,\n            column: str,\n            recQuery: queries.RecordQuery,\n            samples: Union[KeyType, Sequence[KeyType],\n                           Sequence[Union[Tuple[KeyType, KeyType], Tuple[KeyType], KeyType]]]\n    ) -> Set[queries.DataRecordVal]:\n        \"\"\"Map sample keys to data record digests.\n\n        Depending on column 
layout, the mapping of samples -> digests\n        is handled differently.\n\n        \"flat\" columns:\n            There is a direct map of sample key -> digest. If a sample\n            does not exist in the column, it is a key error.\n        \"nested\" columns:\n            There is a layered mapping of sample key -> subsamples -> digests.\n            We take the approach that only specifying a sample key results\n            in fetching all subsamples contained under it.\n\n        Parameters\n        ----------\n        cmt\n            commit which is being operated on\n        column\n            column name\n        recQuery\n            record query object set up with necessary `dataenv`\n        samples\n            specified samples to query\n\n        Returns\n        -------\n        Set[queries.DataRecordVal]\n            data records which should be fetched (includes digests)\n        \"\"\"\n        # handle column_names option\n        cmt_column_names = recQuery.column_names()\n        if column not in cmt_column_names:\n            raise KeyError(f'column name {column} does not exist in repo at commit {cmt}')\n\n        selectedDataRecords = set()\n        column_layout = recQuery.column_schema_layout(column=column)\n        if column_layout == 'flat':\n            sampleRecords = {}\n            for keyRecord, dataRecord in recQuery.column_data_records(column):\n                sampleRecords[keyRecord.sample] = dataRecord\n            for _key in samples:\n                if isinstance(_key, (str, int)):\n                    selectedDataRecords.add(sampleRecords[_key])\n                else:\n                    raise TypeError(_key)\n\n        elif column_layout == 'nested':\n            sampleRecords = defaultdict(dict)\n            for keyRecord, dataRecord in recQuery.column_data_records(column):\n                sampleRecords[keyRecord.sample].update(\n                    {keyRecord.subsample: dataRecord}\n                )\n\n            for _key in samples:\n                if isinstance(_key, (list, tuple)):\n                    if len(_key) == 2:\n                        # sequence specifying `(sample, subsample)`\n                        if _key[1] == Ellipsis:\n                            # Ellipsis indicator ``...`` is interpreted as:\n                            # \"get all subsamples under this sample key\"\n                            for _spec in sampleRecords[_key[0]].values():\n                                selectedDataRecords.add(_spec)\n                        else:\n                            # otherwise \"get sample + subsample named as specified\"\n                            selectedDataRecords.add(sampleRecords[_key[0]][_key[1]])\n                    elif len(_key) == 1:\n                        # sequence specifying `(sample,)` interpreted as:\n                        # \"get all subsamples under this key\"\n                        for _spec in sampleRecords[_key[0]].values():\n                            selectedDataRecords.add(_spec)\n                    else:\n                        raise ValueError(\n                            f'nested column specifier sequence len() must be '\n                            f'either length ``1`` or ``2``. 
key {_key} has length '\n                            f'{len(_key)}.')\n                elif isinstance(_key, (str, int)):\n                    # if not sequence, then `key` == `sample`; interpreted as:\n                    # \"get all subsamples under this key\"\n                    for _spec in sampleRecords[_key].values():\n                        selectedDataRecords.add(_spec)\n                else:\n                    raise TypeError(_key)\n        return selectedDataRecords\n\n    def fetch_data(self,\n                   remote: str,\n                   branch: str = None,\n                   commit: str = None,\n                   *,\n                   column_names: Optional[Sequence[str]] = None,\n                   retrieve_all_history: bool = False) -> List[str]:\n        \"\"\"Retrieve the data for some commit which exists in a `partial` state.\n\n        Parameters\n        ----------\n        remote\n            name of the remote to pull the data from\n        branch\n            The name of a branch whose HEAD will be used as the data fetch\n            point. If None, ``commit`` argument expected, by default None\n        commit\n            Commit hash to retrieve data for. If None, ``branch`` argument\n            expected, by default None\n        column_names\n            Names of the columns which should be retrieved for the particular\n            commits, any columns not named will not have their data fetched\n            from the server. Default behavior is to retrieve all columns\n        retrieve_all_history\n            if data should be retrieved for all history accessible by the parents\n            of this commit HEAD. by default False\n\n        Returns\n        -------\n        List[str]\n            commit hashes for which data was retrieved.\n\n        Raises\n        ------\n        ValueError\n            if branch and commit args are set simultaneously.\n        ValueError\n            if specified commit does not exist in the repository.\n        ValueError\n            if branch name does not exist in the repository.\n
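\n        Examples\n        --------\n        A hypothetical retrieval of all data referenced at the ``master``\n        branch HEAD (remote name, branch name, and the returned digest are\n        illustrative):\n\n            >>> repo.remotes.fetch_data('origin', branch='master')  # doctest: +SKIP\n            ['e1f6a1...']\n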
        \"\"\"\n        self.__verify_repo_initialized()\n        address = heads.get_remote_address(branchenv=self._env.branchenv, name=remote)\n        self._client = HangarClient(envs=self._env, address=address)\n\n        # ----------------- setup / validate operations -----------------------\n\n        if all([branch, commit]):\n            raise ValueError('``branch`` and ``commit`` args cannot be set simultaneously')\n        if branch is not None:\n            cmt = heads.get_branch_head_commit(self._env.branchenv, branch_name=branch)\n        else:\n            cmt = commit\n            cmtExist = check_commit_hash_in_history(self._env.refenv, commit)\n            if not cmtExist:\n                raise ValueError(f'specified commit: {commit} does not exist in the repo.')\n\n        # --------------- negotiate missing data to get -----------------------\n\n        if retrieve_all_history is True:\n            hist = summarize.list_history(self._env.refenv, self._env.branchenv, commit_hash=cmt)\n            commits = hist['order']\n        else:\n            commits = [cmt]\n\n        with tempfile.TemporaryDirectory() as tempD:\n            # share unpacked ref db between dependent methods\n            tmpDF = Path(tempD, 'test.lmdb')\n            tmpDB = lmdb.open(path=str(tmpDF), **LMDB_SETTINGS)\n\n            try:\n                # accumulate record selections across every commit requested\n                selectedDataRecords = set()\n                for commit in tqdm(commits, desc='counting objects'):\n                    with tmpDB.begin(write=True) as txn:\n                        with txn.cursor() as curs:\n                            notEmpty = curs.first()\n                            while notEmpty:\n                                notEmpty = curs.delete()\n                    unpack_commit_ref(self._env.refenv, tmpDB, commit)\n                    recQuery = queries.RecordQuery(tmpDB)\n                    commitDataRecords = self._select_digest_fetch_data(\n                        column_names=column_names, recQuery=recQuery\n                    )\n                    selectedDataRecords.update(commitDataRecords)\n            finally:\n                tmpDB.close()\n\n        m_schema_hash_map = self._form_missing_schema_digest_map(\n            selectedDataRecords=selectedDataRecords, hashenv=self._env.hashenv\n        )\n\n        # -------------------- download missing data --------------------------\n\n        DW = DataWriter(self._env)\n        total_data = sum(len(v) for v in m_schema_hash_map.values())\n\n        with closing(self._client) as client, \\\n                tqdm(total=total_data, desc='fetching data') as pbar, \\\n                DW as DW_CM:\n            client: HangarClient  # type hint\n            for schema in m_schema_hash_map.keys():\n                hashes = set(m_schema_hash_map[schema])\n                origins = client.fetch_data_origin(hashes)\n                client.fetch_data(\n                    origins=origins,\n                    datawriter_cm=DW_CM,\n                    schema=schema,\n                    pbar=pbar)\n\n        move_process_data_to_store(self._repo_path, remote_operation=True)\n        return commits\n\n    @staticmethod\n    def _form_missing_schema_digest_map(\n            selectedDataRecords: Set[queries.DataRecordVal],\n            hashenv: lmdb.Environment\n    ) -> Dict[str, List[str]]:\n        \"\"\"Calculate mapping of schemas to data digests.\n\n        Parameters\n        ----------\n        selectedDataRecords\n            data records (including digests) which have been selected for transfer.\n        hashenv\n            lmdb environment where the hash records are stored.\n\n        Returns\n        -------\n        Dict[str, List[str]]\n            map of all schema digests -> sequence of all data hash digests\n            registered under that schema.\n        \"\"\"\n\n        try:\n            hashTxn = TxnRegister().begin_reader_txn(hashenv)\n            m_schema_hash_map = defaultdict(list)\n            for hashVal in selectedDataRecords:\n                hashKey = hash_data_db_key_from_raw_key(hashVal.digest)\n                hashRef = hashTxn.get(hashKey)\n                be_loc = backend_decoder(hashRef)\n                if be_loc.backend == '50':\n                    m_schema_hash_map[be_loc.schema_hash].append(hashVal.digest)\n        finally:\n            TxnRegister().abort_reader_txn(hashenv)\n        return m_schema_hash_map\n\n    @staticmethod\n    def _select_digest_fetch_data(\n            column_names: Union[None, Sequence[str]],\n            recQuery: queries.RecordQuery\n    ) -> Set[queries.DataRecordVal]:\n        \"\"\"Map column names to data digests.\n\n        Parameters\n        ----------\n        column_names\n            column names to fetch data for. 
If ``None``, download all column data.\n        recQuery\n            initialized record query object set up with appropriate ``dataenv``.\n\n        Returns\n        -------\n        Set[queries.DataRecordVal]\n            data records which should be fetched (includes digests)\n        \"\"\"\n        selectedDataRecords = set()\n        cmt_column_names = recQuery.column_names()\n        if column_names is None:\n            # handle column_names option\n            cmt_columns = cmt_column_names\n        else:\n            cmt_columns = [col for col in column_names if col in cmt_column_names]\n        for col in cmt_columns:\n            cmtData_hashs = recQuery.column_data_hashes(col)\n            selectedDataRecords.update(cmtData_hashs)\n        return selectedDataRecords\n\n    def push(self, remote: str, branch: str,\n             *, username: str = '', password: str = '') -> str:\n        \"\"\"Push changes made on a local repository to a remote repository.\n\n        This method is semantically identical to a ``git push`` operation.\n        Any local updates will be sent to the remote repository.\n\n        .. note::\n\n            The current implementation is not capable of performing a\n            ``force push`` operation. As such, remote branches whose history\n            has diverged from the local repo must be retrieved, locally\n            merged, and then re-pushed. This feature will be added in the near\n            future.\n\n        Parameters\n        ----------\n        remote\n            name of the remote repository to make the push on.\n        branch\n            Name of the branch to push to the remote. If the branch name does\n            not exist on the remote, it will be created.\n        username\n            credentials to use for authentication if repository push restrictions\n            are enabled, by default ''.\n        password\n            credentials to use for authentication if repository push restrictions\n            are enabled, by default ''.\n\n        Returns\n        -------\n        str\n            Name of the branch which was pushed\n
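\n        Examples\n        --------\n        A hypothetical push of the local ``master`` branch to a previously\n        added remote (names are illustrative):\n\n            >>> repo.remotes.push('origin', 'master')  # doctest: +SKIP\n            'master'\n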
        \"\"\"\n        self.__verify_repo_initialized()\n        try:\n            address = heads.get_remote_address(self._env.branchenv, name=remote)\n            cHEAD = heads.get_branch_head_commit(self._env.branchenv, branch)\n        except (KeyError, ValueError) as e:\n            raise e from None\n\n        CR = ContentReader(self._env)\n        self._client = HangarClient(envs=self._env,\n                                    address=address,\n                                    auth_username=username,\n                                    auth_password=password)\n\n        # ----------------- setup / validate operations -------------------\n\n        with closing(self._client) as client:\n            client: HangarClient  # type hinting for development\n            CR: ContentReader\n            c_bhistory = summarize.list_history(refenv=self._env.refenv,\n                                                branchenv=self._env.branchenv,\n                                                branch_name=branch)\n            try:\n                s_branch = client.fetch_branch_record(branch)\n            except grpc.RpcError as rpc_error:\n                # Do not raise if error due to branch not existing on server\n                if rpc_error.code() != grpc.StatusCode.NOT_FOUND:\n                    raise rpc_error\n            else:\n                sHEAD = s_branch.rec.commit\n                if sHEAD == cHEAD:\n                    warnings.warn(\n                        f'NoOp: server HEAD: {sHEAD} == client HEAD: {cHEAD}', UserWarning)\n                    return branch\n                elif (sHEAD not in c_bhistory['order']) and (sHEAD != ''):\n                    warnings.warn(\n                        'REJECTED: server branch has commits not on client', UserWarning)\n                    return branch\n\n            # --------------- negotiate missing data to send -------------------\n\n            try:\n                # First push op verifies user permissions if push restricted (NOT SECURE)\n                res = client.push_find_missing_commits(branch)\n                m_commits = res.commits\n            except grpc.RpcError as rpc_error:\n                if rpc_error.code() == grpc.StatusCode.PERMISSION_DENIED:\n                    raise PermissionError(f'{rpc_error.code()}: {rpc_error.details()}')\n                else:\n                    raise rpc_error\n\n            m_schemas = set()\n            m_schema_hashs = defaultdict(set)\n            with tempfile.TemporaryDirectory() as tempD:\n                tmpDF = Path(tempD, 'test.lmdb')\n                tmpDB = lmdb.open(path=str(tmpDF), **LMDB_SETTINGS)\n                for commit in tqdm(m_commits, desc='counting objects'):\n                    # share unpacked ref db between dependent methods\n                    with tmpDB.begin(write=True) as txn:\n                        with txn.cursor() as curs:\n                            notEmpty = curs.first()\n                            while notEmpty:\n                                notEmpty = curs.delete()\n                    unpack_commit_ref(self._env.refenv, tmpDB, commit)\n                    # schemas\n                    schema_res = client.push_find_missing_schemas(commit, tmpDB=tmpDB)\n                    m_schemas.update(schema_res.schema_digests)\n                    # data hashs\n                    m_cmt_schema_hashs = defaultdict(list)\n                    mis_hashes_sch = client.push_find_missing_hash_records(commit, tmpDB=tmpDB)\n                    for hsh, schema in mis_hashes_sch:\n                        m_cmt_schema_hashs[schema].append(hsh)\n                    for schema, hashes in m_cmt_schema_hashs.items():\n                        m_schema_hashs[schema].update(hashes)\n                tmpDB.close()\n\n            # ------------------------- send data -----------------------------\n\n            # schemas\n            for m_schema in tqdm(m_schemas, desc='pushing schemas'):\n                schemaVal = CR.schema(m_schema)\n                if not schemaVal:\n                    raise KeyError(f'no schema with hash: {m_schema} exists')\n                client.push_schema(m_schema, schemaVal)\n            # data\n            total_data = sum([len(v) for v in m_schema_hashs.values()])\n            with tqdm(total=total_data, desc='pushing data') as p:\n                client.push_data_begin_context()\n                try:\n                    for dataSchema, dataHashes in m_schema_hashs.items():\n                        client.push_data(dataSchema, dataHashes, pbar=p)\n                        p.update(1)\n                finally:\n                    client.push_data_end_context()\n            # commit refs\n            for commit in tqdm(m_commits, desc='pushing commit refs'):\n                cmtContent = CR.commit(commit)\n                if not cmtContent:\n                    raise KeyError(f'no commit with hash: {commit} exists')\n
client.push_commit_record(commit=cmtContent.commit,\n                                          parentVal=cmtContent.cmtParentVal,\n                                          specVal=cmtContent.cmtSpecVal,\n                                          refVal=cmtContent.cmtRefVal)\n\n            # --------------------------- At completion -----------------------\n\n            # update local remote HEAD pointer\n            branchHead = heads.get_branch_head_commit(self._env.branchenv, branch)\n            try:\n                client.push_branch_record(branch, branchHead)\n            except grpc.RpcError as rpc_error:\n                # Do not raise if error is due to branch record already existing on server\n                if rpc_error.code() == grpc.StatusCode.ALREADY_EXISTS:\n                    logger.warning(f'CODE: {rpc_error.code()} DETAILS: {rpc_error.details()}')\n                else:\n                    raise rpc_error\n            else:\n                cRemoteBranch = f'{remote}/{branch}'\n                if cRemoteBranch not in heads.get_branch_names(self._env.branchenv):\n                    heads.create_branch(branchenv=self._env.branchenv,\n                                        name=cRemoteBranch,\n                                        base_commit=branchHead)\n                else:\n                    heads.set_branch_head_commit(branchenv=self._env.branchenv,\n                                                 branch_name=cRemoteBranch,\n                                                 commit_hash=branchHead)\n            return branch\n"
  },
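  {
    "path": "examples/remote_push_sketch.py",
    "content": "\"\"\"Illustrative usage sketch -- this file is not part of the Hangar source tree.\n\nShows one plausible way to publish local commits with the ``push`` method\ndocumented in ``src/hangar/remotes.py`` above. The repository path, remote\nname, server address, and branch are hypothetical placeholder values.\n\"\"\"\nfrom hangar import Repository\n\n# assumes an initialized repository with at least one commit exists here\nrepo = Repository('foo/path/to/dir')\n\n# register the remote once; the name and address are made-up values\nrepo.remote.add(name='origin', address='localhost:50051')\n\n# credentials are only needed when the server enables push restrictions\npushed = repo.remote.push('origin', 'master', username='', password='')\nprint(f'pushed branch: {pushed}')\n"
  },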
  {
    "path": "src/hangar/repository.py",
    "content": "from pathlib import Path\nimport weakref\nimport warnings\nfrom typing import Union, Optional, List\nfrom io import StringIO\n\nfrom .merger import select_merge_algorithm\nfrom .constants import DIR_HANGAR\nfrom .remotes import Remotes\nfrom .context import Environments\nfrom .diagnostics import ecosystem, integrity\nfrom .records import heads, parsing, summarize, vcompat, commiting\nfrom .checkout import ReaderCheckout, WriterCheckout\nfrom .diff import DiffAndConflicts, ReaderUserDiff\nfrom .utils import (\n    is_valid_directory_path,\n    is_suitable_user_key,\n    is_ascii,\n    folder_size,\n    format_bytes\n)\n\n\nclass Repository(object):\n    \"\"\"Launching point for all user operations in a Hangar repository.\n\n    All interaction, including the ability to initialize a repo, checkout a\n    commit (for either reading or writing), create a branch, merge branches, or\n    generally view the contents or state of the local repository starts here.\n    Just provide this class instance with a path to an existing Hangar\n    repository, or to a directory one should be initialized, and all required\n    data for starting your work on the repo will automatically be populated.\n\n        >>> from hangar import Repository\n        >>> repo = Repository('foo/path/to/dir')\n\n    Parameters\n    ----------\n    path : Union[str, os.PathLike]\n        local directory path where the Hangar repository exists (or initialized)\n    exists : bool, optional\n        True if a Hangar repository should exist at the given directory path.\n        Should no Hangar repository exists at that location, a UserWarning will\n        be raised indicating that the :meth:`init` method needs to be called.\n\n        False if the provided path does not need to (but optionally can) contain a\n        Hangar repository.  if a Hangar repository does not exist at that path, the\n        usual UserWarning will be suppressed.\n\n        In both cases, the path must exist and the user must have sufficient OS\n        permissions to write to that location. Default = True\n    \"\"\"\n\n    def __init__(self, path: Union[str, Path], exists: bool = True):\n\n        if isinstance(path, (str, bytes)):\n            path = Path(path)\n\n        try:\n            usr_path = is_valid_directory_path(path)\n        except (TypeError, NotADirectoryError, PermissionError) as e:\n            raise e from None\n\n        repo_pth = usr_path.joinpath(DIR_HANGAR)\n        if exists is False:\n            with warnings.catch_warnings():\n                warnings.simplefilter('ignore', UserWarning)\n                envs = Environments(pth=repo_pth)\n        else:\n            envs = Environments(pth=repo_pth)\n\n        self._repo_path: Path = repo_pth\n        self._env: Environments = envs\n        self._remote: Remotes = Remotes(self._env)\n\n    def _repr_pretty_(self, p, cycle):\n        \"\"\"provide a pretty-printed repr for ipython based user interaction.\n\n        Parameters\n        ----------\n        p : printer\n            io stream printer type object which is provided via ipython\n        cycle : bool\n            if the pretty-printer detects a cycle or infinite loop. 
Not a\n            concern here since we just output the text and return, no looping\n            required.\n\n        \"\"\"\n        self.__verify_repo_initialized()\n        res = f'Hangar {self.__class__.__name__}\\\n               \\n    Repository Path  : {self.path}\\\n               \\n    Writer-Lock Free : {heads.writer_lock_held(self._env.branchenv)}\\n'\n        p.text(res)\n\n    def __repr__(self):\n        \"\"\"Override the default repr to show useful information to developers.\n\n        Note: the pprint repr (ipython enabled) is separately defined in\n        :py:meth:`_repr_pretty_`. We specialize because we assume that anyone\n        operating in a terminal-based interpreter is probably a more advanced\n        developer-type, and expects traditional repr information instead of a\n        user facing summary of the repo. Though if we're wrong, go ahead and\n        feel free to reassign the attribute :) won't hurt our feelings, promise.\n\n        Returns\n        -------\n        string\n            formatted representation of the object\n        \"\"\"\n        res = f'{self.__class__}(path={self._repo_path})'\n        return res\n\n    def __verify_repo_initialized(self):\n        \"\"\"Internal method to verify repo initialized before operations occur\n\n        Raises\n        ------\n        RuntimeError\n            If the repository db environments have not been initialized at the\n            specified repo path.\n        \"\"\"\n        if not self._env.repo_is_initialized:\n            msg = f'Repository at path: {self._repo_path} has not been initialized. '\\\n                  f'Please run the `init_repo()` function'\n            raise RuntimeError(msg)\n\n    @property\n    def remote(self) -> Remotes:\n        \"\"\"Accessor to the methods controlling remote interactions.\n\n        .. seealso::\n\n           :class:`Remotes` for available methods of this property\n\n        Returns\n        -------\n        Remotes\n            Accessor object methods for controlling remote interactions.\n        \"\"\"\n        proxy = weakref.proxy(self._remote)\n        return proxy\n\n    @property\n    def path(self) -> str:\n        \"\"\"Return the path to the repository on disk, read-only attribute\n\n        Returns\n        -------\n        str\n            path to the specified repository, not including `.hangar` directory\n        \"\"\"\n        self.__verify_repo_initialized()\n        return str(self._repo_path.parent)\n\n    @property\n    def writer_lock_held(self) -> bool:\n        \"\"\"Check if the writer lock is currently marked as held. 
Read-only attribute.\n\n        Returns\n        -------\n        bool\n            True if writer-lock is held, False if writer-lock is free.\n        \"\"\"\n        self.__verify_repo_initialized()\n        return not heads.writer_lock_held(self._env.branchenv)\n\n    @property\n    def version(self) -> str:\n        \"\"\"Find the version of Hangar software the repository is written with\n\n        Returns\n        -------\n        str\n            semantic major.minor.micro version of the repo software.\n        \"\"\"\n        self.__verify_repo_initialized()\n        res = vcompat.get_repository_software_version_spec(self._env.branchenv)\n        return str(res)\n\n    @property\n    def initialized(self) -> bool:\n        \"\"\"\n        Check if the repository has been initialized or not\n\n        Returns\n        -------\n        bool\n            True if repository has been initialized.\n        \"\"\"\n        return self._env.repo_is_initialized\n\n    @property\n    def size_nbytes(self) -> int:\n        \"\"\"Disk space used by the repository returned in number of bytes.\n\n            >>> repo.size_nbytes\n            1234567890\n            >>> print(type(repo.size_nbytes))\n            <class 'int'>\n\n        Returns\n        -------\n        int\n            number of bytes used by the repository on disk.\n        \"\"\"\n        self.__verify_repo_initialized()\n        return folder_size(self._repo_path, recurse=True)\n\n    @property\n    def size_human(self) -> str:\n        \"\"\"Disk space used by the repository returned as a human readable string.\n\n            >>> repo.size_human\n            '1.23 GB'\n            >>> print(type(repo.size_human))\n            <class 'str'>\n\n        Returns\n        -------\n        str\n            disk space used by the repository formatted in human readable text.\n        \"\"\"\n        self.__verify_repo_initialized()\n        nbytes = folder_size(self._repo_path, recurse=True)\n        return format_bytes(nbytes)\n\n    def checkout(self,\n                 write: bool = False,\n                 *,\n                 branch: str = '',\n                 commit: str = '') -> Union[ReaderCheckout, WriterCheckout]:\n        \"\"\"Checkout the repo at some point in time in either `read` or `write` mode.\n\n        Only one writer instance can exist at a time. A write enabled checkout\n        must create a staging area from the ``HEAD`` commit of a branch. By\n        contrast, any number of reader checkouts can exist at the same time\n        and can specify either a branch name or a commit hash.\n\n        Parameters\n        ----------\n        write : bool, optional\n            Specify if the checkout is write capable, defaults to False\n        branch : str, optional\n            name of the branch to checkout. This utilizes the state of the repo\n            as it existed at the branch ``HEAD`` commit when this checkout object\n            was instantiated, defaults to ''\n        commit : str, optional\n            specific hash of a commit to use for the checkout (instead of a\n            branch ``HEAD`` commit). This argument takes precedence over a branch\n            name parameter if it is set. 
Note: this only will be used in\n            non-writeable checkouts, defaults to ''\n\n        Raises\n        ------\n        ValueError\n            If the value of `write` argument is not boolean\n        ValueError\n            If ``commit`` argument is set to any value when ``write=True``.\n            Only ``branch`` argument is allowed.\n\n        Returns\n        -------\n        Union[ReaderCheckout, WriterCheckout]\n            Checkout object which can be used to interact with the repository\n            data\n        \"\"\"\n        self.__verify_repo_initialized()\n        try:\n            if write is True:\n                if commit != '':\n                    raise ValueError(\n                        f'Only `branch` argument can be set if `write=True`. '\n                        f'Setting `commit={commit}` not allowed.')\n                if branch == '':\n                    branch = heads.get_staging_branch_head(self._env.branchenv)\n                co = WriterCheckout(\n                    repo_pth=self._repo_path,\n                    branch_name=branch,\n                    hashenv=self._env.hashenv,\n                    refenv=self._env.refenv,\n                    stageenv=self._env.stageenv,\n                    branchenv=self._env.branchenv,\n                    stagehashenv=self._env.stagehashenv)\n                return co\n            elif write is False:\n                commit_hash = self._env.checkout_commit(\n                    branch_name=branch, commit=commit)\n                co = ReaderCheckout(\n                    base_path=self._repo_path,\n                    dataenv=self._env.cmtenv[commit_hash],\n                    hashenv=self._env.hashenv,\n                    branchenv=self._env.branchenv,\n                    refenv=self._env.refenv,\n                    commit=commit_hash)\n                return co\n            else:\n                raise ValueError(\"Argument `write` only takes True or False as value\")\n        except (RuntimeError, ValueError) as e:\n            raise e from None\n\n    def clone(self, user_name: str, user_email: str, remote_address: str,\n              *, remove_old: bool = False) -> str:\n        \"\"\"Download a remote repository to the local disk.\n\n        The clone method implemented here is very similar to a `git clone`\n        operation. This method will pull all commit records, history, and data\n        which are parents of the remote's `master` branch head commit. If a\n        :class:`Repository` exists at the specified directory,\n        the operation will fail.\n\n        Parameters\n        ----------\n        user_name : str\n            Name of the person who will make commits to the repository. This\n            information is recorded permanently in the commit records.\n        user_email : str\n            Email address of the repository user. This information is recorded\n            permanently in any commits created.\n        remote_address : str\n            location where the\n            :class:`hangar.remote.server.HangarServer` process is\n            running and accessible by the clone user.\n        remove_old : bool, optional, kwarg only\n            DANGER! DEVELOPMENT USE ONLY! If enabled, a\n            :class:`hangar.repository.Repository` existing on disk at the same\n            path as the requested clone location will be completely removed and\n            replaced with the newly cloned repo. 
(the default is False, which\n            will not modify any contents on disk and which will refuse to create\n            a repository at a given location if one already exists there.)\n\n        Returns\n        -------\n        str\n            Name of the master branch for the newly cloned repository.\n        \"\"\"\n        self.init(user_name=user_name, user_email=user_email, remove_old=remove_old)\n        self._remote.add(name='origin', address=remote_address)\n        branch = self._remote.fetch(remote='origin', branch='master')\n        HEAD = heads.get_branch_head_commit(self._env.branchenv, branch_name=branch)\n        heads.set_branch_head_commit(self._env.branchenv, 'master', HEAD)\n        with warnings.catch_warnings(record=False):\n            warnings.simplefilter('ignore', category=UserWarning)\n            co = self.checkout(write=True, branch='master')\n            co.reset_staging_area()\n            co.close()\n        return 'master'\n\n    def init(self,\n             user_name: str,\n             user_email: str,\n             *,\n             remove_old: bool = False) -> str:\n        \"\"\"Initialize a Hangar repository at the specified directory path.\n\n        This function must be called before a checkout can be performed.\n\n        Parameters\n        ----------\n        user_name : str\n            Name of the repository user account.\n        user_email : str\n            Email address of the repository user account.\n        remove_old : bool, kwarg-only\n            DEVELOPER USE ONLY -- remove and reinitialize a Hangar\n            repository at the given path. Default = False\n\n        Returns\n        -------\n        str\n            the full directory path where the Hangar repository was\n            initialized on disk.\n        \"\"\"\n        pth = self._env.init_repo(user_name=user_name,\n                                  user_email=user_email,\n                                  remove_old=remove_old)\n        return str(pth)\n\n    def log(self,\n            branch: str = None,\n            commit: str = None,\n            *,\n            return_contents: bool = False,\n            show_time: bool = False,\n            show_user: bool = False) -> Optional[dict]:\n        \"\"\"Displays a pretty-printed commit log graph to the terminal.\n\n        .. note::\n\n            For programmatic access, the return_contents value can be set to true\n            which will retrieve relevant commit specifications as dictionary\n            elements.\n\n        Parameters\n        ----------\n        branch : str, optional\n            The name of the branch to start the log process from. (Default value\n            = None)\n        commit : str, optional\n            The commit hash to start the log process from. 
(Default value = None)\n        return_contents : bool, optional, kwarg only\n            If true, return the commit graph specifications in a dictionary\n            suitable for programmatic access/evaluation.\n        show_time : bool, optional, kwarg only\n            If true and return_contents is False, show the time of each commit\n            on the printed log graph\n        show_user : bool, optional, kwarg only\n            If true and return_contents is False, show the committer of each\n            commit on the printed log graph\n\n        Returns\n        -------\n        Optional[dict]\n            Dict containing the commit ancestor graph, and all specifications.\n        \"\"\"\n        self.__verify_repo_initialized()\n        res = summarize.log(branchenv=self._env.branchenv,\n                            refenv=self._env.refenv,\n                            branch=branch,\n                            commit=commit,\n                            return_contents=return_contents,\n                            show_time=show_time,\n                            show_user=show_user)\n        return res\n\n    def summary(self, *, branch: str = '', commit: str = '') -> None:\n        \"\"\"Print a summary of the repository contents to the terminal\n\n        Parameters\n        ----------\n        branch : str, optional\n            A specific branch name whose head commit will be used as the summary\n            point (Default value = '')\n        commit : str, optional\n            A specific commit hash which should be used as the summary point.\n            (Default value = '')\n        \"\"\"\n        self.__verify_repo_initialized()\n        try:\n            ppbuf = summarize.summary(self._env, branch=branch, commit=commit)\n        except ValueError:\n            if commiting.number_commits_recorded(self._env.refenv) == 0:\n                ppbuf = StringIO()\n                ppbuf.write('No commits have been made in the repository. 
\\n')\n                ppbuf.write('Please make a commit and try again.')\n            else:\n                raise\n        print(ppbuf.getvalue())\n        return None\n\n    def _details(self, *, line_limit=100, line_length=100) -> None:  # pragma: no cover\n        \"\"\"DEVELOPER USE ONLY: Dump some details about the underlying db structure to the terminal.\n        \"\"\"\n        print(summarize.details(\n            self._env.branchenv, line_limit=line_limit, line_length=line_length).getvalue())\n        print(summarize.details(\n            self._env.refenv, line_limit=line_limit, line_length=line_length).getvalue())\n        print(summarize.details(\n            self._env.hashenv, line_limit=line_limit, line_length=line_length).getvalue())\n        print(summarize.details(\n            self._env.stageenv, line_limit=line_limit, line_length=line_length).getvalue())\n        print(summarize.details(\n            self._env.stagehashenv, line_limit=line_limit, line_length=line_length).getvalue())\n        for commit, commitenv in self._env.cmtenv.items():\n            print(summarize.details(\n                commitenv, line_limit=line_limit, line_length=line_length).getvalue())\n        return\n\n    def _ecosystem_details(self) -> dict:\n        \"\"\"DEVELOPER USE ONLY: log and return package versions on the system.\n        \"\"\"\n        eco = ecosystem.get_versions()\n        return eco\n\n    def diff(self, master: str, dev: str) -> DiffAndConflicts:\n        \"\"\"Calculate diff between master and dev branch/commits.\n\n        Diff is calculated as if we are to merge \"dev\" into \"master\"\n\n        Parameters\n        ----------\n        master: str\n            branch name or commit hash digest to use as the \"master\" against\n            which changes made in \"dev\" are compared.\n        dev: str\n            branch name or commit hash digest to use as the \"dev\"\n            (ie. 
\"feature\") branch which changes have been made to\n            which are to be compared to the contents of \"master\".\n\n        Returns\n        -------\n        DiffAndConflicts\n            Standard output diff structure.\n        \"\"\"\n        current_branches = self.list_branches()\n\n        # assert branch / commit specified by \"master\" exists and\n        # standardize into \"digest\" rather than \"branch name\" arg type\n        if master in current_branches:\n            masterHEAD = heads.get_branch_head_commit(\n                branchenv=self._env.branchenv, branch_name=master)\n        else:\n            cmtExists = commiting.check_commit_hash_in_history(\n                refenv=self._env.refenv, commit_hash=master)\n            if not cmtExists:\n                raise ValueError(f'`master` {master} is not valid branch/commit.')\n            masterHEAD = master\n\n        # same check & transform for \"dev\" branch/commit arg.\n        if dev in current_branches:\n            devHEAD = heads.get_branch_head_commit(\n                branchenv=self._env.branchenv, branch_name=dev)\n        else:\n            cmtExists = commiting.check_commit_hash_in_history(\n                refenv=self._env.refenv, commit_hash=dev)\n            if not cmtExists:\n                raise ValueError(f'`dev` {dev} is not valid branch/commit.')\n            devHEAD = dev\n\n        # create differ object and generate results...\n        diff = ReaderUserDiff(commit_hash=masterHEAD,\n                              branchenv=self._env.branchenv,\n                              refenv=self._env.refenv)\n        res = diff.commit(dev_commit_hash=devHEAD)\n        return res\n\n    def merge(self, message: str, master_branch: str, dev_branch: str) -> str:\n        \"\"\"Perform a merge of the changes made on two branches.\n\n        Parameters\n        ----------\n        message: str\n            Commit message to use for this merge.\n        master_branch : str\n            name of the master branch to merge into\n        dev_branch : str\n            name of the dev/feature branch to merge\n\n        Returns\n        -------\n        str\n            Hash of the commit which is written if possible.\n        \"\"\"\n        self.__verify_repo_initialized()\n        commit_hash = select_merge_algorithm(\n            message=message,\n            branchenv=self._env.branchenv,\n            stageenv=self._env.stageenv,\n            refenv=self._env.refenv,\n            stagehashenv=self._env.stagehashenv,\n            master_branch=master_branch,\n            dev_branch=dev_branch,\n            repo_path=self._repo_path)\n\n        return commit_hash\n\n    def create_branch(self, name: str, base_commit: str = None) -> heads.BranchHead:\n        \"\"\"create a branch with the provided name from a certain commit.\n\n        If no base commit hash is specified, the current writer branch ``HEAD``\n        commit is used as the ``base_commit`` hash for the branch. Note that\n        creating a branch does not actually create a checkout object for\n        interaction with the data. 
To interact you must use the repository\n        checkout method to properly initialize a read (or write) enabled\n        checkout object.\n\n            >>> from hangar import Repository\n            >>> repo = Repository('foo/path/to/dir')\n\n            >>> repo.create_branch('testbranch')\n                BranchHead(name='testbranch', digest='b66b...a8cc')\n            >>> repo.list_branches()\n                ['master', 'testbranch']\n            >>> co = repo.checkout(write=True, branch='testbranch')\n            >>> # add data ...\n            >>> newDigest = co.commit('added some stuff')\n\n            >>> repo.create_branch('new-changes', base_commit=newDigest)\n                BranchHead(name='new-changes', digest='35kd...3254')\n            >>> repo.list_branches()\n                ['master', 'new-changes', 'testbranch']\n\n        Parameters\n        ----------\n        name : str\n            name to assign to the new branch\n        base_commit : str, optional\n            commit hash to start the branch root at. If not specified, the\n            writer branch ``HEAD`` commit at the time of execution will be used,\n            defaults to None\n\n        Returns\n        -------\n        :class:`~.heads.BranchHead`\n            NamedTuple[str, str] with fields for ``name`` and ``digest`` of the\n            branch created (if the operation was successful)\n\n        Raises\n        ------\n        ValueError\n            If the branch name provided contains characters outside of alpha-numeric\n            ascii characters and \".\", \"_\", \"-\" (no whitespace), or is > 64 characters.\n        ValueError\n            If the branch already exists.\n        RuntimeError\n            If the repository does not have at least one commit on the \"default\"\n            (ie. ``master``) branch.\n        \"\"\"\n        self.__verify_repo_initialized()\n        if (not is_ascii(name)) or (not is_suitable_user_key(name)):\n            err = ValueError(\n                f'Branch name provided: {name} invalid. Must contain only alpha-numeric '\n                'or \".\" \"_\" \"-\" ascii characters, and be <= 64 characters.')\n            raise err from None\n        createdBranch = heads.create_branch(\n            branchenv=self._env.branchenv,\n            name=name,\n            base_commit=base_commit)\n        return createdBranch\n\n    def remove_branch(self, name: str, *, force_delete: bool = False) -> heads.BranchHead:\n        \"\"\"Permanently delete a branch pointer from the repository history.\n\n        Since a branch (by definition) is the name associated with the HEAD\n        commit of a historical path, the default behavior of this method is to\n        throw an exception (no-op) should the ``HEAD`` not be referenced as an\n        ancestor (or at least as a twin) of a separate branch which is\n        currently *ALIVE*. 
If referenced in another branch's history, we are\n        assured that all changes have been merged and recorded, and that this\n        pointer can be safely deleted without risk of damage to historical\n        provenance or (eventual) loss to garbage collection.\n\n            >>> from hangar import Repository\n            >>> repo = Repository('foo/path/to/dir')\n\n            >>> repo.create_branch('first-testbranch')\n            BranchHead(name='first-testbranch', digest='9785...56da')\n            >>> repo.create_branch('second-testbranch')\n            BranchHead(name='second-testbranch', digest='9785...56da')\n            >>> repo.list_branches()\n            ['master', 'first-testbranch', 'second-testbranch']\n            >>> # Make a commit to advance a branch\n            >>> co = repo.checkout(write=True, branch='first-testbranch')\n            >>> # add data ...\n            >>> co.commit('added some stuff')\n            '3l253la5hna3k3a553256nak35hq5q534kq35532'\n            >>> co.close()\n\n            >>> repo.remove_branch('second-testbranch')\n            BranchHead(name='second-testbranch', digest='9785...56da')\n\n        A user may manually specify to delete an un-merged branch, in which\n        case the ``force_delete`` keyword-only argument should be set to\n        ``True``.\n\n            >>> # check out master and try to remove 'first-testbranch'\n            >>> co = repo.checkout(write=True, branch='master')\n            >>> co.close()\n\n            >>> repo.remove_branch('first-testbranch')\n            Traceback (most recent call last):\n                ...\n            RuntimeError: (\"The branch first-testbranch is not fully merged. \"\n            \"If you are sure you want to delete it, re-run with \"\n            \"force-remove parameter set.\")\n            >>> # Now set the `force_delete` parameter\n            >>> repo.remove_branch('first-testbranch', force_delete=True)\n            BranchHead(name='first-testbranch', digest='9785...56da')\n\n        It is important to note that *while this method will handle all safety\n        checks and argument validation, and perform the operation to permanently\n        delete a branch name/digest pointer, **no commit refs along the history\n        will be deleted from the Hangar database**.* Most of the history contains\n        commit refs which must remain intact for other branch histories, and recent\n        commits may have been used as the base for some new history. As such, even\n        if some of the latest commits leading up to a deleted branch ``HEAD`` are\n        orphaned (unreachable), the records (and all data added in those commits)\n        will remain on the disk.\n\n        In the future, we intend to implement a garbage collector which will remove\n        orphan commits which have not been modified for some set amount of time\n        (probably on the order of a few months), but this is not implemented at the\n        moment.\n\n        Should an accidental forced branch deletion occur, *it is possible to\n        recover* and create a new branch head pointing to the same commit. If\n        the commit digest of the removed branch ``HEAD`` is known, it's as simple as\n        specifying a name and the ``base_commit`` in the normal\n        :meth:`create_branch` method. 
If the digest is unknown, it will be a\n        bit more work, but some of the developer facing introspection tools /\n        routines could be used to either manually or (with minimal effort)\n        programmatically find the orphan commit candidates. If you find\n        yourself having accidentally deleted a branch, and must get it back,\n        please reach out on the `Github Issues\n        <https://github.com/tensorwerk/hangar-py/issues>`__ page. We'll gladly\n        explain more in depth and walk you through the process in any way we\n        can help!\n\n        Parameters\n        ----------\n        name : str\n            name of the branch which should be deleted. This branch must exist, and\n            cannot refer to a remote tracked branch (ie. origin/devbranch), please\n            see exception descriptions for other parameters determining validity of\n            argument\n        force_delete : bool, optional\n            If True, remove the branch pointer even if the changes are un-merged in\n            other branch histories. May result in orphaned commits which may be\n            time-consuming to recover if needed, by default False\n\n        Returns\n        -------\n        :class:`~.heads.BranchHead`\n            NamedTuple[str, str] with fields for `name` and `digest` of the branch\n            pointer deleted.\n\n        Raises\n        ------\n        ValueError\n            If a branch with the provided name does not exist locally\n        PermissionError\n            If removal of the branch would result in a repository with zero local\n            branches.\n        PermissionError\n            If a write enabled checkout is holding the writer-lock at time of this\n            call.\n        PermissionError\n            If the branch to be removed was the last used in a write-enabled\n            checkout, and whose contents form the base of the staging area.\n        RuntimeError\n            If the branch has not been fully merged into other branch histories,\n            and ``force_delete`` option is not ``True``.\n        \"\"\"\n        self.__verify_repo_initialized()\n        res = heads.remove_branch(branchenv=self._env.branchenv,\n                                  refenv=self._env.refenv,\n                                  name=name,\n                                  force_delete=force_delete)\n        return res\n\n    def list_branches(self) -> List[str]:\n        \"\"\"list all branch names created in the repository.\n\n        Returns\n        -------\n        List[str]\n            the branch names recorded in the repository\n        \"\"\"\n        self.__verify_repo_initialized()\n        branches = heads.get_branch_names(self._env.branchenv)\n        return branches\n\n    def verify_repo_integrity(self) -> bool:\n        \"\"\"Verify the integrity of the repository data on disk.\n\n        Runs a full cryptographic verification of repository contents in order\n        to ensure the integrity of all data and history recorded on disk.\n\n        .. note::\n\n            This proof may take a significant amount of time to run for\n            repositories which:\n\n            1. store significant quantities of data on disk.\n            2. have a very large number of commits in their history.\n\n            As a brief explanation for why these are the driving factors behind\n            processing time:\n\n            1. 
Every single piece of data in the repository's history must be read\n               from disk, cryptographically hashed, and compared to the expected\n               value. There is no exception to this rule, regardless of when a piece\n               of data was added / removed from a column, or for how many (or how\n               few) commits some sample exists in. The integrity of the commit tree at\n               any point after some piece of data is added to the repo can only be\n               validated if it - and all earlier data pieces - are proven to be intact\n               and unchanged.\n\n               Note: This does not mean that the verification is repeatedly\n               performed for every commit some piece of data is stored in. Each\n               data piece is read from disk and verified only once, regardless of\n               how many commits some piece of data is referenced in.\n\n            2. Each commit reference (defining names / contents of a commit) must be\n               decompressed and parsed into a usable data structure. We scan across\n               all data digests referenced in the commit and ensure that the\n               corresponding data piece is known to hangar (and validated as\n               unchanged). The commit refs (along with the corresponding user records,\n               message, and parent map), are then re-serialized and cryptographically\n               hashed for comparison to the expected value. While this process is\n               fairly efficient for a single commit, it must be repeated for each\n               commit in the repository history, and may take a non-trivial amount of\n               time for repositories with thousands of commits.\n\n        While the two points above are the most time consuming operations,\n        there are many more checks which are performed alongside them as part\n        of the full verification run.\n\n        Returns\n        -------\n        bool\n            True if integrity verification is successful, otherwise False; in\n            this case, a message describing the offending component will be\n            printed to stdout.\n        \"\"\"\n        self.__verify_repo_initialized()\n        heads.acquire_writer_lock(self._env.branchenv, 'VERIFY_PROCESS')\n        try:\n            integrity.run_verification(\n                branchenv=self._env.branchenv,\n                hashenv=self._env.hashenv,\n                refenv=self._env.refenv,\n                repo_path=self._env.repo_path)\n        finally:\n            heads.release_writer_lock(self._env.branchenv, 'VERIFY_PROCESS')\n        return True\n\n    def force_release_writer_lock(self) -> bool:\n        \"\"\"Force release the lock left behind by an unclosed writer-checkout\n\n        .. warning::\n\n            *NEVER USE THIS METHOD IF A WRITER PROCESS IS CURRENTLY ACTIVE.* At the time\n            of writing, the implications of improper/malicious use of this are not\n            understood, and there is a risk of undefined behavior or (potentially)\n            data corruption.\n\n            At the moment, the responsibility to close a write-enabled checkout is\n            placed entirely on the user. If the `close()` method is not called\n            before the program terminates, a new checkout with write=True will fail.\n            The lock can only be released via a call to this method.\n\n        .. 
note::\n\n            This entire mechanism is subject to review/replacement in the future.\n\n        Returns\n        -------\n        bool\n            True if the operation was successful.\n        \"\"\"\n        self.__verify_repo_initialized()\n        forceReleaseSentinal = parsing.repo_writer_lock_force_release_sentinal()\n        success = heads.release_writer_lock(self._env.branchenv, forceReleaseSentinal)\n        return success\n"
  },
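  {
    "path": "examples/branching_workflow_sketch.py",
    "content": "\"\"\"Illustrative usage sketch -- this file is not part of the Hangar source tree.\n\nWalks through the create_branch / checkout / merge workflow documented in\n``src/hangar/repository.py`` above. The directory path, branch names, and\ncommit message are hypothetical placeholder values.\n\"\"\"\nfrom hangar import Repository\n\n# assumes a repository with at least one commit on `master` exists here\nrepo = Repository('foo/path/to/dir')\n\nrepo.create_branch('testbranch')\nco = repo.checkout(write=True, branch='testbranch')\ntry:\n    # ... add or update column data through the checkout object here ...\n    digest = co.commit('added some stuff')\nfinally:\n    co.close()\n\n# diff then merge the feature branch back into master\nprint(repo.diff('master', 'testbranch'))\nmerge_digest = repo.merge('merge testbranch', 'master', 'testbranch')\nprint(repo.list_branches())\nrepo.log(branch='master')\n"
  },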
  {
    "path": "src/hangar/txnctx.py",
    "content": "from collections import Counter\nfrom typing import MutableMapping\n\nimport lmdb\n\n\nclass TxnRegisterSingleton(type):\n    _instances = {}\n    def __call__(cls, *args, **kwargs):\n        if cls not in cls._instances:\n            cls._instances[cls] = super(TxnRegisterSingleton, cls).__call__(*args, **kwargs)\n        return cls._instances[cls]\n\n\nclass TxnRegister(metaclass=TxnRegisterSingleton):\n    \"\"\"Singleton to manage transaction thread safety in lmdb databases.\n\n    This is essentailly a reference counting transaction register, lots of room\n    for improvement here.\n    \"\"\"\n\n    def __init__(self):\n        self.WriterAncestors = Counter()\n        self.ReaderAncestors = Counter()\n        self.WriterTxn: MutableMapping[lmdb.Environment, lmdb.Transaction] = {}\n        self.ReaderTxn: MutableMapping[lmdb.Environment, lmdb.Transaction] = {}\n\n    @property\n    def _debug_(self):  # pragma: no cover\n        return {\n            '__class__': self.__class__,\n            'WriterAncestors': self.WriterAncestors,\n            'ReaderAncestors': self.ReaderAncestors,\n            'WriterTxn': self.WriterTxn,\n            'ReaderTxn': self.ReaderTxn,\n        }\n\n    def begin_writer_txn(self, lmdbenv: lmdb.Environment,\n                         buffer: bool = False) -> lmdb.Transaction:\n        \"\"\"Start a write enabled transaction on the given environment\n\n        If multiple write transactions are requested for the same handle, only\n        one instance of the transaction handle will be returened, and will not\n        close until all operations on that handle have requested to close\n\n        Parameters\n        ----------\n        lmdbenv : lmdb.Environment\n            the environment to open the transaction on\n        buffer : bool, optional\n            if buffer objects should be used (the default is False, which does\n            not use buffers)\n\n        Returns\n        -------\n        lmdb.Transaction\n            transaction handle to perform operations on\n        \"\"\"\n        if self.WriterAncestors[lmdbenv] == 0:\n            self.WriterTxn[lmdbenv] = lmdbenv.begin(write=True, buffers=buffer)\n        self.WriterAncestors[lmdbenv] += 1\n        return self.WriterTxn[lmdbenv]\n\n    def begin_reader_txn(self, lmdbenv: lmdb.Environment,\n                         buffer: bool = False) -> lmdb.Transaction:\n        \"\"\"Start a reader only txn for the given environment\n\n        If there a read-only transaction for the same environment already exists\n        then the same reader txn handle will be returned, and will not close\n        until all operations on that handle have said they are finished.\n\n        Parameters\n        ----------\n        lmdbenv : lmdb.Environment\n            the environment to start the transaction in.\n        buffer : bool, optional\n            weather a buffer transaction should be used (the default is False,\n            which means no buffers are returned)\n\n        Returns\n        -------\n        lmdb.Transaction\n            handle to the lmdb transaction.\n        \"\"\"\n        if self.ReaderAncestors[lmdbenv] == 0:\n            self.ReaderTxn[lmdbenv] = lmdbenv.begin(write=False, buffers=buffer)\n        self.ReaderAncestors[lmdbenv] += 1\n        return self.ReaderTxn[lmdbenv]\n\n    def commit_writer_txn(self, lmdbenv: lmdb.Environment) -> bool:\n        \"\"\"Commit changes made in a write-enable transaction handle\n\n        As multiple objects can have references to the 
same open transaction handle,\n        the data is not actually committed until all open transactions have called\n        the commit method.\n\n        Parameters\n        ----------\n        lmdbenv : lmdb.Environment\n            the environment handle used to open the transaction\n\n        Raises\n        ------\n        RuntimeError\n            If the internal reference counting gets out of sync\n\n        Returns\n        -------\n        bool\n            True if this operation actually committed, otherwise False\n            if other objects have references to the same (open) handle\n        \"\"\"\n        ancestors = self.WriterAncestors[lmdbenv]\n        if ancestors == 0:\n            msg = f'hash ancestors are zero but commit called on {lmdbenv}'\n            raise RuntimeError(msg)\n        elif ancestors == 1:\n            self.WriterTxn[lmdbenv].commit()\n            del self.WriterTxn[lmdbenv]\n            ret = True\n        else:\n            ret = False\n        self.WriterAncestors[lmdbenv] -= 1\n        return ret\n\n    def abort_reader_txn(self, lmdbenv: lmdb.Environment) -> bool:\n        \"\"\"Request to close a read-only transaction handle\n\n        As multiple objects can have references to the same open transaction\n        handle, the transaction is not actually aborted until all open transactions\n        have called the abort method.\n\n        Parameters\n        ----------\n        lmdbenv : lmdb.Environment\n            the environment handle used to open the transaction\n\n        Raises\n        ------\n        RuntimeError\n            If the internal reference counting gets out of sync.\n\n        Returns\n        -------\n        bool\n            True if this operation actually aborted the transaction,\n            otherwise False if other objects have references to the same (open)\n            handle.\n        \"\"\"\n        ancestors = self.ReaderAncestors[lmdbenv]\n        if ancestors == 0:\n            raise RuntimeError('hash ancestors are zero but abort called')\n        elif ancestors == 1:\n            self.ReaderTxn[lmdbenv].abort()\n            del self.ReaderTxn[lmdbenv]\n            ret = True\n        else:\n            ret = False\n        self.ReaderAncestors[lmdbenv] -= 1\n        return ret\n"
  },
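  {
    "path": "examples/txn_register_sketch.py",
    "content": "\"\"\"Illustrative usage sketch -- this file is not part of the Hangar source tree.\n\nDemonstrates the reference counted transaction handles provided by\n``TxnRegister`` in ``src/hangar/txnctx.py`` above. The lmdb database path is\na hypothetical placeholder value.\n\"\"\"\nimport lmdb\n\nfrom hangar.txnctx import TxnRegister\n\nenv = lmdb.open('/tmp/txn-register-demo.lmdb')\nregister = TxnRegister()  # singleton metaclass: always the same instance\nassert register is TxnRegister()\n\n# nested begin calls hand back the same underlying write transaction\ntxn_a = register.begin_writer_txn(env)\ntxn_b = register.begin_writer_txn(env)\nassert txn_a is txn_b\n\ntxn_a.put(b'key', b'value')\n\n# data is only committed when the last open reference commits\nassert register.commit_writer_txn(env) is False  # refcount 2 -> 1\nassert register.commit_writer_txn(env) is True   # refcount 1 -> 0, committed\nenv.close()\n"
  },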
  {
    "path": "src/hangar/typesystem/__init__.py",
    "content": "from .descriptors import (\n    Descriptor, OneOf, DictItems, EmptyDict, SizedIntegerTuple, checkedmeta\n)\nfrom .ndarray import NdarrayVariableShape, NdarrayFixedShape\nfrom .pystring import StringVariableShape\nfrom .pybytes import BytesVariableShape\n\n__all__ = [\n    'Descriptor', 'OneOf', 'DictItems', 'EmptyDict', 'SizedIntegerTuple',\n    'checkedmeta', 'NdarrayVariableShape', 'NdarrayFixedShape',\n    'StringVariableShape', 'BytesVariableShape'\n]\n"
  },
  {
    "path": "src/hangar/typesystem/base.py",
    "content": "from .descriptors import OneOf, String, checkedmeta\nfrom ..records import hash_func_from_tcode\n\n\n@OneOf(['flat', 'nested'])\nclass ColumnLayout(String):\n    pass\n\n\n@OneOf(['str', 'ndarray', 'bytes'])\nclass ColumnDType(String):\n    pass\n\n\n@OneOf(['1'])\nclass SchemaHasherTcode(String):\n    pass\n\n\nclass ColumnBase(metaclass=checkedmeta):\n    _column_layout = ColumnLayout()\n    _column_type = ColumnDType()\n    _schema_hasher_tcode = SchemaHasherTcode()\n\n    def __init__(\n            self,\n            column_layout,\n            column_type,\n            data_hasher_tcode,\n            schema_hasher_tcode=None,\n            *args, **kwargs\n    ):\n        if schema_hasher_tcode is None:\n            schema_hasher_tcode = '1'\n\n        self._column_layout = column_layout\n        self._column_type = column_type\n        self._schema_hasher_tcode = schema_hasher_tcode\n        self._data_hasher_tcode = data_hasher_tcode\n        self._schema_attributes = [\n            '_column_layout',\n            '_column_type',\n            '_schema_hasher_tcode',\n            '_data_hasher_tcode',\n        ]\n        self._schema_hasher_func = hash_func_from_tcode(self._schema_hasher_tcode)\n        self._data_hasher_func = hash_func_from_tcode(self._data_hasher_tcode)\n        self._hidden_be_opts = None\n\n    @property\n    def _beopts(self):\n        from ..backends import BACKEND_OPTIONS_MAP\n        if self._hidden_be_opts is None:\n            self._hidden_be_opts = BACKEND_OPTIONS_MAP[self.backend](\n                backend_options=self.backend_options,\n                dtype=self.dtype,\n                shape=(self.shape if hasattr(self, '_shape') else None))\n        return self._hidden_be_opts\n\n    @_beopts.deleter\n    def _beopts(self):\n        self._hidden_be_opts = None\n\n    @_beopts.setter\n    def _beopts(self, backend_options):\n        from ..backends import BACKEND_OPTIONS_MAP\n        self._hidden_be_opts = BACKEND_OPTIONS_MAP[self.backend](\n            backend_options=backend_options,\n            dtype=self.dtype,\n            shape=(self.shape if hasattr(self, '_shape') else None))\n\n    @property\n    def column_layout(self):\n        return self._column_layout\n\n    @property\n    def column_type(self):\n        return self._column_type\n\n    @property\n    def schema_hasher_tcode(self):\n        return self._schema_hasher_tcode\n\n    @property\n    def schema(self):\n        schema_dict = {}\n        public_attr_names = [attr.lstrip('_') for attr in self._schema_attributes]\n        for attr in public_attr_names:\n            schema_dict[attr] = getattr(self, f'_{attr}')\n        return schema_dict\n\n    def schema_hash_digest(self):\n        return self._schema_hasher_func(self.schema)\n\n    def backend_from_heuristics(self, *args, **kwargs):\n        raise NotImplementedError\n\n    def verify_data_compatible(self, *args, **kwargs):\n        raise NotImplementedError\n\n    @property\n    def data_hasher_tcode(self):\n        return self._data_hasher_tcode\n\n    def data_hash_digest(self, *args, **kwargs):\n        raise NotImplementedError\n\n"
  },
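  {
    "path": "examples/column_schema_sketch.py",
    "content": "\"\"\"Illustrative usage sketch -- this file is not part of the Hangar source tree.\n\nShows how ``ColumnBase`` (``src/hangar/typesystem/base.py`` above) assembles\nits ``schema`` dict from the underscore-prefixed attributes registered in\n``_schema_attributes``. ``DemoSchema`` is a throwaway subclass, not a real\nHangar column schema; the tcode values mirror those used elsewhere in the\npackage but are assumptions here.\n\"\"\"\nfrom hangar.typesystem.base import ColumnBase\n\n\nclass DemoSchema(ColumnBase):\n    \"\"\"Minimal concrete subclass relying entirely on ColumnBase plumbing.\"\"\"\n    pass\n\n\n# tcode '0' selects the ndarray data hasher; '1' is the only schema hasher\ndemo = DemoSchema(column_layout='flat', column_type='ndarray',\n                  data_hasher_tcode='0')\n\n# keys are the registered attribute names with the leading underscore stripped\nprint(demo.schema)\nprint(demo.schema_hash_digest())\n"
  },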
  {
    "path": "src/hangar/typesystem/descriptors.py",
    "content": "\"\"\"\nPortions of this code have been taken and modified from the book:\n\nBeazley, D. and B. K. Jones (2013). Python Cookbook, O’Reilly Media, Inc.\n\nChapter: 8.13. Implementing a Data Model or Type System\n\n===============================================================================\n\nProblem\n-------\n\nYou want to define various kinds of data structures, but want to enforce\nconstraints on the values that are allowed to be assigned to certain\nattributes.\n\nSolution\n--------\n\nIn this problem, you are basically faced with the task of placing checks or\nassertions on the setting of certain instance attributes. To do this, you need\nto customize the setting of attributes on a per-attribute basis. To do this,\nyou should use descriptors.\n\nThis recipe involves a number of advanced techniques, including descriptors,\nmixin classes, the use of super(), class decorators, and metaclasses. Covering\nthe basics of all those topics is beyond what can be covered here; However,\nthere are a number of subtle points worth noting.\n\nFirst, in the Descriptor base class, you will notice that there is a __set__()\nmethod, but no corresponding __get__(). If a descriptor will do nothing more\nthan extract an identically named value from the underlying instance\ndictionary, defining __get__() is unnecessary. In fact, defining __get__() will\njust make it run slower. Thus, this recipe only focuses on the implementation\nof __set__().\n\nThe overall design of the various descriptor classes is based on mixin classes.\nFor example, the Unsigned and MaxSized classes are meant to be mixed with the\nother descriptor classes derived from Typed. To handle a specific kind of data\ntype, multiple inheritance is used to combine the desired functionality.\n\nYou will also notice that all __init__() methods of the various descriptors\nhave been programmed to have an identical signature involving keyword arguments\n**opts. The class for MaxSized looks for its required attribute in opts, but\nsimply passes it along to the Descriptor base class, which actually sets it.\nOne tricky part about composing classes like this (especially mixins), is that\nyou don’t always know how the classes are going to be chained together or what\nsuper() will invoke. For this reason, you need to make it work with any\npossible combination of classes.\n\nThe definitions of the various type classes such as Integer, Float, and String\nillustrate a useful technique of using class variables to customize an\nimplementation. The Ty ped descriptor merely looks for an expected_type\nattribute that is provided by each of those subclasses.\n\nThe use of a class decorator or metaclass is often useful for simplifying the\nspecification by the user.\n\nThe code for the class decorator and metaclass simply scan the class dictionary\nlooking for descriptors. When found, they simply fill in the descriptor name\nbased on the key value.\n\nAs a final twist, a class decorator approach can also be used as a replacement\nfor mixin classes, multiple inheritance, and tricky use of the super() function\n\nThe classes defined in this alternative formulation work in exactly the same\nmanner as before (none of the earlier example code changes) except that\neverything runs much faster. For example, a simple timing test of setting a\ntyped attribute reveals that the class decorator approach runs almost 100%\nfaster than the approach using mixins.\n\"\"\"\nfrom typing import Sequence\n\n\nclass Descriptor:\n    # Base class. 
Uses a descriptor to set a value\n    def __init__(self, name=None, **opts):\n        self.name = name\n        self.__dict__.update(opts)\n\n    def __set__(self, instance, value):\n        instance.__dict__[self.name] = value\n\n\ndef Typed(expected_type, cls=None):\n    # Decorator for applying type checking\n    if cls is None:\n        return lambda cls: Typed(expected_type, cls)\n\n    super_set = cls.__set__\n\n    def __set__(self, instance, value):\n        if not isinstance(value, expected_type):\n            raise TypeError('expected ' + str(expected_type))\n        super_set(self, instance, value)\n\n    cls.__set__ = __set__\n    return cls\n\n\ndef TypedSequence(expected_element_types, cls=None):\n    # Decorator enforcing that all elements in a sequence are specific type(s).\n    # using the python ABC definition of \"Sequence\" (list, tuple)\n    # https://docs.python.org/3/library/collections.abc.html#collections.abc.Sequence\n    if cls is None:\n        return lambda cls: TypedSequence(expected_element_types, cls)\n\n    super_set = cls.__set__\n    def __set__(self, instance, value):\n        if not isinstance(value, Sequence):\n            raise TypeError(f'input is not Sequence type, received {type(value)}')\n        elif not all([isinstance(el, expected_element_types) for el in value]):\n            raise TypeError(f'not all elements are {expected_element_types} type(s) in {value}')\n        super_set(self, instance, value)\n    cls.__set__ = __set__\n    return cls\n\n\ndef OneOf(expected_values, cls=None):\n    # Decorator for enforcing values\n    if cls is None:\n        return lambda cls: OneOf(expected_values, cls)\n\n    super_set = cls.__set__\n    def __set__(self, instance, value):\n        if value not in expected_values:\n            raise ValueError(f'expected one of {expected_values}, received {value}')\n        super_set(self, instance, value)\n    cls.__set__ = __set__\n    return cls\n\n\ndef MaxSized(cls):\n    # Decorator enforcing a maximum size on sized values\n    super_init = cls.__init__\n    def __init__(self, name=None, **opts):\n        if 'size' not in opts:\n            raise TypeError('missing size option')\n        self.size = opts['size']\n        super_init(self, name, **opts)\n    cls.__init__ = __init__\n    super_set = cls.__set__\n    def __set__(self, instance, value):\n        if len(value) > self.size:\n            raise ValueError('size must be <= ' + str(self.size))\n        super_set(self, instance, value)\n    cls.__set__ = __set__\n    return cls\n\n\ndef DictItems(expected_keys_required, expected_values, cls=None):\n    # check a dictionary for the existence of keys. expected_keys_required should be a dictionary of keys,\n    # with bool values set to indicate if they are required or not. 
expected_values should be\n    # a mapping of the same keys to lists of acceptable values.\n    if cls is None:\n        return lambda cls: DictItems(expected_keys_required, expected_values, cls)\n\n    super_set = cls.__set__\n    def __set__(self, instance, value):\n        if not isinstance(value, dict):\n            raise TypeError(f'expected {dict}, received {type(value)}')\n        for expected_key, required in expected_keys_required.items():\n            try:\n                if value[expected_key] not in expected_values[expected_key]:\n                    raise ValueError(f'{value[expected_key]} invalid for key {expected_key}')\n            except KeyError as e:\n                if required:\n                    raise e\n        for received_key in value.keys():\n            if received_key not in expected_keys_required:\n                raise TypeError(f'received unexpected key {received_key}')\n        super_set(self, instance, value)\n    cls.__set__ = __set__\n    return cls\n\n\n@Typed(str)\nclass String(Descriptor):\n    pass\n\n\n@DictItems(expected_keys_required={},\n           expected_values={},)\nclass EmptyDict(Descriptor):\n    pass\n\n\n@Typed((dict, type(None)))\nclass OptionalDict(Descriptor):\n    pass\n\n\n@Typed((str, type(None)))\nclass OptionalString(Descriptor):\n    pass\n\n\n@Typed(tuple)\nclass Tuple(Descriptor):\n    pass\n\n\n@MaxSized\n@TypedSequence(int)\nclass SizedIntegerTuple(Tuple):\n    pass\n\n\nclass checkedmeta(type):\n    # A metaclass that applies checking\n    def __new__(cls, clsname, bases, methods):\n        # Attach attribute names to the descriptors\n        for key, value in methods.items():\n            if isinstance(value, Descriptor):\n                value.name = key\n        return type.__new__(cls, clsname, bases, methods)\n"
  },
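  {
    "path": "examples/descriptor_validation_sketch.py",
    "content": "\"\"\"Illustrative usage sketch -- this file is not part of the Hangar source tree.\n\nExercises the descriptor / class-decorator type system defined in\n``src/hangar/typesystem/descriptors.py`` above: ``checkedmeta`` fills in each\ndescriptor's ``name``, while ``OneOf`` / ``MaxSized`` / ``TypedSequence``\nlayer validation onto ``__set__``. All class and attribute names here are\nmade up for the demo.\n\"\"\"\nfrom hangar.typesystem.descriptors import (\n    OneOf, SizedIntegerTuple, String, checkedmeta\n)\n\n\n@OneOf(['flat', 'nested'])\nclass LayoutName(String):\n    pass\n\n\nclass DemoRecord(metaclass=checkedmeta):\n    layout = LayoutName()\n    shape = SizedIntegerTuple(size=3)\n\n    def __init__(self, layout, shape):\n        self.layout = layout\n        self.shape = shape\n\n\nrec = DemoRecord(layout='flat', shape=(2, 4))\n\ntry:\n    rec.layout = 'circular'  # rejected by OneOf\nexcept ValueError as e:\n    print(e)\n\ntry:\n    rec.shape = (1, 2, 3, 4)  # rejected by MaxSized: len > 3\nexcept ValueError as e:\n    print(e)\n\ntry:\n    rec.shape = (1.5, 2)  # rejected by TypedSequence: elements not int\nexcept TypeError as e:\n    print(e)\n"
  },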
  {
    "path": "src/hangar/typesystem/ndarray.py",
    "content": "import numpy as np\n\nfrom .base import ColumnBase\nfrom .descriptors import OneOf, String, OptionalString, SizedIntegerTuple, OptionalDict\nfrom ..records import CompatibleData\n\n\n@OneOf(['variable_shape', 'fixed_shape'])\nclass NdarraySchemaType(String):\n    pass\n\n\n@OneOf(['ndarray'])\nclass NdarrayColumnType(String):\n    pass\n\n\n@OneOf(['0'])\nclass DataHasherTcode(String):\n    pass\n\n\nclass NdarraySchemaBase(ColumnBase):\n    _schema_type = NdarraySchemaType()\n    _column_type = NdarrayColumnType()\n    _data_hasher_tcode = DataHasherTcode()\n\n    def __init__(\n            self,\n            shape,\n            dtype,\n            backend=None,\n            backend_options=None,\n            *args, **kwargs\n    ):\n        if 'data_hasher_tcode' not in kwargs:\n            kwargs['data_hasher_tcode'] = '0'\n        super().__init__(*args, **kwargs)\n\n        if backend_options is not None and backend is None:\n            raise ValueError(\n                '`backend_options` cannot be set if `backend` is not also provided.')\n\n        if not isinstance(dtype, str):\n            dtype = np.dtype(dtype).name\n        self._dtype = dtype\n        self._shape = shape\n        self._backend = backend\n        self._backend_options = backend_options\n        self._schema_attributes.extend(\n            ['_schema_type', '_shape', '_dtype', '_backend', '_backend_options'])\n\n    def backend_from_heuristics(self):\n        # uncompressed numpy memmap data is most appropriate for data whose shape is\n        # likely small tabular row data (CSV or such...)\n        if (len(self._shape) == 1) and (self._shape[0] < 400):\n            backend = '10'\n        # hdf5 is the default backend for larger array sizes.\n        elif (len(self._shape) == 1) and (self._shape[0] <= 10_000_000):\n            backend = '00'\n        # on fixed arrays sized arrays apply optimizations.\n        elif self._schema_type == 'fixed_shape':\n            backend = '01'\n        else:\n            backend = '00'\n        self._backend = backend\n\n    @property\n    def schema_type(self):\n        return self._schema_type\n\n    @property\n    def shape(self):\n        return self._shape\n\n    @property\n    def dtype(self):\n        return np.dtype(self._dtype)\n\n    @property\n    def backend(self):\n        return self._backend\n\n    @property\n    def backend_options(self):\n        return self._backend_options\n\n    def data_hash_digest(self, data: np.ndarray) -> str:\n        return self._data_hasher_func(data)\n\n    def change_backend(self, backend, backend_options=None):\n        old_backend = self._backend\n        old_backend_options = self._backend_options\n        try:\n            del self._beopts\n            self._backend = backend\n            self._beopts = backend_options\n            self._backend_options = self._beopts.backend_options\n        except (TypeError, ValueError) as e:\n            del self._beopts\n            self._backend = old_backend\n            self._beopts = old_backend_options\n            self._backend_options = self._beopts.backend_options\n            raise e from None\n\n    def data_nbytes(self, obj: np.ndarray):\n        return obj.nbytes\n\n\n@OneOf(['00', '01', '10', '50', None])\nclass NdarrayFixedShapeBackends(OptionalString):\n    pass\n\n\n@OneOf(['fixed_shape'])\nclass FixedShapeSchemaType(String):\n    pass\n\n\nclass NdarrayFixedShape(NdarraySchemaBase):\n    _shape = SizedIntegerTuple(size=31)\n    _dtype = String()\n    
_backend = NdarrayFixedShapeBackends()\n    _backend_options = OptionalDict()\n    _schema_type = FixedShapeSchemaType()\n\n    def __init__(self, *args, **kwargs):\n        if 'column_type' in kwargs:\n            super().__init__(*args, **kwargs)\n        else:\n            super().__init__(column_type='ndarray', *args, **kwargs)\n\n        if 'schema_type' in kwargs:\n            self._schema_type = kwargs['schema_type']\n        else:\n            self._schema_type = 'fixed_shape'\n\n        if self.backend is None:\n            self.backend_from_heuristics()\n        self._backend_options = self._beopts.backend_options\n\n    def verify_data_compatible(self, data):\n        compatible = True\n        reason = ''\n\n        if not isinstance(data, np.ndarray):\n            compatible = False\n            reason = f'`data` argument type: {type(data)} != `np.ndarray`'\n        elif data.dtype != self._dtype:\n            compatible = False\n            reason = f'dtype: {data.dtype.name} != aset: {self._dtype}.'\n        elif not data.flags.c_contiguous:\n            compatible = False\n            reason = '`data` must be \"C\" contiguous array.'\n        elif data.shape != self._shape:\n            compatible = False\n            reason = f'data shape {data.shape} != fixed schema {self._shape}'\n\n        res = CompatibleData(compatible, reason)\n        return res\n\n\n@OneOf(['00', '10', '50', None])\nclass NdarrayVariableShapeBackends(OptionalString):\n    pass\n\n\n@OneOf(['variable_shape'])\nclass VariableShapeSchemaType(String):\n    pass\n\n\nclass NdarrayVariableShape(NdarraySchemaBase):\n    _shape = SizedIntegerTuple(size=31)\n    _dtype = String()\n    _backend = NdarrayVariableShapeBackends()\n    _backend_options = OptionalDict()\n    _schema_type = VariableShapeSchemaType()\n\n    def __init__(self, *args, **kwargs):\n        if 'column_type' in kwargs:\n            super().__init__(*args, **kwargs)\n        else:\n            super().__init__(column_type='ndarray', *args, **kwargs)\n\n        if 'schema_type' in kwargs:\n            self._schema_type = kwargs['schema_type']\n        else:\n            self._schema_type = 'variable_shape'\n\n        if self.backend is None:\n            self.backend_from_heuristics()\n        self._backend_options = self._beopts.backend_options\n\n    def verify_data_compatible(self, data):\n        compatible = True\n        reason = ''\n\n        if not isinstance(data, np.ndarray):\n            compatible = False\n            reason = f'`data` argument type: {type(data)} != `np.ndarray`'\n        elif data.dtype != self._dtype:\n            compatible = False\n            reason = f'dtype: {data.dtype.name} != aset: {self._dtype}.'\n        elif not data.flags.c_contiguous:\n            compatible = False\n            reason = '`data` must be \"C\" contiguous array.'\n        elif data.ndim != len(self._shape):\n            compatible = False\n            reason = f'data rank {data.ndim} != aset rank {len(self._shape)}'\n        elif not all([(dim <= maxdim) for dim, maxdim in zip(data.shape, self._shape)]):\n            compatible = False\n            reason = f'shape {data.shape} exceeds schema max {self._shape}'\n\n        res = CompatibleData(compatible, reason)\n        return res\n"
  },
  {
    "path": "src/hangar/typesystem/pybytes.py",
    "content": "from .base import ColumnBase\nfrom .descriptors import OneOf, Descriptor, String, OptionalString, OptionalDict\nfrom ..records import CompatibleData\nfrom ..utils import format_bytes\n\n\n@OneOf(['<class\\'bytes\\'>'])\nclass BytesDType(Descriptor):\n    pass\n\n\nSERIAL_DTYPE_TO_OBJ = {\n    '<class\\'bytes\\'>': bytes,\n}\n\n\n@OneOf(['variable_shape'])\nclass BytesSchemaType(String):\n    pass\n\n\n@OneOf(['bytes'])\nclass BytesColumnType(String):\n    pass\n\n\n@OneOf(['3'])\nclass DataHasherTcode(String):\n    pass\n\n\nclass BytesSchemaBase(ColumnBase):\n    _schema_type = BytesSchemaType()\n    _column_type = BytesColumnType()\n    _data_hasher_tcode = DataHasherTcode()\n\n    def __init__(\n            self,\n            dtype,\n            backend=None,\n            backend_options=None,\n            *args, **kwargs\n    ):\n        if 'data_hasher_tcode' not in kwargs:\n            kwargs['data_hasher_tcode'] = '3'\n        super().__init__(*args, **kwargs)\n\n        if backend_options is not None and backend is None:\n            raise ValueError(\n                '`backend_options` cannot be set if `backend` is not also provided.')\n\n        if not isinstance(dtype, str):\n            dtype = repr(dtype).replace(' ', '')\n\n        self._dtype = dtype\n        self._backend = backend\n        self._backend_options = backend_options\n        self._schema_attributes.extend(\n            ['_schema_type', '_dtype', '_backend', '_backend_options']\n        )\n\n    def backend_from_heuristics(self):\n        self._backend = '31'\n\n    @property\n    def schema_type(self):\n        return self._schema_type\n\n    @property\n    def dtype(self):\n        return SERIAL_DTYPE_TO_OBJ[self._dtype]\n\n    @property\n    def backend(self):\n        return self._backend\n\n    @property\n    def backend_options(self):\n        return self._backend_options\n\n    def data_hash_digest(self, data: str) -> str:\n        return self._data_hasher_func(data)\n\n    def change_backend(self, backend, backend_options=None):\n        old_backend = self._backend\n        old_backend_options = self._backend_options\n        try:\n            self._backend = backend\n            self._backend_options = backend_options\n            # del and reset beopts object to reverify input correctness.\n            del self._beopts\n            self._backend_options = self._beopts.backend_options\n        except (TypeError, ValueError) as e:\n            del self._beopts\n            self._backend = old_backend\n            self._backend_options = old_backend_options\n            self._backend_options = self._beopts.backend_options\n            raise e from None\n\n\n@OneOf(['31', '50', None])\nclass BytesVariableShapeBackends(OptionalString):\n    pass\n\n\n@OneOf(['variable_shape'])\nclass VariableShapeSchemaType(String):\n    pass\n\n\nclass BytesVariableShape(BytesSchemaBase):\n    _dtype = BytesDType()\n    _backend = BytesVariableShapeBackends()\n    _backend_options = OptionalDict()\n    _schema_type = VariableShapeSchemaType()\n\n    def __init__(self, *args, **kwargs):\n        if 'column_type' in kwargs:\n            super().__init__(*args, **kwargs)\n        else:\n            super().__init__(column_type='bytes', *args, **kwargs)\n\n        if 'schema_type' in kwargs:\n            self._schema_type = kwargs['schema_type']\n        else:\n            self._schema_type = 'variable_shape'\n\n        if self.backend is None:\n            self.backend_from_heuristics()\n        
\n\n@OneOf(['31', '50', None])\nclass BytesVariableShapeBackends(OptionalString):\n    pass\n\n\n@OneOf(['variable_shape'])\nclass VariableShapeSchemaType(String):\n    pass\n\n\nclass BytesVariableShape(BytesSchemaBase):\n    _dtype = BytesDType()\n    _backend = BytesVariableShapeBackends()\n    _backend_options = OptionalDict()\n    _schema_type = VariableShapeSchemaType()\n\n    def __init__(self, *args, **kwargs):\n        if 'column_type' in kwargs:\n            super().__init__(*args, **kwargs)\n        else:\n            super().__init__(column_type='bytes', *args, **kwargs)\n\n        if 'schema_type' in kwargs:\n            self._schema_type = kwargs['schema_type']\n        else:\n            self._schema_type = 'variable_shape'\n\n        if self.backend is None:\n            self.backend_from_heuristics()\n        self._backend_options = self._beopts.backend_options\n\n    def verify_data_compatible(self, data):\n        compatible = True\n        reason = ''\n        if not isinstance(data, bytes):\n            compatible = False\n            reason = f'data {data} not valid, must be of type {bytes} not {type(data)}'\n        elif len(data) > 2000000:  # 2MB\n            compatible = False\n            reason = f'bytes must be less than 2MB in size, received {format_bytes(len(data))}'\n\n        res = CompatibleData(compatible, reason)\n        return res\n"
  },
  {
    "path": "src/hangar/typesystem/pystring.py",
    "content": "from .base import ColumnBase\nfrom .descriptors import OneOf, Descriptor, String, OptionalString, OptionalDict\nfrom ..records import CompatibleData\nfrom ..utils import format_bytes\n\n\n@OneOf(['<class\\'str\\'>'])\nclass StringDType(Descriptor):\n    pass\n\n\nSERIAL_DTYPE_TO_OBJ = {\n    '<class\\'str\\'>': str,\n}\n\n\n@OneOf(['variable_shape'])\nclass StringSchemaType(String):\n    pass\n\n\n@OneOf(['str'])\nclass StrColumnType(String):\n    pass\n\n\n@OneOf(['2'])\nclass DataHasherTcode(String):\n    pass\n\n\nclass StringSchemaBase(ColumnBase):\n    _schema_type = StringSchemaType()\n    _column_type = StrColumnType()\n    _data_hasher_tcode = DataHasherTcode()\n\n    def __init__(\n            self,\n            dtype,\n            backend=None,\n            backend_options=None,\n            *args, **kwargs\n    ):\n        if 'data_hasher_tcode' not in kwargs:\n            kwargs['data_hasher_tcode'] = '2'\n        super().__init__(*args, **kwargs)\n\n        if backend_options is not None and backend is None:\n            raise ValueError(\n                '`backend_options` cannot be set if `backend` is not also provided.')\n\n        if not isinstance(dtype, str):\n            dtype = repr(dtype).replace(' ', '')\n\n        self._dtype = dtype\n        self._backend = backend\n        self._backend_options = backend_options\n        self._schema_attributes.extend(\n            ['_schema_type', '_dtype', '_backend', '_backend_options']\n        )\n\n    def backend_from_heuristics(self):\n        self._backend = '30'\n\n    @property\n    def schema_type(self):\n        return self._schema_type\n\n    @property\n    def dtype(self):\n        return SERIAL_DTYPE_TO_OBJ[self._dtype]\n\n    @property\n    def backend(self):\n        return self._backend\n\n    @property\n    def backend_options(self):\n        return self._backend_options\n\n    def data_hash_digest(self, data: str) -> str:\n        return self._data_hasher_func(data)\n\n    def change_backend(self, backend, backend_options=None):\n        old_backend = self._backend\n        old_backend_options = self._backend_options\n        try:\n            self._backend = backend\n            self._backend_options = backend_options\n            # del and reset beopts object to reverify input correctness.\n            del self._beopts\n            self._backend_options = self._beopts.backend_options\n        except (TypeError, ValueError) as e:\n            del self._beopts\n            self._backend = old_backend\n            self._backend_options = old_backend_options\n            self._backend_options = self._beopts.backend_options\n            raise e from None\n\n    def data_nbytes(self, obj: str):\n        return len(obj.encode())\n\n\n@OneOf(['30', '50', None])\nclass StringVariableShapeBackends(OptionalString):\n    pass\n\n\n@OneOf(['variable_shape'])\nclass VariableShapeSchemaType(String):\n    pass\n\n\nclass StringVariableShape(StringSchemaBase):\n    _dtype = StringDType()\n    _backend = StringVariableShapeBackends()\n    _backend_options = OptionalDict()\n    _schema_type = VariableShapeSchemaType()\n\n    def __init__(self, *args, **kwargs):\n        if 'column_type' in kwargs:\n            super().__init__(*args, **kwargs)\n        else:\n            super().__init__(column_type='str', *args, **kwargs)\n\n        if 'schema_type' in kwargs:\n            self._schema_type = kwargs['schema_type']\n        else:\n            self._schema_type = 'variable_shape'\n\n        if self.backend is 
None:\n            self.backend_from_heuristics()\n        self._backend_options = self._beopts.backend_options\n\n    def verify_data_compatible(self, data):\n        compatible = True\n        reason = ''\n\n        if not isinstance(data, str):\n            compatible = False\n            reason = f'data {data} must be {str} type, not {type(data)}'\n        elif len(data.encode()) > 2000000:  # 2MB\n            compatible = False\n            reason = f'str bytes must be less than 2MB in size, received {format_bytes(len(data.encode()))}'\n\n        res = CompatibleData(compatible, reason)\n        return res\n"
  },
  {
    "path": "src/hangar/utils.py",
    "content": "import os\nimport re\nimport secrets\nimport string\nimport sys\nimport time\nfrom collections import deque\nfrom io import StringIO\nfrom pathlib import Path\nfrom itertools import tee, filterfalse, count, zip_longest\nfrom typing import Union\n\nimport blosc\n\n\nNumType = Union[int, float]\n\n\ndef bound(low: NumType, high: NumType, value: NumType) -> NumType:\n    \"\"\"Bound value such that ``low <= value <= high``\n\n    >>> bound(0, 100, 10)\n    10\n    >>> bound(0, 100, -1)\n    -1\n    >>> bound(0, 100, 500)\n    100\n    >>> bound(-5, -2, -3)\n    -3\n    >>> bound(-6.0, -5.0, 0.1)\n    -5.0\n    >>> bound(0.0, 5, 3.5)\n    3.5\n    \"\"\"\n    return max(low, min(high, value))\n\n\ndef calc_num_threadpool_workers() -> int:\n    nCores = os.cpu_count()  # includes hyperthreads\n    return bound(2, 10, nCores * 2)\n\n\ndef is_64bits():\n    \"\"\"bool indicating if running on atleast a 64 bit machine\n    \"\"\"\n    return sys.maxsize > 2 ** 32\n\n\ndef set_blosc_nthreads() -> int:  # pragma: no cover\n    \"\"\"set the blosc library to two less than the core count on the system.\n\n    If less than 2 cores are ncores-2, we set the value to two.\n\n    Returns\n    -------\n    int\n        ncores blosc will use on the system\n    \"\"\"\n    nCores = blosc.detect_number_of_cores()\n    if nCores == 1:\n        nUsed = 1\n    elif nCores == 2:\n        nUsed = 2\n    elif nCores <= 4:\n        nUsed = nCores - 1\n    else:\n        nUsed = nCores - 2\n    blosc.set_nthreads(nUsed)\n    return nUsed\n\n\ndef random_string(\n    n: int = 8,\n    *, _ALPHABET=''.join([string.ascii_lowercase, string.digits])\n) -> str:\n    \"\"\"Generate a random string of lowercase ascii letters and digits.\n\n    Parameters\n    ----------\n    n: int, optional\n        The number of characters which the output string will have. Default=8\n    \"\"\"\n    token = [secrets.choice(_ALPHABET) for i in range(n)]\n    return ''.join(token)\n\n\n_SuitableCharRE = re.compile(r'[\\w\\.\\-\\_]+\\Z', flags=re.ASCII)\n\n\ndef is_suitable_user_key(key: Union[str, int]) -> bool:\n    \"\"\"Checks if only alpha-numeric ascii chars or ['.', '-' '_'] (no whitespace)\n\n    Necessary because python 3.6 does not have a str.isascii() method. 
Parameters\n    ----------\n    key : Union[str, int]\n        string to check if it contains only ascii characters\n\n    Returns\n    -------\n    bool\n        True if only ascii characters in the string, else False.\n    \"\"\"\n    try:\n        if isinstance(key, int) and (key >= 0):\n            str_data = str(key)\n        elif isinstance(key, str):\n            str_data = str(key)\n        else:\n            raise TypeError\n        if len(str_data) > 64:\n            return False\n        return bool(_SuitableCharRE.match(str_data))\n    except TypeError:\n        return False\n\n\ndef is_ascii(str_data: str) -> bool:\n    \"\"\"Checks if string contains only ascii chars.\n\n    Necessary because python 3.6 does not have a str.isascii() method.\n\n    Parameters\n    ----------\n    str_data : str\n        string to check if it contains only ascii characters\n\n    Returns\n    -------\n    bool\n        True if only ascii characters in the string, else False.\n    \"\"\"\n    try:\n        str_data.encode('ascii')\n    except (UnicodeEncodeError, AttributeError):\n        return False\n    return True\n\n\ndef pairwise(iterable):\n    \"s -> (s0,s1), (s1,s2), (s2, s3), ...\"\n    a, b = tee(iterable)\n    next(b, None)\n    return zip(a, b)\n\n\ndef unique_everseen(iterable, key=None):\n    \"\"\"List unique elements, preserving order. Remember all elements ever seen.\n\n    >>> list(unique_everseen('AAAABBBCCDAABBB'))\n    ['A', 'B', 'C', 'D']\n    >>> list(unique_everseen('ABBCcAD', str.lower))\n    ['A', 'B', 'C', 'D']\n    \"\"\"\n    seen = set()\n    seen_add = seen.add\n    if key is None:\n        for element in filterfalse(seen.__contains__, iterable):\n            seen_add(element)\n            yield element\n    else:\n        for element in iterable:\n            k = key(element)\n            if k not in seen:\n                seen_add(k)\n                yield element\n\n\ndef ilen(iterable):\n    \"\"\"Return the number of items in *iterable*.\n\n        >>> ilen(x for x in range(1000000) if x % 3 == 0)\n        333334\n        >>> it = iter([0, 1, 2, False])\n        >>> ilen(it)\n        4\n\n    This consumes the iterable, so handle with care.\n    \"\"\"\n    counter = count()\n    deque(zip(iterable, counter), maxlen=0)\n    return next(counter)\n\n\ndef grouper(iterable, n, fillvalue=None):\n    \"\"\"split iterable into n sized groups upon each call to `next()`\n\n    >>> for grp in grouper([(x, x*2) for x in range(4)], 2):\n    ...     print(grp)\n    ((0, 0), (1, 2))\n    ((2, 4), (3, 6))\n    >>> for grp in grouper([x for x in range(5)], 2, fillvalue=None):\n    ...     print(grp)\n    (0, 1)\n    (2, 3)\n    (4, None)\n    >>> for grp in grouper([(x, x*2) for x in range(5)], 2, fillvalue=('FOO', 'BAR')):\n    ...     print(grp)\n    ((0, 0), (1, 2))\n    ((2, 4), (3, 6))\n    ((4, 8), ('FOO', 'BAR'))\n    \"\"\"\n    args = [iter(iterable)] * n\n    return zip_longest(*args, fillvalue=fillvalue)\n\n\ndef file_size(p: Path) -> int:  # pragma: no cover\n    \"\"\"Query the file size of a specific file\n\n    Parameters\n    ----------\n    p : Path\n        path to a file that exists on disk.\n\n    Raises\n    ------\n    FileNotFoundError\n        if the file does not exist\n\n    Returns\n    -------\n    int\n        nbytes the file consumes on disk.\n    \"\"\"\n    if not p.is_file():\n        err = f'Cannot query size of: {str(p)}. 
File does not exist'\n        raise FileNotFoundError(err)\n    return p.stat().st_size\n\n\ndef folder_size(p: Path, *, recurse: bool = False) -> int:\n    \"\"\"size of all files in a folder.\n\n    Default is to not include subdirectories. Set \"recurse=True\"\n    to enable recursive calculation.\n\n    Parameters\n    ----------\n    p : Path\n        path to the repository on disk.\n    recurse : bool, kwarg-only\n        to calculate the full size of the repo (Default value = False)\n\n    Returns\n    -------\n    int\n        number of bytes used up in the repo_path\n    \"\"\"\n    total = 0\n    for entry in p.iterdir():\n        if entry.is_file() and not entry.is_symlink():\n            total += entry.stat().st_size\n        elif recurse and entry.is_dir() and not entry.is_symlink():\n            total += folder_size(entry.resolve(), recurse=True)\n    return total\n\n\ndef is_valid_directory_path(p: Path) -> Path:\n    \"\"\"Check if path is directory which user has write permission to.\n\n    Parameters\n    ----------\n    p : Path\n        path to some location on disk\n\n    Returns\n    -------\n    Path\n        If successful, the path with any user constructions expanded\n        (ie. `~/somedir` -> `/home/foo/somedir`)\n\n    Raises\n    ------\n    TypeError\n        If the provided path argument is not a pathlike object\n    NotADirectoryError\n        If the path does not exist, or is not a directory on disk\n    PermissionError\n        If the user does not have write access to the specified path\n    \"\"\"\n    if not isinstance(p, Path):\n        msg = f'Path arg `p`: {p} of type: {type(p)} is not valid path specifier'\n        raise TypeError(msg)\n\n    usr_path = p.expanduser().resolve(strict=True)\n\n    if not usr_path.is_dir():\n        msg = f'Path arg `p`: {p} is not a directory.'\n        raise NotADirectoryError(msg)\n    if not os.access(str(usr_path), os.W_OK):  # pragma: no cover\n        msg = f'User does not have permission to write to directory path: {p}'\n        raise PermissionError(msg)\n\n    return usr_path\n\n\n# ----------------- human & machine nbytes ------------------------------------\n\n\ndef format_bytes(n: int) -> str:\n    \"\"\" Format bytes as text\n    >>> format_bytes(1)\n    '1.00 B'\n    >>> format_bytes(1234)\n    '1.23 kB'\n    >>> format_bytes(12345678)\n    '12.35 MB'\n    >>> format_bytes(1234567890)\n    '1.23 GB'\n    >>> format_bytes(1234567890000)\n    '1.23 TB'\n    >>> format_bytes(1234567890000000)\n    '1.23 PB'\n    \"\"\"\n    for x in ['B', 'kB', 'MB', 'GB', 'TB', 'PB']:\n        if n < 1000.0:\n            return \"%3.2f %s\" % (n, x)\n        n /= 1000.0\n\n\n_byte_sizes = {\n    'kb': 1000,\n    'mb': 1000000,\n    'gb': 1000000000,\n    'tb': 1000000000000,\n    'pb': 1000000000000000,\n    'kib': 1024,\n    'mib': 1048576,\n    'gib': 1073741824,\n    'tib': 1099511627776,\n    'pib': 1125899906842624,\n    'b': 1,\n    '': 1,\n    'k': 1000,\n    'm': 1000000,\n    'g': 1000000000,\n    't': 1000000000000,\n    'p': 1000000000000000,\n    'ki': 1024,\n    'mi': 1048576,\n    'gi': 1073741824,\n    'ti': 1099511627776,\n    'pi': 1125899906842624\n}\n\n\ndef parse_bytes(s: str) -> int:\n    \"\"\" Parse byte string to numbers\n    >>> parse_bytes('100')\n    100\n    >>> parse_bytes('100 MB')\n    100000000\n    >>> parse_bytes('100M')\n    100000000\n    >>> parse_bytes('5kB')\n    5000\n    >>> parse_bytes('5.4 kB')\n    5400\n    >>> parse_bytes('1kiB')\n    1024\n    >>> parse_bytes('1e6')\n    
1000000\n    >>> parse_bytes('1e6 kB')\n    1000000000\n    >>> parse_bytes('MB')\n    1000000\n    \"\"\"\n    s = s.replace(' ', '').lower()\n    s = f'1{s}' if not s[0].isdigit() else s\n    for i in range(len(s) - 1, -1, -1):\n        if not s[i].isalpha():\n            break\n\n    n = float(s[:i + 1])\n    mult = _byte_sizes[s[i + 1:]]\n    return int(n * mult)\n\n\ndef readme_contents(user_name: str, user_email: str) -> StringIO:\n    \"\"\"Create the Hangar README.txt contents used to fill out file on repo initialization\n\n    Parameters\n    ----------\n    user_name : str\n        name of the user initializing the repository on the machine.\n    user_email : str\n        email of the user initializing the repository on the machine.\n\n    Returns\n    -------\n    StringIO\n        Buffered string text ready to be sent to a file writer.\n    \"\"\"\n    from . import __version__\n    from .constants import DIR_HANGAR\n\n    buf = StringIO()\n    buf.write(f'This directory has been used to initialize a Hangar Repository\\n')\n    buf.write(f'\\n')\n    buf.write(f'This repository was initialized by:\\n')\n    buf.write(f'    User Name:        {user_name}\\n')\n    buf.write(f'    User Email:       {user_email}\\n')\n    buf.write(f'    Creation Time:    {time.asctime(time.gmtime())} UTC\\n')\n    buf.write(f'    Software Version: {__version__}\\n')\n    buf.write(f'\\n')\n    buf.write(f'NOTE: The repository may have been updated to work with newer Hangar versions\\n')\n    buf.write(f'since initialization.\\n')\n    buf.write(f'\\n')\n    buf.write(f'Do not modify the contents of this `{DIR_HANGAR}` folder under any circumstances;\\n')\n    buf.write(f'doing so will result in data loss / corruption. The contents are not meant\\n')\n    buf.write(f'to be understood by humans.\\n')\n    buf.write(f'\\n')\n    buf.write(f'The project homepage can be found at: https://github.com/tensorwerk/hangar-py/ \\n')\n    buf.write(f'Documentation is available at: https://hangar-py.readthedocs.io/en/latest/ \\n')\n    buf.write(f'\\n')\n    buf.write(f'NOTE: If this Repository has been initialized in a directory under traditional\\n')\n    buf.write(f'version control systems, please add `{DIR_HANGAR}` as an ignored directory path.\\n')\n    buf.write(f'Failure to do so will result in undesirable performance of version control\\n')\n    buf.write(f'systems meant for text/code such as Git, Mercurial, Subversion, etc.\\n')\n\n    return buf\n"
  },
  {
    "path": "tests/bulk_importer/test_bulk_importer.py",
    "content": "import pytest\nimport numpy as np\n\n\ndef assert_equal(arr, arr2):\n    assert np.array_equal(arr, arr2)\n    assert arr.dtype == arr2.dtype\n\n\ndef test_bulk_importer_ndarray(repo):\n    from hangar.bulk_importer import run_bulk_import\n    from hangar.bulk_importer import UDF_Return\n\n    def make_ndarray(column, key, shape, dtype, multiplier):\n        size = np.prod(shape)\n        arr = np.arange(size, dtype=dtype).reshape(shape) * multiplier\n        yield UDF_Return(column=column, key=key, data=arr)\n\n    co = repo.checkout(write=True)\n    co.add_ndarray_column('arr', shape=(5, 5), dtype=np.uint32)\n    co.commit('first')\n    co.close()\n\n    kwargs = []\n    expected_kv = []\n    for idx in range(200):\n        _kw_dict = {\n            'column': 'arr',\n            'key': idx,\n            'shape': (5, 5),\n            'dtype': np.uint32,\n            'multiplier': idx\n        }\n        kwargs.append(_kw_dict)\n\n        for _udf_val in make_ndarray(**_kw_dict):\n            expected_kv.append(_udf_val)\n    assert len(expected_kv) == 200\n\n    run_bulk_import(\n        repo,\n        branch_name='master',\n        column_names=['arr'],\n        udf=make_ndarray,\n        udf_kwargs=kwargs,\n        ncpus=2)\n\n    co = repo.checkout()\n    try:\n        arr_col = co['arr']\n        assert len(arr_col) == 200\n        for _expected_udf_val in expected_kv:\n            assert _expected_udf_val.key in arr_col\n            assert_equal(arr_col[_expected_udf_val.key], _expected_udf_val.data)\n    finally:\n        co.close()\n\n\ndef test_bulk_importer_pystr(repo):\n    from hangar.bulk_importer import run_bulk_import\n    from hangar.bulk_importer import UDF_Return\n\n    def make_pystr(column, key, str_val):\n        yield UDF_Return(column=column, key=key, data=str_val)\n\n    co = repo.checkout(write=True)\n    co.add_str_column('str')\n    co.commit('first')\n    co.close()\n\n    kwargs = []\n    expected_kv = []\n    for idx in range(200):\n        _kw_dict = {\n            'column': 'str',\n            'key': idx,\n            'str_val': f'{str(idx) * 2}',\n        }\n        kwargs.append(_kw_dict)\n\n        for _udf_val in make_pystr(**_kw_dict):\n            expected_kv.append(_udf_val)\n    assert len(expected_kv) == 200\n\n    run_bulk_import(\n        repo,\n        branch_name='master',\n        column_names=['str'],\n        udf=make_pystr,\n        udf_kwargs=kwargs,\n        ncpus=2)\n\n    co = repo.checkout()\n    try:\n        str_col = co['str']\n        assert len(str_col) == 200\n        for _expected_udf_val in expected_kv:\n            assert _expected_udf_val.key in str_col\n            assert str_col[_expected_udf_val.key] == _expected_udf_val.data\n    finally:\n        co.close()\n\n\ndef test_bulk_importer_pybytes(repo):\n    from hangar.bulk_importer import run_bulk_import\n    from hangar.bulk_importer import UDF_Return\n\n    def make_pybytes(column, key, str_val):\n        raw = str_val.encode()\n        yield UDF_Return(column=column, key=key, data=raw)\n\n    co = repo.checkout(write=True)\n    co.add_bytes_column('bytes')\n    co.commit('first')\n    co.close()\n\n    kwargs = []\n    expected_kv = []\n    for idx in range(200):\n        _kw_dict = {\n            'column': 'bytes',\n            'key': idx,\n            'str_val': f'{str(idx) * 2}',\n        }\n        kwargs.append(_kw_dict)\n\n        for _udf_val in make_pybytes(**_kw_dict):\n            expected_kv.append(_udf_val)\n    assert len(expected_kv) == 
200\n\n    run_bulk_import(\n        repo,\n        branch_name='master',\n        column_names=['bytes'],\n        udf=make_pybytes,\n        udf_kwargs=kwargs,\n        ncpus=2)\n\n    co = repo.checkout()\n    try:\n        bytes_col = co['bytes']\n        assert len(bytes_col) == 200\n        for _expected_udf_val in expected_kv:\n            assert _expected_udf_val.key in bytes_col\n            assert bytes_col[_expected_udf_val.key] == _expected_udf_val.data\n    finally:\n        co.close()\n\n\ndef test_bulk_importer_two_col_pybytes_pystr(repo):\n    from hangar.bulk_importer import run_bulk_import\n    from hangar.bulk_importer import UDF_Return\n\n    def _make_pystr(column, key, str_val):\n        yield UDF_Return(column=column, key=key, data=str_val)\n\n    def _make_pybytes(column, key, str_val):\n        raw = str_val.encode()\n        yield UDF_Return(column=column, key=key, data=raw)\n\n    def make_pystr_pybytes(str_col, bytes_col, key, str_val):\n        yield from _make_pystr(column=str_col, key=key, str_val=str_val)\n        yield from _make_pybytes(column=bytes_col, key=key, str_val=str_val)\n\n    co = repo.checkout(write=True)\n    co.add_bytes_column('bytes')\n    co.add_str_column('str')\n    co.commit('first')\n    co.close()\n\n    kwargs = []\n    expected_kv = []\n    for idx in range(200):\n        _kw_dict = {\n            'str_col': 'str',\n            'bytes_col': 'bytes',\n            'key': idx,\n            'str_val': f'{str(idx) * 2}',\n        }\n        kwargs.append(_kw_dict)\n\n        for _udf_val in make_pystr_pybytes(**_kw_dict):\n            expected_kv.append(_udf_val)\n    assert len(expected_kv) == 400\n\n    run_bulk_import(\n        repo,\n        branch_name='master',\n        column_names=['bytes', 'str'],\n        udf=make_pystr_pybytes,\n        udf_kwargs=kwargs,\n        ncpus=2)\n\n    co = repo.checkout()\n    try:\n        pybytes_col = co['bytes']\n        pystr_col = co['str']\n        assert len(pybytes_col) == 200\n        assert len(pystr_col) == 200\n        for _expected_udf_val in expected_kv:\n            assert _expected_udf_val.column in ['str', 'bytes']\n            if _expected_udf_val.column == 'str':\n                assert _expected_udf_val.key in pystr_col\n                assert pystr_col[_expected_udf_val.key] == _expected_udf_val.data\n            elif _expected_udf_val.column == 'bytes':\n                assert _expected_udf_val.key in pybytes_col\n                assert pybytes_col[_expected_udf_val.key] == _expected_udf_val.data\n            else:\n                raise ValueError(_expected_udf_val.column)\n    finally:\n        co.close()\n\n\ndef test_signature_wrong(repo):\n    from hangar.bulk_importer import run_bulk_import\n    from hangar.bulk_importer import UDF_Return\n\n    def wrong_sig_udf(a, b, c=None):\n        yield UDF_Return(column='str', key=a, data=f'{a} {b} {c}')\n\n    co = repo.checkout(write=True)\n    co.add_str_column('str')\n    co.commit('first')\n    co.close()\n\n    kwargs = []\n    for idx in range(200):\n        _kw_dict = {\n            'a': 'bytes',\n            'str_val': f'{str(idx) * 2}',\n        }\n        kwargs.append(_kw_dict)\n\n    with pytest.raises(TypeError):\n        run_bulk_import(\n            repo,\n            branch_name='master',\n            column_names=['str'],\n            udf=wrong_sig_udf,\n            udf_kwargs=kwargs,\n            ncpus=2)\n"
  },
  {
    "path": "tests/conftest.py",
    "content": "import time\nimport shutil\nimport random\nfrom os.path import join as pjoin\nfrom os import mkdir\n\nimport pytest\nimport numpy as np\n\nfrom hangar import Repository\nfrom hangar.checkout import WriterCheckout\nimport hangar\n\n\nvariable_shape_backend_params = ['00', '10']\nfixed_shape_backend_params = ['00', '01', '10']\n\n\n@pytest.fixture(scope=\"session\")\ndef monkeysession(request):\n    from _pytest.monkeypatch import MonkeyPatch\n    mpatch = MonkeyPatch()\n    yield mpatch\n    mpatch.undo()\n\n\n@pytest.fixture(scope='class')\ndef classrepo(tmp_path_factory) -> Repository:\n    old00_count = hangar.backends.hdf5_00.COLLECTION_COUNT\n    old00_size = hangar.backends.hdf5_00.COLLECTION_SIZE\n    old01_count = hangar.backends.hdf5_01.COLLECTION_COUNT\n    old01_size = hangar.backends.hdf5_01.COLLECTION_SIZE\n    old10_size = hangar.backends.numpy_10.COLLECTION_SIZE\n    old30_lmdb_settings = hangar.backends.lmdb_30.LMDB_SETTINGS\n    old31_lmdb_settings = hangar.backends.lmdb_31.LMDB_SETTINGS\n    hangar.backends.hdf5_00.COLLECTION_COUNT = 20\n    hangar.backends.hdf5_00.COLLECTION_SIZE = 20\n    hangar.backends.hdf5_01.COLLECTION_COUNT = 20\n    hangar.backends.hdf5_01.COLLECTION_SIZE = 20\n    hangar.backends.numpy_10.COLLECTION_SIZE = 100\n    hangar.backends.lmdb_30.LMDB_SETTINGS['map_size'] = 1_000_000\n    hangar.backends.lmdb_31.LMDB_SETTINGS['map_size'] = 1_000_000\n\n    old_map_size = hangar.constants.LMDB_SETTINGS['map_size']\n    hangar.constants.LMDB_SETTINGS['map_size'] = 2_000_000\n    hangar.txnctx.TxnRegisterSingleton._instances = {}\n\n    pth = tmp_path_factory.mktemp('classrepo')\n    repo_obj = Repository(path=str(pth), exists=False)\n    repo_obj.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n    yield repo_obj\n    hangar.constants.LMDB_SETTINGS['map_size'] = old_map_size\n    hangar.backends.hdf5_00.COLLECTION_COUNT = old00_count\n    hangar.backends.hdf5_00.COLLECTION_SIZE = old00_size\n    hangar.backends.hdf5_01.COLLECTION_COUNT = old01_count\n    hangar.backends.hdf5_01.COLLECTION_SIZE = old01_size\n    hangar.backends.numpy_10.COLLECTION_SIZE = old10_size\n    hangar.backends.lmdb_30.LMDB_SETTINGS = old30_lmdb_settings\n    hangar.backends.lmdb_31.LMDB_SETTINGS = old31_lmdb_settings\n    repo_obj._env._close_environments()\n\n\n@pytest.fixture()\ndef managed_tmpdir(monkeypatch, tmp_path):\n    monkeypatch.setitem(hangar.constants.LMDB_SETTINGS, 'map_size', 2_000_000)\n    monkeypatch.setitem(hangar.backends.lmdb_30.LMDB_SETTINGS, 'map_size', 1_000_000)\n    monkeypatch.setitem(hangar.backends.lmdb_31.LMDB_SETTINGS, 'map_size', 1_000_000)\n    monkeypatch.setattr(hangar.backends.hdf5_00, 'COLLECTION_COUNT', 20)\n    monkeypatch.setattr(hangar.backends.hdf5_00, 'COLLECTION_SIZE', 20)\n    monkeypatch.setattr(hangar.backends.hdf5_01, 'COLLECTION_COUNT', 20)\n    monkeypatch.setattr(hangar.backends.hdf5_01, 'COLLECTION_SIZE', 20)\n    monkeypatch.setattr(hangar.backends.numpy_10, 'COLLECTION_SIZE', 100)\n    hangar.txnctx.TxnRegisterSingleton._instances = {}\n    yield tmp_path\n    shutil.rmtree(tmp_path)\n\n\n\n@pytest.fixture(scope='class')\ndef managed_tmpdir_class(monkeysession, tmp_path_factory):\n    pth = tmp_path_factory.mktemp('classrepo2', numbered=True)\n    tmp_path = str(pth)\n    monkeysession.setitem(hangar.constants.LMDB_SETTINGS, 'map_size', 2_000_000)\n    monkeysession.setitem(hangar.backends.lmdb_30.LMDB_SETTINGS, 'map_size', 1_000_000)\n    
monkeysession.setitem(hangar.constants.LMDB_SETTINGS, 'map_size', 2_000_000)\n    monkeysession.setitem(hangar.backends.lmdb_30.LMDB_SETTINGS, 'map_size', 1_000_000)\n    monkeysession.setitem(hangar.backends.lmdb_31.LMDB_SETTINGS, 'map_size', 1_000_000)\n    monkeysession.setattr(hangar.backends.hdf5_00, 'COLLECTION_COUNT', 20)\n    monkeysession.setattr(hangar.backends.hdf5_00, 'COLLECTION_SIZE', 20)\n    monkeysession.setattr(hangar.backends.hdf5_01, 'COLLECTION_COUNT', 20)\n    monkeysession.setattr(hangar.backends.hdf5_01, 'COLLECTION_SIZE', 20)\n    monkeysession.setattr(hangar.backends.numpy_10, 'COLLECTION_SIZE', 100)\n    hangar.txnctx.TxnRegisterSingleton._instances = {}\n    yield tmp_path\n    shutil.rmtree(tmp_path)\n\n\n@pytest.fixture()\ndef repo(managed_tmpdir) -> Repository:\n    repo_obj = Repository(path=managed_tmpdir, exists=False)\n    repo_obj.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n    yield repo_obj\n    repo_obj._env._close_environments()\n\n\n@pytest.fixture()\ndef aset_samples_initialized_repo(repo) -> Repository:\n    co = repo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset', shape=(5, 7), dtype=np.float64)\n    co.commit('this is a commit message')\n    co.close()\n    yield repo\n\n\n@pytest.fixture()\ndef aset_subsamples_initialized_repo(repo) -> Repository:\n    co = repo.checkout(write=True)\n    co.add_ndarray_column(\n        name='writtenaset', shape=(5, 7), dtype=np.float64, contains_subsamples=True)\n    co.commit('this is a commit message')\n    co.close()\n    yield repo\n\n\n@pytest.fixture(params=fixed_shape_backend_params)\ndef repo_20_filled_samples(request, aset_samples_initialized_repo, array5by7) -> Repository:\n    co = aset_samples_initialized_repo.checkout(write=True)\n    second_aset = co.add_ndarray_column('second_aset', prototype=array5by7, backend=request.param)\n    first_aset = co.columns['writtenaset']\n    for i in range(0, 20):\n        array5by7[:] = i\n        first_aset[str(i)] = array5by7\n        array5by7[:] = -i\n        second_aset[str(i)] = array5by7\n    co.commit('20 samples')\n    co.close()\n    yield aset_samples_initialized_repo\n\n\n@pytest.fixture(params=fixed_shape_backend_params)\ndef repo_20_filled_subsamples(request, aset_subsamples_initialized_repo, array5by7) -> Repository:\n    co = aset_subsamples_initialized_repo.checkout(write=True)\n    second_aset = co.add_ndarray_column('second_aset', prototype=array5by7,\n                                        backend=request.param, contains_subsamples=True)\n    firstaset = co['writtenaset']\n    secondaset = co['second_aset']\n    array5by7[:] = 1\n    firstaset[0] = {1: array5by7 * 1, 2: array5by7 * 2, 3: array5by7 * 3}\n    firstaset[1] = {4: array5by7 * 4, 5: array5by7 * 5, 6: array5by7 * 6}\n    secondaset[0] = {1: array5by7 * 10, 2: array5by7 * 20, 3: array5by7 * 30}\n    secondaset[1] = {4: array5by7 * 40, 5: array5by7 * 50, 6: array5by7 * 60}\n    co.commit('added data')\n    co.close()\n    yield aset_subsamples_initialized_repo\n\n\n@pytest.fixture(params=fixed_shape_backend_params)\ndef repo_300_filled_samples(request, aset_samples_initialized_repo, array5by7) -> Repository:\n    co = aset_samples_initialized_repo.checkout(write=True)\n    aset = co.add_ndarray_column('aset', prototype=array5by7, backend=request.param)\n    with aset:\n        for i in range(300):\n            array5by7[:] = i\n            aset[i] = array5by7\n    co.commit('300 samples')\n    co.close()\n    yield aset_samples_initialized_repo\n\n\n@pytest.fixture()\ndef repo_20_filled_samples2(repo) -> Repository:\n    # for diff testing\n    dummyData = np.arange(50).astype(np.int64)\n    co1 = 
repo.checkout(write=True, branch='master')\n    co1.add_ndarray_column(name='dummy', prototype=dummyData)\n    for idx in range(10):\n        dummyData[:] = idx\n        co1.columns['dummy'][idx] = dummyData\n    co1.commit('first commit adding dummy data and hello meta')\n    co1.close()\n    return repo\n\n\n@pytest.fixture(params=variable_shape_backend_params)\ndef aset_samples_var_shape_initialized_repo(request, repo) -> Repository:\n    co = repo.checkout(write=True)\n    co.add_ndarray_column(\n        name='writtenaset', shape=(10, 10), dtype=np.float64, variable_shape=True, backend=request.param)\n    co.commit('this is a commit message')\n    co.close()\n    yield repo\n\n\n@pytest.fixture()\ndef aset_samples_initialized_w_checkout(aset_samples_initialized_repo) -> WriterCheckout:\n    co = aset_samples_initialized_repo.checkout(write=True)\n    yield co\n    co.close()\n\n\n@pytest.fixture()\ndef array5by7():\n    return np.random.random((5, 7))\n\n\n@pytest.fixture()\ndef randomsizedarray():\n    a = random.randint(2, 8)\n    b = random.randint(2, 8)\n    return np.random.random((a, b))\n\n\n@pytest.fixture(params=fixed_shape_backend_params)\ndef two_commit_filled_samples_repo(request, repo, array5by7) -> Repository:\n    co = repo.checkout(write=True)\n    co.add_ndarray_column(\n        name='writtenaset', shape=(5, 7), dtype=np.float32, backend=request.param)\n    for cIdx in range(2):\n        if cIdx != 0:\n            co = repo.checkout(write=True)\n\n        with co.columns['writtenaset'] as d:\n            for prevKey in list(d.keys())[1:]:\n                del d[prevKey]\n            for sIdx in range((cIdx + 1) * 5):\n                arr = np.random.randn(*array5by7.shape).astype(np.float32) * 100\n                d[str(sIdx)] = arr\n        co.commit(f'commit number: {cIdx}')\n        co.close()\n    yield repo\n\n\n@pytest.fixture()\ndef repo_1_br_no_conf(repo) -> Repository:\n\n    dummyData = np.arange(50)\n    co1 = repo.checkout(write=True, branch='master')\n    co1.add_ndarray_column(name='dummy', prototype=dummyData)\n    for idx in range(10):\n        dummyData[:] = idx\n        co1.columns['dummy'][str(idx)] = dummyData\n    co1.commit('first commit adding dummy data')\n    co1.close()\n\n    repo.create_branch('testbranch')\n    co2 = repo.checkout(write=True, branch='testbranch')\n    for idx in range(10, 20):\n        dummyData[:] = idx\n        co2.columns['dummy'][str(idx)] = dummyData\n        co2.columns['dummy'][idx] = dummyData\n    co2.commit('first commit on test branch adding non-conflict data')\n    co2.close()\n    return repo\n\n\n@pytest.fixture()\ndef repo_2_br_no_conf(repo_1_br_no_conf) -> Repository:\n\n    dummyData = np.arange(50)\n    repo = repo_1_br_no_conf\n    co1 = repo.checkout(write=True, branch='master')\n    for idx in range(20, 30):\n        dummyData[:] = idx\n        co1.columns['dummy'][str(idx)] = dummyData\n        co1.columns['dummy'][idx] = dummyData\n    co1.commit('second commit on master adding non-conflict data')\n    co1.close()\n    return repo\n\n\ndef mock_server_config(*args, **kwargs):\n    import os\n    import configparser\n    from pathlib import Path\n    from hangar import constants as c\n    from hangar import remote\n\n    src_path = Path(os.path.dirname(remote.__file__), c.CONFIG_SERVER_NAME)\n    CFG = configparser.ConfigParser()\n    CFG.read(src_path)\n    CFG['SERVER_GRPC']['max_concurrent_rpcs'] = '16'\n    CFG['SERVER_GRPC']['max_thread_pool_workers'] = '4'\n    return 
CFG\n\n\n@pytest.fixture()\ndef server_instance(monkeypatch, managed_tmpdir, worker_id):\n    from secrets import choice\n    from hangar.remote import server\n    monkeypatch.setattr(server, 'server_config', mock_server_config)\n\n    possible_addresses = [x for x in range(50000, 59999)]\n    chosen_address = choice(possible_addresses)\n    address = f'localhost:{chosen_address}'\n    base_tmpdir = pjoin(managed_tmpdir, f'{worker_id[-1]}')\n    mkdir(base_tmpdir)\n    server, hangserver, _ = server.serve(base_tmpdir, overwrite=True, channel_address=address)\n    server.start()\n    yield address\n\n    hangserver.close()\n    server.stop(0.1)\n    server.wait_for_termination(timeout=2)\n\n\n@pytest.fixture(scope='class')\ndef server_instance_class(monkeysession, tmp_path_factory, worker_id):\n    from secrets import choice\n    from hangar.remote import server\n    monkeysession.setattr(server, 'server_config', mock_server_config)\n    monkeysession.setitem(hangar.constants.LMDB_SETTINGS, 'map_size', 2_000_000)\n    monkeysession.setitem(hangar.backends.lmdb_30.LMDB_SETTINGS, 'map_size', 1_000_000)\n    monkeysession.setitem(hangar.backends.lmdb_31.LMDB_SETTINGS, 'map_size', 1_000_000)\n    monkeysession.setattr(hangar.backends.hdf5_00, 'COLLECTION_COUNT', 20)\n    monkeysession.setattr(hangar.backends.hdf5_00, 'COLLECTION_SIZE', 20)\n    monkeysession.setattr(hangar.backends.hdf5_01, 'COLLECTION_COUNT', 20)\n    monkeysession.setattr(hangar.backends.hdf5_01, 'COLLECTION_SIZE', 20)\n    monkeysession.setattr(hangar.backends.numpy_10, 'COLLECTION_SIZE', 100)\n\n    possible_addresses = [x for x in range(50000, 59999)]\n    chosen_address = choice(possible_addresses)\n    address = f'localhost:{chosen_address}'\n    base_tmpdir = tmp_path_factory.mktemp(f'{worker_id[-1]}')\n    server, hangserver, _ = server.serve(str(base_tmpdir), overwrite=True, channel_address=address)\n    server.start()\n    yield address\n\n    hangserver.close()\n    server.stop(0.1)\n    server.wait_for_termination(timeout=2)\n\n\n@pytest.fixture()\ndef written_two_cmt_server_repo(server_instance, two_commit_filled_samples_repo) -> tuple:\n    time.sleep(0.1)  # wait for ready\n    two_commit_filled_samples_repo.remote.add('origin', server_instance)\n    success = two_commit_filled_samples_repo.remote.push('origin', 'master')\n    assert success == 'master'\n    yield (server_instance, two_commit_filled_samples_repo)\n\n\n@pytest.fixture()\ndef server_instance_push_restricted(monkeypatch, managed_tmpdir, worker_id):\n    from hangar.remote import server\n    from secrets import choice\n    monkeypatch.setattr(server, 'server_config', mock_server_config)\n\n    possible_addresses = [x for x in range(50000, 59999)]\n    chosen_address = choice(possible_addresses)\n    address = f'localhost:{chosen_address}'\n    base_tmpdir = pjoin(managed_tmpdir, f'{worker_id[-1]}')\n    mkdir(base_tmpdir)\n    server, hangserver, _ = server.serve(base_tmpdir,\n                                         overwrite=True,\n                                         channel_address=address,\n                                         restrict_push=True,\n                                         username='right_username',\n                                         password='right_password')\n    server.start()\n    yield address\n\n    hangserver.env._close_environments()\n    hangserver.close()\n    server.stop(0.1)\n    server.wait_for_termination(timeout=2)\n"
  },
  {
    "path": "tests/ml_datasets/test_dataset.py",
    "content": "import sys\n\nimport numpy as np\nimport pytest\nfrom torch.utils.data import DataLoader\nimport warnings\nwith warnings.catch_warnings():\n    warnings.simplefilter('ignore', category=DeprecationWarning)\n    import tensorflow as tf\ntf.compat.v1.enable_eager_execution()\n\nfrom hangar.dataset import make_numpy_dataset\nfrom hangar.dataset import make_torch_dataset\nfrom hangar.dataset import make_tensorflow_dataset\nfrom hangar.dataset.common import HangarDataset\n\n\nclass TestInternalDatasetClass:\n\n    def test_column_without_wrapping_list(self, repo_20_filled_samples, array5by7):\n        co = repo_20_filled_samples.checkout()\n        first_col = co.columns['writtenaset']\n        second_col = co.columns['second_aset']\n        dataset = HangarDataset((first_col, second_col))\n        key1, key2 = dataset._keys[0]\n        assert key1 == key2\n        target = array5by7[:] = int(key1)\n        assert np.allclose(dataset.index_get(0), target)\n        co.close()\n\n    def test_no_column(self):\n        with pytest.raises(TypeError):\n            HangarDataset([])\n\n    def test_fails_on_write_enabled_columns(self, repo_20_filled_samples):\n        repo = repo_20_filled_samples\n        co = repo.checkout(write=True)\n        first_aset = co.columns['writtenaset']\n        with pytest.raises(PermissionError):\n            HangarDataset((first_aset,))\n        co.close()\n\n    @pytest.mark.filterwarnings(\"ignore:Column.* writtenaset contains `reference-only` samples\")\n    def test_columns_without_local_data_and_without_key_argument(self, repo_20_filled_samples):\n        repo = repo_20_filled_samples\n        co = repo.checkout()\n        from hangar.backends import backend_decoder\n\n        # perform a mock for nonlocal data\n        for k in co._columns._columns['writtenaset']._samples:\n            co._columns._columns['writtenaset']._samples[k] = backend_decoder(b'50:daeaaeeaebv')\n        col = co.columns['writtenaset']\n        with pytest.raises(RuntimeError):\n            HangarDataset((col,))\n\n        # perform a mock for nonlocal data\n        co = repo.checkout()\n        template = backend_decoder(b'50:daeaaeeaebv')\n        co._columns._columns['writtenaset']._samples['4'] = template\n        col = co.columns['writtenaset']\n        dataset = HangarDataset((col,))\n        dataset_available_keys = dataset._keys\n        assert len(dataset_available_keys) == 19\n        assert '4' not in dataset_available_keys\n        column_reported_local_keys = list(col.keys(local=True))\n        for dset_avail_key in dataset_available_keys:\n            assert dset_avail_key in column_reported_local_keys\n        assert len(dataset_available_keys) == len(column_reported_local_keys)\n        co.close()\n\n    def test_columns_without_common_keys_and_without_key_argument(self, repo_20_filled_samples):\n        co = repo_20_filled_samples.checkout(write=True)\n        first_col = co.columns['writtenaset']\n        first_col['AnExtraKey'] = first_col['0']\n        co.commit('added an extra key')\n        co.close()\n        co = repo_20_filled_samples.checkout()\n        first_col = co.columns['writtenaset']\n        second_col = co.columns['second_aset']\n        with pytest.raises(KeyError):\n            HangarDataset((first_col, second_col))\n        co.close()\n\n    def test_keys_single_column_success(self, repo_20_filled_samples):\n        co = repo_20_filled_samples.checkout()\n        first_col = co.columns['writtenaset']\n        keys = ['1', '2', '3']\n   
     dataset = HangarDataset((first_col,), keys=keys)\n        assert dataset._keys == keys\n        co.close()\n\n    def test_keys_multiple_column_success(self, repo_20_filled_samples):\n        co = repo_20_filled_samples.checkout()\n        first_col = co.columns['writtenaset']\n        second_col = co.columns['second_aset']\n        keys = [('1', '2'), ('2', '3'), ('3', '4')]\n        dataset = HangarDataset((first_col, second_col), keys=keys)\n        for i, key in enumerate(keys):\n            data = dataset.index_get(i)\n            assert np.allclose(data[0], first_col[key[0]])\n            assert np.allclose(data[1], second_col[key[1]])\n        co.close()\n\n    def test_keys_nested_column_success(self, repo_20_filled_subsamples):\n        co = repo_20_filled_subsamples.checkout()\n        col1 = co['writtenaset']\n        col2 = co['second_aset']\n\n        dataset = HangarDataset([col1, col2])\n        data = dataset.index_get(1)\n        assert tuple(data[0].keys()) == tuple(data[1].keys()) == (4, 5, 6)\n        assert isinstance(data[0], dict)\n        assert isinstance(data[1], dict)\n\n        keys = (((0, ...), (0, 1)), ((1, ...), (1, 4)))\n        dataset = HangarDataset([col1, col2], keys=keys)\n        data = dataset.index_get(1)\n        assert tuple(data[0].keys()) == (4, 5, 6)\n        assert np.allclose(data[1], col2[1][4])\n        co.close()\n\n    def test_keys_not_valid(self, repo_20_filled_samples):\n        co = repo_20_filled_samples.checkout()\n        first_col = co.columns['writtenaset']\n        keys = ['w', 'r', 'o', 'n', 'g']\n        dataset = HangarDataset((first_col,), keys=keys)\n        with pytest.raises(KeyError):\n            dataset.index_get(1)\n        co.close()\n\n    @pytest.mark.filterwarnings(\"ignore:Column.* writtenaset contains `reference-only` samples\")\n    def test_keys_non_local(self, repo_20_filled_samples):\n        repo = repo_20_filled_samples\n        co = repo.checkout()\n        # perform a mock for nonlocal data\n        from hangar.backends import backend_decoder\n        template = backend_decoder(b'50:daeaaeeaebv')\n        co._columns._columns['writtenaset']._samples['4'] = template\n\n        col = co.columns['writtenaset']\n        col_reported_remote_keys = col.remote_reference_keys\n        assert col_reported_remote_keys == ('4',)\n        assert len(col_reported_remote_keys) == 1\n        dataset = HangarDataset((col,), keys=('0', *col_reported_remote_keys))\n        with pytest.raises(KeyError):\n            # TODO: hangar internal should raise FileNotFoundError?\n            dataset.index_get(1)\n        co.close()\n\n# ====================================   Numpy    ====================================\n\n\n@pytest.mark.filterwarnings(\"ignore:.* experimental method\")\nclass TestNumpyDataset:\n    def test_multiple_dataset_batched_loader(self, repo_20_filled_samples):\n        co = repo_20_filled_samples.checkout()\n        first_aset = co.columns['writtenaset']\n        second_aset = co.columns['second_aset']\n        dset = make_numpy_dataset([first_aset, second_aset], batch_size=6, drop_last=True)\n        total_samples = 0\n        for dset1, dset2 in dset:\n            total_samples += dset1.shape[0]\n            assert dset1.shape == (6, 5, 7)\n            assert dset2.shape == (6, 5, 7)\n        assert total_samples == 18  # drop last is True\n\n        # testing with batch_size = 1\n        dset = make_numpy_dataset([first_aset, second_aset], batch_size=1, drop_last=True)\n        total_samples = 
0\n        for dset1, dset2 in dset:\n            total_samples += dset1.shape[0]\n            assert dset1.shape == (1, 5, 7)\n            assert dset2.shape == (1, 5, 7)\n        assert total_samples == 20  # drop_last has no effect when batch_size == 1\n\n        with pytest.raises(RuntimeError, match=\"Setting `drop_last` is a no-op when \"\n                                               \"batching is not enabled\"):\n            # Setting drop_last without batching\n            dset = make_numpy_dataset([first_aset, second_aset], batch_size=0, drop_last=True)\n        dset = make_numpy_dataset([first_aset, second_aset], batch_size=0)\n        total_samples = 0\n        for dset1, dset2 in dset:\n            total_samples += 1\n            assert dset1.shape == (5, 7)\n            assert dset2.shape == (5, 7)\n        assert total_samples == 20\n        co.close()\n\n    def test_nested_column(self, repo_20_filled_subsamples):\n        co = repo_20_filled_subsamples.checkout()\n        col1 = co['writtenaset']\n        col2 = co['second_aset']\n        dset = make_numpy_dataset([col1, col2])\n        for data1, data2 in dset:\n            assert isinstance(data1, dict)\n            assert isinstance(data2, dict)\n            assert tuple(data1.keys()) == tuple(data2.keys())\n\n        dset = make_numpy_dataset([col1, col2], batch_size=1, drop_last=True)\n        for data1, data2 in dset:\n            assert type(data1) is type(data2) is tuple\n            assert len(data1) == len(data2) == 1\n            assert tuple(data1[0].keys()) == tuple(data2[0].keys())\n\n        dset = make_numpy_dataset([col1, col2], batch_size=2, drop_last=True)\n        for data1, data2 in dset:\n            assert len(data1) == len(data2) == 2\n        co.close()\n\n    def test_lots_of_data_with_multiple_backend(self, repo_300_filled_samples):\n        repo = repo_300_filled_samples\n        co = repo.checkout()\n        aset = co.columns['aset']\n        np_dset = make_numpy_dataset([aset], batch_size=10, drop_last=True)\n        for data in np_dset:\n            assert isinstance(data, np.ndarray)\n            assert data.shape == (10, 5, 7)\n        co.close()\n\n    def test_shuffle(self, repo_20_filled_samples):\n        repo = repo_20_filled_samples\n        co = repo.checkout()\n        first_aset = co.columns['writtenaset']\n\n        unshuffled_dataset = make_numpy_dataset((first_aset,),\n                                                keys=[str(i) for i in range(15)],\n                                                shuffle=False)\n        expected_unshuffled_content = [i for i in range(15)]\n        received_unshuffled_content = []\n        for data in unshuffled_dataset:\n            received_unshuffled_content.append(int(data[0][0]))\n        assert expected_unshuffled_content == received_unshuffled_content\n\n        shuffled_dataset = make_numpy_dataset((first_aset,),\n                                              keys=[str(i) for i in range(15)],\n                                              shuffle=True)\n        received_shuffled_content = []\n        for data in shuffled_dataset:\n            received_shuffled_content.append(int(data[0][0]))\n        assert received_shuffled_content != expected_unshuffled_content\n        co.close()\n\n    def test_collate_fn(self, repo_20_filled_subsamples):\n        co = repo_20_filled_subsamples.checkout()\n        col1 = co['writtenaset']\n        col2 = co['second_aset']\n        keys = (((0, ...), (0, 1)), ((1, ...), (1, 4)))\n\n        dataset = 
make_numpy_dataset([col1, col2], keys=keys,\n                                     shuffle=False, batch_size=2)\n        col1data, col2data = next(iter(dataset))\n        assert isinstance(col1data, tuple)\n        assert isinstance(col2data, np.ndarray)\n        assert list(col1data[0].keys()) == [1, 2, 3]\n        assert list(col1data[1].keys()) == [4, 5, 6]\n        assert np.allclose(col2data, np.stack((col2[0][1], col2[1][4])))\n\n        def collate_fn(data_arr):\n            arr1 = []\n            arr2 = []\n            for elem in data_arr:\n                # picking one arbitrary subsample\n                k = list(elem[0].keys())[2]\n                data1 = elem[0][k]\n                data2 = elem[1]\n                arr1.append(data1)\n                arr2.append(data2)\n            return np.stack(arr1), np.stack(arr2)\n\n        dataset = make_numpy_dataset([col1, col2], keys=keys, shuffle=False,\n                                     batch_size=2, collate_fn=collate_fn)\n        col1data, col2data = next(iter(dataset))\n        assert np.allclose(col1data, np.stack((col1[0][3], col1[1][6])))\n        assert np.allclose(col2data, np.stack((col2[0][1], col2[1][4])))\n        co.close()\n\n\n# ====================================   PyTorch  ====================================\n\n\nclass TestTorchDataset(object):\n\n    def test_multiple_dataset_loader(self, repo_20_filled_samples):\n        repo = repo_20_filled_samples\n        co = repo.checkout()\n        first_aset = co.columns['writtenaset']\n        second_aset = co.columns['second_aset']\n        torch_dset = make_torch_dataset([first_aset, second_aset])\n        loader = DataLoader(torch_dset, batch_size=6, drop_last=True)\n        total_samples = 0\n        for dset1, dset2 in loader:\n            total_samples += dset1.shape[0]\n            assert dset1.shape == (6, 5, 7)\n            assert dset2.shape == (6, 5, 7)\n        assert total_samples == 18  # drop last is True\n        co.close()\n\n    def test_return_as_dict(self, repo_20_filled_samples):\n        repo = repo_20_filled_samples\n        co = repo.checkout()\n        first_aset = co.columns['writtenaset']\n        second_aset = co.columns['second_aset']\n        torch_dset = make_torch_dataset([first_aset, second_aset], as_dict=True)\n        assert len(torch_dset) == 20\n        loader = DataLoader(torch_dset, batch_size=5)\n        for sample in loader:\n            assert 'writtenaset' in sample.keys()\n            assert 'second_aset' in sample.keys()\n        co.close()\n\n    def test_lots_of_data_with_multiple_backend(self, repo_300_filled_samples):\n        repo = repo_300_filled_samples\n        co = repo.checkout()\n        aset = co.columns['aset']\n        torch_dset = make_torch_dataset([aset], as_dict=True)\n        loader = DataLoader(torch_dset, batch_size=10, drop_last=True)\n        for data in loader:\n            assert isinstance(data, dict)\n            assert data['aset'].shape == (10, 5, 7)\n        co.close()\n\n    @pytest.mark.skipif(sys.platform == \"win32\",\n                        reason=\"multiprocess workers does not run on windows\")\n    def test_lots_of_data_with_multiple_backend_multiple_worker_dataloader(self, repo_300_filled_samples):\n        repo = repo_300_filled_samples\n        co = repo.checkout()\n        aset = co.columns['aset']\n        torch_dset = make_torch_dataset([aset])\n        loader = DataLoader(torch_dset, batch_size=10, drop_last=True, num_workers=2)\n        for data in loader:\n            assert 
        for data in loader:\n            assert data.shape == (10, 5, 7)\n        co.close()\n\n    @pytest.mark.skipif(sys.platform == \"win32\",\n                        reason=\"multiprocess workers do not run on windows\")\n    def test_two_aset_loader_two_worker_dataloader(self, repo_20_filled_samples):\n        repo = repo_20_filled_samples\n        co = repo.checkout()\n        first_aset = co.columns['writtenaset']\n        second_aset = co.columns['second_aset']\n        torch_dset = make_torch_dataset([first_aset, second_aset])\n        loader = DataLoader(torch_dset, batch_size=2, drop_last=True, num_workers=2)\n        count = 0\n        for asets_batch in loader:\n            assert isinstance(asets_batch, list)\n            assert len(asets_batch) == 2\n            assert asets_batch[0].shape == (2, 5, 7)\n            assert asets_batch[1].shape == (2, 5, 7)\n            assert np.allclose(asets_batch[0], -asets_batch[1])\n            count += 1\n        assert count == 10\n        co.close()\n\n\n# ==================================== Tensorflow ====================================\n\n\nclass TestTfDataset(object):\n    # TODO: Add TF2.0 and 1.0 test cases\n\n    def test_dataset_loader(self, repo_20_filled_samples):\n        repo = repo_20_filled_samples\n        co = repo.checkout()\n        first_aset = co.columns['writtenaset']\n        second_aset = co.columns['second_aset']\n\n        # multiple datasets\n        tf_dset = make_tensorflow_dataset([first_aset, second_aset])\n        tf_dset = tf_dset.batch(6)\n        for dset1, dset2 in tf_dset.take(2):\n            assert dset1.shape == tf.TensorShape((6, 5, 7))\n            assert dset2.shape == tf.TensorShape((6, 5, 7))\n        co.close()\n\n    def test_variably_shaped(self, aset_samples_var_shape_initialized_repo):\n        # A variably shaped test is required since collation depends on the way\n        # we return the data from the generator\n        repo = aset_samples_var_shape_initialized_repo\n        co = repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n        for i in range(5, 10):\n            aset[i] = np.random.random((2, i))\n        co.commit('added data')\n        co.close()\n\n        co = repo.checkout()\n        aset = co.columns['writtenaset']\n        tf_dset = make_tensorflow_dataset((aset,))\n        shape_obj = tf.TensorShape((2, None))\n        tf_dset = tf_dset.padded_batch(5, padded_shapes=(shape_obj,))\n        for val in tf_dset:\n            assert val[0].shape[0] == 5\n            assert val[0].shape[1] == 2\n            assert 11 > val[0].shape[2] > 4\n        co.close()\n\n    def test_lots_of_data_with_multiple_backend(self, repo_300_filled_samples):\n        repo = repo_300_filled_samples\n        co = repo.checkout()\n        aset = co.columns['aset']\n        tf_dset = make_tensorflow_dataset([aset])\n        tf_dset = tf_dset.batch(10)\n        for data in tf_dset:\n            assert data[0].shape == (10, 5, 7)\n        co.close()\n\n    def test_shuffle(self, repo_20_filled_samples):\n        repo = repo_20_filled_samples\n        co = repo.checkout()\n        first_aset = co.columns['writtenaset']\n        unshuffled_dataset = make_tensorflow_dataset((first_aset,),\n                                                     keys=[str(i) for i in range(15)],\n                                                     shuffle=False)\n        expected_unshuffled_content = [i for i in range(15)]\n        received_unshuffled_content = []\n        for data in unshuffled_dataset:\n            received_unshuffled_content.append(int(data[0][0][0]))\n        assert expected_unshuffled_content == received_unshuffled_content\n\n        shuffled_dataset = make_tensorflow_dataset((first_aset,),\n                                                   keys=[str(i) for i in range(15)],\n                                                   shuffle=True)\n        received_shuffled_content = []\n        for data in shuffled_dataset:\n            received_shuffled_content.append(int(data[0][0][0]))\n
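        # note (added): a genuine shuffle of 15 keys reproduces the ordered sequence\n        # with probability 1 / 15! (roughly 7.6e-13), so the inequality assertion\n        # below is effectively deterministic\n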
        assert received_shuffled_content != expected_unshuffled_content\n        co.close()\n"
  },
  {
    "path": "tests/property_based/conftest.py",
    "content": "import pytest\n\nvariable_shape_backend_params = ['00', '10']\nfixed_shape_backend_params = ['00', '01', '10']\nstr_variable_shape_backend_params = ['30']\nbytes_variable_shape_backend_params = ['31']\n"
  },
  {
    "path": "tests/property_based/test_pbt_column_flat.py",
    "content": "import pytest\nimport numpy as np\n\nfrom conftest import (\n    variable_shape_backend_params,\n    fixed_shape_backend_params,\n    str_variable_shape_backend_params,\n    bytes_variable_shape_backend_params\n)\n\nimport string\nfrom hypothesis import given, settings, HealthCheck\nimport hypothesis.strategies as st\nfrom hypothesis.extra import numpy as npst\n\nfrom hangar import Repository\n\n\n# ------------------------ Fixture Setup ------------------------------\n\n\nadded_samples = set()\n\n\n@pytest.fixture(params=fixed_shape_backend_params, scope='class')\ndef fixed_shape_repo_co_float32_aset_flat(classrepo, request) -> Repository:\n    # needed because fixtures don't reset between each hypothesis run\n    # tracks added_samples = set(sample_key)\n    global added_samples\n    added_samples = set()\n    co = classrepo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset',\n                          shape=(5, 5, 5),\n                          dtype=np.float32,\n                          variable_shape=False,\n                          backend=request.param,\n                          contains_subsamples=False)\n    yield co\n    co.reset_staging_area()\n    co.close()\n\n\n@pytest.fixture(params=variable_shape_backend_params, scope='class')\ndef variable_shape_repo_co_float32_aset_flat(classrepo, request) -> Repository:\n    # needed because fixtures don't reset between each hypothesis run\n    # tracks added_samples = set(sample_key)\n    global added_samples\n    added_samples = set()\n    co = classrepo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset',\n                          shape=(5, 5, 5),\n                          dtype=np.float32,\n                          variable_shape=True,\n                          backend=request.param,\n                          contains_subsamples=False)\n    yield co\n    co.reset_staging_area()\n    co.close()\n\n\n@pytest.fixture(params=variable_shape_backend_params, scope='class')\ndef variable_shape_repo_co_uint8_aset_flat(classrepo, request) -> Repository:\n    # needed because fixtures don't reset between each hypothesis run\n    # tracks added_samples = set(sample_key)\n    global added_samples\n    added_samples = set()\n    co = classrepo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset',\n                          shape=(5, 5, 5),\n                          dtype=np.uint8,\n                          variable_shape=True,\n                          backend=request.param,\n                          contains_subsamples=False)\n    yield co\n    co.reset_staging_area()\n    co.close()\n\n\n@pytest.fixture(params=str_variable_shape_backend_params, scope='class')\ndef variable_shape_repo_co_str_aset_flat(classrepo, request) -> Repository:\n    # needed because fixtures don't reset between each hypothesis run\n    # tracks added_samples = set(sample_key)\n    global added_samples\n    added_samples = set()\n    co = classrepo.checkout(write=True)\n    co.add_str_column(name='strcolumn',\n                      contains_subsamples=False,\n                      backend=request.param)\n    yield co\n    co.reset_staging_area()\n    co.close()\n\n\n@pytest.fixture(params=bytes_variable_shape_backend_params, scope='class')\ndef variable_shape_repo_co_bytes_aset_flat(classrepo, request) -> Repository:\n    # needed because fixtures don't reset between each hypothesis run\n    # tracks added_samples = set(sample_key)\n    global added_samples\n    added_samples = set()\n    co = 
classrepo.checkout(write=True)\n    co.add_bytes_column(name='bytescolumn',\n                        contains_subsamples=False,\n                        backend=request.param)\n    yield co\n    co.reset_staging_area()\n    co.close()\n\n\n# -------------------------- Test Generation ---------------------------------\n# Test cases are encapsulated in classes (and fixture functions are set to\n# \"class\" level scope) in order to handle a warning introduced in hypothesis\n# version 5.6.0 - 2020-02-29\n#\n# From release notes:\n# > This release adds an explicit warning for tests that are both decorated with @given(...)\n#   and request a function-scoped pytest fixture, because such fixtures are only executed once\n#   for all Hypothesis test cases and that often causes trouble. See issue #377\n#   (https://github.com/HypothesisWorks/hypothesis/issues/377)\n#\n# However, this is actually the intended behavior for hangar, since we want to reuse\n# the same repo/checkout across all of the test case inputs that hypothesis generates\n
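#\n# (a hypothetical per-test alternative in later Hypothesis releases would be to\n#  silence the check via @settings(suppress_health_check=[HealthCheck.function_scoped_fixture]);\n#  the class-scoped fixtures here are deliberate so state persists across examples)\n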
\n\nst_valid_names = st.text(\n    min_size=1, max_size=8, alphabet=string.ascii_letters + string.digits + '_-.')\nst_valid_ints = st.integers(min_value=0, max_value=999_999)\nst_valid_keys = st.one_of(st_valid_ints, st_valid_names)\n\n\nvalid_arrays_fixed = npst.arrays(np.float32,\n                                 shape=(5, 5, 5),\n                                 fill=st.floats(min_value=-10,\n                                                max_value=10,\n                                                allow_nan=False,\n                                                allow_infinity=False,\n                                                width=32),\n                                 elements=st.floats(min_value=-10,\n                                                    max_value=10,\n                                                    allow_nan=False,\n                                                    allow_infinity=False,\n                                                    width=32))\n\n\nclass TestColumn1:\n\n    @given(key=st_valid_keys, val=valid_arrays_fixed)\n    @settings(max_examples=200, deadline=None)\n    def test_arrayset_fixed_key_values(self, key, val, fixed_shape_repo_co_float32_aset_flat):\n        global added_samples\n\n        co = fixed_shape_repo_co_float32_aset_flat\n        col = co.columns['writtenaset']\n        assert col.schema_type == 'fixed_shape'\n\n        col[key] = val\n        added_samples.add(key)\n        out = col[key]\n        assert out.dtype == val.dtype\n        assert out.shape == val.shape\n        assert np.allclose(out, val)\n        assert len(col) == len(added_samples)\n\n\nvalid_shapes_var = npst.array_shapes(min_dims=3, max_dims=3, min_side=1, max_side=5)\nvalid_arrays_var_float32 = npst.arrays(np.float32,\n                                       shape=valid_shapes_var,\n                                       fill=st.floats(min_value=-10,\n                                                      max_value=10,\n                                                      allow_nan=False,\n                                                      allow_infinity=False,\n                                                      width=32),\n                                       elements=st.floats(min_value=-10,\n                                                          max_value=10,\n                                                          allow_nan=False,\n                                                          allow_infinity=False,\n                                                          width=32))\n\n\nclass TestColumn2:\n\n    @given(key=st_valid_keys, val=valid_arrays_var_float32)\n    @settings(max_examples=200, deadline=None)\n    def test_arrayset_variable_shape_float32(self, key, val, variable_shape_repo_co_float32_aset_flat):\n        global added_samples\n\n        co = variable_shape_repo_co_float32_aset_flat\n        col = co.columns['writtenaset']\n        assert col.schema_type == 'variable_shape'\n\n        col[key] = val\n        added_samples.add(key)\n        out = col[key]\n        assert out.dtype == val.dtype\n        assert out.shape == val.shape\n        assert np.allclose(out, val)\n        assert len(col) == len(added_samples)\n\n\nvalid_arrays_var_uint8 = npst.arrays(np.uint8,\n                                     shape=valid_shapes_var,\n                                     elements=st.integers(min_value=0, max_value=255),\n                                     fill=st.integers(min_value=0, max_value=255))\n\n\nclass TestColumn3:\n\n    @given(key=st_valid_keys, val=valid_arrays_var_uint8)\n    @settings(max_examples=200, deadline=None)\n    def test_arrayset_variable_shape_uint8(self, key, val, variable_shape_repo_co_uint8_aset_flat):\n        global added_samples\n\n        co = variable_shape_repo_co_uint8_aset_flat\n        col = co.columns['writtenaset']\n        assert col.schema_type == 'variable_shape'\n\n        col[key] = val\n        added_samples.add(key)\n        out = col[key]\n        assert out.dtype == val.dtype\n        assert out.shape == val.shape\n        assert np.allclose(out, val)\n        assert len(col) == len(added_samples)\n\n\nascii_characters = st.characters(min_codepoint=0, max_codepoint=127)\nascii_text_strategy = st.text(alphabet=ascii_characters, min_size=0, max_size=500)\n\n\nclass TestColumn4:\n\n    @given(key=st_valid_keys, val=ascii_text_strategy)\n    @settings(max_examples=200, deadline=None)\n    def test_str_column_variable_shape(self, key, val, variable_shape_repo_co_str_aset_flat):\n        global added_samples\n\n        co = variable_shape_repo_co_str_aset_flat\n        col = co.columns['strcolumn']\n        col[key] = val\n\n        assert col.schema_type == 'variable_shape'\n        assert col.column_type == 'str'\n        assert col.dtype == str\n\n        added_samples.add(key)\n        out = col[key]\n        assert out == val\n        assert len(col) == len(added_samples)\n\n\nbytes_strategy = st.binary(max_size=2000)\n\n\nclass TestColumn5:\n\n    @given(key=st_valid_keys, val=bytes_strategy)\n    @settings(max_examples=200, deadline=None)\n    def test_bytes_column_variable_shape(self, key, val, variable_shape_repo_co_bytes_aset_flat):\n        global added_samples\n\n        co = variable_shape_repo_co_bytes_aset_flat\n        col = co.columns['bytescolumn']\n        col[key] = val\n\n        assert col.schema_type == 'variable_shape'\n        assert col.column_type == 'bytes'\n        assert col.dtype == bytes\n\n        added_samples.add(key)\n        out = col[key]\n        assert out == val\n        assert len(col) == len(added_samples)\n\n"
  },
  {
    "path": "tests/property_based/test_pbt_column_nested.py",
    "content": "from collections import defaultdict\n\nimport pytest\nimport numpy as np\n\nfrom conftest import (\n    variable_shape_backend_params,\n    fixed_shape_backend_params,\n    str_variable_shape_backend_params,\n    bytes_variable_shape_backend_params\n)\n\nimport string\nfrom hypothesis import given, settings, HealthCheck\nimport hypothesis.strategies as st\nfrom hypothesis.extra import numpy as npst\n\nfrom hangar import Repository\n\n\n# ------------------------ Fixture Setup ------------------------------\n\n\nadded_samples_subsamples = defaultdict(set)\n\n\n@pytest.fixture(params=fixed_shape_backend_params, scope='class')\ndef fixed_shape_repo_co_float32_aset_nested(classrepo, request) -> Repository:\n    # needed because fixtures don't reset between each hypothesis run\n    # tracks added_samples_subsamples[sample_key] = set(subsample_keys)\n    global added_samples_subsamples\n    added_samples_subsamples = defaultdict(set)\n    co = classrepo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset',\n                          shape=(5, 5, 5),\n                          dtype=np.float32,\n                          variable_shape=False,\n                          contains_subsamples=True,\n                          backend=request.param)\n    yield co\n    co.reset_staging_area()\n    co.close()\n\n\n@pytest.fixture(params=variable_shape_backend_params, scope='class')\ndef variable_shape_repo_co_float32_aset_nested(classrepo, request) -> Repository:\n    # needed because fixtures don't reset between each hypothesis run\n    # tracks added_samples_subsamples[sample_key] = set(subsample_keys)\n    global added_samples_subsamples\n    added_samples_subsamples = defaultdict(set)\n    co = classrepo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset',\n                          shape=(5, 5, 5),\n                          dtype=np.float32,\n                          variable_shape=True,\n                          contains_subsamples=True,\n                          backend=request.param)\n    yield co\n    co.reset_staging_area()\n    co.close()\n\n\n@pytest.fixture(params=variable_shape_backend_params, scope='class')\ndef variable_shape_repo_co_uint8_aset_nested(classrepo, request) -> Repository:\n    # needed because fixtures don't reset between each hypothesis run\n    # tracks added_samples_subsamples[sample_key] = set(subsample_keys)\n    global added_samples_subsamples\n    added_samples_subsamples = defaultdict(set)\n    co = classrepo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset',\n                          shape=(5, 5, 5),\n                          dtype=np.uint8,\n                          variable_shape=True,\n                          contains_subsamples=True,\n                          backend=request.param)\n    yield co\n    co.reset_staging_area()\n    co.close()\n\n\n@pytest.fixture(params=str_variable_shape_backend_params, scope='class')\ndef variable_shape_repo_co_str_aset_nested(classrepo, request) -> Repository:\n    # needed because fixtures don't reset between each hypothesis run\n    # tracks added_samples = set(sample_key)\n    global added_samples_subsamples\n    added_samples_subsamples = defaultdict(set)\n    co = classrepo.checkout(write=True)\n    co.add_str_column(name='strcolumn',\n                      contains_subsamples=True,\n                      backend=request.param)\n    yield co\n    co.reset_staging_area()\n    co.close()\n\n\n@pytest.fixture(params=bytes_variable_shape_backend_params, 
scope='class')\ndef variable_shape_repo_co_bytes_aset_nested(classrepo, request) -> Repository:\n    # needed because fixtures don't reset between each hypothesis run\n    # tracks added_samples_subsamples[sample_key] = set(subsample_keys)\n    global added_samples_subsamples\n    added_samples_subsamples = defaultdict(set)\n    co = classrepo.checkout(write=True)\n    co.add_bytes_column(name='bytescolumn',\n                        contains_subsamples=True,\n                        backend=request.param)\n    yield co\n    co.reset_staging_area()\n    co.close()\n\n\n# -------------------------- Test Generation ---------------------------------\n# Test cases are encapsulated in classes (and fixture functions are set to\n# \"class\" level scope) in order to handle a warning introduced in hypothesis\n# version 5.6.0 - 2020-02-29\n#\n# From release notes:\n# > This release adds an explicit warning for tests that are both decorated with @given(...)\n#   and request a function-scoped pytest fixture, because such fixtures are only executed once\n#   for all Hypothesis test cases and that often causes trouble. See issue #377\n#   (https://github.com/HypothesisWorks/hypothesis/issues/377)\n#\n# However, this is actually the intended behavior for hangar, since we want to reuse\n# the same repo/checkout across all of the test case inputs that hypothesis generates\n\n\nst_valid_names = st.text(\n    min_size=1, max_size=16, alphabet=string.ascii_letters + string.digits + '_-.')\nst_valid_ints = st.integers(min_value=0, max_value=999_999)\nst_valid_keys = st.one_of(st_valid_ints, st_valid_names)\n\nvalid_arrays_fixed = npst.arrays(np.float32,\n                                 shape=(5, 5, 5),\n                                 fill=st.floats(min_value=-10,\n                                                max_value=10,\n                                                allow_nan=False,\n                                                allow_infinity=False,\n                                                width=32),\n                                 elements=st.floats(min_value=-10,\n                                                    max_value=10,\n                                                    allow_nan=False,\n                                                    allow_infinity=False,\n                                                    width=32))\n\n\nclass TestColumn1:\n\n    @given(key=st_valid_keys, subkey=st_valid_keys, val=valid_arrays_fixed)\n    @settings(max_examples=200, deadline=None)\n    def test_arrayset_fixed_key_values_nested(self, key, subkey, val, fixed_shape_repo_co_float32_aset_nested):\n        global added_samples_subsamples\n\n        added_samples_subsamples[key].add(subkey)\n\n        co = fixed_shape_repo_co_float32_aset_nested\n        col = co.columns['writtenaset']\n        assert col.schema_type == 'fixed_shape'\n        assert col.contains_subsamples is True\n        col[key] = {subkey: val}\n        out = col[key][subkey]\n\n        assert len(col) == len(added_samples_subsamples)\n        assert len(col[key]) == len(added_samples_subsamples[key])\n        assert out.dtype == val.dtype\n        assert out.shape == val.shape\n        assert np.allclose(out, val)\n\n\nvalid_shapes_var = npst.array_shapes(min_dims=3, max_dims=3, min_side=1, max_side=5)\nvalid_arrays_var_float32 = npst.arrays(np.float32,\n                                       shape=valid_shapes_var,\n                                       fill=st.floats(min_value=-10,\n                                                      
max_value=10,\n                                                      allow_nan=False,\n                                                      allow_infinity=False,\n                                                      width=32),\n                                       elements=st.floats(min_value=-10,\n                                                          max_value=10,\n                                                          allow_nan=False,\n                                                          allow_infinity=False,\n                                                          width=32))\n\n\nclass TestColumn2:\n\n    @given(key=st_valid_keys, subkey=st_valid_keys, val=valid_arrays_var_float32)\n    @settings(max_examples=200, deadline=None)\n    def test_arrayset_variable_shape_float32_nested(self, key, val, subkey, variable_shape_repo_co_float32_aset_nested):\n        global added_samples_subsamples\n\n        co = variable_shape_repo_co_float32_aset_nested\n        col = co.columns['writtenaset']\n        assert col.schema_type == 'variable_shape'\n        assert col.contains_subsamples is True\n        col[key] = {subkey: val}\n        out = col[key][subkey]\n        added_samples_subsamples[key].add(subkey)\n\n        assert len(col) == len(added_samples_subsamples)\n        assert len(col[key]) == len(added_samples_subsamples[key])\n        assert out.dtype == val.dtype\n        assert out.shape == val.shape\n        assert np.allclose(out, val)\n\n\nvalid_arrays_var_uint8 = npst.arrays(np.uint8,\n                                     shape=valid_shapes_var,\n                                     elements=st.integers(min_value=0, max_value=255),\n                                     fill=st.integers(min_value=0, max_value=255))\n\n\nclass TestColumn3:\n\n    @given(key=st_valid_keys, subkey=st_valid_keys, val=valid_arrays_var_uint8)\n    @settings(max_examples=200, deadline=None)\n    def test_arrayset_variable_shape_uint8_nested(self, key, val, subkey, variable_shape_repo_co_uint8_aset_nested):\n        global added_samples_subsamples\n\n        co = variable_shape_repo_co_uint8_aset_nested\n        col = co.columns['writtenaset']\n        assert col.schema_type == 'variable_shape'\n        assert col.contains_subsamples is True\n        col[key] = {subkey: val}\n        out = col[key][subkey]\n        added_samples_subsamples[key].add(subkey)\n\n        assert len(col) == len(added_samples_subsamples)\n        assert len(col[key]) == len(added_samples_subsamples[key])\n        assert out.dtype == val.dtype\n        assert out.shape == val.shape\n        assert np.allclose(out, val)\n\n\nascii_characters = st.characters(min_codepoint=0, max_codepoint=500)\nascii_text_strategy = st.text(alphabet=ascii_characters, min_size=0, max_size=500)\n\n\nclass TestStrColumn:\n\n    @given(key=st_valid_keys, subkey=st_valid_keys, val=ascii_text_strategy)\n    @settings(max_examples=200, deadline=None)\n    def test_str_column_variable_shape_nested(self, key, subkey, val, variable_shape_repo_co_str_aset_nested):\n        global added_samples_subsamples\n\n        co = variable_shape_repo_co_str_aset_nested\n        col = co.columns['strcolumn']\n        assert col.schema_type == 'variable_shape'\n        assert col.contains_subsamples is True\n\n        col[key] = {subkey: val}\n        out = col[key][subkey]\n        added_samples_subsamples[key].add(subkey)\n\n        assert len(col) == len(added_samples_subsamples)\n        assert len(col[key]) == len(added_samples_subsamples[key])\n
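        # round-trip check (added note): the exact text (codepoints up to 500, i.e.\n        # beyond plain ASCII despite the strategy name) must come back unchanged\n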
        assert out == val\n\n\nbytes_strategy = st.binary(max_size=2000)\n\n\nclass TestBytesColumn:\n\n    @given(key=st_valid_keys, subkey=st_valid_keys, val=bytes_strategy)\n    @settings(max_examples=200, deadline=None)\n    def test_bytes_column_variable_shape_nested(self, key, subkey, val, variable_shape_repo_co_bytes_aset_nested):\n        global added_samples_subsamples\n\n        co = variable_shape_repo_co_bytes_aset_nested\n        col = co.columns['bytescolumn']\n        assert col.schema_type == 'variable_shape'\n        assert col.contains_subsamples is True\n\n        col[key] = {subkey: val}\n        out = col[key][subkey]\n        added_samples_subsamples[key].add(subkey)\n\n        assert len(col) == len(added_samples_subsamples)\n        assert len(col[key]) == len(added_samples_subsamples[key])\n        assert out == val\n"
  },
  {
    "path": "tests/test_backend_hdf5_00_hdf5_01.py",
    "content": "import pytest\nimport numpy as np\n\n\n@pytest.fixture(params=['00', '01'])\ndef be_filehandle(request):\n    if request.param == '00':\n        from hangar.backends.hdf5_00 import HDF5_00_FileHandles\n        return HDF5_00_FileHandles\n    elif request.param == '01':\n        from hangar.backends.hdf5_01 import HDF5_01_FileHandles\n        return HDF5_01_FileHandles\n    else:\n        raise ValueError(f'request param \"{request.param}\" for backend code unknown.')\n\n\n@pytest.mark.parametrize('clib,clibCode',\n                         [('blosc:blosclz', 0), ('blosc:lz4', 1),\n                          ('blosc:lz4hc', 2), ('blosc:zlib', 4),\n                          ('blosc:zstd', 5)])\n@pytest.mark.parametrize('clevel', [1, 4, 8])\n@pytest.mark.parametrize('cshuffle,cshuffleCode', [(None, 0), ('byte', 1), ('bit', 2)])\n@pytest.mark.parametrize('beCode', ['00', '01'])\ndef test_blosc_filter_opts_result_in_correct_dataset_args(\n        be_filehandle, clib, clibCode, clevel, cshuffle, cshuffleCode, beCode):\n\n    out = be_filehandle._dataset_opts(complib=clib,\n                                      complevel=clevel,\n                                      shuffle=cshuffle)\n    expected = {\n        'compression': 32001,\n        'compression_opts': (0, 0, 0, 0, clevel, cshuffleCode, clibCode),\n        'shuffle': False}\n    assert out == expected\n\n\n@pytest.mark.parametrize('cshuffle,cshuffleCode', [(None, False), ('byte', True)])\ndef test_lzf_filter_opts_result_in_correct_dataset_args(be_filehandle, cshuffle, cshuffleCode):\n    out = be_filehandle._dataset_opts(complib='lzf',\n                                      complevel=None,\n                                      shuffle=cshuffle)\n    expected = {\n        'compression': 'lzf',\n        'compression_opts': None,\n        'shuffle': cshuffleCode}\n    assert out == expected\n\n\n@pytest.mark.parametrize('clevel', [1, 4, 8])\n@pytest.mark.parametrize('cshuffle,cshuffleCode', [(None, False), ('byte', True)])\ndef test_gzip_filter_opts_result_in_correct_dataset_args(be_filehandle, clevel, cshuffle, cshuffleCode):\n    out = be_filehandle._dataset_opts(complib='gzip',\n                                      complevel=clevel,\n                                      shuffle=cshuffle)\n    expected = {\n        'compression': 'gzip',\n        'compression_opts': clevel,\n        'shuffle': cshuffleCode}\n    assert out == expected\n\n\n# ------------------------- test actual compression ---------------------------\n\n\n@pytest.mark.parametrize('clib,clibCode',\n                         [('blosc:blosclz', 0), ('blosc:lz4', 1),\n                          ('blosc:lz4hc', 2), ('blosc:zlib', 4),\n                          ('blosc:zstd', 5)])\n@pytest.mark.parametrize('clevel', [1, 4, 8])\n@pytest.mark.parametrize('cshuffle,cshuffleCode', [(None, 0), ('byte', 1), ('bit', 2)])\n@pytest.mark.parametrize('be_code', ['00', '01'])\ndef test_arrayset_init_with_various_blosc_opts(repo, array5by7, clib, clibCode, clevel, cshuffle, cshuffleCode, be_code):\n\n    opts = {\n        'shuffle': cshuffle,\n        'complib': clib,\n        'complevel': clevel,\n    }\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column('aset', prototype=array5by7, backend=be_code, backend_options=opts)\n    assert aset.backend == be_code\n    with aset as a:\n        for i in range(10):\n            a[i] = array5by7 + i\n\n    wuid = aset._be_fs[be_code].w_uid\n    plist = aset._be_fs[be_code].wFp[wuid]['/0'].id.get_create_plist()\n    
_, _, resopts, _ = plist.get_filter(0)\n    res_clevel, res_cshuffle, res_clib = resopts[4:7]\n    assert res_clevel == clevel\n    assert res_clib == clibCode\n    assert res_cshuffle == cshuffleCode\n    wco.commit('hi')\n    wco.close()\n\n\n@pytest.mark.parametrize('cshuffle,cshuffleCode', [(False, False), (True, True)])\n@pytest.mark.parametrize('be_code', ['00', '01'])\ndef test_arrayset_init_with_various_lzf_opts(repo, array5by7, cshuffle, cshuffleCode, be_code):\n\n    opts = {\n        'shuffle': cshuffle,\n        'complib': 'lzf',\n        'complevel': None,\n    }\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column('aset', prototype=array5by7, backend=be_code, backend_options=opts)\n    assert aset.backend == be_code\n    with aset as a:\n        for i in range(10):\n            a[i] = array5by7 + i\n\n    res_compression = aset._be_fs[be_code].wFp[aset._be_fs[be_code].w_uid]['/0'].compression\n    res_shuffle = aset._be_fs[be_code].wFp[aset._be_fs[be_code].w_uid]['/0'].shuffle\n    assert res_compression == 'lzf'\n    assert res_shuffle == cshuffleCode\n    wco.commit('hi')\n    wco.close()\n\n\n@pytest.mark.parametrize('clevel', [1, 4, 8])\n@pytest.mark.parametrize('cshuffle,cshuffleCode', [(False, False), (True, True)])\n@pytest.mark.parametrize('be_code', ['00', '01'])\ndef test_arrayset_init_with_various_gzip_opts(repo, array5by7, clevel, cshuffle, cshuffleCode, be_code):\n\n    opts = {\n        'shuffle': cshuffle,\n        'complib': 'gzip',\n        'complevel': clevel,\n    }\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column(\n        'aset', prototype=array5by7, backend=be_code, backend_options=opts)\n    assert aset.backend == be_code\n    with aset as a:\n        for i in range(10):\n            a[i] = array5by7 + i\n\n    res_compression = aset._be_fs[be_code].wFp[aset._be_fs[be_code].w_uid]['/0'].compression\n    res_compression_opts = aset._be_fs[be_code].wFp[aset._be_fs[be_code].w_uid]['/0'].compression_opts\n    res_shuffle = aset._be_fs[be_code].wFp[aset._be_fs[be_code].w_uid]['/0'].shuffle\n    assert res_compression == 'gzip'\n    assert res_shuffle == cshuffleCode\n    assert res_compression_opts == clevel\n    wco.commit('hi')\n    wco.close()\n\n\n@pytest.mark.parametrize('be_code', ['00', '01'])\ndef test_arrayset_overflows_collection_size_collection_count(be_code, repo, monkeypatch):\n    if be_code == '00':\n        from hangar.backends import hdf5_00\n        monkeypatch.setattr(hdf5_00, 'COLLECTION_COUNT', 5)\n        monkeypatch.setattr(hdf5_00, 'COLLECTION_SIZE', 10)\n    elif be_code == '01':\n        from hangar.backends import hdf5_01\n        monkeypatch.setattr(hdf5_01, 'COLLECTION_COUNT', 5)\n        monkeypatch.setattr(hdf5_01, 'COLLECTION_SIZE', 10)\n    else:\n        raise ValueError(f'be_code param \"{be_code}\" unknown.')\n\n    wco = repo.checkout(write=True)\n    proto = np.arange(50).astype(np.uint16)\n    aset = wco.add_ndarray_column('aset', prototype=proto, backend=be_code)\n    with aset as cm_aset:\n        for i in range(500):\n            proto[:] = i\n            cm_aset[i] = proto\n    assert aset._be_fs[be_code].hColsRemain == 4\n    assert aset._be_fs[be_code].hMaxSize == 10\n    wco.commit('hello')\n\n    with aset as cm_aset:\n        for i in range(500):\n            proto[:] = i\n            assert np.allclose(proto, cm_aset[i])\n    wco.close()\n\n    rco = repo.checkout()\n    naset = rco.columns['aset']\n    with naset as ncm_aset:\n        for i in range(500):\n            
proto[:] = i\n            assert np.allclose(proto, ncm_aset[i])\n    rco.close()\n"
  },
  {
    "path": "tests/test_branching.py",
    "content": "import pytest\n\n\n@pytest.mark.parametrize('name', [\n    'dummy branch', 'origin/master', '\\nmaster', '\\\\master', 'master\\n'\n    'master\\r\\n', 'master ', 1412, 'foo !', 'foo@', 'foo#', 'foo$', '(foo)',\n    'VeryLongNameIsInvalidOver64CharactersNotAllowedVeryLongNameIsInva'])\ndef test_create_branch_fails_invalid_name(aset_samples_initialized_repo, name):\n    repo = aset_samples_initialized_repo\n    with pytest.raises(ValueError):\n        repo.create_branch(name)\n\n\ndef test_list_branches_only_reports_master_upon_initialization(repo):\n    branches = repo.list_branches()\n    assert branches == ['master']\n\n\ndef test_cannot_create_new_branch_from_initialized_repo_with_no_commits(repo):\n    with pytest.raises(RuntimeError):\n        repo.create_branch('testbranch')\n\n\ndef test_can_create_new_branch_from_repo_with_one_commit(repo):\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    expected_digest = co.commit('first')\n    co.close()\n\n    branchRes = repo.create_branch('testbranch')\n\n    assert branchRes.name == 'testbranch'\n    assert branchRes.digest == expected_digest\n\n\ndef test_cannot_duplicate_branch_name(aset_samples_initialized_repo):\n    aset_samples_initialized_repo.create_branch('testbranch')\n    with pytest.raises(ValueError):\n        aset_samples_initialized_repo.create_branch('testbranch')\n\n\ndef test_create_multiple_branches_different_name_same_commit(aset_samples_initialized_repo):\n    b1 = aset_samples_initialized_repo.create_branch('testbranch1')\n    b2 = aset_samples_initialized_repo.create_branch('testbranch2')\n    b3 = aset_samples_initialized_repo.create_branch('testbranch3')\n\n    assert b1.digest == b2.digest\n    assert b2.digest == b3.digest\n    assert b3.digest == b1.digest\n    assert aset_samples_initialized_repo.list_branches() == ['master', 'testbranch1', 'testbranch2', 'testbranch3']\n\n\ndef test_create_branch_by_specifying_base_commit(repo):\n\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co.commit('first commit')\n    first_digest = co.commit_hash\n    co['test_meta']['foo'] = 'bar'\n    second_digest = co.commit('second')\n    co['test_meta']['hello'] = 'world'\n    third_digest = co.commit('third')\n    co['test_meta']['zen'] = 'python'\n    fourth_digest = co.commit('fourth')\n    co.close()\n\n    assert repo.list_branches() == ['master']\n\n    secBranch = repo.create_branch('dev-second', base_commit=second_digest)\n    assert secBranch.name == 'dev-second'\n    assert secBranch.digest == second_digest\n\n    co = repo.checkout(branch='dev-second')\n    assert len(co['test_meta']) == 1\n    assert co['test_meta']['foo'] == 'bar'\n    co.close()\n\n\ndef test_remove_branch_works_when_commits_align(repo):\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co.commit('first')\n    co['test_meta']['foo'] = 'bar'\n    masterHEAD = co.commit('second')\n    co.close()\n    repo.create_branch('testdelete')\n\n    assert repo.list_branches() == ['master', 'testdelete']\n\n    removedBranch = repo.remove_branch('testdelete')\n    assert removedBranch.name == 'testdelete'\n    assert removedBranch.digest == masterHEAD\n    assert repo.list_branches() == ['master']\n\n\ndef test_delete_branch_raises_runtime_error_when_history_not_merged(repo):\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co.commit('first')\n    co['test_meta']['foo'] = 'bar'\n    masterHEAD = co.commit('second')\n    co.close()\n\n    
repo.create_branch('testdelete')\n    co = repo.checkout(write=True, branch='testdelete')\n    co['test_meta']['hello'] = 'world'\n    thirdDigest = co.commit('third')\n    co.close()\n\n    # checkout master so staging area is not on branch\n    co = repo.checkout(write=True, branch='master')\n    co.close()\n\n    assert repo.list_branches() == ['master', 'testdelete']\n    with pytest.raises(RuntimeError):\n        repo.remove_branch('testdelete')\n\n\ndef test_delete_branch_completes_when_history_not_merged_but_force_option_set(repo):\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co.commit('first')\n    co['test_meta']['foo'] = 'bar'\n    masterHEAD = co.commit('second')\n    co.close()\n\n    repo.create_branch('testdelete')\n    co = repo.checkout(write=True, branch='testdelete')\n    co['test_meta']['hello'] = 'world'\n    thirdDigest = co.commit('third')\n    co.close()\n\n    # checkout master so staging area is not on branch\n    co = repo.checkout(write=True, branch='master')\n    co.close()\n    assert repo.list_branches() == ['master', 'testdelete']\n\n    removedBranch = repo.remove_branch('testdelete', force_delete=True)\n    assert removedBranch.name == 'testdelete'\n    assert removedBranch.digest == thirdDigest\n    assert repo.list_branches() == ['master']\n\n\ndef test_delete_branch_raises_value_error_if_invalid_branch_name(repo):\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co.commit('first')\n    co['test_meta']['foo'] = 'bar'\n    masterHEAD = co.commit('second')\n    co.close()\n\n    repo.create_branch('testdelete')\n    co = repo.checkout(write=True, branch='testdelete')\n    co['test_meta']['hello'] = 'world'\n    thirdDigest = co.commit('third')\n    co.close()\n\n    assert repo.list_branches() == ['master', 'testdelete']\n    with pytest.raises(ValueError):\n        repo.remove_branch('doesnotexist')\n    with pytest.raises(ValueError):\n        repo.remove_branch('origin/master')\n\n\ndef test_delete_branch_raises_permission_error_if_writer_lock_held(repo):\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co.commit('first')\n    co['test_meta']['foo'] = 'bar'\n    masterHEAD = co.commit('second')\n    co.close()\n\n    repo.create_branch('testdelete')\n    co = repo.checkout(write=True, branch='testdelete')\n    co['test_meta']['hello'] = 'world'\n    thirdDigest = co.commit('third')\n    co.close()\n\n    # checkout master so staging area is not on branch\n    co = repo.checkout(write=True, branch='master')\n    with pytest.raises(PermissionError):\n        repo.remove_branch('testdelete')\n    assert repo.list_branches() == ['master', 'testdelete']\n    co.close()\n\n\ndef test_delete_branch_raises_permission_error_if_branch_requested_is_staging_head(repo):\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co.commit('first')\n    co['test_meta']['foo'] = 'bar'\n    masterHEAD = co.commit('second')\n    co.close()\n\n    repo.create_branch('testdelete')\n    co = repo.checkout(write=True, branch='testdelete')\n    co['test_meta']['hello'] = 'world'\n    thirdDigest = co.commit('third')\n    co.close()\n\n    with pytest.raises(PermissionError):\n        repo.remove_branch('testdelete')\n    assert repo.list_branches() == ['master', 'testdelete']\n\n\ndef test_delete_branch_raises_permission_error_if_only_one_branch_left(repo):\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co['test_meta']['foo'] = 'bar'\n    
masterHEAD = co.commit('second')\n    co.close()\n\n    assert repo.list_branches() == ['master']\n    with pytest.raises(PermissionError):\n        repo.remove_branch('master')\n    assert repo.list_branches() == ['master']\n"
  },
  {
    "path": "tests/test_checkout.py",
    "content": "import atexit\nimport numpy as np\nimport pytest\nfrom conftest import fixed_shape_backend_params\n\n\nclass TestCheckout(object):\n\n    def test_write_checkout_specifying_commit_not_allowed_if_commit_exists(self, aset_samples_initialized_repo):\n        cmt_digest = aset_samples_initialized_repo.log(return_contents=True)['head']\n        with pytest.raises(ValueError):\n            aset_samples_initialized_repo.checkout(write=True, commit=cmt_digest)\n\n    def test_write_checkout_specifying_commit_not_allowed_if_commit_does_not_exists(self, aset_samples_initialized_repo):\n        cmt_digest = 'notrealcommit'\n        with pytest.raises(ValueError):\n            aset_samples_initialized_repo.checkout(write=True, commit=cmt_digest)\n\n    def test_two_write_checkouts(self, repo):\n        w1_checkout = repo.checkout(write=True)\n        with pytest.raises(PermissionError):\n            repo.checkout(write=True)\n        w1_checkout.close()\n\n    def test_two_read_checkouts(self, repo, array5by7):\n        w_checkout = repo.checkout(write=True)\n        arrayset_name = 'aset'\n        r_ds = w_checkout.add_ndarray_column(name=arrayset_name, prototype=array5by7)\n        r_ds['1'] = array5by7\n        w_checkout.commit('init')\n        r1_checkout = repo.checkout()\n        r2_checkout = repo.checkout()\n        assert np.allclose(r1_checkout.columns['aset']['1'], array5by7)\n        assert np.allclose(\n            r1_checkout.columns['aset']['1'], r2_checkout.columns['aset']['1'])\n        r1_checkout.close()\n        r2_checkout.close()\n        w_checkout.close()\n\n    def test_write_with_read_checkout(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout()\n        with pytest.raises(AttributeError):\n            co.add_ndarray_column(name='aset', shape=(5, 7), dtype=np.float64)\n        with pytest.raises(AttributeError):\n            co.add_str_column('test_meta')\n        co.close()\n\n    def test_writer_aset_obj_not_accessible_after_close(self, two_commit_filled_samples_repo):\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=True)\n        asets = co.columns\n        aset = co.columns['writtenaset']\n        co.close()\n\n        with pytest.raises(PermissionError):\n            asets.iswriteable\n        with pytest.raises(PermissionError):\n            shouldFail = asets['writtenaset']\n        with pytest.raises(PermissionError):\n            aset.iswriteable\n\n    def test_writer_aset_obj_arrayset_iter_values_not_accessible_after_close(self, two_commit_filled_samples_repo):\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=True)\n        oldObjs = []\n        for oldObj in co.columns.values():\n            oldObjs.append(oldObj)\n        co.close()\n\n        for oldObj in oldObjs:\n            with pytest.raises(PermissionError):\n                oldObj.column\n\n    def test_writer_aset_obj_arrayset_iter_items_not_accessible_after_close(self, two_commit_filled_samples_repo):\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=True)\n        oldObjs = {}\n        for oldName, oldObj in co.columns.items():\n            oldObjs[oldName] = oldObj\n        co.close()\n\n        for name, obj in oldObjs.items():\n            assert isinstance(name, str)\n            with pytest.raises(PermissionError):\n                obj.column\n\n    def test_writer_aset_obj_not_accessible_after_commit_and_close(self, 
aset_samples_initialized_repo, array5by7):\n        repo = aset_samples_initialized_repo\n        co = repo.checkout(write=True)\n        asets = co.columns\n        aset = co.columns['writtenaset']\n        aset['1'] = array5by7\n        co.commit('hey there')\n        co.close()\n\n        with pytest.raises(PermissionError):\n            asets.iswriteable\n        with pytest.raises(PermissionError):\n            shouldFail = asets['writtenaset']\n        with pytest.raises(PermissionError):\n            aset.iswriteable\n        with pytest.raises(PermissionError):\n            shouldFail = aset['1']\n\n    def test_reader_aset_obj_not_accessible_after_close(self, two_commit_filled_samples_repo):\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=False)\n        asets = co.columns\n        aset = co.columns['writtenaset']\n        co.close()\n\n        with pytest.raises(PermissionError):\n            asets.iswriteable\n        with pytest.raises(PermissionError):\n            shouldFail = asets['writtenaset']\n        with pytest.raises(PermissionError):\n            aset.iswriteable\n\n    def test_reader_aset_obj_column_iter_values_not_accessible_after_close(self, two_commit_filled_samples_repo):\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=False)\n        oldObjs = []\n        for oldObj in co.columns.values():\n            oldObjs.append(oldObj)\n        co.close()\n\n        for oldObj in oldObjs:\n            with pytest.raises(PermissionError):\n                oldObj.column\n\n    def test_reader_aset_obj_arrayset_iter_items_not_accessible_after_close(self, two_commit_filled_samples_repo):\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=False)\n        oldObjs = {}\n        for oldName, oldObj in co.columns.items():\n            oldObjs[oldName] = oldObj\n        co.close()\n\n        for name, obj in oldObjs.items():\n            assert isinstance(name, str)\n            with pytest.raises(PermissionError):\n                obj.column\n\n    def test_reader_arrayset_context_manager_not_accessible_after_close(self, two_commit_filled_samples_repo):\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=False)\n        aset = co.columns['writtenaset']\n        klist = []\n        with aset as ds:\n            for k in ds.keys():\n                klist.append(k)\n                a = ds\n        co.close()\n\n        with pytest.raises(PermissionError):\n            a.column\n        with pytest.raises(PermissionError):\n            ds.column\n        with pytest.raises(PermissionError):\n            aset[klist[0]]\n\n    def test_writer_arrayset_context_manager_not_accessible_after_close(self, two_commit_filled_samples_repo):\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n        with aset as ds:\n            # for k in ds.keys():\n            #     klist.append(k)\n            a = ds\n            a['1232'] = np.random.randn(5, 7).astype(np.float32)\n        co.close()\n\n        with pytest.raises(PermissionError):\n            a.column\n        with pytest.raises(PermissionError):\n            ds.column\n        with pytest.raises(PermissionError):\n            aset['1232']\n\n    def test_close_read_does_not_invalidate_write_checkout(self, aset_samples_initialized_repo, array5by7):\n        repo = aset_samples_initialized_repo\n        r_co = repo.checkout(write=False)\n   
     w_co = repo.checkout(write=True)\n        r_co.close()\n\n        with pytest.raises(PermissionError):\n            shouldFail = r_co.columns\n\n        aset = w_co.columns['writtenaset']\n        aset['1'] = array5by7\n        assert np.allclose(w_co.columns['writtenaset']['1'], array5by7)\n        w_co.commit('hello commit')\n        w_co.close()\n\n        with pytest.raises(PermissionError):\n            aset.column\n\n    def test_close_write_does_not_invalidate_read_checkout(self, aset_samples_initialized_repo, array5by7):\n        repo = aset_samples_initialized_repo\n        r_co = repo.checkout(write=False)\n        w_co = repo.checkout(write=True)\n\n        aset = w_co.columns['writtenaset']\n        aset['1'] = array5by7\n        assert np.allclose(w_co.columns['writtenaset']['1'], array5by7)\n        w_co.commit('hello commit')\n        w_co.close()\n\n        assert 'writtenaset' in r_co.columns\n        with pytest.raises(PermissionError):\n            aset.column\n        r_co.close()\n        with pytest.raises(PermissionError):\n            r_co.columns\n\n    def test_operate_on_arrayset_after_closing_old_checkout(self, repo, array5by7):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('aset', prototype=array5by7)\n        co.commit('this is a commit message')\n        co.close()\n        co = repo.checkout(write=True)\n        with pytest.raises(PermissionError):\n            aset['1'] = array5by7\n            co.commit('this is a commit message')\n        co.close()\n        with pytest.raises(PermissionError):\n            aset['1']\n\n    def test_operate_on_closed_checkout(self, repo, array5by7):\n        co = repo.checkout(write=True)\n        co.add_ndarray_column('aset', prototype=array5by7)\n        co.commit('this is a commit message')\n        co.close()\n        with pytest.raises(PermissionError):\n            co.columns['aset']['1'] = array5by7\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_operate_on_arrayset_samples_after_committing_but_not_closing_checkout(self, aset_backend, repo, array5by7):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('aset', prototype=array5by7, backend=aset_backend)\n        aset['1'] = array5by7\n        co.commit('hi')\n\n        aset['2'] = array5by7  # must not raise; the aset reference stays valid until the checkout closes\n        co.commit('hello 2')\n        assert np.allclose(aset['2'], array5by7)\n        co.close()\n\n        with pytest.raises(PermissionError):\n            aset.name\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    def test_operate_on_arraysets_after_committing_but_not_closing_checkout(self, aset1_backend, aset2_backend, repo, array5by7):\n        co = repo.checkout(write=True)\n        asets = co.columns\n        aset = co.add_ndarray_column('aset', prototype=array5by7, backend=aset1_backend)\n        aset['1'] = array5by7\n        co.commit('hi')\n\n        aset2 = co.add_ndarray_column('arange', prototype=np.arange(50), backend=aset2_backend)\n        aset2['0'] = np.arange(50)\n        co.commit('hello 2')\n        assert np.allclose(aset2['0'], np.arange(50))\n        co.close()\n\n        with pytest.raises(PermissionError):\n            co.columns\n        with pytest.raises(PermissionError):\n            asets.iswriteable\n        with pytest.raises(PermissionError):\n            aset2.name\n\n
    def test_with_wrong_argument_value(self, repo):\n        # It is intuitive for a user to pass the branch name as a positional\n        # argument, but hangar expects the write permission flag as the first argument\n        with pytest.raises(ValueError):\n            repo.checkout('branchname')\n        with pytest.raises(ValueError):\n            repo.checkout(write='True')\n        with pytest.raises(ValueError):\n            repo.checkout(branch=True)\n        co = repo.checkout(True)  # This should not raise any exception\n        # unregister close operation as conftest will close env before this is called.\n        atexit.unregister(co.close)\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    def test_reset_staging_area_no_changes_made_does_not_work(self, aset1_backend, aset2_backend, repo, array5by7):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('aset', prototype=array5by7, backend=aset1_backend)\n        aset2 = co.add_ndarray_column('arange', prototype=np.arange(50), backend=aset2_backend)\n        aset['1'] = array5by7\n        aset2['0'] = np.arange(50)\n        co.commit('hi')\n\n        # verifications before reset\n        assert np.allclose(aset2['0'], np.arange(50))\n        assert len(co.columns) == 2\n        assert co.columns['arange'].iswriteable\n\n        with pytest.raises(RuntimeError, match='No changes made'):\n            co.reset_staging_area()\n\n        # verifications after reset\n        assert np.allclose(aset2['0'], np.arange(50))\n        assert len(co.columns) == 2\n        assert co.columns['arange'].iswriteable\n        co.close()\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    def test_reset_staging_area_clears_arraysets(self, aset1_backend, aset2_backend, repo, array5by7):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('aset', prototype=array5by7, backend=aset1_backend)\n        aset['1'] = array5by7\n        co.commit('hi')\n\n        aset2 = co.add_ndarray_column('arange', prototype=np.arange(50), backend=aset2_backend)\n        aset2['0'] = np.arange(50)\n        # verifications before reset\n        assert np.allclose(aset2['0'], np.arange(50))\n        assert len(co.columns) == 2\n        assert co.columns['arange'].iswriteable\n\n        co.reset_staging_area()\n        # behavior expected after reset\n        assert len(co.columns) == 1\n        with pytest.raises(PermissionError):\n            aset2['0']\n        with pytest.raises(KeyError):\n            co.columns['arange']\n        co.close()\n\n    @pytest.mark.parametrize('write', [True, False])\n    def test_checkout_dunder_contains_method(self, repo_20_filled_samples, write):\n        co = repo_20_filled_samples.checkout(write=write)\n        assert 'writtenaset' in co\n        assert 'second_aset' in co\n        assert 'doesnotexist' not in co\n        co.close()\n\n    @pytest.mark.parametrize('write', [True, False])\n    def test_checkout_dunder_len_method(self, repo_20_filled_samples, write):\n        co = repo_20_filled_samples.checkout(write=write)\n        assert len(co) == 2\n        co.close()\n\n    @pytest.mark.parametrize('write', [True, False])\n    def test_checkout_dunder_iter_method(self, repo_20_filled_samples, write):\n        from typing import Iterable\n        co = 
repo_20_filled_samples.checkout(write=write)\n        it = iter(co)\n        assert isinstance(it, Iterable)\n        icount = 0\n        for k in it:\n            assert k in ['writtenaset', 'second_aset']\n            icount += 1\n        assert icount == 2\n        co.close()\n\n    @pytest.mark.parametrize('write', [True, False])\n    def test_checkout_keys_method(self, repo_20_filled_samples, write):\n        co = repo_20_filled_samples.checkout(write=write)\n        keys = list(co.keys())\n        assert len(keys) == 2\n        for k in ['writtenaset', 'second_aset']:\n            assert k in keys\n        co.close()\n\n    @pytest.mark.parametrize('write', [True, False])\n    def test_checkout_values_method(self, repo_20_filled_samples, write):\n        from hangar.columns.layout_nested import NestedSampleWriter, NestedSampleReader\n        from hangar.columns.layout_flat import FlatSampleWriter, FlatSampleReader\n        possible_classes = (\n            NestedSampleWriter, NestedSampleReader, FlatSampleReader, FlatSampleWriter)\n\n        co = repo_20_filled_samples.checkout(write=write)\n        icount = 0\n        for col in co.values():\n            assert isinstance(col, possible_classes)\n            icount += 1\n        assert icount == 2\n        co.close()\n\n    @pytest.mark.parametrize('write', [True, False])\n    def test_checkout_items_method(self, repo_20_filled_samples, write):\n        from hangar.columns.layout_nested import NestedSampleWriter, NestedSampleReader\n        from hangar.columns.layout_flat import FlatSampleWriter, FlatSampleReader\n        possible_classes = (\n            NestedSampleWriter, NestedSampleReader, FlatSampleReader, FlatSampleWriter)\n\n        co = repo_20_filled_samples.checkout(write=write)\n        icount = 0\n        for k, col in co.items():\n            assert k in ['writtenaset', 'second_aset']\n            assert isinstance(col, possible_classes)\n            icount += 1\n        assert icount == 2\n        co.close()\n\n    @pytest.mark.parametrize('write', [True, False])\n    def test_checkout_log_method(self, repo_20_filled_samples, write):\n        repo_log = repo_20_filled_samples.log(return_contents=True)\n        co = repo_20_filled_samples.checkout(write=write)\n        co_log = co.log(return_contents=True)\n        co.close()\n        assert repo_log == co_log\n\n\nclass TestBranchingMergingInCheckout(object):\n\n    def test_merge(self, aset_samples_initialized_repo, array5by7):\n        branch = aset_samples_initialized_repo.create_branch('testbranch')\n        assert isinstance(branch.name, str)\n        assert isinstance(branch.digest, str)\n        co = aset_samples_initialized_repo.checkout(write=True, branch=branch.name)\n        assert co._branch_name == branch.name\n        co.add_str_column('test_meta')\n        co.columns['writtenaset']['1'] = array5by7\n        co['test_meta'].update({'a': 'b'})\n        co.commit('this is a commit message')\n        co.close()\n        aset_samples_initialized_repo.merge('test merge', 'master', branch.name)\n        co = aset_samples_initialized_repo.checkout()\n        assert (co.columns['writtenaset']['1'] == array5by7).all()\n        assert co['test_meta'].get('a') == 'b'\n        co.close()\n\n    def test_merge_without_closing_previous_checkout(self, aset_samples_initialized_repo, array5by7):\n        branch = aset_samples_initialized_repo.create_branch('testbranch')\n        co = aset_samples_initialized_repo.checkout(write=True, branch=branch.name)\n        
co.columns['writtenaset']['1'] = array5by7\n        co.commit('this is a commit message')\n        with pytest.raises(PermissionError):\n            aset_samples_initialized_repo.merge('test merge', 'master', branch.name)\n        # unregister close operation as conftest will close env before this is called.\n        atexit.unregister(co.close)\n\n    def test_merge_multiple_checkouts_same_aset(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.add_str_column('test_meta')\n        co.commit('test meta commit')\n        co.close()\n        branch1 = aset_samples_initialized_repo.create_branch('testbranch1')\n        co = aset_samples_initialized_repo.checkout(write=True, branch=branch1.name)\n        co.columns['writtenaset']['1'] = array5by7\n        co['test_meta'].update({'a1': 'b1'})\n        co.commit('this is a commit message')\n        co.close()\n\n        branch2 = aset_samples_initialized_repo.create_branch('testbranch2')\n        co = aset_samples_initialized_repo.checkout(write=True, branch=branch2.name)\n        co.columns['writtenaset']['2'] = array5by7\n        co['test_meta'].update({'a2': 'b2'})\n        co.commit('this is a commit message')\n        co.close()\n\n        aset_samples_initialized_repo.merge('test merge 1', 'master', branch1.name)\n        aset_samples_initialized_repo.merge('test merge 2', 'master', branch2.name)\n\n        co = aset_samples_initialized_repo.checkout(branch='master')\n        assert len(co.columns) == 2\n        assert len(co.columns['writtenaset']) == 2\n        assert list(co['test_meta'].keys()) == ['a1', 'a2']\n        co.close()\n\n    def test_merge_multiple_checkouts_multiple_aset(self, aset_samples_initialized_repo, array5by7):\n        branch1 = aset_samples_initialized_repo.create_branch('testbranch1')\n        co = aset_samples_initialized_repo.checkout(write=True, branch=branch1.name)\n        co.columns['writtenaset']['1'] = array5by7\n        co.commit('this is a commit message')\n        co.close()\n\n        branch2 = aset_samples_initialized_repo.create_branch('testbranch2')\n        co = aset_samples_initialized_repo.checkout(write=True, branch=branch2.name)\n        second_aset = co.add_ndarray_column(name='second_aset', prototype=array5by7)\n        second_aset['1'] = array5by7\n        co.commit('this is a commit message')\n        co.close()\n\n        aset_samples_initialized_repo.merge('test merge 1', 'master', branch1.name)\n        aset_samples_initialized_repo.merge('test merge 2', 'master', branch2.name)\n\n        co = aset_samples_initialized_repo.checkout(branch='master')\n        assert len(co.columns) == 2\n        assert len(co.columns['writtenaset']) == 1\n        assert len(co.columns['second_aset']) == 1\n        co.close()\n\n    def test_merge_diverged_conflict(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.add_str_column('test_meta')\n        co.commit('test meta commit')\n        co.close()\n        branch1 = aset_samples_initialized_repo.create_branch('testbranch1')\n        branch2 = aset_samples_initialized_repo.create_branch('testbranch2')\n\n        co = aset_samples_initialized_repo.checkout(write=True, branch=branch1.name)\n        co.columns['writtenaset']['1'] = array5by7\n        co['test_meta'].update({'a': 'b'})\n        co.commit('this is a commit message')\n        co.close()\n\n        co = 
aset_samples_initialized_repo.checkout(write=True, branch=branch2.name)\n        newarray = np.zeros_like(array5by7)\n        co.columns['writtenaset']['1'] = newarray\n        co['test_meta'].update({'a': 'c'})\n        co.commit('this is a commit message')\n        co.close()\n\n        aset_samples_initialized_repo.merge('commit message', 'master', branch1.name)\n\n        with pytest.raises(ValueError):\n            aset_samples_initialized_repo.merge('commit message', 'master', branch2.name)\n\n    def test_new_branch_from_where(self, aset_samples_initialized_repo, array5by7):\n        branch1 = aset_samples_initialized_repo.create_branch('testbranch1')\n        branch2 = aset_samples_initialized_repo.create_branch('testbranch2')\n        co1 = aset_samples_initialized_repo.checkout(write=True, branch=branch1.name)\n        h1 = aset_samples_initialized_repo.log(branch=co1.branch_name, return_contents=True)\n        co1.close()\n\n        co2 = aset_samples_initialized_repo.checkout(write=True, branch=branch2.name)\n        co2.add_ndarray_column('aset2', prototype=array5by7)\n        co2.columns['aset2']['2'] = array5by7\n        co2.commit('this is a merge message')\n        co2.close()\n        h2 = aset_samples_initialized_repo.log(branch=branch2.name, return_contents=True)\n\n        branch3 = aset_samples_initialized_repo.create_branch('testbranch3')\n        co3 = aset_samples_initialized_repo.checkout(write=True, branch=branch3.name)\n        h3 = aset_samples_initialized_repo.log(branch=co3.branch_name, return_contents=True)\n        co3.close()\n\n        assert h2['head'] == h3['head']\n        assert h2['ancestors'][h2['head']] == h3['ancestors'][h3['head']]\n        assert h1['head'] in h2['ancestors'][h2['head']]\n\n    def test_cannot_checkout_branch_with_staged_changes(self, aset_samples_initialized_repo, array5by7):\n        branch1 = aset_samples_initialized_repo.create_branch('testbranch1')\n        branch2 = aset_samples_initialized_repo.create_branch('testbranch2')\n        co1 = aset_samples_initialized_repo.checkout(write=True, branch=branch1.name)\n        initial_cmt = co1.commit_hash\n        co1.add_ndarray_column('aset2', prototype=array5by7)\n        co1.columns['aset2']['2'] = array5by7\n        co1.close()\n\n        with pytest.raises(ValueError):\n            con = aset_samples_initialized_repo.checkout(write=True, branch=branch2.name)\n\n        co1 = aset_samples_initialized_repo.checkout(write=True, branch=branch1.name)\n        co1.commit('hi')\n        assert co1.commit_hash != initial_cmt\n        assert co1.branch_name == branch1.name\n        co1.close()\n\n        co2 = aset_samples_initialized_repo.checkout(write=True, branch=branch2.name)\n        assert co2.branch_name == branch2.name\n        assert co2.commit_hash == initial_cmt\n        co2.close()\n\n\ndef test_full_from_short_commit_digest(two_commit_filled_samples_repo):\n    from hangar.records.commiting import expand_short_commit_digest\n\n    repo = two_commit_filled_samples_repo\n    history = repo.log(branch='master', return_contents=True)\n    commits = history['order']\n    for full_cmt in commits:\n        short_cmt = full_cmt[:18]\n        found_cmt = expand_short_commit_digest(repo._env.refenv, short_cmt)\n        assert found_cmt == full_cmt\n\n    with pytest.raises(KeyError, match='No matching commit hash found starting with'):\n        expand_short_commit_digest(repo._env.refenv, 'zzzzzzzzzzzzzzzzzzzzzzzzzzzz')\n\n\ndef 
test_writer_context_manager_objects_are_gc_removed_after_co_close(two_commit_filled_samples_repo):\n\n    repo = two_commit_filled_samples_repo\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    with co['test_meta'] as m:\n        m['aa'] = 'bb'\n        cmt1 = co.commit('here is the first commit')\n        with co.columns['writtenaset'] as d:\n            d['2422'] = d['0'] + 213\n            cmt2 = co.commit('here is the second commit')\n\n    assert co.close() is None\n    with pytest.raises(PermissionError):\n        _ = m.__dict__\n    with pytest.raises(PermissionError):\n        _ = d.column\n    with pytest.raises(PermissionError):\n        _ = co.columns\n    assert co.__dict__ == {}\n\n    co = repo.checkout(commit=cmt1)\n    assert 'aa' in co['test_meta']\n    assert co['test_meta']['aa'] == 'bb'\n    co.close()\n\n    co = repo.checkout(commit=cmt2)\n    assert 'aa' in co['test_meta']\n    assert co['test_meta']['aa'] == 'bb'\n    assert '2422' in co.columns['writtenaset']\n    assert np.allclose(co.columns['writtenaset']['2422'],\n                       co.columns['writtenaset']['0'] + 213)\n    co.close()\n\n\ndef test_reader_context_manager_objects_are_gc_removed_after_co_close(two_commit_filled_samples_repo):\n\n    repo = two_commit_filled_samples_repo\n    co = repo.checkout(write=False)\n    with co.columns['writtenaset'] as d:\n        ds = d['2']\n\n    assert d.iswriteable is False\n    assert np.allclose(ds, d.get('2'))\n    assert np.allclose(ds, co.columns['writtenaset'].get('2'))\n\n    assert co.close() is None\n\n    with pytest.raises(PermissionError):\n        d.column\n    with pytest.raises(AttributeError):\n        co._columns\n    with pytest.raises(PermissionError):\n        str(co.columns.get('writtenaset'))\n    with pytest.raises(PermissionError):\n        co.columns\n    with pytest.raises(PermissionError):\n        repr(co)\n    assert co.__dict__ == {}\n\n\ndef test_checkout_branch_not_existing_does_not_hold_writer_lock(two_commit_filled_samples_repo):\n    repo = two_commit_filled_samples_repo\n    assert 'doesnotexist' not in repo.list_branches()\n    assert repo.writer_lock_held is False\n    with pytest.raises(ValueError):\n        co = repo.checkout(write=True, branch='doesnotexist')\n    assert repo.writer_lock_held is False\n    with pytest.raises(NameError):\n        co.branch_name  # should not even exist\n"
  },
  {
    "path": "tests/test_checkout_arrayset_access.py",
    "content": "import pytest\nimport numpy as np\n\n\n# -------------------------- Reader Checkout ----------------------------------\n\n\n@pytest.mark.parametrize('write', [True, False])\ndef test_arrayset_getattr_does_not_raise_permission_error_if_alive(write, aset_samples_initialized_repo):\n    co = aset_samples_initialized_repo.checkout(write=write)\n    asets = co.columns\n\n    assert hasattr(asets, 'doesnotexist') is False  # does not raise error\n    assert hasattr(asets, '_mode') is True\n    with pytest.raises(AttributeError):\n        assert getattr(asets, 'doesnotexist')\n    assert getattr(asets, '_mode') == 'a' if write else 'r'\n\n    co.close()\n    with pytest.raises(PermissionError):\n        hasattr(asets, 'doesnotexist')\n    with pytest.raises(PermissionError):\n        hasattr(asets, '_mode')\n\n\ndef test_write_in_context_manager_no_loop(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    with wco:\n        assert wco._is_conman is True\n        wco['writtenaset']['0'] = array5by7\n        wco['newaset']['0'] = array10\n    assert wco._is_conman is False\n\n    assert np.allclose(array5by7, wco.columns['writtenaset']['0'])\n    assert np.allclose(array10, wco.columns['newaset']['0'])\n    wco.commit('init')\n    assert np.allclose(array5by7, wco.columns['writtenaset']['0'])\n    assert np.allclose(array10, wco.columns['newaset']['0'])\n    wco.close()\n\n    rco = aset_samples_initialized_repo.checkout()\n    assert np.allclose(array5by7, rco.columns['writtenaset']['0'])\n    assert np.allclose(array10, rco.columns['newaset']['0'])\n    rco.close()\n\n\ndef test_write_in_context_manager_many_samples_looping(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    with wco:\n        assert wco._is_conman is True\n        for idx in range(100):\n            array10[:] = idx\n            array5by7[:] = idx\n            wco['writtenaset'][idx] = array5by7\n            wco['newaset'][idx] = array10\n    assert wco._is_conman is False\n\n    for idx in range(100):\n        array10[:] = idx\n        array5by7[:] = idx\n        assert np.allclose(array5by7, wco.columns['writtenaset'][idx])\n        assert np.allclose(array10, wco.columns['newaset'][idx])\n    wco.commit('init')\n    for idx in range(100):\n        array10[:] = idx\n        array5by7[:] = idx\n        assert np.allclose(array5by7, wco.columns['writtenaset'][idx])\n        assert np.allclose(array10, wco.columns['newaset'][idx])\n    wco.close()\n\n    rco = aset_samples_initialized_repo.checkout()\n    for idx in range(100):\n        array10[:] = idx\n        array5by7[:] = idx\n        assert np.allclose(array5by7, rco.columns['writtenaset'][idx])\n        assert np.allclose(array10, rco.columns['newaset'][idx])\n    rco.close()\n\n\ndef test_write_fails_if_checkout_closed(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    wco['writtenaset'][0] = array5by7\n    wco['newaset'][0] = array10\n    wco.close()\n    with pytest.raises((PermissionError, UnboundLocalError)):\n        wco['writtenaset'][1] = array5by7\n        
wco['newaset'][1] = array10\n\n    wco2 = aset_samples_initialized_repo.checkout(write=True)\n    assert 0 in wco2.columns['writtenaset']\n    assert 0 in wco2.columns['newaset']\n    assert 1 not in wco2.columns['writtenaset']\n    assert 1 not in wco2.columns['newaset']\n    wco2.close()\n\n\ndef test_write_context_manager_fails_if_checkout_closed(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    wco['writtenaset'][0] = array5by7\n    wco['newaset'][0] = array10\n    wco.close()\n    with pytest.raises(PermissionError):\n        with wco:\n            wco['writtenaset'][1] = array5by7\n    with pytest.raises(PermissionError):\n        with wco:\n            wco['newaset'][1] = array10\n\n    wco2 = aset_samples_initialized_repo.checkout(write=True)\n    assert 0 in wco2.columns['writtenaset']\n    assert 0 in wco2.columns['newaset']\n    assert 1 not in wco2.columns['writtenaset']\n    assert 1 not in wco2.columns['newaset']\n    wco2.close()\n\n\ndef test_writer_co_read_single_aset_single_sample(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.columns['writtenaset'][1] = array5by7 + 1\n    wco.columns['writtenaset'][2] = array5by7 + 2\n\n    assert np.allclose(wco['writtenaset', 0], array5by7)\n    assert np.allclose(wco['writtenaset', 1], array5by7 + 1)\n    assert np.allclose(wco['writtenaset', 2], array5by7 + 2)\n    wco.close()\n\n\ndef test_writer_co_read_single_aset_multiple_samples(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.columns['writtenaset'][1] = array5by7 + 1\n    wco.columns['writtenaset'][2] = array5by7 + 2\n\n    res = wco[('writtenaset', 0), ('writtenaset', 1), ('writtenaset', 2)]\n    assert np.allclose(res[0], array5by7)\n    assert np.allclose(res[1], array5by7 + 1)\n    assert np.allclose(res[2], array5by7 + 2)\n    wco.close()\n\n\ndef test_writer_co_read_multiple_aset_single_samples(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.columns['writtenaset'][1] = array5by7 + 1\n    wco.columns['writtenaset'][2] = array5by7 + 2\n\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    array10[:] = 0\n    wco.columns['newaset'][0] = array10\n    wco.columns['newaset'][1] = array10 + 1\n    wco.columns['newaset'][2] = array10 + 2\n\n    res = wco[('writtenaset', 0), ('newaset', 0)]\n    assert np.allclose(res[0], array5by7)\n    assert np.allclose(res[1], array10)\n    res = wco[('writtenaset', 1), ('newaset', 1)]\n    assert np.allclose(res[0], array5by7 + 1)\n    assert np.allclose(res[1], array10 + 1)\n    wco.close()\n\n\ndef test_writer_co_read_multiple_aset_multiple_samples(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.columns['writtenaset'][1] = array5by7 + 1\n    wco.columns['writtenaset'][2] = array5by7 + 2\n\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', 
prototype=array10)\n    array10[:] = 0\n    wco.columns['newaset'][0] = array10\n    wco.columns['newaset'][1] = array10 + 1\n    wco.columns['newaset'][2] = array10 + 2\n\n    res = wco[('writtenaset', 0), ('newaset', 0), ('writtenaset', 1), ('newaset', 1)]\n    assert isinstance(res, list)\n    assert len(res) == 4\n    assert np.allclose(res[0], array5by7)\n    assert np.allclose(res[1], array10)\n    assert np.allclose(res[2], array5by7 + 1)\n    assert np.allclose(res[3], array10 + 1)\n    wco.close()\n\n\ndef test_writer_co_read_fails_nonexistant_aset_name(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    with pytest.raises(KeyError):\n        _ = wco['doesnotexist', 0]\n    wco.close()\n\n\ndef test_writer_co_read_fails_nonexistant_sample_name(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    with pytest.raises(KeyError):\n        _ = wco['writtenaset', 124]\n    wco.close()\n\n\ndef test_writer_co_get_returns_none_on_nonexistant_sample_name(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    out = wco.get(('writtenaset', 124))\n    assert out is None\n    wco.close()\n\n\ndef test_writer_co_read_in_context_manager_no_loop(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    wco['writtenaset']['0'] = array5by7\n    wco['newaset']['0'] = array10\n    with wco:\n        assert wco._is_conman is True\n        assert np.allclose(wco['writtenaset', '0'], array5by7)\n    wco.close()\n\n\ndef test_writer_co_read_in_context_manager_many_samples_looping(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    with wco:\n        for idx in range(100):\n            array10[:] = idx\n            array5by7[:] = idx\n            wco['writtenaset'][idx] = array5by7\n            wco['newaset'][idx] = array10\n\n    with wco:\n        waset_keys = [('writtenaset', i) for i in range(100)]\n        naset_keys = [('newaset', i) for i in range(100)]\n        writtenasetOut = wco[waset_keys]\n        newasetOut = wco[naset_keys]\n        for idx in range(100):\n            array10[:] = idx\n            array5by7[:] = idx\n            assert np.allclose(array5by7, wco['writtenaset', idx])\n            assert np.allclose(array10, wco['newaset', idx])\n\n            o = wco[('writtenaset', idx), ('newaset', idx)]\n            assert np.allclose(o[0], array5by7)\n            assert np.allclose(o[1], array10)\n\n            assert np.allclose(writtenasetOut[idx], array5by7)\n            assert np.allclose(newasetOut[idx], array10)\n    wco.close()\n\n\n@pytest.mark.parametrize('write', [True, False])\ndef test_co_read_dunder_getitem_excepts_missing_sample(aset_samples_initialized_repo, write):\n    co = aset_samples_initialized_repo.checkout(write=write)\n    with pytest.raises(KeyError):\n        res = co['writtenaset', 0]\n    co.close()\n\n\n@pytest.mark.parametrize('write', [True, False])\ndef 
test_co_read_get_except_missing_true_excepts_missing_sample(aset_samples_initialized_repo, write):\n    co = aset_samples_initialized_repo.checkout(write=write)\n    with pytest.raises(KeyError):\n        res = co.get(('writtenaset', 0), except_missing=True)\n    co.close()\n\n\n@pytest.mark.parametrize('write', [True, False])\ndef test_co_read_get_except_missing_false_returns_none_on_missing_sample(aset_samples_initialized_repo, write):\n    co = aset_samples_initialized_repo.checkout(write=write)\n    res_1 = co.get(('writtenaset', 0))\n    assert res_1 is None\n    res_2 = co.get(('writtenaset', 0), except_missing=False)\n    assert res_2 is None\n    co.close()\n\n\ndef test_writer_co_aset_finds_connection_manager_of_any_aset_in_cm(aset_samples_initialized_repo):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n    wco.add_ndarray_column('second', shape=(20,), dtype=np.uint8)\n    asets = wco.columns\n\n    with wco.columns['second'] as second_aset:\n        assert wco.columns['second']._is_conman is True\n        assert second_aset._is_conman is True\n        assert asets._any_is_conman() is True\n\n    with wco.columns['writtenaset'] as written_aset:\n        assert wco.columns['writtenaset']._is_conman is True\n        assert written_aset._is_conman is True\n        assert asets._any_is_conman() is True\n\n    assert wco.columns['writtenaset']._is_conman is False\n    assert wco.columns['second']._is_conman is False\n    assert asets._any_is_conman() is False\n    wco.close()\n\n\ndef test_writer_co_aset_cm_not_allow_remove_aset(aset_samples_initialized_repo, array5by7):\n\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.columns['writtenaset'][1] = array5by7 + 1\n    wco.columns['writtenaset'][2] = array5by7 + 2\n\n    asets = wco.columns\n    with asets as cm_asets:\n        with pytest.raises(PermissionError):\n            cm_asets.delete('writtenaset')\n        with pytest.raises(PermissionError):\n            asets.delete('writtenaset')\n        with pytest.raises(PermissionError):\n            wco.columns.delete('writtenaset')\n\n        with pytest.raises(PermissionError):\n            del cm_asets['writtenaset']\n        with pytest.raises(PermissionError):\n            del asets['writtenaset']\n        with pytest.raises(PermissionError):\n            del wco.columns['writtenaset']\n\n    assert len(wco['writtenaset']) == 3\n    assert np.allclose(wco['writtenaset', 0], array5by7)\n    assert np.allclose(wco['writtenaset', 1], array5by7 + 1)\n    assert np.allclose(wco['writtenaset', 2], array5by7 + 2)\n    wco.close()\n\n\ndef test_writer_co_column_instance_cm_not_allow_any_column_removal(repo_20_filled_samples):\n\n    wco = repo_20_filled_samples.checkout(write=True)\n    columns = wco.columns\n    writtenaset = wco.columns['writtenaset']\n    second_aset = wco.columns['second_aset']\n\n    with second_aset:\n        with pytest.raises(PermissionError):\n            columns.delete('writtenaset')\n        with pytest.raises(PermissionError):\n            columns.delete('second_aset')\n        with pytest.raises(PermissionError):\n            wco.columns.delete('writtenaset')\n        with pytest.raises(PermissionError):\n            wco.columns.delete('second_aset')\n        with pytest.raises(PermissionError):\n            del columns['writtenaset']\n        with pytest.raises(PermissionError):\n            del columns['second_aset']\n        with 
pytest.raises(PermissionError):\n            del wco.columns['second_aset']\n        with pytest.raises(PermissionError):\n            del wco.columns['writtenaset']\n\n    with writtenaset:\n        with pytest.raises(PermissionError):\n            columns.delete('writtenaset')\n        with pytest.raises(PermissionError):\n            columns.delete('second_aset')\n        with pytest.raises(PermissionError):\n            wco.columns.delete('writtenaset')\n        with pytest.raises(PermissionError):\n            wco.columns.delete('second_aset')\n        with pytest.raises(PermissionError):\n            del columns['writtenaset']\n        with pytest.raises(PermissionError):\n            del columns['second_aset']\n        with pytest.raises(PermissionError):\n            del wco.columns['second_aset']\n        with pytest.raises(PermissionError):\n            del wco.columns['writtenaset']\n\n    with columns:\n        with pytest.raises(PermissionError):\n            columns.delete('writtenaset')\n        with pytest.raises(PermissionError):\n            columns.delete('second_aset')\n        with pytest.raises(PermissionError):\n            wco.columns.delete('writtenaset')\n        with pytest.raises(PermissionError):\n            wco.columns.delete('second_aset')\n        with pytest.raises(PermissionError):\n            del columns['writtenaset']\n        with pytest.raises(PermissionError):\n            del columns['second_aset']\n        with pytest.raises(PermissionError):\n            del wco.columns['second_aset']\n        with pytest.raises(PermissionError):\n            del wco.columns['writtenaset']\n\n    wco.close()\n\n\ndef test_writer_co_aset_removes_all_samples_and_arrayset_still_exists(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.columns['writtenaset'][1] = array5by7 + 1\n    wco.columns['writtenaset'][2] = array5by7 + 2\n    assert len(wco.columns) == 1\n    assert len(wco.columns['writtenaset']) == 3\n\n    with wco.columns['writtenaset'] as wset:\n        del wset[0]\n        del wset[1]\n        del wset[2]\n        # Removed all samples, but the aset itself still exists\n        assert len(wset) == 0\n        assert len(wco.columns) == 1\n    assert len(wco.columns) == 1\n\n    del wco.columns['writtenaset']\n\n    assert len(wco.columns) == 0\n    with pytest.raises(KeyError):\n        len(wco.columns['writtenaset'])\n    wco.close()\n\n\n# -------------------------- Reader Checkout ----------------------------------\n\n\ndef test_reader_co_read_single_aset_single_sample(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.columns['writtenaset'][1] = array5by7 + 1\n    wco.columns['writtenaset'][2] = array5by7 + 2\n    wco.commit('first')\n    wco.close()\n\n    rco = aset_samples_initialized_repo.checkout()\n    assert np.allclose(rco['writtenaset', 0], array5by7)\n    assert np.allclose(rco['writtenaset', 1], array5by7 + 1)\n    assert np.allclose(rco['writtenaset', 2], array5by7 + 2)\n    rco.close()\n\n\ndef test_reader_co_read_single_aset_multiple_samples(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.columns['writtenaset'][1] = array5by7 + 1\n    
wco.columns['writtenaset'][2] = array5by7 + 2\n    wco.commit('first')\n    wco.close()\n\n    rco = aset_samples_initialized_repo.checkout()\n    res = rco[('writtenaset', 0), ('writtenaset', 1), ('writtenaset', 2)]\n    assert np.allclose(res[0], array5by7)\n    assert np.allclose(res[1], array5by7 + 1)\n    assert np.allclose(res[2], array5by7 + 2)\n    rco.close()\n\n\ndef test_reader_co_read_multiple_aset_single_samples(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.columns['writtenaset'][1] = array5by7 + 1\n    wco.columns['writtenaset'][2] = array5by7 + 2\n\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    array10[:] = 0\n    wco.columns['newaset'][0] = array10\n    wco.columns['newaset'][1] = array10 + 1\n    wco.columns['newaset'][2] = array10 + 2\n    wco.commit('first')\n    wco.close()\n\n    rco = aset_samples_initialized_repo.checkout()\n    res = rco[('writtenaset', 0), ('newaset', 0)]\n    assert np.allclose(res[0], array5by7)\n    assert np.allclose(res[1], array10)\n    res = rco[('writtenaset', 1), ('newaset', 1)]\n    assert np.allclose(res[0], array5by7 + 1)\n    assert np.allclose(res[1], array10 + 1)\n    rco.close()\n\n\ndef test_reader_co_read_multiple_aset_multiple_samples(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.columns['writtenaset'][1] = array5by7 + 1\n    wco.columns['writtenaset'][2] = array5by7 + 2\n\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    array10[:] = 0\n    wco.columns['newaset'][0] = array10\n    wco.columns['newaset'][1] = array10 + 1\n    wco.columns['newaset'][2] = array10 + 2\n    wco.commit('first')\n    wco.close()\n\n    rco = aset_samples_initialized_repo.checkout()\n    res = rco[('writtenaset', 0), ('newaset', 0), ('writtenaset', 1), ('newaset', 1)]\n    assert isinstance(res, list)\n    assert len(res) == 4\n    assert np.allclose(res[0], array5by7)\n    assert np.allclose(res[1], array10)\n    assert np.allclose(res[2], array5by7 + 1)\n    assert np.allclose(res[3], array10 + 1)\n    rco.close()\n\n\ndef test_reader_co_read_fails_nonexistant_aset_name(aset_samples_initialized_repo, array5by7):\n    rco = aset_samples_initialized_repo.checkout()\n    with pytest.raises(KeyError):\n        _ = rco['doesnotexist', 0]\n    rco.close()\n\n\ndef test_reader_co_read_fails_nonexistant_sample_name(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.commit('first')\n    wco.close()\n\n    rco = aset_samples_initialized_repo.checkout()\n    with pytest.raises(KeyError):\n        _ = rco['writtenaset', 124]\n    rco.close()\n\n\ndef test_reader_co_get_read_returns_none_nonexistant_sample_name(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n    array5by7[:] = 0\n    wco.columns['writtenaset'][0] = array5by7\n    wco.commit('first')\n    wco.close()\n\n    rco = aset_samples_initialized_repo.checkout()\n    out = rco.get(('writtenaset', 124))\n    assert out is None\n    rco.close()\n\n\ndef 
test_reader_co_read_in_context_manager_no_loop(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    wco['writtenaset']['0'] = array5by7\n    wco['newaset']['0'] = array10\n    wco.commit('first')\n    wco.close()\n\n    rco = aset_samples_initialized_repo.checkout()\n    with rco:\n        assert rco._is_conman is True\n        assert np.allclose(rco['writtenaset', '0'], array5by7)\n    rco.close()\n\n\ndef test_reader_co_read_in_context_manager_many_samples_looping(aset_samples_initialized_repo, array5by7):\n    wco = aset_samples_initialized_repo.checkout(write=True)\n\n    array10 = np.arange(10, dtype=np.float32)\n    wco.add_ndarray_column('newaset', prototype=array10)\n    with wco:\n        for idx in range(100):\n            array10[:] = idx\n            array5by7[:] = idx\n            wco['writtenaset'][idx] = array5by7\n            wco['newaset'][idx] = array10\n    wco.commit('first')\n    wco.close()\n\n    rco = aset_samples_initialized_repo.checkout()\n    with rco:\n        waset_keys = [('writtenaset', i) for i in range(100)]\n        naset_keys = [('newaset', i) for i in range(100)]\n        writtenasetOut = rco[waset_keys]\n        newasetOut = rco[naset_keys]\n        for idx in range(100):\n            array10[:] = idx\n            array5by7[:] = idx\n            assert np.allclose(array5by7, rco['writtenaset', idx])\n            assert np.allclose(array10, rco['newaset', idx])\n\n            o = rco[('writtenaset', idx), ('newaset', idx)]\n            assert np.allclose(o[0], array5by7)\n            assert np.allclose(o[1], array10)\n            assert np.allclose(writtenasetOut[idx], array5by7)\n            assert np.allclose(newasetOut[idx], array10)\n    rco.close()\n"
  },
  {
    "path": "tests/test_cli.py",
    "content": "from os import getcwd\nimport os\nfrom pathlib import Path\n\nimport numpy as np\nimport pytest\nfrom click.testing import CliRunner\n\nfrom hangar import Repository\nfrom hangar.cli import cli\nfrom hangar.external import PluginManager\nfrom conftest import fixed_shape_backend_params\n\n\n# -------------------------------- test data ----------------------------------\n\n\nhelp_res = 'Usage: main [OPTIONS] COMMAND [ARGS]...\\n'\\\n           '\\n'\\\n           'Options:\\n'\\\n           '  --version  display current Hangar Version\\n'\\\n           '  --help     Show this message and exit.\\n'\\\n           '\\n'\\\n           'Commands:\\n'\\\n           '  branch       Operate on and list branch pointers.\\n'\\\n           '  checkout     Checkout writer head branch at BRANCHNAME.\\n'\\\n           '  clone        Initialize a repository at the current path and fetch updated...\\n'\\\n           '  column       Operations for working with columns in the writer checkout.\\n'\\\n           '  commit       Commits outstanding changes.\\n'\\\n           '  diff         Display diff of DEV commit/branch to MASTER commit/branch.\\n'\\\n           '  export       Export COLUMN sample data as it existed a STARTPOINT to some...\\n'\\\n           '  fetch        Retrieve the commit history from REMOTE for BRANCH.\\n'\\\n           '  fetch-data   Get data from REMOTE referenced by STARTPOINT (short-commit or...\\n'\\\n           '  import       Import file or directory of files at PATH to COLUMN in the...\\n'\\\n           '  init         Initialize an empty repository at the current path.\\n'\\\n           '  log          Display commit graph starting at STARTPOINT (short-digest or...\\n'\\\n           '  push         Upload local BRANCH commit history / data to REMOTE server.\\n'\\\n           '  remote       Operations for working with remote server references\\n'\\\n           '  server       Start a hangar server, initializing one if does not exist.\\n'\\\n           '  status       Display changes made in the staging area compared to its base...\\n'\\\n           '  summary      Display content summary at STARTPOINT (short-digest or branch).\\n'\\\n           '  view         Use a plugin to view the data of some SAMPLE in COLUMN at...\\n'\\\n           '  writer-lock  Determine if the writer lock is held for a repository.\\n'\n\n\n# ------------------------------- begin tests ---------------------------------\n\n\ndef test_help_option():\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        res = runner.invoke(cli.main, ['--help'], terminal_width=80)\n        assert res.exit_code == 0\n        assert res.stdout == help_res\n\n\ndef test_help_no_args_option():\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        res = runner.invoke(cli.main, terminal_width=80)\n        assert res.exit_code == 0\n        assert res.stdout == help_res\n\n\ndef test_version_long_option():\n    import hangar\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        res = runner.invoke(cli.main, ['--version'])\n        assert res.exit_code == 0\n        assert res.stdout == f'main, version {hangar.__version__}\\n'\n\n\ndef test_init_repo(managed_tmpdir):\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        P = getcwd()\n        try:\n            repo = Repository(P, exists=False)\n            res = runner.invoke(cli.init, ['--name', 'test', '--email', 'test@foo.com'], obj=repo)\n            assert res.exit_code == 
0\n            assert repo._Repository__verify_repo_initialized() is None\n        finally:\n            repo._env._close_environments()\n\n\ndef test_writer_lock_is_held_check(repo_20_filled_samples2):\n    runner = CliRunner()\n    res = runner.invoke(cli.writer_lock_held, obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Writer lock is available.\\n'\n    co = repo_20_filled_samples2.checkout(write=True)\n    res = runner.invoke(cli.writer_lock_held, obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Writer lock is held.\\n'\n    co.close()\n\n\ndef test_writer_lock_force_release(repo_20_filled_samples2):\n    runner = CliRunner()\n    res = runner.invoke(cli.writer_lock_held, ['--force-release'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Success force release of writer lock.\\n'\n    co = repo_20_filled_samples2.checkout(write=True)\n    res = runner.invoke(cli.writer_lock_held, ['--force-release'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Success force release of writer lock.\\n'\n    assert repo_20_filled_samples2.writer_lock_held is False\n    nco = repo_20_filled_samples2.checkout(write=True)\n    with pytest.raises(PermissionError):\n        print(co.columns)\n    nco.close()\n\n\ndef test_checkout_writer_branch_works(repo_20_filled_samples2):\n    from hangar.records.heads import get_staging_branch_head\n    repo_20_filled_samples2.create_branch('dev')\n    runner = CliRunner()\n    res = runner.invoke(cli.checkout, ['dev'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Writer checkout head set to branch: dev\\n'\n    recorded_branch = get_staging_branch_head(repo_20_filled_samples2._env.branchenv)\n    assert recorded_branch == 'dev'\n    assert repo_20_filled_samples2.writer_lock_held is False\n\n\ndef test_checkout_writer_branch_nonexistant_branch_errors(repo_20_filled_samples2):\n    from hangar.records.heads import get_staging_branch_head\n    runner = CliRunner()\n    res = runner.invoke(cli.checkout, ['doesnotexist'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 1\n    assert res.stdout == 'Error: branch with name: doesnotexist does not exist. 
cannot get head.\\n'\n    recorded_branch = get_staging_branch_head(repo_20_filled_samples2._env.branchenv)\n    assert recorded_branch == 'master'\n    assert repo_20_filled_samples2.writer_lock_held is False\n\n\ndef test_checkout_writer_branch_lock_held_errors(repo_20_filled_samples2):\n    from hangar.records.heads import get_staging_branch_head\n    repo_20_filled_samples2.create_branch('testbranch')\n    co = repo_20_filled_samples2.checkout(write=True, branch='master')\n    try:\n        runner = CliRunner()\n        res = runner.invoke(cli.checkout, ['testbranch'], obj=repo_20_filled_samples2)\n        assert res.exit_code == 1\n        msg = res.stdout\n        assert msg.startswith('Error: Cannot acquire the writer lock.') is True\n        recorded_branch = get_staging_branch_head(repo_20_filled_samples2._env.branchenv)\n        assert recorded_branch == 'master'\n        assert repo_20_filled_samples2.writer_lock_held is True\n        assert co.branch_name == 'master'\n    finally:\n        co.close()\n    assert repo_20_filled_samples2.writer_lock_held is False\n\n\ndef test_diff_command(repo_2_br_no_conf):\n    runner = CliRunner()\n    res = runner.invoke(cli.diff, ['master', 'testbranch'], obj=repo_2_br_no_conf)\n    assert res.exit_code == 0\n\n\ndef test_commit_cli_message(repo_20_filled_samples2):\n    co = repo_20_filled_samples2.checkout(write=True)\n    co.add_str_column('test_meta')\n    base_digest = co.commit_hash\n    base_branch = co.branch_name\n    co.close()\n    assert base_branch == 'master'\n\n    runner = CliRunner()\n    res = runner.invoke(cli.commit, ['-m', 'this is my commit message'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    out = res.stdout\n    assert out.startswith('Commit message:\\nthis is my commit message\\nCommit Successful') is True\n    new_digest = out.split(' ')[-1].rstrip('\\n')\n    assert new_digest != base_digest\n\n    nco = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert nco.commit_hash == new_digest\n        assert nco.branch_name == base_branch\n    finally:\n        nco.close()\n\n\ndef test_commit_cli_message_with_no_changes(repo_20_filled_samples2):\n    co = repo_20_filled_samples2.checkout(write=True)\n    base_digest = co.commit_hash\n    base_branch = co.branch_name\n    co.close()\n    assert base_branch == 'master'\n\n    runner = CliRunner()\n    res = runner.invoke(cli.commit, ['-m', 'this is my commit message'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 1\n    assert res.stdout.endswith('Error: No changes made in staging area. 
Cannot commit.\\n')\n\n    co = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert co.branch_name == base_branch\n        assert co.commit_hash == base_digest\n    finally:\n        co.close()\n\n\ndef substitute_editor_commit_message(hint):\n    return 'this is my commit message\\n' + hint\n\n\ndef test_commit_editor_message(monkeypatch, repo_20_filled_samples2):\n    import click\n    monkeypatch.setattr(click, 'edit', substitute_editor_commit_message)\n\n    co = repo_20_filled_samples2.checkout(write=True)\n    co.add_str_column('test_meta')\n    base_digest = co.commit_hash\n    base_branch = co.branch_name\n    co.close()\n    assert base_branch == 'master'\n\n    runner = CliRunner()\n    res = runner.invoke(cli.commit, obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    out = res.stdout\n    assert out.startswith('Commit message:\\nthis is my commit message\\nCommit Successful') is True\n    new_digest = out.split(' ')[-1].rstrip('\\n')\n    assert new_digest != base_digest\n\n    nco = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert nco.commit_hash == new_digest\n        assert nco.branch_name == base_branch\n    finally:\n        nco.close()\n\n\ndef substitute_editor_empty_commit_message(hint):\n    return hint\n\n\ndef test_commit_editor_empty_message(monkeypatch, repo_20_filled_samples2):\n    import click\n    monkeypatch.setattr(click, 'edit', substitute_editor_empty_commit_message)\n\n    co = repo_20_filled_samples2.checkout(write=True)\n    co.add_str_column('test_meta')\n    base_digest = co.commit_hash\n    base_branch = co.branch_name\n    co.close()\n    assert base_branch == 'master'\n\n    runner = CliRunner()\n    res = runner.invoke(cli.commit, obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Aborted! 
Empty commit message\\n'\n    nco = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert nco.commit_hash == base_digest\n        assert nco.branch_name == base_branch\n    finally:\n        nco.close()\n\n\ndef test_clone(written_two_cmt_server_repo):\n    server, base_repo = written_two_cmt_server_repo\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        P = getcwd()\n        try:\n            new_repo = Repository(P, exists=False)\n            res = runner.invoke(\n                cli.clone,\n                ['--name', 'Foo Tester', '--email', 'foo@email.com', f'{server}'], obj=new_repo)\n\n            assert res.exit_code == 0\n\n            newLog = new_repo.log(return_contents=True)\n            baseLog = base_repo.log(return_contents=True)\n            assert newLog == baseLog\n            assert new_repo.summary() == base_repo.summary()\n        finally:\n            new_repo._env._close_environments()\n\n\n@pytest.mark.parametrize('backend', fixed_shape_backend_params)\ndef test_push_fetch_records(server_instance, backend):\n\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        repo = Repository(getcwd(), exists=False)\n        try:\n            repo.init('foo', 'bar')\n            dummyData = np.arange(50)\n            co1 = repo.checkout(write=True, branch='master')\n            co1.add_ndarray_column(name='dummy', prototype=dummyData, backend=backend)\n            for idx in range(10):\n                dummyData[:] = idx\n                co1.columns['dummy'][str(idx)] = dummyData\n            cmt1 = co1.commit('first commit adding dummy data')\n            co1.close()\n\n            repo.create_branch('testbranch')\n            co2 = repo.checkout(write=True, branch='testbranch')\n            for idx in range(10, 20):\n                dummyData[:] = idx\n                co2.columns['dummy'][str(idx)] = dummyData\n            cmt2 = co2.commit('first commit on test branch adding non-conflict data')\n            co2.close()\n\n            repo.remote.add('origin', server_instance)\n\n            res = runner.invoke(cli.push, ['origin', 'master'], obj=repo)\n            assert res.exit_code == 0\n            res = runner.invoke(cli.push, ['origin', 'testbranch'], obj=repo)\n            assert res.exit_code == 0\n        finally:\n            repo._env._close_environments()\n\n\n@pytest.mark.parametrize('backend', fixed_shape_backend_params)\n@pytest.mark.parametrize('options', [\n    ['origin', 'testbranch'],\n    ['origin', 'master'],\n    ['origin', 'testbranch', '--all-history'],\n    ['origin', 'master', '--all-history'],\n    ['origin', 'testbranch', '--column', 'dummy'],\n    ['origin', 'master', '--column', 'dummy'],\n    ['origin', 'testbranch', '--column', 'dummy', '--all-history'],\n    ['origin', 'master', '--column', 'dummy', '--all-history'],\n])\ndef test_fetch_records_and_data(server_instance, backend, options):\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        repo = Repository(getcwd(), exists=False)\n        try:\n            repo.init('foo', 'bar')\n            dummyData = np.arange(50)\n            co1 = repo.checkout(write=True, branch='master')\n            co1.add_ndarray_column(name='dummy', prototype=dummyData, backend=backend)\n            for idx in range(10):\n                dummyData[:] = idx\n                co1.columns['dummy'][str(idx)] = dummyData\n            cmt1 = co1.commit('first commit adding 
dummy data')\n            co1.close()\n\n            repo.create_branch('testbranch')\n            co2 = repo.checkout(write=True, branch='testbranch')\n            for idx in range(10, 20):\n                dummyData[:] = idx\n                co2.columns['dummy'][str(idx)] = dummyData\n            cmt2 = co2.commit('first commit on test branch adding non-conflict data')\n            co2.close()\n\n            repo.remote.add('origin', server_instance)\n\n            res = runner.invoke(cli.push, ['origin', 'master'], obj=repo)\n            assert res.exit_code == 0\n            res = runner.invoke(cli.push, ['origin', 'testbranch'], obj=repo)\n            assert res.exit_code == 0\n        finally:\n            repo._env._close_environments()\n\n    with runner.isolated_filesystem():\n        repo = Repository(getcwd(), exists=False)\n        try:\n            res = runner.invoke(\n                cli.clone,\n                ['--name', 'Foo Tester', '--email', 'foo@email.com', f'{server_instance}'], obj=repo)\n            assert res.exit_code == 0\n\n            res = runner.invoke(cli.fetch_records, ['origin', 'testbranch'], obj=repo)\n            assert res.exit_code == 0\n            res = runner.invoke(cli.branch_create, ['testbranch', 'origin/testbranch'], obj=repo)\n            assert res.exit_code == 0\n            res = runner.invoke(cli.fetch_data, options, obj=repo)\n            assert res.exit_code == 0\n        finally:\n            repo._env._close_environments()\n\n\ndef test_add_remote(managed_tmpdir):\n    from hangar.remotes import RemoteInfo\n\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        P = getcwd()\n        repo = Repository(P, exists=False)\n        try:\n            res = runner.invoke(cli.init, ['--name', 'test', '--email', 'test@foo.com'], obj=repo)\n            assert res.exit_code == 0\n\n            res = runner.invoke(cli.add_remote, ['origin', 'localhost:50051'], obj=repo)\n            assert res.exit_code == 0\n            assert res.stdout == \"RemoteInfo(name='origin', address='localhost:50051')\\n\"\n\n            remote_list = repo.remote.list_all()\n            assert remote_list == [RemoteInfo(name='origin', address='localhost:50051')]\n        finally:\n            repo._env._close_environments()\n\n\ndef test_remove_remote(managed_tmpdir):\n    from hangar.remotes import RemoteInfo\n\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        P = getcwd()\n        repo = Repository(P, exists=False)\n        try:\n            res = runner.invoke(cli.init, ['--name', 'test', '--email', 'test@foo.com'], obj=repo)\n            assert res.exit_code == 0\n\n            res = runner.invoke(cli.add_remote, ['origin', 'localhost:50051'], obj=repo)\n            assert res.exit_code == 0\n            assert res.stdout == \"RemoteInfo(name='origin', address='localhost:50051')\\n\"\n\n            remote_list = repo.remote.list_all()\n            assert remote_list == [RemoteInfo(name='origin', address='localhost:50051')]\n\n            res = runner.invoke(cli.remove_remote, ['origin'], obj=repo)\n            assert res.exit_code == 0\n            assert res.stdout == \"RemoteInfo(name='origin', address='localhost:50051')\\n\"\n            assert repo.remote.list_all() == []\n        finally:\n            repo._env._close_environments()\n\n\ndef test_list_all_remotes(managed_tmpdir):\n    from hangar.remotes import RemoteInfo\n\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        P = getcwd()\n      
  repo = Repository(P, exists=False)\n        try:\n            res = runner.invoke(cli.init, ['--name', 'test', '--email', 'test@foo.com'], obj=repo)\n            assert res.exit_code == 0\n\n            res = runner.invoke(cli.add_remote, ['origin', 'localhost:50051'], obj=repo)\n            assert res.exit_code == 0\n            assert res.stdout == \"RemoteInfo(name='origin', address='localhost:50051')\\n\"\n            res = runner.invoke(cli.add_remote, ['upstream', 'foo:ip'], obj=repo)\n            assert res.exit_code == 0\n            assert res.stdout == \"RemoteInfo(name='upstream', address='foo:ip')\\n\"\n\n            remote_list = repo.remote.list_all()\n            assert remote_list == [\n                RemoteInfo(name='origin', address='localhost:50051'),\n                RemoteInfo(name='upstream', address='foo:ip')\n            ]\n\n            res = runner.invoke(cli.list_remotes, obj=repo)\n            assert res.exit_code == 0\n            expected_stdout = \"[RemoteInfo(name='origin', address='localhost:50051'), \"\\\n                              \"RemoteInfo(name='upstream', address='foo:ip')]\\n\"\n            assert res.stdout == expected_stdout\n        finally:\n            repo._env._close_environments()\n\n\ndef test_summary(written_two_cmt_server_repo, capsys):\n    server, base_repo = written_two_cmt_server_repo\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        try:\n            with capsys.disabled():\n                P = getcwd()\n                new_repo = Repository(P, exists=False)\n                res = runner.invoke(\n                    cli.clone,\n                    ['--name', 'Foo Tester', '--email', 'foo@email.com', f'{server}'], obj=new_repo)\n\n                assert res.exit_code == 0\n                assert new_repo.summary() == base_repo.summary()\n\n            new_repo.summary()\n\n            with capsys.disabled():\n                res = runner.invoke(cli.summary, obj=new_repo)\n                assert res.stdout == f\"{capsys.readouterr().out}\\n\"\n        finally:\n            new_repo._env._close_environments()\n\n\ndef test_summary_before_commit_made(managed_tmpdir):\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        P = getcwd()\n        new_repo = Repository(P, exists=False)\n        new_repo.init('Test User', 'Test@test.com')\n        try:\n            res = runner.invoke(cli.summary, obj=new_repo)\n            assert res.exit_code == 0\n            assert 'No commits have been made in the repository' in res.stdout\n        finally:\n            new_repo._env._close_environments()\n\n\ndef test_log(written_two_cmt_server_repo, capsys):\n    server, base_repo = written_two_cmt_server_repo\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        try:\n            with capsys.disabled():\n                P = getcwd()\n                new_repo = Repository(P, exists=False)\n                res = runner.invoke(\n                    cli.clone,\n                    ['--name', 'Foo Tester', '--email', 'foo@email.com', f'{server}'], obj=new_repo)\n\n                assert res.exit_code == 0\n                assert new_repo.log() == base_repo.log()\n\n            new_repo.log()\n\n            with capsys.disabled():\n                res = runner.invoke(cli.log, ['master'], obj=new_repo)\n                assert res.stdout == f\"{capsys.readouterr().out}\\n\"\n        finally:\n            new_repo._env._close_environments()\n\n\ndef test_status(repo_20_filled_samples2):\n    
from hangar.records.summarize import status\n    repo = repo_20_filled_samples2\n\n    dummyData = np.arange(50).astype(np.int64)\n    co2 = repo.checkout(write=True)\n    for idx in range(10, 20):\n        dummyData[:] = idx\n        co2.columns['dummy'][str(idx)] = dummyData\n        co2.columns['dummy'][idx] = dummyData\n    df = co2.diff.staged()\n    co2.close()\n    expected = status(repo._env.hashenv, 'master', df.diff).getvalue()\n    runner = CliRunner()\n    res = runner.invoke(cli.status, obj=repo)\n    assert res.exit_code == 0\n    assert res.stdout == expected\n\n\ndef test_arrayset_create_uint8(repo_20_filled_samples2):\n    runner = CliRunner()\n    res = runner.invoke(\n        cli.create_column,\n        ['train_images', 'UINT8', '256', '256', '3'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Initialized Column: train_images\\n'\n    co = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert 'train_images' in co.columns\n        assert co.columns['train_images'].shape == (256, 256, 3)\n        assert co.columns['train_images'].dtype == np.uint8\n        assert co.columns['train_images'].schema_type == 'fixed_shape'\n        assert len(co.columns['train_images']) == 0\n    finally:\n        co.close()\n\n\ndef test_arrayset_create_float32(repo_20_filled_samples2):\n    runner = CliRunner()\n    res = runner.invoke(\n        cli.create_column,\n        ['train_images', 'FLOAT32', '256'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Initialized Column: train_images\\n'\n    co = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert 'train_images' in co.columns\n        assert co.columns['train_images'].shape == (256,)\n        assert co.columns['train_images'].dtype == np.float32\n        assert co.columns['train_images'].schema_type == 'fixed_shape'\n        assert len(co.columns['train_images']) == 0\n    finally:\n        co.close()\n\n\ndef test_arrayset_create_invalid_dtype_fails(repo_20_filled_samples2):\n    runner = CliRunner()\n    res = runner.invoke(\n        cli.create_column,\n        ['train_images', 'FLOAT7', '256'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 2\n    expected = ('invalid choice: FLOAT7. 
(choose from UINT8, '\n                'INT8, UINT16, INT16, UINT32, INT32, UINT64, '\n                'INT64, FLOAT16, FLOAT32, FLOAT64, STR)\\n')\n    assert res.stdout.endswith(expected) is True\n    co = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert 'train_images' not in co.columns\n    finally:\n        co.close()\n\n\ndef test_arrayset_create_invalid_name_fails(repo_20_filled_samples2):\n    runner = CliRunner()\n    res = runner.invoke(cli.create_column, ['tra#in', 'FLOAT32', '256'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 1\n    msg = res.stdout\n    assert msg.startswith('Error: Column name provided: `tra#in` is invalid.') is True\n    co = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert 'tra#in' not in co.columns\n        assert 'dummy' in co.columns\n        assert len(co.columns) == 1\n    finally:\n        co.close()\n\n\ndef test_arrayset_create_variable_shape(repo_20_filled_samples2):\n    runner = CliRunner()\n    res = runner.invoke(\n        cli.create_column,\n        ['train_images', 'FLOAT32', '256', '--variable-shape'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Initialized Column: train_images\\n'\n    co = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert 'train_images' in co.columns\n        assert co.columns['train_images'].shape == (256,)\n        assert co.columns['train_images'].dtype == np.float32\n        assert co.columns['train_images'].schema_type == 'variable_shape'\n        assert co.columns['train_images'].contains_subsamples is False\n        assert len(co.columns['train_images']) == 0\n    finally:\n        co.close()\n\n\ndef test_arrayset_create_contains_subsamples(repo_20_filled_samples2):\n    runner = CliRunner()\n    res = runner.invoke(\n        cli.create_column,\n        ['train_images', 'FLOAT32', '256', '--contains-subsamples'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Initialized Column: train_images\\n'\n    co = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert 'train_images' in co.columns\n        assert co.columns['train_images'].shape == (256,)\n        assert co.columns['train_images'].dtype == np.float32\n        assert co.columns['train_images'].schema_type == 'fixed_shape'\n        assert co.columns['train_images'].contains_subsamples is True\n        assert len(co.columns['train_images']) == 0\n    finally:\n        co.close()\n\n\ndef test_remove_arrayset(repo_20_filled_samples2):\n    runner = CliRunner()\n    res = runner.invoke(cli.remove_column, ['dummy'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 0\n    assert res.stdout == 'Successfully removed column: dummy\\n'\n    co = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert 'dummy' not in co.columns\n        assert len(co.columns) == 0\n    finally:\n        co.close()\n\n\ndef test_remove_non_existing_arrayset(repo_20_filled_samples2):\n    runner = CliRunner()\n    res = runner.invoke(cli.remove_column, ['doesnotexist'], obj=repo_20_filled_samples2)\n    assert res.exit_code == 1\n    assert res.stdout == \"Error: 'Cannot remove: doesnotexist. 
Key does not exist.'\\n\"\n    co = repo_20_filled_samples2.checkout(write=True)\n    try:\n        assert 'doesnotexist' not in co.columns\n        assert 'dummy' in co.columns\n        assert len(co.columns) == 1\n        assert len(co.columns['dummy']) == 10\n    finally:\n        co.close()\n\n\ndef test_branch_create_and_list(written_two_cmt_server_repo):\n    server, base_repo = written_two_cmt_server_repo\n\n    co = base_repo.checkout(write=True)\n    cmt = co.commit_hash\n    co.close()\n\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        P = getcwd()\n        new_repo = Repository(P, exists=False)\n        try:\n            res = runner.invoke(\n                cli.clone,\n                ['--name', 'Foo Tester', '--email', 'foo@email.com', f'{server}'], obj=new_repo)\n            assert res.exit_code == 0\n\n            res = runner.invoke(cli.branch_create, ['testbranch'], obj=new_repo)\n            assert res.exit_code == 0\n            assert res.stdout == f\"Created BRANCH: testbranch HEAD: {cmt}\\n\"\n\n            branches = new_repo.list_branches()\n            assert branches == ['master', 'origin/master', 'testbranch']\n\n            res = runner.invoke(cli.branch_list, obj=new_repo)\n            assert res.exit_code == 0\n            assert res.stdout == \"['master', 'origin/master', 'testbranch']\\n\"\n        finally:\n            new_repo._env._close_environments()\n\n\n@pytest.mark.filterwarnings(\"ignore:Column.* contains `reference-only` samples\")\ndef test_branch_create_and_delete(written_two_cmt_server_repo):\n    server, base_repo = written_two_cmt_server_repo\n\n    co = base_repo.checkout(write=True)\n    cmt = co.commit_hash\n    co.close()\n\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        P = getcwd()\n        new_repo = Repository(P, exists=False)\n        try:\n            res = runner.invoke(\n                cli.clone,\n                ['--name', 'Foo Tester', '--email', 'foo@email.com', f'{server}'], obj=new_repo)\n            assert res.exit_code == 0\n\n            res = runner.invoke(cli.branch_create, ['testbranch'], obj=new_repo)\n            assert res.exit_code == 0\n            assert res.stdout == f\"Created BRANCH: testbranch HEAD: {cmt}\\n\"\n\n            branches = new_repo.list_branches()\n            assert branches == ['master', 'origin/master', 'testbranch']\n\n            res = runner.invoke(cli.branch_remove, ['testbranch'], obj=new_repo)\n            assert res.exit_code == 0\n            assert res.stdout == f\"Deleted BRANCH: testbranch HEAD: {cmt}\\n\"\n\n            branches = new_repo.list_branches()\n            assert branches == ['master', 'origin/master']\n\n            new_repo.create_branch('secondtest')\n            co = new_repo.checkout(write=True, branch='secondtest')\n            co.add_str_column('test_meta')\n            newDigest = co.commit('dummy commit')\n            co.close()\n\n            # re-open with staging set to master so we can try to delete secondtest\n            co = new_repo.checkout(write=True, branch='master')\n            co.close()\n\n            res = runner.invoke(cli.branch_remove, ['secondtest'], obj=new_repo)\n            assert res.exit_code == 1\n\n            res = runner.invoke(cli.branch_remove, ['secondtest', '-f'], obj=new_repo)\n            assert res.exit_code == 0\n            assert res.stdout == f\"Deleted BRANCH: secondtest HEAD: {newDigest}\\n\"\n\n            res = runner.invoke(cli.branch_list, obj=new_repo)\n           
 assert res.exit_code == 0\n            assert res.stdout == \"['master', 'origin/master']\\n\"\n        finally:\n            new_repo._env._close_environments()\n\n\ndef test_start_server(managed_tmpdir):\n    import time\n    runner = CliRunner()\n    with runner.isolated_filesystem():\n        startTime = time.time()\n        res = runner.invoke(cli.server, ['--ip', 'localhost', '--port', '50111', '--timeout', '1'])\n        assert time.time() - startTime <= 1.8  # buffer to give it time to stop\n        assert res.exit_code == 0\n        assert 'Hangar Server Started' in res.stdout\n\n\n# ------------------------ Developer Commands --------------------------------\n\n\ndef test_db_view_command(repo_20_filled_samples):\n    repo = repo_20_filled_samples\n    runner = CliRunner()\n    res = runner.invoke(cli.lmdb_record_details, ['-a'], obj=repo)\n\n    dbs_queried = 0\n    assert res.exit_code == 0\n    for line in res.stdout.splitlines():\n        if '.lmdb' in line:\n            dbs_queried += 1\n    assert dbs_queried == 5\n\n    res = runner.invoke(cli.lmdb_record_details, ['-a', '--limit', '10'], obj=repo)\n    assert res.exit_code == 0\n\n\n# =========================== External Plugin =================================\n\n\ndef monkeypatch_scan(provides, accepts, attribute, func):\n    def wrapper(self):\n        from hangar.external import BasePlugin\n\n        plugin = BasePlugin(provides, accepts)\n        plugin.__dict__[attribute] = func\n\n        self._plugin_store['myplugin'] = plugin\n    return wrapper\n\n\n@pytest.fixture()\ndef written_repo_with_1_sample(aset_samples_initialized_repo):\n    aset_name = 'writtenaset'\n    shape = (5, 7)\n    co = aset_samples_initialized_repo.checkout(write=True)\n    aset = co.columns[aset_name]\n    aset['data'] = np.random.random(shape)\n    aset['123'] = np.random.random(shape)\n    aset[123] = np.random.random(shape)\n    co.commit('added')\n    co.close()\n    yield aset_samples_initialized_repo\n\n\nclass TestImport(object):\n\n    @staticmethod\n    def load(fpath, *args, **kwargs):\n        data = np.random.random((5, 7)).astype(np.float64)\n        if isinstance(fpath, Path):\n            fpath = fpath.name\n        return data, fpath\n\n    def test_import(self, monkeypatch, written_repo_with_1_sample):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        shape = (5, 7)\n        fpath = 'data.ext'\n        aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m, runner.isolated_filesystem():\n            with open('data.ext', 'w') as f:\n                f.write('test')\n\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['load'], ['ext'], 'load', self.load))\n            # adding data\n            res = runner.invoke(cli.import_data, [aset_name, fpath], obj=repo)\n            assert res.exit_code == 0\n            co = repo.checkout(write=True)\n            co.commit('added data')\n            d1 = co.columns[aset_name][fpath]\n            co.close()\n\n            # without overwrite\n            res = runner.invoke(cli.import_data, [aset_name, fpath], obj=repo)\n            assert res.exit_code == 0\n            co = repo.checkout()\n            d2 = co.columns[aset_name][fpath]\n            co.close()\n            assert np.allclose(d1, d2)\n\n            # with overwrite\n            res = runner.invoke(cli.import_data, [aset_name, fpath, '--overwrite'], obj=repo)\n            assert res.exit_code == 0\n            co = repo.checkout(write=True)\n          
  co.commit('added data')\n            d3 = co.columns[aset_name][fpath]\n            co.close()\n            assert not np.allclose(d1, d3)\n        assert d1.shape == d2.shape == d3.shape == shape\n\n    def test_import_wrong_args(self, monkeypatch, written_repo_with_1_sample):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n\n        aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['load'], ['ext'], 'load', self.load))\n\n            with runner.isolated_filesystem():\n\n                # invalid file\n                res = runner.invoke(cli.import_data, [aset_name, 'valid.ext'], obj=repo)\n                assert res.exit_code == 2\n                assert \"Invalid value for\" in res.stdout\n                assert \"PATH\" in res.stdout\n                assert \"valid.ext\" in res.stdout\n                assert \"does not exist.\" in res.stdout\n\n                with open('valid.ext', 'w') as f:\n                    f.write('empty')\n\n                with open('valid.ext.bz2', 'w') as f:\n                    f.write('empty')\n\n                res = runner.invoke(cli.import_data, [aset_name, 'valid.ext.bz2'], obj=repo)\n                assert res.exit_code == 1\n                assert res.stdout.endswith('No plugins found for the file extension ext.bz2 that could do load\\n')\n\n                # invalid branch\n                res = runner.invoke(cli.import_data, [aset_name, 'valid.ext', '--branch', 'invalid'], obj=repo)\n                assert res.exit_code == 1\n                assert res.stdout.endswith('Branch name: invalid does not exist, Exiting.\\n')\n\n                # invalid plugin\n                res = runner.invoke(cli.import_data, [aset_name, 'valid.ext', '--plugin', 'invalid'], obj=repo)\n                assert res.exit_code == 1\n                assert res.stdout.endswith('Plugin invalid not found\\n')\n\n    def test_import_generator_on_load(self, monkeypatch, written_repo_with_1_sample):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        fpath = 'data.ext'\n        aset_name = 'writtenaset'\n\n        def load(fpath, *args, **kwargs):\n            for i in range(10):\n                data, name = self.load(fpath, *args, **kwargs)\n                if isinstance(name, Path):\n                    name = name.name\n                yield data, f\"{i}_{name}\"\n\n        with monkeypatch.context() as m, runner.isolated_filesystem():\n            with open('data.ext', 'w') as f:\n                f.write('test')\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['load'], ['ext'], 'load', load))\n            res = runner.invoke(cli.import_data, [aset_name, fpath], obj=repo)\n            assert res.exit_code == 0\n            co = repo.checkout(write=True)\n            co.commit('added data')\n            aset = co.columns[aset_name]\n            for i in range(10):\n                assert f\"{i}_{fpath}\" in aset.keys()\n            co.close()\n\n\nclass TestExport(object):\n    save_msg = \"Data saved from custom save function\"\n\n    @classmethod\n    def save(cls, data, outdir, sampleN, extension, *args, **kwargs):\n        print(cls.save_msg)\n        fpath = os.path.join(outdir, f\"{sampleN}.{extension}\")\n        print(fpath)\n\n    def test_export_success(self, monkeypatch, written_repo_with_1_sample, tmp_path):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n       
 aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['save'], ['ext'], 'save', self.save))\n\n            # single sample\n            res = runner.invoke(\n                cli.export_data, [aset_name, '-o', str(tmp_path), '--sample', 'data', '--format', 'ext'], obj=repo)\n            assert res.exit_code == 0\n            assert self.save_msg in res.output\n\n            # with sample name and sample type\n            res = runner.invoke(\n                cli.export_data, [aset_name, '-o', str(tmp_path), '--sample', 'int:123', '--format', 'ext'], obj=repo)\n            assert res.exit_code == 0\n            assert os.path.join(tmp_path, 'int:123.ext') in res.output\n            res = runner.invoke(\n                cli.export_data, [aset_name, '-o', str(tmp_path), '--sample', 'str:123', '--format', 'ext'], obj=repo)\n            assert res.exit_code == 0\n            assert os.path.join(tmp_path, 'str:123.ext') in res.output\n            res = runner.invoke(\n                cli.export_data, [aset_name, '-o', str(tmp_path), '--sample', '123', '--format', 'ext'], obj=repo)\n            assert res.exit_code == 0\n            assert os.path.join(tmp_path, 'str:123.ext') in res.output\n\n            # whole column\n            res = runner.invoke(\n                cli.export_data, [aset_name, '-o', str(tmp_path), '--format', 'ext'], obj=repo)\n            assert res.exit_code == 0\n            assert os.path.join(tmp_path, 'str:data.ext') in res.output\n            assert os.path.join(tmp_path, 'str:123.ext') in res.output\n            assert os.path.join(tmp_path, 'int:123.ext') in res.output\n\n    def test_export_wrong_out_location(self, monkeypatch, written_repo_with_1_sample):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['save'], ['ext'], 'save', self.save))\n\n            # single sample\n            res = runner.invoke(\n                cli.export_data, [aset_name, '-o', 'wrongpath', '--sample', 'data', '--format', 'ext'], obj=repo)\n            assert res.exit_code == 2\n            assert \"Invalid value for\" in res.stdout\n            assert \"-o\" in res.stdout\n            assert \"--out\" in res.stdout\n\n    def test_export_wrong_arg(self, monkeypatch, written_repo_with_1_sample, tmp_path):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['save'], ['ext'], 'save', self.save))\n            res = runner.invoke(\n                cli.export_data, [aset_name, '-o', str(tmp_path), '--plugin', 'invalid'], obj=repo)\n            assert res.exit_code == 1\n            assert 'Plugin invalid not found' in res.stdout\n\n    def test_export_without_specifying_out(self, monkeypatch, written_repo_with_1_sample):\n        import os\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['save'], ['ext'], 'save', self.save))\n            res = runner.invoke(\n                cli.export_data, [aset_name, '--sample', 'data', '--format', 'ext'], obj=repo)\n            assert 
os.getcwd() in res.output\n\n    def test_export_for_non_existent_sample(self, monkeypatch, written_repo_with_1_sample):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['save'], ['ext'], 'save', self.save))\n            res = runner.invoke(\n                cli.export_data, [aset_name, '--sample', 'wrongname', '--format', 'ext'], obj=repo)\n            assert res.exit_code == 1\n            assert 'wrongname' in res.output\n\n    def test_export_for_specified_branch(self, monkeypatch, written_repo_with_1_sample):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['save'], ['ext'], 'save', self.save))\n            res = runner.invoke(\n                cli.export_data, [aset_name, 'master', '--sample', 'data', '--format', 'ext'], obj=repo)\n            assert res.exit_code == 0\n\n\nclass TestShow(object):\n    show_msg = \"Data is displayed from custom show function\"\n\n    @classmethod\n    def show(cls, fpath, *args, **kwargs):\n        print(cls.show_msg)\n\n    def test_show_success(self, monkeypatch, written_repo_with_1_sample):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['show'], ['ext'], 'show', self.show))\n            res = runner.invoke(\n                cli.view_data, [aset_name, 'data', '--format', 'ext'], obj=repo)\n            assert res.exit_code == 0\n            assert self.show_msg in res.output\n\n    def test_show_on_startpoint(self, monkeypatch, written_repo_with_1_sample):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['show'], ['ext'], 'show', self.show))\n            res = runner.invoke(\n                cli.view_data, [aset_name, 'data', 'master', '--format', 'ext'], obj=repo)\n            assert res.exit_code == 0\n            res = runner.invoke(\n                cli.view_data, [aset_name, 'data', 'wrongstartpoint', '--format', 'ext'], obj=repo)\n            assert \"No matching commit hash found\" in str(res.exception)\n\n    def test_show_with_wrong_arg(self, monkeypatch, written_repo_with_1_sample):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        aset_name = 'writtenaset'\n\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['show'], ['ext'], 'show', self.show))\n            res = runner.invoke(\n                cli.view_data, [aset_name, 'data', '--format', 'wrong'], obj=repo)\n            assert res.exit_code == 1\n            assert 'No plugins found' in res.stdout\n\n    def test_wrong_sample_name(self, monkeypatch, written_repo_with_1_sample):\n        repo = written_repo_with_1_sample\n        runner = CliRunner()\n        aset_name = 'writtenaset'\n        with monkeypatch.context() as m:\n            m.setattr(PluginManager, \"_scan_plugins\", monkeypatch_scan(['show'], ['ext'], 'show', self.show))\n            res = runner.invoke(\n                
cli.view_data, [aset_name, 'wrongsample', '--format', 'ext'], obj=repo)\n            assert res.exit_code == 1\n            assert \"wrongsample\" in res.stdout\n"
  },
  {
    "path": "tests/test_column.py",
    "content": "import pytest\nimport numpy as np\nfrom conftest import fixed_shape_backend_params, variable_shape_backend_params\nfrom itertools import permutations\n\n\ndef assert_equal(arr, arr2):\n    assert np.array_equal(arr, arr2)\n    assert arr.dtype == arr2.dtype\n\n\nclass TestColumn(object):\n\n    @pytest.mark.parametrize('name', [\n        'invalid\\n', '\\ninvalid', 'inv name', 'inva@lid', 12, ' try', 'andthis ',\n        'VeryLongNameIsInvalidOver64CharactersNotAllowedVeryLongNameIsInva'])\n    def test_invalid_column_name(self, repo, randomsizedarray, name):\n        co = repo.checkout(write=True)\n        with pytest.raises(ValueError):\n            co.add_ndarray_column(name=name, prototype=randomsizedarray)\n        with pytest.raises(ValueError):\n            co.add_str_column(name=name)\n        co.close()\n\n    def test_read_only_mode(self, aset_samples_initialized_repo):\n        import hangar\n        co = aset_samples_initialized_repo.checkout()\n        assert isinstance(co, hangar.checkout.ReaderCheckout)\n        with pytest.raises(AttributeError):\n            assert co.add_ndarray_column('foo')\n        with pytest.raises(AttributeError):\n            assert co.add_str_column('foo')\n        with pytest.raises(PermissionError):\n            del co.columns['foo']\n        with pytest.raises(PermissionError):\n            del co.columns['foo']\n        assert len(co.columns['writtenaset']) == 0\n        co.close()\n\n    def test_get_column(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n\n        # getting the column with `get`\n        asetOld = co.columns.get('writtenaset')\n        asetOldPath = asetOld._path\n        asetOldAsetn = asetOld.column\n        asetOldDefaultSchemaHash = asetOld._schema.schema_hash_digest()\n\n        asetOld['1'] = array5by7\n        co.commit('this is a commit message')\n        co.close()\n        co = aset_samples_initialized_repo.checkout()\n\n        # getting column with dictionary like style method\n        asetNew = co.columns['writtenaset']\n        assert_equal(asetNew['1'], array5by7)\n        assert asetOldPath == asetNew._path\n        assert asetOldAsetn == asetNew.column\n        assert asetOldDefaultSchemaHash == asetNew._schema.schema_hash_digest()\n        co.close()\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_remove_column(self, aset_backend, aset_samples_initialized_repo):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        del co.columns['writtenaset']\n        with pytest.raises(KeyError):\n            del co.columns['writtenaset']\n\n        co.add_ndarray_column('writtenaset', shape=(5, 7), dtype=np.float64, backend=aset_backend)\n        assert len(co.columns) == 1\n        del co.columns['writtenaset']\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout(write=True)\n        assert len(co.columns) == 0\n\n        co.add_ndarray_column('writtenaset', shape=(5, 7), dtype=np.float64, backend=aset_backend)\n        co.commit('this is a commit message')\n        co.close()\n        co = aset_samples_initialized_repo.checkout(write=True)\n        assert len(co.columns) == 1\n        del co.columns['writtenaset']\n        assert len(co.columns) == 0\n        co.commit('this is a commit message')\n        co.close()\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def 
    def test_init_again(self, aset_backend, repo, randomsizedarray):\n        co = repo.checkout(write=True)\n        co.add_ndarray_column('aset', prototype=randomsizedarray, backend=aset_backend)\n        with pytest.raises(LookupError):\n            co.add_ndarray_column('aset', prototype=randomsizedarray, backend=aset_backend)\n        co.close()\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_column_with_more_dimension(self, aset_backend, repo):\n        co = repo.checkout(write=True)\n        shape = (0, 1, 2)\n        with pytest.raises(ValueError):\n            co.add_ndarray_column('aset', shape=shape, dtype=np.int, backend=aset_backend)\n        shape = [1] * 31\n        aset = co.add_ndarray_column('aset1', shape=shape, dtype=np.int, backend=aset_backend)\n        assert len(aset.shape) == 31\n        shape = [1] * 32\n        with pytest.raises(ValueError):\n            # maximum tensor rank must be <= 31\n            co.add_ndarray_column('aset2', shape=shape, dtype=np.int, backend=aset_backend)\n        co.close()\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_column_with_empty_dimension(self, aset_backend, repo):\n        co = repo.checkout(write=True)\n        arr = np.array(1, dtype=np.int64)\n        aset = co.add_ndarray_column('aset1', shape=(), dtype=np.int64, backend=aset_backend)\n        aset['1'] = arr\n        co.commit('this is a commit message')\n        aset = co.add_ndarray_column('aset2', prototype=arr)\n        aset['1'] = arr\n        co.commit('this is a commit message')\n        co.close()\n        co = repo.checkout()\n        aset1 = co.columns['aset1']\n        aset2 = co.columns['aset2']\n        assert_equal(aset1['1'], arr)\n        assert_equal(aset2['1'], arr)\n        co.close()\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_column_with_int_specifier_as_dimension(self, aset_backend, repo):\n        co = repo.checkout(write=True)\n        arr = np.arange(10, dtype=np.int64)\n        aset = co.add_ndarray_column('aset1', shape=10, dtype=np.int64, backend=aset_backend)\n        aset['1'] = arr\n        co.commit('this is a commit message')\n        arr2 = np.array(53, dtype=np.int64)\n        aset = co.add_ndarray_column('aset2', prototype=arr2)\n        aset['1'] = arr2\n        co.commit('this is a commit message')\n        co.close()\n        co = repo.checkout()\n        aset1 = co.columns['aset1']\n        aset2 = co.columns['aset2']\n        assert_equal(aset1['1'], arr)\n        assert_equal(aset2['1'], arr2)\n        co.close()\n\n    @pytest.mark.parametrize('write', [True, False])\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_getattr_does_not_raise_permission_error_if_alive(self, aset_backend, write, repo):\n        co = repo.checkout(write=True)\n        arr = np.arange(10, dtype=np.int64)\n        aset = co.add_ndarray_column('aset1', shape=10, dtype=np.int64, backend=aset_backend)\n        aset['1'] = arr\n        co.commit('hello')\n        co.close()\n        co = repo.checkout(write=write)\n        aset = co.columns['aset1']\n\n        assert hasattr(aset, 'doesnotexist') is False  # does not raise error\n        assert hasattr(aset, '_mode') is True\n        with pytest.raises(AttributeError):\n            assert getattr(aset, 'doesnotexist')\n        assert getattr(aset, '_mode') == ('a' if write else 'r')\n\n        co.close()\n        with 
pytest.raises(PermissionError):\n            hasattr(aset, 'doesnotexist')\n        with pytest.raises(PermissionError):\n            hasattr(aset, '_mode')\n\n\nclass TestDataWithFixedSizedColumn(object):\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset3_backend\", fixed_shape_backend_params)\n    def test_column_remote_references_property_with_none(\n            self, aset1_backend, aset2_backend, aset3_backend, repo, randomsizedarray\n    ):\n        co = repo.checkout(write=True)\n        aset1 = co.add_ndarray_column('aset1', prototype=randomsizedarray, backend=aset1_backend)\n        aset2 = co.add_ndarray_column('aset2', shape=(2, 2), dtype=np.int, backend=aset2_backend)\n        aset3 = co.add_ndarray_column('aset3', shape=(3, 4), dtype=np.float32, backend=aset3_backend)\n\n        with aset1 as d1, aset2 as d2, aset3 as d3:\n            d1[1] = randomsizedarray\n            d2[1] = np.ones((2, 2), dtype=np.int)\n            d3[1] = np.ones((3, 4), dtype=np.float32)\n\n        assert co.columns.contains_remote_references == {'aset1': False, 'aset2': False, 'aset3': False}\n        assert co.columns.remote_sample_keys == {'aset1': (), 'aset2': (), 'aset3': ()}\n        co.close()\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset3_backend\", fixed_shape_backend_params)\n    def test_column_remote_references_property_with_remotes(\n            self, aset1_backend, aset2_backend, aset3_backend, repo, randomsizedarray\n    ):\n        co = repo.checkout(write=True)\n        aset1 = co.add_ndarray_column('aset1', prototype=randomsizedarray, backend=aset1_backend)\n        aset2 = co.add_ndarray_column('aset2', shape=(2, 2), dtype=np.int, backend=aset2_backend)\n        aset3 = co.add_ndarray_column('aset3', shape=(3, 4), dtype=np.float32, backend=aset3_backend)\n\n        with aset1 as d1, aset2 as d2, aset3 as d3:\n            d1[1] = randomsizedarray\n            d2[1] = np.ones((2, 2), dtype=np.int)\n            d3[1] = np.ones((3, 4), dtype=np.float32)\n\n        assert co.columns.contains_remote_references == {'aset1': False, 'aset2': False, 'aset3': False}\n        assert co.columns.remote_sample_keys == {'aset1': (), 'aset2': (), 'aset3': ()}\n        co.commit('hello')\n        co.close()\n        co = repo.checkout()\n\n        # perform the mock\n        from hangar.backends import backend_decoder\n        template = backend_decoder(b'50:daeaaeeaebv')\n        co._columns._columns['aset1']._samples[12] = template\n        co._columns._columns['aset2']._samples[22] = template\n\n        assert co.columns.contains_remote_references == {'aset1': True, 'aset2': True, 'aset3': False}\n        assert co.columns.remote_sample_keys == {'aset1': (12,), 'aset2': (22,), 'aset3': ()}\n        co.close()\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset3_backend\", fixed_shape_backend_params)\n    def test_iterating_over(self, aset1_backend, aset2_backend, aset3_backend, repo, randomsizedarray):\n        co = repo.checkout(write=True)\n        all_tensors = []\n        aset1 = co.add_ndarray_column('aset1', prototype=randomsizedarray, backend=aset1_backend)\n        
aset2 = co.add_ndarray_column('aset2', shape=(2, 2), dtype=np.int, backend=aset2_backend)\n        aset3 = co.add_ndarray_column('aset3', shape=(3, 4), dtype=np.float32, backend=aset3_backend)\n\n        with aset1 as d1, aset2 as d2, aset3 as d3:\n            d1['1'] = randomsizedarray\n            d1['2'] = np.zeros_like(randomsizedarray)\n            d1['3'] = np.zeros_like(randomsizedarray) + 5\n\n            d2['1'] = np.ones((2, 2), dtype=np.int)\n            d2['2'] = np.ones((2, 2), dtype=np.int) * 5\n            d2['3'] = np.zeros((2, 2), dtype=np.int)\n\n            d3['1'] = np.ones((3, 4), dtype=np.float32)\n            d3['2'] = np.ones((3, 4), dtype=np.float32) * 7\n            d3['3'] = np.zeros((3, 4), dtype=np.float32)\n\n        all_tensors.extend([aset1['1'], aset1['2'], aset1['3']])\n        all_tensors.extend([aset2['1'], aset2['2'], aset2['3']])\n        all_tensors.extend([aset3['1'], aset3['2'], aset3['3']])\n\n        co.commit('this is a commit message')\n        co.close()\n        co = repo.checkout()\n        # iterating over .items()\n        tensors_in_the_order = iter(all_tensors)\n        for dname, aset in co.columns.items():\n            assert aset._column_name == dname\n            for sname, sample in aset.items():\n                assert_equal(sample, next(tensors_in_the_order))\n\n        # iterating over .keys()\n        tensors_in_the_order = iter(all_tensors)\n        for dname in co.columns.keys():\n            for sname in co.columns[dname].keys():\n                assert_equal(co.columns[dname][sname], next(tensors_in_the_order))\n\n        # iterating over .values()\n        tensors_in_the_order = iter(all_tensors)\n        for aset in co.columns.values():\n            for sample in aset.values():\n                assert_equal(sample, next(tensors_in_the_order))\n        co.close()\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset3_backend\", fixed_shape_backend_params)\n    def test_iterating_over_local_only(self, aset1_backend, aset2_backend, aset3_backend, repo, randomsizedarray):\n        co = repo.checkout(write=True)\n        all_tensors = []\n        aset1 = co.add_ndarray_column('aset1', prototype=randomsizedarray, backend=aset1_backend)\n        aset2 = co.add_ndarray_column('aset2', shape=(2, 2), dtype=np.int, backend=aset2_backend)\n        aset3 = co.add_ndarray_column('aset3', shape=(3, 4), dtype=np.float32, backend=aset3_backend)\n\n        with aset1 as d1, aset2 as d2, aset3 as d3:\n            d1['1'] = randomsizedarray\n            d1['2'] = np.zeros_like(randomsizedarray)\n            d1['3'] = np.zeros_like(randomsizedarray) + 5\n\n            d2['1'] = np.ones((2, 2), dtype=np.int)\n            d2['2'] = np.ones((2, 2), dtype=np.int) * 5\n            d2['3'] = np.zeros((2, 2), dtype=np.int)\n\n            d3['1'] = np.ones((3, 4), dtype=np.float32)\n            d3['2'] = np.ones((3, 4), dtype=np.float32) * 7\n            d3['3'] = np.zeros((3, 4), dtype=np.float32)\n\n        all_tensors.extend([aset1['1'], aset1['2'], aset1['3']])\n        all_tensors.extend([aset2['1'], aset2['2'], aset2['3']])\n        all_tensors.extend([aset3['1'], aset3['2'], aset3['3']])\n\n        co.commit('this is a commit message')\n        co.close()\n        co = repo.checkout()\n\n        # perform the mock\n        from hangar.backends import backend_decoder\n        template = 
backend_decoder(b'50:daeaaeeaebv')\n        co._columns._columns['aset1']._samples['4'] = template\n        co._columns._columns['aset2']._samples['4'] = template\n\n        # iterating over .items()\n        tensors_in_the_order = iter(all_tensors)\n        for dname in ['aset1', 'aset2', 'aset3']:\n            aset = co.columns[dname]\n            count = 0\n            for sname, sample in aset.items(local=True):\n                count += 1\n                assert_equal(sample, next(tensors_in_the_order))\n                assert '4' != sname\n            assert count == 3\n\n        # iterating over .keys()\n        tensors_in_the_order = iter(all_tensors)\n        for dname in ['aset1', 'aset2', 'aset3']:\n            aset = co.columns[dname]\n            count = 0\n            for sname in aset.keys(local=True):\n                count += 1\n                assert_equal(aset[sname], next(tensors_in_the_order))\n                assert '4' != sname\n            assert count == 3\n\n        # iterating over .values()\n        tensors_in_the_order = iter(all_tensors)\n        for dname in ['aset1', 'aset2', 'aset3']:\n            aset = co.columns[dname]\n            count = 0\n            for sample in aset.values(local=True):\n                count += 1\n                assert_equal(sample, next(tensors_in_the_order))\n            assert count == 3\n\n        assert list(co['aset1'].keys()) == ['1', '2', '3', '4']\n        with pytest.raises((FileNotFoundError, KeyError)):\n            list(co['aset1'].values())\n        with pytest.raises((FileNotFoundError, KeyError)):\n            list(co['aset1'].items())\n\n        assert list(co['aset2'].keys()) == ['1', '2', '3', '4']\n        with pytest.raises((FileNotFoundError, KeyError)):\n            list(co['aset2'].values())\n        with pytest.raises((FileNotFoundError, KeyError)):\n            list(co['aset2'].items())\n\n        assert list(co['aset3'].keys()) == ['1', '2', '3']\n        assert len(list(co['aset3'].values())) == 3\n        assert len(list(co['aset3'].items())) == 3\n        co.close()\n\n    def test_get_data(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        co.commit('this is a commit message')\n        co.close()\n        co = aset_samples_initialized_repo.checkout()\n        assert np.allclose(co.columns['writtenaset']['1'], co.columns.get('writtenaset').get('1'))\n        assert np.allclose(co.columns['writtenaset']['1'], array5by7)\n        co.close()\n\n    def test_get_sample_with_default_works(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        res = co.columns['writtenaset'].get('doesnotexist', default=500)\n        assert res == 500\n        res = co.columns['writtenaset'].get('doesnotexist', 500)\n        assert res == 500\n        co.close()\n\n    def test_get_multiple_samples_fails(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        co.columns['writtenaset']['2'] = array5by7 + 1\n        co.columns['writtenaset']['3'] = array5by7 + 2\n        co.commit('this is a commit message')\n        co.close()\n\n        nco = aset_samples_initialized_repo.checkout()\n        with pytest.raises(TypeError):\n            res = nco.columns['writtenaset'].get(['1', '2'])\n        res = nco.columns['writtenaset'].get(('1', '2'))\n        assert res is None\n\n
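        # unpacking multiple keys as positional arguments to .get() must raise TypeError\n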
        aset = nco.columns['writtenaset']\n        with pytest.raises(TypeError):\n            res = aset.get(*('1', '2', '3'))\n        nco.close()\n\n    def test_getitem_multiple_samples_missing_key(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        co.commit('this is a commit message')\n        co.close()\n\n        nco = aset_samples_initialized_repo.checkout()\n        with pytest.raises(KeyError):\n            nco.columns['writtenaset'][('1', '2')]\n        with pytest.raises(KeyError):\n            aset = nco.columns['writtenaset']\n            aset[('1', '2')]\n        nco.close()\n\n    def test_get_multiple_samples_missing_key(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        co.commit('this is a commit message')\n        co.close()\n\n        nco = aset_samples_initialized_repo.checkout()\n        aset = nco.columns['writtenaset']\n        res = aset.get(('1', '2'))\n        assert res is None\n        nco.close()\n\n    def test_add_data_str_keys(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n        with pytest.raises(KeyError):\n            aset['somerandomkey']\n\n        aset['1'] = array5by7\n        aset['2'] = array5by7\n        co.commit('this is a commit message')\n        co.close()\n        co = aset_samples_initialized_repo.checkout()\n        assert_equal(co.columns['writtenaset']['1'], array5by7)\n        assert_equal(co.columns['writtenaset']['2'], array5by7)\n        co.close()\n\n    def test_add_data_int_keys(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n\n        aset[1] = array5by7\n        secondArray = array5by7 + 1\n        aset[2] = secondArray\n        co.commit('this is a commit message')\n        co.close()\n        co = aset_samples_initialized_repo.checkout()\n        assert_equal(co.columns['writtenaset'][1], array5by7)\n        assert_equal(co.columns['writtenaset'][2], secondArray)\n        co.close()\n\n    def test_cannot_add_data_negative_int_key(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n        with pytest.raises(ValueError):\n            aset[-1] = array5by7\n        assert len(co.columns['writtenaset']) == 0\n        co.close()\n\n    def test_cannot_add_data_float_key(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n        with pytest.raises(ValueError):\n            aset[2.1] = array5by7\n        with pytest.raises(ValueError):\n            aset[0.0] = array5by7\n        assert len(co.columns['writtenaset']) == 0\n        co.close()\n\n    def test_add_data_mixed_int_str_keys(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n\n        aset[1] = array5by7\n        newFirstArray = array5by7 + 1\n        aset['1'] = newFirstArray\n        secondArray = array5by7 + 2\n        aset[2] = secondArray\n        thirdArray = array5by7 + 3\n
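        # int key 2 and str key '2' address distinct samples\n        aset['2'] = 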
thirdArray\n        co.commit('this is a commit message')\n        co.close()\n        co = aset_samples_initialized_repo.checkout()\n        assert_equal(co.columns['writtenaset'][1], array5by7)\n        assert_equal(co.columns['writtenaset']['1'], newFirstArray)\n        assert_equal(co.columns['writtenaset'][2], secondArray)\n        assert_equal(co.columns['writtenaset']['2'], thirdArray)\n        co.close()\n\n    def test_cannot_add_data_sample_name_longer_than_64_characters(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n        with pytest.raises(ValueError):\n            aset['VeryLongNameIsInvalidOver64CharactersNotAllowedVeryLongNameIsInva'] = array5by7\n        assert len(co.columns['writtenaset']) == 0\n        co.close()\n\n    def test_add_with_wrong_argument_order(self, aset_samples_initialized_w_checkout, array5by7):\n        aset = aset_samples_initialized_w_checkout.columns['writtenaset']\n        with pytest.raises(ValueError):\n            aset[array5by7] = '1'\n\n    def test_update_with_dict_single_item(self, aset_samples_initialized_w_checkout, array5by7):\n        aset = aset_samples_initialized_w_checkout.columns['writtenaset']\n        data_map = {'foo': array5by7}\n        aset.update(data_map)\n        assert_equal(aset['foo'], array5by7)\n\n    def test_update_with_dict_multiple_items(self, aset_samples_initialized_w_checkout, array5by7):\n        aset = aset_samples_initialized_w_checkout.columns['writtenaset']\n        data_map = {\n            'foo': array5by7,\n            1: array5by7+1\n        }\n        aset.update(data_map)\n        assert_equal(aset['foo'], array5by7)\n        assert_equal(aset[1], array5by7+1)\n\n    def test_update_with_list_single_item(self, aset_samples_initialized_w_checkout, array5by7):\n        aset = aset_samples_initialized_w_checkout.columns['writtenaset']\n        data_map = ['foo', array5by7]\n        with pytest.raises(ValueError, match='dictionary update sequence'):\n            aset.update(data_map)\n        assert 'foo' not in aset\n\n        aset.update((data_map,))  # try again while contained in iterable\n        assert_equal(aset['foo'], array5by7)\n\n    def test_update_with_list_multiple_items(self, aset_samples_initialized_w_checkout, array5by7):\n        aset = aset_samples_initialized_w_checkout.columns['writtenaset']\n        data_map = [\n            ('foo', array5by7),\n            (1, array5by7+1),\n        ]\n        aset.update(data_map)\n        assert_equal(aset['foo'], array5by7)\n        assert_equal(aset[1], array5by7+1)\n\n    def test_update_with_only_kwargs_single_item(self, aset_samples_initialized_w_checkout, array5by7):\n        aset = aset_samples_initialized_w_checkout.columns['writtenaset']\n        aset.update(foo=array5by7)\n        assert_equal(aset['foo'], array5by7)\n\n    def test_update_with_only_kwargs_multiple_items(self, aset_samples_initialized_w_checkout, array5by7):\n        aset = aset_samples_initialized_w_checkout.columns['writtenaset']\n        aset.update(foo=array5by7, bar=array5by7+1)\n        assert_equal(aset['foo'], array5by7)\n        assert_equal(aset['bar'], array5by7+1)\n\n    def test_update_with_list_and_kwargs(self, aset_samples_initialized_w_checkout, array5by7):\n        aset = aset_samples_initialized_w_checkout.columns['writtenaset']\n        data_map = [\n            ('foo', array5by7),\n            (1, array5by7+1),\n        ]\n        
aset.update(data_map, bar=array5by7+2)\n        assert_equal(aset['foo'], array5by7)\n        assert_equal(aset[1], array5by7+1)\n        assert_equal(aset['bar'], array5by7 + 2)\n\n    def test_update_with_dict_and_kwargs(self, aset_samples_initialized_w_checkout, array5by7):\n        aset = aset_samples_initialized_w_checkout.columns['writtenaset']\n        data_map = {\n            'foo': array5by7,\n            1: array5by7+1,\n        }\n        aset.update(data_map, bar=array5by7+2)\n        assert_equal(aset['foo'], array5by7)\n        assert_equal(aset[1], array5by7+1)\n        assert_equal(aset['bar'], array5by7 + 2)\n\n    def test_update_with_dict_and_kwargs_does_not_modify_input_in_calling_scope(\n        self, aset_samples_initialized_w_checkout, array5by7\n    ):\n        \"\"\"Ensure this bug does not regress.\n\n        Had a case where, if a dict was passed as ``other`` along with kwargs, the\n        operation would complete normally, but when control returned to the caller\n        the original dict passed in as ``other`` had been silently merged with the\n        kwargs.\n        \"\"\"\n        aset = aset_samples_initialized_w_checkout.columns['writtenaset']\n        data_map = {\n            'foo': array5by7,\n            1: array5by7+1,\n        }\n        data_map_before = list(data_map.keys())\n        aset.update(data_map, bar=array5by7+2)\n        # in the bug case, data_map would now have been silently modified\n        # in a manner analogous to calling:\n        #\n        #   ``data_map.update({'bar': np.array})``\n        #\n        assert list(data_map.keys()) == data_map_before\n\n    @pytest.mark.parametrize('data_map', [\n        ['foo', {'bar': np.random.random((5, 7))}],\n        ['foo', 'bar', np.random.random((5, 7))],\n        [('foo', 'bar', np.random.random((5, 7)))],\n        [{('foo', 'bar'): np.random.random((5, 7))}],\n        [('foo', 'bar', np.random.random((5, 7)))],\n        [('foo', np.random.random((5, 7)), 'bar')],\n        [(np.random.random((5, 7)), 'foo', 'bar')],\n        [('foo', np.random.random((5, 7)), 'bar'), ('valid', np.random.random((5, 7)))],\n        [('valid', np.random.random((5, 7))), ('foo', np.random.random((5, 7)), 'bar')],\n        {'foo': np.random.random((5, 7)), 'bar': (np.random.random((5, 7)), np.random.random((5, 7)))},\n    ])\n    def test_update_with_invalid_data_map_fails(self, aset_samples_initialized_w_checkout, data_map):\n        aset = aset_samples_initialized_w_checkout['writtenaset']\n        with pytest.raises(ValueError):\n            aset.update(data_map)\n\n    @pytest.mark.parametrize('key,value', [\n        ['foo', {'bar': np.random.random((5, 7))}],\n        ['foo', ('bar', np.random.random((5, 7)))],\n        [('foo', 'bar'), np.random.random((5, 7))],\n        ['foo', {('foo', 'bar'): np.random.random((5, 7))}],\n        ['foo', ('bar', np.random.random((5, 7)))],\n        ['foo', (np.random.random((5, 7)), 'bar')],\n        [np.random.random((5, 7)), ('foo', 'bar')],\n        [('foo', np.random.random((5, 7)), 'bar'), ('valid', np.random.random((5, 7)))],\n        [('valid', np.random.random((5, 7))), ('foo', np.random.random((5, 7)), 'bar')],\n        [('valid', np.random.random((5, 7))), ('valid2', np.random.random((5, 7)))],\n    ])\n    def test_setitem_with_invalid_data_map_fails(self, aset_samples_initialized_w_checkout, key, value):\n        aset = aset_samples_initialized_w_checkout['writtenaset']\n        with pytest.raises(ValueError):\n            aset[key] = value\n\n
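    # multiple samples staged in one checkout become visible after a single commit\n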
    def test_add_multiple_data_single_commit(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        new_array = np.zeros_like(array5by7)\n        co.columns['writtenaset']['2'] = new_array\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout()\n        aset = co.columns['writtenaset']\n        assert len(aset) == 2\n        assert list(aset.keys()) == ['1', '2']\n        assert_equal(aset['1'], array5by7)\n        co.close()\n\n    def test_add_same_data_same_key_does_not_duplicate_hash(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n        aset['1'] = array5by7\n        old_spec = aset._samples['1']\n        aset['1'] = array5by7\n        new_spec = aset._samples['1']\n        assert old_spec == new_spec\n        assert len(aset) == 1\n        assert len(aset._samples) == 1\n        co.close()\n\n    def test_multiple_data_multiple_commit(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        co.commit('this is a commit message')\n        new_array = np.zeros_like(array5by7)\n        co.columns['writtenaset']['2'] = new_array\n        co.close()\n\n        new_new_array = new_array + 5\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['3'] = new_new_array\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout()\n        aset = co.columns['writtenaset']\n        assert_equal(aset['1'], array5by7)\n        assert_equal(aset['2'], new_array)\n        assert_equal(aset['3'], new_new_array)\n        co.close()\n\n    def test_added_but_not_committed(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        co.close()\n\n        with pytest.raises(PermissionError):\n            co.commit('this is a commit message')\n\n        co = aset_samples_initialized_repo.checkout()\n        aset = co.columns['writtenaset']\n        with pytest.raises(KeyError):\n            aset['1']\n        co.close()\n\n    def test_remove_data(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        co.columns['writtenaset']['2'] = array5by7 + 1\n        co.columns['writtenaset']['3'] = array5by7 + 2\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 3\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout(write=True)\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 3\n        del co.columns['writtenaset']['1']\n        del co.columns['writtenaset']['3']\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 1\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout()\n        with pytest.raises(KeyError):\n            co.columns['writtenaset']['1']\n        with pytest.raises(KeyError):\n            co.columns['writtenaset']['3']\n 
       assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 1\n        assert_equal(co.columns['writtenaset']['2'], array5by7 + 1)\n        co.close()\n\n    def test_remove_data_multiple_items(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        co.columns['writtenaset']['2'] = array5by7 + 1\n        co.columns['writtenaset']['3'] = array5by7 + 2\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 3\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout(write=True)\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 3\n        with pytest.raises(KeyError):\n            del co.columns['writtenaset'][('1', '3')]\n        assert '1' in co.columns['writtenaset']\n        assert '3' in co.columns['writtenaset']\n        del co.columns['writtenaset']['1']\n        del co.columns['writtenaset']['3']\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 1\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout()\n        with pytest.raises(KeyError):\n            co.columns['writtenaset']['1']\n        with pytest.raises(KeyError):\n            co.columns['writtenaset']['3']\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 1\n        assert_equal(co.columns['writtenaset']['2'], array5by7 + 1)\n        co.close()\n\n    def test_pop_data(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        co.columns['writtenaset']['2'] = array5by7 + 1\n        co.columns['writtenaset']['3'] = array5by7 + 2\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 3\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout(write=True)\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 3\n        res = co.columns['writtenaset'].pop('1')\n        assert_equal(res, array5by7)\n\n        aset = co.columns['writtenaset']\n        res = aset.pop('3')\n        assert_equal(res, array5by7 + 2)\n\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 1\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout()\n        with pytest.raises(KeyError):\n            co.columns['writtenaset']['1']\n        with pytest.raises(KeyError):\n            co.columns['writtenaset']['3']\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 1\n        assert_equal(co.columns['writtenaset']['2'], array5by7 + 1)\n        co.close()\n\n    def test_pop_data_multiple_items(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        co.columns['writtenaset']['2'] = array5by7 + 1\n        co.columns['writtenaset']['3'] = array5by7 + 2\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 3\n        co.commit('this is a commit message')\n        co.close()\n\n        co = 
aset_samples_initialized_repo.checkout(write=True)\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 3\n        with pytest.raises(TypeError):\n            co.columns['writtenaset'].pop('1', '3')\n        res = co.columns['writtenaset'].pop('1')\n        assert_equal(res, array5by7)\n        res = co.columns['writtenaset'].pop('3')\n        assert_equal(res, array5by7 + 2)\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 1\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout()\n        with pytest.raises(KeyError):\n            co.columns['writtenaset']['1']\n        with pytest.raises(KeyError):\n            co.columns['writtenaset']['3']\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 1\n        assert_equal(co.columns['writtenaset']['2'], array5by7 + 1)\n        co.close()\n\n    def test_remove_all_data(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        new_array = np.zeros_like(array5by7)\n        co.columns['writtenaset']['2'] = new_array\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 2\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout(write=True)\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 2\n        del co.columns['writtenaset']['1']\n        del co.columns['writtenaset']['2']\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 0\n\n        wset = co.columns['writtenaset']\n        del co.columns['writtenaset']\n\n        assert len(co.columns) == 0\n        with pytest.raises(KeyError):\n            len(co.columns['writtenaset'])\n        co.commit('this is a commit message')\n        co.close()\n\n        # recreating same and verifying\n        co = aset_samples_initialized_repo.checkout(write=True)\n        assert len(co.columns) == 0\n        co.add_ndarray_column('writtenaset', prototype=array5by7)\n        co.columns['writtenaset']['1'] = array5by7\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 1\n        co.commit('this is a commit message')\n        co.close()\n\n        co = aset_samples_initialized_repo.checkout()\n        assert_equal(co.columns['writtenaset']['1'], array5by7)\n        assert len(co.columns) == 1\n        assert len(co.columns['writtenaset']) == 1\n        co.close()\n\n    def test_remove_data_nonexistent_sample_key_raises(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset']['1'] = array5by7\n        new_array = np.zeros_like(array5by7)\n        co.columns['writtenaset']['2'] = new_array\n        co.columns['writtenaset']['3'] = new_array + 5\n        with pytest.raises(KeyError):\n            del co.columns['writtenaset']['doesnotexist']\n        co.commit('this is a commit message')\n        co.close()\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    def test_multiple_columns_single_commit(\n            self, aset1_backend, aset2_backend, aset_samples_initialized_repo, randomsizedarray\n    ):\n
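        # columns using different backends can be written and committed together\n        co = 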
aset_samples_initialized_repo.checkout(write=True)\n        aset1 = co.add_ndarray_column('aset1', prototype=randomsizedarray, backend=aset1_backend)\n        aset2 = co.add_ndarray_column('aset2', prototype=randomsizedarray, backend=aset2_backend)\n        aset1['arr'] = randomsizedarray\n        aset2['arr'] = randomsizedarray\n        co.commit('this is a commit message')\n        co.close()\n        co = aset_samples_initialized_repo.checkout()\n        assert_equal(co.columns['aset1']['arr'], randomsizedarray)\n        assert_equal(co.columns['aset2']['arr'], randomsizedarray)\n        co.close()\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    def test_prototype_and_shape(self, aset1_backend, aset2_backend, repo, randomsizedarray):\n        co = repo.checkout(write=True)\n        aset1 = co.add_ndarray_column('aset1', prototype=randomsizedarray, backend=aset1_backend)\n        aset2 = co.add_ndarray_column('aset2', shape=randomsizedarray.shape, dtype=randomsizedarray.dtype, backend=aset2_backend)\n\n        newarray = np.random.random(randomsizedarray.shape).astype(randomsizedarray.dtype)\n        aset1['arr1'] = newarray\n        aset2['arr'] = newarray\n        co.commit('this is a commit message')\n        co.close()\n\n        co = repo.checkout()\n        assert_equal(co.columns['aset1']['arr1'], newarray)\n        assert_equal(co.columns['aset2']['arr'], newarray)\n        co.close()\n\n    def test_samples_without_name(self, repo, randomsizedarray):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('aset', prototype=randomsizedarray)\n        with pytest.raises(TypeError):\n            aset[randomsizedarray]\n\n        aset_no_name = co.add_ndarray_column('aset_no_name', prototype=randomsizedarray)\n        added = aset_no_name.append(randomsizedarray)\n        assert_equal(next(aset_no_name.values()), randomsizedarray)\n        assert_equal(aset_no_name[added], randomsizedarray)\n        co.close()\n\n    def test_append_samples(self, repo, randomsizedarray):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('aset', prototype=randomsizedarray)\n        with pytest.raises((ValueError, TypeError)):\n            aset[randomsizedarray]\n\n        aset_no_name = co.add_ndarray_column('aset_no_name', prototype=randomsizedarray)\n        generated_key = aset_no_name.append(randomsizedarray)\n        assert generated_key in aset_no_name\n        assert len(aset_no_name) == 1\n        assert_equal(aset_no_name[generated_key], randomsizedarray)\n        co.close()\n\n    def test_different_data_types_and_shapes(self, repo):\n        co = repo.checkout(write=True)\n        shape = (2, 3)\n        dtype = np.int\n        another_dtype = np.float64\n        another_shape = (3, 4)\n        arr = np.random.random(shape).astype(dtype)\n        aset = co.add_ndarray_column('aset', shape=shape, dtype=dtype)\n        aset['1'] = arr\n\n        newarr = np.random.random(shape).astype(another_dtype)\n        with pytest.raises(ValueError):\n            aset['2'] = newarr\n\n        newarr = np.random.random(another_shape).astype(dtype)\n        with pytest.raises(ValueError):\n            aset['3'] = newarr\n        co.close()\n\n    def test_add_sample_with_non_numpy_array_data_fails(self, aset_samples_initialized_repo):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        with pytest.raises(ValueError, 
match='`data` argument type'):\n            co.columns['writtenaset'][1] = [[1, 2, 3, 4, 5, 6, 7] for i in range(5)]\n        co.close()\n\n    def test_add_sample_with_fortran_order_data_fails(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        with pytest.raises(ValueError, match='`data` must be \"C\" contiguous array.'):\n            co.columns['writtenaset'][1] = np.asfortranarray(array5by7)\n        co.close()\n\n    def test_add_sample_with_dimension_rank_fails(self, repo):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('aset', shape=(2, 3), dtype=np.float32, variable_shape=True)\n        arr = np.random.randn(2, 3, 2).astype(np.float32)\n        with pytest.raises(ValueError, match='data rank 3 != aset rank 2'):\n            aset[1] = arr\n        co.close()\n\n    def test_add_sample_with_dimension_exceeding_max_fails(self, repo):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('aset', shape=(2, 3), dtype=np.float32, variable_shape=True)\n        arr = np.random.randn(2, 4).astype(np.float32)\n        with pytest.raises(ValueError, match='exceeds schema max'):\n            aset[1] = arr\n        co.close()\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_writer_context_manager_column_add_sample(self, aset_backend, repo, randomsizedarray):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('aset', prototype=randomsizedarray, backend=aset_backend)\n        with co.columns['aset'] as aset:\n            aset['1'] = randomsizedarray\n        co.commit('this is a commit message')\n        co.close()\n        co = repo.checkout()\n        assert_equal(co.columns['aset']['1'], randomsizedarray)\n        co.close()\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_column_context_manager_aset_sample_add(self, aset_backend, repo, randomsizedarray):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('aset', prototype=randomsizedarray, backend=aset_backend)\n        with co.columns['aset'] as aset:\n            aset['1'] = randomsizedarray\n            aset['2'] = randomsizedarray + 1\n        co.commit('this is a commit message')\n        co.close()\n\n        co = repo.checkout()\n        assert_equal(co.columns['aset']['1'], randomsizedarray)\n        assert np.allclose(co.columns['aset'].get('2'), randomsizedarray + 1)\n        co.close()\n\n    def test_writer_column_properties_are_correct(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        assert co.columns.iswriteable is True\n        d = co.columns['writtenaset']\n        assert d.column == 'writtenaset'\n        assert d.dtype == array5by7.dtype\n        assert np.allclose(d.shape, array5by7.shape) is True\n        assert d.schema_type == 'fixed_shape'\n        assert d.iswriteable is True\n        assert d.backend == '01'\n        assert isinstance(d.backend_options, dict)\n        assert len(d.backend_options) > 0\n        assert d.contains_subsamples is False\n        assert d.remote_reference_keys == ()\n        assert d.contains_remote_references is False\n        co.close()\n\n    def test_reader_column_properties_are_correct(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=False)\n        assert co.columns.iswriteable is False\n        
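# reader-side handle: identical schema metadata as the writer, but read-only\n        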
d = co.columns['writtenaset']\n        assert d.column == 'writtenaset'\n        assert d.dtype == array5by7.dtype\n        assert np.allclose(d.shape, array5by7.shape) is True\n        assert d.schema_type == 'fixed_shape'\n        assert d.iswriteable is False\n        assert d.backend == '01'\n        assert isinstance(d.backend_options, dict)\n        assert len(d.backend_options) > 0\n        assert d.contains_subsamples is False\n        assert d.remote_reference_keys == ()\n        assert d.contains_remote_references is False\n        co.close()\n\n    def test_iter_column_samples_yields_keys(self, aset_samples_initialized_repo, array5by7):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        co.columns['writtenaset'][0] = array5by7\n        new_array = np.zeros_like(array5by7)\n        co.columns['writtenaset'][1] = new_array\n        co.columns['writtenaset'][2] = new_array + 5\n\n        for idx, sname in enumerate(iter(co.columns['writtenaset'])):\n            assert sname == idx\n        assert idx == 2\n        co.close()\n\n    def test_iter_columns_yields_aset_names(self, repo_20_filled_samples):\n        co = repo_20_filled_samples.checkout(write=True)\n        for k in iter(co.columns):\n            assert k in ['second_aset', 'writtenaset']\n        co.close()\n\n    def test_set_item_column_fails(self, aset_samples_initialized_repo):\n        co = aset_samples_initialized_repo.checkout(write=True)\n        with pytest.raises(AttributeError):\n            co.columns['newaset'] = co.columns['writtenaset']\n        co.close()\n\n\nclass TestVariableSizedColumn(object):\n\n    @pytest.mark.parametrize(\n        'test_shapes,max_shape',\n        [[[(2, 5), (1, 10), (10, 1), (5, 2)], (10, 10)],\n         [[(10,), (10,)], (10,)],\n         [[(3, 3, 3), (27, 1, 1), (1, 27, 1), (1, 1, 27), (3, 9, 1), (9, 3, 1), (1, 3, 9), (1, 9, 3)], (27, 27, 27)]])\n    @pytest.mark.parametrize(\"dtype1\", [np.uint8, np.float32, np.int32])\n    @pytest.mark.parametrize(\"dtype2\", [np.uint8, np.float32, np.int32])\n    @pytest.mark.parametrize('backend1', variable_shape_backend_params)\n    @pytest.mark.parametrize('backend2', variable_shape_backend_params)\n    def test_write_all_zeros_same_size_different_shape_does_not_store_as_identical_hashes(\n        self, repo, test_shapes, max_shape, dtype1, dtype2, backend1, backend2\n    ):\n        wco = repo.checkout(write=True)\n        aset1 = wco.add_ndarray_column('aset1', shape=max_shape, dtype=dtype1, variable_shape=True, backend=backend1)\n        aset2 = wco.add_ndarray_column('aset2', shape=max_shape, dtype=dtype2, variable_shape=True, backend=backend2)\n\n        arrdict1, arrdict2 = {}, {}\n        for idx, shape in enumerate(test_shapes):\n            arr1 = np.zeros(shape, dtype=dtype1)\n            arr2 = np.zeros(shape, dtype=dtype2)\n            arrdict1[idx] = arr1\n            arrdict2[idx] = arr2\n            aset1[idx] = arr1\n            aset2[idx] = arr2\n\n        for k, v in arrdict1.items():\n            # make sure they are good before committing\n            res = aset1[k]\n            assert res.dtype == v.dtype\n            assert res.shape == v.shape\n            assert_equal(res, v)\n        for k, v in arrdict2.items():\n            # make sure they are good before committing\n            res = aset2[k]\n            assert res.dtype == v.dtype\n            assert res.shape == v.shape\n            assert_equal(res, v)\n\n        wco.commit('first')\n\n        for k, v in arrdict1.items():\n            # make sure they are 
still good after commit\n            res = aset1[k]\n            assert res.dtype == v.dtype\n            assert res.shape == v.shape\n            assert_equal(res, v)\n        for k, v in arrdict2.items():\n            # make sure they are still good after commit\n            res = aset2[k]\n            assert res.dtype == v.dtype\n            assert res.shape == v.shape\n            assert_equal(res, v)\n\n        wco.close()\n        rco = repo.checkout()\n        naset1 = rco.columns['aset1']\n        naset2 = rco.columns['aset2']\n\n        for k, v in arrdict1.items():\n            # make sure they read back correctly in a reader checkout\n            res = naset1[k]\n            assert res.dtype == v.dtype\n            assert res.shape == v.shape\n            assert_equal(res, v)\n        for k, v in arrdict2.items():\n            # make sure they read back correctly in a reader checkout\n            res = naset2[k]\n            assert res.dtype == v.dtype\n            assert res.shape == v.shape\n            assert_equal(res, v)\n        rco.close()\n\n    @pytest.mark.parametrize(\n        'test_shapes,shape',\n        [[[(10, 10), (1, 10), (2, 2), (3, 5), (1, 1), (10, 1)], (10, 10)],\n         [[(10,), (1,), (5,)], (10,)],\n         [[(100, 100, 100), (100, 100, 1), (100, 1, 100), (1, 100, 100), (1, 1, 1), (34, 6, 3)], (100, 100, 100)]])\n    @pytest.mark.parametrize(\"dtype\", [np.uint8, np.float32])\n    @pytest.mark.parametrize('backend', variable_shape_backend_params)\n    def test_writer_can_create_variable_size_column(\n        self, aset_samples_initialized_repo, dtype, test_shapes, shape, backend\n    ):\n        repo = aset_samples_initialized_repo\n        wco = repo.checkout(write=True)\n        wco.add_ndarray_column('varaset', shape=shape, dtype=dtype, variable_shape=True, backend=backend)\n        d = wco.columns['varaset']\n\n        arrdict = {}\n        for idx, shape in enumerate(test_shapes):\n            arr = (np.random.random_sample(shape) * 10).astype(dtype)\n            arrdict[str(idx)] = arr\n            d[str(idx)] = arr\n\n        for k, v in arrdict.items():\n            # make sure they are good before committing\n            assert_equal(d[k], v)\n\n        wco.commit('first')\n\n        for k, v in arrdict.items():\n            # make sure they still work after commit\n            assert_equal(d[k], v)\n        wco.close()\n\n    @pytest.mark.parametrize('test_shapes,shape', [\n        [[(10, 10), (1, 10), (2, 2), (3, 5), (1, 1), (10, 1)], (10, 10)],\n        [[(10,), (1,), (5,)], (10,)],\n        [[(100, 100, 100), (100, 100, 1), (100, 1, 100), (1, 100, 100), (1, 1, 1), (34, 6, 3)], (100, 100, 100)]\n    ])\n    @pytest.mark.parametrize(\"dtype\", [np.uint8, np.float32])\n    @pytest.mark.parametrize('backend', variable_shape_backend_params)\n    def test_reader_receives_expected_values_for_variable_size_column(\n        self, aset_samples_initialized_repo, dtype, test_shapes, shape, backend\n    ):\n        repo = aset_samples_initialized_repo\n        wco = repo.checkout(write=True)\n        wco.add_ndarray_column('varaset', shape=shape, dtype=dtype, variable_shape=True, backend=backend)\n        wd = wco.columns['varaset']\n\n        arrdict = {}\n        for idx, shape in enumerate(test_shapes):\n            arr = (np.random.random_sample(shape) * 10).astype(dtype)\n            arrdict[str(idx)] = arr\n            wd[str(idx)] = arr\n\n        for k, v in arrdict.items():\n            # make sure they are good before committing\n            assert_equal(wd[k], v)\n\n        
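# commit, then verify the same data through both the writer and a fresh reader checkout\n        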
wco.commit('first')\n        rco = repo.checkout()\n        rd = rco.columns['varaset']\n\n        for k, v in arrdict.items():\n            # make sure they still work after commit\n            assert_equal(wd[k], v)\n            assert_equal(rd[k], v)\n        wco.close()\n        rco.close()\n\n    @pytest.mark.parametrize('aset_specs', [\n        [['aset1', [(10, 10), (1, 10), (2, 2), (3, 5), (1, 1), (10, 1)], (10, 10)],\n         ['aset2', [(10,), (1,), (5,)], (10,)]],\n        [['aset1', [(100, 100), (1, 100), (20, 20), (30, 50), (1, 10), (10, 1)], (100, 100)],\n         ['aset2', [(100,), (1,), (50,)], (100,)]]])\n    @pytest.mark.parametrize('backends', permutations(variable_shape_backend_params, 2))\n    @pytest.mark.parametrize('dtype', [np.float32, np.uint8])\n    def test_writer_reader_can_create_read_multiple_variable_size_column(\n        self, aset_samples_initialized_repo, aset_specs, backends, dtype\n    ):\n        repo = aset_samples_initialized_repo\n        wco = repo.checkout(write=True)\n        arrdict = {}\n        for backend, aset_spec in zip(backends, aset_specs):\n            aset_name, test_shapes, max_shape = aset_spec\n            wco.add_ndarray_column(\n                aset_name, shape=max_shape, dtype=dtype, variable_shape=True, backend=backend)\n\n            arrdict[aset_name] = {}\n            for idx, shape in enumerate(test_shapes):\n                arr = (np.random.random_sample(shape) * 10).astype(dtype)\n                arrdict[aset_name][str(idx)] = arr\n                wco.columns[aset_name][str(idx)] = arr\n\n        for aset_k in arrdict.keys():\n            for samp_k, v in arrdict[aset_k].items():\n                # make sure they are good before committing\n                assert_equal(wco.columns[aset_k][samp_k], v)\n\n        wco.commit('first')\n        rco = repo.checkout()\n\n        for aset_k in arrdict.keys():\n            for samp_k, v in arrdict[aset_k].items():\n                # make sure both writer and reader views are correct after commit\n                assert_equal(wco.columns[aset_k][samp_k], v)\n                assert_equal(rco.columns[aset_k][samp_k], v)\n        wco.close()\n        rco.close()\n\n    def test_writer_column_properties_are_correct(self, aset_samples_var_shape_initialized_repo):\n        co = aset_samples_var_shape_initialized_repo.checkout(write=True)\n        d = co.columns['writtenaset']\n        assert d.column == 'writtenaset'\n        assert d.dtype == np.float64\n        assert np.allclose(d.shape, (10, 10))\n        assert d.schema_type == 'variable_shape'\n        assert d.iswriteable is True\n        assert d.backend in variable_shape_backend_params\n        assert isinstance(d.backend_options, dict)\n        assert d.contains_subsamples is False\n        assert d.remote_reference_keys == ()\n        assert d.contains_remote_references is False\n        co.close()\n\n    def test_reader_column_properties_are_correct(self, aset_samples_var_shape_initialized_repo):\n        co = aset_samples_var_shape_initialized_repo.checkout(write=False)\n        d = co.columns['writtenaset']\n        assert d.column == 'writtenaset'\n        assert d.dtype == np.float64\n        assert np.allclose(d.shape, (10, 10))\n        assert d.schema_type == 'variable_shape'\n        assert d.iswriteable is False\n        assert d.backend in variable_shape_backend_params\n        assert isinstance(d.backend_options, dict)\n        assert d.contains_subsamples is False\n        assert d.remote_reference_keys == ()\n        assert 
d.contains_remote_references is False\n        co.close()\n\n\nclass TestMultiprocessColumnReads(object):\n\n    @pytest.mark.parametrize('backend', fixed_shape_backend_params)\n    def test_external_multi_process_pool(self, repo, backend):\n        from multiprocessing import get_context\n\n        masterCmtList = []\n        co = repo.checkout(write=True)\n        co.add_ndarray_column(name='writtenaset', shape=(20, 20), dtype=np.float32, backend=backend)\n        masterSampList = []\n        for cIdx in range(2):\n            if cIdx != 0:\n                co = repo.checkout(write=True)\n            with co.columns['writtenaset'] as d:\n                kstart = 20 * cIdx\n                for sIdx in range(20):\n                    arr = np.random.randn(20, 20).astype(np.float32) * 100\n                    sName = str(sIdx + kstart)\n                    d[sName] = arr\n                    masterSampList.append(arr)\n            assert d.backend == backend\n            cmt = co.commit(f'master commit number: {cIdx}')\n            masterCmtList.append((cmt, list(masterSampList)))\n            co.close()\n\n        cmtIdx = 0\n        for cmt, sampList in masterCmtList:\n            nco = repo.checkout(write=False, commit=cmt)\n            ds = nco.columns['writtenaset']\n            keys = [str(i) for i in range(20 + (20*cmtIdx))]\n            with get_context().Pool(2) as P:\n                cmtData = P.map(ds.get, keys)\n            for data, sampData in zip(cmtData, sampList):\n                assert_equal(data, sampData)\n            cmtIdx += 1\n            nco.close()\n\n    @pytest.mark.parametrize('backend', fixed_shape_backend_params)\n    def test_external_multi_process_pool_fails_on_write_enabled_checkout(self, repo, backend):\n        from multiprocessing import get_context\n\n        co = repo.checkout(write=True)\n        co.add_ndarray_column(name='writtenaset', shape=(20, 20), dtype=np.float32, backend=backend)\n        with co.columns['writtenaset'] as d:\n            for sIdx in range(20):\n                d[sIdx] = np.random.randn(20, 20).astype(np.float32) * 100\n        assert d.backend == backend\n        co.commit('master commit number 1')\n        co.close()\n\n        nco = repo.checkout(write=True)\n        ds = nco.columns['writtenaset']\n        keys = [i for i in range(20)]\n        with pytest.raises(PermissionError):\n            with get_context().Pool(2) as P:\n                cmtData = P.map(ds.get, keys)\n        nco.close()\n\n    @pytest.mark.parametrize('backend', fixed_shape_backend_params)\n    def test_multiprocess_get_succeeds_on_superset_and_subset_of_keys(self, repo, backend):\n        from multiprocessing import get_context\n\n        co = repo.checkout(write=True)\n        co.add_ndarray_column(name='writtenaset', shape=(20, 20), dtype=np.float32, backend=backend)\n        masterSampList = []\n        with co.columns['writtenaset'] as d:\n            for sIdx in range(20):\n                arr = np.random.randn(20, 20).astype(np.float32) * 100\n                d[sIdx] = arr\n                masterSampList.append(arr)\n            assert d.backend == backend\n        cmt = co.commit('master commit number one')\n        co.close()\n\n        nco = repo.checkout(write=False, commit=cmt)\n        ds = nco.columns['writtenaset']\n\n        # superset of keys: missing keys are returned as None\n        keys = [i for i in range(24)]\n        with get_context().Pool(2) as P:\n            cmtData = P.map(ds.get, keys)\n        for idx, data in enumerate(cmtData):\n      
      if idx >= 20:\n                assert data is None\n            else:\n                assert_equal(data, masterSampList[idx])\n\n        # subset of keys works\n        keys = [i for i in range(10, 20)]\n        with get_context().Pool(2) as P:\n            cmtData = P.map(ds.get, keys)\n        for idx, data in enumerate(cmtData):\n            assert_equal(data, masterSampList[10+idx])\n        nco.close()\n\n    def test_writer_iterating_over_keys_can_have_additions_made_no_error(self, two_commit_filled_samples_repo):\n        # do not want ``RuntimeError dictionary changed size during iteration``\n\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n        with aset as ds:\n            for idx, k in enumerate(ds.keys()):\n                if idx == 0:\n                    ds['1232'] = np.random.randn(5, 7).astype(np.float32)\n                assert '1232' != k\n\n        added_key_exists_on_later_iteration = False\n        for k in aset.keys():\n            if k == '1232':\n                added_key_exists_on_later_iteration = True\n                break\n        assert added_key_exists_on_later_iteration is True\n        co.close()\n\n    def test_writer_iterating_over_values_can_have_additions_made_no_error(self, two_commit_filled_samples_repo):\n        # do not want ``RuntimeError dictionary changed size during iteration``\n\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n        mysample = np.random.randn(5, 7).astype(np.float32)\n        with aset as ds:\n            for idx, v in enumerate(ds.values()):\n                if idx == 0:\n                    ds['1232'] = mysample\n                assert not np.allclose(v, mysample)\n\n        added_value_exists_on_later_iteration = False\n        for v in aset.values():\n            if np.allclose(v, mysample):\n                added_value_exists_on_later_iteration = True\n                break\n        assert added_value_exists_on_later_iteration is True\n        co.close()\n\n    def test_writer_iterating_over_items_can_have_additions_made_no_error(self, two_commit_filled_samples_repo):\n        # do not want ``RuntimeError dictionary changed size during iteration``\n\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=True)\n        aset = co.columns['writtenaset']\n        mysample = np.random.randn(5, 7).astype(np.float32)\n        with aset as ds:\n            for idx, kv in enumerate(ds.items()):\n                if idx == 0:\n                    ds['1232'] = mysample\n                k, v = kv\n                assert not np.allclose(v, mysample)\n                assert k != '1232'\n\n        added_value_exists_on_later_iteration = False\n        for k, v in aset.items():\n            if (k == '1232') and np.allclose(v, mysample):\n                added_value_exists_on_later_iteration = True\n                break\n        assert added_value_exists_on_later_iteration is True\n        co.close()\n\n    def test_reader_iterating_over_items_can_not_make_additions(self, two_commit_filled_samples_repo):\n        # do not want ``RuntimeError dictionary changed size during iteration``\n\n        repo = two_commit_filled_samples_repo\n        co = repo.checkout(write=False)\n        aset = co.columns['writtenaset']\n        mysample = np.random.randn(5, 7).astype(np.float32)\n        with aset as ds:\n            for idx, kv in 
enumerate(ds.items()):\n                if idx == 0:\n                    with pytest.raises(TypeError):\n                        ds['1232'] = mysample\n                k, v = kv\n                assert not np.allclose(v, mysample)\n                assert k != '1232'\n\n        assert '1232' not in aset\n        co.close()\n"
  },
  {
    "path": "tests/test_column_backends.py",
    "content": "import pytest\nimport numpy as np\nfrom conftest import fixed_shape_backend_params\n\n\n@pytest.mark.parametrize('backend', fixed_shape_backend_params)\ndef test_backend_property_reports_correct_backend(repo, array5by7, backend):\n\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column('aset', prototype=array5by7, backend=backend)\n    assert aset.backend == backend\n    aset[0] = array5by7\n    wco.commit('first')\n    wco.close()\n\n    rco = repo.checkout()\n    naset = rco.columns['aset']\n    assert naset.backend == backend\n    rco.close()\n\n\n@pytest.mark.parametrize('backend', fixed_shape_backend_params)\ndef test_setting_backend_property_cannot_change_backend(repo, array5by7, backend):\n\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column('aset', prototype=array5by7, backend=backend)\n    assert aset.backend == backend\n    aset[0] = array5by7\n    with pytest.raises(AttributeError):\n        aset.backend = 'foo'\n    wco.commit('first')\n    wco.close()\n\n    rco = repo.checkout()\n    naset = rco.columns['aset']\n    assert naset.backend == backend\n    with pytest.raises(AttributeError):\n        naset.backend = 'foo'\n    rco.close()\n\n\n@pytest.mark.parametrize('subsamples', [True, False])\n@pytest.mark.parametrize('backend', fixed_shape_backend_params)\ndef test_setting_backend_opts_property_cannot_change_backend_opts(repo, array5by7, backend, subsamples):\n\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column(\n        'aset', prototype=array5by7, backend=backend, contains_subsamples=subsamples)\n    if subsamples:\n        aset.update({0: {0: array5by7}})\n    else:\n        aset[0] = array5by7\n    with pytest.raises(AttributeError):\n        aset.backend_options = {'foo': 'bar'}\n    wco.commit('first')\n    wco.close()\n\n    rco = repo.checkout()\n    naset = rco.columns['aset']\n    assert naset.backend == backend\n    with pytest.raises(AttributeError):\n        naset.backend = {'foo': 'bar'}\n    rco.close()\n\n\n@pytest.mark.parametrize('shape,dtype,variable_shape,expected_backend', [\n    [(10,), np.uint16, True, '10'],\n    [(1000,), np.uint16, True, '00'],\n    [(1000,), np.uint16, False, '00'],\n    [(9_999_999,), np.uint8, False, '00'],\n    [(10_000_000,), np.uint8, False, '00'],\n    [(10_000_001,), np.uint8, False, '01'],\n    [(10_000_001,), np.uint8, True, '00'],\n    [(2, 2), np.uint16, True, '00'],\n    [(2, 2), np.uint16, False, '01'],\n    [(5, 2), np.uint16, True, '00'],\n    [(5, 2), np.uint16, False, '01'],\n])\n@pytest.mark.parametrize('subsamples', [True, False])\ndef test_heuristics_select_backend(repo, shape, dtype, variable_shape, expected_backend, subsamples):\n    wco = repo.checkout(write=True)\n    prototype = np.ones(shape, dtype=dtype)\n    aset = wco.add_ndarray_column(\n        'aset', prototype=prototype, variable_shape=variable_shape, contains_subsamples=subsamples)\n    assert aset.backend == expected_backend\n    if subsamples:\n        aset.update({'0': {'0': prototype}})\n    else:\n        aset['0'] = prototype\n    wco.commit('first commit')\n    assert aset.backend == expected_backend\n    if subsamples:\n        assert np.allclose(prototype, aset['0']['0'])\n    else:\n        assert np.allclose(prototype, aset['0'])\n    wco.close()\n\n    nwco = repo.checkout(write=True)\n    naset = nwco.columns['aset']\n    assert naset.backend == expected_backend\n    if subsamples:\n        assert np.allclose(prototype, naset['0']['0'])\n    else:\n        assert 
np.allclose(prototype, naset['0'])\n    nwco.close()\n\n\n@pytest.mark.parametrize('prototype', [np.random.randn(10), np.random.randn(1000), np.random.randn(2, 2)])\n@pytest.mark.parametrize('backend', fixed_shape_backend_params)\n@pytest.mark.parametrize('subsamples', [True, False])\ndef test_manual_override_heuristics_select_backend(repo, prototype, backend, subsamples):\n\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column(\n        'aset', prototype=prototype, backend=backend, contains_subsamples=subsamples)\n    assert aset.backend == backend\n    if subsamples:\n        aset.update({'0': {'0': prototype}})\n    else:\n        aset['0'] = prototype\n    wco.commit('first commit')\n    assert aset.backend == backend\n    if subsamples:\n        assert np.allclose(prototype, aset['0']['0'])\n    else:\n        assert np.allclose(prototype, aset['0'])\n    wco.close()\n\n    nwco = repo.checkout(write=True)\n    naset = nwco.columns['aset']\n    assert naset.backend == backend\n    if subsamples:\n        assert np.allclose(prototype, naset['0']['0'])\n    else:\n        assert np.allclose(prototype, naset['0'])\n    nwco.close()\n\n\ndef test_manual_override_heuristics_invalid_value_raises_error(repo):\n\n    wco = repo.checkout(write=True)\n    with pytest.raises(ValueError):\n        wco.add_ndarray_column('aset', prototype=np.arange(10), backend='ERROR')\n    wco.close()\n\n\n@pytest.mark.parametrize('backendStart', fixed_shape_backend_params)\n@pytest.mark.parametrize('backendEnd', fixed_shape_backend_params)\n@pytest.mark.parametrize('subsamples', [True, False])\ndef test_manual_change_backends_after_write_works(repo, array5by7, backendStart, backendEnd, subsamples):\n\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column(\n        'aset', prototype=array5by7, backend=backendStart, contains_subsamples=subsamples)\n    assert aset.backend == backendStart\n    if subsamples:\n        aset.update({0: {0: array5by7}})\n    else:\n        aset[0] = array5by7\n    wco.commit('first commit')\n    assert aset.backend == backendStart\n    if subsamples:\n        assert np.allclose(array5by7, aset[0][0])\n    else:\n        assert np.allclose(array5by7, aset[0])\n    wco.close()\n\n    nwco = repo.checkout(write=True)\n    naset = nwco.columns['aset']\n    assert naset.backend == backendStart\n\n    naset.change_backend(backend=backendEnd)\n    if subsamples:\n        naset.update({1: {1: array5by7+1}})\n    else:\n        naset[1] = array5by7 + 1\n\n    assert naset.backend == backendEnd\n    if subsamples:\n        assert np.allclose(array5by7, naset[0][0])\n        assert np.allclose(array5by7+1, naset[1][1])\n    else:\n        assert np.allclose(array5by7, naset[0])\n        assert np.allclose(array5by7+1, naset[1])\n    nwco.commit('second')\n    nwco.close()\n\n    rco = repo.checkout()\n    assert rco.columns['aset'].backend == backendEnd\n    rco.close()\n\n\n@pytest.mark.parametrize('backendStart', fixed_shape_backend_params)\n@pytest.mark.parametrize('backendFail', ['lmao', '000'])\n@pytest.mark.parametrize('subsamples', [True, False])\ndef test_manual_change_backend_to_invalid_fmt_code_fails(repo, array5by7, backendStart, backendFail, subsamples):\n\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column(\n        'aset', prototype=array5by7, backend=backendStart, contains_subsamples=subsamples)\n    assert aset.backend == backendStart\n    if subsamples:\n        aset[0] = {0: array5by7}\n    else:\n        aset[0] = 
array5by7\n    wco.commit('first commit')\n    assert aset.backend == backendStart\n    if subsamples:\n        assert np.allclose(array5by7, aset[0][0])\n    else:\n        assert np.allclose(array5by7, aset[0])\n    wco.close()\n\n    nwco = repo.checkout(write=True)\n    naset = nwco.columns['aset']\n    assert naset.backend == backendStart\n\n    with pytest.raises(ValueError):\n        naset.change_backend(backend=backendFail)\n    assert naset.backend == backendStart\n    if subsamples:\n        naset[1] = {1: array5by7+1}\n    else:\n        naset[1] = array5by7 + 1\n\n    if subsamples:\n        assert np.allclose(array5by7, naset[0][0])\n        assert np.allclose(array5by7 + 1, naset[1][1])\n    else:\n        assert np.allclose(array5by7, naset[0])\n        assert np.allclose(array5by7 + 1, naset[1])\n    nwco.commit('second')\n    nwco.close()\n\n\n@pytest.mark.parametrize('backendStart', fixed_shape_backend_params)\n@pytest.mark.parametrize('backendEnd', fixed_shape_backend_params)\n@pytest.mark.parametrize('subsamples', [True, False])\ndef test_manual_change_backend_fails_while_in_cm(repo, array5by7, backendStart, backendEnd, subsamples):\n\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column(\n        'aset', prototype=array5by7, backend=backendStart, contains_subsamples=subsamples)\n    assert aset.backend == backendStart\n    if subsamples:\n        aset[0] = {0: array5by7}\n    else:\n        aset[0] = array5by7\n    wco.commit('first commit')\n    assert aset.backend == backendStart\n    if subsamples:\n        assert np.allclose(array5by7, aset[0][0])\n    else:\n        assert np.allclose(array5by7, aset[0])\n    wco.close()\n\n    nwco = repo.checkout(write=True)\n    naset = nwco.columns['aset']\n    assert naset.backend == backendStart\n\n    with nwco as c:\n        with pytest.raises(RuntimeError):\n            c['aset'].change_backend(backend=backendEnd)\n        with pytest.raises(RuntimeError):\n            naset.change_backend(backend=backendEnd)\n        with pytest.raises(RuntimeError):\n            c.columns['aset'].change_backend(backend=backendEnd)\n        with pytest.raises(RuntimeError):\n            nwco.columns['aset'].change_backend(backend=backendEnd)\n\n    with naset as na:\n        with pytest.raises(RuntimeError):\n            na.change_backend(backend=backendEnd)\n        with pytest.raises(RuntimeError):\n            naset.change_backend(backend=backendEnd)\n        with pytest.raises(RuntimeError):\n            nwco.columns['aset'].change_backend(backend=backendEnd)\n\n    assert naset.backend == backendStart\n    if subsamples:\n        naset[1] = {1: array5by7+1}\n    else:\n        naset[1] = array5by7 + 1\n\n    if subsamples:\n        assert np.allclose(array5by7, naset[0][0])\n        assert np.allclose(array5by7 + 1, naset[1][1])\n    else:\n        assert np.allclose(array5by7, naset[0])\n        assert np.allclose(array5by7 + 1, naset[1])\n    nwco.commit('second')\n    nwco.close()\n\n\n\n@pytest.fixture(scope='class')\ndef dummy_writer_checkout(classrepo):\n    wco = classrepo.checkout(write=True)\n    yield wco\n    wco.close()\n\n\nclass TestComplibRestrictions:\n\n    @pytest.mark.parametrize('backend', ['01', '00'])\n    @pytest.mark.parametrize('subsamples', [True, False])\n    @pytest.mark.parametrize('complib', [\n        'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:zlib', 'blosc:zstd'\n    ])\n    @pytest.mark.parametrize('dtype,shape', [\n        [np.float32, (1, 1, 1)],\n        [np.float32, 
(3,)],\n        [np.float64, (1,)],\n        [np.uint8, (15,)],\n        [np.uint8, (3, 2, 2)],\n    ])\n    def test_schema_smaller_16_bytes_cannot_select_blosc_backend(\n        self, dummy_writer_checkout, backend, complib, dtype, shape, subsamples\n    ):\n        wco = dummy_writer_checkout\n        be_opts = {'complib': complib, 'complevel': 3, 'shuffle': 'byte'}\n\n        # prototype spec\n        with pytest.raises(ValueError, match='blosc clib requires'):\n            proto = np.zeros(shape, dtype=dtype)\n            wco.add_ndarray_column(\n                'aset', prototype=proto, backend=backend,\n                backend_options=be_opts, contains_subsamples=subsamples)\n\n        # shape and dtype spec\n        with pytest.raises(ValueError, match='blosc clib requires'):\n            wco.add_ndarray_column(\n                'aset', shape=shape, dtype=dtype, backend=backend,\n                backend_options=be_opts, contains_subsamples=subsamples)\n\n\n@pytest.mark.parametrize('backend', ['01', '00'])\n@pytest.mark.parametrize('subsamples', [True, False])\n@pytest.mark.parametrize('dtype,shape', [\n    [np.float32, (1, 1, 1)],\n    [np.float32, (3,)],\n    [np.float64, (1,)],\n    [np.uint8, (15,)],\n    [np.uint8, (3, 2, 2)],\n])\ndef test_schema_smaller_16_bytes_does_not_use_heuristic_to_select_blosc(\n    repo, backend, dtype, shape, subsamples\n):\n    wco = repo.checkout(write=True)\n    proto = np.zeros(shape, dtype=dtype)\n    aset = wco.add_ndarray_column(\n        'aset', prototype=proto, backend=backend, contains_subsamples=subsamples)\n    bad_clibs = ['blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:zlib', 'blosc:zstd']\n    assert aset.backend_options['complib'] not in bad_clibs\n    if subsamples:\n        aset[0] = {0: proto}\n    else:\n        aset[0] = proto\n    assert aset.backend_options['complib'] not in bad_clibs\n    wco.close()\n\n\n@pytest.mark.parametrize('backend', ['01', '00'])\n@pytest.mark.parametrize('subsamples', [True, False])\n@pytest.mark.parametrize('complib', [\n    'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:zlib', 'blosc:zstd'\n])\n@pytest.mark.parametrize('dtype,shape', [\n    [np.float32, (1, 1, 1)],\n    [np.float32, (3,)],\n    [np.float64, (1,)],\n    [np.uint8, (15,)],\n    [np.uint8, (3, 2, 2)],\n])\ndef test_schema_smaller_16_bytes_cannot_change_to_blosc_backend(\n    repo, backend, complib, shape, dtype, subsamples):\n\n    wco = repo.checkout(write=True)\n    aset = wco.add_ndarray_column(\n        'aset', shape=shape, dtype=dtype, backend=backend, contains_subsamples=subsamples)\n    proto = np.zeros(shape, dtype=dtype)\n    if subsamples:\n        aset[0] = {0: proto}\n    else:\n        aset[0] = proto\n\n    be_opts = {'complib': complib, 'complevel': 3, 'shuffle': None}\n    with pytest.raises(ValueError, match='blosc clib requires'):\n        aset.change_backend(backend=backend, backend_options=be_opts)\n    wco.close()\n"
  },
  {
    "path": "tests/test_column_definition_permutations.py",
    "content": "from collections import defaultdict\nfrom functools import partial\nimport secrets\nimport string\n\nimport pytest\nimport numpy as np\n\n\ndef assert_equal(expected, actual):\n    if isinstance(expected, (str, bytes)):\n        assert expected == actual\n    elif isinstance(expected, np.ndarray):\n        assert np.allclose(expected, actual)\n        assert expected.dtype == actual.dtype\n    else:\n        raise TypeError(f'unknown type of data {type(expected)}')\n\n\ndef ndarray_generate_data_fixed_shape(shape, dtype, low=0, high=255):\n    arr = np.random.randint(low, high, size=shape, dtype=dtype)\n    return arr\n\n\ndef ndarray_generate_data_variable_shape(shape, dtype, low=0, high=255):\n    arr_dims = []\n    for dim in shape:\n        valid_dim_shapes = [i for i in range(1, dim + 1)]\n        dimsize = secrets.choice(valid_dim_shapes)\n        arr_dims.append(dimsize)\n    arr_dims = tuple(arr_dims)\n    arr = np.random.randint(low, high, size=arr_dims, dtype=dtype)\n    return arr\n\n\ndef str_generate_data_variable_shape(\n        length=20, *, _ALPHABET=''.join([string.ascii_letters, string.digits, string.punctuation, ' '])\n):\n    tokens = [secrets.choice(_ALPHABET) for i in range(length)]\n    res = ''.join(tokens)\n    return res\n\n\ndef bytes_generate_data_variable_shape(\n        length=20, *,\n        _ALPHABET=''.join([string.printable, '\\x01', '\\x12', '\\x25', '\\x26', '\\x27', '\\x91'])\n):\n    tokens = [secrets.choice(_ALPHABET) for i in range(length)]\n    res = ''.join(tokens).encode()\n    return res\n\n\ncolumn_settings = {\n    'ndarray': {\n        'fixed_shape': ['00', '01', '10'],\n        'variable_shape': ['00', '10'],\n    },\n    'str': {\n        'variable_shape': ['30']\n    },\n    'bytes': {\n        'variable_shape': ['31']\n    }\n}\n\n\ncolumn_data_generators = {\n    'ndarray': {\n        'fixed_shape': ndarray_generate_data_fixed_shape,\n        'variable_shape': ndarray_generate_data_variable_shape,\n    },\n    'str': {\n        'variable_shape': str_generate_data_variable_shape\n    },\n    'bytes': {\n        'variable_shape': bytes_generate_data_variable_shape\n    }\n}\n\n\ncolumn_layouts = {\n    'ndarray': ['flat', 'nested'],\n    'str': ['flat', 'nested'],\n    'bytes': ['flat', 'nested']\n}\n\n\ndef add_data_to_column(col, data_gen, nsamples, nsubsamples=None):\n    column_data = {}\n    for samp in range(nsamples):\n        if nsubsamples is None:\n            data = data_gen()\n            column_data[samp] = data\n            col[samp] = data\n        else:\n            column_data[samp] = {}\n            for subsamp in range(nsubsamples):\n                data = data_gen()\n                column_data[samp][subsamp] = data\n            col[samp] = column_data[samp]\n    return column_data\n\n\n@pytest.fixture(params=[1, 3])\ndef num_samples_gen(request):\n    return request.param\n\n\n@pytest.fixture(params=[1, 3])\ndef num_subsamples_gen(request):\n    return request.param\n\n\n@pytest.fixture()\ndef column_permutation_repo(repo, num_samples_gen, num_subsamples_gen):\n    co = repo.checkout(write=True)\n    nsamp = num_samples_gen\n    nsubs = num_subsamples_gen\n\n    column_name_partials = {}\n    column_data_copy = defaultdict(dict)\n    shape = (4, 4)\n    dtype = np.uint8\n    for col_dtype, schema_settings in column_settings.items():\n        for layout in column_layouts[col_dtype]:\n            for schema_type, valid_backends in schema_settings.items():\n                for backend in valid_backends:\n   
                 name = f'{col_dtype}_{layout}_{schema_type}_{backend}'\n                    generator = column_data_generators[col_dtype][schema_type]\n                    has_subs = False if layout == 'flat' else True\n                    is_var = False if schema_type == 'fixed_shape' else True\n\n                    if col_dtype == 'ndarray':\n                        col = co.add_ndarray_column(name,\n                                                    shape=shape,\n                                                    dtype=dtype,\n                                                    variable_shape=is_var,\n                                                    contains_subsamples=has_subs,\n                                                    backend=backend)\n                        data_partial = partial(generator, shape, dtype)\n                        if layout == 'flat':\n                            column_data_copy[name] = add_data_to_column(col, data_partial, nsamp)\n                        elif layout == 'nested':\n                            column_data_copy[name] = add_data_to_column(col, data_partial, nsamp, nsubs)\n                        else:\n                            raise ValueError(f'invalid layout {layout}')\n                    elif col_dtype == 'str':\n                        col = co.add_str_column(name, contains_subsamples=has_subs, backend=backend)\n                        data_partial = partial(generator)\n                        if layout == 'flat':\n                            column_data_copy[name] = add_data_to_column(col, data_partial, nsamp)\n                        elif layout == 'nested':\n                            column_data_copy[name] = add_data_to_column(col, data_partial, nsamp, nsubs)\n                    elif col_dtype == 'bytes':\n                        col = co.add_bytes_column(name, contains_subsamples=has_subs, backend=backend)\n                        data_partial = partial(generator)\n                        if layout == 'flat':\n                            column_data_copy[name] = add_data_to_column(col, data_partial, nsamp)\n                        elif layout == 'nested':\n                            column_data_copy[name] = add_data_to_column(col, data_partial, nsamp, nsubs)\n                    else:\n                        raise ValueError(f'column dtype {col_dtype} invalid')\n\n                    column_name_partials[name] = data_partial\n\n    co.commit('first')\n    co.close()\n    yield repo, column_data_copy, column_name_partials\n\n\n@pytest.fixture(params=[True, False])\ndef column_permutations_read_write_checkout(request, column_permutation_repo):\n    repo, column_data, column_data_partials = column_permutation_repo\n    co = repo.checkout(write=request.param)\n    yield co, column_data, column_data_partials\n    co.close()\n\n\n@pytest.fixture()\ndef column_permutations_write_checkout(column_permutation_repo):\n    repo, column_data, column_data_partials = column_permutation_repo\n    co = repo.checkout(write=True)\n    yield co, column_data, column_data_partials\n    co.close()\n\n\n@pytest.mark.parametrize('column_type,column_kwargs', [\n    ('ndarray', {'prototype': np.array([1, 2, 3])}),\n    ('str', {}),\n    ('bytes', {}),\n])\n@pytest.mark.parametrize('contains_subsamples', [True, False])\ndef test_cannot_create_column_within_cm(repo, column_type, column_kwargs, contains_subsamples):\n    co = repo.checkout(write=True)\n    with co:\n        with pytest.raises(PermissionError):\n            if column_type == 
'ndarray':\n                co.add_ndarray_column(\n                    'testcol', contains_subsamples=contains_subsamples, **column_kwargs)\n            elif column_type == 'str':\n                co.add_str_column(\n                    'testcol', contains_subsamples=contains_subsamples, **column_kwargs)\n            elif column_type == 'bytes':\n                co.add_bytes_column(\n                    'testcol', contains_subsamples=contains_subsamples, **column_kwargs)\n            else:\n                raise ValueError(column_type)\n    co.close()\n\n\n@pytest.mark.parametrize('column_type,column_kwargs', [\n    ('ndarray', {'prototype': np.array([1, 2, 3])}),\n    ('str', {}),\n    ('bytes', {}),\n])\ndef test_contains_subsamples_non_bool_value_fails(repo, column_type, column_kwargs):\n    co = repo.checkout(write=True)\n    with pytest.raises(ValueError):\n        if column_type == 'ndarray':\n            co.add_ndarray_column(\n                'testcol', contains_subsamples=None, **column_kwargs)\n        elif column_type == 'str':\n            co.add_str_column(\n                'testcol', contains_subsamples=None, **column_kwargs)\n        elif column_type == 'bytes':\n            co.add_bytes_column(\n                'testcol', contains_subsamples=None, **column_kwargs)\n        else:\n            raise ValueError(column_type)\n    co.close()\n\n\n@pytest.mark.parametrize('column_type,column_kwargs', [\n    ('ndarray', {'prototype': np.array([1, 2, 3])}),\n    ('str', {}),\n    ('bytes', {}),\n])\n@pytest.mark.parametrize('contains_subsamples', [True, False])\ndef test_cannot_create_column_name_exists(repo, column_type, column_kwargs, contains_subsamples):\n    co = repo.checkout(write=True)\n\n    # setup so that a column already exists\n    if column_type == 'ndarray':\n        co.add_ndarray_column(\n            'testcol', contains_subsamples=contains_subsamples, **column_kwargs)\n    elif column_type == 'str':\n        co.add_str_column(\n            'testcol', contains_subsamples=contains_subsamples, **column_kwargs)\n    elif column_type == 'bytes':\n        co.add_bytes_column(\n            'testcol', contains_subsamples=contains_subsamples, **column_kwargs)\n\n    with pytest.raises(LookupError):\n        if column_type == 'ndarray':\n            co.add_ndarray_column(\n                'testcol', contains_subsamples=contains_subsamples, **column_kwargs)\n        elif column_type == 'str':\n            co.add_str_column(\n                'testcol', contains_subsamples=contains_subsamples, **column_kwargs)\n        elif column_type == 'bytes':\n            co.add_bytes_column(\n                'testcol', contains_subsamples=contains_subsamples, **column_kwargs)\n        else:\n            raise ValueError(column_type)\n    co.close()\n\n\ndef test_read_data_from_column_permutations(column_permutations_read_write_checkout):\n    co, column_data, column_data_partials = column_permutations_read_write_checkout\n\n    assert len(co.columns) == len(column_data)\n    for column_name, column_samples in column_data.items():\n        assert column_name in co.columns\n        col = co[column_name]\n        assert len(column_samples) == len(col)\n\n        for sample_key, sample_value in column_samples.items():\n            if not isinstance(sample_value, dict):\n                recorded = col[sample_key]\n                assert_equal(sample_value, recorded)\n            else:\n                col_samp = col[sample_key]\n                assert len(sample_value) == len(col_samp)\n      
          for subsample_key, subsample_value in sample_value.items():\n                    recorded = col_samp[subsample_key]\n                    assert_equal(subsample_value, recorded)\n\n\ndef test_write_data_to_column_permutations(\n        column_permutations_write_checkout, num_samples_gen, num_subsamples_gen\n):\n    co, column_data, column_data_partials = column_permutations_write_checkout\n\n    for column_name in co.columns:\n        col = co[column_name]\n        data_gen = column_data_partials[column_name]\n        if col.column_layout == 'flat':\n            for samp in range(num_samples_gen):\n                data = data_gen()\n                column_data[column_name][str(samp)] = data\n                col[str(samp)] = data\n        elif col.column_layout == 'nested':\n            for samp in range(num_samples_gen):\n                column_data[column_name][str(samp)] = {}\n                for ssamp in range(num_subsamples_gen):\n                    data = data_gen()\n                    column_data[column_name][str(samp)][str(ssamp)] = data\n                col[str(samp)] = column_data[column_name][str(samp)]\n        else:\n            raise ValueError(f'unknown layout option {col.column_layout}')\n\n    assert len(co.columns) == len(column_data)\n    for column_name, column_samples in column_data.items():\n        assert column_name in co.columns\n        col = co[column_name]\n        assert len(column_samples) == len(col)\n\n        for sample_key, sample_value in column_samples.items():\n            if not isinstance(sample_value, dict):\n                recorded = col[sample_key]\n                assert_equal(sample_value, recorded)\n            else:\n                col_samp = col[sample_key]\n                assert len(sample_value) == len(col_samp)\n                for subsample_key, subsample_value in sample_value.items():\n                    recorded = col_samp[subsample_key]\n                    assert_equal(subsample_value, recorded)\n\n\ndef test_merge_write_data_to_column_permutations(\n        column_permutation_repo, num_samples_gen, num_subsamples_gen\n):\n    repo, column_data, column_data_partials = column_permutation_repo\n    repo.create_branch('testbranch')\n\n    # Write new data to master branch\n    co = repo.checkout(write=True, branch='master')\n    for column_name in co.columns:\n        col = co[column_name]\n        data_gen = column_data_partials[column_name]\n        if col.column_layout == 'flat':\n            for samp in range(num_samples_gen):\n                data = data_gen()\n                column_data[column_name][str(samp)] = data\n                col[str(samp)] = data\n        elif col.column_layout == 'nested':\n            for samp in range(num_samples_gen):\n                column_data[column_name][str(samp)] = {}\n                for ssamp in range(num_subsamples_gen):\n                    data = data_gen()\n                    column_data[column_name][str(samp)][str(ssamp)] = data\n                col[str(samp)] = column_data[column_name][str(samp)]\n        else:\n            raise ValueError(f'unknown layout option {col.column_layout}')\n    co.commit('commit on master adding data')\n    co.close()\n\n    # Write new data to testbranch branch\n    co = repo.checkout(write=True, branch='testbranch')\n    for column_name in co.columns:\n        col = co[column_name]\n        data_gen = column_data_partials[column_name]\n        if col.column_layout == 'flat':\n            for samp in range(num_samples_gen):\n                
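# testbranch keys get a '_' prefix so they cannot collide with the keys written on master\n                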
data = data_gen()\n                column_data[column_name][f'_{samp}'] = data\n                col[f'_{samp}'] = data\n        elif col.column_layout == 'nested':\n            for samp in range(num_samples_gen):\n                column_data[column_name][f'_{samp}'] = {}\n                for ssamp in range(num_subsamples_gen):\n                    data = data_gen()\n                    column_data[column_name][f'_{samp}'][f'_{ssamp}'] = data\n                col[f'_{samp}'] = column_data[column_name][f'_{samp}']\n        else:\n            raise ValueError(f'unknown layout option {col.column_layout}')\n    co.commit('commit on testbranch adding data')\n    co.close()\n\n    # Merge and check that union of all data added is present\n    repo.merge('merge commit', 'master', 'testbranch')\n\n    co = repo.checkout(write=True, branch='master')\n    assert len(co.columns) == len(column_data)\n    for column_name, column_samples in column_data.items():\n        assert column_name in co.columns\n        col = co[column_name]\n        assert len(column_samples) == len(col)\n\n        for sample_key, sample_value in column_samples.items():\n            if not isinstance(sample_value, dict):\n                recorded = col[sample_key]\n                assert_equal(sample_value, recorded)\n            else:\n                col_samp = col[sample_key]\n                assert len(sample_value) == len(col_samp)\n                for subsample_key, subsample_value in sample_value.items():\n                    recorded = col_samp[subsample_key]\n                    assert_equal(subsample_value, recorded)\n    co.close()\n\n"
  },
  {
    "path": "tests/test_column_nested.py",
    "content": "\"\"\"Tests for the class methods contained in the nested subsample column accessor.\n\"\"\"\nimport numpy as np\nimport pytest\nfrom conftest import fixed_shape_backend_params, variable_shape_backend_params\n\n\n# --------------------------- Setup ------------------------------\n\n\ndef assert_equal(arr, arr2):\n    assert np.array_equal(arr, arr2)\n    assert arr.dtype == arr2.dtype\n\n\n# ------------------------ Tests ----------------------------------\n\n\nclass TestArraysetSetup:\n\n    @pytest.mark.parametrize('name', [\n        'invalid\\n', '\\ninvalid', 'inv name', 'inva@lid', 12, ' try', 'andthis ',\n        'VeryLongNameIsInvalidOver64CharactersNotAllowedVeryLongNameIsInva'])\n    def test_does_not_allow_invalid_arrayset_names(self, repo, randomsizedarray, name):\n        co = repo.checkout(write=True)\n        with pytest.raises(ValueError):\n            co.add_ndarray_column(name, prototype=randomsizedarray, contains_subsamples=True)\n        co.close()\n\n    def test_read_only_mode_arrayset_methods_limited(self, aset_subsamples_initialized_repo):\n        import hangar\n        co = aset_subsamples_initialized_repo.checkout()\n        assert isinstance(co, hangar.checkout.ReaderCheckout)\n        with pytest.raises(AttributeError):\n            assert co.add_ndarray_column('foo')\n        with pytest.raises(AttributeError):\n            assert co.add_str_column('foo')\n        with pytest.raises(PermissionError):\n            assert co.columns.delete('foo')\n        assert len(co.columns['writtenaset']) == 0\n        co.close()\n\n    def test_get_arrayset_in_read_and_write_checkouts(self, aset_subsamples_initialized_repo, array5by7):\n        co = aset_subsamples_initialized_repo.checkout(write=True)\n        # getting the column with `get`\n        asetOld = co.columns.get('writtenaset')\n        asetOldPath = asetOld._path\n        asetOldAsetn = asetOld.column\n        asetOldDefaultSchemaHash = asetOld._schema.schema_hash_digest()\n        co.close()\n\n        co = aset_subsamples_initialized_repo.checkout()\n        # getting column with dictionary like style method\n        asetNew = co.columns['writtenaset']\n        assert asetOldPath == asetNew._path\n        assert asetOldAsetn == asetNew.column\n        assert asetOldDefaultSchemaHash == asetNew._schema.schema_hash_digest()\n        co.close()\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_delete_arrayset(self, aset_backend, aset_subsamples_initialized_repo):\n        co = aset_subsamples_initialized_repo.checkout(write=True)\n        co.columns.delete('writtenaset')\n        assert 'writtenaset' not in co.columns\n        with pytest.raises(KeyError):\n            # cannot delete twice\n            co.columns.delete('writtenaset')\n\n        # init and immediate delete leaves no trace\n        co.add_ndarray_column(name='writtenaset', shape=(5, 7), dtype=np.float64,\n                              backend=aset_backend, contains_subsamples=True)\n        assert len(co.columns) == 1\n        co.columns.delete('writtenaset')\n        assert len(co.columns) == 0\n        co.commit('this is a commit message')\n        co.close()\n\n        # init column in checkout persists aset records/accessor even if no samples contained\n        co = aset_subsamples_initialized_repo.checkout(write=True)\n        assert len(co.columns) == 0\n        co.add_ndarray_column(name='writtenaset', shape=(5, 7), dtype=np.float64,\n                              
backend=aset_backend, contains_subsamples=True)\n        co.commit('this is a commit message')\n        co.close()\n        co = aset_subsamples_initialized_repo.checkout(write=True)\n        assert len(co.columns) == 1\n\n        # column can be deleted via the __delitem__ dict style command.\n        del co.columns['writtenaset']\n        assert len(co.columns) == 0\n        co.commit('this is a commit message')\n        co.close()\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_init_same_arrayset_twice_fails_again(self, aset_backend, repo, randomsizedarray):\n        co = repo.checkout(write=True)\n        co.add_ndarray_column('aset', prototype=randomsizedarray,\n                              backend=aset_backend, contains_subsamples=True)\n        with pytest.raises(LookupError):\n            # test if everything is the same as the initialized one.\n            co.add_ndarray_column('aset', prototype=randomsizedarray,\n                                  backend=aset_backend, contains_subsamples=True)\n        with pytest.raises(LookupError):\n            # test if column container type is different from existing name (no subsamples)\n            co.add_ndarray_column('aset', prototype=randomsizedarray,\n                                  backend=aset_backend, contains_subsamples=False)\n        co.close()\n\n    @pytest.mark.parametrize(\"aset_backend\", fixed_shape_backend_params)\n    def test_arrayset_with_invalid_dimension_sizes_shapes(self, aset_backend, repo):\n        co = repo.checkout(write=True)\n\n        shape = (0, 1, 2)\n        with pytest.raises(ValueError):\n            # cannot have zero valued size for any dimension\n            co.add_ndarray_column('aset', shape=shape, dtype=np.int,\n                                  backend=aset_backend, contains_subsamples=True)\n\n        shape = [1] * 31\n        aset = co.add_ndarray_column('aset1', shape=shape, dtype=np.int,\n                                     backend=aset_backend, contains_subsamples=True)\n        assert len(aset.shape) == 31\n\n        shape = [1] * 32\n        with pytest.raises(ValueError):\n            # maximum tensor rank must be <= 31\n            co.add_ndarray_column('aset2', shape=shape, dtype=np.int,\n                                  backend=aset_backend, contains_subsamples=True)\n        co.close()\n\n\n# ------------------------------ Add Data Tests --------------------------------------------\n\n@pytest.fixture(params=[1, 3], scope='class')\ndef multi_item_generator(request):\n    yield request.param\n\n\n@pytest.fixture(params=[\n    # specifies container types, two elements: ['outer', 'inner']\n    ['dict', None],\n    ['list', 'tuple'],\n    ['tuple', 'list'],\n], scope='class')\ndef iterable_subsamples(request, multi_item_generator):\n    outer, inner = request.param\n    arrays = []\n    for num_item in range(multi_item_generator):\n        arr = np.arange(16, dtype=np.uint8).reshape(4, 4)\n        arr += 1\n        arrays.append(arr)\n\n    components = []\n    for idx, array in enumerate(arrays):\n        if inner == 'list':\n            component = [f'subsample{idx}', array]\n        elif inner == 'tuple':\n            component = (f'subsample{idx}', array)\n        elif inner is None:\n            component = {f'subsample{idx}': array}\n        else:\n            raise ValueError(\n                f'unknown parameter of `inner` {inner} in test suite generation')\n        components.append(component)\n\n    if outer == 'dict':\n        res = 
{}\n        for part in components:\n            res.update(part)\n    elif outer == 'list':\n        res = []\n        for part in components:\n            res.append(part)\n    elif outer == 'tuple':\n        res = []\n        for part in components:\n            res.append(part)\n        res = tuple(res)\n    else:\n        raise ValueError(\n            f'unknown parameter of `outer` {outer} in test suite generation')\n    return res\n\n\n@pytest.fixture(params=['dict', 'list', 'tuple'], scope='class')\ndef iterable_samples(request, multi_item_generator, iterable_subsamples):\n    container = request.param\n\n    if container == 'dict':\n        res = {}\n        for idx in range(multi_item_generator):\n            res[f'sample{idx}'] = iterable_subsamples\n    elif container == 'list':\n        res = []\n        for idx in range(multi_item_generator):\n            res.append([f'sample{idx}', iterable_subsamples])\n    elif container == 'tuple':\n        res = []\n        for idx in range(multi_item_generator):\n            res.append([f'sample{idx}', iterable_subsamples])\n        res = tuple(res)\n    else:\n        raise ValueError(\n            f'unknown parameter of `container` {container} in test suite generation')\n    return res\n\n\n@pytest.fixture(params=fixed_shape_backend_params, scope='class')\ndef backend_params(request):\n    return request.param\n\n\n@pytest.fixture()\ndef subsample_writer_written_aset(backend_params, repo, monkeypatch):\n    from hangar.backends import hdf5_00\n    from hangar.backends import hdf5_01\n    from hangar.backends import numpy_10\n    monkeypatch.setattr(hdf5_00, 'COLLECTION_COUNT', 5)\n    monkeypatch.setattr(hdf5_00, 'COLLECTION_SIZE', 10)\n    monkeypatch.setattr(hdf5_01, 'COLLECTION_COUNT', 5)\n    monkeypatch.setattr(hdf5_01, 'COLLECTION_SIZE', 10)\n    monkeypatch.setattr(numpy_10, 'COLLECTION_SIZE', 10)\n\n    co = repo.checkout(write=True)\n    aset = co.add_ndarray_column('foo', shape=(4, 4), dtype=np.uint8, variable_shape=False,\n                                 backend=backend_params, contains_subsamples=True)\n    yield aset\n    co.close()\n\n\nclass TestAddData:\n\n    def test_update_sample_subsamples_empty_arrayset(self, subsample_writer_written_aset, iterable_samples):\n        aset = subsample_writer_written_aset\n        added = aset.update(iterable_samples)\n        assert added is None\n        assert len(aset._samples) == len(iterable_samples)\n        for sample_idx, sample_data in enumerate(iterable_samples):\n            assert f'sample{sample_idx}' in aset._samples\n\n    def test_update_sample_kwargs_only_empty_arrayset(self, subsample_writer_written_aset, iterable_subsamples):\n        aset = subsample_writer_written_aset\n        added = aset.update(fookwarg=iterable_subsamples)\n        assert added is None\n        assert len(aset._samples) == 1\n        assert 'fookwarg' in aset._samples\n\n        added = aset.update(bar=iterable_subsamples, baz=iterable_subsamples)\n        assert added is None\n        assert len(aset._samples) == 3\n        assert 'bar' in aset._samples\n        assert 'baz' in aset._samples\n        for subsample_idx, _data in enumerate(iterable_subsamples):\n            assert f'subsample{subsample_idx}' in aset._samples['fookwarg']._subsamples\n            assert f'subsample{subsample_idx}' in aset._samples['bar']._subsamples\n            assert f'subsample{subsample_idx}' in aset._samples['baz']._subsamples\n\n    def 
test_update_sample_kwargs_and_other_dict_doesnt_modify_input_in_calling_scope(\n            self, subsample_writer_written_aset, iterable_subsamples, iterable_samples\n    ):\n        \"\"\"ensure this bug does not regress.\n\n        Had a case where if a dict was passed as ``other`` along with kwargs, the operation\n        would complete normally, but when control returned to the caller the original\n        dict passed in as ``other`` would have been silently merged with the kwargs.\n        \"\"\"\n        aset = subsample_writer_written_aset\n        if not isinstance(iterable_samples, dict):\n            return\n        iterable_samples_before = list(iterable_samples.items())\n\n        aset.update(iterable_samples, kwargadded=iterable_subsamples)\n        # in bug case, would now observe that iterable_samples would have been\n        # silently modified in a method analogous to calling:\n        #\n        #   ``iterable_samples.update({'kwargadded': iterable_subsamples})``\n        #\n        assert list(iterable_samples.items()) == iterable_samples_before\n\n    def test_update_sample_kwargs_and_iterable_empty_arrayset(\n            self, subsample_writer_written_aset, iterable_subsamples, iterable_samples\n    ):\n        aset = subsample_writer_written_aset\n        aset.update(iterable_samples, fookwarg=iterable_subsamples)\n        assert len(aset._samples) == len(iterable_samples) + 1\n\n        assert 'fookwarg' in aset._samples\n        for sample_idx in range(len(iterable_samples)):\n            assert f'sample{sample_idx}' in aset._samples\n\n    def test_update_sample_subsamples_duplicate_data_does_not_save_new(\n            self, subsample_writer_written_aset, iterable_samples\n    ):\n        aset = subsample_writer_written_aset\n        aset.update(iterable_samples)\n        old_specs = {}\n        for sample_idx, sample_data in enumerate(iterable_samples):\n            old_specs[f'sample{sample_idx}'] = aset._samples[f'sample{sample_idx}']._subsamples.copy()\n\n        aset.update(iterable_samples)\n        new_specs = {}\n        for sample_idx, sample_data in enumerate(iterable_samples):\n            new_specs[f'sample{sample_idx}'] = aset._samples[f'sample{sample_idx}']._subsamples.copy()\n        assert old_specs == new_specs\n\n    def test_update_sample_subsamples_context_manager(self, subsample_writer_written_aset, iterable_samples):\n        aset = subsample_writer_written_aset\n        assert aset._is_conman is False\n        with aset as cm_aset:\n            assert cm_aset._is_conman is True\n            added = cm_aset.update(iterable_samples)\n            assert added is None\n        assert aset._is_conman is False\n\n        assert len(aset._samples) == len(iterable_samples)\n        for sample_idx, sample_data in enumerate(iterable_samples):\n            assert f'sample{sample_idx}' in aset._samples\n\n    def test_setitem_sample_subsamples_empty_arrayset(\n            self, multi_item_generator, subsample_writer_written_aset, iterable_subsamples\n    ):\n        aset = subsample_writer_written_aset\n\n        for sample_idx in range(multi_item_generator):\n            aset[f'sample{sample_idx}'] = iterable_subsamples\n        assert len(aset._samples) == len(iterable_subsamples)\n\n        for sample_idx in range(multi_item_generator):\n            assert f'sample{sample_idx}' in aset._samples\n            assert len(aset._samples[f'sample{sample_idx}']._subsamples) == len(iterable_subsamples)\n            for subsample_idx in 
range(len(iterable_subsamples)):\n                assert f'subsample{subsample_idx}' in aset._samples[f'sample{sample_idx}']._subsamples\n\n    def test_setitem_sample_subsamples_contextmanager(\n            self, multi_item_generator, subsample_writer_written_aset, iterable_subsamples\n    ):\n        aset = subsample_writer_written_aset\n        assert aset._is_conman is False\n        with aset as aset_cm:\n            assert aset_cm._is_conman is True\n            for sample_idx in range(multi_item_generator):\n                aset_cm[f'sample{sample_idx}'] = iterable_subsamples\n            assert len(aset_cm._samples) == len(iterable_subsamples)\n            assert aset_cm._samples[f'sample{sample_idx}']._is_conman is True\n        assert aset._is_conman is False\n\n        for sample_idx in range(multi_item_generator):\n            assert f'sample{sample_idx}' in aset._samples\n            assert len(aset._samples[f'sample{sample_idx}']._subsamples) == len(iterable_subsamples)\n            for subsample_idx in range(len(iterable_subsamples)):\n                assert f'subsample{subsample_idx}' in aset._samples[\n                    f'sample{sample_idx}']._subsamples\n\n    def test_update_subsamples_empty_arrayset(self, multi_item_generator, subsample_writer_written_aset,\n                                              iterable_subsamples):\n        aset = subsample_writer_written_aset\n        for sample_idx in range(multi_item_generator):\n            aset[f'sample{sample_idx}'] = {'foo': np.arange(16, dtype=np.uint8).reshape(4, 4) + 10}\n            aset[f'sample{sample_idx}'].update(iterable_subsamples)\n        assert len(aset._samples) == len(iterable_subsamples)\n\n        for sample_idx in range(multi_item_generator):\n            assert f'sample{sample_idx}' in aset._samples\n            assert len(aset._samples[f'sample{sample_idx}']._subsamples) == len(iterable_subsamples) + 1\n            assert 'foo' in aset._samples[f'sample{sample_idx}']._subsamples\n            for subsample_idx in range(len(iterable_subsamples)):\n                assert f'subsample{subsample_idx}' in aset._samples[f'sample{sample_idx}']._subsamples\n\n    def test_update_subsamples_via_kwargs_empty_arrayset(self, multi_item_generator, subsample_writer_written_aset):\n        aset = subsample_writer_written_aset\n        for sample_idx in range(multi_item_generator):\n            aset[f'sample{sample_idx}'] = {'foo': np.arange(16, dtype=np.uint8).reshape(4, 4) + 10}\n            aset[f'sample{sample_idx}'].update(bar=np.arange(16, dtype=np.uint8).reshape(4, 4) + 20)\n        assert len(aset._samples) == multi_item_generator\n\n        for sample_idx in range(multi_item_generator):\n            assert f'sample{sample_idx}' in aset._samples\n            assert len(aset._samples[f'sample{sample_idx}']._subsamples) == 2\n            assert 'foo' in aset._samples[f'sample{sample_idx}']._subsamples\n            assert 'bar' in aset._samples[f'sample{sample_idx}']._subsamples\n\n    def test_update_subsamples_kwargs_and_other_dict_doesnt_modify_input_in_calling_scope(\n            self, multi_item_generator, subsample_writer_written_aset, iterable_subsamples\n    ):\n        \"\"\"ensure this bug does not regress.\n\n        Had a case where if a dict was passed as ``other`` along with kwargs, the operation\n        would complete normally, but when control returned to the caller the original\n        dict passed in as ``other`` would have been silently merged with the kwargs.\n        \"\"\"\n        aset = 
subsample_writer_written_aset\n        if not isinstance(iterable_subsamples, dict):\n            return\n        iterable_subsamples_before = list(iterable_subsamples.keys())\n\n        for sample_idx in range(multi_item_generator):\n            aset[f'sample{sample_idx}'] = {'foo': np.arange(16, dtype=np.uint8).reshape(4, 4) + 10}\n            aset[f'sample{sample_idx}'].update(iterable_subsamples,\n                                               kwargadded=np.arange(16, dtype=np.uint8).reshape(4, 4))\n            # in bug case, would now observe that iterable_subsamples would have been\n            # silently modified in a method analogous to calling:\n            #\n            #   ``iterable_subsamples.update({'kwargadded': np.array})``\n            #\n            assert list(iterable_subsamples.keys()) == iterable_subsamples_before\n        assert list(iterable_subsamples.keys()) == iterable_subsamples_before\n\n    def test_update_subsamples_via_kwargs_and_iterable_empty_arrayset(\n            self, multi_item_generator, subsample_writer_written_aset, iterable_subsamples\n    ):\n        aset = subsample_writer_written_aset\n        for sample_idx in range(multi_item_generator):\n            aset[f'sample{sample_idx}'] = {'foo': np.arange(16, dtype=np.uint8).reshape(4, 4) + 10}\n            aset[f'sample{sample_idx}'].update(iterable_subsamples, bar=np.arange(16, dtype=np.uint8).reshape(4, 4))\n\n        assert len(aset._samples) == multi_item_generator\n\n        for sample_idx in range(multi_item_generator):\n            assert f'sample{sample_idx}' in aset._samples\n            assert len(aset._samples[f'sample{sample_idx}']._subsamples) == len(iterable_subsamples) + 2\n            assert 'foo' in aset._samples[f'sample{sample_idx}']._subsamples\n            assert 'bar' in aset._samples[f'sample{sample_idx}']._subsamples\n\n    @pytest.mark.parametrize('backend', fixed_shape_backend_params)\n    def test_update_subsamples_context_manager(\n            self, backend, multi_item_generator, iterable_subsamples, repo\n    ):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('foo', shape=(4, 4), dtype=np.uint8,\n                                     backend=backend, contains_subsamples=True)\n\n        for sample_idx in range(multi_item_generator):\n            aset[f'sample{sample_idx}'] = {'foo': np.arange(16, dtype=np.uint8).reshape(4, 4) + 10}\n            assert aset._is_conman is False\n            with aset[f'sample{sample_idx}'] as sample_cm:\n                assert sample_cm._is_conman is True\n                assert aset._is_conman is True\n                sample_cm.update(iterable_subsamples)\n            assert aset._is_conman is False\n        assert len(aset._samples) == len(iterable_subsamples)\n\n        for sample_idx in range(multi_item_generator):\n            assert f'sample{sample_idx}' in aset._samples\n            assert len(aset._samples[f'sample{sample_idx}']._subsamples) == len(iterable_subsamples) + 1\n            assert 'foo' in aset._samples[f'sample{sample_idx}']._subsamples\n            for subsample_idx in range(len(iterable_subsamples)):\n                assert f'subsample{subsample_idx}' in aset._samples[f'sample{sample_idx}']._subsamples\n        co.close()\n\n    def test_setitem_sample_empty_arrayset(\n            self, multi_item_generator, iterable_subsamples, subsample_writer_written_aset\n    ):\n        aset = subsample_writer_written_aset\n\n        subsamples_dict = dict(iterable_subsamples)\n        for sample_idx 
in range(multi_item_generator):\n            aset[f'sample{sample_idx}'] = {'foo': np.arange(16, dtype=np.uint8).reshape(4, 4) + 10}\n            for subsample_key, subsample_val in subsamples_dict.items():\n                aset[f'sample{sample_idx}'][subsample_key] = subsample_val\n        assert len(aset._samples) == len(iterable_subsamples)\n\n        for sample_idx in range(multi_item_generator):\n            assert f'sample{sample_idx}' in aset._samples\n            assert len(aset._samples[f'sample{sample_idx}']._subsamples) == len(subsamples_dict) + 1\n            assert 'foo' in aset._samples[f'sample{sample_idx}']._subsamples\n            for subkey in subsamples_dict.keys():\n                assert subkey in aset._samples[f'sample{sample_idx}']._subsamples\n\n    def test_setitem_sample_setitem_subsample_empty_arrayset_fails(self, subsample_writer_written_aset):\n        \"\"\"This should fail because __getitem__ raises KeyError when\n\n        ``aset[foo-sample][subsample] = np.ndarray`` runs.\n\n        The ``aset[foo-sample]`` part fails with KeyError, and no subsample\n        accessor is returned for the __setitem__ call following __getitem__.\n        \"\"\"\n        aset = subsample_writer_written_aset\n        with pytest.raises(KeyError, match='sample'):\n            aset['sample']\n        with pytest.raises(KeyError, match='sample'):\n            aset['sample']['subsample'] = np.arange(16, dtype=np.uint8).reshape(4, 4)\n        assert len(aset) == 0\n\n    def test_setitem_subsamples_contextmanager(self, multi_item_generator, iterable_subsamples,\n                                               subsample_writer_written_aset):\n        aset = subsample_writer_written_aset\n        subsamples_dict = dict(iterable_subsamples)\n        for sample_idx in range(multi_item_generator):\n            aset[f'sample{sample_idx}'] = {'foo': np.arange(16, dtype=np.uint8).reshape(4, 4) + 10}\n            assert aset._is_conman is False\n            with aset[f'sample{sample_idx}'] as sample_cm:\n                assert sample_cm._is_conman is True\n                assert aset._is_conman is True\n                for subsample_key, subsample_val in subsamples_dict.items():\n                    sample_cm[subsample_key] = subsample_val\n            assert aset._is_conman is False\n        assert len(aset._samples) == len(iterable_subsamples)\n\n        for sample_idx in range(multi_item_generator):\n            assert f'sample{sample_idx}' in aset._samples\n            assert len(aset._samples[f'sample{sample_idx}']._subsamples) == len(subsamples_dict) + 1\n            assert 'foo' in aset._samples[f'sample{sample_idx}']._subsamples\n            for subkey in subsamples_dict.keys():\n                assert subkey in aset._samples[f'sample{sample_idx}']._subsamples\n\n    def test_append_subsamples_empty_arrayset(self, multi_item_generator, subsample_writer_written_aset):\n        aset = subsample_writer_written_aset\n        for sample_idx in range(multi_item_generator):\n            aset[f'sample{sample_idx}'] = {\n                'foo': np.arange(16, dtype=np.uint8).reshape(4, 4) + ((sample_idx * 2) + 1)\n            }\n            outkey = aset[f'sample{sample_idx}'].append(\n                np.arange(16, dtype=np.uint8).reshape(4, 4) + sample_idx\n            )\n            assert 'foo' in aset._samples[f'sample{sample_idx}']._subsamples\n            assert outkey in aset._samples[f'sample{sample_idx}']._subsamples\n        assert len(aset._samples) == multi_item_generator\n\n    def 
test_append_subsamples_contextmanager(self, multi_item_generator, subsample_writer_written_aset):\n        aset = subsample_writer_written_aset\n        for sample_idx in range(multi_item_generator):\n            aset[f'sample{sample_idx}'] = {\n                'foo': np.arange(16, dtype=np.uint8).reshape(4, 4) + ((sample_idx * 2) + 1)\n            }\n            assert aset._is_conman is False\n            with aset[f'sample{sample_idx}'] as sample_cm:\n                assert aset._is_conman is True\n                assert sample_cm._is_conman is True\n                outkey = sample_cm.append(np.arange(16, dtype=np.uint8).reshape(4, 4) + sample_idx)\n            assert aset._is_conman is False\n            assert 'foo' in aset._samples[f'sample{sample_idx}']._subsamples\n            assert outkey in aset._samples[f'sample{sample_idx}']._subsamples\n        assert len(aset._samples) == multi_item_generator\n\n    @pytest.mark.parametrize('backend', fixed_shape_backend_params)\n    @pytest.mark.parametrize('other', [\n        [f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)],\n        (f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)),\n    ])\n    def test_update_noniterable_subsample_iter_fails(self, backend, other, repo):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('foo', shape=(4, 4), dtype=np.uint8,\n                                     backend=backend, contains_subsamples=True)\n        aset[f'foo'] = {'foo': np.arange(16, dtype=np.uint8).reshape(4, 4) + 10}\n        with pytest.raises(ValueError, match='dictionary update sequence'):\n            aset['foo'].update(other)\n        assert len(aset._samples) == 1\n        assert len(aset._samples['foo']._subsamples) == 1\n        assert 'foo' in aset._samples['foo']._subsamples\n        assert 'subsample1' not in aset._samples['foo']._subsamples\n        co.close()\n\n    @pytest.mark.parametrize('backend', fixed_shape_backend_params)\n    def test_update_subsamples_with_too_many_arguments_fails(self, backend, repo):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('foo', shape=(4, 4), dtype=np.uint8,\n                                     backend=backend, contains_subsamples=True)\n        arr = np.arange(16, dtype=np.uint8).reshape(4, 4)\n        aset[f'foo'] = {'foo': arr + 10}\n        with pytest.raises(TypeError, match='takes from 1 to 2 positional arguments'):\n            aset['foo'].update('fail', arr)\n        assert len(aset._samples) == 1\n        assert len(aset._samples['foo']._subsamples) == 1\n        assert 'foo' in aset._samples['foo']._subsamples\n        co.close()\n\n    @pytest.mark.parametrize('backend', fixed_shape_backend_params)\n    def test_update_subsamples_with_too_few_arguments_fails(self, backend, repo):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('foo', shape=(4, 4), dtype=np.uint8,\n                                     backend=backend, contains_subsamples=True)\n        arr = np.arange(16, dtype=np.uint8).reshape(4, 4)\n        aset[f'foo'] = {'foo': arr + 10}\n        with pytest.raises(ValueError, match='dictionary update sequence element #0 has length 1; 2 is required'):\n            aset['foo'].update('fail')\n        assert len(aset._samples) == 1\n        assert len(aset._samples['foo']._subsamples) == 1\n        assert 'foo' in aset._samples['foo']._subsamples\n        co.close()\n\n    @pytest.mark.parametrize('other', [\n        ['sample1', [[f'subsample1', np.arange(16, 
dtype=np.uint8).reshape(4, 4)]]],\n        ['sample1', ((f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)),)],\n        ('sample1', ((f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)),),),\n        ('sample1', [[f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)]],),\n        ['sample1', ([f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)])],\n        ['sample1', [(f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4))]],\n        ('sample1', ([f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)]),),\n        ('sample1', [(f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4))],),\n        ['sample1', [f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)]],\n        ['sample1', (f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4))],\n        ('sample1', [f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)],),\n        ('sample1', (f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)),),\n        ('sample1', {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4)},),\n        ['sample1', {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4)}],\n    ])\n    def test_update_noniterable_samples_fails(self, other, subsample_writer_written_aset):\n        aset = subsample_writer_written_aset\n        with pytest.raises(ValueError, match='dictionary update sequence'):\n            aset.update(other)\n        assert len(aset._samples) == 0\n\n    @pytest.mark.parametrize('other', [\n        [['sample1', [f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)]]],\n        [['sample1', (f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4),)]],\n        (('sample1', (f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4))),),\n        (('sample1', [f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)]),),\n        {'sample1': [f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)]},\n        {'sample1': (f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4))},\n        {'sample1': (f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4))},\n        {'sample1': [f'subsample1', np.arange(16, dtype=np.uint8).reshape(4, 4)]},\n    ])\n    def test_update_noniterable_subsamples_fails(self, other, subsample_writer_written_aset):\n        aset = subsample_writer_written_aset\n        with pytest.raises(ValueError, match='dictionary update sequence'):\n            aset.update(other)\n        assert len(aset._samples) == 0\n\n    @pytest.mark.parametrize('other', [\n        {'sample1!': {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {-2: {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {'lol cat': {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {'sample 1': {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {('sample', 'one'): {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {(1, 2): {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {('sample', 2): {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {(1, 'sample'): {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n    ])\n    def test_update_invalid_sample_key_fails(self, other, subsample_writer_written_aset):\n        aset = subsample_writer_written_aset\n        with pytest.raises(ValueError, match='is not suitable'):\n            aset.update(other)\n        assert len(aset._samples) == 0\n\n    @pytest.mark.parametrize('other', [\n        {'sample': {f'subsample1!': 
np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {'sample': {f'subsample 1': np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {'sample': {-2: np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {'sample': {f'subsample1\\n': np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {'sample': {(1, 2): np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {'sample': {('s1', 's2'): np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {'sample': {('s1', 1): np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n        {'sample': {(1, 's1'): np.arange(16, dtype=np.uint8).reshape(4, 4)}},\n    ])\n    def test_update_sample_invalid_subsample_key_fails(self, other, subsample_writer_written_aset):\n        aset = subsample_writer_written_aset\n        with pytest.raises(ValueError, match='is not suitable'):\n            aset.update(other)\n        assert len(aset._samples) == 0\n\n    @pytest.mark.parametrize('variable_shape,backend', [\n        *[[True, be] for be in variable_shape_backend_params],\n        *[[False, be] for be in fixed_shape_backend_params],\n    ])\n    @pytest.mark.parametrize('other', [\n        {'sample': {f'subsample1': np.arange(9, dtype=np.uint8).reshape(3, 3)}},\n        {'sample': {f'subsample1': np.arange(8, dtype=np.uint8).reshape(2, 2, 2)}},\n        {'sample': {f'subsample1': np.arange(4, dtype=np.float32).reshape(2, 2)}},\n        {'sample': {f'subsample1': np.arange(4, dtype=np.uint8).reshape((2, 2), order='F')}},\n        {'sample': {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4).tolist()}},\n    ])\n    def test_update_sample_invalid_array_fails_fixed_shape(self, backend, variable_shape, other, repo):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('foo',\n                                     shape=(2, 2), dtype=np.uint8, variable_shape=variable_shape,\n                                     backend=backend, contains_subsamples=True)\n        with pytest.raises(ValueError):\n            aset.update(other)\n        assert len(aset._samples) == 0\n        co.close()\n\n    @pytest.mark.parametrize('other', [\n        {f'subsample1!': np.arange(16, dtype=np.uint8).reshape(4, 4)},\n        {f'subsample 1': np.arange(16, dtype=np.uint8).reshape(4, 4)},\n        {-2: np.arange(16, dtype=np.uint8).reshape(4, 4)},\n        {f'subsample1\\n': np.arange(16, dtype=np.uint8).reshape(4, 4)},\n        {(1, 2): np.arange(16, dtype=np.uint8).reshape(4, 4)},\n        {('s1', 's2'): np.arange(16, dtype=np.uint8).reshape(4, 4)},\n        {('s1', 1): np.arange(16, dtype=np.uint8).reshape(4, 4)},\n        {(1, 's1'): np.arange(16, dtype=np.uint8).reshape(4, 4)},\n    ])\n    def test_update_subsample_invalid_subsample_key_fails(self, other, subsample_writer_written_aset):\n        aset = subsample_writer_written_aset\n        aset['sample'] = {0: np.zeros((4, 4), dtype=np.uint8)}\n        with pytest.raises(ValueError, match='is not suitable'):\n            aset['sample'].update(other)\n        assert len(aset._samples) == 1\n        assert len(aset._samples['sample']._subsamples) == 1\n        assert 0 in aset._samples['sample']._subsamples\n\n    @pytest.mark.parametrize('variable_shape,backend', [\n        *[[False, be] for be in fixed_shape_backend_params],\n    ])\n    @pytest.mark.parametrize('other', [\n        {f'subsample1': np.arange(9, dtype=np.uint8).reshape(3, 3)},\n        {f'subsample1': np.arange(8, dtype=np.uint8).reshape(2, 2, 2)},\n        {f'subsample1': np.arange(4, dtype=np.float32).reshape(2, 2)},\n        
{f'subsample1': np.arange(4, dtype=np.uint8).reshape((2, 2), order='F')},\n        {f'subsample1': np.arange(16, dtype=np.uint8).reshape(4, 4).tolist()},\n    ])\n    def test_update_subsample_invalid_array_fails_fixed_shape(self, backend, variable_shape, other, repo):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('foo',\n                                     shape=(4, 4), dtype=np.uint8, variable_shape=variable_shape,\n                                     backend=backend, contains_subsamples=True)\n        aset['sample'] = {0: np.zeros((4, 4), dtype=np.uint8)}\n        with pytest.raises(ValueError):\n            aset['sample'].update(other)\n        assert len(aset._samples) == 1\n        assert len(aset._samples['sample']._subsamples) == 1\n        assert 0 in aset._samples['sample']._subsamples\n        co.close()\n\n\n# --------------------------- Test Remove Data -------------------------------------\n\n\n@pytest.fixture(scope='class')\ndef subsample_data_map():\n    arr = np.arange(5 * 7).astype(np.uint16).reshape((5, 7))\n    res = {\n        'foo': {\n            0: arr,\n            1: arr + 1,\n            2: arr + 2\n        },\n        2: {\n            'bar': arr + 3,\n            'baz': arr + 4\n        }\n    }\n    return res\n\n\n@pytest.fixture(params=fixed_shape_backend_params, scope='class')\ndef backend_param(request):\n    return request.param\n\n\n@pytest.fixture(params=[False, True], scope='class')\ndef write_enabled(request):\n    return request.param\n\n\n@pytest.fixture(scope='class')\ndef initialized_arrayset(write_enabled, backend_param, classrepo, subsample_data_map):\n    co = classrepo.checkout(write=True)\n    aset = co.add_ndarray_column(f'foo{backend_param}{int(write_enabled)}',\n                                 shape=(5, 7), dtype=np.uint16, backend=backend_param,\n                                 contains_subsamples=True)\n    aset.update(subsample_data_map)\n    co.commit(f'done {backend_param}{write_enabled}')\n    co.close()\n    if write_enabled:\n        nco = classrepo.checkout(write=True)\n        yield nco.columns[f'foo{backend_param}{int(write_enabled)}']\n        nco.close()\n    else:\n        nco = classrepo.checkout()\n        yield nco.columns[f'foo{backend_param}{int(write_enabled)}']\n        nco.close()\n\n\n@pytest.fixture()\ndef initialized_arrayset_write_only(backend_param, repo, subsample_data_map):\n    co = repo.checkout(write=True)\n    aset = co.add_ndarray_column('foo', shape=(5, 7), dtype=np.uint16,\n                                 backend=backend_param, contains_subsamples=True)\n    aset.update(subsample_data_map)\n    yield co.columns['foo']\n    co.close()\n\n\nclass TestRemoveData:\n\n    # --------------------- delete -----------------------------\n\n    def test_delitem_single_sample_from_arrayset(self, initialized_arrayset_write_only):\n        aset = initialized_arrayset_write_only\n        del aset['foo']\n        assert 'foo' not in aset._samples\n        assert 'foo' not in aset\n\n    def test_delitem_single_subsample_from_sample(self, initialized_arrayset_write_only):\n        aset = initialized_arrayset_write_only\n        del aset['foo'][0]\n        assert 0 not in aset._samples['foo']._subsamples\n        assert 0 not in aset['foo']\n\n    def test_delitem_sample_nonexisting_keys_fails(self, initialized_arrayset_write_only):\n        aset = initialized_arrayset_write_only\n        assert 'doesnotexist' not in aset._samples\n        assert 'doesnotexist' not in aset\n        
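# Deletion follows dict semantics here: removing a key which was never\n        # set is expected to raise KeyError rather than pass silently.\n        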
with pytest.raises(KeyError):\n            del aset['doesnotexist']\n\n    def test_delitem_single_subsample_nonexisting_key_fails(self, initialized_arrayset_write_only):\n        aset = initialized_arrayset_write_only\n        assert 'foo' in aset._samples\n        assert 'foo' in aset\n        assert 'doesnotexist' not in aset._samples['foo']._subsamples\n        assert 'doesnotexist' not in aset['foo']\n        with pytest.raises(KeyError):\n            del aset['foo']['doesnotexist']\n\n    def test_delitem_multiple_samples_fails_keyerror(self, initialized_arrayset_write_only):\n        aset = initialized_arrayset_write_only\n        with pytest.raises(KeyError, match=\"('foo', 2)\"):\n            del aset['foo', 2]\n        assert 'foo' in aset\n        assert 2 in aset\n\n    # ------------------------ pop ----------------------------\n\n    def test_pop_single_sample_from_arrayset(self, initialized_arrayset_write_only, subsample_data_map):\n        aset = initialized_arrayset_write_only\n        res = aset.pop('foo')\n        assert 'foo' not in aset\n        assert isinstance(res, dict)\n        assert len(res) == len(subsample_data_map['foo'])\n        for expected_k, expected_v in subsample_data_map['foo'].items():\n            assert_equal(res[expected_k], expected_v)\n\n    def test_pop_multiple_samples_from_arrayset_fails(self, initialized_arrayset_write_only):\n        aset = initialized_arrayset_write_only\n        with pytest.raises(TypeError, match=\"takes 2 positional arguments but 3 were\"):\n            aset.pop('foo', 2)\n        assert 'foo' in aset\n        assert 2 in aset\n\n    def test_pop_single_subsample_from_sample(self, initialized_arrayset_write_only, subsample_data_map):\n        aset = initialized_arrayset_write_only\n        res = aset['foo'].pop(0)\n        assert 0 not in aset['foo']\n        assert isinstance(res, np.ndarray)\n        assert_equal(res, subsample_data_map['foo'][0])\n\n    def test_pop_multiple_subsample_from_sample_fails(self, initialized_arrayset_write_only):\n        aset = initialized_arrayset_write_only\n        with pytest.raises(TypeError, match=\"takes 2 positional arguments but 3 were given\"):\n            aset['foo'].pop(*[0, 1])\n        assert 0 in aset['foo']\n        assert 1 in aset['foo']\n\n\n# ------------------------------ Container Introspection -----------------------------------\n\n\nclass TestContainerIntrospection:\n\n    def test_get_sample_returns_object(self, initialized_arrayset, subsample_data_map):\n        from hangar.columns.layout_nested import FlatSubsampleReader, NestedSampleReader\n\n        aset = initialized_arrayset\n        assert isinstance(aset, NestedSampleReader)\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            assert isinstance(sample, FlatSubsampleReader)\n\n    # -------------------------- test __dunder__ methods ----------------------------------\n\n    def test_get_sample_test_subsample_len_method(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            assert len(sample) == len(subsample_data)\n\n    def test_get_sample_test_subsample_contains_method(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n    
        for subsample_name in subsample_data.keys():\n                assert subsample_name in sample\n\n    def test_sample_len_reported_correctly(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        assert len(aset) == len(subsample_data_map)\n        assert aset.num_subsamples == sum([len(subsample) for subsample in subsample_data_map.values()])\n\n    # ----------------------------- test property ---------------------------\n\n    def test_get_sample_test_subsample_sample_property(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            assert sample.sample == sample_name\n\n    def test_get_sample_test_subsample_arrayset_property(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            assert sample.column.startswith('foo')\n\n    def test_get_sample_test_data_property(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            res = sample.data\n            assert isinstance(res, dict)\n            assert len(res) == len(subsample_data)\n            for k, v in res.items():\n                assert_equal(v, subsample_data[k])\n\n    def test_get_sample_test_subsample_contains_remote_references_property(\n            self, initialized_arrayset, subsample_data_map\n    ):\n        aset = initialized_arrayset\n        # test works before adding remote references\n        assert aset.contains_remote_references is False\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            assert sample.contains_remote_references is False\n\n        # add subsamples which are not local to each subsample\n        # perform the mock\n        from hangar.backends import backend_decoder\n        template = backend_decoder(b'50:daeaaeeaebv')\n        aset['foo']._subsamples[50] = template\n        aset[2]._subsamples[50] = template\n\n        assert aset.contains_remote_references is True\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            assert sample.contains_remote_references is True\n\n        del aset._samples['foo']._subsamples[50]\n        del aset._samples[2]._subsamples[50]\n\n    def test_get_sample_test_subsample_remote_reference_keys_property(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        # test works before adding remote references\n        assert aset.remote_reference_keys == ()\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            assert sample.remote_reference_keys == ()\n\n        # add subsamples which are not local to each subsample\n        # perform the mock\n        from hangar.backends import backend_decoder\n        template = backend_decoder(b'50:daeaaeeaebv')\n        aset['foo']._subsamples[50] = template\n        aset[2]._subsamples[50] = template\n\n        assert aset.remote_reference_keys in ((2, 'foo'), ('foo', 2))\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = 
aset.get(sample_name)\n            assert sample.remote_reference_keys == (50,)\n\n        del aset._samples['foo']._subsamples[50]\n        del aset._samples[2]._subsamples[50]\n\n    def test_getattr_does_not_raise_permission_error_if_alive(self, initialized_arrayset):\n        aset = initialized_arrayset\n\n        assert hasattr(aset, 'doesnotexist') is False  # does not raise error\n        assert hasattr(aset, '_mode') is True\n        with pytest.raises(AttributeError):\n            assert getattr(aset, 'doesnotexist')\n        assert getattr(aset, '_mode') == ('a' if aset.iswriteable else 'r')\n\n        sample = aset['foo']\n        assert hasattr(sample, 'doesnotexist') is False  # does not raise error\n        assert hasattr(sample, '_mode') is True\n        with pytest.raises(AttributeError):\n            assert getattr(sample, 'doesnotexist')\n        assert getattr(sample, '_mode') == ('a' if aset.iswriteable else 'r')\n\n        # mock up destruct call in sample and aset.\n        original = getattr(aset, '_mode')\n        delattr(aset, '_mode')\n        delattr(sample, '_mode')\n        with pytest.raises(PermissionError):\n            hasattr(aset, 'doesnotexist')\n        with pytest.raises(PermissionError):\n            hasattr(aset, '_mode')\n\n        with pytest.raises(PermissionError):\n            hasattr(sample, 'doesnotexist')\n        with pytest.raises(PermissionError):\n            hasattr(sample, '_mode')\n        setattr(aset, '_mode', original)\n        setattr(sample, '_mode', original)\n\n\n# ------------------------------ Getting Data --------------------------------------------\n\n\nclass TestGetDataMethods:\n\n    def test_get_sample_missing_key(self, initialized_arrayset):\n        aset = initialized_arrayset\n        returned = aset.get('doesnotexist')\n        assert returned is None\n        default_returned = aset.get(9999, default=True)\n        assert default_returned is True\n\n    def test_getitem_sample_missing_key(self, initialized_arrayset):\n        aset = initialized_arrayset\n        with pytest.raises(KeyError):\n            aset['doesnotexist']\n\n    def test_get_sample_get_subsample(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            for subsample_name, subsample_value in subsample_data.items():\n                res = sample.get(subsample_name)\n                assert_equal(res, subsample_value)\n\n    def test_getitem_sample_getitem_subsample(self, initialized_arrayset, subsample_data_map):\n        from hangar.columns.layout_nested import FlatSubsampleReader\n\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset[sample_name]\n            assert isinstance(sample, FlatSubsampleReader)\n            for subsample_name, subsample_value in subsample_data.items():\n                res = sample[subsample_name]\n                assert_equal(res, subsample_value)\n\n    def test_getitem_subsample_from_column(self, initialized_arrayset, subsample_data_map):\n        from hangar.columns.layout_nested import FlatSubsampleReader\n\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset[sample_name]\n            assert isinstance(sample, FlatSubsampleReader)\n            res = aset[sample_name, ...]\n            assert res.keys() 
== subsample_data.keys()\n            for subsample_name, subsample_value in subsample_data.items():\n                res = aset[sample_name, subsample_name]\n                assert_equal(res, subsample_value)\n\n    def test_recursive_subsample_getitem_from_column(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            for subsample_name, subsample_value in subsample_data.items():\n                assert isinstance(aset[sample_name, subsample_name, 0, 0], np.uint16)\n                assert aset[sample_name, subsample_name, 0, 0] == subsample_value[0][0]\n\n    def test_get_subsample_from_column(self, initialized_arrayset, subsample_data_map):\n        from hangar.columns.layout_nested import FlatSubsampleReader\n\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            assert isinstance(sample, FlatSubsampleReader)\n            res = aset.get((sample_name, ...))\n            assert res.keys() == subsample_data.keys()\n            for subsample_name, subsample_value in subsample_data.items():\n                res = aset.get((sample_name, subsample_name))\n                assert_equal(res, subsample_value)\n\n    def test_get_sample_get_subsample_missing_key(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name in subsample_data_map.keys():\n\n            with pytest.raises(KeyError):\n                aset[sample_name, 'doesnotexist']\n            returned = aset.get((sample_name, \"doesnotexist\"))\n            assert returned is None\n            default_returned = aset.get((sample_name, 9999), default=True)\n            assert default_returned is True\n\n            sample = aset.get(sample_name)\n            returned = sample.get('doesnotexist')\n            assert returned is None\n            default_returned = sample.get(9999, default=True)\n            assert default_returned is True\n\n    def test_getitem_sample_getitem_subsample_missing_key(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name in subsample_data_map.keys():\n            sample = aset[sample_name]\n            with pytest.raises(KeyError):\n                sample['doesnotexist']\n\n    def test_get_sample_get_multiple_subsamples_fails(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            with pytest.raises(TypeError):\n                sample.get(*list(list(subsample_data.keys())[:2]), default=None)\n\n    def test_get_sample_getitem_single_subsample(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            for subsample_name, subsample_value in subsample_data.items():\n                res = sample[subsample_name]\n                assert_equal(res, subsample_value)\n\n    def test_get_sample_getitem_single_subsample_missing_key(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name in subsample_data_map.keys():\n            sample = aset.get(sample_name)\n            returned = sample.get('doesnotexist')\n            
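# like dict.get, a missing subsample key is expected to yield None (or\n            # the supplied default) instead of raising KeyError:\n            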
assert returned is None\n            default_returned = sample.get(9999, default=True)\n            assert default_returned is True\n\n    def test_get_sample_getitem_multiple_subsamples_fails(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            with pytest.raises(TypeError):\n                sample[list(subsample_data.keys())[:2]]\n\n    def test_get_sample_getitem_subsamples_with_ellipsis(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            res = sample[...]\n            assert isinstance(res, dict)\n            assert len(res) == len(subsample_data)\n            for k, v in res.items():\n                assert_equal(v, subsample_data[k])\n\n    def test_get_sample_getitem_subsamples_with_keys_and_ellipsis_fails(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            existing_subsample_key = next(iter(subsample_data.keys()))\n            with pytest.raises(TypeError):\n                sample[..., existing_subsample_key]\n            with pytest.raises(TypeError):\n                sample[..., [existing_subsample_key]]\n\n    def test_get_sample_getitem_subsamples_with_unbound_slice(self, initialized_arrayset, subsample_data_map):\n        \"\"\"unbound slice is ``slice(None) == [:]``\"\"\"\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            res = sample[:]\n            assert isinstance(res, dict)\n            assert len(res) == len(subsample_data)\n            for k, v in res.items():\n                assert_equal(v, subsample_data[k])\n\n    def test_get_sample_getitem_subsamples_with_bounded_slice(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            res = sample[0:2]\n            assert isinstance(res, dict)\n            assert len(res) == 2\n            for k, v in res.items():\n                assert_equal(v, subsample_data[k])\n\n    def test_subsample_getitem_with_bounded_slice_from_column(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            res = aset[sample_name, 0:2]\n            assert isinstance(res, dict)\n            assert len(res) == 2\n            for k, v in res.items():\n                assert_equal(v, subsample_data[k])\n\n    def test_get_sample_getitem_subsamples_with_out_of_bounds_slice_does_not_fail(\n            self, initialized_arrayset, subsample_data_map):\n        \"\"\"Odd python behavior we emulate: out of bounds sequence slicing is allowed.\n\n        Instead of throwing an exception, the slice is treated as if it should just\n        go up to the total number of elements in the container. 
For example:\n            [1, 2, 3][0:5] == [1, 2, 3]\n        \"\"\"\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            res = sample[0:5]\n            assert isinstance(res, dict)\n            assert len(res) == len(subsample_data)\n            for k, v in res.items():\n                assert_equal(v, subsample_data[k])\n\n    def test_aset_contextmanager(self, initialized_arrayset, subsample_data_map):\n        assert initialized_arrayset._is_conman is False\n        with initialized_arrayset as aset:\n            assert aset._is_conman is True\n            for sample_name, subsample_data in subsample_data_map.items():\n                sample = aset.get(sample_name)\n                assert sample._is_conman is True\n                for subsample_name, expected_val in subsample_data.items():\n                    assert_equal(sample.get(subsample_name), expected_val)\n                assert sample._is_conman is True\n        assert initialized_arrayset._is_conman is False\n        assert aset._is_conman is False\n        assert sample._is_conman is False\n\n    def test_sample_contextmanager(self, initialized_arrayset, subsample_data_map):\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = initialized_arrayset.get(sample_name)\n            assert initialized_arrayset._is_conman is False\n            assert sample._is_conman is False\n            with sample as sample_cm:\n                assert sample_cm._is_conman is True\n                assert initialized_arrayset._is_conman is True\n                for subsample_name, expected_val in subsample_data.items():\n                    assert_equal(sample_cm.get(subsample_name), expected_val)\n            assert sample._is_conman is False\n            assert initialized_arrayset._is_conman is False\n        assert initialized_arrayset._is_conman is False\n        assert sample._is_conman is False\n\n    def test_sample_subsample_contextmanager(self, initialized_arrayset, subsample_data_map):\n        assert initialized_arrayset._is_conman is False\n        with initialized_arrayset as aset:\n            assert aset._is_conman is True\n            assert aset._enter_count == 1\n            for sample_name, subsample_data in subsample_data_map.items():\n                sample = aset.get(sample_name)\n                assert sample._is_conman is True\n                assert sample._enter_count == 1\n                with sample as sample_cm:\n                    assert aset._is_conman is True\n                    assert sample_cm._is_conman is True\n                    assert aset._enter_count == 2\n                    assert sample_cm._enter_count == 2\n                    for subsample_name, expected_val in subsample_data.items():\n                        assert_equal(sample_cm.get(subsample_name), expected_val)\n                assert aset._is_conman is True\n                assert sample_cm._is_conman is True\n                assert aset._enter_count == 1\n                assert sample_cm._enter_count == 1\n        assert initialized_arrayset._is_conman is False\n        assert aset._is_conman is False\n        assert sample._is_conman is False\n        assert aset._enter_count == 0\n        assert sample_cm._enter_count == 0\n\n    def test_sample_reentrant_contextmanager_succeeds(self, initialized_arrayset, subsample_data_map):\n        assert initialized_arrayset._is_conman is False\n\n    
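    # Sketch of the reentrant pattern exercised below (all names are from\n        # this test): entering the same accessor in nested ``with`` blocks is\n        # expected to bump the shared ``_enter_count`` rather than raise, and\n        # the count unwinds one step per block exit, e.g.:\n        #\n        #   with sample as cm1:\n        #       with sample as cm2:\n        #           assert cm1._enter_count == cm2._enter_count == 2\n    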
    with initialized_arrayset as aset:\n            assert aset._is_conman is True\n            assert aset._enter_count == 1\n            for sample_name, subsample_data in subsample_data_map.items():\n                sample = aset.get(sample_name)\n                assert sample._is_conman is True\n                assert sample._enter_count == 1\n                with sample as sample_cm:\n                    assert aset._is_conman is True\n                    assert sample_cm._is_conman is True\n                    assert aset._enter_count == 2\n                    assert sample_cm._enter_count == 2\n                    for subsample_name, expected_val in subsample_data.items():\n                        assert_equal(sample_cm.get(subsample_name), expected_val)\n                # reentrant behavior demonstrated here\n                with sample as sample_cm2:\n                    assert aset._is_conman is True\n                    assert sample_cm._is_conman is True\n                    assert sample_cm2._is_conman is True\n                    assert aset._enter_count == 2\n                    assert sample_cm._enter_count == 2\n                    assert sample_cm2._enter_count == 2\n                    for subsample_name, expected_val in subsample_data.items():\n                        assert_equal(sample_cm2.get(subsample_name), expected_val)\n                assert aset._is_conman is True\n                assert sample_cm._is_conman is True\n                assert aset._enter_count == 1\n                assert sample_cm._enter_count == 1\n        assert initialized_arrayset._is_conman is False\n        assert aset._is_conman is False\n        assert sample._is_conman is False\n        assert aset._enter_count == 0\n        assert sample_cm._enter_count == 0\n\n    # -------------------------- dict-style iteration methods ---------------------------\n\n    def test_calling_iter_on_arrayset(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        arrayset_it = iter(aset)  # returns iterator over sample keys\n        for sample_name in arrayset_it:\n            assert sample_name in aset\n            assert sample_name in subsample_data_map\n\n    def test_calling_iter_on_sample_in_arrayset(self, initialized_arrayset, subsample_data_map):\n        aset = initialized_arrayset\n        arrayset_it = iter(aset)  # returns iterator over sample keys\n        for sample_name in arrayset_it:\n            assert sample_name in aset\n            assert sample_name in subsample_data_map\n\n            sample_it = iter(aset[sample_name])  # returns iterator over subsample keys\n            for subsample_name in sample_it:\n                assert subsample_name in aset[sample_name]\n                assert subsample_name in subsample_data_map[sample_name]\n\n    def test_get_sample_keys_method(self, initialized_arrayset):\n        from collections.abc import Iterator\n        aset = initialized_arrayset\n\n        assert isinstance(aset.keys(), Iterator)\n        res = list(aset.keys())\n        assert len(res) == 2\n        assert 2 in res and 'foo' in res\n\n    def test_get_sample_keys_method_local_only(self, initialized_arrayset):\n        from collections.abc import Iterator\n        aset = initialized_arrayset\n\n        # add subsamples which are not local to each subsample\n        # perform the mock\n        from hangar.backends import backend_decoder\n        template = backend_decoder(b'50:daeaaeeaebv')\n        aset['foo']._subsamples[50] = template\n\n        
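# the raw spec ``b'50:daeaaeeaebv'`` decodes to a reference whose backend\n        # code (``50``) marks data that is not locally present, so iterating\n        # with ``local=True`` should skip the 'foo' sample entirely.\n\n        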
        assert isinstance(aset.keys(local=True), Iterator)\n        res = list(aset.keys(local=True))\n        assert len(res) == 1\n        assert 2 in res\n\n        del aset._samples['foo']._subsamples[50]\n\n    def test_get_sample_subsample_keys_method(self, initialized_arrayset, subsample_data_map):\n        from collections.abc import Iterator\n\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            assert isinstance(sample.keys(), Iterator)\n            res = list(sample.keys())\n            for k in res:\n                assert k in subsample_data\n\n    def test_get_sample_subsample_keys_method_local_only(self, initialized_arrayset, subsample_data_map):\n        from collections.abc import Iterator\n        aset = initialized_arrayset\n\n        # add subsamples which are not local to each subsample\n        # perform the mock\n        from hangar.backends import backend_decoder\n        template = backend_decoder(b'50:daeaaeeaebv')\n        aset['foo']._subsamples[50] = template\n        aset[2]._subsamples[50] = template\n\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n\n            # test local only properties\n            assert isinstance(sample.keys(local=True), Iterator)\n            res = list(sample.keys(local=True))\n            assert len(res) == len(subsample_data)\n            for k in res:\n                assert k in subsample_data\n                assert k != 50\n\n            # compare to local+remote properties\n            assert isinstance(sample.keys(local=False), Iterator)\n            res = list(sample.keys(local=False))\n            assert len(res) == len(subsample_data) + 1\n            assert 50 in res\n            for k in res:\n                assert k in list(subsample_data.keys()) + [50]\n\n        del aset._samples['foo']._subsamples[50]\n        del aset._samples[2]._subsamples[50]\n\n    def test_get_sample_values_method(self, initialized_arrayset):\n        from hangar.columns.layout_nested import FlatSubsampleReader\n        from collections.abc import Iterator\n        aset = initialized_arrayset\n\n        assert isinstance(aset.values(), Iterator)\n        res = list(aset.values())\n        assert len(res) == 2\n        for sample in res:\n            assert sample.sample in ('foo', 2)\n            assert isinstance(sample, FlatSubsampleReader)\n\n    def test_get_sample_values_method_local_only(self, initialized_arrayset):\n        from hangar.columns.layout_nested import FlatSubsampleReader\n        from collections.abc import Iterator\n        aset = initialized_arrayset\n        # add subsamples which are not local to each subsample\n        # perform the mock\n        from hangar.backends import backend_decoder\n        template = backend_decoder(b'50:daeaaeeaebv')\n        aset['foo']._subsamples[50] = template\n\n        assert isinstance(aset.values(local=True), Iterator)\n        res = list(aset.values(local=True))\n        assert len(res) == 1\n        sample = res[0]\n        assert sample.sample == 2\n        assert isinstance(sample, FlatSubsampleReader)\n\n        del aset._samples['foo']._subsamples[50]\n\n    def test_get_sample_subsample_values_method(self, initialized_arrayset, subsample_data_map):\n        from collections.abc import Iterator\n\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n
            sample = aset.get(sample_name)\n            assert isinstance(sample.values(), Iterator)\n            res = list(sample.values())\n            for v in res:\n                assert any([np.allclose(v, arr) for arr in subsample_data.values()])\n\n    def test_get_sample_subsample_values_method_local_only(self, initialized_arrayset, subsample_data_map):\n        from collections.abc import Iterator\n        aset = initialized_arrayset\n\n        # add subsamples which are not local to each subsample\n        # perform the mock\n        from hangar.backends import backend_decoder\n        from hangar.columns.common import open_file_handles\n        template = backend_decoder(b'50:daeaaeeaebv')\n        aset['foo']._subsamples[50] = template\n        aset[2]._subsamples[50] = template\n        mocked_fhand = open_file_handles(\n            ['50'], path=initialized_arrayset._path, mode='a', schema=aset._schema)\n        aset._be_fs['50'] = mocked_fhand['50']\n\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n\n            # test local only properties\n            assert isinstance(sample.values(local=True), Iterator)\n            res = list(sample.values(local=True))\n            assert len(res) == len(subsample_data)\n            for v in res:\n                assert any([np.allclose(v, arr) for arr in subsample_data.values()])\n\n            # test local+remote properties\n            with pytest.raises(FileNotFoundError):\n                list(sample.values(local=False))\n\n        del aset._be_fs['50']\n        del aset._samples['foo']._subsamples[50]\n        del aset._samples[2]._subsamples[50]\n\n    def test_get_sample_items_method(self, initialized_arrayset):\n        from hangar.columns.layout_nested import FlatSubsampleReader\n        from collections.abc import Iterator\n        aset = initialized_arrayset\n\n        assert isinstance(aset.items(), Iterator)\n        res = list(aset.items())\n        assert len(res) == 2\n        for sample_name, sample in res:\n            assert sample_name in (2, 'foo')\n            assert isinstance(sample, FlatSubsampleReader)\n            assert sample_name == sample.sample\n\n    def test_get_sample_items_method_local_only(self, initialized_arrayset):\n        from hangar.columns.layout_nested import FlatSubsampleReader\n        from collections.abc import Iterator\n        aset = initialized_arrayset\n        # add subsamples which are not local to each subsample\n        # perform the mock\n        from hangar.backends import backend_decoder\n        template = backend_decoder(b'50:daeaaeeaebv')\n        aset['foo']._subsamples[50] = template\n\n        assert isinstance(aset.items(local=True), Iterator)\n        res = list(aset.items(local=True))\n        assert len(res) == 1\n        sample_name, sample = res[0]\n        assert sample_name == 2\n        assert isinstance(sample, FlatSubsampleReader)\n        assert sample.sample == sample_name\n\n        del aset._samples['foo']._subsamples[50]\n\n    def test_get_sample_subsample_items_method(self, initialized_arrayset, subsample_data_map):\n        from collections.abc import Iterator\n\n        aset = initialized_arrayset\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            assert isinstance(sample.items(), Iterator)\n            res = list(sample.items())\n            for k, v in res:\n                assert_equal(v, subsample_data[k])\n\n
    def test_get_sample_subsample_items_method_local_only(self, initialized_arrayset, subsample_data_map):\n        from collections.abc import Iterator\n        aset = initialized_arrayset\n\n        # add subsamples which are not local to each subsample\n        # perform the mock\n        from hangar.backends import backend_decoder\n        from hangar.columns.common import open_file_handles\n        template = backend_decoder(b'50:daeaaeeaebv')\n        aset['foo']._subsamples[50] = template\n        aset[2]._subsamples[50] = template\n        mocked_fhand = open_file_handles(\n            ['50'], path=initialized_arrayset._path, mode='a', schema=aset._schema)\n        aset._be_fs['50'] = mocked_fhand['50']\n\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n\n            # test local only properties\n            assert isinstance(sample.items(local=True), Iterator)\n            res = list(sample.items(local=True))\n            assert len(res) == len(subsample_data)\n            for k, v in res:\n                assert_equal(v, subsample_data[k])\n                assert k != 50\n\n            # test local+remote properties\n            with pytest.raises(FileNotFoundError):\n                list(sample.items(local=False))\n\n        del aset._be_fs['50']\n        del aset._samples['foo']._subsamples[50]\n        del aset._samples[2]._subsamples[50]\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset3_backend\", fixed_shape_backend_params)\n    def test_arrayset_remote_references_property_with_none(\n            self, aset1_backend, aset2_backend, aset3_backend, repo, randomsizedarray\n    ):\n        co = repo.checkout(write=True)\n        aset1 = co.add_ndarray_column('aset1', prototype=randomsizedarray,\n                                      backend=aset1_backend, contains_subsamples=True)\n        aset2 = co.add_ndarray_column('aset2', shape=(2, 2), dtype=np.int,\n                                      backend=aset2_backend, contains_subsamples=True)\n        aset3 = co.add_ndarray_column('aset3', shape=(3, 4), dtype=np.float32,\n                                      backend=aset3_backend, contains_subsamples=True)\n        with aset1 as d1, aset2 as d2, aset3 as d3:\n            d1[1] = {11: randomsizedarray}\n            d2[1] = {21: np.ones((2, 2), dtype=np.int)}\n            d3[1] = {31: np.ones((3, 4), dtype=np.float32)}\n\n        assert co.columns.contains_remote_references == {'aset1': False, 'aset2': False, 'aset3': False}\n        assert co.columns.remote_sample_keys == {'aset1': (), 'aset2': (), 'aset3': ()}\n        co.close()\n\n    @pytest.mark.parametrize(\"aset1_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset2_backend\", fixed_shape_backend_params)\n    @pytest.mark.parametrize(\"aset3_backend\", fixed_shape_backend_params)\n    def test_arrayset_remote_references_property_with_remotes(\n            self, aset1_backend, aset2_backend, aset3_backend, repo, randomsizedarray\n    ):\n        co = repo.checkout(write=True)\n        aset1 = co.add_ndarray_column('aset1', prototype=randomsizedarray,\n                                      backend=aset1_backend, contains_subsamples=True)\n        aset2 = co.add_ndarray_column('aset2', shape=(2, 2), dtype=np.int,\n                                      backend=aset2_backend, contains_subsamples=True)\n
        aset3 = co.add_ndarray_column('aset3', shape=(3, 4), dtype=np.float32,\n                                      backend=aset3_backend, contains_subsamples=True)\n        with aset1 as d1, aset2 as d2, aset3 as d3:\n            d1[1] = {11: randomsizedarray}\n            d2[1] = {21: np.ones((2, 2), dtype=np.int)}\n            d3[1] = {31: np.ones((3, 4), dtype=np.float32)}\n\n        assert co.columns.contains_remote_references == {'aset1': False, 'aset2': False, 'aset3': False}\n        assert co.columns.remote_sample_keys == {'aset1': (), 'aset2': (), 'aset3': ()}\n        co.commit('hello')\n        co.close()\n        co = repo.checkout()\n        # perform the mock\n        from hangar.backends import backend_decoder\n        template = backend_decoder(b'50:daeaaeeaebv')\n        co._columns._columns['aset1']._samples[1]._subsamples[12] = template\n        co._columns._columns['aset2']._samples[1]._subsamples[22] = template\n\n        assert co.columns.contains_remote_references == {'aset1': True, 'aset2': True, 'aset3': False}\n        assert co.columns.remote_sample_keys == {'aset1': (1,), 'aset2': (1,), 'aset3': ()}\n        co.close()\n\n\nclass TestWriteThenReadCheckout:\n\n    @pytest.mark.parametrize('backend', fixed_shape_backend_params)\n    def test_add_data_commit_checkout_read_only_contains_same(self, backend, repo, subsample_data_map):\n        co = repo.checkout(write=True)\n        aset = co.add_ndarray_column('foo', shape=(5, 7), dtype=np.uint16,\n                                     backend=backend, contains_subsamples=True)\n        aset.update(subsample_data_map)\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = aset.get(sample_name)\n            for subsample_name, subsample_val in subsample_data.items():\n                assert_equal(sample[subsample_name], subsample_val)\n        co.commit('first')\n        co.close()\n\n        rco = repo.checkout()\n        naset = rco.columns['foo']\n        for sample_name, subsample_data in subsample_data_map.items():\n            sample = naset.get(sample_name)\n            for subsample_name, subsample_val in subsample_data.items():\n                assert_equal(sample[subsample_name], subsample_val)\n        rco.close()\n"
  },
  {
    "path": "tests/test_column_pickle.py",
    "content": "import pytest\nimport numpy as np\nfrom conftest import fixed_shape_backend_params\n\n\ndef assert_equal(arr, arr2):\n    assert np.array_equal(arr, arr2)\n    assert arr.dtype == arr2.dtype\n\n\n\n@pytest.fixture(scope='class')\ndef subsample_data_map():\n    arr = np.arange(5*7).astype(np.uint16).reshape((5, 7))\n    res = {\n        'foo': {\n            0: arr,\n            1: arr + 1,\n            2: arr + 2\n        },\n        2: {\n            'bar': arr + 3,\n            'baz': arr + 4\n        }\n    }\n    return res\n\n\n@pytest.fixture(scope='class')\ndef sample_data_map():\n    arr = np.arange(5*7).astype(np.uint16).reshape((5, 7))\n    res = {\n        0: arr,\n        1: arr + 1,\n        2: arr + 2,\n        'bar': arr + 3,\n        'baz': arr + 4,\n    }\n    return res\n\n\n@pytest.fixture(params=fixed_shape_backend_params, scope='class')\ndef backend_param(request):\n    return request.param\n\n\n@pytest.fixture(params=[False, True], scope='class')\ndef write_enabled(request):\n    return request.param\n\n\n@pytest.fixture(params=[False, True], scope='class')\ndef contains_subsamples(request):\n    return request.param\n\n\n@pytest.fixture(scope='class')\ndef initialized_column(\n    write_enabled, backend_param, contains_subsamples, classrepo, subsample_data_map, sample_data_map\n):\n    co = classrepo.checkout(write=True)\n    aset = co.add_ndarray_column(f'foo{backend_param}{int(write_enabled)}{int(contains_subsamples)}',\n                                    shape=(5, 7), dtype=np.uint16,\n                                    backend=backend_param, contains_subsamples=contains_subsamples)\n    if contains_subsamples:\n        aset.update(subsample_data_map)\n    else:\n        aset.update(sample_data_map)\n    co.commit(f'done {backend_param}{write_enabled}{contains_subsamples}')\n    co.close()\n    if write_enabled:\n        nco = classrepo.checkout(write=True)\n        yield nco.columns[f'foo{backend_param}{int(write_enabled)}{int(contains_subsamples)}']\n        nco.close()\n    else:\n        nco = classrepo.checkout()\n        yield nco.columns[f'foo{backend_param}{int(write_enabled)}{int(contains_subsamples)}']\n        nco.close()\n\n\n@pytest.fixture(scope='class')\ndef initialized_column_read_only(backend_param, contains_subsamples, classrepo, subsample_data_map, sample_data_map):\n    co = classrepo.checkout(write=True)\n    aset = co.add_ndarray_column(f'foo{backend_param}{int(contains_subsamples)}',\n                                    shape=(5, 7), dtype=np.uint16,\n                                    backend=backend_param, contains_subsamples=contains_subsamples)\n    if contains_subsamples:\n        aset.update(subsample_data_map)\n    else:\n        aset.update(sample_data_map)\n\n    digest = co.commit(f'done {backend_param}{contains_subsamples}')\n    co.close()\n    nco = classrepo.checkout(write=False, commit=digest)\n    yield nco.columns[f'foo{backend_param}{int(contains_subsamples)}']\n    nco.close()\n\n\nclass TestPickleableColumns:\n\n    def test_is_pickleable(self, initialized_column, sample_data_map, subsample_data_map):\n        import pickle\n\n        aset = initialized_column\n        if aset.iswriteable:\n            with pytest.raises(PermissionError, match='Method \"__getstate__\" cannot'):\n                pickle.dumps(aset, protocol=pickle.HIGHEST_PROTOCOL)\n        else:\n            pkl = pickle.dumps(aset, protocol=pickle.HIGHEST_PROTOCOL)\n            assert isinstance(pkl, bytes)\n\n\nclass 
class TestLoadableColumns:\n\n    def test_pickle_is_loadable(self, initialized_column_read_only, sample_data_map, subsample_data_map):\n        import pickle\n\n        aset = initialized_column_read_only\n        pkl = pickle.dumps(aset, protocol=pickle.HIGHEST_PROTOCOL)\n        assert isinstance(pkl, bytes)\n        equiv = pickle.loads(pkl)\n\n        if aset.contains_subsamples:\n            assert len(aset) == len(subsample_data_map)\n            assert len(equiv) == len(subsample_data_map)\n\n            for sample_key, subsample_data in subsample_data_map.items():\n                assert sample_key in aset\n                assert sample_key in equiv\n                aset_sample = aset[sample_key]\n                equiv_sample = equiv[sample_key]\n                assert len(aset_sample) == len(subsample_data)\n                assert len(equiv_sample) == len(subsample_data)\n\n                for subsample_key, expected in subsample_data.items():\n                    assert subsample_key in aset_sample\n                    assert subsample_key in equiv_sample\n                    assert_equal(aset_sample[subsample_key], expected)\n                    assert_equal(equiv_sample[subsample_key], expected)\n        else:\n            assert len(aset) == len(sample_data_map)\n            assert len(equiv) == len(sample_data_map)\n            for sample_key, expected in sample_data_map.items():\n                assert sample_key in aset\n                assert sample_key in equiv\n                assert_equal(aset[sample_key], expected)\n                assert_equal(equiv[sample_key], expected)\n        equiv._destruct()\n        del equiv\n"
  },
  {
    "path": "tests/test_commit_ref_verification.py",
    "content": "import pytest\n\n\ndef test_verify_corruption_in_commit_ref_alerts(two_commit_filled_samples_repo):\n    from hangar.records.parsing import commit_ref_db_key_from_raw_key\n    from hangar.records.parsing import commit_ref_raw_val_from_db_val\n    from hangar.records.parsing import commit_ref_db_val_from_raw_val\n\n    repo = two_commit_filled_samples_repo\n    history = repo.log(return_contents=True)\n    head_commit = history['head']\n\n    refKey = commit_ref_db_key_from_raw_key(head_commit)\n    with repo._env.refenv.begin(write=True) as txn:\n        refVal = txn.get(refKey)\n        ref_unpacked = commit_ref_raw_val_from_db_val(refVal)\n\n        modified_ref = list(ref_unpacked.db_kvs)\n        modified_ref[0] = list(modified_ref[0])\n        modified_ref[0][1] = b'corrupt!'\n        modified_ref[0] = tuple(modified_ref[0])\n        modified_ref = tuple(modified_ref)\n        modifiedVal = commit_ref_db_val_from_raw_val(modified_ref)\n\n        txn.put(refKey, modifiedVal.raw, overwrite=True)\n\n    with pytest.raises(IOError):\n        _ = repo.checkout(write=True)\n    with pytest.raises(IOError):\n        _ = repo.checkout(write=False)\n    with pytest.raises(IOError):\n        _ = repo.checkout(write=False, commit=head_commit)\n\n\ndef test_verify_corruption_in_commit_parent_val_alerts(two_commit_filled_samples_repo):\n    from hangar.records.parsing import commit_parent_db_key_from_raw_key\n    from hangar.records.parsing import commit_parent_raw_val_from_db_val\n    from hangar.records.parsing import commit_parent_db_val_from_raw_val\n\n    repo = two_commit_filled_samples_repo\n    history = repo.log(return_contents=True)\n    head_commit = history['head']\n\n    parentKey = commit_parent_db_key_from_raw_key(head_commit)\n    with repo._env.refenv.begin(write=True) as txn:\n        parentVal = txn.get(parentKey)\n\n        parent_raw = commit_parent_raw_val_from_db_val(parentVal)\n        parent = parent_raw.ancestor_spec\n        modifiedVal = commit_parent_db_val_from_raw_val(\n            master_ancestor='corrupt',\n            dev_ancestor=parent.dev_ancestor,\n            is_merge_commit=parent.is_merge_commit)\n\n        txn.put(parentKey, modifiedVal.raw, overwrite=True)\n\n    with pytest.raises(IOError):\n        _ = repo.checkout(write=True)\n    with pytest.raises(IOError):\n        _ = repo.checkout(write=False)\n    with pytest.raises(IOError):\n        _ = repo.checkout(write=False, commit=head_commit)\n\n\ndef test_verify_corruption_in_spec_val_alerts(two_commit_filled_samples_repo):\n    from hangar.records.parsing import commit_spec_db_key_from_raw_key\n    from hangar.records.parsing import commit_spec_db_val_from_raw_val\n    from hangar.records.parsing import commit_spec_raw_val_from_db_val\n\n    repo = two_commit_filled_samples_repo\n    history = repo.log(return_contents=True)\n    head_commit = history['head']\n\n    specKey = commit_spec_db_key_from_raw_key(head_commit)\n    with repo._env.refenv.begin(write=True) as txn:\n        specVal = txn.get(specKey)\n\n        spec_raw = commit_spec_raw_val_from_db_val(specVal)\n        modified_spec = spec_raw.user_spec\n        modified_spec = modified_spec._replace(commit_time=10.42)\n        modifiedVal = commit_spec_db_val_from_raw_val(*modified_spec)\n\n        txn.put(specKey, modifiedVal.raw, overwrite=True)\n\n    with pytest.raises(IOError):\n        _ = repo.checkout(write=True)\n    with pytest.raises(IOError):\n        _ = repo.checkout(write=False)\n    with 
pytest.raises(IOError):\n        _ = repo.checkout(write=False, commit=head_commit)\n"
  },
  {
    "path": "tests/test_context_management.py",
    "content": "import pytest\nimport numpy as np\n\nfrom conftest import fixed_shape_backend_params, variable_shape_backend_params\n\nall_backend_params = list(set(fixed_shape_backend_params).union(set(variable_shape_backend_params)))\n\n\n@pytest.mark.parametrize('backend1', all_backend_params)\n@pytest.mark.parametrize('backend2', all_backend_params)\ndef test_nested_context_manager_does_not_close_all_open(repo, backend1, backend2):\n    co = repo.checkout(write=True)\n    fooaset = co.add_ndarray_column('foo', prototype=np.arange(10), backend=backend1)\n    baraset = co.add_ndarray_column('bar', prototype=np.arange(10), backend=backend2, contains_subsamples=True)\n\n    with co:\n        assert co.columns._any_is_conman() is True\n        assert fooaset._is_conman is True\n        assert baraset._is_conman is True\n        with fooaset as foo:\n            assert co.columns._any_is_conman() is True\n            assert foo._is_conman is True\n            assert fooaset._is_conman is True\n            assert baraset._is_conman is True\n        assert co.columns._any_is_conman() is True\n        assert fooaset._is_conman is True\n        assert baraset._is_conman is True\n    assert co.columns._any_is_conman() is False\n    co.close()\n"
  },
  {
    "path": "tests/test_diff.py",
    "content": "import pytest\nimport numpy as np\n\n\nclass TestReaderWriterDiff(object):\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_diff_by_commit_and_branch(self, repo_2_br_no_conf, writer):\n        repo = repo_2_br_no_conf\n        testco = repo.checkout(branch='testbranch')\n        masterco = repo.checkout(write=writer, branch='master')\n        commit_diffs = masterco.diff.commit(testco.commit_hash)\n        branch_diffs = masterco.diff.branch('testbranch')\n        assert commit_diffs == branch_diffs\n        testco.close()\n        masterco.close()\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_diff_with_wrong_commit_hash(self, repo_2_br_no_conf, writer):\n        repo = repo_2_br_no_conf\n        testco = repo.checkout(branch='testbranch')\n        masterco = repo.checkout(write=writer, branch='master')\n        wrong_commit_hash = testco.commit_hash + 'WrongHash'\n        with pytest.raises(ValueError):\n            masterco.diff.commit(wrong_commit_hash)\n        testco.close()\n        masterco.close()\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_diff_with_wrong_branch_name(self, repo_1_br_no_conf, writer):\n        repo = repo_1_br_no_conf\n        masterco = repo.checkout(write=writer, branch='master')\n        with pytest.raises(ValueError):\n            masterco.diff.branch('wrong_branch_name')\n        masterco.close()\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_comparing_diffs_of_dev_and_master(self, repo_1_br_no_conf, writer):\n        repo = repo_1_br_no_conf\n        dummyData = np.arange(50)\n\n        # mutating and removing data from testbranch\n        testco = repo.checkout(write=True, branch='testbranch')\n        testco.columns['dummy']['1'] = dummyData\n        del testco.columns['dummy']['2']\n        testco.commit(\"mutation and removal\")\n        testco.close()\n\n        co1 = repo.checkout(write=writer, branch='master')\n        diffdata1 = co1.diff.branch('testbranch')\n        diffs1 = diffdata1.diff\n        co1.close()\n\n        co2 = repo.checkout(write=writer, branch='testbranch')\n        diffdata2 = co2.diff.branch('master')\n        diffs2 = diffdata2.diff\n        co2.close()\n\n        assert diffs1.added.samples == diffs2.added.samples\n        assert diffs1.deleted.samples == diffs2.deleted.samples\n        assert diffs1.mutated.samples == diffs2.mutated.samples\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_diff_data_samples(self, repo_1_br_no_conf, writer):\n        repo = repo_1_br_no_conf\n        dummyData = np.arange(50)\n\n        # mutating and removing data from testbranch\n        testco = repo.checkout(write=True, branch='testbranch')\n        testco.columns['dummy']['1'] = dummyData\n        del testco.columns['dummy']['2']\n        testco.commit(\"mutation and removal\")\n        testco.close()\n\n        co = repo.checkout(write=writer, branch='master')\n        diffdata = co.diff.branch('testbranch')\n        conflicts = diffdata.conflict\n        assert conflicts.conflict is False\n\n        diffs = diffdata.diff\n\n        # testing columns and metadata that has no change\n        assert len(diffs.added.samples) == 20\n        assert len(diffs.mutated.samples) == 1\n        assert len(diffs.deleted.samples) == 1\n\n        assert len(diffs.added.schema) == 0\n        assert len(diffs.deleted.schema) == 0\n        assert len(diffs.mutated.schema) == 0\n\n        for datarecord in 
diffs.added.samples:\n            assert 9 < int(datarecord.sample) < 20\n        for mutated in diffs.mutated.samples:\n            assert mutated.sample == '1'\n        co.close()\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_sample_addition_conflict(self, repo_1_br_no_conf, writer):\n        # t1\n        repo = repo_1_br_no_conf\n        dummyData = np.arange(50)\n\n        # adding data in master\n        co = repo.checkout(write=True, branch='master')\n        dummyData[:] = 123\n        co.columns['dummy']['55'] = dummyData\n        co.commit('Adding data in master')\n        co.close()\n\n        # adding data in testbranch\n        co = repo.checkout(write=True, branch='testbranch')\n        dummyData[:] = 234\n        co.columns['dummy']['55'] = dummyData\n        co.commit('adding data in testbranch')\n        co.close()\n\n        co = repo.checkout(write=writer, branch='master')\n        conflicts = co.diff.branch('testbranch').conflict\n        assert conflicts.conflict is True\n        assert len(conflicts.t1.samples) == 1\n        for k in conflicts.t1.samples:\n            assert k.sample == '55'\n        co.close()\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_sample_removal_conflict(self, repo_1_br_no_conf, writer):\n        # t21 and t22\n        dummyData = np.arange(50)\n        dummyData[:] = 123\n        repo = repo_1_br_no_conf\n        co = repo.checkout(write=True, branch='master')\n        del co.columns['dummy']['6']\n        co.columns['dummy']['7'] = dummyData\n        co.commit('removal & mutation in master')\n        co.close()\n\n        co = repo.checkout(write=True, branch='testbranch')\n        co.columns['dummy']['6'] = dummyData\n        del co.columns['dummy']['7']\n        co.commit('removal & mutation in dev')\n        co.close()\n\n        co = repo.checkout(write=writer, branch='master')\n        conflicts = co.diff.branch('testbranch').conflict\n        assert len(conflicts.t21.samples) == 1\n        assert len(conflicts.t22.samples) == 1\n        for k in conflicts.t21.samples:\n            assert k.sample == '6'\n        for k in conflicts.t22.samples:\n            assert k.sample == '7'\n        co.close()\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_sample_mutation_conflict(self, repo_1_br_no_conf, writer):\n        # t3\n        dummyData = np.arange(50)\n        dummyData[:] = 123\n        repo = repo_1_br_no_conf\n        co = repo.checkout(write=True, branch='master')\n        co.columns['dummy']['7'] = dummyData\n        co.commit('mutation in master')\n        co.close()\n\n        co = repo.checkout(write=True, branch='testbranch')\n        dummyData[:] = 234\n        co.columns['dummy']['7'] = dummyData\n        co.commit('mutation in dev')\n        co.close()\n\n        co = repo.checkout(write=writer, branch='master')\n        conflicts = co.diff.branch('testbranch').conflict\n        assert len(conflicts.t3.samples) == 1\n        for k in conflicts.t3.samples:\n            assert k.sample == '7'\n        co.close()\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_aset_addition_conflict(self, aset_samples_initialized_repo, writer):\n        # t1\n        repo = aset_samples_initialized_repo\n\n        repo.create_branch('testbranch')\n        co = repo.checkout(write=True, branch='master')\n        co.add_ndarray_column(name='testing_aset', shape=(5, 7), dtype=np.float64)\n        co.commit('aset init in master')\n        
co.close()\n\n        co = repo.checkout(write=True, branch='testbranch')\n        co.add_ndarray_column(name='testing_aset', shape=(7, 7), dtype=np.float64)\n        co.commit('aset init in dev')\n        co.close()\n\n        co = repo.checkout(write=writer, branch='master')\n        conflicts = co.diff.branch('testbranch').conflict\n        assert len(conflicts.t1.schema) == 1\n        for k in conflicts.t1.schema:\n            assert k.column == 'testing_aset'\n        co.close()\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_aset_removal_conflict(self, aset_samples_initialized_repo, writer):\n        # t21 and t22\n        repo = aset_samples_initialized_repo\n        co = repo.checkout(write=True, branch='master')\n        co.add_ndarray_column(name='testing_aset1', shape=(5, 7), dtype=np.float64)\n        co.add_ndarray_column(name='testing_aset2', shape=(5, 7), dtype=np.float64)\n        co.commit('added asets')\n        co.close()\n        repo.create_branch('testbranch')\n\n        co = repo.checkout(write=True, branch='master')\n        del co.columns['testing_aset1']\n        del co.columns['testing_aset2']\n        co.add_ndarray_column(name='testing_aset2', shape=(5, 7), dtype=np.float32)\n        co.commit('mutation and removal from master')\n        co.close()\n\n        co = repo.checkout(write=True, branch='testbranch')\n        del co.columns['testing_aset1']\n        del co.columns['testing_aset2']\n        co.add_ndarray_column(name='testing_aset1', shape=(5, 7), dtype=np.float32)\n        co.commit('mutation and removal from dev')\n        co.close()\n\n        co = repo.checkout(write=writer, branch='master')\n        conflicts = co.diff.branch('testbranch')[1]\n        assert len(conflicts.t21.schema) == 1\n        assert len(conflicts.t22.schema) == 1\n        assert list(conflicts.t21.schema.keys())[0].column == 'testing_aset1'\n        assert list(conflicts.t22.schema.keys())[0].column == 'testing_aset2'\n        co.close()\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_aset_mutation_conflict(self, aset_samples_initialized_repo, writer):\n        # t3\n        repo = aset_samples_initialized_repo\n        co = repo.checkout(write=True, branch='master')\n        co.add_ndarray_column(name='testing_aset', shape=(5, 7), dtype=np.float64)\n        co.commit('added aset')\n        co.close()\n        repo.create_branch('testbranch')\n\n        co = repo.checkout(write=True, branch='master')\n        del co.columns['testing_aset']\n        co.add_ndarray_column(name='testing_aset', shape=(7, 7), dtype=np.float64)\n        co.commit('mutation from master')\n        co.close()\n\n        co = repo.checkout(write=True, branch='testbranch')\n        del co.columns['testing_aset']\n        co.add_ndarray_column(name='testing_aset', shape=(5, 7), dtype=np.float32)\n        co.commit('mutation from dev')\n        co.close()\n\n        co = repo.checkout(write=writer, branch='master')\n        conflicts = co.diff.branch('testbranch')[1]\n        assert len(conflicts.t3.schema) == 1\n        assert list(conflicts.t3.schema.keys())[0].column == 'testing_aset'\n        co.close()\n\n\n    @pytest.mark.parametrize('writer', [False, True])\n    def test_commits_inside_cm(self, aset_samples_initialized_repo, array5by7, writer):\n        repo = aset_samples_initialized_repo\n        repo.create_branch('testbranch')\n        co = repo.checkout(write=True, branch='testbranch')\n        aset = co.columns['writtenaset']\n        aset2 = 
co.add_ndarray_column('aset2', prototype=array5by7)\n        aset2[1] = array5by7\n        with aset:\n            aset[100] = array5by7\n            co.commit('inside cm')\n            aset[101] = array5by7\n            co.commit('another commit inside cm')\n        co.close()\n        co = repo.checkout(write=writer, branch='testbranch')\n        assert np.allclose(co.columns['writtenaset'][101], array5by7)\n        diff = co.diff.branch('master').diff\n        assert 'aset2' in [x.column for x in diff.added.schema.keys()]\n        calledWithAset = False\n        for record in diff.added.samples:\n            if record.column == 'writtenaset':\n                calledWithAset = True\n                assert record.sample in [100, 101]\n        assert calledWithAset is True\n        co.close()\n\n\nclass TestWriterDiff(object):\n\n    def test_status_and_staged_column(self, aset_samples_initialized_repo):\n        repo = aset_samples_initialized_repo\n        co = repo.checkout(write=True)\n        co.add_str_column('DOESNOTEXIST')\n        co['DOESNOTEXIST'][1] = 'foo'\n        assert co.diff.status() == 'DIRTY'\n        co.commit('init metadata')\n        assert co.diff.status() == 'CLEAN'\n        co.close()\n\n    def test_status_and_staged_samples(self, aset_samples_initialized_repo):\n        dummyData = np.zeros((5, 7))\n        repo = aset_samples_initialized_repo\n        co = repo.checkout()\n        with pytest.raises(AttributeError):\n            co.diff.status()  # Read checkout doesn't have status()\n\n        co = repo.checkout(write=True)\n        co.columns['writtenaset']['45'] = dummyData\n        assert co.diff.status() == 'DIRTY'\n        diff = co.diff.staged()\n        calledWithAset = False\n        for record in diff.diff.added.samples:\n            if record.column == 'writtenaset':\n                calledWithAset = True\n                assert record.sample == '45'\n        assert calledWithAset is True\n        co.commit('adding')\n        assert co.diff.status() == 'CLEAN'\n        co.close()\n\n    def test_status_and_staged_aset(self, aset_samples_initialized_repo):\n        repo = aset_samples_initialized_repo\n        co = repo.checkout(write=True)\n        co.add_ndarray_column(name='sampleaset', shape=(3, 5), dtype=np.float32)\n        assert co.diff.status() == 'DIRTY'\n        diff = co.diff.staged()\n        assert 'sampleaset' in [x.column for x in diff.diff.added.schema]\n        co.commit('init aset')\n        assert co.diff.status() == 'CLEAN'\n        co.close()\n\n\ndef test_repo_diff_method_branch_names(aset_samples_initialized_repo):\n    # t3\n    repo = aset_samples_initialized_repo\n    co = repo.checkout(write=True, branch='master')\n    co.add_ndarray_column(name='testing_aset', shape=(5, 7), dtype=np.float64)\n    co.commit('added aset')\n    co.close()\n    repo.create_branch('testbranch')\n\n    co = repo.checkout(write=True, branch='master')\n    del co.columns['testing_aset']\n    co.add_ndarray_column(name='testing_aset', shape=(7, 7), dtype=np.float64)\n    masterHEAD = co.commit('mutation from master')\n    co.close()\n\n    co = repo.checkout(write=True, branch='testbranch')\n    del co.columns['testing_aset']\n    co.add_ndarray_column(name='testing_aset', shape=(5, 7), dtype=np.float32)\n    devHEAD = co.commit('mutation from dev')\n    co.close()\n\n    co = repo.checkout(write=False, branch='master')\n    co_diff = co.diff.branch('testbranch')\n    co.close()\n\n    repo_diff = repo.diff('master', 'testbranch')\n
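    # the reader-checkout diff and the repository-level diff over the same\n    # refs should be identical\n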
    assert co_diff == repo_diff\n\n\ndef test_repo_diff_method_commit_digests(aset_samples_initialized_repo):\n    # t3\n    repo = aset_samples_initialized_repo\n    co = repo.checkout(write=True, branch='master')\n    co.add_ndarray_column(name='testing_aset', shape=(5, 7), dtype=np.float64)\n    co.commit('added aset')\n    co.close()\n    repo.create_branch('testbranch')\n\n    co = repo.checkout(write=True, branch='master')\n    del co.columns['testing_aset']\n    co.add_ndarray_column(name='testing_aset', shape=(7, 7), dtype=np.float64)\n    masterHEAD = co.commit('mutation from master')\n    co.close()\n\n    co = repo.checkout(write=True, branch='testbranch')\n    del co.columns['testing_aset']\n    co.add_ndarray_column(name='testing_aset', shape=(5, 7), dtype=np.float32)\n    devHEAD = co.commit('mutation from dev')\n    co.close()\n\n    co = repo.checkout(write=False, branch='master')\n    co_diff = co.diff.commit(devHEAD)\n    co.close()\n\n    repo_diff = repo.diff(masterHEAD, devHEAD)\n    assert co_diff == repo_diff\n\n\ndef test_repo_diff_method_one_branch_one_commit_digest(aset_samples_initialized_repo):\n    # t3\n    repo = aset_samples_initialized_repo\n    co = repo.checkout(write=True, branch='master')\n    co.add_ndarray_column(name='testing_aset', shape=(5, 7), dtype=np.float64)\n    co.commit('added aset')\n    co.close()\n    repo.create_branch('testbranch')\n\n    co = repo.checkout(write=True, branch='master')\n    del co.columns['testing_aset']\n    co.add_ndarray_column(name='testing_aset', shape=(7, 7), dtype=np.float64)\n    masterHEAD = co.commit('mutation from master')\n    co.close()\n\n    co = repo.checkout(write=True, branch='testbranch')\n    del co.columns['testing_aset']\n    co.add_ndarray_column(name='testing_aset', shape=(5, 7), dtype=np.float32)\n    devHEAD = co.commit('mutation from dev')\n    co.close()\n\n    co = repo.checkout(write=False, branch='master')\n    co_diff = co.diff.commit(devHEAD)\n    co.close()\n\n    repo_diff1 = repo.diff('master', devHEAD)\n    assert co_diff == repo_diff1\n\n    repo_diff2 = repo.diff(masterHEAD, 'testbranch')\n    assert co_diff == repo_diff2\n"
  },
  {
    "path": "tests/test_diff_staged_summary.py",
    "content": "import pytest\nimport numpy as np\n\n\ndef test_add_samples_to_existing_column(repo_20_filled_samples2):\n    from hangar.records.summarize import status\n    repo = repo_20_filled_samples2\n    expected = '============ \\n'\\\n               '| Branch: master \\n'\\\n               ' \\n'\\\n               '============ \\n'\\\n               '| ADDED \\n'\\\n               '|---------- \\n'\\\n               '| Schema: 0 \\n'\\\n               '|---------- \\n'\\\n               '| Samples: 20 \\n'\\\n               '|  - \"dummy\": 20 \\n'\\\n               ' \\n'\\\n               '============ \\n'\\\n               '| DELETED \\n'\\\n               '|---------- \\n'\\\n               '| Schema: 0 \\n'\\\n               '|---------- \\n'\\\n               '| Samples: 0 \\n'\\\n               ' \\n'\\\n               '============ \\n'\\\n               '| MUTATED \\n'\\\n               '|---------- \\n'\\\n               '| Schema: 0 \\n'\\\n               '|---------- \\n'\\\n               '| Samples: 0 \\n'\\\n               ' \\n'\n    dummyData = np.arange(50).astype(np.int64)\n    co2 = repo.checkout(write=True)\n    for idx in range(10, 20):\n        dummyData[:] = idx\n        co2.columns['dummy'][str(idx)] = dummyData\n        co2.columns['dummy'][idx] = dummyData\n    df = co2.diff.staged()\n    co2.close()\n    assert status(repo._env.hashenv, 'master', df.diff).getvalue() == expected\n\n\ndef test_mutate_sample_values(repo_20_filled_samples2):\n    from hangar.records.summarize import status\n    repo = repo = repo_20_filled_samples2\n    expected = '============ \\n'\\\n               '| Branch: master \\n'\\\n               ' \\n'\\\n               '============ \\n'\\\n               '| ADDED \\n'\\\n               '|---------- \\n'\\\n               '| Schema: 0 \\n'\\\n               '|---------- \\n'\\\n               '| Samples: 0 \\n'\\\n               ' \\n'\\\n               '============ \\n'\\\n               '| DELETED \\n'\\\n               '|---------- \\n'\\\n               '| Schema: 0 \\n'\\\n               '|---------- \\n'\\\n               '| Samples: 0 \\n'\\\n               ' \\n'\\\n               '============ \\n'\\\n               '| MUTATED \\n'\\\n               '|---------- \\n'\\\n               '| Schema: 0 \\n'\\\n               '|---------- \\n'\\\n               '| Samples: 5 \\n'\\\n               '|  - \"dummy\": 5 \\n'\\\n               ' \\n'\n\n    dummyData = np.arange(50).astype(np.int64)\n    co2 = repo.checkout(write=True)\n    for idx in range(5, 10):\n        dummyData[:] = idx + 10\n        co2.columns['dummy'][idx] = dummyData\n    df = co2.diff.staged()\n    co2.close()\n    assert status(repo._env.hashenv, 'master', df.diff).getvalue() == expected\n\n\ndef test_delete_samples(repo_20_filled_samples2):\n    from hangar.records.summarize import status\n    repo = repo_20_filled_samples2\n    expected = '============ \\n'\\\n               '| Branch: master \\n'\\\n               ' \\n'\\\n               '============ \\n'\\\n               '| ADDED \\n'\\\n               '|---------- \\n'\\\n               '| Schema: 0 \\n'\\\n               '|---------- \\n'\\\n               '| Samples: 0 \\n'\\\n               ' \\n'\\\n               '============ \\n'\\\n               '| DELETED \\n'\\\n               '|---------- \\n'\\\n               '| Schema: 0 \\n'\\\n               '|---------- \\n'\\\n               '| Samples: 5 \\n'\\\n               '|  - \"dummy\": 5 \\n'\\\n               ' \\n'\\\n          
     '============ \\n'\\\n               '| MUTATED \\n'\\\n               '|---------- \\n'\\\n               '| Schema: 0 \\n'\\\n               '|---------- \\n'\\\n               '| Samples: 0 \\n'\\\n               ' \\n'\n\n    co2 = repo.checkout(write=True)\n    for idx in range(5, 10):\n        del co2.columns['dummy'][idx]\n    df = co2.diff.staged()\n    co2.close()\n    assert status(repo._env.hashenv, 'master', df.diff).getvalue() == expected\n\n\ndef test_add_new_column_schema_and_samples(repo_20_filled_samples2):\n    from hangar.records.summarize import status\n    repo = repo_20_filled_samples2\n    expected = (\n        '============ \\n'\n        '| Branch: master \\n'\n        ' \\n'\n        '============ \\n'\n        '| ADDED \\n'\n        '|---------- \\n'\n        '| Schema: 1 \\n'\n        '|  - \"new_aset\": \\n'\n        '|       digest=\"1=555a833b66ab\" \\n'\n        '|       column_layout: flat \\n'\n        '|       column_type: ndarray \\n'\n        '|       schema_hasher_tcode: 1 \\n'\n        '|       data_hasher_tcode: 0 \\n'\n        '|       schema_type: fixed_shape \\n'\n        '|       shape: (10, 10) \\n'\n        '|       dtype: float32 \\n'\n        '|       backend: 01 \\n'\n        '|       backend_options: {\\'complib\\': \\'blosc:lz4hc\\', \\'complevel\\': 5, \\'shuffle\\': \\'byte\\'} \\n'\n        '|---------- \\n'\n        '| Samples: 5 \\n'\n        '|  - \"new_aset\": 5 \\n'\n        ' \\n'\n        '============ \\n'\n        '| DELETED \\n'\n        '|---------- \\n'\n        '| Schema: 0 \\n'\n        '|---------- \\n'\n        '| Samples: 0 \\n'\n        ' \\n'\n        '============ \\n'\n        '| MUTATED \\n'\n        '|---------- \\n'\n        '| Schema: 0 \\n'\n        '|---------- \\n'\n        '| Samples: 0 \\n'\n        ' \\n'\n    )\n    co2 = repo.checkout(write=True)\n    co2.add_ndarray_column('new_aset', shape=(10, 10), dtype=np.float32)\n    for idx in range(5):\n        dummyData = np.random.randn(10, 10).astype(np.float32)\n        co2.columns['new_aset'][idx] = dummyData\n    df = co2.diff.staged()\n    co2.close()\n    result = status(repo._env.hashenv, 'master', df.diff).getvalue()\n    assert result == expected\n\n\ndef test_add_new_column_schema_and_sample_and_delete_old_column(repo_20_filled_samples2):\n    from hangar.records.summarize import status\n    repo = repo_20_filled_samples2\n    expected = (\n        '============ \\n'\n        '| Branch: master \\n'\n        ' \\n'\n        '============ \\n'\n        '| ADDED \\n'\n        '|---------- \\n'\n        '| Schema: 1 \\n'\n        '|  - \"new_aset\": \\n'\n        '|       digest=\"1=555a833b66ab\" \\n'\n        '|       column_layout: flat \\n'\n        '|       column_type: ndarray \\n'\n        '|       schema_hasher_tcode: 1 \\n'\n        '|       data_hasher_tcode: 0 \\n'\n        '|       schema_type: fixed_shape \\n'\n        '|       shape: (10, 10) \\n'\n        '|       dtype: float32 \\n'\n        '|       backend: 01 \\n'\n        '|       backend_options: {\\'complib\\': \\'blosc:lz4hc\\', \\'complevel\\': 5, \\'shuffle\\': \\'byte\\'} \\n'\n        '|---------- \\n'\n        '| Samples: 5 \\n'\n        '|  - \"new_aset\": 5 \\n'\n        ' \\n'\n        '============ \\n'\n        '| DELETED \\n'\n        '|---------- \\n'\n        '| Schema: 1 \\n'\n        '|  - \"dummy\": \\n'\n        '|       digest=\"1=18599cd5ea25\" \\n'\n        '|       column_layout: flat \\n'\n        '|       column_type: ndarray \\n'\n        '|       
schema_hasher_tcode: 1 \\n'\n        '|       data_hasher_tcode: 0 \\n'\n        '|       schema_type: fixed_shape \\n'\n        '|       shape: (50,) \\n'\n        '|       dtype: int64 \\n'\n        '|       backend: 10 \\n'\n        '|       backend_options: {} \\n'\n        '|---------- \\n'\n        '| Samples: 10 \\n'\n        '|  - \"dummy\": 10 \\n'\n        ' \\n'\n        '============ \\n'\n        '| MUTATED \\n'\n        '|---------- \\n'\n        '| Schema: 0 \\n'\n        '|---------- \\n'\n        '| Samples: 0 \\n'\n        ' \\n'\n    )\n    co2 = repo.checkout(write=True)\n    co2.add_ndarray_column('new_aset', shape=(10, 10), dtype=np.float32)\n    for idx in range(5):\n        dummyData = np.random.randn(10, 10).astype(np.float32)\n        co2.columns['new_aset'][idx] = dummyData\n    del co2.columns['dummy']\n    df = co2.diff.staged()\n    co2.close()\n    result = status(repo._env.hashenv, 'master', df.diff).getvalue()\n    assert result == expected\n\n\ndef test_add_new_schema_and_samples_and_change_old_backend(repo_20_filled_samples2):\n    from hangar.records.summarize import status\n    repo = repo_20_filled_samples2\n    expected = (\n        '============ \\n'\n        '| Branch: master \\n'\n        ' \\n'\n        '============ \\n'\n        '| ADDED \\n'\n        '|---------- \\n'\n        '| Schema: 1 \\n'\n        '|  - \"new_aset\": \\n'\n        '|       digest=\"1=555a833b66ab\" \\n'\n        '|       column_layout: flat \\n'\n        '|       column_type: ndarray \\n'\n        '|       schema_hasher_tcode: 1 \\n'\n        '|       data_hasher_tcode: 0 \\n'\n        '|       schema_type: fixed_shape \\n'\n        '|       shape: (10, 10) \\n'\n        '|       dtype: float32 \\n'\n        '|       backend: 01 \\n'\n        '|       backend_options: {\\'complib\\': \\'blosc:lz4hc\\', \\'complevel\\': 5, \\'shuffle\\': \\'byte\\'} \\n'\n        '|---------- \\n'\n        '| Samples: 5 \\n'\n        '|  - \"new_aset\": 5 \\n'\n        ' \\n'\n        '============ \\n'\n        '| DELETED \\n'\n        '|---------- \\n'\n        '| Schema: 0 \\n'\n        '|---------- \\n'\n        '| Samples: 0 \\n'\n        ' \\n'\n        '============ \\n'\n        '| MUTATED \\n'\n        '|---------- \\n'\n        '| Schema: 1 \\n'\n        '|  - \"dummy\": \\n'\n        '|       digest=\"1=5d6cc8241705\" \\n'\n        '|       column_layout: flat \\n'\n        '|       column_type: ndarray \\n'\n        '|       schema_hasher_tcode: 1 \\n'\n        '|       data_hasher_tcode: 0 \\n'\n        '|       schema_type: fixed_shape \\n'\n        '|       shape: (50,) \\n'\n        '|       dtype: int64 \\n'\n        '|       backend: 00 \\n'\n        '|       backend_options: {\\'complib\\': \\'blosc:lz4hc\\', \\'complevel\\': 5, \\'shuffle\\': \\'byte\\'} \\n'\n        '|---------- \\n'\n        '| Samples: 5 \\n'\n        '|  - \"dummy\": 5 \\n'\n        ' \\n'\n    )\n    co2 = repo.checkout(write=True)\n    co2.columns['dummy'].change_backend('00')\n    co2.add_ndarray_column('new_aset', shape=(10, 10), dtype=np.float32)\n    for idx in range(5):\n        dummyData = np.random.randn(10, 10).astype(np.float32)\n        co2.columns['new_aset'][idx] = dummyData\n        co2.columns['dummy'][idx] = np.arange(50).astype(np.int64) + idx\n    df = co2.diff.staged()\n    co2.close()\n    result = status(repo._env.hashenv, 'master', df.diff).getvalue()\n    assert result == expected\n"
  },
  {
    "path": "tests/test_initiate.py",
    "content": "import os\nimport pytest\nfrom hangar import Repository\n\n\ndef test_imports():\n    import hangar\n    from hangar import Repository\n\n\ndef test_starting_up_repo_warns_should_exist_no_args(managed_tmpdir):\n    with pytest.warns(UserWarning):\n        repo = Repository(path=managed_tmpdir)\n    repo.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n    assert repo.list_branches() == ['master']\n    assert os.path.isdir(repo._repo_path)\n    assert str(repo._repo_path) == os.path.join(managed_tmpdir, '.hangar')\n    co = repo.checkout(write=True)\n    assert co.diff.status() == 'CLEAN'\n    co.close()\n    repo._env._close_environments()\n\n\ndef test_starting_up_repo_warns_should_exist_manual_args(managed_tmpdir):\n    with pytest.warns(UserWarning):\n        repo = Repository(path=managed_tmpdir, exists=True)\n    repo.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n    assert repo.list_branches() == ['master']\n    assert os.path.isdir(repo._repo_path)\n    assert str(repo._repo_path) == os.path.join(managed_tmpdir, '.hangar')\n    co = repo.checkout(write=True)\n    assert co.diff.status() == 'CLEAN'\n    co.close()\n    repo._env._close_environments()\n\n\ndef test_starting_up_repo_does_not_warn_not_exist_manual_args(managed_tmpdir):\n    with pytest.warns(None) as warn_recs:\n        repo = Repository(path=managed_tmpdir, exists=False)\n    assert len(warn_recs) == 0\n\n    repo.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n    assert repo.list_branches() == ['master']\n    assert os.path.isdir(repo._repo_path)\n    assert str(repo._repo_path) == os.path.join(managed_tmpdir, '.hangar')\n    co = repo.checkout(write=True)\n    assert co.diff.status() == 'CLEAN'\n    co.close()\n    repo._env._close_environments()\n\n\ndef test_initial_read_checkout(managed_tmpdir):\n    repo = Repository(path=managed_tmpdir, exists=False)\n    repo.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n    with pytest.raises(ValueError):\n        repo.checkout()\n    repo._env._close_environments()\n\n\ndef test_initial_arrayset(managed_tmpdir, randomsizedarray):\n    repo = Repository(path=managed_tmpdir, exists=False)\n    repo.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n\n    w_checkout = repo.checkout(write=True)\n    assert len(w_checkout.columns) == 0\n    with pytest.raises(KeyError):\n        w_checkout.columns['aset']\n    aset = w_checkout.add_ndarray_column('aset', prototype=randomsizedarray)\n    assert aset.column == 'aset'\n    w_checkout.close()\n    repo._env._close_environments()\n\n\ndef test_empty_commit(managed_tmpdir, caplog):\n    repo = Repository(path=managed_tmpdir, exists=False)\n    repo.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n    w_checkout = repo.checkout(write=True)\n    with pytest.raises(RuntimeError):\n        w_checkout.commit('this is a merge message')\n    w_checkout.close()\n    repo._env._close_environments()\n\n\ndef test_cannot_operate_without_repo_init(managed_tmpdir):\n    repo = Repository(path=managed_tmpdir, exists=False)\n\n    with pytest.raises(RuntimeError):\n        repo.writer_lock_held()\n    with pytest.raises(RuntimeError):\n        repo.checkout()\n    with pytest.raises(RuntimeError):\n        repo.writer_lock_held()\n    with pytest.raises(RuntimeError):\n        repo.log()\n    with pytest.raises(RuntimeError):\n        repo.summary()\n    with pytest.raises(RuntimeError):\n        
        repo.merge('fail', 'master', 'nonexistent')\n    with pytest.raises(RuntimeError):\n        repo.create_branch('test')\n    with pytest.raises(RuntimeError):\n        repo.list_branches()\n    with pytest.raises(RuntimeError):\n        repo.force_release_writer_lock()\n\n    with pytest.raises(RuntimeError):\n        repo.remote.add('origin', 'foo')\n    with pytest.raises(RuntimeError):\n        repo.remote.remove('origin')\n    with pytest.raises(RuntimeError):\n        repo.remote.fetch('origin', 'master')\n    with pytest.raises(RuntimeError):\n        repo.remote.fetch_data('origin', branch='master')\n    with pytest.raises(RuntimeError):\n        repo.remote.list_all()\n    with pytest.raises(RuntimeError):\n        repo.remote.ping('origin')\n    with pytest.raises(RuntimeError):\n        repo.remote.push('origin', 'master')\n    with pytest.raises(RuntimeError):\n        repo.remove_branch('master')\n\n    with pytest.raises(RuntimeError):\n        repo.path\n    with pytest.raises(RuntimeError):\n        repo.version\n    with pytest.raises(RuntimeError):\n        repo.writer_lock_held\n    with pytest.raises(RuntimeError):\n        repo.size_human\n    with pytest.raises(RuntimeError):\n        repo.size_nbytes\n\n    assert repo._env.repo_is_initialized is False\n\n\ndef test_check_repo_size(repo_20_filled_samples):\n    from hangar.utils import parse_bytes, folder_size\n\n    expected_nbytes = folder_size(repo_20_filled_samples._repo_path, recurse=True)\n    nbytes = repo_20_filled_samples.size_nbytes\n    assert expected_nbytes == nbytes\n\n    format_nbytes = repo_20_filled_samples.size_human\n    # account for rounding when converting int to str.\n    assert nbytes * 0.95 <= parse_bytes(format_nbytes) <= nbytes * 1.05\n\n\ndef test_force_release_writer_lock(managed_tmpdir, monkeypatch):\n    repo = Repository(path=managed_tmpdir, exists=False)\n    repo.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n    co = repo.checkout(write=True)\n    orig_lock = str(co._writer_lock)\n\n    def mock_true(*args, **kwargs):\n        return True\n\n    # try to release the writer lock with a process which has different uid\n    co._writer_lock = 'lololol'\n    with pytest.raises(RuntimeError):\n        monkeypatch.setattr(co, '_verify_alive', mock_true)\n        monkeypatch.setattr(co._columns, '_destruct', mock_true)\n        co.close()\n    # replace, but rest of object is closed\n    monkeypatch.setattr(co, '_writer_lock', orig_lock)\n    monkeypatch.delattr(co._columns, '_destruct')\n    co.close()\n    repo._env._close_environments()\n\n\ndef test_force_release_writer_lock_works(managed_tmpdir):\n    repo = Repository(path=managed_tmpdir, exists=False)\n    repo.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n    co = repo.checkout(write=True)\n\n    # try to release the writer lock with a process which has different uid\n    with pytest.warns(ResourceWarning):\n        repo.force_release_writer_lock()\n\n    assert repo.writer_lock_held is False\n    co.close()\n    repo._env._close_environments()\n\n\ndef test_repo_summary_does_not_error_before_any_commit_made(capfd, managed_tmpdir):\n    repo = Repository(path=managed_tmpdir, exists=False)\n    repo.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n\n    assert repo.summary() is None\n    out, _ = capfd.readouterr()\n    assert 'No commits have been made in the repository' in out\n    repo._env._close_environments()\n\n\ndef 
test_get_ecosystem_details(managed_tmpdir):\n    repo = Repository(path=managed_tmpdir, exists=False)\n    repo.init(user_name='tester', user_email='foo@test.bar', remove_old=True)\n    eco = repo._ecosystem_details()\n    assert isinstance(eco, dict)\n    assert 'host' in eco\n    assert 'packages' in eco\n    for package_name, version in eco['packages']:\n        assert version is not None\n    repo._env._close_environments()\n\n\ndef test_inject_repo_version(monkeypatch):\n    import hangar\n    monkeypatch.setattr(\"hangar.__version__\", '0.2.0')\n    assert hangar.__version__ == '0.2.0'\n\n\ndef test_check_repository_version(aset_samples_initialized_repo):\n    from hangar import __version__\n    from pkg_resources import parse_version\n\n    repo = aset_samples_initialized_repo\n    assert repo.version == parse_version(__version__).public\n\n\ndef test_check_repository_software_version_startup(managed_tmpdir):\n    from hangar import Repository, __version__\n    from pkg_resources import parse_version\n\n    repo = Repository(managed_tmpdir, exists=False)\n    repo.init('test user', 'test@foo.bar', remove_old=True)\n    repo._env._close_environments()\n\n    nrepo = Repository(managed_tmpdir, exists=True)\n    assert nrepo.initialized is True\n    assert nrepo.version == parse_version(__version__).public\n    nrepo._env._close_environments()\n\n\n@pytest.mark.parametrize('repo_v,hangar_v', [\n    ['0.2.0', '0.3.0'],\n    ['0.2.0', '0.3.1rc1'],\n    ['0.2.0', '0.3.1.dev0'],\n    ['0.2.0', '0.3.1'],\n    ['0.3.0', '0.4.1.dev0'],\n    ['0.3.0', '0.4.1rc1'],\n    ['0.3.0', '0.4.0'],\n    ['0.3.0', '0.4.1'],\n    ['0.4.0', '0.5.0.dev0'],\n    ['0.4.0', '0.5.0rc1'],\n    ['0.4.0', '0.5.0'],\n    ['0.4.0', '0.5.1'],\n    ['0.5.0.dev0', '0.4.0'],\n    ['0.5.0.dev0', '0.4.1'],\n    ['0.5.0', '0.4.1'],\n])\ndef test_check_repository_software_version_fails_hangar_version(monkeypatch, managed_tmpdir, repo_v, hangar_v):\n    import hangar\n    monkeypatch.setattr(\"hangar.__version__\", hangar_v)\n    monkeypatch.setattr(\"hangar.context.__version__\", hangar_v)\n    from hangar import Repository\n    from hangar.records.vcompat import set_repository_software_version\n\n    repo = Repository(managed_tmpdir, exists=False)\n    repo.init('test user', 'test@foo.bar', remove_old=True)\n    # force writing of new software version. should trigger error on next read.\n    set_repository_software_version(repo._env.branchenv, repo_v, overwrite=True)\n    try:\n        assert repo.version == repo_v\n    finally:\n        repo._env._close_environments()\n\n    assert hangar.__version__ == hangar_v\n\n    with pytest.raises(RuntimeError):\n        Repository(managed_tmpdir, exists=True)\n\n\n@pytest.mark.parametrize('futureVersion', ['1.0.0', '0.14.1', '0.15.0', '1.4.1'])\ndef test_check_repository_software_version_works_on_newer_hangar_version(managed_tmpdir, monkeypatch, futureVersion):\n    from hangar import Repository\n\n    repo = Repository(managed_tmpdir, exists=False)\n    repo.init('test user', 'test@foo.bar', remove_old=True)\n    old_version = repo.version\n    # close the environments so the repository can be reopened under a newer hangar version\n    repo._env._close_environments()\n\n    import hangar\n    monkeypatch.setattr(hangar, '__version__', futureVersion)\n    nrepo = Repository(managed_tmpdir, exists=True)\n    assert hangar.__version__ == futureVersion\n    assert nrepo.version == old_version\n    nrepo._env._close_environments()\n"
  },
  {
    "path": "tests/test_merging.py",
    "content": "import pytest\nimport numpy as np\n\n\ndef test_merge_fails_with_invalid_branch_name(repo_1_br_no_conf):\n    with pytest.raises(ValueError):\n        cmt_hash = repo_1_br_no_conf.merge('merge commit', 'master', 'failbranchname')\n    # no message passed in\n    with pytest.raises(TypeError):\n        cmt_hash = repo_1_br_no_conf.merge('master', 'testbranch')\n\n\ndef test_is_ff_merge(repo_1_br_no_conf):\n    testbranch_head = repo_1_br_no_conf.log(branch='testbranch', return_contents=True)['head']\n    cmt_hash = repo_1_br_no_conf.merge('merge commit', 'master', 'testbranch')\n    assert cmt_hash == testbranch_head\n\n\ndef test_writer_checkout_ff_merge(repo_1_br_no_conf):\n    testbranch_head = repo_1_br_no_conf.log(branch='testbranch', return_contents=True)['head']\n    co = repo_1_br_no_conf.checkout(write=True, branch='master')\n    master_head = co.commit_hash\n    mergeHash = co.merge('dummy message', 'testbranch')\n    assert mergeHash == testbranch_head\n    assert mergeHash != master_head\n    assert co.branch_name == 'master'\n    co.close()\n\n    master_order = repo_1_br_no_conf.log(branch='testbranch', return_contents=True)['order']\n    tesbranch_order = repo_1_br_no_conf.log(branch='master', return_contents=True)['order']\n    assert master_order == tesbranch_order\n\n\ndef test_merge_fails_if_changes_staged(repo_1_br_no_conf):\n    co = repo_1_br_no_conf.checkout(write=True, branch='master')\n    co.add_str_column('DOESNOTEXIST')\n    co['DOESNOTEXIST'][2] = 'lol'\n    co.close()\n    with pytest.raises(RuntimeError, match='Changes are currently pending'):\n        repo_1_br_no_conf.merge('merge commit', 'master', 'testbranch')\n\n\ndef test_writer_checkout_merge_fails_if_changes_staged(repo_1_br_no_conf):\n    co = repo_1_br_no_conf.checkout(write=True, branch='master')\n    co.add_str_column('DOESNOTEXIST')\n    co['DOESNOTEXIST'][2] = 'lol'\n    with pytest.raises(RuntimeError, match='Changes are currently pending'):\n        co.merge('merge commit', 'testbranch')\n    co.close()\n\n\ndef test_ff_merge_no_conf_correct_contents_for_name_or_hash_checkout(repo_1_br_no_conf):\n    cmt_hash = repo_1_br_no_conf.merge('merge commit', 'master', 'testbranch')\n    coByName = repo_1_br_no_conf.checkout(branch='master')\n    coByHash = repo_1_br_no_conf.checkout(commit=cmt_hash)\n\n    assert len(coByHash.columns) == len(coByName.columns)\n    for asetn in coByHash.columns.keys():\n        aset_byHash = coByHash.columns[asetn]\n        aset_byName = coByName.columns[asetn]\n        assert len(aset_byHash) == len(aset_byHash)\n        for k, v in aset_byHash.items():\n            assert np.allclose(v, aset_byName[k])\n\n    coByHash.close()\n    coByName.close()\n\n\ndef test_ff_merge_no_conf_updates_head_commit_of_branches(repo_1_br_no_conf):\n    repo = repo_1_br_no_conf\n    co = repo.checkout(write=True, branch='master')\n    co.close()\n    repo.create_branch('NotUpdatedBranch')\n    old_branch_head = repo.log(branch='NotUpdatedBranch', return_contents=True)['head']\n\n    cmt_hash = repo.merge('merge commit', 'master', 'testbranch')\n    master_head = repo.log(branch='master', return_contents=True)['head']\n    testbranch_head = repo.log(branch='testbranch', return_contents=True)['head']\n    assert master_head == testbranch_head\n    assert cmt_hash == master_head\n\n    check_old_branch = repo.log(branch='NotUpdatedBranch', return_contents=True)['head']\n    assert check_old_branch == old_branch_head\n    assert check_old_branch != master_head\n\n\ndef 
test_is_3_way_merge(repo_2_br_no_conf):\n    testbranch_head = repo_2_br_no_conf.log(branch='testbranch', return_contents=True)['head']\n    masterbranch_head = repo_2_br_no_conf.log(branch='master', return_contents=True)['head']\n    cmt_hash = repo_2_br_no_conf.merge('merge commit', 'master', 'testbranch')\n    assert cmt_hash != testbranch_head\n    assert cmt_hash != masterbranch_head\n\n\ndef test_writer_checkout_is_3_way_merge(repo_2_br_no_conf):\n    testbranch_head = repo_2_br_no_conf.log(branch='testbranch', return_contents=True)['head']\n    masterbranch_head = repo_2_br_no_conf.log(branch='master', return_contents=True)['head']\n    co = repo_2_br_no_conf.checkout(write=True, branch='master')\n    cmt_hash = co.merge('merge commit', 'testbranch')\n    co.close()\n    assert cmt_hash != testbranch_head\n    assert cmt_hash != masterbranch_head\n\n\ndef test_3_way_merge_no_conflict_correct_contents(repo_2_br_no_conf):\n    cmt_hash = repo_2_br_no_conf.merge('merge commit', 'master', 'testbranch')\n    co = repo_2_br_no_conf.checkout(branch='master')\n    # columns\n    assert len(co.columns) == 1\n    assert 'dummy' in co.columns\n    # column samples\n    aset = co.columns['dummy']\n    assert len(aset) == 50\n\n    # column sample values\n    checkarr = np.zeros_like(np.arange(50))\n    for k, v in aset.items():\n        checkarr[:] = int(k)\n        assert np.allclose(v, checkarr)\n\n    # column sample keys\n    aset_keys = list(aset.keys())\n    for genKey in range(30):\n        assert str(genKey) in aset_keys\n        aset_keys.remove(str(genKey))\n    assert len(aset_keys) == 20\n    co.close()\n\n\ndef test_writer_checkout_3_way_merge_no_conflict_correct_contents(repo_2_br_no_conf):\n    co = repo_2_br_no_conf.checkout(write=True, branch='master')\n    cmt_hash = co.merge('merge commit', 'testbranch')\n\n    # columns\n    assert len(co.columns) == 1\n    assert 'dummy' in co.columns\n    # column samples\n    aset = co.columns['dummy']\n    assert len(aset) == 50\n\n    # column sample values\n    checkarr = np.zeros_like(np.arange(50))\n    for k, v in aset.items():\n        checkarr[:] = int(k)\n        assert np.allclose(v, checkarr)\n\n    # column sample keys\n    aset_keys = list(aset.keys())\n    for genKey in range(30):\n        assert str(genKey) in aset_keys\n        aset_keys.remove(str(genKey))\n    assert len(aset_keys) == 20\n    co.close()\n\n\ndef test_3_way_merge_no_conflict_and_mutation_correct_contents(repo_2_br_no_conf):\n    co = repo_2_br_no_conf.checkout(write=True, branch='master')\n    co.columns['dummy']['1'] = co.columns['dummy']['0']\n    co.commit('mutated master')\n    co.close()\n\n    co = repo_2_br_no_conf.checkout(write=True, branch='testbranch')\n    co.columns['dummy']['2'] = co.columns['dummy']['0']\n    co.commit('mutated testbranch')\n    co.close()\n\n    repo_2_br_no_conf.merge('merge commit', 'master', 'testbranch')\n    co = repo_2_br_no_conf.checkout(branch='master')\n\n    # columns\n    assert len(co.columns) == 1\n    assert 'dummy' in co.columns\n    # column samples\n    aset = co.columns['dummy']\n    assert len(aset) == 50\n\n    # column sample values\n    checkarr = np.zeros_like(np.arange(50))\n    for k, v in aset.items():\n        if k == '2':\n            checkarr[:] = 0\n        elif k == '1':\n            checkarr[:] = 0\n        else:\n            checkarr[:] = int(k)\n        assert np.allclose(v, checkarr)\n\n    # column sample keys\n    aset_keys = list(aset.keys())\n    for genKey in range(30):\n        
assert str(genKey) in aset_keys\n        aset_keys.remove(str(genKey))\n    assert len(aset_keys) == 20\n    co.close()\n\n\ndef test_3_way_merge_updates_head_commit_of_branches(repo_2_br_no_conf):\n    orig_testbranch_head = repo_2_br_no_conf.log(branch='testbranch', return_contents=True)['head']\n    orig_masterbranch_head = repo_2_br_no_conf.log(branch='master', return_contents=True)['head']\n\n    cmt_hash = repo_2_br_no_conf.merge('merge commit', 'master', 'testbranch')\n\n    new_testbranch_head = repo_2_br_no_conf.log(branch='testbranch', return_contents=True)['head']\n    new_masterbranch_head = repo_2_br_no_conf.log(branch='master', return_contents=True)['head']\n\n    assert orig_testbranch_head == new_testbranch_head\n    assert orig_masterbranch_head != new_masterbranch_head\n    assert new_masterbranch_head == cmt_hash\n\n\ndef test_writer_checkout_3_way_merge_updates_head_commit_of_branches(repo_2_br_no_conf):\n    orig_testbranch_head = repo_2_br_no_conf.log(branch='testbranch', return_contents=True)['head']\n    orig_masterbranch_head = repo_2_br_no_conf.log(branch='master', return_contents=True)['head']\n\n    co = repo_2_br_no_conf.checkout(write=True, branch='master')\n    cmt_hash = co.merge('merge commit', 'testbranch')\n    assert cmt_hash == co.commit_hash\n    co.close()\n\n    new_testbranch_head = repo_2_br_no_conf.log(branch='testbranch', return_contents=True)['head']\n    new_masterbranch_head = repo_2_br_no_conf.log(branch='master', return_contents=True)['head']\n    assert orig_testbranch_head == new_testbranch_head\n    assert orig_masterbranch_head != new_masterbranch_head\n    assert new_masterbranch_head == cmt_hash\n\n\nclass TestArraysetSampleConflicts(object):\n\n    def test_conflict_additions_same_str_name_different_value(self, repo_2_br_no_conf):\n        newdata = np.arange(50)\n        newdata = newdata * 2\n\n        repo = repo_2_br_no_conf\n        co = repo.checkout(write=True, branch='master')\n        co.columns['dummy']['15'] = newdata\n        co.commit('commit on master with conflicting data')\n        co.close()\n\n        with pytest.raises(ValueError):\n            repo.merge('merge commit', 'master', 'testbranch')\n\n    def test_conflict_additions_same_int_name_different_value(self, repo_2_br_no_conf):\n        newdata = np.arange(50)\n        newdata = newdata * 2\n\n        repo = repo_2_br_no_conf\n        co = repo.checkout(write=True, branch='master')\n        co.columns['dummy'][15] = newdata\n        co.commit('commit on master with conflicting data')\n        co.close()\n\n        with pytest.raises(ValueError):\n            repo.merge('merge commit', 'master', 'testbranch')\n\n    def test_conflict_additions_same_str_and_int_name_different_value(self, repo_2_br_no_conf):\n        newdata = np.arange(50)\n        newdata = newdata * 2\n\n        repo = repo_2_br_no_conf\n        co = repo.checkout(write=True, branch='master')\n        co.columns['dummy'][15] = newdata\n        co.columns['dummy']['15'] = newdata\n        co.commit('commit on master with conflicting data')\n        co.close()\n\n        with pytest.raises(ValueError):\n            repo.merge('merge commit', 'master', 'testbranch')\n\n    def test_no_conflict_additions_same_name_and_value(self, repo_2_br_no_conf):\n        newdata = np.arange(50)\n        newdata[:] = 15\n\n        repo = repo_2_br_no_conf\n        co = repo.checkout(write=True, branch='master')\n        co.columns['dummy']['15'] = newdata\n        co.columns['dummy'][15] = newdata\n        
co.commit('commit on master with same value data')\n        co.close()\n\n        cmt_hash = repo.merge('merge commit', 'master', 'testbranch')\n        co = repo.checkout(commit=cmt_hash)\n        aset = co.columns['dummy']\n        assert np.allclose(aset['15'], newdata)\n        assert np.allclose(aset[15], newdata)\n        co.close()\n\n    def test_conflict_mutations_same_name_different_value(self, repo_2_br_no_conf):\n        repo = repo_2_br_no_conf\n        co = repo.checkout(write=True, branch='master')\n        newdata = np.arange(50)\n        co.columns['dummy']['0'] = newdata\n        co.commit('commit on master with conflicting data')\n        co.close()\n\n        co = repo.checkout(write=True, branch='testbranch')\n        newdata = newdata * 2\n        co.columns['dummy']['0'] = newdata\n        co.commit('commit on testbranch with conflicting data')\n        co.close()\n\n        with pytest.raises(ValueError):\n            repo.merge('merge commit', 'master', 'testbranch')\n\n    def test_conflict_mutation_and_removal(self, repo_2_br_no_conf):\n        repo = repo_2_br_no_conf\n        co = repo.checkout(write=True, branch='master')\n        newdata = np.arange(50)\n        co.columns['dummy']['0'] = newdata\n        co.commit('commit on master with conflicting data')\n        co.close()\n\n        co = repo.checkout(write=True, branch='testbranch')\n        del co.columns['dummy']['0']\n        co.commit('commit on testbranch with removal')\n        co.close()\n\n        with pytest.raises(ValueError):\n            repo.merge('merge commit', 'master', 'testbranch')\n\n    def test_no_conflict_both_removal(self, repo_2_br_no_conf):\n        repo = repo_2_br_no_conf\n        co = repo.checkout(write=True, branch='master')\n        del co.columns['dummy']['0']\n        del co.columns['dummy'][21]\n        co.commit('commit on master with removal')\n        co.close()\n\n        co = repo.checkout(write=True, branch='testbranch')\n        del co.columns['dummy']['0']\n        del co.columns['dummy'][10]\n        co.commit('commit on testbranch with removal')\n        co.close()\n\n        cmt_hash = repo.merge('merge commit', 'master', 'testbranch')\n        co = repo.checkout(commit=cmt_hash)\n        aset = co.columns['dummy']\n        assert '0' not in aset\n        assert len(aset) == 47\n"
  },
  {
    "path": "tests/test_optimized_utils.py",
    "content": "import pytest\n\nfrom hangar.optimized_utils import SizedDict\n\n\ndef test_sizeddict_maxsize_property():\n    d = SizedDict(maxsize=5)\n    assert d.maxsize == 5\n    d2 = SizedDict(maxsize=10)\n    assert d2.maxsize == 10\n\n\ndef test_sizeddict_setitem_no_overflow_retains_keys_and_len():\n    d = SizedDict(maxsize=5)\n    for i in range(5):\n        d[i] = i\n\n    assert len(d) == 5\n    for i in range(5):\n        assert i in d\n        assert d[i] == i\n\n\ndef test_sizeddict_setitem_overflow_truncates_keys_and_len():\n    d = SizedDict(maxsize=5)\n    for i in range(10):\n        d[i] = i\n\n    assert len(d) == 5\n    for i in range(0, 5):\n        assert i not in d\n        with pytest.raises(KeyError):\n            _ = d[i]\n    for i in range(5, 10):\n        assert i in d\n        assert d[i] == i\n\n\ndef test_sizeddict_update_no_overflow_retains_keys_and_len():\n    d = SizedDict(maxsize=5)\n    inp = {i: i for i in range(5)}\n    d.update(inp)\n\n    assert len(d) == 5\n    for i in range(5):\n        assert i in d\n        assert d[i] == i\n\n\ndef test_sizeddict_updateoverflow_truncates_keys_and_len():\n    d = SizedDict(maxsize=5)\n    inp = {i: i for i in range(10)}\n    d.update(inp)\n\n    assert len(d) == 5\n    for i in range(0, 5):\n        assert i not in d\n        with pytest.raises(KeyError):\n            _ = d[i]\n    for i in range(5, 10):\n        assert i in d\n        assert d[i] == i\n\n\ndef test_sizeddict_get_returns_default_on_missing_key():\n    d = SizedDict()\n    res = d.get('doesnotexist')\n    assert res is None\n    res = d.get('doesnotexist', default='foo')\n    assert res == 'foo'\n\n\ndef test_sizeddict_delitem():\n    d = SizedDict(maxsize=5)\n    inp = {i: i for i in range(5)}\n    d.update(inp)\n\n    del d[3]\n    assert len(d) == 4\n    assert 3 not in d\n\n    d['new'] = 'foo'\n    assert len(d) == 5\n    assert 'new' in d\n\n\ndef test_sizeddict_pop():\n    d = SizedDict(maxsize=5)\n    inp = {i: i for i in range(5)}\n    d.update(inp)\n\n    res = d.pop(0)\n    assert res == 0\n    assert len(d) == 4\n    res = d.pop('doesnotexist', default='foo')\n    assert res == 'foo'\n    assert len(d) == 4\n\n\ndef test_sizeddict_popitem():\n    d = SizedDict(maxsize=5)\n    inp = {i: i for i in range(5)}\n    d.update(inp)\n\n    res = d.popitem()\n    assert res == (4, 4)\n    assert len(d) == 4\n    res = d.popitem()\n    assert res == (3, 3)\n    assert len(d) == 3\n\n    d['foo'] = 'bar'\n    assert len(d) == 4\n    res = d.popitem()\n    assert res == ('foo', 'bar')\n    assert len(d) == 3\n\n\ndef test_sizeddict_keys():\n    d = SizedDict(maxsize=5)\n    inp = {str(i): i for i in range(5)}\n    d.update(inp)\n\n    assert list(d.keys()) == list(inp.keys())\n    for res_k, expected_k in zip(d.keys(), inp.keys()):\n        assert res_k == expected_k\n\n\ndef test_sizeddict_values():\n    d = SizedDict(maxsize=5)\n    inp = {str(i): i for i in range(5)}\n    d.update(inp)\n\n    assert list(d.values()) == list(inp.values())\n    for res_v, expected_v in zip(d.values(), inp.values()):\n        assert res_v == expected_v\n\n\ndef test_sizeddict_keys():\n    d = SizedDict(maxsize=5)\n    inp = {str(i): i for i in range(5)}\n    d.update(inp)\n\n    assert list(d.items()) == list(inp.items())\n    for res_kv, expected_kv in zip(d.items(), inp.items()):\n        assert res_kv == expected_kv\n\n\ndef test_sizeddict_setdefault():\n    d = SizedDict(maxsize=5)\n    inp = {i: i for i in range(5)}\n    d.update(inp)\n\n    res = 
d.setdefault('doesnotexist')\n    assert res is None\n    assert len(d) == 5\n    assert 'doesnotexist' in d\n    assert d['doesnotexist'] is None\n\n    res = d.setdefault('doesnotexist2', default=True)\n    assert res is True\n    assert len(d) == 5\n    assert 'doesnotexist2' in d\n    assert d['doesnotexist2'] is True\n\n    res = d.setdefault(2, default=True)\n    assert res == 2\n    assert len(d) == 5\n    assert 2 in d\n    assert d[2] == 2\n\n\ndef test_sizeddict_clear():\n    d = SizedDict(maxsize=5)\n    inp = {i: i for i in range(5)}\n    d.update(inp)\n\n    assert len(d) == 5\n    d.clear()\n    assert len(d) == 0\n    assert len(d._stack) == 0\n    assert len(d._data) == 0\n    assert d._stack_size == 0\n    assert d._maxsize == 5\n\n\ndef test_sizeddict_repr():\n    d = SizedDict(maxsize=5)\n    inp = {i: i for i in range(5)}\n    d.update(inp)\n\n    expected = repr(inp)\n    res = repr(d)\n    assert res == expected\n\n\ndef test_sizeddict_is_pickleable():\n    import pickle\n\n    d = SizedDict(maxsize=5)\n    inp = {i: i for i in range(5)}\n    d.update(inp)\n\n    pick = pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL)\n    res = pickle.loads(pick)\n\n    assert res._maxsize == d._maxsize\n    assert res._stack == d._stack\n    assert res._stack_size == d._stack_size\n    assert res._data == d._data\n"
  },
  {
    "path": "tests/test_remote_serialize.py",
    "content": "import pytest\n\nimport numpy as np\n\n\nparam_shapes = [(1,), (1000,), (1, 1), (623, 3, 5), (2, 4, 5, 6, 1, 3)]\nparam_dtypes = [np.uint8, np.float32, np.float64, np.int32]\nparam_digest = ['0=digesta', '0=digestaaaaaa', '2=digestaaaaaaaaaaaaaaaaaaaaaaaaaa']\nparam_schema = ['schemaa', 'schemaaaaaaaaa', 'schemaaaaaaaaaaaaaaaa']\n\n\ndef assert_array_equal(arr, arr2):\n    assert np.array_equal(arr, arr2)\n    assert arr.dtype == arr2.dtype\n\n\n@pytest.fixture(scope='module', params=param_shapes)\ndef arr_shape(request):\n    return request.param\n\n@pytest.fixture(scope='module', params=param_dtypes)\ndef arr_dtype(request):\n    return request.param\n\n@pytest.fixture(scope='module', params=param_digest)\ndef ident_digest(request):\n    return request.param\n\n@pytest.fixture(scope='module', params=param_schema)\ndef ident_schema(request):\n    return request.param\n\n\n@pytest.fixture(scope='module')\ndef array_testcase(arr_shape, arr_dtype):\n    arr = 200 * np.random.random_sample(arr_shape) - 100\n    return arr.astype(arr_dtype)\n\n\n@pytest.fixture(scope='module', params=[\n    'hello', ' world', 'how are you today! ',\n    '325', f'{\"\".join([str(i) for i in range(100)])}',\n    'o\\n\\x01'\n])\ndef str_testcase(request):\n    return request.param\n\n\n@pytest.fixture(scope='module', params=[\n    b'hello', b' world', b'how are you today! ',\n    b'325', b'\\x01\\x00\\x12\\x14'\n])\ndef bytes_testcase(request):\n    return request.param\n\n\n@pytest.fixture(scope='module')\ndef ident_testcase(ident_digest, ident_schema):\n    return (ident_digest, ident_schema)\n\n\ndef test_serialize_deserialize_array(array_testcase):\n    from hangar.remote.chunks import _serialize_arr\n    from hangar.remote.chunks import _deserialize_arr\n\n    raw = _serialize_arr(array_testcase)\n    res = _deserialize_arr(raw)\n    assert_array_equal(array_testcase, res)\n\n\ndef test_serialize_deserialize_str(str_testcase):\n    from hangar.remote.chunks import _serialize_str, _deserialize_str\n    raw = _serialize_str(str_testcase)\n    res = _deserialize_str(raw)\n    assert res == str_testcase\n\n\ndef test_serialize_deserialize_bytes(bytes_testcase):\n    from hangar.remote.chunks import _serialize_bytes, _deserialize_bytes\n    raw = _serialize_bytes(bytes_testcase)\n    res = _deserialize_bytes(raw)\n    assert bytes_testcase == res\n\n\n@pytest.mark.parametrize('expected_dtype_code,data', [\n    (0, np.array([0, 1, 2, 3, 4])),\n    (2, 'i am string'),\n    (3, b'i am bytes')\n])\ndef test_serialize_deserialize_data(expected_dtype_code, data):\n    from hangar.remote.chunks import serialize_data, deserialize_data\n\n    dtcode, raw = serialize_data(data)\n    res = deserialize_data(dtype_code=dtcode, raw_data=raw)\n    assert dtcode == expected_dtype_code\n    if isinstance(res, np.ndarray):\n        assert_array_equal(data, res)\n    elif isinstance(res, (str, bytes)):\n        assert data == res\n    else:\n        raise TypeError(data)\n\n\ndef test_serialize_deserialize_ident(ident_testcase):\n    from hangar.remote.chunks import serialize_ident\n    from hangar.remote.chunks import deserialize_ident\n    from hangar.remote.chunks import DataIdent\n\n    digest, schema = ident_testcase\n    raw = serialize_ident(digest, schema)\n    res = deserialize_ident(raw)\n    assert isinstance(res, DataIdent)\n    assert res.digest == digest\n    assert res.schema == schema\n\n\ndef test_serialize_deserialize_record(array_testcase, ident_testcase):\n    from hangar.remote.chunks import 
serialize_record\n    from hangar.remote.chunks import deserialize_record\n    from hangar.remote.chunks import DataRecord\n\n    digest, schema = ident_testcase\n    raw = serialize_record(array_testcase, digest, schema)\n    res = deserialize_record(raw)\n    assert isinstance(res, DataRecord)\n    assert_array_equal(res.data, array_testcase)\n    assert res.digest == digest\n    assert res.schema == schema\n\n\n@pytest.mark.parametrize('nrecords', [1, 25])\ndef test_serialize_deserialize_record_pack(ident_testcase, nrecords):\n    from hangar.remote.chunks import serialize_record\n    from hangar.remote.chunks import serialize_record_pack\n    from hangar.remote.chunks import deserialize_record\n    from hangar.remote.chunks import deserialize_record_pack\n    from hangar.remote.chunks import DataRecord\n\n    idx = 0\n    ArrList, RecList = [], []\n    digest, schema = ident_testcase\n    for shape in param_shapes:\n        for dtype in param_dtypes:\n            arr = 200 * np.random.random_sample(shape) + 100\n            arr = arr.astype(dtype)\n            digest = f'digest{str(idx) + str(digest)}'\n            schema = f'schema{str(idx) + str(schema)}'\n            idx += 1\n\n            ArrList.append((arr, digest, schema))\n            RecList.append(serialize_record(arr, digest, schema))\n\n    rawpack = serialize_record_pack(RecList)\n    reslist = deserialize_record_pack(rawpack)\n\n    assert reslist == RecList\n\n    for rawres, origRec in zip(reslist, ArrList):\n        resRec = deserialize_record(rawres)\n        assert isinstance(resRec, DataRecord)\n        assert_array_equal(resRec.data, origRec[0])\n        assert resRec.digest == origRec[1]\n        assert resRec.schema == origRec[2]\n\n\ndef test_serialize_deserialize_ident_digest_field_only(ident_testcase):\n    from hangar.remote.chunks import serialize_ident\n    from hangar.remote.chunks import deserialize_ident\n    from hangar.remote.chunks import DataIdent\n\n    digest, schema = ident_testcase\n    raw = serialize_ident(digest, '')\n    res = deserialize_ident(raw)\n    assert isinstance(res, DataIdent)\n    assert res.digest == digest\n    assert res.schema == ''\n\n\ndef test_serialize_deserialize_ident_schema_field_only(ident_testcase):\n    from hangar.remote.chunks import serialize_ident\n    from hangar.remote.chunks import deserialize_ident\n    from hangar.remote.chunks import DataIdent\n\n    digest, schema = ident_testcase\n    raw = serialize_ident('', schema)\n    res = deserialize_ident(raw)\n    assert isinstance(res, DataIdent)\n    assert res.digest == ''\n    assert res.schema == schema\n\n\n@pytest.mark.parametrize('nrecords', [1, 25])\ndef test_serialize_deserialize_ident_only_record_pack(ident_testcase, nrecords):\n    from hangar.remote.chunks import serialize_ident\n    from hangar.remote.chunks import deserialize_ident\n    from hangar.remote.chunks import serialize_record_pack\n    from hangar.remote.chunks import deserialize_record_pack\n    from hangar.remote.chunks import DataIdent\n\n    idx = 0\n    IdentList, RawList = [], []\n    digest, schema = ident_testcase\n    for idx in range(nrecords):\n        digest = f'digest{str(idx) + str(digest)}'\n        schema = f'schema{str(idx) + str(schema)}'\n\n        IdentList.append((digest, schema))\n        RawList.append(serialize_ident(digest, schema))\n\n    packed_raw = serialize_record_pack(RawList)\n    unpacked_raw = deserialize_record_pack(packed_raw)\n\n    assert unpacked_raw == RawList\n\n    for raw, origIdent in 
zip(unpacked_raw, IdentList):\n        resIdent = deserialize_ident(raw)\n        assert isinstance(resIdent, DataIdent)\n        assert resIdent.digest == origIdent[0]\n        assert resIdent.schema == origIdent[1]\n\n\n@pytest.mark.parametrize('nrecords', [1, 25])\ndef test_serialize_deserialize_ident_only_digest_only_record_pack(ident_testcase, nrecords):\n    from hangar.remote.chunks import serialize_ident\n    from hangar.remote.chunks import deserialize_ident\n    from hangar.remote.chunks import serialize_record_pack\n    from hangar.remote.chunks import deserialize_record_pack\n    from hangar.remote.chunks import DataIdent\n\n    idx = 0\n    IdentList, RawList = [], []\n    digest, schema = ident_testcase\n    for idx in range(nrecords):\n        digest = f'digest{str(idx)+str(digest)}'\n        schema = f''\n\n        IdentList.append((digest, schema))\n        RawList.append(serialize_ident(digest, schema))\n\n    packed_raw = serialize_record_pack(RawList)\n    unpacked_raw = deserialize_record_pack(packed_raw)\n\n    assert unpacked_raw == RawList\n\n    for raw, origIdent in zip(unpacked_raw, IdentList):\n        resIdent = deserialize_ident(raw)\n        assert isinstance(resIdent, DataIdent)\n        assert resIdent.digest == origIdent[0]\n        assert resIdent.schema == origIdent[1]\n\n\n@pytest.mark.parametrize('nrecords', [1, 25])\ndef test_serialize_deserialize_ident_only_schema_only_record_pack(ident_testcase, nrecords):\n    from hangar.remote.chunks import serialize_ident\n    from hangar.remote.chunks import deserialize_ident\n    from hangar.remote.chunks import serialize_record_pack\n    from hangar.remote.chunks import deserialize_record_pack\n    from hangar.remote.chunks import DataIdent\n\n    idx = 0\n    IdentList, RawList = [], []\n    digest, schema = ident_testcase\n    for idx in range(nrecords):\n        digest = f''\n        schema = f'schema{str(idx)+str(schema)}'\n\n        IdentList.append((digest, schema))\n        RawList.append(serialize_ident(digest, schema))\n\n    packed_raw = serialize_record_pack(RawList)\n    unpacked_raw = deserialize_record_pack(packed_raw)\n\n    assert unpacked_raw == RawList\n\n    for raw, origIdent in zip(unpacked_raw, IdentList):\n        resIdent = deserialize_ident(raw)\n        assert isinstance(resIdent, DataIdent)\n        assert resIdent.digest == origIdent[0]\n        assert resIdent.schema == origIdent[1]\n"
  },
  {
    "path": "tests/test_remotes.py",
    "content": "import pytest\n\nimport numpy as np\nimport time\nfrom os.path import join as pjoin\nfrom os import mkdir\nfrom random import randint\nimport platform\n\n\n\n@pytest.mark.parametrize('name', [\n    'invalid\\n', '\\ninvalid', 'inv\\name', 'inva/lid', 12, ' try', 'and this ',\n    'VeryLongNameIsInvalidOver64CharactersNotAllowedVeryLongNameIsInva'])\ndef test_cannot_add_invalid_remote_names(repo, name):\n    with pytest.raises(ValueError):\n        repo.remote.add(name, 'localhost:50051')\n\n\ndef test_list_all_remotes_works(repo):\n\n    remote_spec1 = repo.remote.add('origin', 'test')\n    currentRemotes = repo.remote.list_all()\n\n    assert len(currentRemotes) == 1\n    currentSpec = currentRemotes[0]\n    assert len(currentSpec) == 2\n    assert currentSpec.name == 'origin'\n    assert currentSpec.address == 'test'\n\n    remote_spec2 = repo.remote.add('origin2', 'test2')\n    currentRemotes = repo.remote.list_all()\n\n    assert len(currentRemotes) == 2\n    currentSpec = currentRemotes[0]\n    assert currentSpec == remote_spec1\n    assert len(currentSpec) == 2\n    assert currentSpec.name == 'origin'\n    assert currentSpec.address == 'test'\n    currentSpec = currentRemotes[1]\n    assert currentSpec == remote_spec2\n    assert currentSpec.name == 'origin2'\n    assert currentSpec.address == 'test2'\n\n\ndef test_cannot_add_remote_twice_with_same_name(repo):\n    remote_spec = repo.remote.add('origin', 'test')\n    assert remote_spec.name == 'origin'\n    assert remote_spec.address == 'test'\n    with pytest.raises(ValueError):\n        repo.remote.add('origin', 'new')\n\n\ndef test_remote_remote_which_does_not_exist_fails(repo):\n    with pytest.raises(ValueError):\n        repo.remote.remove('origin')\n\n\ndef test_can_update_remote_after_removal(repo):\n    remote_spec = repo.remote.add('origin', 'test')\n    assert remote_spec.name == 'origin'\n    assert remote_spec.address == 'test'\n    channel_address_removed = repo.remote.remove('origin')\n    assert channel_address_removed.name == 'origin'\n    assert channel_address_removed.address == 'test'\n    new_name = repo.remote.add('origin', 'test2')\n    assert new_name.name == 'origin'\n    assert new_name.address == 'test2'\n\n\ndef test_server_is_started_multiple_times_via_ping_pong(server_instance,\n                                                        aset_samples_initialized_repo):\n    # start multiple times and test that pings go through multiple times\n    aset_samples_initialized_repo.remote.add('origin', server_instance)\n    roundTripTime = aset_samples_initialized_repo.remote.ping('origin')\n    assert isinstance(roundTripTime, float)\n\n\n@pytest.mark.parametrize('nCommits,nSamples', [[1, 10], [5, 10]])\ndef test_push_and_clone_master_linear_history_multiple_commits(\n        server_instance, repo, managed_tmpdir, array5by7, nCommits, nSamples):\n    from hangar import Repository\n    from hangar.records.summarize import list_history\n\n    cmtList = []\n    co = repo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset', shape=(5, 7), dtype=np.float32)\n    for cIdx in range(nCommits):\n        if cIdx != 0:\n            co = repo.checkout(write=True)\n        sampList = []\n        with co.columns['writtenaset'] as d:\n            for prevKey in list(d.keys())[1:]:\n                del d[prevKey]\n            for sIdx in range(nSamples):\n                arr = np.random.randn(*array5by7.shape).astype(np.float32) * 100\n                d[str(sIdx)] = arr\n                
sampList.append(arr)\n        cmt = co.commit(f'commit number: {cIdx}')\n        cmtList.append((cmt, sampList))\n        co.close()\n    masterHist = list_history(repo._env.refenv, repo._env.branchenv, branch_name='master')\n\n    repo.remote.add('origin', server_instance)\n    push1 = repo.remote.push('origin', 'master')\n    assert push1 == 'master'\n\n    new_tmpdir = pjoin(managed_tmpdir, 'new')\n    mkdir(new_tmpdir)\n    newRepo = Repository(path=new_tmpdir, exists=False)\n    newRepo.clone('Test User', 'tester@foo.com', server_instance, remove_old=True)\n    assert newRepo.list_branches() == ['master', 'origin/master']\n    for cmt, sampList in cmtList:\n        with pytest.warns(UserWarning):\n            nco = newRepo.checkout(commit=cmt)\n        assert len(nco.columns) == 1\n        assert 'writtenaset' in nco.columns\n        assert len(nco.columns['writtenaset']) == len(sampList)\n\n        assert nco.columns['writtenaset'].contains_remote_references is True\n        remoteKeys = nco.columns['writtenaset'].remote_reference_keys\n        assert tuple([str(idx) for idx in range(len(sampList))]) == remoteKeys\n        for idx, _ in enumerate(sampList):\n            sIdx = str(idx)\n            assert sIdx in nco.columns['writtenaset']\n            with pytest.raises(FileNotFoundError):\n                shouldNotExist = nco.columns['writtenaset'][sIdx]\n        nco.close()\n    cloneMasterHist = list_history(newRepo._env.refenv, newRepo._env.branchenv, branch_name='master')\n    assert cloneMasterHist == masterHist\n    newRepo._env._close_environments()\n\n\n@pytest.mark.parametrize('nMasterCommits,nMasterSamples', [[1, 4], [5, 10]])\n@pytest.mark.parametrize('nDevCommits,nDevSamples', [[1, 3], [3, 5]])\ndef test_server_push_second_branch_with_new_commit(server_instance, repo,\n                                                   array5by7, nMasterCommits,\n                                                   nMasterSamples, nDevCommits,\n                                                   nDevSamples):\n\n    masterCmtList, devCmtList = [], []\n    co = repo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset', shape=(5, 7), dtype=np.float32)\n    for cIdx in range(nMasterCommits):\n        if cIdx != 0:\n            co = repo.checkout(write=True)\n        masterSampList = []\n        with co.columns['writtenaset'] as d:\n            for prevKey in list(d.keys())[1:]:\n                del d[prevKey]\n            for sIdx in range(nMasterSamples):\n                arr = np.random.randn(*array5by7.shape).astype(np.float32) * 100\n                d[str(sIdx)] = arr\n                masterSampList.append(arr)\n        cmt = co.commit(f'master commit number: {cIdx}')\n        masterCmtList.append((cmt, masterSampList))\n        co.close()\n\n    repo.remote.add('origin', server_instance)\n    push1 = repo.remote.push('origin', 'master')\n    assert push1 == 'master'\n\n    branch = repo.create_branch('testbranch')\n    for cIdx in range(nDevCommits):\n        co = repo.checkout(write=True, branch=branch.name)\n        devSampList = []\n        with co.columns['writtenaset'] as d:\n            for prevKey in list(d.keys())[1:]:\n                del d[prevKey]\n            for sIdx in range(nDevSamples):\n                arr = np.random.randn(*array5by7.shape).astype(np.float32) * 100\n                d[str(sIdx)] = arr\n                devSampList.append(arr)\n        cmt = co.commit(f'dev commit number: {cIdx}')\n        devCmtList.append((cmt, devSampList))\n        
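# close the write-enabled checkout now that this dev commit is recorded\n        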
co.close()\n\n    push2 = repo.remote.push('origin', branch.name)\n    assert push2 == branch.name\n\n\n@pytest.mark.parametrize('nMasterCommits,nMasterSamples', [[1, 4], [5, 10]])\n@pytest.mark.parametrize('nDevCommits,nDevSamples', [[1, 5], [3, 5]])\ndef test_server_push_second_branch_with_new_commit_then_clone_partial_fetch(\n        server_instance, repo, managed_tmpdir, array5by7, nMasterCommits,\n        nMasterSamples, nDevCommits, nDevSamples):\n    from hangar import Repository\n    from hangar.records.summarize import list_history\n\n    # Push master branch test\n    masterCmtList = []\n    co = repo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset', shape=(5, 7), dtype=np.float32)\n    for cIdx in range(nMasterCommits):\n        if cIdx != 0:\n            co = repo.checkout(write=True)\n        masterSampList = []\n        with co.columns['writtenaset'] as d:\n            for prevKey in list(d.keys())[1:]:\n                del d[prevKey]\n            for sIdx in range(nMasterSamples):\n                arr = np.random.randn(*array5by7.shape).astype(np.float32) * 100\n                d[str(sIdx)] = arr\n                masterSampList.append(arr)\n        cmt = co.commit(f'master commit number: {cIdx}')\n        masterCmtList.append((cmt, masterSampList))\n        co.close()\n\n    repo.remote.add('origin', server_instance)\n    push1 = repo.remote.push('origin', 'master')\n    assert push1 == 'master'\n    masterHist = list_history(repo._env.refenv, repo._env.branchenv, branch_name='master')\n\n    # Push dev branch test\n    devCmtList = []\n    branch = repo.create_branch('testbranch')\n    for cIdx in range(nDevCommits):\n        co = repo.checkout(write=True, branch=branch.name)\n        devSampList = []\n        with co.columns['writtenaset'] as d:\n            for prevKey in list(d.keys())[1:]:\n                del d[prevKey]\n            for sIdx in range(nDevSamples):\n                arr = np.random.randn(*array5by7.shape).astype(np.float32) * 100\n                d[str(sIdx)] = arr\n                devSampList.append(arr)\n        cmt = co.commit(f'dev commit number: {cIdx}')\n        devCmtList.append((cmt, devSampList))\n        co.close()\n\n    push2 = repo.remote.push('origin', branch.name)\n    assert push2 == branch.name\n    branchHist = list_history(repo._env.refenv, repo._env.branchenv, branch_name=branch.name)\n\n    # Clone test (master branch)\n    new_tmpdir = pjoin(managed_tmpdir, 'new')\n    mkdir(new_tmpdir)\n    newRepo = Repository(path=new_tmpdir, exists=False)\n    newRepo.clone('Test User', 'tester@foo.com', server_instance, remove_old=True)\n    assert newRepo.list_branches() == ['master', 'origin/master']\n    for cmt, sampList in masterCmtList:\n        with pytest.warns(UserWarning):\n            nco = newRepo.checkout(commit=cmt)\n        assert len(nco.columns) == 1\n        assert 'writtenaset' in nco.columns\n        assert len(nco.columns['writtenaset']) == nMasterSamples\n\n        assert nco.columns['writtenaset'].contains_remote_references is True\n        remoteKeys = nco.columns['writtenaset'].remote_reference_keys\n        assert tuple([str(idx) for idx in range(len(sampList))]) == remoteKeys\n        for idx, _ in enumerate(sampList):\n            sIdx = str(idx)\n            assert sIdx in nco.columns['writtenaset']\n            with pytest.raises(FileNotFoundError):\n                shouldNotExist = nco.columns['writtenaset'][sIdx]\n        nco.close()\n    cloneMasterHist = list_history(newRepo._env.refenv, 
newRepo._env.branchenv, branch_name='master')\n    assert cloneMasterHist == masterHist\n\n    # Fetch test\n    fetch = newRepo.remote.fetch('origin', branch=branch.name)\n    assert fetch == f'origin/{branch.name}'\n    assert newRepo.list_branches() == ['master', 'origin/master', f'origin/{branch.name}']\n    for cmt, sampList in devCmtList:\n\n        with pytest.warns(UserWarning):\n            nco = newRepo.checkout(commit=cmt)\n        assert len(nco.columns) == 1\n        assert 'writtenaset' in nco.columns\n        assert len(nco.columns['writtenaset']) == nDevSamples\n\n        assert nco.columns['writtenaset'].contains_remote_references is True\n        remoteKeys = nco.columns['writtenaset'].remote_reference_keys\n        assert tuple([str(idx) for idx in range(len(sampList))]) == remoteKeys\n        for idx, _ in enumerate(sampList):\n            sIdx = str(idx)\n            assert sIdx in nco.columns['writtenaset']\n            with pytest.raises(FileNotFoundError):\n                shouldNotExist = nco.columns['writtenaset'][sIdx]\n        nco.close()\n\n    cloneBranchHist = list_history(newRepo._env.refenv, newRepo._env.branchenv, branch_name=f'origin/{branch.name}')\n    assert cloneBranchHist == branchHist\n    newRepo._env._close_environments()\n\n\n@pytest.fixture(scope='class')\ndef array5by7_class():\n    return np.random.random((5, 7))\n\n@pytest.fixture(scope='class')\ndef two_branch_multi_commit_repo_class(server_instance_class, classrepo, array5by7_class):\n    from hangar.records.summarize import list_history\n\n    nMasterCommits = 2\n    nMasterSamples = 10\n    nDevCommits = 1\n    nDevSamples = 16\n\n    # Push master branch test\n    masterCmts = {}\n    co = classrepo.checkout(write=True)\n    co.add_ndarray_column(name='writtenaset', shape=(5, 7), dtype=np.float32)\n    co.add_ndarray_column(name='_two', shape=(20), dtype=np.float32)\n    co.add_str_column('str_col')\n    co.add_bytes_column('bytes_col')\n    for cIdx in range(nMasterCommits):\n        if cIdx != 0:\n            co = classrepo.checkout(write=True)\n        masterSampList1 = []\n        masterSampList2 = []\n        masterSampList3 = []\n        masterSampList4 = []\n        with co.columns['writtenaset'] as d,\\\n                co.columns['_two'] as dd,\\\n                co.columns['str_col'] as scol, \\\n                co.columns['bytes_col'] as bcol:\n            for prevKey in list(d.keys())[1:]:\n                del d[prevKey]\n                del dd[prevKey]\n                del scol[prevKey]\n                del bcol[prevKey]\n\n            for sIdx in range(nMasterSamples):\n                arr1 = np.random.randn(*array5by7_class.shape).astype(np.float32) * 100\n                d[str(sIdx)] = arr1\n                masterSampList1.append(arr1)\n                arr2 = np.random.randn(20).astype(np.float32)\n                dd[str(sIdx)] = arr2\n                masterSampList2.append(arr2)\n                sval = f'strval master {cIdx} {sIdx}'\n                scol[str(sIdx)] = sval\n                masterSampList3.append(sval)\n                bval = f'bytesval master {cIdx} {sIdx}'.encode()\n                bcol[str(sIdx)] = bval\n                masterSampList4.append(bval)\n\n        cmt = co.commit(f'master commit number: {cIdx}')\n        masterCmts[cmt] = (masterSampList1, masterSampList2, masterSampList3, masterSampList4)\n        co.close()\n\n    classrepo.remote.add('origin', server_instance_class)\n    push1 = classrepo.remote.push('origin', 'master')\n    assert push1 
== 'master'\n    masterHist = list_history(classrepo._env.refenv, classrepo._env.branchenv, branch_name='master')\n\n    # Push dev branch test\n    devCmts = masterCmts.copy()\n    branch = classrepo.create_branch('testbranch')\n    for cIdx in range(nDevCommits):\n        co = classrepo.checkout(write=True, branch=branch.name)\n        devSampList1 = []\n        devSampList2 = []\n        devSampList3 = []\n        devSampList4 = []\n        with co.columns['writtenaset'] as d,\\\n                co.columns['_two'] as dd,\\\n                co.columns['str_col'] as scol, \\\n                co.columns['bytes_col'] as bcol:\n            for prevKey in list(d.keys())[1:]:\n                del d[prevKey]\n                del dd[prevKey]\n                del scol[prevKey]\n                del bcol[prevKey]\n\n            for sIdx in range(nDevSamples):\n                arr1 = np.random.randn(*array5by7_class.shape).astype(np.float32) * 100\n                d[str(sIdx)] = arr1\n                devSampList1.append(arr1)\n                arr2 = np.random.randn(20).astype(np.float32)\n                dd[str(sIdx)] = arr2\n                devSampList2.append(arr2)\n                sval = f'strval dev {cIdx} {sIdx}'\n                scol[str(sIdx)] = sval\n                devSampList3.append(sval)\n                bval = f'bytesval dev {cIdx} {sIdx}'.encode()\n                bcol[str(sIdx)] = bval\n                devSampList4.append(bval)\n\n        cmt = co.commit(f'dev commit number: {cIdx}')\n        devCmts[cmt] = (devSampList1, devSampList2, devSampList3, devSampList4)\n        co.close()\n\n    push2 = classrepo.remote.push('origin', branch.name)\n    assert push2 == branch.name\n    branchHist = list_history(classrepo._env.refenv, classrepo._env.branchenv, branch_name=branch.name)\n\n    yield branch, branchHist, devCmts, masterHist, server_instance_class\n    pass\n\n\nclass TestLargeRemoteServer:\n\n    @pytest.mark.filterwarnings('ignore::UserWarning')\n    @pytest.mark.parametrize('fetchAsetns', [\n        None, ('writtenaset',), ('_two',), ('str_col',), ('bytes_col',),\n    ])\n    @pytest.mark.parametrize('fetchBranch', [None, 'testbranch'])\n    @pytest.mark.parametrize('fetchCommit', [None, 'ma'])\n    @pytest.mark.parametrize('fetchAll_history', [False, True])\n    def test_server_push_two_branch_then_clone_fetch_data_options(\n            self, two_branch_multi_commit_repo_class, managed_tmpdir_class, array5by7_class,\n            fetchBranch, fetchCommit, fetchAsetns, fetchAll_history, tmp_path_factory):\n        from hangar import Repository\n        from operator import eq\n\n        branch, branchHist, devCmts, masterHist, server_instance = two_branch_multi_commit_repo_class\n\n        # Clone test (master branch)\n        _new_tmpdir = tmp_path_factory.mktemp('newclone', numbered=True)\n        new_tmpdir = str(_new_tmpdir)\n        newRepo = Repository(path=new_tmpdir, exists=False)\n        newRepo.clone('Test User', 'tester@foo.com', server_instance, remove_old=True)\n        newRepo.remote.fetch('origin', branch=branch.name)\n        newRepo.create_branch('testbranch', base_commit=branchHist['head'])\n        assert newRepo.list_branches() == ['master', 'origin/master', f'origin/{branch.name}', branch.name]\n\n        # ------------------ format arguments depending on options -----------------\n\n        kwargs = {\n            'column_names': fetchAsetns,\n            'retrieve_all_history': fetchAll_history,\n        }\n        if fetchBranch is not None:\n            
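# pick which recorded history (dev branch vs master) the fetched\n            # commits will be checked against\n            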
func = branchHist if fetchBranch == 'testbranch' else masterHist\n            kwargs['branch'] = fetchBranch\n            kwargs['commit'] = None\n        else:\n            func = branchHist if fetchBranch == 'br' else masterHist\n            kwargs['branch'] = None\n            kwargs['commit'] = func['head']\n\n        if fetchAll_history is True:\n            commits_to_check = func['order']\n        else:\n            commits_to_check = [func['head']]\n\n        # ----------------------- retrieve data with desired options --------------\n\n        # get data\n        fetch_commits = newRepo.remote.fetch_data(remote='origin', **kwargs)\n        assert commits_to_check == fetch_commits\n\n        # ------------- check that you got everything you expected ----------------\n\n        for fCmt in fetch_commits:\n            co = newRepo.checkout(commit=fCmt)\n            assert co.commit_hash == fCmt\n\n            # when we are checking one aset only\n            if isinstance(fetchAsetns, tuple):\n                d = co.columns[fetchAsetns[0]]\n                # ensure we didn't fetch the other data simultaneously\n                ds1SampList, ds2SampList, ds3SampList, ds4SampList = devCmts[fCmt]\n                if fetchAsetns[0] == 'writtenaset':\n                    compare = ds1SampList\n                    cmp_func = np.allclose\n                elif fetchAsetns[0] == '_two':\n                    compare = ds2SampList\n                    cmp_func = np.allclose\n                elif fetchAsetns[0] == 'str_col':\n                    compare = ds3SampList\n                    cmp_func = eq\n                else:\n                    compare = ds4SampList\n                    cmp_func = eq\n\n                for idx, samp in enumerate(compare):\n                    assert cmp_func(samp, d[str(idx)])\n\n            # compare both asets at the same time\n            else:\n                d = co.columns['writtenaset']\n                dd = co.columns['_two']\n                str_col = co.columns['str_col']\n                bytes_col = co.columns['bytes_col']\n                ds1List, ds2List, ds3List, ds4List = devCmts[fCmt]\n                for idx, ds1ds2ds3ds4 in enumerate(zip(ds1List, ds2List, ds3List, ds4List)):\n                    ds1, ds2, ds3, ds4 = ds1ds2ds3ds4\n                    assert np.allclose(ds1, d[str(idx)])\n                    assert np.allclose(ds2, dd[str(idx)])\n                    assert ds3 == str_col[str(idx)]\n                    assert ds4 == bytes_col[str(idx)]\n            co.close()\n        newRepo._env._close_environments()\n\n\n@pytest.fixture(scope='class')\ndef two_multi_format_repo_class(server_instance_class, classrepo):\n\n    co = classrepo.checkout(write=True)\n    array_flat = co.add_ndarray_column(name='array_flat', shape=(5, 7), dtype=np.float32)\n    array_nested = co.add_ndarray_column(name='array_nested', shape=(20,), dtype=np.float32, contains_subsamples=True)\n    str_flat = co.add_str_column('str_flat')\n    str_nested = co.add_str_column('str_nested', contains_subsamples=True)\n    bytes_flat = co.add_bytes_column('bytes_flat')\n    bytes_nested = co.add_bytes_column('bytes_nested', contains_subsamples=True)\n\n    for i in range(5):\n        arr = np.random.randn(5, 7).astype(np.float32)\n        array_flat[i] = arr\n    for i in range(5):\n        data = {f'{idx}': np.random.randn(20).astype(np.float32) for idx in range(4)}\n        array_nested[i] = data\n\n    for i in range(5):\n        str_flat[i] = f'string_{i}' * (i + 1)\n    
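# nested (subsample) columns are written with a dict mapping subsample\n    # keys to values for each sample\n    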
for i in range(5):\n        data = {f'{idx}': f'string_{idx}' * (idx + 1) for idx in range(4)}\n        str_nested[i] = data\n\n    for i in range(5):\n        bytes_flat[i] = f'bytes_{i}'.encode() * (i + 1)\n    for i in range(5):\n        data = {f'{idx}': f'bytes_{idx}'.encode() * (idx + 1) for idx in range(4)}\n        bytes_nested[i] = data\n\n    cmt = co.commit('first commit')\n    co.close()\n    classrepo.remote.add('origin', server_instance_class)\n    classrepo.remote.push('origin', 'master')\n\n    yield cmt, server_instance_class\n    pass\n\n\nclass TestRemoteServerFetchDataSample:\n\n    @pytest.mark.filterwarnings('ignore::UserWarning')\n    @pytest.mark.parametrize('fetchOp', ['branch', 'commit'])\n    @pytest.mark.parametrize('column_name,keys', [\n        ('array_flat', 0),\n        ('array_flat', [0, 1]),\n        ('array_nested', 0),\n        ('array_nested', [0, 1]),\n        ('array_nested', [(0, '0')]),\n        ('array_nested', [(0, '0'), (1, '1')]),\n        ('array_nested', [0, (1, '1')]),\n        ('array_nested', [(0,)]),\n        ('array_nested', [(0, ...)]),\n        ('array_nested', [(0, ...), 1, (2, '2')]),\n        ('str_flat', 0),\n        ('str_flat', [0, 1]),\n        ('str_nested', 0),\n        ('str_nested', [0, 1]),\n        ('str_nested', [(0, '0')]),\n        ('str_nested', [(0, '0'), (1, '1')]),\n        ('str_nested', [0, (1, '1')]),\n        ('str_nested', [(0,)]),\n        ('str_nested', [(0, ...)]),\n        ('str_nested', [(0, ...), 1, (2, '2')]),\n        ('bytes_flat', 0),\n        ('bytes_flat', [0, 1]),\n        ('bytes_nested', 0),\n        ('bytes_nested', [0, 1]),\n        ('bytes_nested', [(0, '0')]),\n        ('bytes_nested', [(0, '0'), (1, '1')]),\n        ('bytes_nested', [0, (1, '1')]),\n        ('bytes_nested', [(0,)]),\n        ('bytes_nested', [(0, ...)]),\n        ('bytes_nested', [(0, ...), 1, (2, '2')]),\n    ])\n    def test_server_fetch_data_sample(\n            self, two_multi_format_repo_class, managed_tmpdir_class,\n            fetchOp, column_name, keys, tmp_path_factory\n    ):\n        from hangar import Repository\n\n        cmt, server_instance = two_multi_format_repo_class\n\n        # Clone test (master branch)\n        _new_tmpdir = tmp_path_factory.mktemp('newclone', numbered=True)\n        new_tmpdir = str(_new_tmpdir)\n        newRepo = Repository(path=new_tmpdir, exists=False)\n        newRepo.clone('Test User', 'tester@foo.com', server_instance, remove_old=True)\n\n        # ------------------ format arguments depending on options -----------------\n\n        kwargs = {\n            'column': column_name,\n            'samples': keys\n        }\n        if fetchOp == 'branch':\n            kwargs['branch'] = 'master'\n        elif fetchOp == 'commit':\n            kwargs['commit'] = cmt\n        else:\n            raise ValueError(f'fetchOp unknown: {fetchOp}')\n\n        fetch_commit = newRepo.remote.fetch_data_sample(remote='origin', **kwargs)\n        assert fetch_commit == cmt\n\n        co = newRepo.checkout()\n        try:\n            col = co[column_name]\n            if isinstance(keys, (list, tuple)):\n                if column_name.endswith('flat'):\n                    for key in keys:\n                        assert col[key] is not None\n                else:\n                    for sample in keys:\n                        if isinstance(sample, (list, tuple)):\n                            if len(sample) == 2:\n                                assert col[sample[0]][sample[1]] is not None\n       
                     elif len(sample) == 1:\n                                assert col[sample[0]][...] is not None\n                        else:\n                            assert col[sample][...] is not None\n        finally:\n            co.close()\n            newRepo._env._close_environments()\n\n    def test_server_fetch_data_sample_commit_not_existing(\n            self, two_multi_format_repo_class, managed_tmpdir_class, tmp_path_factory\n    ):\n        from hangar import Repository\n\n        cmt, server_instance = two_multi_format_repo_class\n\n        # Clone test (master branch)\n        _new_tmpdir = tmp_path_factory.mktemp('newclone', numbered=True)\n        new_tmpdir = str(_new_tmpdir)\n        newRepo = Repository(path=new_tmpdir, exists=False)\n        newRepo.clone('Test User', 'tester@foo.com', server_instance, remove_old=True)\n\n        with pytest.raises(ValueError, match='specified commit'):\n            newRepo.remote.fetch_data_sample(\n                remote='origin',\n                commit='DOESNOTEXISTCOMMIT',\n                column='array_flat',\n                samples=[0, 1])\n\n        newRepo._env._close_environments()\n\n    def test_server_fetch_data_sample_branch_not_existing(\n            self, two_multi_format_repo_class, managed_tmpdir_class, tmp_path_factory\n    ):\n        from hangar import Repository\n\n        cmt, server_instance = two_multi_format_repo_class\n\n        # Clone test (master branch)\n        _new_tmpdir = tmp_path_factory.mktemp('newclone', numbered=True)\n        new_tmpdir = str(_new_tmpdir)\n        newRepo = Repository(path=new_tmpdir, exists=False)\n        newRepo.clone('Test User', 'tester@foo.com', server_instance, remove_old=True)\n\n        with pytest.raises(ValueError, match='branch with name'):\n            newRepo.remote.fetch_data_sample(\n                remote='origin',\n                branch='DOESNOTEXISTBRANCH',\n                column='array_flat',\n                samples=[0, 1])\n\n        newRepo._env._close_environments()\n\n    def test_server_fetch_data_sample_branch_and_commit_args_passed_fails(\n            self, two_multi_format_repo_class, managed_tmpdir_class, tmp_path_factory\n    ):\n        from hangar import Repository\n\n        cmt, server_instance = two_multi_format_repo_class\n\n        # Clone test (master branch)\n        _new_tmpdir = tmp_path_factory.mktemp('newclone', numbered=True)\n        new_tmpdir = str(_new_tmpdir)\n        newRepo = Repository(path=new_tmpdir, exists=False)\n        newRepo.clone('Test User', 'tester@foo.com', server_instance, remove_old=True)\n\n        with pytest.raises(ValueError, match='``branch`` and ``commit``'):\n            newRepo.remote.fetch_data_sample(\n                remote='origin',\n                branch='master',  # actual value which might otherwise work\n                commit=cmt,  # actual value which might otherwise work\n                column='array_flat',\n                samples=[0, 1])\n\n        newRepo._env._close_environments()\n\n    def test_server_fetch_data_sample_not_existing_fails(\n            self, two_multi_format_repo_class, managed_tmpdir_class, tmp_path_factory\n    ):\n        from hangar import Repository\n\n        cmt, server_instance = two_multi_format_repo_class\n\n        # Clone test (master branch)\n        _new_tmpdir = tmp_path_factory.mktemp('newclone', numbered=True)\n        new_tmpdir = str(_new_tmpdir)\n        newRepo = Repository(path=new_tmpdir, exists=False)\n        newRepo.clone('Test 
User', 'tester@foo.com', server_instance, remove_old=True)\n\n        with pytest.raises(KeyError):\n            newRepo.remote.fetch_data_sample(\n                remote='origin',\n                branch='master',\n                column='array_flat',\n                samples=['DOESNOTEXIST'])\n\n        with pytest.raises(KeyError):\n            newRepo.remote.fetch_data_sample(\n                remote='origin',\n                branch='master',\n                column='array_nested',\n                samples=[(1, 'DOESNOTEXIST')])\n\n        with pytest.raises(KeyError):\n            newRepo.remote.fetch_data_sample(\n                remote='origin',\n                branch='master',\n                column='array_nested',\n                samples=[('DOESNOTEXIST', 0)])\n\n        newRepo._env._close_environments()\n\n    def test_server_fetch_data_sample_not_valid_type(\n            self, two_multi_format_repo_class, managed_tmpdir_class, tmp_path_factory\n    ):\n        from hangar import Repository\n\n        cmt, server_instance = two_multi_format_repo_class\n\n        # Clone test (master branch)\n        _new_tmpdir = tmp_path_factory.mktemp('newclone', numbered=True)\n        new_tmpdir = str(_new_tmpdir)\n        newRepo = Repository(path=new_tmpdir, exists=False)\n        newRepo.clone('Test User', 'tester@foo.com', server_instance, remove_old=True)\n\n        with pytest.raises(TypeError):\n            newRepo.remote.fetch_data_sample(\n                remote='origin',\n                branch='master',\n                column='array_flat',\n                samples=[b'BYTES_TYPE_NOT_VALID'])\n\n        with pytest.raises(ValueError, match='nested column specifier sequence'):\n            newRepo.remote.fetch_data_sample(\n                remote='origin',\n                branch='master',\n                column='array_nested',\n                samples=[(0, 1, 'ARRAY_NOT_VALID')])\n\n        newRepo._env._close_environments()\n\n\ndef test_push_unchanged_repo_makes_no_modifications(written_two_cmt_server_repo):\n    _, repo = written_two_cmt_server_repo\n    with pytest.warns(UserWarning):\n        branchName = repo.remote.push('origin', 'master')\n    assert branchName == 'master'\n\n\ndef test_fetch_unchanged_repo_makes_no_modifications(written_two_cmt_server_repo):\n    _, repo = written_two_cmt_server_repo\n    with pytest.warns(UserWarning):\n        branchName = repo.remote.fetch('origin', 'master')\n    assert branchName == 'master'\n\n\ndef test_fetch_newer_disk_repo_makes_no_modifications(written_two_cmt_server_repo):\n    _, repo = written_two_cmt_server_repo\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co['test_meta'][0] = 'lol'\n    co.commit('newer commit')\n    co.close()\n    with pytest.warns(UserWarning):\n        branchName = repo.remote.fetch('origin', 'master')\n    assert branchName == 'master'\n\n\ndef test_fetch_branch_which_does_not_exist_on_client_or_server_raises_rpc_error(written_two_cmt_server_repo):\n    import grpc\n    _, repo = written_two_cmt_server_repo\n    with pytest.raises(grpc.RpcError) as rpc_error:\n        repo.remote.fetch('origin', 'not-a-branch')\n    assert rpc_error.value._state.code == grpc.StatusCode.NOT_FOUND\n\n\ndef test_fetch_branch_on_client_which_does_not_exist_on_server_raises_rpc_error(written_two_cmt_server_repo):\n    import grpc\n    _, repo = written_two_cmt_server_repo\n    repo.create_branch('new-branch')\n    with pytest.raises(grpc.RpcError) as exc_info:\n        repo.remote.fetch('origin', 
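# note: 'new-branch' was just created locally and never pushed, so the server replies NOT_FOUND\n                          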
'new-branch')\n    assert exc_info.value._state.code == grpc.StatusCode.NOT_FOUND\n\n\ndef test_push_clone_three_way_merge(server_instance, repo_2_br_no_conf, managed_tmpdir):\n    from hangar import Repository\n\n    repo_2_br_no_conf.remote.add('origin', server_instance)\n    push1 = repo_2_br_no_conf.remote.push('origin', 'master')\n    assert push1 == 'master'\n    push2 = repo_2_br_no_conf.remote.push('origin', 'testbranch')\n    assert push2 == 'testbranch'\n\n    test_head = repo_2_br_no_conf.log(branch='testbranch', return_contents=True)['head']\n    master_head = repo_2_br_no_conf.log(branch='master', return_contents=True)['head']\n\n    merge_cmt = repo_2_br_no_conf.merge('merge commit', 'master', 'testbranch')\n    merge_head = repo_2_br_no_conf.log(branch='master', return_contents=True)['head']\n    merge_order = repo_2_br_no_conf.log(branch='master', return_contents=True)['order']\n    merge_push = repo_2_br_no_conf.remote.push('origin', 'master')\n    assert merge_push == 'master'\n    assert merge_head != master_head\n    assert merge_head != test_head\n\n    new_tmpdir = pjoin(managed_tmpdir, 'new')\n    mkdir(new_tmpdir)\n    newRepo = Repository(path=new_tmpdir, exists=False)\n    newRepo.clone('Test User', 'tester@foo.com', server_instance, remove_old=True)\n\n    clone_head = newRepo.log(branch='master', return_contents=True)['head']\n    clone_order = newRepo.log(branch='master', return_contents=True)['order']\n    assert clone_head == merge_head == merge_cmt\n    assert merge_order == clone_order\n    newRepo._env._close_environments()\n\n\n# -----------------------------------------------------------------------------\n\n\ndef test_push_restricted_with_right_username_password(server_instance_push_restricted, repo, managed_tmpdir):\n    from hangar import Repository\n\n    # Push master branch test\n    masterCmtList = []\n    co = repo.checkout(write=True)\n    co.add_ndarray_column(name='aset', shape=(50, 20), dtype=np.float32)\n    for cIdx in range(1):\n        if cIdx != 0:\n            co = repo.checkout(write=True)\n        masterSampList = []\n        with co.columns['aset'] as d:\n            for prevKey in list(d.keys())[1:]:\n                del d[prevKey]\n            for sIdx in range(70):\n                arr = np.random.randn(50, 20).astype(np.float32)\n                d[str(sIdx)] = arr\n                masterSampList.append(arr)\n        cmt = co.commit(f'master commit number: {cIdx}')\n        masterCmtList.append((cmt, masterSampList))\n        co.close()\n\n    repo.remote.add('origin', server_instance_push_restricted)\n    push1 = repo.remote.push('origin',\n                             'master',\n                             username='right_username',\n                             password='right_password')\n    assert push1 == 'master'\n\n    # Clone test (master branch)\n    new_tmpdir = pjoin(managed_tmpdir, 'new')\n    mkdir(new_tmpdir)\n    newRepo = Repository(path=new_tmpdir, exists=False)\n    newRepo.clone('Test User', 'tester@foo.com', server_instance_push_restricted, remove_old=True)\n    assert newRepo.list_branches() == ['master', 'origin/master']\n    for cmt, sampList in masterCmtList:\n        newRepo.remote.fetch_data('origin', commit=cmt)\n        nco = newRepo.checkout(commit=cmt)\n        assert len(nco.columns) == 1\n        assert 'aset' in nco.columns\n        assert len(nco.columns['aset']) == 70\n        for sIdx, samp in enumerate(sampList):\n            assert np.allclose(nco.columns['aset'][str(sIdx)], samp)\n        
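# close this commit's read checkout before verifying the next one\n        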
nco.close()\n    newRepo._env._close_environments()\n\n\ndef test_push_restricted_wrong_user_and_password(server_instance_push_restricted, repo, managed_tmpdir):\n\n    # Push master branch test\n    masterCmtList = []\n    co = repo.checkout(write=True)\n    co.add_ndarray_column(name='aset', shape=(50, 20), dtype=np.float32)\n    for cIdx in range(1):\n        if cIdx != 0:\n            co = repo.checkout(write=True)\n        masterSampList = []\n        with co.columns['aset'] as d:\n            for prevKey in list(d.keys())[1:]:\n                del d[prevKey]\n            for sIdx in range(70):\n                arr = np.random.randn(50, 20).astype(np.float32)\n                d[str(sIdx)] = arr\n                masterSampList.append(arr)\n        cmt = co.commit(f'master commit number: {cIdx}')\n        masterCmtList.append((cmt, masterSampList))\n        co.close()\n\n    repo.remote.add('origin', server_instance_push_restricted)\n    # the push return value is irrelevant here; each call must raise PermissionError\n    with pytest.raises(PermissionError):\n        repo.remote.push('origin',\n                         'master',\n                         username='wrong_username',\n                         password='right_password')\n\n    with pytest.raises(PermissionError):\n        repo.remote.push('origin',\n                         'master',\n                         username='right_username',\n                         password='wrong_password')\n\n    with pytest.raises(PermissionError):\n        repo.remote.push('origin',\n                         'master',\n                         username='wrong_username',\n                         password='wrong_password')\n"
  },
  {
    "path": "tests/test_repo_integrity_verification.py",
    "content": "import pytest\n\nimport numpy as np\n\n\n@pytest.fixture()\ndef diverse_repo(repo):\n    co = repo.checkout(write=True)\n    co.add_ndarray_column('test', prototype=np.arange(10))\n    co.add_str_column('test_meta')\n    co.add_bytes_column('test_bytes')\n    co.columns['test'][0] = np.arange(10)\n    co.columns['test'][1] = np.arange(10) + 1\n    co.columns['test'][2] = np.arange(10) + 2\n    co.columns['test'][3] = np.arange(10) + 3\n    co.columns['test'][4] = np.arange(10) + 4\n    co['test_meta']['hi'] = 'foo'\n    co['test_meta']['aea'] = 'eeae'\n    co['test_bytes']['lol'] = b'foo bytes'\n    co.commit('hello world')\n\n    sample_trimg = np.arange(50).reshape(5, 10).astype(np.uint8)\n    sample_trlabel = np.array([0], dtype=np.int64)\n    sample_vimg = np.zeros(50).reshape(5, 10).astype(np.uint16)\n    sample_vlabel = np.array([1], dtype=np.int32)\n\n    co.close()\n    repo.create_branch('dev')\n    co = repo.checkout(write=True, branch='dev')\n    dset_trlabels = co.add_ndarray_column(name='train_labels', prototype=sample_trlabel)\n    dset_trimgs = co.add_ndarray_column('train_images', prototype=sample_trimg, backend='01')\n    dset_trlabels[0] = sample_trlabel\n    dset_trlabels[1] = sample_trlabel + 1\n    dset_trlabels[2] = sample_trlabel + 2\n    dset_trimgs[0] = sample_trimg\n    dset_trimgs[1] = sample_trimg + 1\n    dset_trimgs[2] = sample_trimg + 2\n    co.commit('second on dev')\n    co.close()\n\n    co = repo.checkout(write=True, branch='master')\n    dset_vimgs = co.add_ndarray_column('valid_images', prototype=sample_vimg)\n    dset_vlabels = co.add_ndarray_column('valid_labels', prototype=sample_vlabel)\n    dset_vlabels[0] = sample_vlabel\n    dset_vlabels[1] = sample_vlabel + 1\n    dset_vlabels[2] = sample_vlabel + 2\n    dset_vimgs[0] = sample_vimg\n    dset_vimgs[1] = sample_vimg + 1\n    dset_vimgs[2] = sample_vimg + 2\n    co['test_meta']['second'] = 'on master now'\n    co['test_bytes']['second'] = b'on master now'\n    co.commit('second on master')\n    co.close()\n\n    base = repo.merge('merge commit', 'master', 'dev')\n    repo.create_branch('newbranch', base_commit=base)\n    co = repo.checkout(write=True, branch='master')\n    co['test_meta']['newmeta'] = 'wow'\n    co['test_bytes']['newbytes'] = b'new bytesdata'\n    co.commit('on master after merge')\n    co.close()\n\n    co = repo.checkout(write=True, branch='newbranch')\n    ds_trimgs = co.columns['train_images']\n    ds_trlabels = co.columns['train_labels']\n    ds_trlabels[3] = sample_trlabel + 3\n    ds_trlabels[4] = sample_trlabel + 4\n    ds_trlabels[5] = sample_trlabel + 5\n    ds_trimgs[3] = sample_trimg + 3\n    ds_trimgs[4] = sample_trimg + 4\n    ds_trimgs[5] = sample_trimg + 5\n    co.commit('on newdev after merge')\n    co.close()\n\n    base = repo.merge('new merge commit', 'master', 'newbranch')\n    return repo\n\n\ndef test_verify_correct(diverse_repo):\n    assert diverse_repo.verify_repo_integrity() is True\n\n\nclass TestVerifyCommitRefDigests(object):\n\n    def test_remove_array_digest_is_caught(self, diverse_repo):\n        from hangar.records import hashs\n        from hangar.diagnostics.integrity import _verify_commit_ref_digests_exist\n\n        hq = hashs.HashQuery(diverse_repo._env.hashenv)\n        keys_to_remove = list(hq.gen_all_hash_keys_db())\n\n        for key_removed in keys_to_remove:\n            with diverse_repo._env.hashenv.begin(write=True) as txn:\n                val_removed = txn.get(key_removed)\n                
txn.delete(key_removed)\n\n            with pytest.raises(RuntimeError):\n                _verify_commit_ref_digests_exist(hashenv=diverse_repo._env.hashenv,\n                                                 refenv=diverse_repo._env.refenv)\n\n            with diverse_repo._env.hashenv.begin(write=True) as txn:\n                txn.put(key_removed, val_removed)\n\n    def test_remove_schema_digest_is_caught(self, diverse_repo):\n        from hangar.records import hashs\n        from hangar.diagnostics.integrity import _verify_commit_ref_digests_exist\n\n        hq = hashs.HashQuery(diverse_repo._env.hashenv)\n        keys_to_remove = list(hq.gen_all_schema_keys_db())\n        for key_removed in keys_to_remove:\n            with diverse_repo._env.hashenv.begin(write=True) as txn:\n                val_removed = txn.get(key_removed)\n                txn.delete(key_removed)\n\n            with pytest.raises(RuntimeError):\n                _verify_commit_ref_digests_exist(hashenv=diverse_repo._env.hashenv,\n                                                 refenv=diverse_repo._env.refenv)\n\n            with diverse_repo._env.hashenv.begin(write=True) as txn:\n                txn.put(key_removed, val_removed)\n\n\nclass TestVerifyCommitTree(object):\n\n    def test_parent_ref_digest_of_cmt_does_not_exist(self, diverse_repo):\n        from hangar.diagnostics.integrity import _verify_commit_tree_integrity\n        from hangar.records.parsing import commit_parent_db_key_from_raw_key\n\n        repo = diverse_repo\n        history = repo.log(return_contents=True)\n        all_commits = history['order']\n        for cmt in all_commits:\n            parentKey = commit_parent_db_key_from_raw_key(cmt)\n            with repo._env.refenv.begin(write=True) as txn:\n                parentVal = txn.get(parentKey)\n                txn.delete(parentKey)\n\n            with pytest.raises(RuntimeError, match='Data corruption detected for parent ref'):\n                _verify_commit_tree_integrity(repo._env.refenv)\n\n            with repo._env.refenv.begin(write=True) as txn:\n                txn.put(parentKey, parentVal)\n\n    def test_parent_ref_references_nonexisting_commits(self, diverse_repo):\n        from hangar.diagnostics.integrity import _verify_commit_tree_integrity\n        from hangar.records.parsing import commit_parent_db_key_from_raw_key\n        from hangar.records.parsing import commit_parent_raw_val_from_db_val\n        from hangar.records.parsing import commit_parent_db_val_from_raw_val\n\n        repo = diverse_repo\n        history = repo.log(return_contents=True)\n        all_commits = history['order']\n        for cmt in all_commits:\n            parentKey = commit_parent_db_key_from_raw_key(cmt)\n            with repo._env.refenv.begin(write=True) as txn:\n                parentVal = txn.get(parentKey)\n                parent_raw = commit_parent_raw_val_from_db_val(parentVal)\n                parent = parent_raw.ancestor_spec\n\n                if parent.dev_ancestor:\n                    modifiedVal = commit_parent_db_val_from_raw_val(\n                        master_ancestor=parent.master_ancestor,\n                        dev_ancestor='corrupt',\n                        is_merge_commit=parent.is_merge_commit)\n                elif parent.master_ancestor:\n                    modifiedVal = commit_parent_db_val_from_raw_val(\n                        master_ancestor='corrupt',\n                        dev_ancestor=parent.dev_ancestor,\n                        
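# swap only the master ancestor digest for 'corrupt'; the merge flag stays intact\n                        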
is_merge_commit=parent.is_merge_commit)\n                else:\n                    continue\n\n                txn.put(parentKey, modifiedVal.raw, overwrite=True)\n\n            with pytest.raises(RuntimeError, match='Data corruption detected in commit tree'):\n                _verify_commit_tree_integrity(repo._env.refenv)\n\n            with repo._env.refenv.begin(write=True) as txn:\n                txn.put(parentKey, parentVal)\n\n    def test_parent_ref_has_two_initial_commits(self, diverse_repo):\n        from hangar.diagnostics.integrity import _verify_commit_tree_integrity\n        from hangar.records.parsing import commit_parent_db_key_from_raw_key\n\n        repo = diverse_repo\n        history = repo.log(return_contents=True)\n        all_commits = history['order']\n        initial_commit = all_commits[-1]\n        for cmt in all_commits:\n            if cmt == initial_commit:\n                continue\n\n            parentKey = commit_parent_db_key_from_raw_key(cmt)\n            with repo._env.refenv.begin(write=True) as txn:\n                parentVal = txn.get(parentKey)\n                txn.put(parentKey, b'', overwrite=True)\n\n            with pytest.raises(RuntimeError, match='Commit tree integrity compromised. Multiple \"initial\"'):\n                _verify_commit_tree_integrity(repo._env.refenv)\n\n            with repo._env.refenv.begin(write=True) as txn:\n                txn.put(parentKey, parentVal, overwrite=True)\n\n\nclass TestBranchIntegrity(object):\n\n    def test_at_least_one_branch_exists(self, diverse_repo):\n        from hangar.records.heads import get_branch_names\n        from hangar.records.parsing import repo_branch_head_db_key_from_raw_key\n        from hangar.diagnostics.integrity import _verify_branch_integrity\n\n        branch_names = get_branch_names(diverse_repo._env.branchenv)\n        with diverse_repo._env.branchenv.begin(write=True) as txn:\n            for bname in branch_names:\n                branchKey = repo_branch_head_db_key_from_raw_key(bname)\n                txn.delete(branchKey)\n\n        with pytest.raises(\n                RuntimeError,\n                match='Branch map compromised. 
Repo must contain atleast one branch'\n        ):\n            _verify_branch_integrity(diverse_repo._env.branchenv, diverse_repo._env.refenv)\n\n    def test_branch_name_head_commit_digests_exist(self, diverse_repo):\n        from hangar.records.heads import get_branch_names, get_branch_head_commit\n        from hangar.records.parsing import commit_ref_db_key_from_raw_key\n        from hangar.records.parsing import commit_parent_db_key_from_raw_key\n        from hangar.records.parsing import commit_spec_db_key_from_raw_key\n        from hangar.diagnostics.integrity import _verify_branch_integrity\n\n        branch_names = get_branch_names(diverse_repo._env.branchenv)\n        for bname in branch_names:\n            bhead = get_branch_head_commit(diverse_repo._env.branchenv, branch_name=bname)\n            with diverse_repo._env.refenv.begin(write=True) as txn:\n                cmtRefKey = commit_ref_db_key_from_raw_key(bhead)\n                cmtSpecKey = commit_spec_db_key_from_raw_key(bhead)\n                cmtParentKey = commit_parent_db_key_from_raw_key(bhead)\n\n                cmtRefVal = txn.get(cmtRefKey)\n                cmtSpecVal = txn.get(cmtSpecKey)\n                cmtParentVal = txn.get(cmtParentKey)\n\n                txn.delete(cmtRefKey)\n                txn.delete(cmtSpecKey)\n                txn.delete(cmtParentKey)\n\n            with pytest.raises(RuntimeError, match='Branch commit map compromised. Branch name'):\n                _verify_branch_integrity(diverse_repo._env.branchenv, diverse_repo._env.refenv)\n\n            with diverse_repo._env.refenv.begin(write=True) as txn:\n                txn.put(cmtRefKey, cmtRefVal)\n                txn.put(cmtSpecKey, cmtSpecVal)\n                txn.put(cmtParentKey, cmtParentVal)\n\n    def test_staging_head_branch_name_exists(self, diverse_repo):\n        from hangar.records.heads import get_staging_branch_head\n        from hangar.records.parsing import repo_branch_head_db_key_from_raw_key\n        from hangar.diagnostics.integrity import _verify_branch_integrity\n\n        bname = get_staging_branch_head(diverse_repo._env.branchenv)\n        with diverse_repo._env.branchenv.begin(write=True) as txn:\n            branchKey = repo_branch_head_db_key_from_raw_key(bname)\n            txn.delete(branchKey)\n\n        with pytest.raises(\n                RuntimeError,\n                match='Brach commit map compromised. 
Staging head refers to branch name'\n        ):\n            _verify_branch_integrity(diverse_repo._env.branchenv, diverse_repo._env.refenv)\n\n\ndef test_data_digest_modification_is_caught(diverse_repo):\n    from hangar.records import hashs\n    from hangar.diagnostics.integrity import _verify_column_integrity\n\n    hq = hashs.HashQuery(diverse_repo._env.hashenv)\n    keys_to_replace = list(hq.gen_all_hash_keys_db())\n    replacer_key = keys_to_replace.pop()\n    for kreplaced in keys_to_replace:\n        with diverse_repo._env.hashenv.begin(write=True) as txn:\n            replacer_val = txn.get(replacer_key)\n            vreplaced = txn.get(kreplaced)\n            txn.put(kreplaced, replacer_val)\n\n        with pytest.raises(RuntimeError):\n            _verify_column_integrity(hashenv=diverse_repo._env.hashenv,\n                                     repo_path=diverse_repo._env.repo_path)\n\n        with diverse_repo._env.hashenv.begin(write=True) as txn:\n            txn.put(kreplaced, vreplaced)\n\n\ndef test_data_digest_remote_location_warns(diverse_repo):\n    from hangar.records import hashs\n    from hangar.diagnostics.integrity import _verify_column_integrity\n\n    hq = hashs.HashQuery(diverse_repo._env.hashenv)\n    replace_key = list(hq.gen_all_hash_keys_db())[0]\n    with diverse_repo._env.hashenv.begin(write=True) as txn:\n        txn.put(replace_key, b'50:ekaearar')\n\n    with pytest.warns(RuntimeWarning, match='Can not verify integrity of partially fetched array'):\n        _verify_column_integrity(hashenv=diverse_repo._env.hashenv,\n                                 repo_path=diverse_repo._env.repo_path)\n\n\ndef test_schema_digest_modification_is_caught(diverse_repo):\n    from hangar.records import hashs\n    from hangar.diagnostics.integrity import _verify_schema_integrity\n\n    hq = hashs.HashQuery(diverse_repo._env.hashenv)\n    keys_to_replace = list(hq.gen_all_schema_keys_db())\n    replacer_key = keys_to_replace.pop()\n    for kreplaced in keys_to_replace:\n        with diverse_repo._env.hashenv.begin(write=True) as txn:\n            replacer_val = txn.get(replacer_key)\n            vreplaced = txn.get(kreplaced)\n            txn.put(kreplaced, replacer_val)\n\n        with pytest.raises(RuntimeError, match='Data corruption detected for schema. Expected digest'):\n            _verify_schema_integrity(hashenv=diverse_repo._env.hashenv)\n\n        with diverse_repo._env.hashenv.begin(write=True) as txn:\n            txn.put(kreplaced, vreplaced)\n"
  },
  {
    "path": "tests/test_utils.py",
    "content": "import pytest\n\n\n@pytest.mark.parametrize('arg,key,expected', [\n    ['AAABBBCCC', None, ['A', 'B', 'C']],\n    ['AAABbBCcC', str.lower, ['A', 'B', 'C']],\n    ['ABACBACDA', None, ['A', 'B', 'C', 'D']],\n    ['ABacBaCAd', str.upper, ['A', 'B', 'c', 'd']],\n])\ndef test_unique_everseen(arg, key, expected):\n    from hangar.utils import unique_everseen\n\n    res = list(unique_everseen(arg, key=key))\n    assert res == expected\n\n\n@pytest.mark.parametrize('pth', [pytest.File, None, 123])\ndef test_valid_directory_path_errors_on_invalid_path_arg(pth):\n    from hangar.utils import is_valid_directory_path\n    with pytest.raises(TypeError, match='Path arg `p`'):\n        is_valid_directory_path(pth)\n\n\ndef test_valid_directory_path_recognizes_not_a_directory(managed_tmpdir):\n    from hangar.utils import is_valid_directory_path\n    from pathlib import Path\n\n    test_pth = Path(managed_tmpdir, 'test.txt').resolve()\n    with test_pth.open('w+') as f:\n        f.write('hello')\n    with pytest.raises(NotADirectoryError):\n        is_valid_directory_path(test_pth)\n\n\n@pytest.mark.parametrize('arg,expected', [\n    [1, '1.00 B'],\n    [1234, '1.23 kB'],\n    [12345678, '12.35 MB'],\n    [1234567890, '1.23 GB'],\n    [1234567890000, '1.23 TB'],\n    [1234567890000000, '1.23 PB']\n])\ndef test_format_bytes(arg, expected):\n    from hangar.utils import format_bytes\n\n    res = format_bytes(arg)\n    assert res == expected\n\n\n@pytest.mark.parametrize('arg,expected', [\n    ['100', 100],\n    ['100 MB', 100000000],\n    ['100M', 100000000],\n    ['5kB', 5000],\n    ['5.4 kB', 5400],\n    ['1kiB', 1024],\n    ['1e6', 1000000],\n    ['1e6 kB', 1000000000],\n    ['MB', 1000000]\n])\ndef test_parse_bytes(arg, expected):\n    from hangar.utils import parse_bytes\n\n    res = parse_bytes(arg)\n    assert res == expected\n\n\n@pytest.mark.parametrize('arg,expected', [\n    [0, 2],\n    [1, 2],\n    [2, 2],\n    [3, 3],\n    [4, 5],\n    [7, 7],\n    [174, 179],\n    [10065, 10067],\n    [104721, 104723],\n])\ndef test_find_next_prime(arg, expected):\n    from hangar.optimized_utils import find_next_prime\n\n    res = find_next_prime(arg)\n    assert res == expected\n"
  },
  {
    "path": "tests/test_version.py",
    "content": "# -*- coding: utf-8 -*-\n\"\"\"\nPortions of this code have been taken and modified from the \"packaging\" project.\n\nURL:      https://github.com/pypa/packaging\nFiles:    tests/test_version.py\n          tests/test_structures.py\nCommit:   6a09d4015b54f80762ff3ef1597a8b6740563c19\nAccessed: 11 DEC 2019\n\npackaging License\n-------------------------------------------------------------------------------\nLicense: Dual licensed under the terms of the Apache License, Version 2.0, and the BSD License.\nURL:     https://github.com/pypa/packaging/blob/6a09d4015b/LICENSE\n         https://github.com/pypa/packaging/blob/6a09d4015b/LICENSE.APACHE\n         https://github.com/pypa/packaging/blob/6a09d4015b/LICENSE.BSD\n\"\"\"\nimport itertools\nimport operator\n\nimport pretend\nimport pytest\n\nfrom hangar._version import (\n    Infinity, InvalidVersion, NegativeInfinity, Version, parse\n)\n\n# ------------------ Test Structures ---------------------\n\n\ndef test_infinity_repr():\n    repr(Infinity) == \"Infinity\"\n\n\ndef test_negative_infinity_repr():\n    repr(NegativeInfinity) == \"-Infinity\"\n\n\ndef test_infinity_hash():\n    assert hash(Infinity) == hash(Infinity)\n\n\ndef test_negative_infinity_hash():\n    assert hash(NegativeInfinity) == hash(NegativeInfinity)\n\n\n@pytest.mark.parametrize(\"left\", [1, \"a\", (\"b\", 4)])\ndef test_infinity_comparison(left):\n    assert left < Infinity\n    assert left <= Infinity\n    assert not left == Infinity\n    assert left != Infinity\n    assert not left > Infinity\n    assert not left >= Infinity\n\n\n@pytest.mark.parametrize(\"left\", [1, \"a\", (\"b\", 4)])\ndef test_negative_infinity_lesser(left):\n    assert not left < NegativeInfinity\n    assert not left <= NegativeInfinity\n    assert not left == NegativeInfinity\n    assert left != NegativeInfinity\n    assert left > NegativeInfinity\n    assert left >= NegativeInfinity\n\n\ndef test_infinty_equal():\n    assert Infinity == Infinity\n\n\ndef test_negative_infinity_equal():\n    assert NegativeInfinity == NegativeInfinity\n\n\ndef test_negate_infinity():\n    assert isinstance(-Infinity, NegativeInfinity.__class__)\n\n\ndef test_negate_negative_infinity():\n    assert isinstance(-NegativeInfinity, Infinity.__class__)\n\n\n# ---------------------- Test Versions ---------------------------\n\n\n@pytest.mark.parametrize(\n    (\"version\", \"klass\"), [(\"1.0\", Version)]\n)\ndef test_parse(version, klass):\n    assert isinstance(parse(version), klass)\n\n\ndef test_legacy_version_raises():\n    with pytest.raises(InvalidVersion):\n        parse('1-1-1')\n\n\n# This list must be in the correct sorting order\nVERSIONS = [\n    # Implicit epoch of 0\n    \"1.0.dev456\",\n    \"1.0a1\",\n    \"1.0a12.dev456\",\n    \"1.0a12\",\n    \"1.0b1.dev456\",\n    \"1.0b2.post345.dev456\",\n    \"1.0b2.post345\",\n    \"1.0b2-346\",\n    \"1.0c1.dev456\",\n    \"1.0rc2\",\n    \"1.0c3\",\n    \"1.0\",\n    \"1.0.post456\",\n    \"1.1.dev1\",\n    \"1.2+abc\",\n    \"1.2+abc123def\",\n    \"1.2+123456\",\n    # Explicit epoch of 1\n    \"1!1.0.dev456\",\n    \"1!1.0a2.dev456\",\n    \"1!1.0c1\",\n    \"1!1.0c3\",\n    \"1!1.0\",\n    \"1!1.0.post456.dev34\",\n    \"1!1.2+123abc\",\n    \"1!1.2+123abc456\",\n    \"1!1.2+abc123\",\n    \"1!1.2+123456\",\n    \"1!1.2.r32+123456\",\n]\n\n\nclass TestVersion:\n    @pytest.mark.parametrize(\"version\", VERSIONS)\n    def test_valid_versions(self, version):\n        Version(version)\n\n    @pytest.mark.parametrize(\n        \"version\",\n   
     [\n            # Non sensical versions should be invalid\n            \"french toast\",\n            # Versions with invalid local versions\n            \"1.0+a+\",\n            \"1.0++\",\n            \"1.0+_foobar\",\n            \"1.0+foo&asd\",\n            \"1.0+1+1\",\n        ],\n    )\n    def test_invalid_versions(self, version):\n        with pytest.raises(InvalidVersion):\n            Version(version)\n\n    @pytest.mark.parametrize(\n        (\"version\", \"normalized\"),\n        [\n            # Various development release incarnations\n            (\"1.0dev\", \"1.0.dev0\"),\n            (\"1.0.dev\", \"1.0.dev0\"),\n            (\"1.0dev1\", \"1.0.dev1\"),\n            (\"1.0dev\", \"1.0.dev0\"),\n            (\"1.0-dev\", \"1.0.dev0\"),\n            (\"1.0-dev1\", \"1.0.dev1\"),\n            (\"1.0DEV\", \"1.0.dev0\"),\n            (\"1.0.DEV\", \"1.0.dev0\"),\n            (\"1.0DEV1\", \"1.0.dev1\"),\n            (\"1.0DEV\", \"1.0.dev0\"),\n            (\"1.0.DEV1\", \"1.0.dev1\"),\n            (\"1.0-DEV\", \"1.0.dev0\"),\n            (\"1.0-DEV1\", \"1.0.dev1\"),\n            # Various alpha incarnations\n            (\"1.0a\", \"1.0a0\"),\n            (\"1.0.a\", \"1.0a0\"),\n            (\"1.0.a1\", \"1.0a1\"),\n            (\"1.0-a\", \"1.0a0\"),\n            (\"1.0-a1\", \"1.0a1\"),\n            (\"1.0alpha\", \"1.0a0\"),\n            (\"1.0.alpha\", \"1.0a0\"),\n            (\"1.0.alpha1\", \"1.0a1\"),\n            (\"1.0-alpha\", \"1.0a0\"),\n            (\"1.0-alpha1\", \"1.0a1\"),\n            (\"1.0A\", \"1.0a0\"),\n            (\"1.0.A\", \"1.0a0\"),\n            (\"1.0.A1\", \"1.0a1\"),\n            (\"1.0-A\", \"1.0a0\"),\n            (\"1.0-A1\", \"1.0a1\"),\n            (\"1.0ALPHA\", \"1.0a0\"),\n            (\"1.0.ALPHA\", \"1.0a0\"),\n            (\"1.0.ALPHA1\", \"1.0a1\"),\n            (\"1.0-ALPHA\", \"1.0a0\"),\n            (\"1.0-ALPHA1\", \"1.0a1\"),\n            # Various beta incarnations\n            (\"1.0b\", \"1.0b0\"),\n            (\"1.0.b\", \"1.0b0\"),\n            (\"1.0.b1\", \"1.0b1\"),\n            (\"1.0-b\", \"1.0b0\"),\n            (\"1.0-b1\", \"1.0b1\"),\n            (\"1.0beta\", \"1.0b0\"),\n            (\"1.0.beta\", \"1.0b0\"),\n            (\"1.0.beta1\", \"1.0b1\"),\n            (\"1.0-beta\", \"1.0b0\"),\n            (\"1.0-beta1\", \"1.0b1\"),\n            (\"1.0B\", \"1.0b0\"),\n            (\"1.0.B\", \"1.0b0\"),\n            (\"1.0.B1\", \"1.0b1\"),\n            (\"1.0-B\", \"1.0b0\"),\n            (\"1.0-B1\", \"1.0b1\"),\n            (\"1.0BETA\", \"1.0b0\"),\n            (\"1.0.BETA\", \"1.0b0\"),\n            (\"1.0.BETA1\", \"1.0b1\"),\n            (\"1.0-BETA\", \"1.0b0\"),\n            (\"1.0-BETA1\", \"1.0b1\"),\n            # Various release candidate incarnations\n            (\"1.0c\", \"1.0rc0\"),\n            (\"1.0.c\", \"1.0rc0\"),\n            (\"1.0.c1\", \"1.0rc1\"),\n            (\"1.0-c\", \"1.0rc0\"),\n            (\"1.0-c1\", \"1.0rc1\"),\n            (\"1.0rc\", \"1.0rc0\"),\n            (\"1.0.rc\", \"1.0rc0\"),\n            (\"1.0.rc1\", \"1.0rc1\"),\n            (\"1.0-rc\", \"1.0rc0\"),\n            (\"1.0-rc1\", \"1.0rc1\"),\n            (\"1.0C\", \"1.0rc0\"),\n            (\"1.0.C\", \"1.0rc0\"),\n            (\"1.0.C1\", \"1.0rc1\"),\n            (\"1.0-C\", \"1.0rc0\"),\n            (\"1.0-C1\", \"1.0rc1\"),\n            (\"1.0RC\", \"1.0rc0\"),\n            (\"1.0.RC\", \"1.0rc0\"),\n            (\"1.0.RC1\", \"1.0rc1\"),\n            (\"1.0-RC\", \"1.0rc0\"),\n            
(\"1.0-RC1\", \"1.0rc1\"),\n            # Various post release incarnations\n            (\"1.0post\", \"1.0.post0\"),\n            (\"1.0.post\", \"1.0.post0\"),\n            (\"1.0post1\", \"1.0.post1\"),\n            (\"1.0post\", \"1.0.post0\"),\n            (\"1.0-post\", \"1.0.post0\"),\n            (\"1.0-post1\", \"1.0.post1\"),\n            (\"1.0POST\", \"1.0.post0\"),\n            (\"1.0.POST\", \"1.0.post0\"),\n            (\"1.0POST1\", \"1.0.post1\"),\n            (\"1.0POST\", \"1.0.post0\"),\n            (\"1.0r\", \"1.0.post0\"),\n            (\"1.0rev\", \"1.0.post0\"),\n            (\"1.0.POST1\", \"1.0.post1\"),\n            (\"1.0.r1\", \"1.0.post1\"),\n            (\"1.0.rev1\", \"1.0.post1\"),\n            (\"1.0-POST\", \"1.0.post0\"),\n            (\"1.0-POST1\", \"1.0.post1\"),\n            (\"1.0-5\", \"1.0.post5\"),\n            (\"1.0-r5\", \"1.0.post5\"),\n            (\"1.0-rev5\", \"1.0.post5\"),\n            # Local version case insensitivity\n            (\"1.0+AbC\", \"1.0+abc\"),\n            # Integer Normalization\n            (\"1.01\", \"1.1\"),\n            (\"1.0a05\", \"1.0a5\"),\n            (\"1.0b07\", \"1.0b7\"),\n            (\"1.0c056\", \"1.0rc56\"),\n            (\"1.0rc09\", \"1.0rc9\"),\n            (\"1.0.post000\", \"1.0.post0\"),\n            (\"1.1.dev09000\", \"1.1.dev9000\"),\n            (\"00!1.2\", \"1.2\"),\n            (\"0100!0.0\", \"100!0.0\"),\n            # Various other normalizations\n            (\"v1.0\", \"1.0\"),\n            (\"   v1.0\\t\\n\", \"1.0\"),\n        ],\n    )\n    def test_normalized_versions(self, version, normalized):\n        assert str(Version(version)) == normalized\n\n    @pytest.mark.parametrize(\n        (\"version\", \"expected\"),\n        [\n            (\"1.0.dev456\", \"1.0.dev456\"),\n            (\"1.0a1\", \"1.0a1\"),\n            (\"1.0a2.dev456\", \"1.0a2.dev456\"),\n            (\"1.0a12.dev456\", \"1.0a12.dev456\"),\n            (\"1.0a12\", \"1.0a12\"),\n            (\"1.0b1.dev456\", \"1.0b1.dev456\"),\n            (\"1.0b2\", \"1.0b2\"),\n            (\"1.0b2.post345.dev456\", \"1.0b2.post345.dev456\"),\n            (\"1.0b2.post345\", \"1.0b2.post345\"),\n            (\"1.0rc1.dev456\", \"1.0rc1.dev456\"),\n            (\"1.0rc1\", \"1.0rc1\"),\n            (\"1.0\", \"1.0\"),\n            (\"1.0.post456.dev34\", \"1.0.post456.dev34\"),\n            (\"1.0.post456\", \"1.0.post456\"),\n            (\"1.0.1\", \"1.0.1\"),\n            (\"0!1.0.2\", \"1.0.2\"),\n            (\"1.0.3+7\", \"1.0.3+7\"),\n            (\"0!1.0.4+8.0\", \"1.0.4+8.0\"),\n            (\"1.0.5+9.5\", \"1.0.5+9.5\"),\n            (\"1.2+1234.abc\", \"1.2+1234.abc\"),\n            (\"1.2+123456\", \"1.2+123456\"),\n            (\"1.2+123abc\", \"1.2+123abc\"),\n            (\"1.2+123abc456\", \"1.2+123abc456\"),\n            (\"1.2+abc\", \"1.2+abc\"),\n            (\"1.2+abc123\", \"1.2+abc123\"),\n            (\"1.2+abc123def\", \"1.2+abc123def\"),\n            (\"1.1.dev1\", \"1.1.dev1\"),\n            (\"7!1.0.dev456\", \"7!1.0.dev456\"),\n            (\"7!1.0a1\", \"7!1.0a1\"),\n            (\"7!1.0a2.dev456\", \"7!1.0a2.dev456\"),\n            (\"7!1.0a12.dev456\", \"7!1.0a12.dev456\"),\n            (\"7!1.0a12\", \"7!1.0a12\"),\n            (\"7!1.0b1.dev456\", \"7!1.0b1.dev456\"),\n            (\"7!1.0b2\", \"7!1.0b2\"),\n            (\"7!1.0b2.post345.dev456\", \"7!1.0b2.post345.dev456\"),\n            (\"7!1.0b2.post345\", \"7!1.0b2.post345\"),\n            (\"7!1.0rc1.dev456\", 
\"7!1.0rc1.dev456\"),\n            (\"7!1.0rc1\", \"7!1.0rc1\"),\n            (\"7!1.0\", \"7!1.0\"),\n            (\"7!1.0.post456.dev34\", \"7!1.0.post456.dev34\"),\n            (\"7!1.0.post456\", \"7!1.0.post456\"),\n            (\"7!1.0.1\", \"7!1.0.1\"),\n            (\"7!1.0.2\", \"7!1.0.2\"),\n            (\"7!1.0.3+7\", \"7!1.0.3+7\"),\n            (\"7!1.0.4+8.0\", \"7!1.0.4+8.0\"),\n            (\"7!1.0.5+9.5\", \"7!1.0.5+9.5\"),\n            (\"7!1.1.dev1\", \"7!1.1.dev1\"),\n        ],\n    )\n    def test_version_str_repr(self, version, expected):\n        assert str(Version(version)) == expected\n        assert repr(Version(version)) == \"<Version({0})>\".format(repr(expected))\n\n    def test_version_rc_and_c_equals(self):\n        assert Version(\"1.0rc1\") == Version(\"1.0c1\")\n\n    @pytest.mark.parametrize(\"version\", VERSIONS)\n    def test_version_hash(self, version):\n        assert hash(Version(version)) == hash(Version(version))\n\n    @pytest.mark.parametrize(\n        (\"version\", \"public\"),\n        [\n            (\"1.0\", \"1.0\"),\n            (\"1.0.dev0\", \"1.0.dev0\"),\n            (\"1.0.dev6\", \"1.0.dev6\"),\n            (\"1.0a1\", \"1.0a1\"),\n            (\"1.0a1.post5\", \"1.0a1.post5\"),\n            (\"1.0a1.post5.dev6\", \"1.0a1.post5.dev6\"),\n            (\"1.0rc4\", \"1.0rc4\"),\n            (\"1.0.post5\", \"1.0.post5\"),\n            (\"1!1.0\", \"1!1.0\"),\n            (\"1!1.0.dev6\", \"1!1.0.dev6\"),\n            (\"1!1.0a1\", \"1!1.0a1\"),\n            (\"1!1.0a1.post5\", \"1!1.0a1.post5\"),\n            (\"1!1.0a1.post5.dev6\", \"1!1.0a1.post5.dev6\"),\n            (\"1!1.0rc4\", \"1!1.0rc4\"),\n            (\"1!1.0.post5\", \"1!1.0.post5\"),\n            (\"1.0+deadbeef\", \"1.0\"),\n            (\"1.0.dev6+deadbeef\", \"1.0.dev6\"),\n            (\"1.0a1+deadbeef\", \"1.0a1\"),\n            (\"1.0a1.post5+deadbeef\", \"1.0a1.post5\"),\n            (\"1.0a1.post5.dev6+deadbeef\", \"1.0a1.post5.dev6\"),\n            (\"1.0rc4+deadbeef\", \"1.0rc4\"),\n            (\"1.0.post5+deadbeef\", \"1.0.post5\"),\n            (\"1!1.0+deadbeef\", \"1!1.0\"),\n            (\"1!1.0.dev6+deadbeef\", \"1!1.0.dev6\"),\n            (\"1!1.0a1+deadbeef\", \"1!1.0a1\"),\n            (\"1!1.0a1.post5+deadbeef\", \"1!1.0a1.post5\"),\n            (\"1!1.0a1.post5.dev6+deadbeef\", \"1!1.0a1.post5.dev6\"),\n            (\"1!1.0rc4+deadbeef\", \"1!1.0rc4\"),\n            (\"1!1.0.post5+deadbeef\", \"1!1.0.post5\"),\n        ],\n    )\n    def test_version_public(self, version, public):\n        assert Version(version).public == public\n\n    @pytest.mark.parametrize(\n        (\"version\", \"base_version\"),\n        [\n            (\"1.0\", \"1.0\"),\n            (\"1.0.dev0\", \"1.0\"),\n            (\"1.0.dev6\", \"1.0\"),\n            (\"1.0a1\", \"1.0\"),\n            (\"1.0a1.post5\", \"1.0\"),\n            (\"1.0a1.post5.dev6\", \"1.0\"),\n            (\"1.0rc4\", \"1.0\"),\n            (\"1.0.post5\", \"1.0\"),\n            (\"1!1.0\", \"1!1.0\"),\n            (\"1!1.0.dev6\", \"1!1.0\"),\n            (\"1!1.0a1\", \"1!1.0\"),\n            (\"1!1.0a1.post5\", \"1!1.0\"),\n            (\"1!1.0a1.post5.dev6\", \"1!1.0\"),\n            (\"1!1.0rc4\", \"1!1.0\"),\n            (\"1!1.0.post5\", \"1!1.0\"),\n            (\"1.0+deadbeef\", \"1.0\"),\n            (\"1.0.dev6+deadbeef\", \"1.0\"),\n            (\"1.0a1+deadbeef\", \"1.0\"),\n            (\"1.0a1.post5+deadbeef\", \"1.0\"),\n            (\"1.0a1.post5.dev6+deadbeef\", \"1.0\"),\n           
 (\"1.0rc4+deadbeef\", \"1.0\"),\n            (\"1.0.post5+deadbeef\", \"1.0\"),\n            (\"1!1.0+deadbeef\", \"1!1.0\"),\n            (\"1!1.0.dev6+deadbeef\", \"1!1.0\"),\n            (\"1!1.0a1+deadbeef\", \"1!1.0\"),\n            (\"1!1.0a1.post5+deadbeef\", \"1!1.0\"),\n            (\"1!1.0a1.post5.dev6+deadbeef\", \"1!1.0\"),\n            (\"1!1.0rc4+deadbeef\", \"1!1.0\"),\n            (\"1!1.0.post5+deadbeef\", \"1!1.0\"),\n        ],\n    )\n    def test_version_base_version(self, version, base_version):\n        assert Version(version).base_version == base_version\n\n    @pytest.mark.parametrize(\n        (\"version\", \"epoch\"),\n        [\n            (\"1.0\", 0),\n            (\"1.0.dev0\", 0),\n            (\"1.0.dev6\", 0),\n            (\"1.0a1\", 0),\n            (\"1.0a1.post5\", 0),\n            (\"1.0a1.post5.dev6\", 0),\n            (\"1.0rc4\", 0),\n            (\"1.0.post5\", 0),\n            (\"1!1.0\", 1),\n            (\"1!1.0.dev6\", 1),\n            (\"1!1.0a1\", 1),\n            (\"1!1.0a1.post5\", 1),\n            (\"1!1.0a1.post5.dev6\", 1),\n            (\"1!1.0rc4\", 1),\n            (\"1!1.0.post5\", 1),\n            (\"1.0+deadbeef\", 0),\n            (\"1.0.dev6+deadbeef\", 0),\n            (\"1.0a1+deadbeef\", 0),\n            (\"1.0a1.post5+deadbeef\", 0),\n            (\"1.0a1.post5.dev6+deadbeef\", 0),\n            (\"1.0rc4+deadbeef\", 0),\n            (\"1.0.post5+deadbeef\", 0),\n            (\"1!1.0+deadbeef\", 1),\n            (\"1!1.0.dev6+deadbeef\", 1),\n            (\"1!1.0a1+deadbeef\", 1),\n            (\"1!1.0a1.post5+deadbeef\", 1),\n            (\"1!1.0a1.post5.dev6+deadbeef\", 1),\n            (\"1!1.0rc4+deadbeef\", 1),\n            (\"1!1.0.post5+deadbeef\", 1),\n        ],\n    )\n    def test_version_epoch(self, version, epoch):\n        assert Version(version).epoch == epoch\n\n    @pytest.mark.parametrize(\n        (\"version\", \"release\"),\n        [\n            (\"1.0\", (1, 0)),\n            (\"1.0.dev0\", (1, 0)),\n            (\"1.0.dev6\", (1, 0)),\n            (\"1.0a1\", (1, 0)),\n            (\"1.0a1.post5\", (1, 0)),\n            (\"1.0a1.post5.dev6\", (1, 0)),\n            (\"1.0rc4\", (1, 0)),\n            (\"1.0.post5\", (1, 0)),\n            (\"1!1.0\", (1, 0)),\n            (\"1!1.0.dev6\", (1, 0)),\n            (\"1!1.0a1\", (1, 0)),\n            (\"1!1.0a1.post5\", (1, 0)),\n            (\"1!1.0a1.post5.dev6\", (1, 0)),\n            (\"1!1.0rc4\", (1, 0)),\n            (\"1!1.0.post5\", (1, 0)),\n            (\"1.0+deadbeef\", (1, 0)),\n            (\"1.0.dev6+deadbeef\", (1, 0)),\n            (\"1.0a1+deadbeef\", (1, 0)),\n            (\"1.0a1.post5+deadbeef\", (1, 0)),\n            (\"1.0a1.post5.dev6+deadbeef\", (1, 0)),\n            (\"1.0rc4+deadbeef\", (1, 0)),\n            (\"1.0.post5+deadbeef\", (1, 0)),\n            (\"1!1.0+deadbeef\", (1, 0)),\n            (\"1!1.0.dev6+deadbeef\", (1, 0)),\n            (\"1!1.0a1+deadbeef\", (1, 0)),\n            (\"1!1.0a1.post5+deadbeef\", (1, 0)),\n            (\"1!1.0a1.post5.dev6+deadbeef\", (1, 0)),\n            (\"1!1.0rc4+deadbeef\", (1, 0)),\n            (\"1!1.0.post5+deadbeef\", (1, 0)),\n        ],\n    )\n    def test_version_release(self, version, release):\n        assert Version(version).release == release\n\n    @pytest.mark.parametrize(\n        (\"version\", \"local\"),\n        [\n            (\"1.0\", None),\n            (\"1.0.dev0\", None),\n            (\"1.0.dev6\", None),\n            (\"1.0a1\", None),\n            
(\"1.0a1.post5\", None),\n            (\"1.0a1.post5.dev6\", None),\n            (\"1.0rc4\", None),\n            (\"1.0.post5\", None),\n            (\"1!1.0\", None),\n            (\"1!1.0.dev6\", None),\n            (\"1!1.0a1\", None),\n            (\"1!1.0a1.post5\", None),\n            (\"1!1.0a1.post5.dev6\", None),\n            (\"1!1.0rc4\", None),\n            (\"1!1.0.post5\", None),\n            (\"1.0+deadbeef\", \"deadbeef\"),\n            (\"1.0.dev6+deadbeef\", \"deadbeef\"),\n            (\"1.0a1+deadbeef\", \"deadbeef\"),\n            (\"1.0a1.post5+deadbeef\", \"deadbeef\"),\n            (\"1.0a1.post5.dev6+deadbeef\", \"deadbeef\"),\n            (\"1.0rc4+deadbeef\", \"deadbeef\"),\n            (\"1.0.post5+deadbeef\", \"deadbeef\"),\n            (\"1!1.0+deadbeef\", \"deadbeef\"),\n            (\"1!1.0.dev6+deadbeef\", \"deadbeef\"),\n            (\"1!1.0a1+deadbeef\", \"deadbeef\"),\n            (\"1!1.0a1.post5+deadbeef\", \"deadbeef\"),\n            (\"1!1.0a1.post5.dev6+deadbeef\", \"deadbeef\"),\n            (\"1!1.0rc4+deadbeef\", \"deadbeef\"),\n            (\"1!1.0.post5+deadbeef\", \"deadbeef\"),\n        ],\n    )\n    def test_version_local(self, version, local):\n        assert Version(version).local == local\n\n    @pytest.mark.parametrize(\n        (\"version\", \"pre\"),\n        [\n            (\"1.0\", None),\n            (\"1.0.dev0\", None),\n            (\"1.0.dev6\", None),\n            (\"1.0a1\", (\"a\", 1)),\n            (\"1.0a1.post5\", (\"a\", 1)),\n            (\"1.0a1.post5.dev6\", (\"a\", 1)),\n            (\"1.0rc4\", (\"rc\", 4)),\n            (\"1.0.post5\", None),\n            (\"1!1.0\", None),\n            (\"1!1.0.dev6\", None),\n            (\"1!1.0a1\", (\"a\", 1)),\n            (\"1!1.0a1.post5\", (\"a\", 1)),\n            (\"1!1.0a1.post5.dev6\", (\"a\", 1)),\n            (\"1!1.0rc4\", (\"rc\", 4)),\n            (\"1!1.0.post5\", None),\n            (\"1.0+deadbeef\", None),\n            (\"1.0.dev6+deadbeef\", None),\n            (\"1.0a1+deadbeef\", (\"a\", 1)),\n            (\"1.0a1.post5+deadbeef\", (\"a\", 1)),\n            (\"1.0a1.post5.dev6+deadbeef\", (\"a\", 1)),\n            (\"1.0rc4+deadbeef\", (\"rc\", 4)),\n            (\"1.0.post5+deadbeef\", None),\n            (\"1!1.0+deadbeef\", None),\n            (\"1!1.0.dev6+deadbeef\", None),\n            (\"1!1.0a1+deadbeef\", (\"a\", 1)),\n            (\"1!1.0a1.post5+deadbeef\", (\"a\", 1)),\n            (\"1!1.0a1.post5.dev6+deadbeef\", (\"a\", 1)),\n            (\"1!1.0rc4+deadbeef\", (\"rc\", 4)),\n            (\"1!1.0.post5+deadbeef\", None),\n        ],\n    )\n    def test_version_pre(self, version, pre):\n        assert Version(version).pre == pre\n\n    @pytest.mark.parametrize(\n        (\"version\", \"expected\"),\n        [\n            (\"1.0.dev0\", True),\n            (\"1.0.dev1\", True),\n            (\"1.0a1.dev1\", True),\n            (\"1.0b1.dev1\", True),\n            (\"1.0c1.dev1\", True),\n            (\"1.0rc1.dev1\", True),\n            (\"1.0a1\", True),\n            (\"1.0b1\", True),\n            (\"1.0c1\", True),\n            (\"1.0rc1\", True),\n            (\"1.0a1.post1.dev1\", True),\n            (\"1.0b1.post1.dev1\", True),\n            (\"1.0c1.post1.dev1\", True),\n            (\"1.0rc1.post1.dev1\", True),\n            (\"1.0a1.post1\", True),\n            (\"1.0b1.post1\", True),\n            (\"1.0c1.post1\", True),\n            (\"1.0rc1.post1\", True),\n            (\"1.0\", False),\n            (\"1.0+dev\", False),\n      
      (\"1.0.post1\", False),\n            (\"1.0.post1+dev\", False),\n        ],\n    )\n    def test_version_is_prerelease(self, version, expected):\n        assert Version(version).is_prerelease is expected\n\n    @pytest.mark.parametrize(\n        (\"version\", \"dev\"),\n        [\n            (\"1.0\", None),\n            (\"1.0.dev0\", 0),\n            (\"1.0.dev6\", 6),\n            (\"1.0a1\", None),\n            (\"1.0a1.post5\", None),\n            (\"1.0a1.post5.dev6\", 6),\n            (\"1.0rc4\", None),\n            (\"1.0.post5\", None),\n            (\"1!1.0\", None),\n            (\"1!1.0.dev6\", 6),\n            (\"1!1.0a1\", None),\n            (\"1!1.0a1.post5\", None),\n            (\"1!1.0a1.post5.dev6\", 6),\n            (\"1!1.0rc4\", None),\n            (\"1!1.0.post5\", None),\n            (\"1.0+deadbeef\", None),\n            (\"1.0.dev6+deadbeef\", 6),\n            (\"1.0a1+deadbeef\", None),\n            (\"1.0a1.post5+deadbeef\", None),\n            (\"1.0a1.post5.dev6+deadbeef\", 6),\n            (\"1.0rc4+deadbeef\", None),\n            (\"1.0.post5+deadbeef\", None),\n            (\"1!1.0+deadbeef\", None),\n            (\"1!1.0.dev6+deadbeef\", 6),\n            (\"1!1.0a1+deadbeef\", None),\n            (\"1!1.0a1.post5+deadbeef\", None),\n            (\"1!1.0a1.post5.dev6+deadbeef\", 6),\n            (\"1!1.0rc4+deadbeef\", None),\n            (\"1!1.0.post5+deadbeef\", None),\n        ],\n    )\n    def test_version_dev(self, version, dev):\n        assert Version(version).dev == dev\n\n    @pytest.mark.parametrize(\n        (\"version\", \"expected\"),\n        [\n            (\"1.0\", False),\n            (\"1.0.dev0\", True),\n            (\"1.0.dev6\", True),\n            (\"1.0a1\", False),\n            (\"1.0a1.post5\", False),\n            (\"1.0a1.post5.dev6\", True),\n            (\"1.0rc4\", False),\n            (\"1.0.post5\", False),\n            (\"1!1.0\", False),\n            (\"1!1.0.dev6\", True),\n            (\"1!1.0a1\", False),\n            (\"1!1.0a1.post5\", False),\n            (\"1!1.0a1.post5.dev6\", True),\n            (\"1!1.0rc4\", False),\n            (\"1!1.0.post5\", False),\n            (\"1.0+deadbeef\", False),\n            (\"1.0.dev6+deadbeef\", True),\n            (\"1.0a1+deadbeef\", False),\n            (\"1.0a1.post5+deadbeef\", False),\n            (\"1.0a1.post5.dev6+deadbeef\", True),\n            (\"1.0rc4+deadbeef\", False),\n            (\"1.0.post5+deadbeef\", False),\n            (\"1!1.0+deadbeef\", False),\n            (\"1!1.0.dev6+deadbeef\", True),\n            (\"1!1.0a1+deadbeef\", False),\n            (\"1!1.0a1.post5+deadbeef\", False),\n            (\"1!1.0a1.post5.dev6+deadbeef\", True),\n            (\"1!1.0rc4+deadbeef\", False),\n            (\"1!1.0.post5+deadbeef\", False),\n        ],\n    )\n    def test_version_is_devrelease(self, version, expected):\n        assert Version(version).is_devrelease is expected\n\n    @pytest.mark.parametrize(\n        (\"version\", \"post\"),\n        [\n            (\"1.0\", None),\n            (\"1.0.dev0\", None),\n            (\"1.0.dev6\", None),\n            (\"1.0a1\", None),\n            (\"1.0a1.post5\", 5),\n            (\"1.0a1.post5.dev6\", 5),\n            (\"1.0rc4\", None),\n            (\"1.0.post5\", 5),\n            (\"1!1.0\", None),\n            (\"1!1.0.dev6\", None),\n            (\"1!1.0a1\", None),\n            (\"1!1.0a1.post5\", 5),\n            (\"1!1.0a1.post5.dev6\", 5),\n            (\"1!1.0rc4\", None),\n            
(\"1!1.0.post5\", 5),\n            (\"1.0+deadbeef\", None),\n            (\"1.0.dev6+deadbeef\", None),\n            (\"1.0a1+deadbeef\", None),\n            (\"1.0a1.post5+deadbeef\", 5),\n            (\"1.0a1.post5.dev6+deadbeef\", 5),\n            (\"1.0rc4+deadbeef\", None),\n            (\"1.0.post5+deadbeef\", 5),\n            (\"1!1.0+deadbeef\", None),\n            (\"1!1.0.dev6+deadbeef\", None),\n            (\"1!1.0a1+deadbeef\", None),\n            (\"1!1.0a1.post5+deadbeef\", 5),\n            (\"1!1.0a1.post5.dev6+deadbeef\", 5),\n            (\"1!1.0rc4+deadbeef\", None),\n            (\"1!1.0.post5+deadbeef\", 5),\n        ],\n    )\n    def test_version_post(self, version, post):\n        assert Version(version).post == post\n\n    @pytest.mark.parametrize(\n        (\"version\", \"expected\"),\n        [\n            (\"1.0.dev1\", False),\n            (\"1.0\", False),\n            (\"1.0+foo\", False),\n            (\"1.0.post1.dev1\", True),\n            (\"1.0.post1\", True),\n        ],\n    )\n    def test_version_is_postrelease(self, version, expected):\n        assert Version(version).is_postrelease is expected\n\n    @pytest.mark.parametrize(\n        (\"left\", \"right\", \"op\"),\n        # Below we'll generate every possible combination of VERSIONS that\n        # should be True for the given operator\n        itertools.chain(\n            *\n            # Verify that the less than (<) operator works correctly\n            [\n                [(x, y, operator.lt) for y in VERSIONS[i + 1 :]]\n                for i, x in enumerate(VERSIONS)\n            ]\n            +\n            # Verify that the less than equal (<=) operator works correctly\n            [\n                [(x, y, operator.le) for y in VERSIONS[i:]]\n                for i, x in enumerate(VERSIONS)\n            ]\n            +\n            # Verify that the equal (==) operator works correctly\n            [[(x, x, operator.eq) for x in VERSIONS]]\n            +\n            # Verify that the not equal (!=) operator works correctly\n            [\n                [(x, y, operator.ne) for j, y in enumerate(VERSIONS) if i != j]\n                for i, x in enumerate(VERSIONS)\n            ]\n            +\n            # Verify that the greater than equal (>=) operator works correctly\n            [\n                [(x, y, operator.ge) for y in VERSIONS[: i + 1]]\n                for i, x in enumerate(VERSIONS)\n            ]\n            +\n            # Verify that the greater than (>) operator works correctly\n            [\n                [(x, y, operator.gt) for y in VERSIONS[:i]]\n                for i, x in enumerate(VERSIONS)\n            ]\n        ),\n    )\n    def test_comparison_true(self, left, right, op):\n        assert op(Version(left), Version(right))\n\n    @pytest.mark.parametrize(\n        (\"left\", \"right\", \"op\"),\n        # Below we'll generate every possible combination of VERSIONS that\n        # should be False for the given operator\n        itertools.chain(\n            *\n            # Verify that the less than (<) operator works correctly\n            [\n                [(x, y, operator.lt) for y in VERSIONS[: i + 1]]\n                for i, x in enumerate(VERSIONS)\n            ]\n            +\n            # Verify that the less than equal (<=) operator works correctly\n            [\n                [(x, y, operator.le) for y in VERSIONS[:i]]\n                for i, x in enumerate(VERSIONS)\n            ]\n            +\n            # Verify that the equal 
(==) operator works correctly\n            [\n                [(x, y, operator.eq) for j, y in enumerate(VERSIONS) if i != j]\n                for i, x in enumerate(VERSIONS)\n            ]\n            +\n            # Verify that the not equal (!=) operator works correctly\n            [[(x, x, operator.ne) for x in VERSIONS]]\n            +\n            # Verify that the greater than equal (>=) operator works correctly\n            [\n                [(x, y, operator.ge) for y in VERSIONS[i + 1 :]]\n                for i, x in enumerate(VERSIONS)\n            ]\n            +\n            # Verify that the greater than (>) operator works correctly\n            [\n                [(x, y, operator.gt) for y in VERSIONS[i:]]\n                for i, x in enumerate(VERSIONS)\n            ]\n        ),\n    )\n    def test_comparison_false(self, left, right, op):\n        assert not op(Version(left), Version(right))\n\n    @pytest.mark.parametrize((\"op\", \"expected\"), [(\"eq\", False), (\"ne\", True)])\n    def test_compare_other(self, monkeypatch, op, expected):\n        other = pretend.stub(**{\"__{0}__\".format(op): lambda other: NotImplemented})\n\n        assert getattr(operator, op)(Version(\"1\"), other) is expected\n\n    def test_major_version(self):\n        assert Version(\"2.1.0\").major == 2\n\n    def test_minor_version(self):\n        assert Version(\"2.1.0\").minor == 1\n        assert Version(\"2\").minor == 0\n\n    def test_micro_version(self):\n        assert Version(\"2.1.3\").micro == 3\n        assert Version(\"2.1\").micro == 0\n        assert Version(\"2\").micro == 0\n"
  },
  {
    "path": "tests/test_visualizations.py",
    "content": "import pytest\nimport numpy as np\n\n\ndef verify_out(capfd, expected):\n    out, _ = capfd.readouterr()\n    print(out)\n    assert expected == out\n\n\ndef test_flat_merge_graph(capfd):\n    from hangar.diagnostics import Graph\n\n    flat_log_contents = {\n        'head': '3c9530ac0da1106c0acbe1201900c51548bbcdd9',\n        'ancestors': {\n            '0ff3f2ec156ab8e1026b5271630ccae4556cc260': [''],\n            '3c9530ac0da1106c0acbe1201900c51548bbcdd9': ['fed88489ab6e59913aee935169b15fe68755d82c'],\n            'fed88489ab6e59913aee935169b15fe68755d82c': ['0ff3f2ec156ab8e1026b5271630ccae4556cc260']},\n        'specs': {\n            '0ff3f2ec156ab8e1026b5271630ccae4556cc260': {\n                'commit_message': 'first commit adding training images and labels',\n                'commit_time': 1562203787.257128, 'commit_user': 'Foo User', 'commit_email': 'foo@bar.com'},\n            '3c9530ac0da1106c0acbe1201900c51548bbcdd9': {\n                'commit_message': 'added testing labels only',\n                'commit_time': 1562203787.388417, 'commit_user': 'Foo User', 'commit_email': 'foo@bar.com'},\n            'fed88489ab6e59913aee935169b15fe68755d82c': {\n                'commit_message': 'added testing images only',\n                'commit_time': 1562203787.372292, 'commit_user': 'Foo User', 'commit_email': 'foo@bar.com'}},\n        'order': ['3c9530ac0da1106c0acbe1201900c51548bbcdd9',\n                'fed88489ab6e59913aee935169b15fe68755d82c',\n                '0ff3f2ec156ab8e1026b5271630ccae4556cc260']}\n\n    flat_hash_branch_map = {\n        '3c9530ac0da1106c0acbe1201900c51548bbcdd9': ['add-test'],\n        '0ff3f2ec156ab8e1026b5271630ccae4556cc260': ['untouched-live-demo-branch']}\n\n    g = Graph(use_color=False)\n    g.show_nodes(\n        dag=flat_log_contents['ancestors'],\n        spec=flat_log_contents['specs'],\n        branch=flat_hash_branch_map,\n        start=flat_log_contents['head'],\n        order=flat_log_contents['order'],\n        show_time=False,\n        show_user=False)\n\n    expected = '* 3c9530ac0da1106c0acbe1201900c51548bbcdd9 (add-test) : added testing labels only\\n'\\\n               '* fed88489ab6e59913aee935169b15fe68755d82c : added testing images only\\n'\\\n               '* 0ff3f2ec156ab8e1026b5271630ccae4556cc260 (untouched-live-demo-branch) : first commit adding training images and labels\\n'\n\n    verify_out(capfd, expected)\n\n\ndef test_three_way_merge_graph(capfd):\n    from hangar.diagnostics import Graph\n\n    three_way_log_contents = {\n        'head': '074f81d6b9fa5fa856175d47c7cc95cc4a839965',\n        'ancestors': {\n            '074f81d6b9fa5fa856175d47c7cc95cc4a839965': ['e5ea58dd9c7ffacd45fb128ddc00aced08d13889', '3c9530ac0da1106c0acbe1201900c51548bbcdd9'],\n            'e5ea58dd9c7ffacd45fb128ddc00aced08d13889': ['0ff3f2ec156ab8e1026b5271630ccae4556cc260'],\n            '0ff3f2ec156ab8e1026b5271630ccae4556cc260': [''],\n            '3c9530ac0da1106c0acbe1201900c51548bbcdd9': ['fed88489ab6e59913aee935169b15fe68755d82c'],\n            'fed88489ab6e59913aee935169b15fe68755d82c': ['0ff3f2ec156ab8e1026b5271630ccae4556cc260']},\n        'specs': {\n            '074f81d6b9fa5fa856175d47c7cc95cc4a839965': {\n                'commit_message': 'adding in the new testing columns',\n                'commit_time': 1562203830.775428, 'commit_user': 'Foo User', 'commit_email': 'foo@bar.com'},\n            'e5ea58dd9c7ffacd45fb128ddc00aced08d13889': {\n                'commit_message': 'commit adding validation images and 
labels',\n                'commit_time': 1562203787.320624, 'commit_user': 'Foo User', 'commit_email': 'foo@bar.com'},\n            '0ff3f2ec156ab8e1026b5271630ccae4556cc260': {\n                'commit_message': 'first commit adding training images and labels',\n                'commit_time': 1562203787.257128, 'commit_user': 'Foo User', 'commit_email': 'foo@bar.com'},\n            '3c9530ac0da1106c0acbe1201900c51548bbcdd9': {\n                'commit_message': 'added testing labels only',\n                'commit_time': 1562203787.388417, 'commit_user': 'Foo User', 'commit_email': 'foo@bar.com'},\n            'fed88489ab6e59913aee935169b15fe68755d82c': {\n                'commit_message': 'added testing images only',\n                'commit_time': 1562203787.372292, 'commit_user': 'Foo User', 'commit_email': 'foo@bar.com'}},\n        'order': ['074f81d6b9fa5fa856175d47c7cc95cc4a839965',\n                '3c9530ac0da1106c0acbe1201900c51548bbcdd9',\n                'fed88489ab6e59913aee935169b15fe68755d82c',\n                'e5ea58dd9c7ffacd45fb128ddc00aced08d13889',\n                '0ff3f2ec156ab8e1026b5271630ccae4556cc260']}\n\n    three_way_hash_branch_map = {\n        '3c9530ac0da1106c0acbe1201900c51548bbcdd9': ['add-test'],\n        'e5ea58dd9c7ffacd45fb128ddc00aced08d13889': ['add-validation'],\n        '074f81d6b9fa5fa856175d47c7cc95cc4a839965': ['master'],\n        '0ff3f2ec156ab8e1026b5271630ccae4556cc260': ['untouched-live-demo-branch']}\n\n    g = Graph(use_color=False)\n    g.show_nodes(\n        dag=three_way_log_contents['ancestors'],\n        spec=three_way_log_contents['specs'],\n        branch=three_way_hash_branch_map,\n        start=three_way_log_contents['head'],\n        order=three_way_log_contents['order'],\n        show_time=False,\n        show_user=False)\n\n    real = '*   074f81d6b9fa5fa856175d47c7cc95cc4a839965 (master) : adding in the new testing columns\\n'\\\n           '|\\\\  \\n'\\\n           '| * 3c9530ac0da1106c0acbe1201900c51548bbcdd9 (add-test) : added testing labels only\\n'\\\n           '| * fed88489ab6e59913aee935169b15fe68755d82c : added testing images only\\n'\\\n           '* | e5ea58dd9c7ffacd45fb128ddc00aced08d13889 (add-validation) : commit adding validation images and labels\\n'\\\n           '|/  \\n'\\\n           '* 0ff3f2ec156ab8e1026b5271630ccae4556cc260 (untouched-live-demo-branch) : first commit adding training images and labels\\n'\n\n    verify_out(capfd, real)\n\n\ndef test_octopus_merge_graph(capfd):\n    from hangar.diagnostics import Graph\n\n    octopus_log_contents = {\n        'head': '05ad17beab54ede8d7f9214c5c6ae44509c3da97',\n        'ancestors': {\n            '05ad17beab54ede8d7f9214c5c6ae44509c3da97': ['b9c7da873c06c730f52bad5808df5312c4cc0a38', '1b49223ae5e731da3750e4836d14565dbe504f18'],\n            'b9c7da873c06c730f52bad5808df5312c4cc0a38': ['a74236e598b96dcde10b176921eb58bb4a9c64bf', 'c4d6875caeff83a29413ae163dbcfdc3c57ad373'],\n            'a74236e598b96dcde10b176921eb58bb4a9c64bf': ['9152a4578f74b36838f8187e43c8644b1eba47b5'],\n            '9152a4578f74b36838f8187e43c8644b1eba47b5': ['ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d', '21f274d31abc09ede4ad6753f079297885b02a09'],\n            'ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d': ['e9ca97e336496b1fceb75869adf0294af5635922'],\n            'e9ca97e336496b1fceb75869adf0294af5635922': ['489bceb38246f27cae2a0f47eba0e488d95618db'],\n            '489bceb38246f27cae2a0f47eba0e488d95618db': ['17286961175c5cbbd4381fef07cc0a20920a5ce6'],\n            
'17286961175c5cbbd4381fef07cc0a20920a5ce6': ['63ac654df43bd149a1ca5f919e714bc57e69af99'],\n            '63ac654df43bd149a1ca5f919e714bc57e69af99': [''],\n            '1b49223ae5e731da3750e4836d14565dbe504f18': ['9152a4578f74b36838f8187e43c8644b1eba47b5'],\n            'c4d6875caeff83a29413ae163dbcfdc3c57ad373': ['63ac654df43bd149a1ca5f919e714bc57e69af99'],\n            '21f274d31abc09ede4ad6753f079297885b02a09': ['5c0ea20c6513f135f0131d9e10d86801ded29537'],\n            '5c0ea20c6513f135f0131d9e10d86801ded29537': ['10e84be056afb2ace6b7ba044ce1e9c9811eae4f'],\n            '10e84be056afb2ace6b7ba044ce1e9c9811eae4f': ['e9ca97e336496b1fceb75869adf0294af5635922']},\n        'specs': {\n            '05ad17beab54ede8d7f9214c5c6ae44509c3da97': {'commit_message': 'try number two',\n                'commit_time': 1562363265.6635652, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'b9c7da873c06c730f52bad5808df5312c4cc0a38': {'commit_message': 'merging the long running branch into master',\n                'commit_time': 1562363265.652887, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'a74236e598b96dcde10b176921eb58bb4a9c64bf': {'commit_message': 'another on master',\n                'commit_time': 1562363265.6346502, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '9152a4578f74b36838f8187e43c8644b1eba47b5': {'commit_message': 'this is the first merge',\n                'commit_time': 1562363265.578071, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d': {'commit_message': 'third commit on master',\n                'commit_time': 1562363265.4683158, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'e9ca97e336496b1fceb75869adf0294af5635922': {'commit_message': 'second commit on master with training labels',\n                'commit_time': 1562363265.398268, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '489bceb38246f27cae2a0f47eba0e488d95618db': {'commit_message': 'second',\n                'commit_time': 1562363264.7388191, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '17286961175c5cbbd4381fef07cc0a20920a5ce6': {'commit_message': 'hi',\n                'commit_time': 1562363264.735318, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '63ac654df43bd149a1ca5f919e714bc57e69af99': {'commit_message': 'initial commit on master with training images',\n                'commit_time': 1562363264.731286, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '1b49223ae5e731da3750e4836d14565dbe504f18': {'commit_message': 'another on try delete',\n                'commit_time': 1562363265.642503, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'c4d6875caeff83a29413ae163dbcfdc3c57ad373': {'commit_message': 'first commit on the large branch',\n                'commit_time': 1562363265.374819, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '21f274d31abc09ede4ad6753f079297885b02a09': {'commit_message': 'another commit on test banch after adding to new_set',\n                'commit_time': 1562363265.56455, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '5c0ea20c6513f135f0131d9e10d86801ded29537': {'commit_message': 'second commit on test branch with new aset',\n                'commit_time': 1562363265.545484, 'commit_user': 'test user', 
'commit_email': 'test@email.com'},\n            '10e84be056afb2ace6b7ba044ce1e9c9811eae4f': {'commit_message': 'first commit on test branch',\n                'commit_time': 1562363265.524131, 'commit_user': 'test user', 'commit_email': 'test@email.com'}},\n        'order': [\n            '05ad17beab54ede8d7f9214c5c6ae44509c3da97',\n            'b9c7da873c06c730f52bad5808df5312c4cc0a38',\n            '1b49223ae5e731da3750e4836d14565dbe504f18',\n            'a74236e598b96dcde10b176921eb58bb4a9c64bf',\n            '9152a4578f74b36838f8187e43c8644b1eba47b5',\n            '21f274d31abc09ede4ad6753f079297885b02a09',\n            '5c0ea20c6513f135f0131d9e10d86801ded29537',\n            '10e84be056afb2ace6b7ba044ce1e9c9811eae4f',\n            'ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d',\n            'e9ca97e336496b1fceb75869adf0294af5635922',\n            'c4d6875caeff83a29413ae163dbcfdc3c57ad373',\n            '489bceb38246f27cae2a0f47eba0e488d95618db',\n            '17286961175c5cbbd4381fef07cc0a20920a5ce6',\n            '63ac654df43bd149a1ca5f919e714bc57e69af99']\n        }\n\n    octopus_hash_branch_map = {\n        'c4d6875caeff83a29413ae163dbcfdc3c57ad373': ['large_branch'],\n        '21f274d31abc09ede4ad6753f079297885b02a09': ['test_branch'],\n        '05ad17beab54ede8d7f9214c5c6ae44509c3da97': ['master'],\n        '1b49223ae5e731da3750e4836d14565dbe504f18': ['trydelete']}\n\n    g = Graph(use_color=False)\n    g.show_nodes(\n        dag=octopus_log_contents['ancestors'],\n        spec=octopus_log_contents['specs'],\n        branch=octopus_hash_branch_map,\n        start=octopus_log_contents['head'],\n        order=octopus_log_contents['order'],\n        show_time=False,\n        show_user=False)\n\n    real = '*   05ad17beab54ede8d7f9214c5c6ae44509c3da97 (master) : try number two\\n'\\\n           '|\\\\  \\n'\\\n           '* \\\\   b9c7da873c06c730f52bad5808df5312c4cc0a38 : merging the long running branch into master\\n'\\\n           '|\\\\ \\\\  \\n'\\\n           '| | * 1b49223ae5e731da3750e4836d14565dbe504f18 (trydelete) : another on try delete\\n'\\\n           '* | | a74236e598b96dcde10b176921eb58bb4a9c64bf : another on master\\n'\\\n           '| |/  \\n'\\\n           '|/|   \\n'\\\n           '* |   9152a4578f74b36838f8187e43c8644b1eba47b5 : this is the first merge\\n'\\\n           '|\\\\ \\\\  \\n'\\\n           '| * | 21f274d31abc09ede4ad6753f079297885b02a09 (test_branch) : another commit on test banch after adding to new_set\\n'\\\n           '| * | 5c0ea20c6513f135f0131d9e10d86801ded29537 : second commit on test branch with new aset\\n'\\\n           '| * | 10e84be056afb2ace6b7ba044ce1e9c9811eae4f : first commit on test branch\\n'\\\n           '* | | ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d : third commit on master\\n'\\\n           '|/ /  \\n'\\\n           '* | e9ca97e336496b1fceb75869adf0294af5635922 : second commit on master with training labels\\n'\\\n           '| * c4d6875caeff83a29413ae163dbcfdc3c57ad373 (large_branch) : first commit on the large branch\\n'\\\n           '* | 489bceb38246f27cae2a0f47eba0e488d95618db : second\\n'\\\n           '* | 17286961175c5cbbd4381fef07cc0a20920a5ce6 : hi\\n'\\\n           '|/  \\n'\\\n           '* 63ac654df43bd149a1ca5f919e714bc57e69af99 : initial commit on master with training images\\n'\n\n    verify_out(capfd, real)\n\n\ndef test_octopus_large_merge_graph(capfd):\n    from hangar.diagnostics import Graph\n\n    octopus_log_contents = {\n        'head': 'ddeeff',\n        'ancestors': {\n            
'05ad17beab54ede8d7f9214c5c6ae44509c3da97': ['b9c7da873c06c730f52bad5808df5312c4cc0a38', '1b49223ae5e731da3750e4836d14565dbe504f18'],\n            'b9c7da873c06c730f52bad5808df5312c4cc0a38': ['a74236e598b96dcde10b176921eb58bb4a9c64bf', 'c4d6875caeff83a29413ae163dbcfdc3c57ad373', 'e9ca97e336496b1fceb75869adf0294af5635922'],\n            'a74236e598b96dcde10b176921eb58bb4a9c64bf': ['9152a4578f74b36838f8187e43c8644b1eba47b5', '21f274d31abc09ede4ad6753f079297885b02a09'],\n            '9152a4578f74b36838f8187e43c8644b1eba47b5': ['ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d', '21f274d31abc09ede4ad6753f079297885b02a09'],\n            'ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d': ['e9ca97e336496b1fceb75869adf0294af5635922', 'c4d6875caeff83a29413ae163dbcfdc3c57ad373'],\n            'e9ca97e336496b1fceb75869adf0294af5635922': ['489bceb38246f27cae2a0f47eba0e488d95618db'],\n            '489bceb38246f27cae2a0f47eba0e488d95618db': ['17286961175c5cbbd4381fef07cc0a20920a5ce6'],\n            '17286961175c5cbbd4381fef07cc0a20920a5ce6': ['63ac654df43bd149a1ca5f919e714bc57e69af99'],\n            '63ac654df43bd149a1ca5f919e714bc57e69af99': [''],\n            '1b49223ae5e731da3750e4836d14565dbe504f18': ['9152a4578f74b36838f8187e43c8644b1eba47b5', 'a74236e598b96dcde10b176921eb58bb4a9c64bf'],\n            'c4d6875caeff83a29413ae163dbcfdc3c57ad373': ['63ac654df43bd149a1ca5f919e714bc57e69af99'],\n            '21f274d31abc09ede4ad6753f079297885b02a09': ['5c0ea20c6513f135f0131d9e10d86801ded29537'],\n            '5c0ea20c6513f135f0131d9e10d86801ded29537': ['10e84be056afb2ace6b7ba044ce1e9c9811eae4f', 'ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d'],\n            '10e84be056afb2ace6b7ba044ce1e9c9811eae4f': ['e9ca97e336496b1fceb75869adf0294af5635922', 'c4d6875caeff83a29413ae163dbcfdc3c57ad373'],\n            'aabbcc': ['9152a4578f74b36838f8187e43c8644b1eba47b5', '5c0ea20c6513f135f0131d9e10d86801ded29537'],\n            'ddeeff': ['aabbcc', '05ad17beab54ede8d7f9214c5c6ae44509c3da97'],\n        },\n        'specs': {\n            'ddeeff': {'commit_message': 'new master',\n                'commit_time': 1562363266.6635652, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '05ad17beab54ede8d7f9214c5c6ae44509c3da97': {'commit_message': 'try number two',\n                'commit_time': 1562363265.6635652, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'b9c7da873c06c730f52bad5808df5312c4cc0a38': {'commit_message': 'merging the long running branch into master',\n                'commit_time': 1562363265.652887, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'a74236e598b96dcde10b176921eb58bb4a9c64bf': {'commit_message': 'another on master',\n                'commit_time': 1562363265.6346502, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '9152a4578f74b36838f8187e43c8644b1eba47b5': {'commit_message': 'this is the first merge',\n                'commit_time': 1562363265.578071, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d': {'commit_message': 'third commit on master',\n                'commit_time': 1562363265.4683158, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'e9ca97e336496b1fceb75869adf0294af5635922': {'commit_message': 'second commit on master with training labels',\n                'commit_time': 1562363265.398268, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            
'489bceb38246f27cae2a0f47eba0e488d95618db': {'commit_message': 'second',\n                'commit_time': 1562363264.7388191, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '17286961175c5cbbd4381fef07cc0a20920a5ce6': {'commit_message': 'hi',\n                'commit_time': 1562363264.735318, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '63ac654df43bd149a1ca5f919e714bc57e69af99': {'commit_message': 'initial commit on master with training images',\n                'commit_time': 1562363264.731286, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '1b49223ae5e731da3750e4836d14565dbe504f18': {'commit_message': 'another on try delete',\n                'commit_time': 1562363265.642503, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'aabbcc': {'commit_message': 'made up b',\n                'commit_time': 1562363265.640021, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            'c4d6875caeff83a29413ae163dbcfdc3c57ad373': {'commit_message': 'first commit on the large branch',\n                'commit_time': 1562363265.374819, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '21f274d31abc09ede4ad6753f079297885b02a09': {'commit_message': 'another commit on test banch after adding to new_set',\n                'commit_time': 1562363265.56455, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '5c0ea20c6513f135f0131d9e10d86801ded29537': {'commit_message': 'second commit on test branch with new aset',\n                'commit_time': 1562363265.545484, 'commit_user': 'test user', 'commit_email': 'test@email.com'},\n            '10e84be056afb2ace6b7ba044ce1e9c9811eae4f': {'commit_message': 'first commit on test branch',\n                'commit_time': 1562363265.524131, 'commit_user': 'test user', 'commit_email': 'test@email.com'}},\n        'order': [\n            'ddeeff',\n            '05ad17beab54ede8d7f9214c5c6ae44509c3da97',\n            'b9c7da873c06c730f52bad5808df5312c4cc0a38',\n            '1b49223ae5e731da3750e4836d14565dbe504f18',\n            'aabbcc',\n            'a74236e598b96dcde10b176921eb58bb4a9c64bf',\n            '9152a4578f74b36838f8187e43c8644b1eba47b5',\n            '21f274d31abc09ede4ad6753f079297885b02a09',\n            '5c0ea20c6513f135f0131d9e10d86801ded29537',\n            '10e84be056afb2ace6b7ba044ce1e9c9811eae4f',\n            'ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d',\n            'e9ca97e336496b1fceb75869adf0294af5635922',\n            'c4d6875caeff83a29413ae163dbcfdc3c57ad373',\n            '489bceb38246f27cae2a0f47eba0e488d95618db',\n            '17286961175c5cbbd4381fef07cc0a20920a5ce6',\n            '63ac654df43bd149a1ca5f919e714bc57e69af99']\n        }\n\n    octopus_hash_branch_map = {\n        'c4d6875caeff83a29413ae163dbcfdc3c57ad373': ['large_branch'],\n        '21f274d31abc09ede4ad6753f079297885b02a09': ['test_branch'],\n        'ddeeff': ['master'],\n        '1b49223ae5e731da3750e4836d14565dbe504f18': ['trydelete'],\n        'aabbcc': ['madeupbranch']\n    }\n\n    g = Graph(use_color=False)\n    g.show_nodes(\n        dag=octopus_log_contents['ancestors'],\n        spec=octopus_log_contents['specs'],\n        branch=octopus_hash_branch_map,\n        start=octopus_log_contents['head'],\n        order=octopus_log_contents['order'],\n        show_time=True,\n        show_user=True)\n\n    real = '*   ddeeff (master) (05Jul2019 21:47:46)(test user): new master\\n'\\\n           
'|\\\\  \\n'\\\n           '| *   05ad17beab54ede8d7f9214c5c6ae44509c3da97 (05Jul2019 21:47:45)(test user): try number two\\n'\\\n           '| |\\\\  \\n'\\\n           '| | \\\\     \\n'\\\n           '| |  \\\\    \\n'\\\n           '| *-. \\\\   b9c7da873c06c730f52bad5808df5312c4cc0a38 (05Jul2019 21:47:45)(test user): merging the long running branch into master\\n'\\\n           '| |\\\\ \\\\ \\\\  \\n'\\\n           '| | | | *   1b49223ae5e731da3750e4836d14565dbe504f18 (trydelete) (05Jul2019 21:47:45)(test user): another on try delete\\n'\\\n           '| | | | |\\\\  \\n'\\\n           '| | |_|_|/  \\n'\\\n           '| |/| | |   \\n'\\\n           '* | | | |   aabbcc (madeupbranch) (05Jul2019 21:47:45)(test user): made up b\\n'\\\n           '|\\\\ \\\\ \\\\ \\\\ \\\\  \\n'\\\n           '| |_|_|_|/  \\n'\\\n           '|/| | | |   \\n'\\\n           '| | * | |   a74236e598b96dcde10b176921eb58bb4a9c64bf (05Jul2019 21:47:45)(test user): another on master\\n'\\\n           '| | |\\\\ \\\\ \\\\  \\n'\\\n           '| |/ / / /  \\n'\\\n           '|/| | | |   \\n'\\\n           '* | | | |   9152a4578f74b36838f8187e43c8644b1eba47b5 (05Jul2019 21:47:45)(test user): this is the first merge\\n'\\\n           '|\\\\ \\\\ \\\\ \\\\ \\\\  \\n'\\\n           '| | |/ / /  \\n'\\\n           '| |/| | |   \\n'\\\n           '| * | | | 21f274d31abc09ede4ad6753f079297885b02a09 (test_branch) (05Jul2019 21:47:45)(test user): another commit on test banch after adding to new_set\\n'\\\n           '| |/ / /  \\n'\\\n           '| * | |   5c0ea20c6513f135f0131d9e10d86801ded29537 (05Jul2019 21:47:45)(test user): second commit on test branch with new aset\\n'\\\n           '| |\\\\ \\\\ \\\\  \\n'\\\n           '| |/ / /  \\n'\\\n           '|/| | |   \\n'\\\n           '| * | |   10e84be056afb2ace6b7ba044ce1e9c9811eae4f (05Jul2019 21:47:45)(test user): first commit on test branch\\n'\\\n           '| |\\\\ \\\\ \\\\  \\n'\\\n           '| | |/ /  \\n'\\\n           '| | | /   \\n'\\\n           '| | |/    \\n'\\\n           '| |/|     \\n'\\\n           '* | |   ef7b6e5bcaaebf62b9e02902ff60eb7862c3472d (05Jul2019 21:47:45)(test user): third commit on master\\n'\\\n           '|\\\\ \\\\ \\\\  \\n'\\\n           '| |/ /  \\n'\\\n           '|/| /   \\n'\\\n           '| |/    \\n'\\\n           '* | e9ca97e336496b1fceb75869adf0294af5635922 (05Jul2019 21:47:45)(test user): second commit on master with training labels\\n'\\\n           '| * c4d6875caeff83a29413ae163dbcfdc3c57ad373 (large_branch) (05Jul2019 21:47:45)(test user): first commit on the large branch\\n'\\\n           '* | 489bceb38246f27cae2a0f47eba0e488d95618db (05Jul2019 21:47:44)(test user): second\\n'\\\n           '* | 17286961175c5cbbd4381fef07cc0a20920a5ce6 (05Jul2019 21:47:44)(test user): hi\\n'\\\n           '|/  \\n'\\\n           '* 63ac654df43bd149a1ca5f919e714bc57e69af99 (05Jul2019 21:47:44)(test user): initial commit on master with training images\\n'\n\n    verify_out(capfd, real)\n\n\ndef test_repo_log_return_contents_correct_default_args(repo):\n\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co['test_meta']['foo'] = 'bar'\n    ancestor_digest = co.commit('first')\n    co['test_meta']['hello'] = 'world'\n    master_head = co.commit('second')\n    co.close()\n\n    ancestor_branch = repo.create_branch('ancestor', base_commit=ancestor_digest)\n    dev_branch = repo.create_branch('dev', base_commit=ancestor_digest)\n\n    co = repo.checkout(write=True, branch=dev_branch.name)\n    co['test_meta']['zen'] = 
'of python'\n    dev_head = co.commit('third on test')\n    co.close()\n\n    log = repo.log(return_contents=True)\n\n    assert log['head'] == dev_head\n\n    expected_ancestors = {\n        dev_head: [ancestor_digest],\n        ancestor_digest: [''],\n    }\n    assert log['ancestors'] == expected_ancestors\n\n    assert len(log['specs']) == 2\n    assert len(log['specs'][ancestor_digest]) == 4\n    assert len(log['specs'][dev_head]) == 4\n    assert log['specs'][ancestor_digest]['commit_message'] == 'first'\n    assert log['specs'][dev_head]['commit_message'] == 'third on test'\n\n    assert log['order'] == [dev_head, ancestor_digest]\n\n    assert len(log['branch_heads']) == 2\n    assert log['branch_heads'][ancestor_digest] == [ancestor_branch.name]\n    assert log['branch_heads'][dev_head] == [dev_branch.name]\n\n\ndef test_repo_log_return_contents_correct_when_specify_branch_name(repo):\n\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co['test_meta']['foo'] = 'bar'\n    ancestor_digest = co.commit('first')\n    co['test_meta']['hello'] = 'world'\n    master_head = co.commit('second')\n    co.close()\n\n    ancestor_branch = repo.create_branch('ancestor', base_commit=ancestor_digest)\n    dev_branch = repo.create_branch('dev', base_commit=ancestor_digest)\n\n    co = repo.checkout(write=True, branch=dev_branch.name)\n    co['test_meta']['zen'] = 'of python'\n    dev_head = co.commit('third on test')\n    co.close()\n\n    log = repo.log(branch='master', return_contents=True)\n\n    assert log['head'] == master_head\n\n    expected_ancestors = {\n        master_head: [ancestor_digest],\n        ancestor_digest: [''],\n    }\n    assert log['ancestors'] == expected_ancestors\n\n    assert len(log['specs']) == 2\n    assert len(log['specs'][ancestor_digest]) == 4\n    assert len(log['specs'][master_head]) == 4\n    assert log['specs'][ancestor_digest]['commit_message'] == 'first'\n    assert log['specs'][master_head]['commit_message'] == 'second'\n\n    assert log['order'] == [master_head, ancestor_digest]\n\n    assert len(log['branch_heads']) == 2\n    assert log['branch_heads'][ancestor_digest] == [ancestor_branch.name]\n    assert log['branch_heads'][master_head] == ['master']\n\n\ndef test_repo_log_return_contents_correct_when_specify_digest(repo):\n\n    co = repo.checkout(write=True)\n    co.add_str_column('test_meta')\n    co['test_meta']['foo'] = 'bar'\n    ancestor_digest = co.commit('first')\n    co['test_meta']['hello'] = 'world'\n    master_head = co.commit('second')\n    co.close()\n\n    ancestor_branch = repo.create_branch('ancestor', base_commit=ancestor_digest)\n    dev_branch = repo.create_branch('dev', base_commit=ancestor_digest)\n\n    co = repo.checkout(write=True, branch=dev_branch.name)\n    co['test_meta']['zen'] = 'of python'\n    dev_head = co.commit('third on test')\n    co.close()\n\n    log = repo.log(commit=master_head, return_contents=True)\n\n    assert log['head'] == master_head\n\n    expected_ancestors = {\n        master_head: [ancestor_digest],\n        ancestor_digest: [''],\n    }\n    assert log['ancestors'] == expected_ancestors\n\n    assert len(log['specs']) == 2\n    assert len(log['specs'][ancestor_digest]) == 4\n    assert len(log['specs'][master_head]) == 4\n    assert log['specs'][ancestor_digest]['commit_message'] == 'first'\n    assert log['specs'][master_head]['commit_message'] == 'second'\n\n    assert log['order'] == [master_head, ancestor_digest]\n\n    assert len(log['branch_heads']) == 2\n    assert 
log['branch_heads'][ancestor_digest] == [ancestor_branch.name]\n    assert log['branch_heads'][master_head] == ['master']\n"
  },
  {
    "path": "tests/typesystem/test_ndarray_typesysem.py",
    "content": "import pytest\nimport numpy as np\n\n\nfrom hangar.typesystem import NdarrayFixedShape, NdarrayVariableShape\n\n\nclass TestInvalidValues:\n\n    @pytest.mark.parametrize('shape,expected_exc', [\n        [tuple(range(32)), ValueError],\n        [(1.2, 2), TypeError],\n        [[1, 2], TypeError],\n        ['shouldntwork', TypeError],\n    ])\n    def test_shape_not_tuple_of_int_less_than_32_dims(self, shape, expected_exc):\n        with pytest.raises(expected_exc):\n            NdarrayFixedShape(shape=shape, dtype=np.uint8, column_layout='flat')\n        with pytest.raises(expected_exc):\n            NdarrayVariableShape(shape=shape, dtype=np.uint8, column_layout='flat')\n\n    @pytest.mark.parametrize(\n        'coltype', ['str', str, 'notvalid', None, 32, 3.5, {'foo': 'bar'}, ascii])\n    def test_column_type_must_be_ndarray(self, coltype):\n        with pytest.raises(ValueError):\n            NdarrayFixedShape(shape=(1,), dtype=np.uint8, column_layout='flat', column_type=coltype)\n        with pytest.raises(ValueError):\n            NdarrayVariableShape(shape=(1,), dtype=np.uint8, column_layout='flat', column_type=coltype)\n\n    @pytest.mark.parametrize(\n        'collayout', ['f', 'n', 'notvalid', None, 32, 3.5, {'foo': 'bar'}, ascii])\n    def test_column_layout_must_be_valid_value(self, collayout):\n        with pytest.raises(ValueError):\n            NdarrayFixedShape(shape=(1,), dtype=np.uint8, column_layout=collayout)\n        with pytest.raises(ValueError):\n            NdarrayVariableShape(shape=(1,), dtype=np.uint8, column_layout=collayout)\n\n    @pytest.mark.parametrize(\n        'backend', ['30', 24, {'10': '10'}, ('00',), ['50', ], ascii, 'None'])\n    def test_fixed_shape_backend_code_valid_value(self, backend):\n        with pytest.raises(ValueError):\n            NdarrayFixedShape(shape=(1,), dtype=np.uint8, column_layout='flat', backend=backend)\n\n    @pytest.mark.parametrize(\n        'backend', ['30', '01', 24, {'10': '10'}, ('00',), ['50', ], ascii, 'None'])\n    def test_variable_shape_backend_code_valid_value(self, backend):\n        with pytest.raises(ValueError):\n            NdarrayVariableShape(shape=(1,), dtype=np.uint8, column_layout='flat', backend=backend)\n\n    @pytest.mark.parametrize(\n        'opts', ['val', [], (), [('key', 'val')], 10, ({'key': 'val'},), ascii])\n    def test_backend_options_must_be_dict_or_nonetype(self, opts):\n        with pytest.raises(TypeError):\n            NdarrayFixedShape(shape=(1,), dtype=np.uint8, column_layout='flat', backend='00', backend_options=opts)\n        with pytest.raises(TypeError):\n            NdarrayVariableShape(shape=(1,), dtype=np.uint8, column_layout='flat', backend='00', backend_options=opts)\n\n    def test_backend_must_be_specified_if_backend_options_provided(self):\n        with pytest.raises(ValueError):\n            NdarrayFixedShape(shape=(1,), dtype=np.uint8, column_layout='flat', backend_options={})\n        with pytest.raises(ValueError):\n            NdarrayVariableShape(shape=(1,), dtype=np.uint8, column_layout='flat', backend_options={})\n\n    @pytest.mark.parametrize(\n        'schema_type', ['fixed_shape', True, 'str', np.uint8, 3, ascii])\n    def test_variable_shape_must_have_variable_shape_schema_type(self, schema_type):\n        with pytest.raises(ValueError):\n            NdarrayVariableShape(shape=(1,), dtype=np.uint8, column_layout='flat', schema_type=schema_type)\n\n    @pytest.mark.parametrize(\n        'schema_type', ['variable_shape', True, 'str', np.uint8, 3, 
ascii])\n    def test_fixed_shape_must_have_fixed_shape_schema_type(self, schema_type):\n        with pytest.raises(ValueError):\n            NdarrayFixedShape(shape=(1,), dtype=np.uint8, column_layout='flat', schema_type=schema_type)\n"
  },
  {
    "path": "tests/typesystem/test_pybytes_typesystem.py",
    "content": "import pytest\nimport numpy as np\n\nfrom hangar.typesystem import BytesVariableShape\n\n\nclass TestInvalidValues:\n\n    @pytest.mark.parametrize('coltype', ['ndarray', np.ndarray, 32, {'foo': 'bar'}, ascii])\n    def test_column_type_must_be_str(self, coltype):\n        with pytest.raises(ValueError):\n            BytesVariableShape(dtype=bytes, column_layout='flat', column_type=coltype)\n\n    @pytest.mark.parametrize('collayout', ['f', 'n', None, 32, {'foo': 'bar'}, ascii])\n    def test_column_layout_must_be_valid_value(self, collayout):\n        with pytest.raises(ValueError):\n            BytesVariableShape(dtype=bytes, column_layout=collayout)\n\n    @pytest.mark.parametrize('backend', ['00', 24, {'31': '31'}, ('31',), ['50', ], ascii, 'None'])\n    def test_variable_shape_backend_code_valid_value(self, backend):\n        with pytest.raises(ValueError):\n            BytesVariableShape(dtype=bytes, column_layout='flat', backend=backend)\n\n    @pytest.mark.parametrize('opts', ['val', [], (), [('k', 'v')], 10, ({'k': 'v'},), ascii])\n    def test_backend_options_must_be_dict_or_nonetype(self, opts):\n        with pytest.raises(TypeError):\n            BytesVariableShape(dtype=bytes, column_layout='flat', backend='31', backend_options=opts)\n\n    def test_backend_must_be_specified_if_backend_options_provided(self):\n        with pytest.raises(ValueError):\n            BytesVariableShape(dtype=bytes, column_layout='flat', backend_options={})\n\n    @pytest.mark.parametrize('schema_type', ['fixed_shape', True, 'str', np.uint8, 3, ascii])\n    def test_variable_shape_must_have_variable_shape_schema_type(self, schema_type):\n        with pytest.raises(ValueError):\n            BytesVariableShape(dtype=bytes, column_layout='flat', schema_type=schema_type)\n\n\n# ----------------------- Fixtures for Valid Schema ---------------------------\n\n\n@pytest.fixture(params=['nested', 'flat'], scope='class')\ndef column_layout(request):\n    return request.param\n\n\n@pytest.fixture(params=['31'], scope='class')\ndef backend(request):\n    return request.param\n\n\n@pytest.fixture(params=[{}], scope='class')\ndef backend_options(request):\n    return request.param\n\n\n@pytest.fixture(scope='class')\ndef valid_schema(column_layout, backend, backend_options):\n    schema = BytesVariableShape(\n        dtype=bytes, column_layout=column_layout, backend=backend, backend_options=backend_options)\n    return schema\n\n\nclass TestValidSchema:\n\n    @pytest.mark.parametrize('data', [\n        b'hello', b'world how are you?', b'\\n what\\'s up',\n        b'loob!', b'lol',\n        (b\"\\x80\\x04\\x95'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x8c\\x08__main__\"\n         b\"\\x94\\x8c\\x07testobj\\x94\\x93\\x94)\\x81\\x94}\\x94\\x8c\\x04name\\x94Nsb.\")\n    ])\n    def test_valid_data(self, valid_schema, data):\n        res = valid_schema.verify_data_compatible(data)\n        assert res.compatible is True\n        assert res.reason == ''\n\n    def test_data_over_2MB_size_not_allowed(self, valid_schema):\n        data = ''.join(['a' for _ in range(2_000_001)]).encode()\n        res = valid_schema.verify_data_compatible(data)\n        assert res.compatible is False\n\n\n\n"
  },
  {
    "path": "tests/typesystem/test_pystr_typesystem.py",
    "content": "import pytest\nimport numpy as np\nfrom random import randint, choices\n\n\nfrom hangar.typesystem import StringVariableShape\n\n\nclass TestInvalidValues:\n\n    @pytest.mark.parametrize('coltype', ['ndarray', np.ndarray, 32, {'foo': 'bar'}, ascii])\n    def test_column_type_must_be_str(self, coltype):\n        with pytest.raises(ValueError):\n            StringVariableShape(dtype=str, column_layout='flat', column_type=coltype)\n\n    @pytest.mark.parametrize('collayout', ['f', 'n', None, 32, {'foo': 'bar'}, ascii])\n    def test_column_layout_must_be_valid_value(self, collayout):\n        with pytest.raises(ValueError):\n            StringVariableShape(dtype=str, column_layout=collayout)\n\n    @pytest.mark.parametrize('backend', ['00', 24, {'30': '30'}, ('30',), ['50',], ascii, 'None'])\n    def test_variable_shape_backend_code_valid_value(self, backend):\n        with pytest.raises(ValueError):\n            StringVariableShape(dtype=str, column_layout='flat', backend=backend)\n\n    @pytest.mark.parametrize('opts', ['val', [], (), [('k', 'v')], 10, ({'k': 'v'},), ascii])\n    def test_backend_options_must_be_dict_or_nonetype(self, opts):\n        with pytest.raises(TypeError):\n            StringVariableShape(dtype=str, column_layout='flat', backend='30', backend_options=opts)\n\n    def test_backend_must_be_specified_if_backend_options_provided(self):\n        with pytest.raises(ValueError):\n            StringVariableShape(dtype=str, column_layout='flat', backend_options={})\n\n    @pytest.mark.parametrize('schema_type', ['fixed_shape', True, 'str', np.uint8, 3, ascii])\n    def test_variable_shape_must_have_variable_shape_schema_type(self, schema_type):\n        with pytest.raises(ValueError):\n            StringVariableShape(dtype=str, column_layout='flat', schema_type=schema_type)\n\n\n# ----------------------- Fixtures for Valid Schema ---------------------------\n\n\n@pytest.fixture(params=['nested', 'flat'], scope='class')\ndef column_layout(request):\n    return request.param\n\n\n@pytest.fixture(params=['30'], scope='class')\ndef backend(request):\n    return request.param\n\n\n@pytest.fixture(params=[{}], scope='class')\ndef backend_options(request):\n    return request.param\n\n\n@pytest.fixture(scope='class')\ndef valid_schema(column_layout, backend, backend_options):\n    schema = StringVariableShape(\n        dtype=str, column_layout=column_layout, backend=backend, backend_options=backend_options)\n    return schema\n\n\nclass TestValidSchema:\n\n    @pytest.mark.parametrize('data', [\n        'hello', 'world how are you?', '\\n what\\'s up',\n        'loob!', 'a\\xac\\u1234\\u20ac\\U00008000', 'lol'\n    ])\n    def test_valid_data(self, valid_schema, data):\n        res = valid_schema.verify_data_compatible(data)\n        assert res.compatible is True\n        assert res.reason == ''\n\n    @pytest.mark.parametrize('data', [chr(24523), chr(253), chr(6222)])\n    def test_large_unicode_codepoints_strings_compatible(self, valid_schema, data):\n        res = valid_schema.verify_data_compatible(data)\n        assert res.compatible is True\n        assert res.reason == ''\n\n    def test_strings_over_2MB_size_not_allowed(self, valid_schema):\n        data = ''.join(['a' for _ in range(2_000_001)])\n        res = valid_schema.verify_data_compatible(data)\n        assert res.compatible is False\n\n\n\n"
  },
  {
    "path": "tox.ini",
    "content": "[tox]\nenvlist =\n    clean,\n    docs,\n    py{36,37,38}-cov{yes,no}-ml{yes,no},\n    report,\n    mypy\n\n# -------------- dependency setup ---------------\n\n[gh-actions]\npython =\n    3.6: py36\n    3.7: py37\n    3.8: py38\n\n[gh-actions:env]\nTESTCOVER =\n    yes: covyes\n    no: covno\nTESTML =\n    yes: mlyes\n    no: mlno\n\n[base]\ndeps =\n    Cython\n    py{36,37,38}: pytest\n        pytest-xdist\n    py{36,37,38}-mlno: hypothesis[numpy]\n        pretend\n    py{36,37,38}-covyes: pytest-cov\n    py{36,37,38}-cov{yes,no}-mlyes,docs: tensorflow-cpu == 2.2.0\n    py{36,37,38}-cov{yes,no}-mlyes,docs: torch == 1.4.0+cpu ; sys_platform != 'darwin'\n    py{36,37,38}-cov{yes,no}-mlyes,docs: torch == 1.4.0 ; sys_platform == 'darwin'\n\n\n[testenv]\ndeps =\n    {[base]deps}\nusedevelop =\n    covyes,docs: true\n    covno: false\nignore_basepython_conflict = true\nsetenv =\n    PYTHONPATH={toxinidir}/tests\npassenv =\n    *\ninstall_command =\n    pip install {packages} -f https://download.pytorch.org/whl/torch_stable.html\ncommands =\n    py{36,37,38}-covno-mlno: pytest --ignore {env:PYTHONPATH}/ml_datasets -n={env:PYTEST_XDIST_PROC_NR:4} {posargs}\n    py{36,37,38}-covyes-mlno: pytest --ignore {env:PYTHONPATH}/ml_datasets --cov --cov-append --cov-report term -n={env:PYTEST_XDIST_PROC_NR:4} {posargs}\n    py{36,37,38}-covno-mlyes: pytest -n={env:PYTEST_XDIST_PROC_NR:4} {posargs} {env:PYTHONPATH}/ml_datasets\n    py{36,37,38}-covyes-mlyes: pytest --cov --cov-append --cov-report term -n={env:PYTEST_XDIST_PROC_NR:4} {posargs} {env:PYTHONPATH}/ml_datasets\n\n# ---------------- checkers ------------------------\n\n[testenv:spell]\nsetenv =\n    SPELLCHECK=1\ncommands =\n    sphinx-build -b spelling docs dist/docs\nskip_install = true\ndeps =\n    -r{toxinidir}/docs/requirements.txt\n    sphinxcontrib-spelling\n    pyenchant\n\n[testenv:docs]\nusedevelop = true\ndeps =\n    {[base]deps}\n    -r{toxinidir}/docs/requirements.txt\ncommands =\n    sphinx-build {posargs:-E} -b html docs dist/docs\n    sphinx-build -b linkcheck docs dist/docs -j {env:GH_ACTIONS_PROC_NR:8}\ninstall_command =\n    pip install {packages} -f https://download.pytorch.org/whl/torch_stable.html\n\n[testenv:report]\ndeps =\n    coverage\nskip_install = true\ncommands =\n    coverage report\n    coverage html\n\n[testenv:clean]\nskip_install = true\ndeps =\n    coverage\ncommands =\n    coverage erase\n\n# ------------------- mypy ----------------------\n\n[testenv:mypy]\nbasepython = {env:TOXPYTHON:python3.8}\nskip_install = False\ncommands =\n    {posargs:mypy --config-file mypy.ini src/hangar}\ndeps =\n    {[base]deps}\n    mypy >= 0.701\n    mypy-protobuf\n    grpcio_tools\ninstall_command =\n    pip install {packages} -f https://download.pytorch.org/whl/torch_stable.html\n"
  }
]