Full Code of felixdittrich92/OnnxTR for AI

main b10318c76097 cached

126 files

480.5 KB

180.5k tokens

387 symbols

1 requests

Download .txt

Showing preview only (561K chars total). Download the full file or copy to clipboard to get everything.

Repository: felixdittrich92/OnnxTR
Branch: main
Commit: b10318c76097
Files: 126
Total size: 480.5 KB

Directory structure:
gitextract_7yglu2_f/

├── .conda/
│   └── meta.yaml
├── .github/
│   ├── CODEOWNERS
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.yml
│   │   ├── config.yml
│   │   └── feature_request.yml
│   ├── dependabot.yml
│   ├── release.yml
│   └── workflows/
│       ├── builds.yml
│       ├── clear_caches.yml
│       ├── demo.yml
│       ├── docker.yml
│       ├── main.yml
│       ├── publish.yml
│       └── style.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CODE_OF_CONDUCT.md
├── Dockerfile
├── LICENSE
├── Makefile
├── README.md
├── demo/
│   ├── README.md
│   ├── app.py
│   ├── packages.txt
│   └── requirements.txt
├── onnxtr/
│   ├── __init__.py
│   ├── contrib/
│   │   ├── __init__.py
│   │   ├── artefacts.py
│   │   └── base.py
│   ├── file_utils.py
│   ├── io/
│   │   ├── __init__.py
│   │   ├── elements.py
│   │   ├── html.py
│   │   ├── image.py
│   │   ├── pdf.py
│   │   └── reader.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── _utils.py
│   │   ├── builder.py
│   │   ├── classification/
│   │   │   ├── __init__.py
│   │   │   ├── models/
│   │   │   │   ├── __init__.py
│   │   │   │   └── mobilenet.py
│   │   │   ├── predictor/
│   │   │   │   ├── __init__.py
│   │   │   │   └── base.py
│   │   │   └── zoo.py
│   │   ├── detection/
│   │   │   ├── __init__.py
│   │   │   ├── _utils/
│   │   │   │   ├── __init__.py
│   │   │   │   └── base.py
│   │   │   ├── core.py
│   │   │   ├── models/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── differentiable_binarization.py
│   │   │   │   ├── fast.py
│   │   │   │   └── linknet.py
│   │   │   ├── postprocessor/
│   │   │   │   ├── __init__.py
│   │   │   │   └── base.py
│   │   │   ├── predictor/
│   │   │   │   ├── __init__.py
│   │   │   │   └── base.py
│   │   │   └── zoo.py
│   │   ├── engine.py
│   │   ├── factory/
│   │   │   ├── __init__.py
│   │   │   └── hub.py
│   │   ├── predictor/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   └── predictor.py
│   │   ├── preprocessor/
│   │   │   ├── __init__.py
│   │   │   └── base.py
│   │   ├── recognition/
│   │   │   ├── __init__.py
│   │   │   ├── core.py
│   │   │   ├── models/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── crnn.py
│   │   │   │   ├── master.py
│   │   │   │   ├── parseq.py
│   │   │   │   ├── sar.py
│   │   │   │   ├── viptr.py
│   │   │   │   └── vitstr.py
│   │   │   ├── predictor/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── _utils.py
│   │   │   │   └── base.py
│   │   │   ├── utils.py
│   │   │   └── zoo.py
│   │   └── zoo.py
│   ├── py.typed
│   ├── transforms/
│   │   ├── __init__.py
│   │   └── base.py
│   └── utils/
│       ├── __init__.py
│       ├── common_types.py
│       ├── data.py
│       ├── fonts.py
│       ├── geometry.py
│       ├── multithreading.py
│       ├── reconstitution.py
│       ├── repr.py
│       ├── visualization.py
│       └── vocabs.py
├── pyproject.toml
├── scripts/
│   ├── convert_to_float16.py
│   ├── evaluate.py
│   ├── latency.py
│   └── quantize.py
├── setup.py
└── tests/
    ├── common/
    │   ├── test_contrib.py
    │   ├── test_core.py
    │   ├── test_engine_cfg.py
    │   ├── test_headers.py
    │   ├── test_io.py
    │   ├── test_io_elements.py
    │   ├── test_models.py
    │   ├── test_models_builder.py
    │   ├── test_models_classification.py
    │   ├── test_models_detection.py
    │   ├── test_models_detection_utils.py
    │   ├── test_models_factory.py
    │   ├── test_models_preprocessor.py
    │   ├── test_models_recognition.py
    │   ├── test_models_recognition_utils.py
    │   ├── test_models_zoo.py
    │   ├── test_transforms.py
    │   ├── test_utils_data.py
    │   ├── test_utils_fonts.py
    │   ├── test_utils_geometry.py
    │   ├── test_utils_multithreading.py
    │   ├── test_utils_reconstitution.py
    │   ├── test_utils_visualization.py
    │   └── test_utils_vocabs.py
    └── conftest.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .conda/meta.yaml
================================================
{% set pyproject = load_file_data('../pyproject.toml', from_recipe_dir=True) %}
{% set project = pyproject.get('project') %}
{% set urls = pyproject.get('project', {}).get('urls') %}
{% set version = environ.get('BUILD_VERSION', '0.8.2a0') %}

package:
  name: onnxtr
  version: {{ version }}

source:
  fn: onnxtr-{{ version }}.tar.gz
  url: ../dist/onnxtr-{{ version }}.tar.gz

build:
  script: python setup.py install --single-version-externally-managed --record=record.txt

requirements:
  host:
    - python>=3.10, <3.12
    - setuptools

  run:
    - numpy >=1.16.0, <3.0.0
    - scipy >=1.4.0, <2.0.0
    - pillow >=9.2.0
    - opencv >=4.5.0, <5.0.0
    - pypdfium2-team::pypdfium2_helpers >=4.11.0, <5.0.0
    - pyclipper >=1.2.0, <2.0.0
    - langdetect >=1.0.9, <2.0.0
    - rapidfuzz >=3.0.0, <4.0.0
    - huggingface_hub >=0.20.0, <1.0.0
    - defusedxml >=0.7.0
    - anyascii >=0.3.2
    - tqdm >=4.30.0

test:
  requires:
    - pip
    - onnxruntime

  imports:
    - onnxtr

about:
  home: {{ urls.get('repository') }}
  license: Apache-2.0
  license_file: {{ project.get('license', {}).get('file') }}
  summary: {{ project.get('description') | replace(":", " -")}}
  dev_url: {{ urls.get('repository') }}


================================================
FILE: .github/CODEOWNERS
================================================
*       @felixdittrich92

================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms

github: felixdittrich92
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
polar: # Replace with a single Polar username
buy_me_a_coffee: # Replace with a single Buy Me a Coffee username
thanks_dev: # Replace with a single thanks.dev username
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.yml
================================================
name: 🐛 Bug report
description: Create a report to help us improve the library
labels: 'type: bug'

body:
- type: markdown
  attributes:
    value: >
      #### Before reporting a bug, please check that the issue hasn't already been addressed in [the existing and past issues](https://github.com/felixdittrich92/onnxtr/issues).
- type: textarea
  attributes:
    label: Bug description
    description: |
      A clear and concise description of what the bug is.

      Please explain the result you observed and the behavior you were expecting.
    placeholder: |
      A clear and concise description of what the bug is.
  validations:
    required: true

- type: textarea
  attributes:
    label: Code snippet to reproduce the bug
    description: |
      Sample code to reproduce the problem.

      Please wrap your code snippet with ```` ```triple quotes blocks``` ```` for readability.
    placeholder: |
      ```python
      Sample code to reproduce the problem
      ```
  validations:
    required: true
- type: textarea
  attributes:
    label: Error traceback
    description: |
      The error message you received running the code snippet, with the full traceback.

      Please wrap your error message with ```` ```triple quotes blocks``` ```` for readability.
    placeholder: |
      ```
      The error message you got, with the full traceback.
      ```
  validations:
    required: true
- type: textarea
  attributes:
    label: Environment
    description: |
      Please describe your environment:
      OS:
      Python version:
      Library version:
      Onnxruntime version:
  validations:
    required: true
- type: markdown
  attributes:
    value: >
      Thanks for helping us improve the library!


================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: true
contact_links:
  - name: Usage questions
    url: https://github.com/felixdittrich92/OnnxTR/discussions
    about: Ask questions and discuss with other OnnxTR community members

================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.yml
================================================
name: 🚀 Feature request
description: >
  Submit a proposal/request for a new feature for OnnxTR. Please search for existing issues before creating a new one.
  For non-onnx related features please use the [main repository](https://github.com/mindee/doctr/issues).
labels: 'type: enhancement'

body:
- type: textarea
  attributes:
    label: 🚀 The feature
    description: >
      A clear and concise description of the feature proposal
  validations:
    required: true
- type: textarea
  attributes:
    label: Additional context
    description: >
      Add any other context or screenshots about the feature request.
- type: markdown
  attributes:
    value: >
      Thanks for contributing 🎉


================================================
FILE: .github/dependabot.yml
================================================
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    open-pull-requests-limit: 10
    target-branch: "main"
    labels: ["topic: build"]
    schedule:
      interval: weekly
      day: sunday
  - package-ecosystem: "github-actions"
    directory: "/"
    open-pull-requests-limit: 10
    target-branch: "main"
    labels: ["topic: CI/CD"]
    schedule:
      interval: weekly
      day: sunday
    groups:
      github-actions:
        patterns:
          - "*"


================================================
FILE: .github/release.yml
================================================
changelog:
  exclude:
    labels:
      - ignore-for-release
  categories:
    - title: Breaking Changes 🛠
      labels:
        - "type: breaking change"
    # NEW FEATURES
    - title: New Features
      labels:
        - "type: new feature"
    # BUG FIXES
    - title: Bug Fixes
      labels:
        - "type: bug"
    # IMPROVEMENTS
    - title: Improvements
      labels:
        - "type: enhancement"
    # MISC
    - title: Miscellaneous
      labels:
        - "type: misc"


================================================
FILE: .github/workflows/builds.yml
================================================
name: builds

on:
  push:
    branches: main
  pull_request:
    branches: main
  schedule:
    # Runs every 2 weeks on Monday at 03:00 UTC
    - cron: '0 3 * * 1'

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python: ["3.10", "3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v6
      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          # MacOS issue ref.: https://github.com/actions/setup-python/issues/855 & https://github.com/actions/setup-python/issues/865
          python-version: ${{ matrix.os == 'macos-latest' && matrix.python == '3.10' && '3.11' || matrix.python }}
          architecture: x64
      - name: Cache python modules
        uses: actions/cache@v5
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pkg-deps-${{ matrix.python }}-${{ hashFiles('pyproject.toml') }}
      - name: Install package
        run: |
          python -m pip install --upgrade pip
          pip install -e .[cpu-headless,viz] --upgrade
      - name: Import package
        run: python -c "import onnxtr; print(onnxtr.__version__)"

  conda:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: conda-incubator/setup-miniconda@v4
        with:
          auto-update-conda: true
          python-version: "3.10"
          channels: pypdfium2-team,bblanchon,defaults,conda-forge
          channel-priority: strict
      - name: Install dependencies
        shell: bash -el {0}
        run: conda install -y conda-build conda-verify anaconda-client
      - name: Install libEGL
        run: sudo apt-get update && sudo apt-get install -y libegl1
      - name: Build and verify
        shell: bash -el {0}
        run: |
          python setup.py sdist
          mkdir conda-dist
          conda build .conda/ --output-folder conda-dist
          conda-verify conda-dist/linux-64/*conda --ignore=C1115


================================================
FILE: .github/workflows/clear_caches.yml
================================================
name: Clear GitHub runner caches

on:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * *'  # Runs once a day

jobs:
  clear:
    name: Clear caches
    runs-on: ubuntu-latest
    steps:
    - uses: MyAlbum/purge-cache@v2
      with:
        max-age: 172800 # Caches older than 2 days are deleted


================================================
FILE: .github/workflows/demo.yml
================================================
name: Sync Hugging Face demo

on:
  # Run 'test-demo' on every pull request to the main branch
  pull_request:
    branches: [main]

  # Run 'sync-to-hub' on push when tagging (e.g., 'v*') and on a scheduled cron job
  push:
    tags:
      - 'v*'

  schedule:
    - cron: '0 2 10 * *'  # At 02:00 on day-of-month 10 (every month)

  # Allow manual triggering of the workflow
  workflow_dispatch:

jobs:
  # This job runs on every pull request to main
  test-demo:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]
        python: ["3.10"]
    steps:
      - uses: actions/checkout@v6
      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python }}
          architecture: x64
      - name: Cache python modules
        uses: actions/cache@v5
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pkg-deps-${{ matrix.python }}-${{ hashFiles('requirements.txt') }}-${{ hashFiles('demo/requirements.txt') }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r demo/requirements.txt --upgrade
      - name: Start Gradio demo
        run: |
          nohup python demo/app.py &
          sleep 10  # Allow some time for the Gradio server to start
      - name: Check demo build
        run: |
          curl --fail http://127.0.0.1:7860/ || exit 1

  # This job only runs when a new version tag is pushed or during the cron job
  sync-to-hub:
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
    needs: test-demo
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: "3.10"
      - name: Install huggingface_hub
        run: pip install huggingface-hub
      - name: Upload folder to Hugging Face
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          python -c "
          from huggingface_hub import HfApi
          api = HfApi(token='${{ secrets.HF_TOKEN }}')
          repo_id = 'Felix92/OnnxTR-OCR'
          api.upload_folder(repo_id=repo_id, repo_type='space', folder_path='demo/')
          api.restart_space(repo_id=repo_id, factory_reboot=True)
          "


================================================
FILE: .github/workflows/docker.yml
================================================
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
#
name: Docker image on ghcr.io

on:
  push:
    tags:
      - 'v*'
  pull_request:
    branches: main
  schedule:
    - cron: '0 2 1 6 *'  # At 02:00 on day-of-month 1 in June (once a year actually)

env:
  REGISTRY: ghcr.io

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        image:
          - "ubuntu:24.04"          # Base image for CPU variants
          - "nvidia/cuda:12.6.2-base-ubuntu24.04" # Base image for GPU
        variant:
          - "cpu-headless"           # CPU variant 1
          - "openvino-headless"  # CPU variant 2
          - "gpu-headless"           # GPU variant
        python: [3.10.13]

        # Exclude invalid combinations
        exclude:
          - image: "nvidia/cuda:12.6.2-base-ubuntu24.04"
            variant: "cpu-headless"
          - image: "nvidia/cuda:12.6.2-base-ubuntu24.04"
            variant: "openvino-headless"
          - image: "ubuntu:24.04"
            variant: "gpu-headless"

    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Log in to the Container registry
        uses: docker/login-action@v4
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Sanitize docker tag
        run: |
          # Start with the base prefix
          PREFIX_DOCKER_TAG="OnnxTR-${{ matrix.variant }}-py${{ matrix.python }}"

          # Replace any commas with hyphens (if needed)
          PREFIX_DOCKER_TAG=$(echo "$PREFIX_DOCKER_TAG" | sed 's/,/-/g')

          # Determine suffix based on image
          IMAGE="${{ matrix.image }}"
          case "$IMAGE" in
            "nvidia/cuda:"*)
              SUFFIX=$(echo "$IMAGE" | sed -E 's|.*/cuda:([0-9]+\.[0-9]+\.[0-9]+)-base-(ubuntu[0-9]+\.[0-9]+)|-\2-cuda\1|')
              ;;
            "ubuntu:"*)
              SUFFIX=$(echo "$IMAGE" | sed -E 's|ubuntu:([0-9]+\.[0-9]+)|-ubuntu\1|')
              ;;
            *)
              SUFFIX=""
              ;;
          esac

          # Combine the prefix, suffix, and ensure ending hyphen
          PREFIX_DOCKER_TAG="${PREFIX_DOCKER_TAG}${SUFFIX}-"

          # Export to environment
          echo "PREFIX_DOCKER_TAG=${PREFIX_DOCKER_TAG}" >> $GITHUB_ENV

          # Debugging output
          echo "Final Docker Tag: $PREFIX_DOCKER_TAG"

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v6
        with:
          images: ${{ env.REGISTRY }}/${{ github.repository }}
          tags: |
            # used only on schedule event
            type=schedule,pattern={{date 'YYYY-MM'}},prefix=${{ env.PREFIX_DOCKER_TAG }}
            # used only if a tag following semver is published
            type=semver,pattern={{raw}},prefix=${{ env.PREFIX_DOCKER_TAG }}

      - name: Build Docker image
        id: build
        uses: docker/build-push-action@v7
        with:
          context: .
          build-args: |
            BASE_IMAGE=${{ matrix.image }}
            SYSTEM=${{ matrix.variant }}
            PYTHON_VERSION=${{ matrix.python }}
            ONNXTR_REPO=${{ github.repository }}
            ONNXTR_VERSION=${{ github.sha }}
          push: false  # push only if `import onnxtr` works
          tags: ${{ steps.meta.outputs.tags }}

      - name: Check if `import onnxtr` works
        run: docker run ${{ steps.build.outputs.imageid }} python3 -c 'import onnxtr; print(onnxtr.__version__)'

      - name: Push Docker image
        if: ${{ (github.ref == 'refs/heads/main' && github.event_name != 'pull_request') || (startsWith(github.ref, 'refs/tags') && github.event_name == 'push') }}
        uses: docker/build-push-action@v7
        with:
          context: .
          build-args: |
            BASE_IMAGE=${{ matrix.image }}
            SYSTEM=${{ matrix.variant }}
            PYTHON_VERSION=${{ matrix.python }}
            ONNXTR_REPO=${{ github.repository }}
            ONNXTR_VERSION=${{ github.sha }}
          push: true
          tags: ${{ steps.meta.outputs.tags }}

================================================
FILE: .github/workflows/main.yml
================================================
name: tests

on:
  push:
    branches: main
  pull_request:
    branches: main
  schedule:
    # Runs every 2 weeks on Monday at 03:00 UTC
    - cron: '0 3 * * 1'

jobs:
  pytest-common:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        python: ["3.10", "3.11", "3.12"]
        backend: ["cpu-headless", "openvino-headless"]
    steps:
      - uses: actions/checkout@v6
      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python }}
          architecture: x64
      - name: Cache python modules
        uses: actions/cache@v5
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pkg-deps-${{ matrix.python }}-${{ hashFiles('pyproject.toml') }}-tests
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .[${{ matrix.backend }},viz,html,testing] --upgrade
      - name: Run unittests
        run: |
          coverage run -m pytest tests/common/ -rs --memray
          coverage xml -o coverage-common-${{ matrix.backend }}-${{ matrix.python }}.xml
      - uses: actions/upload-artifact@v7
        with:
          name: coverage-common-${{ matrix.backend }}-${{ matrix.python }}
          path: ./coverage-common-${{ matrix.backend }}-${{ matrix.python }}.xml
          if-no-files-found: error

  codecov-upload:
    runs-on: ubuntu-latest
    needs: [ pytest-common ]
    steps:
      - uses: actions/checkout@v6
      - uses: actions/download-artifact@v8
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v6
        with:
          flags: unittests
          fail_ci_if_error: true
          token: ${{ secrets.CODECOV_TOKEN }}


================================================
FILE: .github/workflows/publish.yml
================================================
name: publish

on:
  release:
    types: [published]

jobs:
  pypi:
    if: "!github.event.release.prerelease"
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]
        python: ["3.10"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v6
      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python }}
          architecture: x64
      - name: Cache python modules
        uses: actions/cache@v5
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pkg-deps-${{ matrix.python }}-${{ hashFiles('pyproject.toml') }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install setuptools wheel twine --upgrade
      - name: Get release tag
        id: release_tag
        run: echo "VERSION=${GITHUB_REF/refs\/tags\//}" >> $GITHUB_ENV
      - name: Build and publish
        env:
          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
          VERSION: ${{ env.VERSION }}
        run: |
          BUILD_VERSION=$VERSION python setup.py sdist bdist_wheel
          twine check dist/*
          twine upload dist/*

  pypi-check:
    needs: pypi
    if: "!github.event.release.prerelease"
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]
        python: ["3.10"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v6
      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python }}
          architecture: x64
      - name: Install package
        run: |
          python -m pip install --upgrade pip
          pip install onnxtr[cpu] --upgrade
          python -c "from importlib.metadata import version; print(version('onnxtr'))"

  conda:
    if: "!github.event.release.prerelease"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: conda-incubator/setup-miniconda@v4
        with:
          auto-update-conda: true
          python-version: "3.10"
          channels: pypdfium2-team,bblanchon,defaults,conda-forge
          channel-priority: strict
      - name: Install dependencies
        shell: bash -el {0}
        run: conda install -y conda-build conda-verify anaconda-client
      - name: Install libEGL
        run: sudo apt-get update && sudo apt-get install -y libegl1
      - name: Get release tag
        id: release_tag
        run: echo "VERSION=${GITHUB_REF/refs\/tags\//}" >> $GITHUB_ENV
      - name: Build and publish
        shell: bash -el {0}
        env:
          ANACONDA_API_TOKEN: ${{ secrets.ANACONDA_TOKEN }}
          VERSION: ${{ env.VERSION }}
        run: |
          echo "BUILD_VERSION=${VERSION}" >> $GITHUB_ENV
          python setup.py sdist
          mkdir conda-dist
          conda build .conda/ --output-folder conda-dist
          conda-verify conda-dist/linux-64/*conda --ignore=C1115
          anaconda upload conda-dist/linux-64/*conda

  conda-check:
    if: "!github.event.release.prerelease"
    runs-on: ubuntu-latest
    needs: conda
    steps:
      - uses: conda-incubator/setup-miniconda@v4
        with:
          auto-update-conda: true
          python-version: "3.10"
      - name: Install package
        shell: bash -el {0}
        run: |
          conda config --set channel_priority strict
          conda install -c conda-forge onnxruntime
          conda install -c felix92 -c pypdfium2-team -c bblanchon -c defaults -c conda-forge onnxtr
          python -c "from importlib.metadata import version; print(version('onnxtr'))"


================================================
FILE: .github/workflows/style.yml
================================================
name: style

on:
  push:
    branches: main
  pull_request:
    branches: main

jobs:
  ruff:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        python: ["3.10"]
    steps:
      - uses: actions/checkout@v6
      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python }}
          architecture: x64
      - name: Run ruff
        run: |
          pip install ruff --upgrade
          ruff --version
          ruff check --diff .

  mypy:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        python: ["3.10"]
    steps:
      - uses: actions/checkout@v6
      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python }}
          architecture: x64
      - name: Cache python modules
        uses: actions/cache@v5
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pkg-deps-${{ matrix.python }}-${{ hashFiles('pyproject.toml') }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .[dev] --upgrade
          pip install mypy --upgrade
      - name: Run mypy
        run: |
          mypy --version
          mypy


================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Temp files
onnxtr/version.py
logs/
wandb/
.idea/

# Model files
*.onnx
.qodo

# Profile files
yappi_profile.stats
memray_profile.bin
memray_flamegraph.html


================================================
FILE: .pre-commit-config.yaml
================================================
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: check-ast
      - id: check-yaml
        exclude: .conda
      - id: check-toml
      - id: check-json
      - id: check-added-large-files
        exclude: docs/images/
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: debug-statements
      - id: check-merge-conflict
      - id: no-commit-to-branch
        args: ['--branch', 'main']
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.0
    hooks:
      - id: ruff
        args: [ --fix ]
      - id: ruff-format


================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
  and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
  overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or
  advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
  address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
contact@mindee.com.
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series
of actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior,  harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within
the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.


================================================
FILE: Dockerfile
================================================
ARG BASE_IMAGE

FROM ${BASE_IMAGE}

ENV DEBIAN_FRONTEND=noninteractive
ENV LANG=C.UTF-8
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

ARG SYSTEM
ARG PYTHON_VERSION

RUN apt-get update && apt-get install -y --no-install-recommends \
    # - Other packages
    build-essential \
    pkg-config \
    curl \
    wget \
    software-properties-common \
    unzip \
    git \
    # - Packages to build Python
    tar make gcc zlib1g-dev libffi-dev libssl-dev liblzma-dev libbz2-dev libsqlite3-dev \
    # - Packages for docTR
    libgl1-mesa-dev libsm6 libxext6 libxrender-dev libpangocairo-1.0-0 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
fi

# Install Python

RUN wget http://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz && \
    tar -zxf Python-$PYTHON_VERSION.tgz && \
    cd Python-$PYTHON_VERSION && \
    mkdir /opt/python/ && \
    ./configure --prefix=/opt/python && \
    make && \
    make install && \
    cd .. && \
    rm Python-$PYTHON_VERSION.tgz && \
    rm -r Python-$PYTHON_VERSION

ENV PATH=/opt/python/bin:$PATH

# Install OnnxTR
ARG ONNXTR_REPO='felixdittrich92/onnxtr'
ARG ONNXTR_VERSION=main
RUN pip3 install -U pip setuptools wheel && \
    pip3 install "onnxtr[$SYSTEM,html]@git+https://github.com/$ONNXTR_REPO.git@$ONNXTR_VERSION"


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: Makefile
================================================
.PHONY: quality style test  docs-single-version docs
# this target runs checks on all files
quality:
	ruff check .
	mypy onnxtr/

# this target runs checks on all files and potentially modifies some of them
style:
	ruff format .
	ruff check --fix .

# Run tests for the library
test:
	coverage run -m pytest tests/common/ -rs --memray
	coverage report --fail-under=80 --show-missing

# Check that docs can build
docs-single-version:
	sphinx-build docs/source docs/_build -a

# Check that docs can build
docs:
	cd docs && bash build.sh

================================================
FILE: README.md
================================================
<p align="center">
  <img src="https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/logo.jpg" width="40%">
</p>

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
![Build Status](https://github.com/felixdittrich92/onnxtr/workflows/builds/badge.svg)
[![codecov](https://codecov.io/gh/felixdittrich92/OnnxTR/graph/badge.svg?token=WVFRCQBOLI)](https://codecov.io/gh/felixdittrich92/OnnxTR)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/4fff4d764bb14fb8b4f4afeb9587231b)](https://app.codacy.com/gh/felixdittrich92/OnnxTR/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
[![CodeFactor](https://www.codefactor.io/repository/github/felixdittrich92/onnxtr/badge)](https://www.codefactor.io/repository/github/felixdittrich92/onnxtr)
[![Socket Badge](https://socket.dev/api/badge/pypi/package/onnxtr/0.8.1?artifact_id=tar-gz)](https://socket.dev/pypi/package/onnxtr/overview/0.8.1/tar-gz)
[![Pypi](https://img.shields.io/badge/pypi-v0.8.1-blue.svg)](https://pypi.org/project/OnnxTR/)
[![Docker Images](https://img.shields.io/badge/Docker-4287f5?style=flat&logo=docker&logoColor=white)](https://github.com/felixdittrich92/OnnxTR/pkgs/container/onnxtr)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Felix92/OnnxTR-OCR)
![PyPI - Downloads](https://img.shields.io/pypi/dm/onnxtr)

> :warning: Please note that this is a wrapper around the [doctr](https://github.com/mindee/doctr) library to provide a Onnx pipeline for docTR. For feature requests, which are not directly related to the Onnx pipeline, please refer to the base project.

**Optical Character Recognition made seamless & accessible to anyone, powered by Onnx**

What you can expect from this repository:

- efficient ways to parse textual information (localize and identify each word) from your documents
- a Onnx pipeline for docTR, a wrapper around the [doctr](https://github.com/mindee/doctr) library - no PyTorch or TensorFlow dependencies
- more lightweight package with faster inference latency and less required resources
- 8-Bit quantized models for faster inference on CPU

![OCR_example](https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/ocr.png)

## Installation

### Prerequisites

Python 3.10 (or higher) and [pip](https://pip.pypa.io/en/stable/) are required to install OnnxTR.

### Latest release

You can then install the latest release of the package using [pypi](https://pypi.org/project/OnnxTR/) as follows:

**NOTE:**

Currently supported execution providers by default are: CPU, CUDA (NVIDIA GPU), OpenVINO (Intel CPU | GPU), CoreML (Apple Silicon).

For GPU support please take a look at: [ONNX Runtime](https://onnxruntime.ai/getting-started).

- **Prerequisites:** CUDA & cuDNN needs to be installed before [Version table](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html).

```shell
# standard cpu support
pip install "onnxtr[cpu]"
pip install "onnxtr[cpu-headless]"  # same as cpu but with opencv-headless
# with gpu support
pip install "onnxtr[gpu]"
pip install "onnxtr[gpu-headless]"  # same as gpu but with opencv-headless
# OpenVINO cpu | gpu support for Intel CPUs | GPUs
pip install "onnxtr[openvino]"
pip install "onnxtr[openvino-headless]"  # same as openvino but with opencv-headless
# with HTML support
pip install "onnxtr[html]"
# with support for visualization
pip install "onnxtr[viz]"
# with support for all dependencies
pip install "onnxtr[html, gpu, viz]"
```

**Recommendation:**

If you have:

- a NVIDIA GPU, use one of the `gpu` variants
- an Intel CPU or GPU, use one of the `openvino` variants
- an Apple Silicon Mac, use one of the `cpu` variants (CoreML is auto-detected)
- otherwise, use one of the `cpu` variants

**OpenVINO:**

By default OnnxTR running with the OpenVINO execution provider backend uses the `CPU` device with `FP32` precision, to change the device or for further configuaration please refer to the [ONNX Runtime OpenVINO documentation](https://onnxruntime.ai/docs/execution-providers/OpenVINO-ExecutionProvider.html#summary-of-options).

### Reading files

Documents can be interpreted from PDF / Images / Webpages / Multiple page images using the following code snippet:

```python
from onnxtr.io import DocumentFile

# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage (requires `weasyprint` to be installed)
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
```

### Putting it together

Let's use the default `ocr_predictor` model for an example:

```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor, EngineConfig

model = ocr_predictor(
    det_arch="fast_base",  # detection architecture
    reco_arch="vitstr_base",  # recognition architecture
    det_bs=2,  # detection batch size
    reco_bs=512,  # recognition batch size
    # Document related parameters
    assume_straight_pages=True,  # set to `False` if the pages are not straight (rotation, perspective, etc.) (default: True)
    straighten_pages=False,  # set to `True` if the pages should be straightened before final processing (default: False)
    export_as_straight_boxes=False,  # set to `True` if the boxes should be exported as if the pages were straight (default: False)
    # Preprocessing related parameters
    preserve_aspect_ratio=True,  # set to `False` if the aspect ratio should not be preserved (default: True)
    symmetric_pad=True,  # set to `False` to disable symmetric padding (default: True)
    # Additional parameters - meta information
    detect_orientation=False,  # set to `True` if the orientation of the pages should be detected (default: False)
    detect_language=False,  # set to `True` if the language of the pages should be detected (default: False)
    # Orientation specific parameters in combination with `assume_straight_pages=False` and/or `straighten_pages=True`
    disable_crop_orientation=False,  # set to `True` if the crop orientation classification should be disabled (default: False)
    disable_page_orientation=False,  # set to `True` if the general page orientation classification should be disabled (default: False)
    # DocumentBuilder specific parameters
    resolve_lines=True,  # whether words should be automatically grouped into lines (default: True)
    resolve_blocks=False,  # whether lines should be automatically grouped into blocks (default: False)
    paragraph_break=0.035,  # relative length of the minimum space separating paragraphs (default: 0.035)
    # OnnxTR specific parameters
    # NOTE: 8-Bit quantized models are not available for FAST detection models and can in general lead to poorer accuracy
    load_in_8_bit=False,  # set to `True` to load 8-bit quantized models instead of the full precision onces (default: False)
    # Advanced engine configuration options
    det_engine_cfg=EngineConfig(),  # detection model engine configuration (default: internal predefined configuration)
    reco_engine_cfg=EngineConfig(),  # recognition model engine configuration (default: internal predefined configuration)
    clf_engine_cfg=EngineConfig(),  # classification (orientation) model engine configuration (default: internal predefined configuration)
)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
# Display the result (requires matplotlib & mplcursors to be installed)
result.show()
```

![Visualization sample](https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/doctr_example_script.gif)

Or even rebuild the original document from its predictions:

```python
import matplotlib.pyplot as plt

synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0])
plt.axis("off")
plt.show()
```

![Synthesis sample](https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/synthesized_sample.png)

The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
To get a better understanding of the document model, check out [documentation](https://mindee.github.io/doctr/modules/io.html#document-structure):

You can also export them as a nested dict, more appropriate for JSON format / render it or export as XML (hocr format):

```python
json_output = result.export()  # nested dict
text_output = result.render()  # human-readable text
xml_output = result.export_as_xml()  # hocr format
for output in xml_output:
    xml_bytes_string = output[0]
    xml_element = output[1]
```

<details>
  <summary>Advanced engine configuration options</summary>

You can also define advanced engine configurations for the models / predictors:

```python
from onnxruntime import SessionOptions

from onnxtr.models import ocr_predictor, EngineConfig

general_options = (
    SessionOptions()
)  # For configuartion options see: https://onnxruntime.ai/docs/api/python/api_summary.html#sessionoptions
general_options.enable_cpu_mem_arena = False

# NOTE: The following would force to run only on the GPU if no GPU is available it will raise an error
# List of strings e.g. ["CUDAExecutionProvider", "CPUExecutionProvider"] or a list of tuples with the provider and its options e.g.
# [("CUDAExecutionProvider", {"device_id": 0}), ("CPUExecutionProvider", {"arena_extend_strategy": "kSameAsRequested"})]
providers = [
    ("CUDAExecutionProvider", {"device_id": 0, "cudnn_conv_algo_search": "DEFAULT"})
]  # For available providers see: https://onnxruntime.ai/docs/execution-providers/

engine_config = EngineConfig(session_options=general_options, providers=providers)
# We use the default predictor with the custom engine configuration
# NOTE: You can define differnt engine configurations for detection, recognition and classification depending on your needs
predictor = ocr_predictor(det_engine_cfg=engine_config, reco_engine_cfg=engine_config, clf_engine_cfg=engine_config)
```

You can also dynamically configure whether the memory arena should shrink:

```python
from random import random
from onnxruntime import RunOptions, SessionOptions

from onnxtr.models import ocr_predictor, EngineConfig


def arena_shrinkage_handler(run_options: RunOptions) -> RunOptions:
    """
    Shrink the memory arena on 10% of inference runs.
    """
    if random() < 0.1:
        run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", "cpu:0")
    return run_options


engine_config = EngineConfig(run_options_provider=arena_shrinkage_handler)
engine_config.session_options.enable_mem_pattern = False

predictor = ocr_predictor(det_engine_cfg=engine_config, reco_engine_cfg=engine_config, clf_engine_cfg=engine_config)
```

</details>

## Loading custom exported models

You can also load docTR custom exported models:
For exporting please take a look at the [doctr documentation](https://mindee.github.io/doctr/using_doctr/using_model_export.html#export-to-onnx).

```python
from onnxtr.models import ocr_predictor, linknet_resnet18, parseq

reco_model = parseq("path_to_custom_model.onnx", vocab="ABC")
det_model = linknet_resnet18("path_to_custom_model.onnx")
model = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
```

## Loading models from HuggingFace Hub

You can also load models from the HuggingFace Hub:

```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor, from_hub

img = DocumentFile.from_images(["<image_path>"])
# Load your model from the hub
model = from_hub("onnxtr/my-model")

# Pass it to the predictor
# If your model is a recognition model:
predictor = ocr_predictor(det_arch="db_mobilenet_v3_large", reco_arch=model)

# If your model is a detection model:
predictor = ocr_predictor(det_arch=model, reco_arch="crnn_mobilenet_v3_small")

# Get your predictions
res = predictor(img)
```

HF Hub search: [here](https://huggingface.co/models?search=onnxtr).

Collection: [here](https://huggingface.co/collections/Felix92/onnxtr-66bf213a9f88f7346c90e842)

Or push your own models to the hub:

```python
from onnxtr.models import parseq, push_to_hf_hub, login_to_hub
from onnxtr.utils.vocabs import VOCABS

# Login to the hub
login_to_hub()

# Recogniton model
model = parseq("~/onnxtr-parseq-multilingual-v1.onnx", vocab=VOCABS["multilingual"])
push_to_hf_hub(
    model,
    model_name="onnxtr-parseq-multilingual-v1",
    task="recognition",  # The task for which the model is intended [detection, recognition, classification]
    arch="parseq",  # The name of the model architecture
    override=False,  # Set to `True` if you want to override an existing model / repository
)

# Detection model
model = linknet_resnet18("~/onnxtr-linknet-resnet18.onnx")
push_to_hf_hub(model, model_name="onnxtr-linknet-resnet18", task="detection", arch="linknet_resnet18", override=True)
```

## Models architectures

Credits where it's due: this repository provides ONNX models for the following architectures, converted from the docTR models:

### Text Detection

- DBNet: [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf).
- LinkNet: [LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation](https://arxiv.org/pdf/1707.03718.pdf)
- FAST: [FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation](https://arxiv.org/pdf/2111.02394.pdf)

### Text Recognition

- CRNN: [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf).
- SAR: [Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/pdf/1811.00751.pdf).
- MASTER: [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/pdf/1910.02562.pdf).
- ViTSTR: [Vision Transformer for Fast and Efficient Scene Text Recognition](https://arxiv.org/pdf/2105.08582.pdf).
- PARSeq: [Scene Text Recognition with Permuted Autoregressive Sequence Models](https://arxiv.org/pdf/2207.06966).
- VIPTR: [A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition](https://arxiv.org/abs/2401.10110).

```python
predictor = ocr_predictor()
predictor.list_archs()
{
    "detection archs": [
        "db_resnet34",
        "db_resnet50",
        "db_mobilenet_v3_large",
        "linknet_resnet18",
        "linknet_resnet34",
        "linknet_resnet50",
        "fast_tiny",  # No 8-bit support
        "fast_small",  # No 8-bit support
        "fast_base",  # No 8-bit support
    ],
    "recognition archs": [
        "crnn_vgg16_bn",
        "crnn_mobilenet_v3_small",
        "crnn_mobilenet_v3_large",
        "sar_resnet31",
        "master",
        "vitstr_small",
        "vitstr_base",
        "parseqviptr_tiny",  # No 8-bit support
    ],
}
```

### Documentation

This repository is in sync with the [doctr](https://github.com/mindee/doctr) library, which provides a high-level API to perform OCR on documents.
This repository stays up-to-date with the latest features and improvements from the base project.
So we can refer to the [doctr documentation](https://mindee.github.io/doctr/) for more detailed information.

NOTE:

- `pretrained` is the default in OnnxTR, and not available as a parameter.
- docTR specific environment variables (e.g.: DOCTR_CACHE_DIR -> ONNXTR_CACHE_DIR) needs to be replaced with `ONNXTR_` prefix.

### Benchmarks

The CPU benchmarks was measured on a `i7-14700K Intel CPU`.

The GPU benchmarks was measured on a `RTX 4080 Nvidia GPU`.

Benchmarking performed on the FUNSD dataset and CORD dataset.

docTR / OnnxTR models used for the benchmarks are `fast_base` (full precision) | `db_resnet50` (8-bit variant) for detection and `crnn_vgg16_bn` for recognition.

The smallest combination in OnnxTR (docTR) of `db_mobilenet_v3_large` and `crnn_mobilenet_v3_small` takes as comparison `~0.17s / Page` on the FUNSD dataset and `~0.12s / Page` on the CORD dataset in **full precision** on CPU.

- CPU benchmarks:

|Library                             |FUNSD (199 pages)              |CORD  (900 pages)              |
|------------------------------------|-------------------------------|-------------------------------|
|docTR (CPU) - v0.8.1                | ~1.29s / Page                 | ~0.60s / Page                 |
|**OnnxTR (CPU)** - v0.6.0           | ~0.57s / Page                 | **~0.25s / Page**             |
|**OnnxTR (CPU) 8-bit** - v0.6.0     | **~0.38s / Page**             | **~0.14s / Page**             |
|**OnnxTR (CPU-OpenVINO)** - v0.6.0  | **~0.15s / Page**             | **~0.14s / Page**             |
|EasyOCR (CPU) - v1.7.1              | ~1.96s / Page                 | ~1.75s / Page                 |
|**PyTesseract (CPU)** - v0.3.10     | **~0.50s / Page**             | ~0.52s / Page                 |
|Surya (line) (CPU) - v0.4.4         | ~48.76s / Page                | ~35.49s / Page                |
|PaddleOCR (CPU) - no cls - v2.7.3   | ~1.27s / Page                 | ~0.38s / Page                 |

- GPU benchmarks:

|Library                              |FUNSD (199 pages)              |CORD  (900 pages)              |
|-------------------------------------|-------------------------------|-------------------------------|
|docTR (GPU) - v0.8.1                 | ~0.07s / Page                 | ~0.05s / Page                 |
|**docTR (GPU) float16** - v0.8.1     | **~0.06s / Page**             | **~0.03s / Page**             |
|OnnxTR (GPU) - v0.6.0                | **~0.06s / Page**             | ~0.04s / Page                 |
|**OnnxTR (GPU) float16 - v0.6.0**    | **~0.05s / Page**             | **~0.03s / Page**             |
|EasyOCR (GPU) - v1.7.1               | ~0.31s / Page                 | ~0.19s / Page                 |
|Surya (GPU) float16 - v0.4.4         | ~3.70s / Page                 | ~2.81s / Page                 |
|**PaddleOCR (GPU) - no cls - v2.7.3**| ~0.08s / Page                 | **~0.03s / Page**             |

## Citation

If you wish to cite please refer to the base project citation, feel free to use this [BibTeX](http://www.bibtex.org/) reference:

```bibtex
@misc{doctr2021,
    title={docTR: Document Text Recognition},
    author={Mindee},
    year={2021},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/mindee/doctr}}
}
```

```bibtex
@misc{onnxtr2024,
    title={OnnxTR: Optical Character Recognition made seamless & accessible to anyone, powered by Onnx},
    author={Felix Dittrich},
    year={2024},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/felixdittrich92/OnnxTR}}
}
```

## License

Distributed under the Apache 2.0 License. See [`LICENSE`](https://github.com/felixdittrich92/OnnxTR?tab=Apache-2.0-1-ov-file#readme) for more information.


================================================
FILE: demo/README.md
================================================
---
title: OnnxTR OCR
emoji: 🔥
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
license: apache-2.0
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

## Run the demo locally

```bash
cd demo
pip install -r requirements.txt
python3 app.py
```


================================================
FILE: demo/app.py
================================================
import io
import os
from typing import Any

# NOTE: This is a fix to run the demo on the HuggingFace Zero GPU or CPU spaces
if os.environ.get("SPACES_ZERO_GPU") is not None:
    import spaces
else:

    class spaces:  # noqa: N801
        @staticmethod
        def GPU(func):  # noqa: N802
            def wrapper(*args, **kwargs):
                return func(*args, **kwargs)

            return wrapper


import cv2
import gradio as gr
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.figure import Figure
from PIL import Image

from onnxtr.io import DocumentFile
from onnxtr.models import EngineConfig, from_hub, ocr_predictor
from onnxtr.models.predictor import OCRPredictor
from onnxtr.utils.visualization import visualize_page

DET_ARCHS: list[str] = [
    "fast_base",
    "fast_small",
    "fast_tiny",
    "db_resnet50",
    "db_resnet34",
    "db_mobilenet_v3_large",
    "linknet_resnet18",
    "linknet_resnet34",
    "linknet_resnet50",
]
RECO_ARCHS: list[str] = [
    "crnn_vgg16_bn",
    "crnn_mobilenet_v3_small",
    "crnn_mobilenet_v3_large",
    "master",
    "sar_resnet31",
    "vitstr_small",
    "vitstr_base",
    "parseq",
    "viptr_tiny",
]

CUSTOM_RECO_ARCHS: list[str] = [
    "Felix92/onnxtr-parseq-multilingual-v1",
]


def load_predictor(
    det_arch: str,
    reco_arch: str,
    use_gpu: bool,
    assume_straight_pages: bool,
    straighten_pages: bool,
    export_as_straight_boxes: bool,
    detect_language: bool,
    load_in_8_bit: bool,
    bin_thresh: float,
    box_thresh: float,
    disable_crop_orientation: bool = False,
    disable_page_orientation: bool = False,
) -> OCRPredictor:
    """Load a predictor from doctr.models

    Args:
    ----
        det_arch: detection architecture
        reco_arch: recognition architecture
        use_gpu: whether to use the GPU or not
        assume_straight_pages: whether to assume straight pages or not
        disable_crop_orientation: whether to disable crop orientation or not
        disable_page_orientation: whether to disable page orientation or not
        straighten_pages: whether to straighten rotated pages or not
        export_as_straight_boxes: whether to export straight boxes
        detect_language: whether to detect the language of the text
        load_in_8_bit: whether to load the image in 8 bit mode
        bin_thresh: binarization threshold for the segmentation map
        box_thresh: minimal objectness score to consider a box

    Returns:
    -------
        instance of OCRPredictor
    """
    engine_cfg = (
        EngineConfig()
        if use_gpu
        else EngineConfig(providers=[("CPUExecutionProvider", {"arena_extend_strategy": "kSameAsRequested"})])
    )
    predictor = ocr_predictor(
        det_arch=det_arch,
        reco_arch=reco_arch if reco_arch not in CUSTOM_RECO_ARCHS else from_hub(reco_arch),
        assume_straight_pages=assume_straight_pages,
        straighten_pages=straighten_pages,
        detect_language=detect_language,
        load_in_8_bit=load_in_8_bit,
        export_as_straight_boxes=export_as_straight_boxes,
        detect_orientation=not assume_straight_pages,
        disable_crop_orientation=disable_crop_orientation,
        disable_page_orientation=disable_page_orientation,
        det_engine_cfg=engine_cfg,
        reco_engine_cfg=engine_cfg,
        clf_engine_cfg=engine_cfg,
    )
    predictor.det_predictor.model.postprocessor.bin_thresh = bin_thresh
    predictor.det_predictor.model.postprocessor.box_thresh = box_thresh
    return predictor


def forward_image(predictor: OCRPredictor, image: np.ndarray) -> np.ndarray:
    """Forward an image through the predictor

    Args:
    ----
        predictor: instance of OCRPredictor
        image: image to process

    Returns:
    -------
        segmentation map
    """
    processed_batches = predictor.det_predictor.pre_processor([image])
    out = predictor.det_predictor.model(processed_batches[0], return_model_output=True)
    seg_map = out["out_map"]

    return seg_map


def matplotlib_to_pil(fig: Figure | np.ndarray) -> Image.Image:
    """Convert a matplotlib figure to a PIL image

    Args:
    ----
        fig: matplotlib figure or numpy array

    Returns:
    -------
        PIL image
    """
    buf = io.BytesIO()
    if isinstance(fig, Figure):
        fig.savefig(buf)
    else:
        plt.imsave(buf, fig)
    buf.seek(0)
    return Image.open(buf)


@spaces.GPU
def analyze_page(
    uploaded_file: Any,
    page_idx: int,
    det_arch: str,
    reco_arch: str,
    use_gpu: bool,
    assume_straight_pages: bool,
    disable_crop_orientation: bool,
    disable_page_orientation: bool,
    straighten_pages: bool,
    export_as_straight_boxes: bool,
    detect_language: bool,
    load_in_8_bit: bool,
    bin_thresh: float,
    box_thresh: float,
):
    """Analyze a page

    Args:
    ----
        uploaded_file: file to analyze
        page_idx: index of the page to analyze
        det_arch: detection architecture
        reco_arch: recognition architecture
        use_gpu: whether to use the GPU or not
        assume_straight_pages: whether to assume straight pages or not
        disable_crop_orientation: whether to disable crop orientation or not
        disable_page_orientation: whether to disable page orientation or not
        straighten_pages: whether to straighten rotated pages or not
        export_as_straight_boxes: whether to export straight boxes
        detect_language: whether to detect the language of the text
        load_in_8_bit: whether to load the image in 8 bit mode
        bin_thresh: binarization threshold for the segmentation map
        box_thresh: minimal objectness score to consider a box

    Returns:
    -------
        input image, segmentation heatmap, output image, OCR output, synthesized page
    """
    if uploaded_file is None:
        return None, "Please upload a document", None, None, None

    if uploaded_file.name.endswith(".pdf"):
        doc = DocumentFile.from_pdf(uploaded_file)
    else:
        doc = DocumentFile.from_images(uploaded_file)
    try:
        page = doc[page_idx - 1]
    except IndexError:
        page = doc[-1]

    img = page

    predictor = load_predictor(
        det_arch=det_arch,
        reco_arch=reco_arch,
        use_gpu=use_gpu,
        assume_straight_pages=assume_straight_pages,
        straighten_pages=straighten_pages,
        export_as_straight_boxes=export_as_straight_boxes,
        detect_language=detect_language,
        load_in_8_bit=load_in_8_bit,
        bin_thresh=bin_thresh,
        box_thresh=box_thresh,
        disable_crop_orientation=disable_crop_orientation,
        disable_page_orientation=disable_page_orientation,
    )

    seg_map = forward_image(predictor, page)
    seg_map = np.squeeze(seg_map)
    seg_map = cv2.resize(seg_map, (img.shape[1], img.shape[0]), interpolation=cv2.INTER_LINEAR)
    seg_heatmap = matplotlib_to_pil(seg_map)

    out = predictor([page])

    page_export = out.pages[0].export()
    fig = visualize_page(out.pages[0].export(), out.pages[0].page, interactive=False, add_labels=False)

    out_img = matplotlib_to_pil(fig)

    if assume_straight_pages or (not assume_straight_pages and straighten_pages):
        synthesized_page = out.pages[0].synthesize()
    else:
        synthesized_page = None

    return img, seg_heatmap, out_img, page_export, synthesized_page


with gr.Blocks(fill_height=True) as demo:
    gr.HTML(
        """
        <div style="text-align: center;">
            <p style="display: flex; justify-content: center;">
                <img src="https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/logo.jpg" width="15%">
            </p>

            <h1>OnnxTR OCR Demo</h1>

            <p style="display: flex; justify-content: center; gap: 10px;">
                <a href="https://github.com/felixdittrich92/OnnxTR" target="_blank">
                    <img src="https://img.shields.io/badge/GitHub-blue?logo=github" alt="GitHub OnnxTR">
                </a>
                <a href="https://pypi.org/project/onnxtr/" target="_blank">
                    <img src="https://img.shields.io/pypi/v/onnxtr?color=blue" alt="PyPI">
                </a>
            </p>
        </div>
        <h2>To use this interactive demo for OnnxTR:</h2>
        <h3> 1. Upload a document (PDF, JPG, or PNG)</h3>
        <h3> 2. Select the model architectures for text detection and recognition you want to use</h3>
        <h3> 3. Press the "Analyze page" button to process the uploaded document</h3>
        """
    )
    with gr.Row():
        with gr.Column(scale=1):
            upload = gr.File(label="Upload File [JPG | PNG | PDF]", file_types=[".pdf", ".jpg", ".png"])
            page_selection = gr.Slider(minimum=1, maximum=10, step=1, value=1, label="Page selection")
            det_model = gr.Dropdown(choices=DET_ARCHS, value=DET_ARCHS[0], label="Text detection model")
            reco_model = gr.Dropdown(
                choices=RECO_ARCHS + CUSTOM_RECO_ARCHS, value=RECO_ARCHS[0], label="Text recognition model"
            )
            use_gpu = gr.Checkbox(value=True, label="Use GPU")
            assume_straight = gr.Checkbox(value=True, label="Assume straight pages")
            disable_crop_orientation = gr.Checkbox(value=False, label="Disable crop orientation")
            disable_page_orientation = gr.Checkbox(value=False, label="Disable page orientation")
            straighten = gr.Checkbox(value=False, label="Straighten pages")
            export_as_straight_boxes = gr.Checkbox(value=False, label="Export as straight boxes")
            det_language = gr.Checkbox(value=False, label="Detect language")
            load_in_8_bit = gr.Checkbox(value=False, label="Load 8-bit quantized models")
            binarization_threshold = gr.Slider(
                minimum=0.1, maximum=0.9, value=0.3, step=0.1, label="Binarization threshold"
            )
            box_threshold = gr.Slider(minimum=0.1, maximum=0.9, value=0.1, step=0.1, label="Box threshold")
            analyze_button = gr.Button("Analyze page")
        with gr.Column(scale=3):
            with gr.Row():
                input_image = gr.Image(label="Input page", width=700, height=500)
                segmentation_heatmap = gr.Image(label="Segmentation heatmap", width=700, height=500)
                output_image = gr.Image(label="Output page", width=700, height=500)
            with gr.Row():
                with gr.Column(scale=3):
                    ocr_output = gr.JSON(label="OCR output", render=True, scale=1, height=500)
                with gr.Column(scale=3):
                    synthesized_page = gr.Image(label="Synthesized page", width=700, height=500)

    analyze_button.click(
        analyze_page,
        inputs=[
            upload,
            page_selection,
            det_model,
            reco_model,
            use_gpu,
            assume_straight,
            disable_crop_orientation,
            disable_page_orientation,
            straighten,
            export_as_straight_boxes,
            det_language,
            load_in_8_bit,
            binarization_threshold,
            box_threshold,
        ],
        outputs=[input_image, segmentation_heatmap, output_image, ocr_output, synthesized_page],
    )

demo.launch(inbrowser=True, allowed_paths=["./data/logo.jpg"])


================================================
FILE: demo/packages.txt
================================================
python3-opencv
fonts-freefont-ttf


================================================
FILE: demo/requirements.txt
================================================
-e "onnxtr[gpu-headless,viz] @ git+https://github.com/felixdittrich92/OnnxTR.git"
gradio>=5.30.0,<7.0.0
spaces>=0.37.0

# Quick fix to avoid HuggingFace Spaces cudnn9.x Cuda12.x issue
# NOTE: outdated
# onnxruntime-gpu==1.19.0


================================================
FILE: onnxtr/__init__.py
================================================
from . import io, models, contrib, transforms, utils
from .version import __version__  # noqa: F401


================================================
FILE: onnxtr/contrib/__init__.py
================================================
from .artefacts import ArtefactDetector

================================================
FILE: onnxtr/contrib/artefacts.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

import cv2
import numpy as np

from onnxtr.file_utils import requires_package

from .base import _BasePredictor

__all__ = ["ArtefactDetector"]

default_cfgs: dict[str, dict[str, Any]] = {
    "yolov8_artefact": {
        "input_shape": (3, 1024, 1024),
        "labels": ["bar_code", "qr_code", "logo", "photo"],
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/yolo_artefact-f9d66f14.onnx",
    },
}


class ArtefactDetector(_BasePredictor):
    """
    A class to detect artefacts in images

    >>> from onnxtr.io import DocumentFile
    >>> from onnxtr.contrib.artefacts import ArtefactDetector
    >>> doc = DocumentFile.from_images(["path/to/image.jpg"])
    >>> detector = ArtefactDetector()
    >>> results = detector(doc)

    Args:
        arch: the architecture to use
        batch_size: the batch size to use
        model_path: the path to the model to use
        labels: the labels to use
        input_shape: the input shape to use
        mask_labels: the mask labels to use
        conf_threshold: the confidence threshold to use
        iou_threshold: the intersection over union threshold to use
        **kwargs: additional arguments to be passed to `download_from_url`
    """

    def __init__(
        self,
        arch: str = "yolov8_artefact",
        batch_size: int = 2,
        model_path: str | None = None,
        labels: list[str] | None = None,
        input_shape: tuple[int, int, int] | None = None,
        conf_threshold: float = 0.5,
        iou_threshold: float = 0.5,
        **kwargs: Any,
    ) -> None:
        super().__init__(batch_size=batch_size, url=default_cfgs[arch]["url"], model_path=model_path, **kwargs)
        self.labels = labels or default_cfgs[arch]["labels"]
        self.input_shape = input_shape or default_cfgs[arch]["input_shape"]
        self.conf_threshold = conf_threshold
        self.iou_threshold = iou_threshold

    def preprocess(self, img: np.ndarray) -> np.ndarray:
        return np.transpose(cv2.resize(img, (self.input_shape[2], self.input_shape[1])), (2, 0, 1)) / np.array(255.0)

    def postprocess(self, output: list[np.ndarray], input_images: list[list[np.ndarray]]) -> list[list[dict[str, Any]]]:
        results = []

        for batch in zip(output, input_images):
            for out, img in zip(batch[0], batch[1]):
                org_height, org_width = img.shape[:2]
                width_scale, height_scale = org_width / self.input_shape[2], org_height / self.input_shape[1]
                for res in out:
                    sample_results = []
                    for row in np.transpose(np.squeeze(res)):
                        classes_scores = row[4:]
                        max_score = np.amax(classes_scores)
                        if max_score >= self.conf_threshold:
                            class_id = np.argmax(classes_scores)
                            x, y, w, h = row[0], row[1], row[2], row[3]
                            # to rescaled xmin, ymin, xmax, ymax
                            xmin = int((x - w / 2) * width_scale)
                            ymin = int((y - h / 2) * height_scale)
                            xmax = int((x + w / 2) * width_scale)
                            ymax = int((y + h / 2) * height_scale)

                            sample_results.append({
                                "label": self.labels[class_id],
                                "confidence": float(max_score),
                                "box": [xmin, ymin, xmax, ymax],
                            })

                    # Filter out overlapping boxes
                    boxes = [res["box"] for res in sample_results]
                    scores = [res["confidence"] for res in sample_results]
                    keep_indices = cv2.dnn.NMSBoxes(boxes, scores, self.conf_threshold, self.iou_threshold)  # type: ignore[arg-type]
                    sample_results = [sample_results[i] for i in keep_indices]

                    results.append(sample_results)

        self._results = results
        return results

    def show(self, **kwargs: Any) -> None:
        """
        Display the results

        Args:
            **kwargs: additional keyword arguments to be passed to `plt.show`
        """
        requires_package("matplotlib", "`.show()` requires matplotlib installed")
        import matplotlib.pyplot as plt
        from matplotlib.patches import Rectangle

        # visualize the results with matplotlib
        if self._results and self._inputs:
            for img, res in zip(self._inputs, self._results):
                plt.figure(figsize=(10, 10))
                plt.imshow(img)
                for obj in res:
                    xmin, ymin, xmax, ymax = obj["box"]
                    label = obj["label"]
                    plt.text(xmin, ymin, f"{label} {obj['confidence']:.2f}", color="red")
                    plt.gca().add_patch(
                        Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, fill=False, edgecolor="red", linewidth=2)
                    )
                plt.show(**kwargs)


================================================
FILE: onnxtr/contrib/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

import numpy as np
import onnxruntime as ort

from onnxtr.utils.data import download_from_url


class _BasePredictor:
    """
    Base class for all predictors

    Args:
        batch_size: the batch size to use
        url: the url to use to download a model if needed
        model_path: the path to the model to use
        **kwargs: additional arguments to be passed to `download_from_url`
    """

    def __init__(self, batch_size: int, url: str | None = None, model_path: str | None = None, **kwargs) -> None:
        self.batch_size = batch_size
        self.session = self._init_model(url, model_path, **kwargs)

        self._inputs: list[np.ndarray] = []
        self._results: list[Any] = []

    def _init_model(self, url: str | None = None, model_path: str | None = None, **kwargs: Any) -> Any:
        """
        Download the model from the given url if needed

        Args:
            url: the url to use
            model_path: the path to the model to use
            **kwargs: additional arguments to be passed to `download_from_url`

        Returns:
            Any: the ONNX loaded model
        """
        if not url and not model_path:
            raise ValueError("You must provide either a url or a model_path")
        onnx_model_path = model_path if model_path else str(download_from_url(url, cache_subdir="models", **kwargs))  # type: ignore[arg-type]
        return ort.InferenceSession(onnx_model_path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

    def preprocess(self, img: np.ndarray) -> np.ndarray:
        """
        Preprocess the input image

        Args:
            img: the input image to preprocess

        Returns:
            np.ndarray: the preprocessed image
        """
        raise NotImplementedError

    def postprocess(self, output: list[np.ndarray], input_images: list[list[np.ndarray]]) -> Any:
        """
        Postprocess the model output

        Args:
            output: the model output to postprocess
            input_images: the input images used to generate the output

        Returns:
            Any: the postprocessed output
        """
        raise NotImplementedError

    def __call__(self, inputs: list[np.ndarray]) -> Any:
        """
        Call the model on the given inputs

        Args:
            inputs: the inputs to use

        Returns:
            Any: the postprocessed output
        """
        self._inputs = inputs
        model_inputs = self.session.get_inputs()

        batched_inputs = [inputs[i : i + self.batch_size] for i in range(0, len(inputs), self.batch_size)]
        processed_batches = [
            np.array([self.preprocess(img) for img in batch], dtype=np.float32) for batch in batched_inputs
        ]

        outputs = [self.session.run(None, {model_inputs[0].name: batch}) for batch in processed_batches]
        return self.postprocess(outputs, batched_inputs)


================================================
FILE: onnxtr/file_utils.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

import importlib.metadata
import logging

__all__ = ["requires_package"]

ENV_VARS_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}
ENV_VARS_TRUE_AND_AUTO_VALUES = ENV_VARS_TRUE_VALUES.union({"AUTO"})


def requires_package(name: str, extra_message: str | None = None) -> None:  # pragma: no cover
    """
    package requirement helper

    Args:
        name: name of the package
        extra_message: additional message to display if the package is not found
    """
    try:
        _pkg_version = importlib.metadata.version(name)
        logging.info(f"{name} version {_pkg_version} available.")
    except importlib.metadata.PackageNotFoundError:
        raise ImportError(
            f"\n\n{extra_message if extra_message is not None else ''} "
            f"\nPlease install it with the following command: pip install {name}\n"
        )


================================================
FILE: onnxtr/io/__init__.py
================================================
from .elements import *
from .html import *
from .image import *
from .pdf import *
from .reader import *


================================================
FILE: onnxtr/io/elements.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

from defusedxml import defuse_stdlib

defuse_stdlib()
from xml.etree import ElementTree as ET
from xml.etree.ElementTree import Element as ETElement
from xml.etree.ElementTree import SubElement

import numpy as np

import onnxtr
from onnxtr.file_utils import requires_package
from onnxtr.utils.common_types import BoundingBox
from onnxtr.utils.geometry import resolve_enclosing_bbox, resolve_enclosing_rbbox
from onnxtr.utils.reconstitution import synthesize_page
from onnxtr.utils.repr import NestedObject

try:  # optional dependency for visualization
    from onnxtr.utils.visualization import visualize_page
except ModuleNotFoundError:  # pragma: no cover
    pass

__all__ = ["Element", "Word", "Artefact", "Line", "Block", "Page", "Document"]


class Element(NestedObject):
    """Implements an abstract document element with exporting and text rendering capabilities"""

    _children_names: list[str] = []
    _exported_keys: list[str] = []

    def __init__(self, **kwargs: Any) -> None:
        for k, v in kwargs.items():
            if k in self._children_names:
                setattr(self, k, v)
            else:
                raise KeyError(f"{self.__class__.__name__} object does not have any attribute named '{k}'")

    def export(self) -> dict[str, Any]:
        """Exports the object into a nested dict format"""
        export_dict = {k: getattr(self, k) for k in self._exported_keys}
        for children_name in self._children_names:
            export_dict[children_name] = [c.export() for c in getattr(self, children_name)]

        return export_dict

    @classmethod
    def from_dict(cls, save_dict: dict[str, Any], **kwargs):
        raise NotImplementedError

    def render(self) -> str:
        raise NotImplementedError


class Word(Element):
    """Implements a word element

    Args:
        value: the text string of the word
        confidence: the confidence associated with the text prediction
        geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
        the page's size
        objectness_score: the objectness score of the detection
        crop_orientation: the general orientation of the crop in degrees and its confidence
    """

    _exported_keys: list[str] = ["value", "confidence", "geometry", "objectness_score", "crop_orientation"]
    _children_names: list[str] = []

    def __init__(
        self,
        value: str,
        confidence: float,
        geometry: BoundingBox | np.ndarray,
        objectness_score: float,
        crop_orientation: dict[str, Any],
    ) -> None:
        super().__init__()
        self.value = value
        self.confidence = confidence
        self.geometry = geometry
        self.objectness_score = objectness_score
        self.crop_orientation = crop_orientation

    def render(self) -> str:
        """Renders the full text of the element"""
        return self.value

    def extra_repr(self) -> str:
        return f"value='{self.value}', confidence={self.confidence:.2}"

    @classmethod
    def from_dict(cls, save_dict: dict[str, Any], **kwargs):
        kwargs = {k: save_dict[k] for k in cls._exported_keys}
        return cls(**kwargs)


class Artefact(Element):
    """Implements a non-textual element

    Args:
        artefact_type: the type of artefact
        confidence: the confidence of the type prediction
        geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
            the page's size.
    """

    _exported_keys: list[str] = ["geometry", "type", "confidence"]
    _children_names: list[str] = []

    def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
        super().__init__()
        self.geometry = geometry
        self.type = artefact_type
        self.confidence = confidence

    def render(self) -> str:
        """Renders the full text of the element"""
        return f"[{self.type.upper()}]"

    def extra_repr(self) -> str:
        return f"type='{self.type}', confidence={self.confidence:.2}"

    @classmethod
    def from_dict(cls, save_dict: dict[str, Any], **kwargs):
        kwargs = {k: save_dict[k] for k in cls._exported_keys}
        return cls(**kwargs)


class Line(Element):
    """Implements a line element as a collection of words

    Args:
        words: list of word elements
        geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
            the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
            all words in it.
    """

    _exported_keys: list[str] = ["geometry", "objectness_score"]
    _children_names: list[str] = ["words"]
    words: list[Word] = []

    def __init__(
        self,
        words: list[Word],
        geometry: BoundingBox | np.ndarray | None = None,
        objectness_score: float | None = None,
    ) -> None:
        # Compute the objectness score of the line
        if objectness_score is None:
            objectness_score = float(np.mean([w.objectness_score for w in words]))
        # Resolve the geometry using the smallest enclosing bounding box
        if geometry is None:
            # Check whether this is a rotated or straight box
            box_resolution_fn = resolve_enclosing_rbbox if len(words[0].geometry) == 4 else resolve_enclosing_bbox
            geometry = box_resolution_fn([w.geometry for w in words])  # type: ignore[misc]

        super().__init__(words=words)
        self.geometry = geometry
        self.objectness_score = objectness_score

    def render(self) -> str:
        """Renders the full text of the element"""
        return " ".join(w.render() for w in self.words)

    @classmethod
    def from_dict(cls, save_dict: dict[str, Any], **kwargs):
        kwargs = {k: save_dict[k] for k in cls._exported_keys}
        kwargs.update({
            "words": [Word.from_dict(_dict) for _dict in save_dict["words"]],
        })
        return cls(**kwargs)


class Block(Element):
    """Implements a block element as a collection of lines and artefacts

    Args:
        lines: list of line elements
        artefacts: list of artefacts
        geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
            the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
            all lines and artefacts in it.
    """

    _exported_keys: list[str] = ["geometry", "objectness_score"]
    _children_names: list[str] = ["lines", "artefacts"]
    lines: list[Line] = []
    artefacts: list[Artefact] = []

    def __init__(
        self,
        lines: list[Line] = [],
        artefacts: list[Artefact] = [],
        geometry: BoundingBox | np.ndarray | None = None,
        objectness_score: float | None = None,
    ) -> None:
        # Compute the objectness score of the line
        if objectness_score is None:
            objectness_score = float(np.mean([w.objectness_score for line in lines for w in line.words]))
        # Resolve the geometry using the smallest enclosing bounding box
        if geometry is None:
            line_boxes = [word.geometry for line in lines for word in line.words]
            artefact_boxes = [artefact.geometry for artefact in artefacts]
            box_resolution_fn = (
                resolve_enclosing_rbbox if isinstance(lines[0].geometry, np.ndarray) else resolve_enclosing_bbox
            )
            geometry = box_resolution_fn(line_boxes + artefact_boxes)  # type: ignore

        super().__init__(lines=lines, artefacts=artefacts)
        self.geometry = geometry
        self.objectness_score = objectness_score

    def render(self, line_break: str = "\n") -> str:
        """Renders the full text of the element"""
        return line_break.join(line.render() for line in self.lines)

    @classmethod
    def from_dict(cls, save_dict: dict[str, Any], **kwargs):
        kwargs = {k: save_dict[k] for k in cls._exported_keys}
        kwargs.update({
            "lines": [Line.from_dict(_dict) for _dict in save_dict["lines"]],
            "artefacts": [Artefact.from_dict(_dict) for _dict in save_dict["artefacts"]],
        })
        return cls(**kwargs)


class Page(Element):
    """Implements a page element as a collection of blocks

    Args:
        page: image encoded as a numpy array in uint8
        blocks: list of block elements
        page_idx: the index of the page in the input raw document
        dimensions: the page size in pixels in format (height, width)
        orientation: a dictionary with the value of the rotation angle in degress and confidence of the prediction
        language: a dictionary with the language value and confidence of the prediction
    """

    _exported_keys: list[str] = ["page_idx", "dimensions", "orientation", "language"]
    _children_names: list[str] = ["blocks"]
    blocks: list[Block] = []

    def __init__(
        self,
        page: np.ndarray,
        blocks: list[Block],
        page_idx: int,
        dimensions: tuple[int, int],
        orientation: dict[str, Any] | None = None,
        language: dict[str, Any] | None = None,
    ) -> None:
        super().__init__(blocks=blocks)
        self.page = page
        self.page_idx = page_idx
        self.dimensions = dimensions
        self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
        self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)

    def render(self, block_break: str = "\n\n") -> str:
        """Renders the full text of the element"""
        return block_break.join(b.render() for b in self.blocks)

    def extra_repr(self) -> str:
        return f"dimensions={self.dimensions}"

    def show(self, interactive: bool = True, preserve_aspect_ratio: bool = False, **kwargs) -> None:
        """Overlay the result on a given image

        Args:
            interactive: whether the display should be interactive
            preserve_aspect_ratio: pass True if you passed True to the predictor
            **kwargs: additional keyword arguments passed to the matplotlib.pyplot.show method
        """
        requires_package("matplotlib", "`.show()` requires matplotlib & mplcursors installed")
        requires_package("mplcursors", "`.show()` requires matplotlib & mplcursors installed")
        import matplotlib.pyplot as plt

        visualize_page(self.export(), self.page, interactive=interactive, preserve_aspect_ratio=preserve_aspect_ratio)
        plt.show(**kwargs)

    def synthesize(self, **kwargs) -> np.ndarray:
        """Synthesize the page from the predictions

        Args:
            **kwargs: keyword arguments passed to the `synthesize_page` method

        Returns
            synthesized page
        """
        return synthesize_page(self.export(), **kwargs)

    def export_as_xml(self, file_title: str = "OnnxTR - XML export (hOCR)") -> tuple[bytes, ET.ElementTree]:
        """Export the page as XML (hOCR-format)
        convention: https://github.com/kba/hocr-spec/blob/master/1.2/spec.md

        Args:
            file_title: the title of the XML file

        Returns:
            a tuple of the XML byte string, and its ElementTree
        """
        p_idx = self.page_idx
        block_count: int = 1
        line_count: int = 1
        word_count: int = 1
        height, width = self.dimensions
        language = self.language if "language" in self.language.keys() else "en"
        # Create the XML root element
        page_hocr = ETElement("html", attrib={"xmlns": "http://www.w3.org/1999/xhtml", "xml:lang": str(language)})
        # Create the header / SubElements of the root element
        head = SubElement(page_hocr, "head")
        SubElement(head, "title").text = file_title
        SubElement(head, "meta", attrib={"http-equiv": "Content-Type", "content": "text/html; charset=utf-8"})
        SubElement(
            head,
            "meta",
            attrib={"name": "ocr-system", "content": f"onnxtr {onnxtr.__version__}"},  # type: ignore[attr-defined]
        )
        SubElement(
            head,
            "meta",
            attrib={"name": "ocr-capabilities", "content": "ocr_page ocr_carea ocr_par ocr_line ocrx_word"},
        )
        # Create the body
        body = SubElement(page_hocr, "body")
        page_div = SubElement(
            body,
            "div",
            attrib={
                "class": "ocr_page",
                "id": f"page_{p_idx + 1}",
                "title": f"image; bbox 0 0 {width} {height}; ppageno 0",
            },
        )
        # iterate over the blocks / lines / words and create the XML elements in body line by line with the attributes
        for block in self.blocks:
            if len(block.geometry) != 2:
                raise TypeError("XML export is only available for straight bounding boxes for now.")
            (xmin, ymin), (xmax, ymax) = block.geometry
            block_div = SubElement(
                page_div,
                "div",
                attrib={
                    "class": "ocr_carea",
                    "id": f"block_{block_count}",
                    "title": f"bbox {int(round(xmin * width))} {int(round(ymin * height))} \
                    {int(round(xmax * width))} {int(round(ymax * height))}",
                },
            )
            paragraph = SubElement(
                block_div,
                "p",
                attrib={
                    "class": "ocr_par",
                    "id": f"par_{block_count}",
                    "title": f"bbox {int(round(xmin * width))} {int(round(ymin * height))} \
                    {int(round(xmax * width))} {int(round(ymax * height))}",
                },
            )
            block_count += 1
            for line in block.lines:
                (xmin, ymin), (xmax, ymax) = line.geometry
                # NOTE: baseline, x_size, x_descenders, x_ascenders is currently initalized to 0
                line_span = SubElement(
                    paragraph,
                    "span",
                    attrib={
                        "class": "ocr_line",
                        "id": f"line_{line_count}",
                        "title": f"bbox {int(round(xmin * width))} {int(round(ymin * height))} \
                        {int(round(xmax * width))} {int(round(ymax * height))}; \
                        baseline 0 0; x_size 0; x_descenders 0; x_ascenders 0",
                    },
                )
                line_count += 1
                for word in line.words:
                    (xmin, ymin), (xmax, ymax) = word.geometry
                    conf = word.confidence
                    word_div = SubElement(
                        line_span,
                        "span",
                        attrib={
                            "class": "ocrx_word",
                            "id": f"word_{word_count}",
                            "title": f"bbox {int(round(xmin * width))} {int(round(ymin * height))} \
                            {int(round(xmax * width))} {int(round(ymax * height))}; \
                            x_wconf {int(round(conf * 100))}",
                        },
                    )
                    # set the text
                    word_div.text = word.value
                    word_count += 1

        return (ET.tostring(page_hocr, encoding="utf-8", method="xml"), ET.ElementTree(page_hocr))

    @classmethod
    def from_dict(cls, save_dict: dict[str, Any], **kwargs):
        kwargs = {k: save_dict[k] for k in cls._exported_keys}
        kwargs.update({"blocks": [Block.from_dict(block_dict) for block_dict in save_dict["blocks"]]})
        return cls(**kwargs)


class Document(Element):
    """Implements a document element as a collection of pages

    Args:
        pages: list of page elements
    """

    _children_names: list[str] = ["pages"]
    pages: list[Page] = []

    def __init__(
        self,
        pages: list[Page],
    ) -> None:
        super().__init__(pages=pages)

    def render(self, page_break: str = "\n\n\n\n") -> str:
        """Renders the full text of the element"""
        return page_break.join(p.render() for p in self.pages)

    def show(self, **kwargs) -> None:
        """Overlay the result on a given image"""
        for result in self.pages:
            result.show(**kwargs)

    def synthesize(self, **kwargs) -> list[np.ndarray]:
        """Synthesize all pages from their predictions

        Args:
            **kwargs: keyword arguments passed to the `Page.synthesize` method

        Returns:
            list of synthesized pages
        """
        return [page.synthesize(**kwargs) for page in self.pages]

    def export_as_xml(self, **kwargs) -> list[tuple[bytes, ET.ElementTree]]:
        """Export the document as XML (hOCR-format)

        Args:
            **kwargs: additional keyword arguments passed to the Page.export_as_xml method

        Returns:
            list of tuple of (bytes, ElementTree)
        """
        return [page.export_as_xml(**kwargs) for page in self.pages]

    @classmethod
    def from_dict(cls, save_dict: dict[str, Any], **kwargs):
        kwargs = {k: save_dict[k] for k in cls._exported_keys}
        kwargs.update({"pages": [Page.from_dict(page_dict) for page_dict in save_dict["pages"]]})
        return cls(**kwargs)


================================================
FILE: onnxtr/io/html.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

__all__ = ["read_html"]


def read_html(url: str, **kwargs: Any) -> bytes:
    """Read a PDF file and convert it into an image in numpy format

    >>> from onnxtr.io import read_html
    >>> doc = read_html("https://www.yoursite.com")

    Args:
        url: URL of the target web page
        **kwargs: keyword arguments from `weasyprint.HTML`

    Returns:
        decoded PDF file as a bytes stream
    """
    from weasyprint import HTML

    return HTML(url, **kwargs).write_pdf()


================================================
FILE: onnxtr/io/image.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from pathlib import Path

import cv2
import numpy as np

from onnxtr.utils.common_types import AbstractFile

__all__ = ["read_img_as_numpy"]


def read_img_as_numpy(
    file: AbstractFile,
    output_size: tuple[int, int] | None = None,
    rgb_output: bool = True,
) -> np.ndarray:
    """Read an image file into numpy format

    >>> from onnxtr.io import read_img_as_numpy
    >>> page = read_img_as_numpy("path/to/your/doc.jpg")

    Args:
        file: the path to the image file
        output_size: the expected output size of each page in format H x W
        rgb_output: whether the output ndarray channel order should be RGB instead of BGR.

    Returns:
        the page decoded as numpy ndarray of shape H x W x 3
    """
    if isinstance(file, (str, Path)):
        if not Path(file).is_file():
            raise FileNotFoundError(f"unable to access {file}")
        img = cv2.imread(str(file), cv2.IMREAD_COLOR)
    elif isinstance(file, bytes):
        _file: np.ndarray = np.frombuffer(file, np.uint8)
        img = cv2.imdecode(_file, cv2.IMREAD_COLOR)
    else:
        raise TypeError("unsupported object type for argument 'file'")

    # Validity check
    if img is None:
        raise ValueError("unable to read file.")
    # Resizing
    if isinstance(output_size, tuple):
        img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
    # Switch the channel order
    if rgb_output:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img


================================================
FILE: onnxtr/io/pdf.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

import numpy as np
import pypdfium2 as pdfium

from onnxtr.utils.common_types import AbstractFile

__all__ = ["read_pdf"]


def read_pdf(
    file: AbstractFile,
    scale: int = 2,
    rgb_mode: bool = True,
    password: str | None = None,
    **kwargs: Any,
) -> list[np.ndarray]:
    """Read a PDF file and convert it into an image in numpy format

    >>> from onnxtr.io import read_pdf
    >>> doc = read_pdf("path/to/your/doc.pdf")

    Args:
        file: the path to the PDF file
        scale: rendering scale (1 corresponds to 72dpi)
        rgb_mode: if True, the output will be RGB, otherwise BGR
        password: a password to unlock the document, if encrypted
        **kwargs: additional parameters to :meth:`pypdfium2.PdfPage.render`

    Returns:
        the list of pages decoded as numpy ndarray of shape H x W x C
    """
    # Rasterise pages to numpy ndarrays with pypdfium2
    pdf = pdfium.PdfDocument(file, password=password)
    try:
        return [page.render(scale=scale, rev_byteorder=rgb_mode, **kwargs).to_numpy() for page in pdf]
    finally:
        pdf.close()


================================================
FILE: onnxtr/io/reader.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from collections.abc import Sequence
from pathlib import Path

import numpy as np

from onnxtr.file_utils import requires_package
from onnxtr.utils.common_types import AbstractFile

from .html import read_html
from .image import read_img_as_numpy
from .pdf import read_pdf

__all__ = ["DocumentFile"]


class DocumentFile:
    """Read a document from multiple extensions"""

    @classmethod
    def from_pdf(cls, file: AbstractFile, **kwargs) -> list[np.ndarray]:
        """Read a PDF file

        >>> from onnxtr.io import DocumentFile
        >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")

        Args:
            file: the path to the PDF file or a binary stream
            **kwargs: additional parameters to :meth:`pypdfium2.PdfPage.render`

        Returns:
            the list of pages decoded as numpy ndarray of shape H x W x 3
        """
        return read_pdf(file, **kwargs)

    @classmethod
    def from_url(cls, url: str, **kwargs) -> list[np.ndarray]:
        """Interpret a web page as a PDF document

        >>> from onnxtr.io import DocumentFile
        >>> doc = DocumentFile.from_url("https://www.yoursite.com")

        Args:
            url: the URL of the target web page
            **kwargs: additional parameters to :meth:`pypdfium2.PdfPage.render`

        Returns:
            the list of pages decoded as numpy ndarray of shape H x W x 3
        """
        requires_package(
            "weasyprint",
            "`.from_url` requires weasyprint installed.\n"
            + "Installation instructions: https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation",
        )
        pdf_stream = read_html(url)
        return cls.from_pdf(pdf_stream, **kwargs)

    @classmethod
    def from_images(cls, files: Sequence[AbstractFile] | AbstractFile, **kwargs) -> list[np.ndarray]:
        """Read an image file (or a collection of image files) and convert it into an image in numpy format

        >>> from onnxtr.io import DocumentFile
        >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])

        Args:
            files: the path to the image file or a binary stream, or a collection of those
            **kwargs: additional parameters to :meth:`onnxtr.io.image.read_img_as_numpy`

        Returns:
            the list of pages decoded as numpy ndarray of shape H x W x 3
        """
        if isinstance(files, (str, Path, bytes)):
            files = [files]

        return [read_img_as_numpy(file, **kwargs) for file in files]


================================================
FILE: onnxtr/models/__init__.py
================================================
from .engine import EngineConfig
from .classification import *
from .detection import *
from .recognition import *
from .zoo import *
from .factory import *


================================================
FILE: onnxtr/models/_utils.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from math import floor
from statistics import median_low

import cv2
import numpy as np
from langdetect import LangDetectException, detect_langs

from onnxtr.utils.geometry import rotate_image

__all__ = ["estimate_orientation", "get_language"]


def get_max_width_length_ratio(contour: np.ndarray) -> float:
    """Get the maximum shape ratio of a contour.

    Args:
        contour: the contour from cv2.findContour

    Returns:
        the maximum shape ratio
    """
    _, (w, h), _ = cv2.minAreaRect(contour)
    if w == 0 or h == 0:
        return 0.0
    return max(w / h, h / w)


def estimate_orientation(
    img: np.ndarray,
    general_page_orientation: tuple[int, float] | None = None,
    n_ct: int = 70,
    ratio_threshold_for_lines: float = 3,
    min_confidence: float = 0.2,
    lower_area: int = 100,
) -> int:
    """Estimate the angle of the general document orientation based on the
     lines of the document and the assumption that they should be horizontal.

    Args:
        img: the img or bitmap to analyze (H, W, C)
        general_page_orientation: the general orientation of the page (angle [0, 90, 180, 270 (-90)], confidence)
            estimated by a model
        n_ct: the number of contours used for the orientation estimation
        ratio_threshold_for_lines: this is the ratio w/h used to discriminates lines
        min_confidence: the minimum confidence to consider the general_page_orientation
        lower_area: the minimum area of a contour to be considered

    Returns:
        the estimated angle of the page (clockwise, negative for left side rotation, positive for right side rotation)
    """
    assert len(img.shape) == 3 and img.shape[-1] in [1, 3], f"Image shape {img.shape} not supported"

    # Convert image to grayscale if necessary
    if img.shape[-1] == 3:
        gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        gray_img = cv2.medianBlur(gray_img, 5)
        thresh = cv2.threshold(gray_img, thresh=0, maxval=255, type=cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    else:
        thresh = img.astype(np.uint8)

    page_orientation, orientation_confidence = general_page_orientation or (0, 0.0)
    is_confident = page_orientation is not None and orientation_confidence >= min_confidence
    base_angle = page_orientation if is_confident else 0

    if is_confident:
        # We rotate the image to the general orientation which improves the detection
        # No expand needed bitmap is already padded
        thresh = rotate_image(thresh, -base_angle)
    else:  # That's only required if we do not work on the detection models bin map
        # try to merge words in lines
        (h, w) = img.shape[:2]
        k_x = max(1, (floor(w / 100)))
        k_y = max(1, (floor(h / 100)))
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (k_x, k_y))
        thresh = cv2.dilate(thresh, kernel, iterations=1)

    # extract contours
    contours, _ = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    # Filter & Sort contours
    contours = sorted(
        [contour for contour in contours if cv2.contourArea(contour) > lower_area],
        key=get_max_width_length_ratio,
        reverse=True,
    )

    angles = []
    for contour in contours[:n_ct]:
        _, (w, h), angle = cv2.minAreaRect(contour)

        # OpenCV version-proof normalization: force 'w' to be the long side
        # so the angle is consistently relative to the major axis.
        # https://github.com/opencv/opencv/pull/28051/changes
        if w < h:
            w, h = h, w
            angle -= 90

        # Normalize angle to be within [-90, 90]
        while angle <= -90:
            angle += 180
        while angle > 90:
            angle -= 180

        if h > 0:
            if w / h > ratio_threshold_for_lines:  # select only contours with ratio like lines
                angles.append(angle)
            elif w / h < 1 / ratio_threshold_for_lines:  # if lines are vertical, substract 90 degree
                angles.append(angle - 90)

    if len(angles) == 0:
        skew_angle = 0  # in case no angles is found
    else:
        # median_low picks a value from the data to avoid outliers
        median = -median_low(angles)
        skew_angle = -round(median) if abs(median) != 0 else 0

        # Resolve the 90-degree flip ambiguity.
        # If the estimation is exactly 90/-90, it's usually a vertical detection of horizontal lines.
        if abs(skew_angle) == 90:
            skew_angle = 0

    # combine with the general orientation and the estimated angle
    # Apply the detected skew to our base orientation
    final_angle = base_angle + skew_angle

    # Standardize result to [-179, 180] range to handle wrap-around cases (e.g., 180 + -31)
    while final_angle > 180:
        final_angle -= 360
    while final_angle <= -180:
        final_angle += 360

    if is_confident:
        # If the estimated angle is perpendicular, treat it as 0 to avoid wrong flips
        if abs(skew_angle) % 90 == 0:
            return page_orientation

        # special case where the estimated angle is mostly wrong:
        # case 1: - and + swapped
        # case 2: estimated angle is completely wrong
        # so in this case we prefer the general page orientation
        if abs(skew_angle) == abs(page_orientation) and page_orientation != 0:
            return page_orientation

    return int(
        final_angle
    )  # return the clockwise angle (negative - left side rotation, positive - right side rotation)


def rectify_crops(
    crops: list[np.ndarray],
    orientations: list[int],
) -> list[np.ndarray]:
    """Rotate each crop of the list according to the predicted orientation:
    0: already straight, no rotation
    1: 90 ccw, rotate 3 times ccw
    2: 180, rotate 2 times ccw
    3: 270 ccw, rotate 1 time ccw
    """
    # Inverse predictions (if angle of +90 is detected, rotate by -90)
    orientations = [4 - pred if pred != 0 else 0 for pred in orientations]
    return (
        [crop if orientation == 0 else np.rot90(crop, orientation) for orientation, crop in zip(orientations, crops)]
        if len(orientations) > 0
        else []
    )


def rectify_loc_preds(
    page_loc_preds: np.ndarray,
    orientations: list[int],
) -> np.ndarray | None:
    """Orient the quadrangle (Polygon4P) according to the predicted orientation,
    so that the points are in this order: top L, top R, bot R, bot L if the crop is readable
    """
    return (
        np.stack(
            [
                np.roll(page_loc_pred, orientation, axis=0)
                for orientation, page_loc_pred in zip(orientations, page_loc_preds)
            ],
            axis=0,
        )
        if len(orientations) > 0
        else None
    )


def get_language(text: str) -> tuple[str, float]:
    """Get languages of a text using langdetect model.
    Get the language with the highest probability or no language if only a few words or a low probability

    Args:
        text (str): text

    Returns:
        The detected language in ISO 639 code and confidence score
    """
    try:
        lang = detect_langs(text.lower())[0]
    except LangDetectException:
        return "unknown", 0.0
    if len(text) <= 1 or (len(text) <= 5 and lang.prob <= 0.2):
        return "unknown", 0.0
    return lang.lang, lang.prob


================================================
FILE: onnxtr/models/builder.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.


from typing import Any

import numpy as np
from scipy.cluster.hierarchy import fclusterdata

from onnxtr.io.elements import Block, Document, Line, Page, Word
from onnxtr.utils.geometry import estimate_page_angle, resolve_enclosing_bbox, resolve_enclosing_rbbox, rotate_boxes
from onnxtr.utils.repr import NestedObject

__all__ = ["DocumentBuilder"]


class DocumentBuilder(NestedObject):
    """Implements a document builder

    Args:
        resolve_lines: whether words should be automatically grouped into lines
        resolve_blocks: whether lines should be automatically grouped into blocks
        paragraph_break: relative length of the minimum space separating paragraphs
        export_as_straight_boxes: if True, force straight boxes in the export (fit a rectangle
            box to all rotated boxes). Else, keep the boxes format unchanged, no matter what it is.
    """

    def __init__(
        self,
        resolve_lines: bool = True,
        resolve_blocks: bool = False,
        paragraph_break: float = 0.035,
        export_as_straight_boxes: bool = False,
    ) -> None:
        self.resolve_lines = resolve_lines
        self.resolve_blocks = resolve_blocks
        self.paragraph_break = paragraph_break
        self.export_as_straight_boxes = export_as_straight_boxes

    @staticmethod
    def _sort_boxes(boxes: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Sort bounding boxes from top to bottom, left to right

        Args:
            boxes: bounding boxes of shape (N, 4) or (N, 4, 2) (in case of rotated bbox)

        Returns:
            tuple: indices of ordered boxes of shape (N,), boxes
                If straight boxes are passed tpo the function, boxes are unchanged
                else: boxes returned are straight boxes fitted to the straightened rotated boxes
                so that we fit the lines afterwards to the straigthened page
        """
        if boxes.ndim == 3:
            boxes = rotate_boxes(
                loc_preds=boxes,
                angle=-estimate_page_angle(boxes),
                orig_shape=(1024, 1024),
                min_angle=5.0,
            )
            boxes = np.concatenate((boxes.min(1), boxes.max(1)), -1)
        return (boxes[:, 0] + 2 * boxes[:, 3] / np.median(boxes[:, 3] - boxes[:, 1])).argsort(), boxes

    def _resolve_sub_lines(self, boxes: np.ndarray, word_idcs: list[int]) -> list[list[int]]:
        """Split a line in sub_lines

        Args:
            boxes: bounding boxes of shape (N, 4)
            word_idcs: list of indexes for the words of the line

        Returns:
            A list of (sub-)lines computed from the original line (words)
        """
        lines = []
        # Sort words horizontally
        word_idcs = [word_idcs[idx] for idx in boxes[word_idcs, 0].argsort().tolist()]

        # Eventually split line horizontally
        if len(word_idcs) < 2:
            lines.append(word_idcs)
        else:
            sub_line = [word_idcs[0]]
            for i in word_idcs[1:]:
                horiz_break = True

                prev_box = boxes[sub_line[-1]]
                # Compute distance between boxes
                dist = boxes[i, 0] - prev_box[2]
                # If distance between boxes is lower than paragraph break, same sub-line
                if dist < self.paragraph_break:
                    horiz_break = False

                if horiz_break:
                    lines.append(sub_line)
                    sub_line = []

                sub_line.append(i)
            lines.append(sub_line)

        return lines

    def _resolve_lines(self, boxes: np.ndarray) -> list[list[int]]:
        """Order boxes to group them in lines

        Args:
            boxes: bounding boxes of shape (N, 4) or (N, 4, 2) in case of rotated bbox

        Returns:
            nested list of box indices
        """
        # Sort boxes, and straighten the boxes if they are rotated
        idxs, boxes = self._sort_boxes(boxes)

        # Compute median for boxes heights
        y_med = np.median(boxes[:, 3] - boxes[:, 1])

        lines = []
        words = [idxs[0]]  # Assign the top-left word to the first line
        # Define a mean y-center for the line
        y_center_sum = boxes[idxs[0]][[1, 3]].mean()

        for idx in idxs[1:]:
            vert_break = True

            # Compute y_dist
            y_dist = abs(boxes[idx][[1, 3]].mean() - y_center_sum / len(words))
            # If y-center of the box is close enough to mean y-center of the line, same line
            if y_dist < y_med / 2:
                vert_break = False

            if vert_break:
                # Compute sub-lines (horizontal split)
                lines.extend(self._resolve_sub_lines(boxes, words))
                words = []
                y_center_sum = 0

            words.append(idx)
            y_center_sum += boxes[idx][[1, 3]].mean()

        # Use the remaining words to form the last(s) line(s)
        if len(words) > 0:
            # Compute sub-lines (horizontal split)
            lines.extend(self._resolve_sub_lines(boxes, words))

        return lines

    @staticmethod
    def _resolve_blocks(boxes: np.ndarray, lines: list[list[int]]) -> list[list[list[int]]]:
        """Order lines to group them in blocks

        Args:
            boxes: bounding boxes of shape (N, 4) or (N, 4, 2)
            lines: list of lines, each line is a list of idx

        Returns:
            nested list of box indices
        """
        # Resolve enclosing boxes of lines
        if boxes.ndim == 3:
            box_lines: np.ndarray = np.asarray([
                resolve_enclosing_rbbox([tuple(boxes[idx, :, :]) for idx in line])  # type: ignore[misc]
                for line in lines
            ])
        else:
            _box_lines = [
                resolve_enclosing_bbox([(tuple(boxes[idx, :2]), tuple(boxes[idx, 2:])) for idx in line])
                for line in lines
            ]
            box_lines = np.asarray([(x1, y1, x2, y2) for ((x1, y1), (x2, y2)) in _box_lines])

        # Compute geometrical features of lines to clusterize
        # Clusterizing only with box centers yield to poor results for complex documents
        if boxes.ndim == 3:
            box_features: np.ndarray = np.stack(
                (
                    (box_lines[:, 0, 0] + box_lines[:, 0, 1]) / 2,
                    (box_lines[:, 0, 0] + box_lines[:, 2, 0]) / 2,
                    (box_lines[:, 0, 0] + box_lines[:, 2, 1]) / 2,
                    (box_lines[:, 0, 1] + box_lines[:, 2, 1]) / 2,
                    (box_lines[:, 0, 1] + box_lines[:, 2, 0]) / 2,
                    (box_lines[:, 2, 0] + box_lines[:, 2, 1]) / 2,
                ),
                axis=-1,
            )
        else:
            box_features = np.stack(
                (
                    (box_lines[:, 0] + box_lines[:, 3]) / 2,
                    (box_lines[:, 1] + box_lines[:, 2]) / 2,
                    (box_lines[:, 0] + box_lines[:, 2]) / 2,
                    (box_lines[:, 1] + box_lines[:, 3]) / 2,
                    box_lines[:, 0],
                    box_lines[:, 1],
                ),
                axis=-1,
            )
        # Compute clusters
        clusters = fclusterdata(box_features, t=0.1, depth=4, criterion="distance", metric="euclidean")

        _blocks: dict[int, list[int]] = {}
        # Form clusters
        for line_idx, cluster_idx in enumerate(clusters):
            if cluster_idx in _blocks.keys():
                _blocks[cluster_idx].append(line_idx)
            else:
                _blocks[cluster_idx] = [line_idx]

        # Retrieve word-box level to return a fully nested structure
        blocks = [[lines[idx] for idx in block] for block in _blocks.values()]

        return blocks

    def _build_blocks(
        self,
        boxes: np.ndarray,
        objectness_scores: np.ndarray,
        word_preds: list[tuple[str, float]],
        crop_orientations: list[dict[str, Any]],
    ) -> list[Block]:
        """Gather independent words in structured blocks

        Args:
            boxes: bounding boxes of all detected words of the page, of shape (N, 4) or (N, 4, 2)
            objectness_scores: objectness scores of all detected words of the page, of shape N
            word_preds: list of all detected words of the page, of shape N
            crop_orientations: list of dictoinaries containing
                the general orientation (orientations + confidences) of the crops

        Returns:
            list of block elements
        """
        if boxes.shape[0] != len(word_preds):
            raise ValueError(f"Incompatible argument lengths: {boxes.shape[0]}, {len(word_preds)}")

        if boxes.shape[0] == 0:
            return []

        # Decide whether we try to form lines
        _boxes = boxes
        if self.resolve_lines:
            lines = self._resolve_lines(_boxes if _boxes.ndim == 3 else _boxes[:, :4])
            # Decide whether we try to form blocks
            if self.resolve_blocks and len(lines) > 1:
                _blocks = self._resolve_blocks(_boxes if _boxes.ndim == 3 else _boxes[:, :4], lines)
            else:
                _blocks = [lines]
        else:
            # Sort bounding boxes, one line for all boxes, one block for the line
            lines = [self._sort_boxes(_boxes if _boxes.ndim == 3 else _boxes[:, :4])[0]]  # type: ignore[list-item]
            _blocks = [lines]

        blocks = [
            Block([
                Line([
                    Word(
                        *word_preds[idx],
                        tuple(tuple(pt) for pt in boxes[idx].tolist()),  # type: ignore[arg-type]
                        float(objectness_scores[idx]),
                        crop_orientations[idx],
                    )
                    if boxes.ndim == 3
                    else Word(
                        *word_preds[idx],
                        ((boxes[idx, 0], boxes[idx, 1]), (boxes[idx, 2], boxes[idx, 3])),
                        float(objectness_scores[idx]),
                        crop_orientations[idx],
                    )
                    for idx in line
                ])
                for line in lines
            ])
            for lines in _blocks
        ]

        return blocks

    def extra_repr(self) -> str:
        return (
            f"resolve_lines={self.resolve_lines}, resolve_blocks={self.resolve_blocks}, "
            f"paragraph_break={self.paragraph_break}, "
            f"export_as_straight_boxes={self.export_as_straight_boxes}"
        )

    def __call__(
        self,
        pages: list[np.ndarray],
        boxes: list[np.ndarray],
        objectness_scores: list[np.ndarray],
        text_preds: list[list[tuple[str, float]]],
        page_shapes: list[tuple[int, int]],
        crop_orientations: list[dict[str, Any]],
        orientations: list[dict[str, Any]] | None = None,
        languages: list[dict[str, Any]] | None = None,
    ) -> Document:
        """Re-arrange detected words into structured blocks

        Args:
            pages: list of N elements, where each element represents the page image
            boxes: list of N elements, where each element represents the localization predictions, of shape (*, 4)
                or (*, 4, 2) for all words for a given page
            objectness_scores: list of N elements, where each element represents the objectness scores
            text_preds: list of N elements, where each element is the list of all word prediction (text + confidence)
            page_shapes: shape of each page, of size N
            crop_orientations: list of N elements, where each element is
                a dictionary containing the general orientation (orientations + confidences) of the crops
            orientations: optional, list of N elements,
                where each element is a dictionary containing the orientation (orientation + confidence)
            languages: optional, list of N elements,
                where each element is a dictionary containing the language (language + confidence)

        Returns:
            document object
        """
        if len(boxes) != len(text_preds) != len(crop_orientations) != len(objectness_scores) or len(boxes) != len(
            page_shapes
        ) != len(crop_orientations) != len(objectness_scores):
            raise ValueError("All arguments are expected to be lists of the same size")

        _orientations = orientations if isinstance(orientations, list) else [None] * len(boxes)
        _languages = languages if isinstance(languages, list) else [None] * len(boxes)
        if self.export_as_straight_boxes and len(boxes) > 0:
            # If boxes are already straight OK, else fit a bounding rect
            if boxes[0].ndim == 3:
                # Iterate over pages and boxes
                boxes = [np.concatenate((p_boxes.min(1), p_boxes.max(1)), 1) for p_boxes in boxes]

        _pages = [
            Page(
                page,
                self._build_blocks(
                    page_boxes,
                    loc_scores,
                    word_preds,
                    word_crop_orientations,
                ),
                _idx,
                shape,
                orientation,
                language,
            )
            for page, _idx, shape, page_boxes, loc_scores, word_preds, word_crop_orientations, orientation, language in zip(  # noqa: E501
                pages,
                range(len(boxes)),
                page_shapes,
                boxes,
                objectness_scores,
                text_preds,
                crop_orientations,
                _orientations,
                _languages,
            )
        ]

        return Document(_pages)


================================================
FILE: onnxtr/models/classification/__init__.py
================================================
from .models import *
from .zoo import *


================================================
FILE: onnxtr/models/classification/models/__init__.py
================================================
from .mobilenet import *

================================================
FILE: onnxtr/models/classification/models/mobilenet.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

# Greatly inspired by https://github.com/pytorch/vision/blob/master/torchvision/models/mobilenetv3.py

from copy import deepcopy
from typing import Any

import numpy as np

from ...engine import Engine, EngineConfig

__all__ = [
    "MobileNetV3",
    "mobilenet_v3_small_crop_orientation",
    "mobilenet_v3_small_page_orientation",
]

default_cfgs: dict[str, dict[str, Any]] = {
    "mobilenet_v3_small_crop_orientation": {
        "mean": (0.694, 0.695, 0.693),
        "std": (0.299, 0.296, 0.301),
        "input_shape": (3, 256, 256),
        "classes": [0, -90, 180, 90],
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.6.0/mobilenet_v3_small_crop_orientation-4fde60a1.onnx",
        "url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.6.0/mobilenet_v3_small_crop_orientation_static_8_bit-c32c7721.onnx",
    },
    "mobilenet_v3_small_page_orientation": {
        "mean": (0.694, 0.695, 0.693),
        "std": (0.299, 0.296, 0.301),
        "input_shape": (3, 512, 512),
        "classes": [0, -90, 180, 90],
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.6.0/mobilenet_v3_small_page_orientation-60606ce4.onnx",
        "url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.6.0/mobilenet_v3_small_page_orientation_static_8_bit-13b5b014.onnx",
    },
}


class MobileNetV3(Engine):
    """MobileNetV3 Onnx loader

    Args:
        model_path: path or url to onnx model file
        engine_cfg: configuration for the inference engine
        cfg: configuration dictionary
        **kwargs: additional arguments to be passed to `Engine`
    """

    def __init__(
        self,
        model_path: str,
        engine_cfg: EngineConfig | None = None,
        cfg: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(url=model_path, engine_cfg=engine_cfg, **kwargs)

        self.cfg = cfg

    def __call__(
        self,
        x: np.ndarray,
    ) -> np.ndarray:
        return self.run(x)


def _mobilenet_v3(
    arch: str,
    model_path: str,
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> MobileNetV3:
    # Patch the url
    model_path = default_cfgs[arch]["url_8_bit"] if load_in_8_bit and "http" in model_path else model_path
    _cfg = deepcopy(default_cfgs[arch])
    return MobileNetV3(model_path, cfg=_cfg, engine_cfg=engine_cfg, **kwargs)


def mobilenet_v3_small_crop_orientation(
    model_path: str = default_cfgs["mobilenet_v3_small_crop_orientation"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> MobileNetV3:
    """MobileNetV3-Small architecture as described in
    `"Searching for MobileNetV3",
    <https://arxiv.org/pdf/1905.02244.pdf>`_.

    >>> import numpy as np
    >>> from onnxtr.models import mobilenet_v3_small_crop_orientation
    >>> model = mobilenet_v3_small_crop_orientation()
    >>> input_tensor = np.random.rand((1, 3, 256, 256))
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the MobileNetV3 architecture

    Returns:
        MobileNetV3
    """
    return _mobilenet_v3("mobilenet_v3_small_crop_orientation", model_path, load_in_8_bit, engine_cfg, **kwargs)


def mobilenet_v3_small_page_orientation(
    model_path: str = default_cfgs["mobilenet_v3_small_page_orientation"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> MobileNetV3:
    """MobileNetV3-Small architecture as described in
    `"Searching for MobileNetV3",
    <https://arxiv.org/pdf/1905.02244.pdf>`_.

    >>> import numpy as np
    >>> from onnxtr.models import mobilenet_v3_small_page_orientation
    >>> model = mobilenet_v3_small_page_orientation()
    >>> input_tensor = np.random.rand((1, 3, 512, 512))
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the MobileNetV3 architecture

    Returns:
        MobileNetV3
    """
    return _mobilenet_v3("mobilenet_v3_small_page_orientation", model_path, load_in_8_bit, engine_cfg, **kwargs)


================================================
FILE: onnxtr/models/classification/predictor/__init__.py
================================================
from .base import *


================================================
FILE: onnxtr/models/classification/predictor/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

import numpy as np
from scipy.special import softmax

from onnxtr.models.preprocessor import PreProcessor
from onnxtr.utils.repr import NestedObject

__all__ = ["OrientationPredictor"]


class OrientationPredictor(NestedObject):
    """Implements an object able to detect the reading direction of a text box or a page.
    4 possible orientations: 0, 90, 180, 270 (-90) degrees counter clockwise.

    Args:
        pre_processor: transform inputs for easier batched model inference
        model: core classification architecture (backbone + classification head)
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
    """

    _children_names: list[str] = ["pre_processor", "model"]

    def __init__(
        self,
        pre_processor: PreProcessor | None,
        model: Any | None,
    ) -> None:
        self.pre_processor = pre_processor if isinstance(pre_processor, PreProcessor) else None
        self.model = model

    def __call__(
        self,
        inputs: list[np.ndarray],
    ) -> list[list[int] | list[float]]:
        # Dimension check
        if any(input.ndim != 3 for input in inputs):
            raise ValueError("incorrect input shape: all inputs are expected to be multi-channel 2D images.")

        if self.model is None or self.pre_processor is None:
            # predictor is disabled
            return [[0] * len(inputs), [0] * len(inputs), [1.0] * len(inputs)]

        processed_batches = self.pre_processor(inputs)
        predicted_batches = [self.model(batch) for batch in processed_batches]

        # confidence
        probs = [np.max(softmax(batch, axis=1), axis=1) for batch in predicted_batches]
        # Postprocess predictions
        predicted_batches = [np.argmax(out_batch, axis=1) for out_batch in predicted_batches]

        class_idxs = [int(pred) for batch in predicted_batches for pred in batch]
        classes = [int(self.model.cfg["classes"][idx]) for idx in class_idxs]
        confs = [round(float(p), 2) for prob in probs for p in prob]

        return [class_idxs, classes, confs]


================================================
FILE: onnxtr/models/classification/zoo.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

from onnxtr.models.engine import EngineConfig

from .. import classification
from ..preprocessor import PreProcessor
from .predictor import OrientationPredictor

__all__ = ["crop_orientation_predictor", "page_orientation_predictor"]

ORIENTATION_ARCHS: list[str] = ["mobilenet_v3_small_crop_orientation", "mobilenet_v3_small_page_orientation"]


def _orientation_predictor(
    arch: Any,
    model_type: str,
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    disabled: bool = False,
    **kwargs: Any,
) -> OrientationPredictor:
    if disabled:
        # Case where the orientation predictor is disabled
        return OrientationPredictor(None, None)

    if isinstance(arch, str):
        if arch not in ORIENTATION_ARCHS:
            raise ValueError(f"unknown architecture '{arch}'")
        # Load directly classifier from backbone
        _model = classification.__dict__[arch](load_in_8_bit=load_in_8_bit, engine_cfg=engine_cfg)
    else:
        if not isinstance(arch, classification.MobileNetV3):
            raise ValueError(f"unknown architecture: {type(arch)}")
        _model = arch

    kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
    kwargs["std"] = kwargs.get("std", _model.cfg["std"])
    kwargs["batch_size"] = kwargs.get("batch_size", 512 if model_type == "crop" else 2)
    input_shape = _model.cfg["input_shape"][1:]
    predictor = OrientationPredictor(
        PreProcessor(input_shape, preserve_aspect_ratio=True, symmetric_pad=True, **kwargs),
        _model,
    )
    return predictor


def crop_orientation_predictor(
    arch: Any = "mobilenet_v3_small_crop_orientation",
    batch_size: int = 512,
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> OrientationPredictor:
    """Crop orientation classification architecture.

    >>> import numpy as np
    >>> from onnxtr.models import crop_orientation_predictor
    >>> model = crop_orientation_predictor(arch='mobilenet_v3_small_crop_orientation')
    >>> input_crop = (255 * np.random.rand(256, 256, 3)).astype(np.uint8)
    >>> out = model([input_crop])

    Args:
        arch: name of the architecture to use (e.g. 'mobilenet_v3_small_crop_orientation')
        batch_size: number of samples the model processes in parallel
        load_in_8_bit: load the 8-bit quantized version of the model
        engine_cfg: configuration of inference engine
        **kwargs: keyword arguments to be passed to the OrientationPredictor

    Returns:
        OrientationPredictor
    """
    model_type = "crop"
    return _orientation_predictor(
        arch=arch,
        batch_size=batch_size,
        model_type=model_type,
        load_in_8_bit=load_in_8_bit,
        engine_cfg=engine_cfg,
        **kwargs,
    )


def page_orientation_predictor(
    arch: Any = "mobilenet_v3_small_page_orientation",
    batch_size: int = 2,
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> OrientationPredictor:
    """Page orientation classification architecture.

    >>> import numpy as np
    >>> from onnxtr.models import page_orientation_predictor
    >>> model = page_orientation_predictor(arch='mobilenet_v3_small_page_orientation')
    >>> input_page = (255 * np.random.rand(512, 512, 3)).astype(np.uint8)
    >>> out = model([input_page])

    Args:
        arch: name of the architecture to use (e.g. 'mobilenet_v3_small_page_orientation')
        batch_size: number of samples the model processes in parallel
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments to be passed to the OrientationPredictor

    Returns:
        OrientationPredictor
    """
    model_type = "page"
    return _orientation_predictor(
        arch=arch,
        batch_size=batch_size,
        model_type=model_type,
        load_in_8_bit=load_in_8_bit,
        engine_cfg=engine_cfg,
        **kwargs,
    )


================================================
FILE: onnxtr/models/detection/__init__.py
================================================
from .models import *
from .zoo import *


================================================
FILE: onnxtr/models/detection/_utils/__init__.py
================================================
from . base import *

================================================
FILE: onnxtr/models/detection/_utils/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.


import numpy as np

__all__ = ["_remove_padding"]


def _remove_padding(
    pages: list[np.ndarray],
    loc_preds: list[np.ndarray],
    preserve_aspect_ratio: bool,
    symmetric_pad: bool,
    assume_straight_pages: bool,
) -> list[np.ndarray]:
    """Remove padding from the localization predictions

    Args:
        pages: list of pages
        loc_preds: list of localization predictions
        preserve_aspect_ratio: whether the aspect ratio was preserved during padding
        symmetric_pad: whether the padding was symmetric
        assume_straight_pages: whether the pages are assumed to be straight

    Returns:
        list of unpaded localization predictions
    """
    if preserve_aspect_ratio:
        # Rectify loc_preds to remove padding
        rectified_preds = []
        for page, loc_pred in zip(pages, loc_preds):
            h, w = page.shape[0], page.shape[1]
            if h > w:
                # y unchanged, dilate x coord
                if symmetric_pad:
                    if assume_straight_pages:
                        loc_pred[:, [0, 2]] = (loc_pred[:, [0, 2]] - 0.5) * h / w + 0.5
                    else:
                        loc_pred[:, :, 0] = (loc_pred[:, :, 0] - 0.5) * h / w + 0.5
                else:
                    if assume_straight_pages:
                        loc_pred[:, [0, 2]] *= h / w
                    else:
                        loc_pred[:, :, 0] *= h / w
            elif w > h:
                # x unchanged, dilate y coord
                if symmetric_pad:
                    if assume_straight_pages:
                        loc_pred[:, [1, 3]] = (loc_pred[:, [1, 3]] - 0.5) * w / h + 0.5
                    else:
                        loc_pred[:, :, 1] = (loc_pred[:, :, 1] - 0.5) * w / h + 0.5
                else:
                    if assume_straight_pages:
                        loc_pred[:, [1, 3]] *= w / h
                    else:
                        loc_pred[:, :, 1] *= w / h
            rectified_preds.append(np.clip(loc_pred, 0, 1))
        return rectified_preds
    return loc_preds


================================================
FILE: onnxtr/models/detection/core.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.


import cv2
import numpy as np

from onnxtr.utils.repr import NestedObject

__all__ = ["DetectionPostProcessor"]


class DetectionPostProcessor(NestedObject):
    """Abstract class to postprocess the raw output of the model

    Args:
        box_thresh (float): minimal objectness score to consider a box
        bin_thresh (float): threshold to apply to segmentation raw heatmap
        assume straight_pages (bool): if True, fit straight boxes only
    """

    def __init__(self, box_thresh: float = 0.5, bin_thresh: float = 0.5, assume_straight_pages: bool = True) -> None:
        self.box_thresh = box_thresh
        self.bin_thresh = bin_thresh
        self.assume_straight_pages = assume_straight_pages
        self._opening_kernel: np.ndarray = np.ones((3, 3), dtype=np.uint8)

    def extra_repr(self) -> str:
        return f"bin_thresh={self.bin_thresh}, box_thresh={self.box_thresh}"

    @staticmethod
    def box_score(pred: np.ndarray, points: np.ndarray, assume_straight_pages: bool = True) -> float:
        """Compute the confidence score for a polygon : mean of the p values on the polygon

        Args:
            pred (np.ndarray): p map returned by the model
            points: coordinates of the polygon
            assume_straight_pages: if True, fit straight boxes only

        Returns:
            polygon objectness
        """
        h, w = pred.shape[:2]

        if assume_straight_pages:
            xmin = np.clip(np.floor(points[:, 0].min()).astype(np.int32), 0, w - 1)
            xmax = np.clip(np.ceil(points[:, 0].max()).astype(np.int32), 0, w - 1)
            ymin = np.clip(np.floor(points[:, 1].min()).astype(np.int32), 0, h - 1)
            ymax = np.clip(np.ceil(points[:, 1].max()).astype(np.int32), 0, h - 1)
            return pred[ymin : ymax + 1, xmin : xmax + 1].mean()

        else:
            mask: np.ndarray = np.zeros((h, w), np.int32)
            cv2.fillPoly(mask, [points.astype(np.int32)], 1.0)
            product = pred * mask
            return np.sum(product) / np.count_nonzero(product)

    def bitmap_to_boxes(
        self,
        pred: np.ndarray,
        bitmap: np.ndarray,
    ) -> np.ndarray:
        raise NotImplementedError

    def __call__(
        self,
        proba_map,
    ) -> list[list[np.ndarray]]:
        """Performs postprocessing for a list of model outputs

        Args:
            proba_map: probability map of shape (N, H, W, C)

        Returns:
            list of N class predictions (for each input sample), where each class predictions is a list of C tensors
            of shape (*, 5) or (*, 6)
        """
        if proba_map.ndim != 4:
            raise AssertionError(f"arg `proba_map` is expected to be 4-dimensional, got {proba_map.ndim}.")

        # Erosion + dilation on the binary map
        bin_map = [
            [
                cv2.morphologyEx(bmap[..., idx], cv2.MORPH_OPEN, self._opening_kernel)
                for idx in range(proba_map.shape[-1])
            ]
            for bmap in (proba_map >= self.bin_thresh).astype(np.uint8)
        ]

        return [
            [self.bitmap_to_boxes(pmaps[..., idx], bmaps[idx]) for idx in range(proba_map.shape[-1])]
            for pmaps, bmaps in zip(proba_map, bin_map)
        ]


================================================
FILE: onnxtr/models/detection/models/__init__.py
================================================
from .fast import *
from .differentiable_binarization import *
from .linknet import *

================================================
FILE: onnxtr/models/detection/models/differentiable_binarization.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

import numpy as np
from scipy.special import expit

from ...engine import Engine, EngineConfig
from ..postprocessor.base import GeneralDetectionPostProcessor

__all__ = ["DBNet", "db_resnet50", "db_resnet34", "db_mobilenet_v3_large"]


default_cfgs: dict[str, dict[str, Any]] = {
    "db_resnet50": {
        "input_shape": (3, 1024, 1024),
        "mean": (0.798, 0.785, 0.772),
        "std": (0.264, 0.2749, 0.287),
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/db_resnet50-69ba0015.onnx",
        "url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.1.2/db_resnet50_static_8_bit-09a6104f.onnx",
    },
    "db_resnet34": {
        "input_shape": (3, 1024, 1024),
        "mean": (0.798, 0.785, 0.772),
        "std": (0.264, 0.2749, 0.287),
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/db_resnet34-b4873198.onnx",
        "url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.1.2/db_resnet34_static_8_bit-027e2c7f.onnx",
    },
    "db_mobilenet_v3_large": {
        "input_shape": (3, 1024, 1024),
        "mean": (0.798, 0.785, 0.772),
        "std": (0.264, 0.2749, 0.287),
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.2.0/db_mobilenet_v3_large-4987e7bd.onnx",
        "url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.2.0/db_mobilenet_v3_large_static_8_bit-535a6f25.onnx",
    },
}


class DBNet(Engine):
    """DBNet Onnx loader

    Args:
        model_path: path or url to onnx model file
        engine_cfg: configuration for the inference engine
        bin_thresh: threshold for binarization of the output feature map
        box_thresh: minimal objectness score to consider a box
        assume_straight_pages: if True, fit straight bounding boxes only
        cfg: the configuration dict of the model
        **kwargs: additional arguments to be passed to `Engine`
    """

    def __init__(
        self,
        model_path: str,
        engine_cfg: EngineConfig | None = None,
        bin_thresh: float = 0.3,
        box_thresh: float = 0.1,
        assume_straight_pages: bool = True,
        cfg: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(url=model_path, engine_cfg=engine_cfg, **kwargs)

        self.cfg = cfg
        self.assume_straight_pages = assume_straight_pages

        self.postprocessor = GeneralDetectionPostProcessor(
            assume_straight_pages=self.assume_straight_pages, bin_thresh=bin_thresh, box_thresh=box_thresh
        )

    def __call__(
        self,
        x: np.ndarray,
        return_model_output: bool = False,
        **kwargs: Any,
    ) -> dict[str, Any]:
        logits = self.run(x)

        out: dict[str, Any] = {}

        prob_map = expit(logits)
        if return_model_output:
            out["out_map"] = prob_map

        out["preds"] = self.postprocessor(prob_map)

        return out


def _dbnet(
    arch: str,
    model_path: str,
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> DBNet:
    # Patch the url
    model_path = default_cfgs[arch]["url_8_bit"] if load_in_8_bit and "http" in model_path else model_path
    # Build the model
    return DBNet(model_path, cfg=default_cfgs[arch], engine_cfg=engine_cfg, **kwargs)


def db_resnet34(
    model_path: str = default_cfgs["db_resnet34"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> DBNet:
    """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
    <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-34 backbone.

    >>> import numpy as np
    >>> from onnxtr.models import db_resnet34
    >>> model = db_resnet34()
    >>> input_tensor = np.random.rand(1, 3, 1024, 1024)
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the DBNet architecture

    Returns:
        text detection architecture
    """
    return _dbnet("db_resnet34", model_path, load_in_8_bit, engine_cfg, **kwargs)


def db_resnet50(
    model_path: str = default_cfgs["db_resnet50"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> DBNet:
    """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
    <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.

    >>> import numpy as np
    >>> from onnxtr.models import db_resnet50
    >>> model = db_resnet50()
    >>> input_tensor = np.random.rand(1, 3, 1024, 1024)
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the DBNet architecture

    Returns:
        text detection architecture
    """
    return _dbnet("db_resnet50", model_path, load_in_8_bit, engine_cfg, **kwargs)


def db_mobilenet_v3_large(
    model_path: str = default_cfgs["db_mobilenet_v3_large"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> DBNet:
    """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
    <https://arxiv.org/pdf/1911.08947.pdf>`_, using a MobileNet V3 Large backbone.

    >>> import numpy as np
    >>> from onnxtr.models import db_mobilenet_v3_large
    >>> model = db_mobilenet_v3_large()
    >>> input_tensor = np.random.rand(1, 3, 1024, 1024)
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the DBNet architecture

    Returns:
        text detection architecture
    """
    return _dbnet("db_mobilenet_v3_large", model_path, load_in_8_bit, engine_cfg, **kwargs)


================================================
FILE: onnxtr/models/detection/models/fast.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

import logging
from typing import Any

import numpy as np
from scipy.special import expit

from ...engine import Engine, EngineConfig
from ..postprocessor.base import GeneralDetectionPostProcessor

__all__ = ["FAST", "fast_tiny", "fast_small", "fast_base"]


default_cfgs: dict[str, dict[str, Any]] = {
    "fast_tiny": {
        "input_shape": (3, 1024, 1024),
        "mean": (0.798, 0.785, 0.772),
        "std": (0.264, 0.2749, 0.287),
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/rep_fast_tiny-28867779.onnx",
    },
    "fast_small": {
        "input_shape": (3, 1024, 1024),
        "mean": (0.798, 0.785, 0.772),
        "std": (0.264, 0.2749, 0.287),
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/rep_fast_small-10428b70.onnx",
    },
    "fast_base": {
        "input_shape": (3, 1024, 1024),
        "mean": (0.798, 0.785, 0.772),
        "std": (0.264, 0.2749, 0.287),
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/rep_fast_base-1b89ebf9.onnx",
    },
}


class FAST(Engine):
    """FAST Onnx loader

    Args:
        model_path: path or url to onnx model file
        engine_cfg: configuration for the inference engine
        bin_thresh: threshold for binarization of the output feature map
        box_thresh: minimal objectness score to consider a box
        assume_straight_pages: if True, fit straight bounding boxes only
        cfg: the configuration dict of the model
        **kwargs: additional arguments to be passed to `Engine`
    """

    def __init__(
        self,
        model_path: str,
        engine_cfg: EngineConfig | None = None,
        bin_thresh: float = 0.1,
        box_thresh: float = 0.1,
        assume_straight_pages: bool = True,
        cfg: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(url=model_path, engine_cfg=engine_cfg, **kwargs)

        self.cfg = cfg
        self.assume_straight_pages = assume_straight_pages

        self.postprocessor = GeneralDetectionPostProcessor(
            assume_straight_pages=self.assume_straight_pages, bin_thresh=bin_thresh, box_thresh=box_thresh
        )

    def __call__(
        self,
        x: np.ndarray,
        return_model_output: bool = False,
        **kwargs: Any,
    ) -> dict[str, Any]:
        logits = self.run(x)

        out: dict[str, Any] = {}

        prob_map = expit(logits)
        if return_model_output:
            out["out_map"] = prob_map

        out["preds"] = self.postprocessor(prob_map)

        return out


def _fast(
    arch: str,
    model_path: str,
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> FAST:
    if load_in_8_bit:
        logging.warning("FAST models do not support 8-bit quantization yet. Loading full precision model...")
    # Build the model
    return FAST(model_path, cfg=default_cfgs[arch], engine_cfg=engine_cfg, **kwargs)


def fast_tiny(
    model_path: str = default_cfgs["fast_tiny"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> FAST:
    """FAST as described in `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation"
    <https://arxiv.org/pdf/2111.02394.pdf>`_, using a tiny TextNet backbone.

    >>> import numpy as np
    >>> from onnxtr.models import fast_tiny
    >>> model = fast_tiny()
    >>> input_tensor = np.random.rand(1, 3, 1024, 1024)
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the DBNet architecture

    Returns:
        text detection architecture
    """
    return _fast("fast_tiny", model_path, load_in_8_bit, engine_cfg, **kwargs)


def fast_small(
    model_path: str = default_cfgs["fast_small"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> FAST:
    """FAST as described in `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation"
    <https://arxiv.org/pdf/2111.02394.pdf>`_, using a small TextNet backbone.

    >>> import numpy as np
    >>> from onnxtr.models import fast_small
    >>> model = fast_small()
    >>> input_tensor = np.random.rand(1, 3, 1024, 1024)
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the DBNet architecture

    Returns:
        text detection architecture
    """
    return _fast("fast_small", model_path, load_in_8_bit, engine_cfg, **kwargs)


def fast_base(
    model_path: str = default_cfgs["fast_base"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> FAST:
    """FAST as described in `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation"
    <https://arxiv.org/pdf/2111.02394.pdf>`_, using a base TextNet backbone.

    >>> import numpy as np
    >>> from onnxtr.models import fast_base
    >>> model = fast_base()
    >>> input_tensor = np.random.rand(1, 3, 1024, 1024)
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the DBNet architecture

    Returns:
        text detection architecture
    """
    return _fast("fast_base", model_path, load_in_8_bit, engine_cfg, **kwargs)


================================================
FILE: onnxtr/models/detection/models/linknet.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

import numpy as np
from scipy.special import expit

from ...engine import Engine, EngineConfig
from ..postprocessor.base import GeneralDetectionPostProcessor

__all__ = ["LinkNet", "linknet_resnet18", "linknet_resnet34", "linknet_resnet50"]


default_cfgs: dict[str, dict[str, Any]] = {
    "linknet_resnet18": {
        "input_shape": (3, 1024, 1024),
        "mean": (0.798, 0.785, 0.772),
        "std": (0.264, 0.2749, 0.287),
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/linknet_resnet18-e0e0b9dc.onnx",
        "url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.1.2/linknet_resnet18_static_8_bit-3b3a37dd.onnx",
    },
    "linknet_resnet34": {
        "input_shape": (3, 1024, 1024),
        "mean": (0.798, 0.785, 0.772),
        "std": (0.264, 0.2749, 0.287),
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/linknet_resnet34-93e39a39.onnx",
        "url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.1.2/linknet_resnet34_static_8_bit-2824329d.onnx",
    },
    "linknet_resnet50": {
        "input_shape": (3, 1024, 1024),
        "mean": (0.798, 0.785, 0.772),
        "std": (0.264, 0.2749, 0.287),
        "url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/linknet_resnet50-15d8c4ec.onnx",
        "url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.1.2/linknet_resnet50_static_8_bit-65d6b0b8.onnx",
    },
}


class LinkNet(Engine):
    """LinkNet Onnx loader

    Args:
        model_path: path or url to onnx model file
        engine_cfg: configuration for the inference engine
        bin_thresh: threshold for binarization of the output feature map
        box_thresh: minimal objectness score to consider a box
        assume_straight_pages: if True, fit straight bounding boxes only
        cfg: the configuration dict of the model
        **kwargs: additional arguments to be passed to `Engine`
    """

    def __init__(
        self,
        model_path: str,
        engine_cfg: EngineConfig | None = None,
        bin_thresh: float = 0.1,
        box_thresh: float = 0.1,
        assume_straight_pages: bool = True,
        cfg: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(url=model_path, engine_cfg=engine_cfg, **kwargs)

        self.cfg = cfg
        self.assume_straight_pages = assume_straight_pages

        self.postprocessor = GeneralDetectionPostProcessor(
            assume_straight_pages=self.assume_straight_pages, bin_thresh=bin_thresh, box_thresh=box_thresh
        )

    def __call__(
        self,
        x: np.ndarray,
        return_model_output: bool = False,
        **kwargs: Any,
    ) -> dict[str, Any]:
        logits = self.run(x)

        out: dict[str, Any] = {}

        prob_map = expit(logits)
        if return_model_output:
            out["out_map"] = prob_map

        out["preds"] = self.postprocessor(prob_map)

        return out


def _linknet(
    arch: str,
    model_path: str,
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> LinkNet:
    # Patch the url
    model_path = default_cfgs[arch]["url_8_bit"] if load_in_8_bit and "http" in model_path else model_path
    # Build the model
    return LinkNet(model_path, cfg=default_cfgs[arch], engine_cfg=engine_cfg, **kwargs)


def linknet_resnet18(
    model_path: str = default_cfgs["linknet_resnet18"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> LinkNet:
    """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
    <https://arxiv.org/pdf/1707.03718.pdf>`_.

    >>> import numpy as np
    >>> from onnxtr.models import linknet_resnet18
    >>> model = linknet_resnet18()
    >>> input_tensor = np.random.rand(1, 3, 1024, 1024)
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the LinkNet architecture

    Returns:
        text detection architecture
    """
    return _linknet("linknet_resnet18", model_path, load_in_8_bit, engine_cfg, **kwargs)


def linknet_resnet34(
    model_path: str = default_cfgs["linknet_resnet34"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> LinkNet:
    """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
    <https://arxiv.org/pdf/1707.03718.pdf>`_.

    >>> import numpy as np
    >>> from onnxtr.models import linknet_resnet34
    >>> model = linknet_resnet34()
    >>> input_tensor = np.random.rand(1, 3, 1024, 1024)
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the LinkNet architecture

    Returns:
        text detection architecture
    """
    return _linknet("linknet_resnet34", model_path, load_in_8_bit, engine_cfg, **kwargs)


def linknet_resnet50(
    model_path: str = default_cfgs["linknet_resnet50"]["url"],
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> LinkNet:
    """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
    <https://arxiv.org/pdf/1707.03718.pdf>`_.

    >>> import numpy as np
    >>> from onnxtr.models import linknet_resnet50
    >>> model = linknet_resnet50()
    >>> input_tensor = np.random.rand(1, 3, 1024, 1024)
    >>> out = model(input_tensor)

    Args:
        model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: keyword arguments of the LinkNet architecture

    Returns:
        text detection architecture
    """
    return _linknet("linknet_resnet50", model_path, load_in_8_bit, engine_cfg, **kwargs)


================================================
FILE: onnxtr/models/detection/postprocessor/__init__.py
================================================


================================================
FILE: onnxtr/models/detection/postprocessor/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization


import cv2
import numpy as np
import pyclipper

from onnxtr.utils import order_points

from ..core import DetectionPostProcessor

__all__ = ["GeneralDetectionPostProcessor"]


class GeneralDetectionPostProcessor(DetectionPostProcessor):
    """Implements a post processor for FAST model.

    Args:
        bin_thresh: threshold used to binzarized p_map at inference time
        box_thresh: minimal objectness score to consider a box
        assume_straight_pages: whether the inputs were expected to have horizontal text elements
    """

    def __init__(
        self,
        bin_thresh: float = 0.1,
        box_thresh: float = 0.1,
        assume_straight_pages: bool = True,
    ) -> None:
        super().__init__(box_thresh, bin_thresh, assume_straight_pages)
        self.unclip_ratio = 1.5

    def polygon_to_box(
        self,
        points: np.ndarray,
    ) -> np.ndarray:
        """Expand a polygon (points) by a factor unclip_ratio, and returns a polygon

        Args:
            points: The first parameter.

        Returns:
            a box in absolute coordinates (xmin, ymin, xmax, ymax) or (4, 2) array (quadrangle)
        """
        if not self.assume_straight_pages:
            # Compute the rectangle polygon enclosing the raw polygon
            rect = cv2.minAreaRect(points)
            points = cv2.boxPoints(rect)
            # Add 1 pixel to correct cv2 approx
            area = (rect[1][0] + 1) * (1 + rect[1][1])
            length = 2 * (rect[1][0] + rect[1][1]) + 2
        else:
            area = cv2.contourArea(points)
            length = cv2.arcLength(points, closed=True)
        distance = area * self.unclip_ratio / length  # compute distance to expand polygon
        offset = pyclipper.PyclipperOffset()
        offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
        _points = offset.Execute(distance)
        # Take biggest stack of points
        idx = 0
        if len(_points) > 1:
            max_size = 0
            for _idx, p in enumerate(_points):
                if len(p) > max_size:
                    idx = _idx
                    max_size = len(p)
            # We ensure that _points can be correctly casted to a ndarray
            _points = [_points[idx]]
        expanded_points: np.ndarray = np.asarray(_points)  # expand polygon
        if len(expanded_points) < 1:
            return None  # type: ignore[return-value]
        return (
            cv2.boundingRect(expanded_points)  # type: ignore[return-value]
            if self.assume_straight_pages
            else order_points(cv2.boxPoints(cv2.minAreaRect(expanded_points)))
        )

    def bitmap_to_boxes(
        self,
        pred: np.ndarray,
        bitmap: np.ndarray,
    ) -> np.ndarray:
        """Compute boxes from a bitmap/pred_map: find connected components then filter boxes

        Args:
            pred: Pred map from differentiable linknet output
            bitmap: Bitmap map computed from pred (binarized)
            angle_tol: Comparison tolerance of the angle with the median angle across the page
            ratio_tol: Under this limit aspect ratio, we cannot resolve the direction of the crop

        Returns:
            np tensor boxes for the bitmap, each box is a 6-element list
            containing x, y, w, h, alpha, score for the box
        """
        height, width = bitmap.shape[:2]
        boxes: list[np.ndarray | list[float]] = []
        # get contours from connected components on the bitmap
        contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for contour in contours:
            # Check whether smallest enclosing bounding box is not too small
            if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < 2):
                continue
            # Compute objectness
            if self.assume_straight_pages:
                x, y, w, h = cv2.boundingRect(contour)
                points: np.ndarray = np.array([[x, y], [x, y + h], [x + w, y + h], [x + w, y]])
                score = self.box_score(pred, points, assume_straight_pages=True)
            else:
                score = self.box_score(pred, contour, assume_straight_pages=False)

            if score < self.box_thresh:  # remove polygons with a weak objectness
                continue

            if self.assume_straight_pages:
                _box = self.polygon_to_box(points)
            else:
                _box = self.polygon_to_box(np.squeeze(contour))

            if self.assume_straight_pages:
                # compute relative polygon to get rid of img shape
                x, y, w, h = _box
                xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
                boxes.append([xmin, ymin, xmax, ymax, score])
            else:
                # compute relative box to get rid of img shape
                _box[:, 0] /= width
                _box[:, 1] /= height
                # Add score to box as (0, score)
                boxes.append(np.vstack([_box, np.array([0.0, score])]))

        if not self.assume_straight_pages:
            return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5, 2), dtype=pred.dtype)
        else:
            return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=pred.dtype)


================================================
FILE: onnxtr/models/detection/predictor/__init__.py
================================================
from .base import *


================================================
FILE: onnxtr/models/detection/predictor/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

import numpy as np

from onnxtr.models.detection._utils import _remove_padding
from onnxtr.models.preprocessor import PreProcessor
from onnxtr.utils.repr import NestedObject

__all__ = ["DetectionPredictor"]


class DetectionPredictor(NestedObject):
    """Implements an object able to localize text elements in a document

    Args:
        pre_processor: transform inputs for easier batched model inference
        model: core detection architecture
    """

    _children_names: list[str] = ["pre_processor", "model"]

    def __init__(
        self,
        pre_processor: PreProcessor,
        model: Any,
    ) -> None:
        self.pre_processor = pre_processor
        self.model = model

    def __call__(
        self,
        pages: list[np.ndarray],
        return_maps: bool = False,
        **kwargs: Any,
    ) -> list[np.ndarray] | tuple[list[np.ndarray], list[np.ndarray]]:
        # Extract parameters from the preprocessor
        preserve_aspect_ratio = self.pre_processor.resize.preserve_aspect_ratio
        symmetric_pad = self.pre_processor.resize.symmetric_pad
        assume_straight_pages = self.model.assume_straight_pages

        # Dimension check
        if any(page.ndim != 3 for page in pages):
            raise ValueError("incorrect input shape: all pages are expected to be multi-channel 2D images.")

        processed_batches = self.pre_processor(pages)
        predicted_batches = [
            self.model(batch, return_preds=True, return_model_output=True, **kwargs) for batch in processed_batches
        ]

        # Remove padding from loc predictions
        preds = _remove_padding(
            pages,
            [pred[0] for batch in predicted_batches for pred in batch["preds"]],
            preserve_aspect_ratio=preserve_aspect_ratio,
            symmetric_pad=symmetric_pad,
            assume_straight_pages=assume_straight_pages,
        )

        if return_maps:
            seg_maps = [pred for batch in predicted_batches for pred in batch["out_map"]]
            return preds, seg_maps
        return preds


================================================
FILE: onnxtr/models/detection/zoo.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from typing import Any

from .. import detection
from ..engine import EngineConfig
from ..preprocessor import PreProcessor
from .predictor import DetectionPredictor

__all__ = ["detection_predictor"]

ARCHS = [
    "db_resnet34",
    "db_resnet50",
    "db_mobilenet_v3_large",
    "linknet_resnet18",
    "linknet_resnet34",
    "linknet_resnet50",
    "fast_tiny",
    "fast_small",
    "fast_base",
]


def _predictor(
    arch: Any,
    assume_straight_pages: bool = True,
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> DetectionPredictor:
    if isinstance(arch, str):
        if arch not in ARCHS:
            raise ValueError(f"unknown architecture '{arch}'")

        _model = detection.__dict__[arch](
            assume_straight_pages=assume_straight_pages, load_in_8_bit=load_in_8_bit, engine_cfg=engine_cfg
        )
    else:
        if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
            raise ValueError(f"unknown architecture: {type(arch)}")

        _model = arch
        _model.assume_straight_pages = assume_straight_pages
        _model.postprocessor.assume_straight_pages = assume_straight_pages

    kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
    kwargs["std"] = kwargs.get("std", _model.cfg["std"])
    kwargs["batch_size"] = kwargs.get("batch_size", 2)
    predictor = DetectionPredictor(
        PreProcessor(_model.cfg["input_shape"][1:], **kwargs),
        _model,
    )
    return predictor


def detection_predictor(
    arch: Any = "fast_base",
    assume_straight_pages: bool = True,
    preserve_aspect_ratio: bool = True,
    symmetric_pad: bool = True,
    batch_size: int = 2,
    load_in_8_bit: bool = False,
    engine_cfg: EngineConfig | None = None,
    **kwargs: Any,
) -> DetectionPredictor:
    """Text detection architecture.

    >>> import numpy as np
    >>> from onnxtr.models import detection_predictor
    >>> model = detection_predictor(arch='db_resnet50')
    >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
    >>> out = model([input_page])

    Args:
        arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
        assume_straight_pages: If True, fit straight boxes to the page
        preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
            running the detection model on it
        symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
        batch_size: number of samples the model processes in parallel
        load_in_8_bit: whether to load the the 8-bit quantized model, defaults to False
        engine_cfg: configuration for the inference engine
        **kwargs: optional keyword arguments passed to the architecture

    Returns:
        Detection predictor
    """
    return _predictor(
        arch=arch,
        assume_straight_pages=assume_straight_pages,
        preserve_aspect_ratio=preserve_aspect_ratio,
        symmetric_pad=symmetric_pad,
        batch_size=batch_size,
        load_in_8_bit=load_in_8_bit,
        engine_cfg=engine_cfg,
        **kwargs,
    )


================================================
FILE: onnxtr/models/engine.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

import logging
import os
from collections.abc import Callable
from typing import Any, TypeAlias

import numpy as np
from onnxruntime import (
    ExecutionMode,
    GraphOptimizationLevel,
    InferenceSession,
    RunOptions,
    SessionOptions,
    get_available_providers,
    get_device,
)
from onnxruntime.capi._pybind_state import set_default_logger_severity

set_default_logger_severity(int(os.getenv("ORT_LOG_SEVERITY_LEVEL", 4)))

from onnxtr.utils.data import download_from_url
from onnxtr.utils.geometry import shape_translate

__all__ = ["EngineConfig", "RunOptionsProvider"]

RunOptionsProvider: TypeAlias = Callable[[RunOptions], RunOptions]


class EngineConfig:
    """Implements a configuration class for the engine of a model

    Args:
        providers: list of providers to use for inference ref.: https://onnxruntime.ai/docs/execution-providers/
        session_options: configuration for the inference session ref.: https://onnxruntime.ai/docs/api/python/api_summary.html#sessionoptions
    """

    def __init__(
        self,
        providers: list[tuple[str, dict[str, Any]]] | list[str] | None = None,
        session_options: SessionOptions | None = None,
        run_options_provider: RunOptionsProvider | None = None,
    ):
        self._providers = providers or self._init_providers()
        self._session_options = session_options or self._init_sess_opts()
        self.run_options_provider = run_options_provider

    def _init_providers(self) -> list[tuple[str, dict[str, Any]]]:
        providers: Any = [("CPUExecutionProvider", {"arena_extend_strategy": "kSameAsRequested"})]
        available_providers = get_available_providers()
        logging.info(f"Available providers: {available_providers}")
        if "CUDAExecutionProvider" in available_providers and get_device() == "GPU":  # pragma: no cover
            providers.insert(
                0,
                (
                    "CUDAExecutionProvider",
                    {
                        "device_id": 0,
                        "arena_extend_strategy": "kNextPowerOfTwo",
                        "cudnn_conv_algo_search": "DEFAULT",
                        "do_copy_in_default_stream": True,
                    },
                ),
            )
        elif "CoreMLExecutionProvider" in available_providers:  # pragma: no cover
            providers.insert(0, ("CoreMLExecutionProvider", {}))
        return providers

    def _init_sess_opts(self) -> SessionOptions:
        session_options = SessionOptions()
        session_options.enable_cpu_mem_arena = True
        session_options.execution_mode = ExecutionMode.ORT_SEQUENTIAL
        session_options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL
        session_options.intra_op_num_threads = -1
        session_options.inter_op_num_threads = -1
        return session_options

    @property
    def providers(self) -> list[tuple[str, dict[str, Any]]] | list[str]:
        return self._providers

    @property
    def session_options(self) -> SessionOptions:
        return self._session_options

    def __repr__(self) -> str:
        return f"EngineConfig(providers={self.providers})"


class Engine:
    """Implements an abstract class for the engine of a model

    Args:
        url: the url to use to download a model if needed
        engine_cfg: the configuration of the engine
        **kwargs: additional arguments to be passed to `download_from_url`
    """

    def __init__(self, url: str, engine_cfg: EngineConfig | None = None, **kwargs: Any) -> None:
        engine_cfg = engine_cfg if isinstance(engine_cfg, EngineConfig) else EngineConfig()
        archive_path = download_from_url(url, cache_subdir="models", **kwargs) if "http" in url else url
        # NOTE: older onnxruntime versions require a string path for windows
        archive_path = rf"{archive_path}"
        # Store model path for each model
        self.model_path = archive_path
        self.session_options = engine_cfg.session_options
        self.providers = engine_cfg.providers
        self.run_options_provider = engine_cfg.run_options_provider
        self.runtime = InferenceSession(archive_path, providers=self.providers, sess_options=self.session_options)
        self.runtime_inputs = self.runtime.get_inputs()[0]
        self.tf_exported = int(self.runtime_inputs.shape[-1]) == 3
        self.fixed_batch_size: int | str = self.runtime_inputs.shape[
            0
        ]  # mostly possible with tensorflow exported models
        self.output_name = [output.name for output in self.runtime.get_outputs()]

    def run(self, inputs: np.ndarray) -> np.ndarray:
        run_options = RunOptions()
        if self.run_options_provider is not None:
            run_options = self.run_options_provider(run_options)
        if self.tf_exported:
            inputs = shape_translate(inputs, format="BHWC")  # sanity check
        else:
            inputs = shape_translate(inputs, format="BCHW")
        if isinstance(self.fixed_batch_size, int) and self.fixed_batch_size != 0:  # dynamic batch size is a string
            inputs = np.broadcast_to(inputs, (self.fixed_batch_size, *inputs.shape))
            # combine the results
            logits = np.concatenate(
                [
                    self.runtime.run(self.output_name, {self.runtime_inputs.name: batch}, run_options=run_options)[0]
                    for batch in inputs
                ],
                axis=0,
            )
        else:
            logits = self.runtime.run(self.output_name, {self.runtime_inputs.name: inputs}, run_options=run_options)[0]
        return shape_translate(logits, format="BHWC")


================================================
FILE: onnxtr/models/factory/__init__.py
================================================
from .hub import *


================================================
FILE: onnxtr/models/factory/hub.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

# Inspired by: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/hub.py

import json
import logging
import shutil
import subprocess
import tempfile
import textwrap
from pathlib import Path
from typing import Any

from huggingface_hub import (
    HfApi,
    get_token,
    hf_hub_download,
    login,
)

from onnxtr import models
from onnxtr.models.engine import EngineConfig

__all__ = ["login_to_hub", "push_to_hf_hub", "from_hub", "_save_model_and_config_for_hf_hub"]


AVAILABLE_ARCHS = {
    "classification": models.classification.zoo.ORIENTATION_ARCHS,
    "detection": models.detection.zoo.ARCHS,
    "recognition": models.recognition.zoo.ARCHS,
}


def login_to_hub() -> None:  # pragma: no cover
    """Login to huggingface hub"""
    access_token = get_token()
    if access_token is not None:
        logging.info("Huggingface Hub token found and valid")
        login(token=access_token)
    else:
        login()
    # check if git lfs is installed
    try:
        subprocess.call(["git", "lfs", "version"])
    except FileNotFoundError:
        raise OSError(
            "Looks like you do not have git-lfs installed, please install. \
                      You can install from https://git-lfs.github.com/. \
                      Then run `git lfs install` (you only have to do this once)."
        )


def _save_model_and_config_for_hf_hub(model: Any, save_dir: str, arch: str, task: str) -> None:
    """Save model and config to disk for pushing to huggingface hub

    Args:
        model: Onnx model to be saved
        save_dir: directory to save model and config
        arch: architecture name
        task: task name
    """
    save_directory = Path(save_dir)
    shutil.copy2(model.model_path, save_directory / "model.onnx")

    config_path = save_directory / "config.json"

    # add model configuration
    model_config = model.cfg
    model_config["arch"] = arch
    model_config["task"] = task

    with config_path.open("w") as f:
        json.dump(model_config, f, indent=2, ensure_ascii=False)


def push_to_hf_hub(
    model: Any, model_name: str, task: str, override: bool = False, **kwargs
) -> None:  # pragma: no cover
    """Save model and its configuration on HF hub

    >>> from onnxtr.models import login_to_hub, push_to_hf_hub
    >>> from onnxtr.models.recognition import crnn_mobilenet_v3_small
    >>> login_to_hub()
    >>> model = crnn_mobilenet_v3_small()
    >>> push_to_hf_hub(model, 'my-model', 'recognition', arch='crnn_mobilenet_v3_small')

    Args:
        model: Onnx model to be saved
        model_name: name of the model which is also the repository name
        task: task name
        override: whether to override the existing model / repo on HF hub
        **kwargs: keyword arguments for push_to_hf_hub
    """
    run_config = kwargs.get("run_config", None)
    arch = kwargs.get("arch", None)

    if run_config is None and arch is None:
        raise ValueError("run_config or arch must be specified")
    if task not in ["classification", "detection", "recognition"]:
        raise ValueError("task must be one of classification, detection, recognition")

    # default readme
    readme = textwrap.dedent(
        f"""
    ---
    language:
    - en
    - fr
    license: apache-2.0
    ---

    <p align="center">
    <img src="https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/logo.jpg" width="40%">
    </p>

    **Optical Character Recognition made seamless & accessible to anyone, powered by Onnxruntime**

    ## Task: {task}

    https://github.com/felixdittrich92/OnnxTR

    ### Example usage:

    ```python
    >>> from onnxtr.io import DocumentFile
    >>> from onnxtr.models import ocr_predictor, from_hub

    >>> img = DocumentFile.from_images(['<image_path>'])
    >>> # Load your model from the hub
    >>> model = from_hub('onnxtr/my-model')

    >>> # Pass it to the predictor
    >>> # If your model is a recognition model:
    >>> predictor = ocr_predictor(det_arch='db_mobilenet_v3_large',
    >>>                           reco_arch=model)

    >>> # If your model is a detection model:
    >>> predictor = ocr_predictor(det_arch=model,
    >>>                           reco_arch='crnn_mobilenet_v3_small')

    >>> # Get your predictions
    >>> res = predictor(img)
    ```
    """
    )

    # add run configuration to readme if available
    if run_config is not None:
        arch = run_config.arch
        readme += textwrap.dedent(
            f"""### Run Configuration
                                  \n{json.dumps(vars(run_config), indent=2, ensure_ascii=False)}"""
        )

    if arch not in AVAILABLE_ARCHS[task]:
        raise ValueError(
            f"Architecture: {arch} for task: {task} not found.\
                         \nAvailable architectures: {AVAILABLE_ARCHS}"
        )

    commit_message = f"Add {model_name} model"

    # Create repository
    api = HfApi()
    api.create_repo(model_name, token=get_token(), exist_ok=False)

    # Save model files to a temporary directory
    with tempfile.TemporaryDirectory() as tmp_dir:
        _save_model_and_config_for_hf_hub(model, tmp_dir, arch=arch, task=task)
        readme_path = Path(tmp_dir) / "README.md"
        readme_path.write_text(readme)

        # Upload all files to the hub
        api.upload_folder(
            folder_path=tmp_dir,
            repo_id=model_name,
            commit_message=commit_message,
            token=get_token(),
        )


def from_hub(repo_id: str, engine_cfg: EngineConfig | None = None, **kwargs: Any):
    """Instantiate & load a pretrained model from HF hub.

    >>> from onnxtr.models import from_hub
    >>> model = from_hub("onnxtr/my-model")

    Args:
        repo_id: HuggingFace model hub repo
        engine_cfg: configuration for the inference engine (optional)
        **kwargs: kwargs of `hf_hub_download`

    Returns:
        Model loaded with the checkpoint
    """
    # Get the config
    with open(hf_hub_download(repo_id, filename="config.json", **kwargs), "rb") as f:
        cfg = json.load(f)
        model_path = hf_hub_download(repo_id, filename="model.onnx", **kwargs)

    arch = cfg["arch"]
    task = cfg["task"]
    cfg.pop("arch")
    cfg.pop("task")

    if task == "classification":
        model = models.classification.__dict__[arch](model_path, classes=cfg["classes"], engine_cfg=engine_cfg)
    elif task == "detection":
        model = models.detection.__dict__[arch](model_path, engine_cfg=engine_cfg)
    elif task == "recognition":
        model = models.recognition.__dict__[arch](
            model_path, input_shape=cfg["input_shape"], vocab=cfg["vocab"], engine_cfg=engine_cfg
        )

    # convert all values which are lists to tuples
    for key, value in cfg.items():
        if isinstance(value, list):
            cfg[key] = tuple(value)
    # update model cfg
    model.cfg = cfg

    return model


================================================
FILE: onnxtr/models/predictor/__init__.py
================================================
from .predictor import *


================================================
FILE: onnxtr/models/predictor/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.

# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.

from collections.abc import Callable
from typing import Any

import numpy as np

from onnxtr.models.builder import DocumentBuilder
from onnxtr.models.engine import EngineConfig
from onnxtr.utils.geometry import extract_crops, extract_rcrops, remove_image_padding, rotate_image

from .._utils import estimate_orientation, rectify_crops, rectify_loc_preds
from ..classification import crop_orientation_predictor, page_orientation_predictor
from ..classification.predictor import OrientationPredictor
from ..detection.zoo import ARCHS as DETECTION_ARCHS
from ..recognition.zoo import ARCHS as RECOGNITION_ARCHS

__all__ = ["_OCRPredictor"]


class _OCRPredictor:
    """

Download .txt

gitextract_7yglu2_f/

├── .conda/
│   └── meta.yaml
├── .github/
│   ├── CODEOWNERS
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.yml
│   │   ├── config.yml
│   │   └── feature_request.yml
│   ├── dependabot.yml
│   ├── release.yml
│   └── workflows/
│       ├── builds.yml
│       ├── clear_caches.yml
│       ├── demo.yml
│       ├── docker.yml
│       ├── main.yml
│       ├── publish.yml
│       └── style.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CODE_OF_CONDUCT.md
├── Dockerfile
├── LICENSE
├── Makefile
├── README.md
├── demo/
│   ├── README.md
│   ├── app.py
│   ├── packages.txt
│   └── requirements.txt
├── onnxtr/
│   ├── __init__.py
│   ├── contrib/
│   │   ├── __init__.py
│   │   ├── artefacts.py
│   │   └── base.py
│   ├── file_utils.py
│   ├── io/
│   │   ├── __init__.py
│   │   ├── elements.py
│   │   ├── html.py
│   │   ├── image.py
│   │   ├── pdf.py
│   │   └── reader.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── _utils.py
│   │   ├── builder.py
│   │   ├── classification/
│   │   │   ├── __init__.py
│   │   │   ├── models/
│   │   │   │   ├── __init__.py
│   │   │   │   └── mobilenet.py
│   │   │   ├── predictor/
│   │   │   │   ├── __init__.py
│   │   │   │   └── base.py
│   │   │   └── zoo.py
│   │   ├── detection/
│   │   │   ├── __init__.py
│   │   │   ├── _utils/
│   │   │   │   ├── __init__.py
│   │   │   │   └── base.py
│   │   │   ├── core.py
│   │   │   ├── models/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── differentiable_binarization.py
│   │   │   │   ├── fast.py
│   │   │   │   └── linknet.py
│   │   │   ├── postprocessor/
│   │   │   │   ├── __init__.py
│   │   │   │   └── base.py
│   │   │   ├── predictor/
│   │   │   │   ├── __init__.py
│   │   │   │   └── base.py
│   │   │   └── zoo.py
│   │   ├── engine.py
│   │   ├── factory/
│   │   │   ├── __init__.py
│   │   │   └── hub.py
│   │   ├── predictor/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   └── predictor.py
│   │   ├── preprocessor/
│   │   │   ├── __init__.py
│   │   │   └── base.py
│   │   ├── recognition/
│   │   │   ├── __init__.py
│   │   │   ├── core.py
│   │   │   ├── models/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── crnn.py
│   │   │   │   ├── master.py
│   │   │   │   ├── parseq.py
│   │   │   │   ├── sar.py
│   │   │   │   ├── viptr.py
│   │   │   │   └── vitstr.py
│   │   │   ├── predictor/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── _utils.py
│   │   │   │   └── base.py
│   │   │   ├── utils.py
│   │   │   └── zoo.py
│   │   └── zoo.py
│   ├── py.typed
│   ├── transforms/
│   │   ├── __init__.py
│   │   └── base.py
│   └── utils/
│       ├── __init__.py
│       ├── common_types.py
│       ├── data.py
│       ├── fonts.py
│       ├── geometry.py
│       ├── multithreading.py
│       ├── reconstitution.py
│       ├── repr.py
│       ├── visualization.py
│       └── vocabs.py
├── pyproject.toml
├── scripts/
│   ├── convert_to_float16.py
│   ├── evaluate.py
│   ├── latency.py
│   └── quantize.py
├── setup.py
└── tests/
    ├── common/
    │   ├── test_contrib.py
    │   ├── test_core.py
    │   ├── test_engine_cfg.py
    │   ├── test_headers.py
    │   ├── test_io.py
    │   ├── test_io_elements.py
    │   ├── test_models.py
    │   ├── test_models_builder.py
    │   ├── test_models_classification.py
    │   ├── test_models_detection.py
    │   ├── test_models_detection_utils.py
    │   ├── test_models_factory.py
    │   ├── test_models_preprocessor.py
    │   ├── test_models_recognition.py
    │   ├── test_models_recognition_utils.py
    │   ├── test_models_zoo.py
    │   ├── test_transforms.py
    │   ├── test_utils_data.py
    │   ├── test_utils_fonts.py
    │   ├── test_utils_geometry.py
    │   ├── test_utils_multithreading.py
    │   ├── test_utils_reconstitution.py
    │   ├── test_utils_visualization.py
    │   └── test_utils_vocabs.py
    └── conftest.py

Download .txt

SYMBOL INDEX (387 symbols across 76 files)

FILE: demo/app.py
  class spaces (line 10) | class spaces:  # noqa: N801
    method GPU (line 12) | def GPU(func):  # noqa: N802
  function load_predictor (line 59) | def load_predictor(
  function forward_image (line 119) | def forward_image(predictor: OCRPredictor, image: np.ndarray) -> np.ndar...
  function matplotlib_to_pil (line 138) | def matplotlib_to_pil(fig: Figure | np.ndarray) -> Image.Image:
  function analyze_page (line 159) | def analyze_page(

FILE: onnxtr/contrib/artefacts.py
  class ArtefactDetector (line 26) | class ArtefactDetector(_BasePredictor):
    method __init__ (line 48) | def __init__(
    method preprocess (line 65) | def preprocess(self, img: np.ndarray) -> np.ndarray:
    method postprocess (line 68) | def postprocess(self, output: list[np.ndarray], input_images: list[lis...
    method show (line 106) | def show(self, **kwargs: Any) -> None:

FILE: onnxtr/contrib/base.py
  class _BasePredictor (line 14) | class _BasePredictor:
    method __init__ (line 25) | def __init__(self, batch_size: int, url: str | None = None, model_path...
    method _init_model (line 32) | def _init_model(self, url: str | None = None, model_path: str | None =...
    method preprocess (line 49) | def preprocess(self, img: np.ndarray) -> np.ndarray:
    method postprocess (line 61) | def postprocess(self, output: list[np.ndarray], input_images: list[lis...
    method __call__ (line 74) | def __call__(self, inputs: list[np.ndarray]) -> Any:

FILE: onnxtr/file_utils.py
  function requires_package (line 15) | def requires_package(name: str, extra_message: str | None = None) -> Non...

FILE: onnxtr/io/elements.py
  class Element (line 32) | class Element(NestedObject):
    method __init__ (line 38) | def __init__(self, **kwargs: Any) -> None:
    method export (line 45) | def export(self) -> dict[str, Any]:
    method from_dict (line 54) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
    method render (line 57) | def render(self) -> str:
  class Word (line 61) | class Word(Element):
    method __init__ (line 76) | def __init__(
    method render (line 91) | def render(self) -> str:
    method extra_repr (line 95) | def extra_repr(self) -> str:
    method from_dict (line 99) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
  class Artefact (line 104) | class Artefact(Element):
    method __init__ (line 117) | def __init__(self, artefact_type: str, confidence: float, geometry: Bo...
    method render (line 123) | def render(self) -> str:
    method extra_repr (line 127) | def extra_repr(self) -> str:
    method from_dict (line 131) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
  class Line (line 136) | class Line(Element):
    method __init__ (line 150) | def __init__(
    method render (line 169) | def render(self) -> str:
    method from_dict (line 174) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
  class Block (line 182) | class Block(Element):
    method __init__ (line 198) | def __init__(
    method render (line 221) | def render(self, line_break: str = "\n") -> str:
    method from_dict (line 226) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
  class Page (line 235) | class Page(Element):
    method __init__ (line 251) | def __init__(
    method render (line 267) | def render(self, block_break: str = "\n\n") -> str:
    method extra_repr (line 271) | def extra_repr(self) -> str:
    method show (line 274) | def show(self, interactive: bool = True, preserve_aspect_ratio: bool =...
    method synthesize (line 289) | def synthesize(self, **kwargs) -> np.ndarray:
    method export_as_xml (line 300) | def export_as_xml(self, file_title: str = "OnnxTR - XML export (hOCR)"...
    method from_dict (line 405) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
  class Document (line 411) | class Document(Element):
    method __init__ (line 421) | def __init__(
    method render (line 427) | def render(self, page_break: str = "\n\n\n\n") -> str:
    method show (line 431) | def show(self, **kwargs) -> None:
    method synthesize (line 436) | def synthesize(self, **kwargs) -> list[np.ndarray]:
    method export_as_xml (line 447) | def export_as_xml(self, **kwargs) -> list[tuple[bytes, ET.ElementTree]]:
    method from_dict (line 459) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):

FILE: onnxtr/io/html.py
  function read_html (line 11) | def read_html(url: str, **kwargs: Any) -> bytes:

FILE: onnxtr/io/image.py
  function read_img_as_numpy (line 16) | def read_img_as_numpy(

FILE: onnxtr/io/pdf.py
  function read_pdf (line 16) | def read_pdf(

FILE: onnxtr/io/reader.py
  class DocumentFile (line 21) | class DocumentFile:
    method from_pdf (line 25) | def from_pdf(cls, file: AbstractFile, **kwargs) -> list[np.ndarray]:
    method from_url (line 41) | def from_url(cls, url: str, **kwargs) -> list[np.ndarray]:
    method from_images (line 63) | def from_images(cls, files: Sequence[AbstractFile] | AbstractFile, **k...

FILE: onnxtr/models/_utils.py
  function get_max_width_length_ratio (line 18) | def get_max_width_length_ratio(contour: np.ndarray) -> float:
  function estimate_orientation (line 33) | def estimate_orientation(
  function rectify_crops (line 154) | def rectify_crops(
  function rectify_loc_preds (line 173) | def rectify_loc_preds(
  function get_language (line 193) | def get_language(text: str) -> tuple[str, float]:

FILE: onnxtr/models/builder.py
  class DocumentBuilder (line 19) | class DocumentBuilder(NestedObject):
    method __init__ (line 30) | def __init__(
    method _sort_boxes (line 43) | def _sort_boxes(boxes: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    method _resolve_sub_lines (line 65) | def _resolve_sub_lines(self, boxes: np.ndarray, word_idcs: list[int]) ...
    method _resolve_lines (line 103) | def _resolve_lines(self, boxes: np.ndarray) -> list[list[int]]:
    method _resolve_blocks (line 149) | def _resolve_blocks(boxes: np.ndarray, lines: list[list[int]]) -> list...
    method _build_blocks (line 214) | def _build_blocks(
    method extra_repr (line 278) | def extra_repr(self) -> str:
    method __call__ (line 285) | def __call__(

FILE: onnxtr/models/classification/models/mobilenet.py
  class MobileNetV3 (line 41) | class MobileNetV3(Engine):
    method __init__ (line 51) | def __init__(
    method __call__ (line 62) | def __call__(
  function _mobilenet_v3 (line 69) | def _mobilenet_v3(
  function mobilenet_v3_small_crop_orientation (line 82) | def mobilenet_v3_small_crop_orientation(
  function mobilenet_v3_small_page_orientation (line 110) | def mobilenet_v3_small_page_orientation(

FILE: onnxtr/models/classification/predictor/base.py
  class OrientationPredictor (line 17) | class OrientationPredictor(NestedObject):
    method __init__ (line 29) | def __init__(
    method __call__ (line 37) | def __call__(

FILE: onnxtr/models/classification/zoo.py
  function _orientation_predictor (line 19) | def _orientation_predictor(
  function crop_orientation_predictor (line 52) | def crop_orientation_predictor(
  function page_orientation_predictor (line 88) | def page_orientation_predictor(

FILE: onnxtr/models/detection/_utils/base.py
  function _remove_padding (line 12) | def _remove_padding(

FILE: onnxtr/models/detection/core.py
  class DetectionPostProcessor (line 15) | class DetectionPostProcessor(NestedObject):
    method __init__ (line 24) | def __init__(self, box_thresh: float = 0.5, bin_thresh: float = 0.5, a...
    method extra_repr (line 30) | def extra_repr(self) -> str:
    method box_score (line 34) | def box_score(pred: np.ndarray, points: np.ndarray, assume_straight_pa...
    method bitmap_to_boxes (line 60) | def bitmap_to_boxes(
    method __call__ (line 67) | def __call__(

FILE: onnxtr/models/detection/models/differentiable_binarization.py
  class DBNet (line 42) | class DBNet(Engine):
    method __init__ (line 55) | def __init__(
    method __call__ (line 74) | def __call__(
  function _dbnet (line 93) | def _dbnet(
  function db_resnet34 (line 106) | def db_resnet34(
  function db_resnet50 (line 133) | def db_resnet50(
  function db_mobilenet_v3_large (line 160) | def db_mobilenet_v3_large(

FILE: onnxtr/models/detection/models/fast.py
  class FAST (line 40) | class FAST(Engine):
    method __init__ (line 53) | def __init__(
    method __call__ (line 72) | def __call__(
  function _fast (line 91) | def _fast(
  function fast_tiny (line 104) | def fast_tiny(
  function fast_small (line 131) | def fast_small(
  function fast_base (line 158) | def fast_base(

FILE: onnxtr/models/detection/models/linknet.py
  class LinkNet (line 42) | class LinkNet(Engine):
    method __init__ (line 55) | def __init__(
    method __call__ (line 74) | def __call__(
  function _linknet (line 93) | def _linknet(
  function linknet_resnet18 (line 106) | def linknet_resnet18(
  function linknet_resnet34 (line 133) | def linknet_resnet34(
  function linknet_resnet50 (line 160) | def linknet_resnet50(

FILE: onnxtr/models/detection/postprocessor/base.py
  class GeneralDetectionPostProcessor (line 20) | class GeneralDetectionPostProcessor(DetectionPostProcessor):
    method __init__ (line 29) | def __init__(
    method polygon_to_box (line 38) | def polygon_to_box(
    method bitmap_to_boxes (line 83) | def bitmap_to_boxes(

FILE: onnxtr/models/detection/predictor/base.py
  class DetectionPredictor (line 17) | class DetectionPredictor(NestedObject):
    method __init__ (line 27) | def __init__(
    method __call__ (line 35) | def __call__(

FILE: onnxtr/models/detection/zoo.py
  function _predictor (line 28) | def _predictor(
  function detection_predictor (line 60) | def detection_predictor(

FILE: onnxtr/models/engine.py
  class EngineConfig (line 33) | class EngineConfig:
    method __init__ (line 41) | def __init__(
    method _init_providers (line 51) | def _init_providers(self) -> list[tuple[str, dict[str, Any]]]:
    method _init_sess_opts (line 72) | def _init_sess_opts(self) -> SessionOptions:
    method providers (line 82) | def providers(self) -> list[tuple[str, dict[str, Any]]] | list[str]:
    method session_options (line 86) | def session_options(self) -> SessionOptions:
    method __repr__ (line 89) | def __repr__(self) -> str:
  class Engine (line 93) | class Engine:
    method __init__ (line 102) | def __init__(self, url: str, engine_cfg: EngineConfig | None = None, *...
    method run (line 120) | def run(self, inputs: np.ndarray) -> np.ndarray:

FILE: onnxtr/models/factory/hub.py
  function login_to_hub (line 37) | def login_to_hub() -> None:  # pragma: no cover
  function _save_model_and_config_for_hf_hub (line 56) | def _save_model_and_config_for_hf_hub(model: Any, save_dir: str, arch: s...
  function push_to_hf_hub (line 79) | def push_to_hf_hub(
  function from_hub (line 185) | def from_hub(repo_id: str, engine_cfg: EngineConfig | None = None, **kwa...

FILE: onnxtr/models/predictor/base.py
  class _OCRPredictor (line 24) | class _OCRPredictor:
    method __init__ (line 45) | def __init__(
    method _general_page_orientations (line 79) | def _general_page_orientations(
    method _get_orientations (line 92) | def _get_orientations(
    method _straighten_pages (line 102) | def _straighten_pages(
    method _generate_crops (line 127) | def _generate_crops(
    method _prepare_crops (line 147) | def _prepare_crops(
    method _rectify_crops (line 166) | def _rectify_crops(
    method _process_predictions (line 187) | def _process_predictions(
    method add_hook (line 204) | def add_hook(self, hook: Callable) -> None:
    method list_archs (line 212) | def list_archs(self) -> dict[str, list[str]]:

FILE: onnxtr/models/predictor/predictor.py
  class OCRPredictor (line 23) | class OCRPredictor(NestedObject, _OCRPredictor):
    method __init__ (line 44) | def __init__(
    method __call__ (line 72) | def __call__(

FILE: onnxtr/models/preprocessor/base.py
  class PreProcessor (line 19) | class PreProcessor(NestedObject):
    method __init__ (line 32) | def __init__(
    method batch_inputs (line 44) | def batch_inputs(self, samples: list[np.ndarray]) -> list[np.ndarray]:
    method sample_transforms (line 61) | def sample_transforms(self, x: np.ndarray) -> np.ndarray:
    method __call__ (line 77) | def __call__(self, x: np.ndarray | list[np.ndarray]) -> list[np.ndarray]:

FILE: onnxtr/models/recognition/core.py
  class RecognitionPostProcessor (line 12) | class RecognitionPostProcessor(NestedObject):
    method __init__ (line 19) | def __init__(
    method extra_repr (line 26) | def extra_repr(self) -> str:

FILE: onnxtr/models/recognition/models/crnn.py
  class CRNNPostProcessor (line 48) | class CRNNPostProcessor(RecognitionPostProcessor):
    method __init__ (line 55) | def __init__(self, vocab):
    method decode_sequence (line 58) | def decode_sequence(self, sequence, vocab):
    method ctc_best_path (line 61) | def ctc_best_path(
    method __call__ (line 89) | def __call__(self, logits):
  class CRNN (line 104) | class CRNN(Engine):
    method __init__ (line 117) | def __init__(
    method __call__ (line 132) | def __call__(
  function _crnn (line 149) | def _crnn(
  function crnn_vgg16_bn (line 168) | def crnn_vgg16_bn(
  function crnn_mobilenet_v3_small (line 195) | def crnn_mobilenet_v3_small(
  function crnn_mobilenet_v3_large (line 222) | def crnn_mobilenet_v3_large(

FILE: onnxtr/models/recognition/models/master.py
  class MASTER (line 32) | class MASTER(Engine):
    method __init__ (line 43) | def __init__(
    method __call__ (line 58) | def __call__(
  class MASTERPostProcessor (line 83) | class MASTERPostProcessor(RecognitionPostProcessor):
    method __init__ (line 90) | def __init__(
    method __call__ (line 97) | def __call__(self, logits: np.ndarray) -> list[tuple[str, float]]:
  function _master (line 112) | def _master(
  function master (line 131) | def master(

FILE: onnxtr/models/recognition/models/parseq.py
  class PARSeq (line 31) | class PARSeq(Engine):
    method __init__ (line 42) | def __init__(
    method __call__ (line 57) | def __call__(
  class PARSeqPostProcessor (line 72) | class PARSeqPostProcessor(RecognitionPostProcessor):
    method __init__ (line 79) | def __init__(
    method __call__ (line 86) | def __call__(self, logits):
  function _parseq (line 103) | def _parseq(
  function parseq (line 123) | def parseq(

FILE: onnxtr/models/recognition/models/sar.py
  class SAR (line 31) | class SAR(Engine):
    method __init__ (line 42) | def __init__(
    method __call__ (line 57) | def __call__(
  class SARPostProcessor (line 73) | class SARPostProcessor(RecognitionPostProcessor):
    method __init__ (line 80) | def __init__(
    method __call__ (line 87) | def __call__(self, logits):
  function _sar (line 102) | def _sar(
  function sar_resnet31 (line 122) | def sar_resnet31(

FILE: onnxtr/models/recognition/models/viptr.py
  class VIPTRPostProcessor (line 33) | class VIPTRPostProcessor(RecognitionPostProcessor):
    method __init__ (line 40) | def __init__(self, vocab):
    method decode_sequence (line 43) | def decode_sequence(self, sequence, vocab):
    method ctc_best_path (line 46) | def ctc_best_path(
    method __call__ (line 74) | def __call__(self, logits):
  class VIPTR (line 89) | class VIPTR(Engine):
    method __init__ (line 102) | def __init__(
    method __call__ (line 117) | def __call__(
  function _viptr (line 134) | def _viptr(
  function viptr_tiny (line 155) | def viptr_tiny(

FILE: onnxtr/models/recognition/models/vitstr.py
  class ViTSTR (line 39) | class ViTSTR(Engine):
    method __init__ (line 50) | def __init__(
    method __call__ (line 65) | def __call__(
  class ViTSTRPostProcessor (line 81) | class ViTSTRPostProcessor(RecognitionPostProcessor):
    method __init__ (line 88) | def __init__(
    method __call__ (line 95) | def __call__(self, logits):
  function _vitstr (line 112) | def _vitstr(
  function vitstr_small (line 132) | def vitstr_small(
  function vitstr_base (line 159) | def vitstr_base(

FILE: onnxtr/models/recognition/predictor/_utils.py
  function split_crops (line 16) | def split_crops(
  function _split_horizontally (line 73) | def _split_horizontally(
  function remap_preds (line 119) | def remap_preds(

FILE: onnxtr/models/recognition/predictor/base.py
  class RecognitionPredictor (line 19) | class RecognitionPredictor(NestedObject):
    method __init__ (line 28) | def __init__(
    method __call__ (line 42) | def __call__(

FILE: onnxtr/models/recognition/utils.py
  function merge_strings (line 12) | def merge_strings(a: str, b: str, overlap_ratio: float) -> str:
  function merge_multi_strings (line 69) | def merge_multi_strings(seq_list: list[str], overlap_ratio: float, last_...

FILE: onnxtr/models/recognition/zoo.py
  function _predictor (line 29) | def _predictor(
  function recognition_predictor (line 61) | def recognition_predictor(

FILE: onnxtr/models/zoo.py
  function _predictor (line 16) | def _predictor(
  function ocr_predictor (line 66) | def ocr_predictor(

FILE: onnxtr/transforms/base.py
  class Resize (line 15) | class Resize:
    method __init__ (line 25) | def __init__(
    method __call__ (line 41) | def __call__(self, img: np.ndarray) -> np.ndarray:
    method __repr__ (line 88) | def __repr__(self) -> str:
  class Normalize (line 96) | class Normalize:
    method __init__ (line 104) | def __init__(
    method __call__ (line 117) | def __call__(
    method __repr__ (line 124) | def __repr__(self) -> str:

FILE: onnxtr/utils/data.py
  function _urlretrieve (line 26) | def _urlretrieve(url: str, filename: Path | str, chunk_size: int = 1024)...
  function _check_integrity (line 37) | def _check_integrity(file_path: str | Path, hash_prefix: str) -> bool:
  function download_from_url (line 44) | def download_from_url(

FILE: onnxtr/utils/fonts.py
  function get_font (line 14) | def get_font(font_family: str | None = None, font_size: int = 13) -> Ima...

FILE: onnxtr/utils/geometry.py
  function bbox_to_polygon (line 33) | def bbox_to_polygon(bbox: BoundingBox) -> Polygon4P:
  function polygon_to_bbox (line 45) | def polygon_to_bbox(polygon: Polygon4P) -> BoundingBox:
  function order_points (line 58) | def order_points(pts: np.ndarray) -> np.ndarray:
  function detach_scores (line 108) | def detach_scores(boxes: list[np.ndarray]) -> tuple[list[np.ndarray], li...
  function shape_translate (line 128) | def shape_translate(data: np.ndarray, format: str) -> np.ndarray:
  function resolve_enclosing_bbox (line 167) | def resolve_enclosing_bbox(bboxes: list[BoundingBox] | np.ndarray) -> Bo...
  function resolve_enclosing_rbbox (line 189) | def resolve_enclosing_rbbox(rbboxes: list[np.ndarray], intermed_size: in...
  function rotate_abs_points (line 210) | def rotate_abs_points(points: np.ndarray, angle: float = 0.0) -> np.ndar...
  function compute_expanded_shape (line 227) | def compute_expanded_shape(img_shape: tuple[int, int], angle: float) -> ...
  function rotate_abs_geoms (line 248) | def rotate_abs_geoms(
  function remap_boxes (line 289) | def remap_boxes(loc_preds: np.ndarray, orig_shape: tuple[int, int], dest...
  function rotate_boxes (line 315) | def rotate_boxes(
  function rotate_image (line 372) | def rotate_image(
  function remove_image_padding (line 421) | def remove_image_padding(image: np.ndarray) -> np.ndarray:
  function estimate_page_angle (line 439) | def estimate_page_angle(polys: np.ndarray) -> float:
  function convert_to_relative_coords (line 457) | def convert_to_relative_coords(geoms: np.ndarray, img_shape: tuple[int, ...
  function extract_crops (line 482) | def extract_crops(img: np.ndarray, boxes: np.ndarray, channels_last: boo...
  function extract_rcrops (line 514) | def extract_rcrops(

FILE: onnxtr/utils/multithreading.py
  function multithread_exec (line 18) | def multithread_exec(func: Callable[[Any], Any], seq: Iterable[Any], thr...

FILE: onnxtr/utils/reconstitution.py
  function _warn_rotation (line 21) | def _warn_rotation(entry: dict[str, Any]) -> None:  # pragma: no cover
  function _synthesize (line 28) | def _synthesize(
  function synthesize_page (line 113) | def synthesize_page(

FILE: onnxtr/utils/repr.py
  function _addindent (line 12) | def _addindent(s_, num_spaces):
  class NestedObject (line 24) | class NestedObject:
    method extra_repr (line 29) | def extra_repr(self) -> str:
    method __repr__ (line 32) | def __repr__(self):

FILE: onnxtr/utils/visualization.py
  function rect_patch (line 20) | def rect_patch(
  function polygon_patch (line 69) | def polygon_patch(
  function create_obj_patch (line 112) | def create_obj_patch(
  function visualize_page (line 137) | def visualize_page(
  function draw_boxes (line 261) | def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: tuple[int, i...

FILE: scripts/convert_to_float16.py
  function _load_model (line 34) | def _load_model(arch: str, model_path: str | None = None) -> Any:
  function _latency_check (line 46) | def _latency_check(args: Any, size: tuple[int], model: Any, img_tensor: ...
  function _validate (line 64) | def _validate(fp32_in: list[np.ndarray], fp16_in: list[np.ndarray]) -> b...
  function main (line 75) | def main(args):

FILE: scripts/evaluate.py
  function _pct (line 27) | def _pct(val):
  function main (line 31) | def main(args):
  function parse_args (line 233) | def parse_args():

FILE: scripts/latency.py
  function main (line 17) | def main(args):

FILE: scripts/quantize.py
  class TaskShapes (line 15) | class TaskShapes(Enum):
  class CalibrationDataLoader (line 24) | class CalibrationDataLoader(CalibrationDataReader):
    method __init__ (line 25) | def __init__(self, calibration_image_folder: str, model_path: str, tas...
    method get_next (line 39) | def get_next(self):
    method rewind (line 46) | def rewind(self):
  function benchmark (line 50) | def benchmark(calibration_image_folder: str, model_path: str, task_shape...
  function benchmark_mean_diff (line 71) | def benchmark_mean_diff(
  function main (line 90) | def main(args):

FILE: tests/common/test_contrib.py
  function test_base_predictor (line 9) | def test_base_predictor():
  function test_artefact_detector (line 22) | def test_artefact_detector(mock_artefact_image_stream):

FILE: tests/common/test_core.py
  function test_version (line 7) | def test_version():
  function test_requires_package (line 11) | def test_requires_package():

FILE: tests/common/test_engine_cfg.py
  function _get_rss_mb (line 14) | def _get_rss_mb():
  function _test_predictor (line 20) | def _test_predictor(predictor):
  function test_engine_cfg (line 41) | def test_engine_cfg(det_arch, reco_arch):
  function test_cpu_memory_arena_shrinkage_enabled (line 86) | def test_cpu_memory_arena_shrinkage_enabled():

FILE: tests/common/test_headers.py
  function test_copyright_header (line 7) | def test_copyright_header():

FILE: tests/common/test_io.py
  function _check_doc_content (line 11) | def _check_doc_content(doc_tensors, num_pages):
  function test_read_pdf (line 18) | def test_read_pdf(mock_pdf):
  function test_read_img_as_numpy (line 39) | def test_read_img_as_numpy(tmpdir_factory, mock_pdf):
  function test_read_html (line 80) | def test_read_html():
  function test_document_file (line 86) | def test_document_file(mock_pdf, mock_artefact_image_stream):
  function test_pdf (line 94) | def test_pdf(mock_pdf):

FILE: tests/common/test_io_elements.py
  function _mock_words (line 9) | def _mock_words(size=(1.0, 1.0), offset=(0, 0), confidence=0.9, objectne...
  function _mock_artefacts (line 57) | def _mock_artefacts(size=(1, 1), offset=(0, 0), confidence=0.8):
  function _mock_lines (line 71) | def _mock_lines(size=(1, 1), offset=(0, 0), polygons=False):
  function _mock_blocks (line 81) | def _mock_blocks(size=(1, 1), offset=(0, 0), polygons=False):
  function _mock_pages (line 97) | def _mock_pages(block_size=(1, 1), block_offset=(0, 0), polygons=False):
  function test_element (line 118) | def test_element():
  function test_word (line 123) | def test_word():
  function test_line (line 165) | def test_line():
  function test_artefact (line 212) | def test_artefact():
  function test_block (line 233) | def test_block():
  function test_page (line 260) | def test_page():
  function test_document (line 309) | def test_document():

FILE: tests/common/test_models.py
  function mock_image (line 14) | def mock_image(tmpdir_factory):
  function mock_bitmap (line 25) | def mock_bitmap(mock_image):
  function test_estimate_orientation (line 31) | def test_estimate_orientation(mock_image, mock_bitmap, mock_tilted_paysl...
  function test_get_lang (line 80) | def test_get_lang():

FILE: tests/common/test_models_builder.py
  function test_documentbuilder (line 10) | def test_documentbuilder():
  function test_sort_boxes (line 108) | def test_sort_boxes(input_boxes, sorted_idxs):
  function test_resolve_lines (line 131) | def test_resolve_lines(input_boxes, lines):

FILE: tests/common/test_models_classification.py
  function test_classification_models (line 17) | def test_classification_models(arch_name, input_shape):
  function test_classification_zoo (line 34) | def test_classification_zoo(arch_name):
  function test_crop_orientation_model (line 65) | def test_crop_orientation_model(mock_text_box, quantized):
  function test_page_orientation_model (line 98) | def test_page_orientation_model(mock_payslip, quantized):

FILE: tests/common/test_models_detection.py
  function test_postprocessor (line 10) | def test_postprocessor():
  function test_detection_models (line 81) | def test_detection_models(arch_name, input_shape, output_size, out_prob,...
  function test_detection_zoo (line 117) | def test_detection_zoo(arch_name, quantized):

FILE: tests/common/test_models_detection_utils.py
  function test_remove_padding (line 11) | def test_remove_padding(pages, preserve_aspect_ratio, symmetric_pad, ass...

FILE: tests/common/test_models_factory.py
  function test_push_to_hf_hub (line 17) | def test_push_to_hf_hub():
  function test_models_huggingface_hub (line 30) | def test_models_huggingface_hub(tmpdir):

FILE: tests/common/test_models_preprocessor.py
  function test_preprocessor (line 16) | def test_preprocessor(batch_size, output_size, input_tensor, expected_ba...

FILE: tests/common/test_models_recognition.py
  function test_recognition_postprocessor (line 12) | def test_recognition_postprocessor():
  function test_split_crops (line 31) | def test_split_crops(crops, max_ratio, target_ratio, target_overlap_rati...
  function test_remap_preds (line 47) | def test_remap_preds(preds, crop_map, split_overlap_ratio, pred):
  function test_split_crops_cases (line 83) | def test_split_crops_cases(
  function test_invalid_split_overlap_ratio (line 124) | def test_invalid_split_overlap_ratio(split_overlap_ratio):
  function test_recognition_models (line 149) | def test_recognition_models(arch_name, input_shape, quantized):
  function test_recognition_zoo (line 213) | def test_recognition_zoo(arch_name, input_shape, quantized):

FILE: tests/common/test_models_recognition_utils.py
  function test_merge_strings (line 46) | def test_merge_strings(a, b, overlap_ratio, merged):
  function test_merge_multi_strings (line 62) | def test_merge_multi_strings(seq_list, overlap_ratio, last_overlap_ratio...

FILE: tests/common/test_models_zoo.py
  class _DummyCallback (line 22) | class _DummyCallback:
    method __call__ (line 23) | def __call__(self, loc_preds):
  function test_ocrpredictor (line 37) | def test_ocrpredictor(
  function test_trained_ocr_predictor (line 121) | def test_trained_ocr_predictor(mock_payslip):
  function _test_predictor (line 185) | def _test_predictor(predictor):
  function test_zoo_models (line 207) | def test_zoo_models(det_arch, reco_arch, quantized):

FILE: tests/common/test_transforms.py
  function test_resize (line 7) | def test_resize():
  function test_normalize (line 58) | def test_normalize(input_shape):

FILE: tests/common/test_utils_data.py
  function test__urlretrieve (line 11) | def test__urlretrieve():
  function test_download_from_url (line 24) | def test_download_from_url(mkdir_mock, urlretrieve_mock):
  function test_download_from_url_customizing_cache_dir (line 32) | def test_download_from_url_customizing_cache_dir(mkdir_mock, urlretrieve...
  function test_download_from_url_error_creating_directory (line 40) | def test_download_from_url_error_creating_directory(logging_mock, mkdir_...
  function test_download_from_url_error_creating_directory_with_env_var (line 52) | def test_download_from_url_error_creating_directory_with_env_var(logging...

FILE: tests/common/test_utils_fonts.py
  function test_get_font (line 6) | def test_get_font():

FILE: tests/common/test_utils_geometry.py
  function test_bbox_to_polygon (line 11) | def test_bbox_to_polygon():
  function test_polygon_to_bbox (line 15) | def test_polygon_to_bbox():
  function test_order_points (line 19) | def test_order_points():
  function test_detach_scores (line 60) | def test_detach_scores():
  function test_resolve_enclosing_bbox (line 81) | def test_resolve_enclosing_bbox():
  function test_resolve_enclosing_rbbox (line 87) | def test_resolve_enclosing_rbbox():
  function test_remap_boxes (line 97) | def test_remap_boxes():
  function test_rotate_boxes (line 141) | def test_rotate_boxes():
  function sample_geoms (line 164) | def sample_geoms():
  function test_rotate_abs_geoms (line 179) | def test_rotate_abs_geoms(sample_geoms):
  function test_rotate_image (line 188) | def test_rotate_image():
  function test_remove_image_padding (line 211) | def test_remove_image_padding():
  function test_convert_to_relative_coords (line 243) | def test_convert_to_relative_coords(abs_geoms, img_size, rel_geoms):
  function test_estimate_page_angle (line 251) | def test_estimate_page_angle():
  function test_extract_crops (line 266) | def test_extract_crops(mock_pdf):
  function test_extract_rcrops (line 315) | def test_extract_rcrops(mock_pdf, assume_horizontal):
  function test_shape_translate (line 363) | def test_shape_translate(format, input_shape, expected_shape):

FILE: tests/common/test_utils_multithreading.py
  function test_multithread_exec (line 22) | def test_multithread_exec(input_seq, func, output_seq):
  function test_multithread_exec_multiprocessing_disable (line 28) | def test_multithread_exec_multiprocessing_disable():

FILE: tests/common/test_utils_reconstitution.py
  function test_synthesize_page (line 7) | def test_synthesize_page():

FILE: tests/common/test_utils_visualization.py
  function test_visualize_page (line 8) | def test_visualize_page():
  function test_draw_boxes (line 35) | def test_draw_boxes():

FILE: tests/common/test_utils_vocabs.py
  function test_vocabs_duplicates (line 6) | def test_vocabs_duplicates():

FILE: tests/conftest.py
  function synthesize_text_img (line 13) | def synthesize_text_img(
  function mock_vocab (line 41) | def mock_vocab():
  function mock_pdf (line 46) | def mock_pdf(tmpdir_factory):
  function mock_payslip (line 65) | def mock_payslip(tmpdir_factory):
  function mock_tilted_payslip (line 76) | def mock_tilted_payslip(mock_payslip, tmpdir_factory):
  function mock_text_box_stream (line 85) | def mock_text_box_stream():
  function mock_text_box (line 91) | def mock_text_box(mock_text_box_stream, tmpdir_factory):
  function mock_artefact_image_stream (line 100) | def mock_artefact_image_stream():

Download .json

Condensed preview — 126 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (567K chars).

[
  {
    "path": ".conda/meta.yaml",
    "chars": 1223,
    "preview": "{% set pyproject = load_file_data('../pyproject.toml', from_recipe_dir=True) %}\n{% set project = pyproject.get('project'"
  },
  {
    "path": ".github/CODEOWNERS",
    "chars": 24,
    "preview": "*       @felixdittrich92"
  },
  {
    "path": ".github/FUNDING.yml",
    "chars": 859,
    "preview": "# These are supported funding model platforms\n\ngithub: felixdittrich92\npatreon: # Replace with a single Patreon username"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.yml",
    "chars": 1730,
    "preview": "name: 🐛 Bug report\ndescription: Create a report to help us improve the library\nlabels: 'type: bug'\n\nbody:\n- type: markdo"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 203,
    "preview": "blank_issues_enabled: true\ncontact_links:\n  - name: Usage questions\n    url: https://github.com/felixdittrich92/OnnxTR/d"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.yml",
    "chars": 696,
    "preview": "name: 🚀 Feature request\ndescription: >\n  Submit a proposal/request for a new feature for OnnxTR. Please search for exist"
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 481,
    "preview": "version: 2\nupdates:\n  - package-ecosystem: \"pip\"\n    directory: \"/\"\n    open-pull-requests-limit: 10\n    target-branch: "
  },
  {
    "path": ".github/release.yml",
    "chars": 483,
    "preview": "changelog:\n  exclude:\n    labels:\n      - ignore-for-release\n  categories:\n    - title: Breaking Changes 🛠\n      labels:"
  },
  {
    "path": ".github/workflows/builds.yml",
    "chars": 2016,
    "preview": "name: builds\n\non:\n  push:\n    branches: main\n  pull_request:\n    branches: main\n  schedule:\n    # Runs every 2 weeks on "
  },
  {
    "path": ".github/workflows/clear_caches.yml",
    "chars": 301,
    "preview": "name: Clear GitHub runner caches\n\non:\n  workflow_dispatch:\n  schedule:\n    - cron: '0 0 * * *'  # Runs once a day\n\njobs:"
  },
  {
    "path": ".github/workflows/demo.yml",
    "chars": 2472,
    "preview": "name: Sync Hugging Face demo\n\non:\n  # Run 'test-demo' on every pull request to the main branch\n  pull_request:\n    branc"
  },
  {
    "path": ".github/workflows/docker.yml",
    "chars": 4301,
    "preview": "# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages\n#"
  },
  {
    "path": ".github/workflows/main.yml",
    "chars": 1756,
    "preview": "name: tests\n\non:\n  push:\n    branches: main\n  pull_request:\n    branches: main\n  schedule:\n    # Runs every 2 weeks on M"
  },
  {
    "path": ".github/workflows/publish.yml",
    "chars": 3688,
    "preview": "name: publish\n\non:\n  release:\n    types: [published]\n\njobs:\n  pypi:\n    if: \"!github.event.release.prerelease\"\n    strat"
  },
  {
    "path": ".github/workflows/style.yml",
    "chars": 1314,
    "preview": "name: style\n\non:\n  push:\n    branches: main\n  pull_request:\n    branches: main\n\njobs:\n  ruff:\n    runs-on: ${{ matrix.os"
  },
  {
    "path": ".gitignore",
    "chars": 1958,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 611,
    "preview": "repos:\n  - repo: https://github.com/pre-commit/pre-commit-hooks\n    rev: v6.0.0\n    hooks:\n      - id: check-ast\n      -"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "chars": 5220,
    "preview": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nWe as members, contributors, and leaders pledge to make participa"
  },
  {
    "path": "Dockerfile",
    "chars": 1304,
    "preview": "ARG BASE_IMAGE\n\nFROM ${BASE_IMAGE}\n\nENV DEBIAN_FRONTEND=noninteractive\nENV LANG=C.UTF-8\nENV PYTHONUNBUFFERED=1\nENV PYTHO"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "Makefile",
    "chars": 534,
    "preview": ".PHONY: quality style test  docs-single-version docs\n# this target runs checks on all files\nquality:\n\truff check .\n\tmypy"
  },
  {
    "path": "README.md",
    "chars": 18954,
    "preview": "<p align=\"center\">\n  <img src=\"https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/logo.jpg\" width=\"40%\">\n</p"
  },
  {
    "path": "demo/README.md",
    "chars": 340,
    "preview": "---\ntitle: OnnxTR OCR\nemoji: 🔥\ncolorFrom: red\ncolorTo: purple\nsdk: gradio\nsdk_version: 5.34.2\napp_file: app.py\npinned: f"
  },
  {
    "path": "demo/app.py",
    "chars": 11378,
    "preview": "import io\nimport os\nfrom typing import Any\n\n# NOTE: This is a fix to run the demo on the HuggingFace Zero GPU or CPU spa"
  },
  {
    "path": "demo/packages.txt",
    "chars": 34,
    "preview": "python3-opencv\nfonts-freefont-ttf\n"
  },
  {
    "path": "demo/requirements.txt",
    "chars": 227,
    "preview": "-e \"onnxtr[gpu-headless,viz] @ git+https://github.com/felixdittrich92/OnnxTR.git\"\ngradio>=5.30.0,<7.0.0\nspaces>=0.37.0\n\n"
  },
  {
    "path": "onnxtr/__init__.py",
    "chars": 100,
    "preview": "from . import io, models, contrib, transforms, utils\nfrom .version import __version__  # noqa: F401\n"
  },
  {
    "path": "onnxtr/contrib/__init__.py",
    "chars": 39,
    "preview": "from .artefacts import ArtefactDetector"
  },
  {
    "path": "onnxtr/contrib/artefacts.py",
    "chars": 5323,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/contrib/base.py",
    "chars": 3135,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/file_utils.py",
    "chars": 1045,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/io/__init__.py",
    "chars": 106,
    "preview": "from .elements import *\nfrom .html import *\nfrom .image import *\nfrom .pdf import *\nfrom .reader import *\n"
  },
  {
    "path": "onnxtr/io/elements.py",
    "chars": 17751,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/io/html.py",
    "chars": 716,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/io/image.py",
    "chars": 1700,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/io/pdf.py",
    "chars": 1327,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/io/reader.py",
    "chars": 2755,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/__init__.py",
    "chars": 157,
    "preview": "from .engine import EngineConfig\nfrom .classification import *\nfrom .detection import *\nfrom .recognition import *\nfrom "
  },
  {
    "path": "onnxtr/models/_utils.py",
    "chars": 7550,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/builder.py",
    "chars": 13998,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/classification/__init__.py",
    "chars": 41,
    "preview": "from .models import *\nfrom .zoo import *\n"
  },
  {
    "path": "onnxtr/models/classification/models/__init__.py",
    "chars": 24,
    "preview": "from .mobilenet import *"
  },
  {
    "path": "onnxtr/models/classification/models/mobilenet.py",
    "chars": 4818,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/classification/predictor/__init__.py",
    "chars": 20,
    "preview": "from .base import *\n"
  },
  {
    "path": "onnxtr/models/classification/predictor/base.py",
    "chars": 2313,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/classification/zoo.py",
    "chars": 4271,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/detection/__init__.py",
    "chars": 41,
    "preview": "from .models import *\nfrom .zoo import *\n"
  },
  {
    "path": "onnxtr/models/detection/_utils/__init__.py",
    "chars": 20,
    "preview": "from . base import *"
  },
  {
    "path": "onnxtr/models/detection/_utils/base.py",
    "chars": 2300,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/detection/core.py",
    "chars": 3466,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/detection/models/__init__.py",
    "chars": 85,
    "preview": "from .fast import *\nfrom .differentiable_binarization import *\nfrom .linknet import *"
  },
  {
    "path": "onnxtr/models/detection/models/differentiable_binarization.py",
    "chars": 6630,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/detection/models/fast.py",
    "chars": 6188,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/detection/models/linknet.py",
    "chars": 6666,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/detection/postprocessor/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "onnxtr/models/detection/postprocessor/base.py",
    "chars": 5689,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/detection/predictor/__init__.py",
    "chars": 20,
    "preview": "from .base import *\n"
  },
  {
    "path": "onnxtr/models/detection/predictor/base.py",
    "chars": 2293,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/detection/zoo.py",
    "chars": 3385,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/engine.py",
    "chars": 5880,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/factory/__init__.py",
    "chars": 19,
    "preview": "from .hub import *\n"
  },
  {
    "path": "onnxtr/models/factory/hub.py",
    "chars": 7119,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/predictor/__init__.py",
    "chars": 25,
    "preview": "from .predictor import *\n"
  },
  {
    "path": "onnxtr/models/predictor/base.py",
    "chars": 9432,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/predictor/predictor.py",
    "chars": 6351,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/preprocessor/__init__.py",
    "chars": 20,
    "preview": "from .base import *\n"
  },
  {
    "path": "onnxtr/models/preprocessor/base.py",
    "chars": 3963,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/__init__.py",
    "chars": 41,
    "preview": "from .models import *\nfrom .zoo import *\n"
  },
  {
    "path": "onnxtr/models/recognition/core.py",
    "chars": 730,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/models/__init__.py",
    "chars": 126,
    "preview": "from .crnn import *\nfrom .sar import *\nfrom .master import *\nfrom .vitstr import *\nfrom .parseq import *\nfrom .viptr imp"
  },
  {
    "path": "onnxtr/models/recognition/models/crnn.py",
    "chars": 8779,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/models/master.py",
    "chars": 4669,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/models/parseq.py",
    "chars": 4512,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/models/sar.py",
    "chars": 4523,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/models/viptr.py",
    "chars": 5656,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/models/vitstr.py",
    "chars": 5964,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/predictor/__init__.py",
    "chars": 20,
    "preview": "from .base import *\n"
  },
  {
    "path": "onnxtr/models/recognition/predictor/_utils.py",
    "chars": 5182,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/predictor/base.py",
    "chars": 2503,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/utils.py",
    "chars": 3756,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/recognition/zoo.py",
    "chars": 3018,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/models/zoo.py",
    "chars": 5303,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/py.typed",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "onnxtr/transforms/__init__.py",
    "chars": 20,
    "preview": "from .base import *\n"
  },
  {
    "path": "onnxtr/transforms/base.py",
    "chars": 4120,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/utils/__init__.py",
    "chars": 94,
    "preview": "from .common_types import *\nfrom .data import *\nfrom .geometry import *\nfrom .vocabs import *\n"
  },
  {
    "path": "onnxtr/utils/common_types.py",
    "chars": 551,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/utils/data.py",
    "chars": 4248,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/utils/fonts.py",
    "chars": 1282,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/utils/geometry.py",
    "chars": 22002,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/utils/multithreading.py",
    "chars": 1994,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/utils/reconstitution.py",
    "chars": 6119,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/utils/repr.py",
    "chars": 2105,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/utils/visualization.py",
    "chars": 9831,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "onnxtr/utils/vocabs.py",
    "chars": 52988,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "pyproject.toml",
    "chars": 4731,
    "preview": "[build-system]\nrequires = [\"setuptools\", \"wheel\"]\nbuild-backend = \"setuptools.build_meta\"\n\n[project]\nname = \"onnxtr\"\ndes"
  },
  {
    "path": "scripts/convert_to_float16.py",
    "chars": 4423,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "scripts/evaluate.py",
    "chars": 11418,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "scripts/latency.py",
    "chars": 2103,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "scripts/quantize.py",
    "chars": 7313,
    "preview": "import argparse\nimport os\nimport time\nfrom enum import Enum\n\nimport numpy as np\nimport onnxruntime\nfrom onnxruntime.quan"
  },
  {
    "path": "setup.py",
    "chars": 682,
    "preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
  },
  {
    "path": "tests/common/test_contrib.py",
    "chars": 1707,
    "preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.contrib import artefacts\nfrom onnxtr.contrib.base import _BasePredictor\nfr"
  },
  {
    "path": "tests/common/test_core.py",
    "chars": 327,
    "preview": "import pytest\n\nimport onnxtr\nfrom onnxtr.file_utils import requires_package\n\n\ndef test_version():\n    assert len(onnxtr."
  },
  {
    "path": "tests/common/test_engine_cfg.py",
    "chars": 4812,
    "preview": "import gc\n\nimport numpy as np\nimport psutil\nimport pytest\nfrom onnxruntime import RunOptions, SessionOptions\n\nfrom onnxt"
  },
  {
    "path": "tests/common/test_headers.py",
    "chars": 1018,
    "preview": "\"\"\"Test for python files copyright headers.\"\"\"\n\nfrom datetime import datetime\nfrom pathlib import Path\n\n\ndef test_copyri"
  },
  {
    "path": "tests/common/test_io.py",
    "chars": 2730,
    "preview": "from io import BytesIO\nfrom pathlib import Path\n\nimport numpy as np\nimport pytest\nimport requests\n\nfrom onnxtr import io"
  },
  {
    "path": "tests/common/test_io_elements.py",
    "chars": 11232,
    "preview": "from xml.etree.ElementTree import ElementTree\n\nimport numpy as np\nimport pytest\n\nfrom onnxtr.io import elements\n\n\ndef _m"
  },
  {
    "path": "tests/common/test_models.py",
    "chars": 3136,
    "preview": "from io import BytesIO\n\nimport cv2\nimport numpy as np\nimport pytest\nimport requests\n\nfrom onnxtr.io import reader\nfrom o"
  },
  {
    "path": "tests/common/test_models_builder.py",
    "chars": 5267,
    "preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.io import Document\nfrom onnxtr.models import builder\n\nwords_per_page = 10\n"
  },
  {
    "path": "tests/common/test_models_classification.py",
    "chars": 5209,
    "preview": "import cv2\nimport numpy as np\nimport pytest\n\nfrom onnxtr.models import classification, detection\nfrom onnxtr.models.clas"
  },
  {
    "path": "tests/common/test_models_detection.py",
    "chars": 5308,
    "preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.models import detection\nfrom onnxtr.models.detection.postprocessor.base im"
  },
  {
    "path": "tests/common/test_models_detection_utils.py",
    "chars": 2068,
    "preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.models.detection._utils import _remove_padding\n\n\n@pytest.mark.parametrize("
  },
  {
    "path": "tests/common/test_models_factory.py",
    "chars": 2074,
    "preview": "import json\nimport os\nimport tempfile\n\nimport pytest\n\nfrom onnxtr import models\nfrom onnxtr.models.factory import _save_"
  },
  {
    "path": "tests/common/test_models_preprocessor.py",
    "chars": 1667,
    "preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.models.preprocessor import PreProcessor\n\n\n@pytest.mark.parametrize(\n    \"b"
  },
  {
    "path": "tests/common/test_models_recognition.py",
    "chars": 8293,
    "preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.models import recognition\nfrom onnxtr.models.engine import Engine\nfrom onn"
  },
  {
    "path": "tests/common/test_models_recognition_utils.py",
    "chars": 2805,
    "preview": "import pytest\n\nfrom onnxtr.models.recognition.utils import merge_multi_strings, merge_strings\n\n\n@pytest.mark.parametrize"
  },
  {
    "path": "tests/common/test_models_zoo.py",
    "chars": 8091,
    "preview": "import numpy as np\nimport pytest\n\nfrom onnxtr import models\nfrom onnxtr.io import Document, DocumentFile\nfrom onnxtr.mod"
  },
  {
    "path": "tests/common/test_transforms.py",
    "chars": 2048,
    "preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.transforms import Normalize, Resize\n\n\ndef test_resize():\n    output_size ="
  },
  {
    "path": "tests/common/test_utils_data.py",
    "chars": 2179,
    "preview": "import os\nimport tempfile\nfrom pathlib import PosixPath\nfrom unittest.mock import patch\n\nimport pytest\n\nfrom onnxtr.util"
  },
  {
    "path": "tests/common/test_utils_fonts.py",
    "chars": 235,
    "preview": "from PIL.ImageFont import FreeTypeFont, ImageFont\n\nfrom onnxtr.utils.fonts import get_font\n\n\ndef test_get_font():\n    # "
  },
  {
    "path": "tests/common/test_utils_geometry.py",
    "chars": 13055,
    "preview": "from copy import deepcopy\nfrom math import hypot\n\nimport numpy as np\nimport pytest\n\nfrom onnxtr.io import DocumentFile\nf"
  },
  {
    "path": "tests/common/test_utils_multithreading.py",
    "chars": 970,
    "preview": "import os\nfrom multiprocessing.pool import ThreadPool\nfrom unittest.mock import patch\n\nimport pytest\n\nfrom onnxtr.utils."
  },
  {
    "path": "tests/common/test_utils_reconstitution.py",
    "chars": 1333,
    "preview": "import numpy as np\nfrom test_io_elements import _mock_pages\n\nfrom onnxtr.utils import reconstitution\n\n\ndef test_synthesi"
  },
  {
    "path": "tests/common/test_utils_visualization.py",
    "chars": 1566,
    "preview": "import numpy as np\nimport pytest\nfrom test_io_elements import _mock_pages\n\nfrom onnxtr.utils import visualization\n\n\ndef "
  },
  {
    "path": "tests/common/test_utils_vocabs.py",
    "chars": 341,
    "preview": "from collections import Counter\n\nfrom onnxtr.utils import VOCABS\n\n\ndef test_vocabs_duplicates():\n    for key, vocab in V"
  },
  {
    "path": "tests/conftest.py",
    "chars": 3463,
    "preview": "from io import BytesIO\n\nimport cv2\nimport pytest\nimport requests\nfrom PIL import Image, ImageDraw\n\nfrom onnxtr.io import"
  }
]

About this extraction

This page contains the full source code of the felixdittrich92/OnnxTR GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 126 files (480.5 KB), approximately 180.5k tokens, and a symbol index with 387 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo