Repository: felixdittrich92/OnnxTR
Branch: main
Commit: b10318c76097
Files: 126
Total size: 480.5 KB
Directory structure:
gitextract_7yglu2_f/
├── .conda/
│ └── meta.yaml
├── .github/
│ ├── CODEOWNERS
│ ├── FUNDING.yml
│ ├── ISSUE_TEMPLATE/
│ │ ├── bug_report.yml
│ │ ├── config.yml
│ │ └── feature_request.yml
│ ├── dependabot.yml
│ ├── release.yml
│ └── workflows/
│ ├── builds.yml
│ ├── clear_caches.yml
│ ├── demo.yml
│ ├── docker.yml
│ ├── main.yml
│ ├── publish.yml
│ └── style.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CODE_OF_CONDUCT.md
├── Dockerfile
├── LICENSE
├── Makefile
├── README.md
├── demo/
│ ├── README.md
│ ├── app.py
│ ├── packages.txt
│ └── requirements.txt
├── onnxtr/
│ ├── __init__.py
│ ├── contrib/
│ │ ├── __init__.py
│ │ ├── artefacts.py
│ │ └── base.py
│ ├── file_utils.py
│ ├── io/
│ │ ├── __init__.py
│ │ ├── elements.py
│ │ ├── html.py
│ │ ├── image.py
│ │ ├── pdf.py
│ │ └── reader.py
│ ├── models/
│ │ ├── __init__.py
│ │ ├── _utils.py
│ │ ├── builder.py
│ │ ├── classification/
│ │ │ ├── __init__.py
│ │ │ ├── models/
│ │ │ │ ├── __init__.py
│ │ │ │ └── mobilenet.py
│ │ │ ├── predictor/
│ │ │ │ ├── __init__.py
│ │ │ │ └── base.py
│ │ │ └── zoo.py
│ │ ├── detection/
│ │ │ ├── __init__.py
│ │ │ ├── _utils/
│ │ │ │ ├── __init__.py
│ │ │ │ └── base.py
│ │ │ ├── core.py
│ │ │ ├── models/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── differentiable_binarization.py
│ │ │ │ ├── fast.py
│ │ │ │ └── linknet.py
│ │ │ ├── postprocessor/
│ │ │ │ ├── __init__.py
│ │ │ │ └── base.py
│ │ │ ├── predictor/
│ │ │ │ ├── __init__.py
│ │ │ │ └── base.py
│ │ │ └── zoo.py
│ │ ├── engine.py
│ │ ├── factory/
│ │ │ ├── __init__.py
│ │ │ └── hub.py
│ │ ├── predictor/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ └── predictor.py
│ │ ├── preprocessor/
│ │ │ ├── __init__.py
│ │ │ └── base.py
│ │ ├── recognition/
│ │ │ ├── __init__.py
│ │ │ ├── core.py
│ │ │ ├── models/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── crnn.py
│ │ │ │ ├── master.py
│ │ │ │ ├── parseq.py
│ │ │ │ ├── sar.py
│ │ │ │ ├── viptr.py
│ │ │ │ └── vitstr.py
│ │ │ ├── predictor/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── _utils.py
│ │ │ │ └── base.py
│ │ │ ├── utils.py
│ │ │ └── zoo.py
│ │ └── zoo.py
│ ├── py.typed
│ ├── transforms/
│ │ ├── __init__.py
│ │ └── base.py
│ └── utils/
│ ├── __init__.py
│ ├── common_types.py
│ ├── data.py
│ ├── fonts.py
│ ├── geometry.py
│ ├── multithreading.py
│ ├── reconstitution.py
│ ├── repr.py
│ ├── visualization.py
│ └── vocabs.py
├── pyproject.toml
├── scripts/
│ ├── convert_to_float16.py
│ ├── evaluate.py
│ ├── latency.py
│ └── quantize.py
├── setup.py
└── tests/
├── common/
│ ├── test_contrib.py
│ ├── test_core.py
│ ├── test_engine_cfg.py
│ ├── test_headers.py
│ ├── test_io.py
│ ├── test_io_elements.py
│ ├── test_models.py
│ ├── test_models_builder.py
│ ├── test_models_classification.py
│ ├── test_models_detection.py
│ ├── test_models_detection_utils.py
│ ├── test_models_factory.py
│ ├── test_models_preprocessor.py
│ ├── test_models_recognition.py
│ ├── test_models_recognition_utils.py
│ ├── test_models_zoo.py
│ ├── test_transforms.py
│ ├── test_utils_data.py
│ ├── test_utils_fonts.py
│ ├── test_utils_geometry.py
│ ├── test_utils_multithreading.py
│ ├── test_utils_reconstitution.py
│ ├── test_utils_visualization.py
│ └── test_utils_vocabs.py
└── conftest.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .conda/meta.yaml
================================================
{% set pyproject = load_file_data('../pyproject.toml', from_recipe_dir=True) %}
{% set project = pyproject.get('project') %}
{% set urls = pyproject.get('project', {}).get('urls') %}
{% set version = environ.get('BUILD_VERSION', '0.8.2a0') %}
package:
name: onnxtr
version: {{ version }}
source:
fn: onnxtr-{{ version }}.tar.gz
url: ../dist/onnxtr-{{ version }}.tar.gz
build:
script: python setup.py install --single-version-externally-managed --record=record.txt
requirements:
host:
- python>=3.10, <3.12
- setuptools
run:
- numpy >=1.16.0, <3.0.0
- scipy >=1.4.0, <2.0.0
- pillow >=9.2.0
- opencv >=4.5.0, <5.0.0
- pypdfium2-team::pypdfium2_helpers >=4.11.0, <5.0.0
- pyclipper >=1.2.0, <2.0.0
- langdetect >=1.0.9, <2.0.0
- rapidfuzz >=3.0.0, <4.0.0
- huggingface_hub >=0.20.0, <1.0.0
- defusedxml >=0.7.0
- anyascii >=0.3.2
- tqdm >=4.30.0
test:
requires:
- pip
- onnxruntime
imports:
- onnxtr
about:
home: {{ urls.get('repository') }}
license: Apache-2.0
license_file: {{ project.get('license', {}).get('file') }}
summary: {{ project.get('description') | replace(":", " -")}}
dev_url: {{ urls.get('repository') }}
================================================
FILE: .github/CODEOWNERS
================================================
* @felixdittrich92
================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms
github: felixdittrich92
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
polar: # Replace with a single Polar username
buy_me_a_coffee: # Replace with a single Buy Me a Coffee username
thanks_dev: # Replace with a single thanks.dev username
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.yml
================================================
name: 🐛 Bug report
description: Create a report to help us improve the library
labels: 'type: bug'
body:
- type: markdown
attributes:
value: >
#### Before reporting a bug, please check that the issue hasn't already been addressed in [the existing and past issues](https://github.com/felixdittrich92/onnxtr/issues).
- type: textarea
attributes:
label: Bug description
description: |
A clear and concise description of what the bug is.
Please explain the result you observed and the behavior you were expecting.
placeholder: |
A clear and concise description of what the bug is.
validations:
required: true
- type: textarea
attributes:
label: Code snippet to reproduce the bug
description: |
Sample code to reproduce the problem.
Please wrap your code snippet with ```` ```triple backtick blocks``` ```` for readability.
placeholder: |
```python
Sample code to reproduce the problem
```
validations:
required: true
- type: textarea
attributes:
label: Error traceback
description: |
The error message you received running the code snippet, with the full traceback.
Please wrap your error message with ```` ```triple backtick blocks``` ```` for readability.
placeholder: |
```
The error message you got, with the full traceback.
```
validations:
required: true
- type: textarea
attributes:
label: Environment
description: |
Please describe your environment:
OS:
Python version:
Library version:
Onnxruntime version:
validations:
required: true
- type: markdown
attributes:
value: >
Thanks for helping us improve the library!
================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: true
contact_links:
- name: Usage questions
url: https://github.com/felixdittrich92/OnnxTR/discussions
about: Ask questions and discuss with other OnnxTR community members
================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.yml
================================================
name: 🚀 Feature request
description: >
Submit a proposal/request for a new feature for OnnxTR. Please search for existing issues before creating a new one.
For non-onnx related features please use the [main repository](https://github.com/mindee/doctr/issues).
labels: 'type: enhancement'
body:
- type: textarea
attributes:
label: 🚀 The feature
description: >
A clear and concise description of the feature proposal
validations:
required: true
- type: textarea
attributes:
label: Additional context
description: >
Add any other context or screenshots about the feature request.
- type: markdown
attributes:
value: >
Thanks for contributing 🎉
================================================
FILE: .github/dependabot.yml
================================================
version: 2
updates:
- package-ecosystem: "pip"
directory: "/"
open-pull-requests-limit: 10
target-branch: "main"
labels: ["topic: build"]
schedule:
interval: weekly
day: sunday
- package-ecosystem: "github-actions"
directory: "/"
open-pull-requests-limit: 10
target-branch: "main"
labels: ["topic: CI/CD"]
schedule:
interval: weekly
day: sunday
groups:
github-actions:
patterns:
- "*"
================================================
FILE: .github/release.yml
================================================
changelog:
exclude:
labels:
- ignore-for-release
categories:
- title: Breaking Changes 🛠
labels:
- "type: breaking change"
# NEW FEATURES
- title: New Features
labels:
- "type: new feature"
# BUG FIXES
- title: Bug Fixes
labels:
- "type: bug"
# IMPROVEMENTS
- title: Improvements
labels:
- "type: enhancement"
# MISC
- title: Miscellaneous
labels:
- "type: misc"
================================================
FILE: .github/workflows/builds.yml
================================================
name: builds
on:
push:
branches: main
pull_request:
branches: main
schedule:
# Runs every Monday at 03:00 UTC
- cron: '0 3 * * 1'
jobs:
build:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
python: ["3.10", "3.11", "3.12", "3.13"]
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
# MacOS issue ref.: https://github.com/actions/setup-python/issues/855 & https://github.com/actions/setup-python/issues/865
python-version: ${{ matrix.os == 'macos-latest' && matrix.python == '3.10' && '3.11' || matrix.python }}
architecture: x64
- name: Cache python modules
uses: actions/cache@v5
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pkg-deps-${{ matrix.python }}-${{ hashFiles('pyproject.toml') }}
- name: Install package
run: |
python -m pip install --upgrade pip
pip install -e .[cpu-headless,viz] --upgrade
- name: Import package
run: python -c "import onnxtr; print(onnxtr.__version__)"
conda:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: conda-incubator/setup-miniconda@v4
with:
auto-update-conda: true
python-version: "3.10"
channels: pypdfium2-team,bblanchon,defaults,conda-forge
channel-priority: strict
- name: Install dependencies
shell: bash -el {0}
run: conda install -y conda-build conda-verify anaconda-client
- name: Install libEGL
run: sudo apt-get update && sudo apt-get install -y libegl1
- name: Build and verify
shell: bash -el {0}
run: |
python setup.py sdist
mkdir conda-dist
conda build .conda/ --output-folder conda-dist
conda-verify conda-dist/linux-64/*conda --ignore=C1115
================================================
FILE: .github/workflows/clear_caches.yml
================================================
name: Clear GitHub runner caches
on:
workflow_dispatch:
schedule:
- cron: '0 0 * * *' # Runs once a day
jobs:
clear:
name: Clear caches
runs-on: ubuntu-latest
steps:
- uses: MyAlbum/purge-cache@v2
with:
max-age: 172800 # Caches older than 2 days are deleted
================================================
FILE: .github/workflows/demo.yml
================================================
name: Sync Hugging Face demo
on:
# Run 'test-demo' on every pull request to the main branch
pull_request:
branches: [main]
# Run 'sync-to-hub' on push when tagging (e.g., 'v*') and on a scheduled cron job
push:
tags:
- 'v*'
schedule:
- cron: '0 2 10 * *' # At 02:00 on day-of-month 10 (every month)
# Allow manual triggering of the workflow
workflow_dispatch:
jobs:
# This job runs on every pull request to main
test-demo:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]
python: ["3.10"]
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python }}
architecture: x64
- name: Cache python modules
uses: actions/cache@v5
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pkg-deps-${{ matrix.python }}-${{ hashFiles('requirements.txt') }}-${{ hashFiles('demo/requirements.txt') }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r demo/requirements.txt --upgrade
- name: Start Gradio demo
run: |
nohup python demo/app.py &
sleep 10 # Allow some time for the Gradio server to start
- name: Check demo build
run: |
curl --fail http://127.0.0.1:7860/ || exit 1
# This job only runs when a new version tag is pushed or during the cron job
sync-to-hub:
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
needs: test-demo
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: "3.10"
- name: Install huggingface_hub
run: pip install huggingface-hub
- name: Upload folder to Hugging Face
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
python -c "
from huggingface_hub import HfApi
api = HfApi(token='${{ secrets.HF_TOKEN }}')
repo_id = 'Felix92/OnnxTR-OCR'
api.upload_folder(repo_id=repo_id, repo_type='space', folder_path='demo/')
api.restart_space(repo_id=repo_id, factory_reboot=True)
"
================================================
FILE: .github/workflows/docker.yml
================================================
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
#
name: Docker image on ghcr.io
on:
push:
tags:
- 'v*'
pull_request:
branches: main
schedule:
- cron: '0 2 1 6 *' # At 02:00 on day-of-month 1 in June (i.e. once a year)
env:
REGISTRY: ghcr.io
jobs:
build-and-push-image:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
image:
- "ubuntu:24.04" # Base image for CPU variants
- "nvidia/cuda:12.6.2-base-ubuntu24.04" # Base image for GPU
variant:
- "cpu-headless" # CPU variant 1
- "openvino-headless" # CPU variant 2
- "gpu-headless" # GPU variant
python: [3.10.13]
# Exclude invalid combinations
exclude:
- image: "nvidia/cuda:12.6.2-base-ubuntu24.04"
variant: "cpu-headless"
- image: "nvidia/cuda:12.6.2-base-ubuntu24.04"
variant: "openvino-headless"
- image: "ubuntu:24.04"
variant: "gpu-headless"
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Log in to the Container registry
uses: docker/login-action@v4
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Sanitize docker tag
run: |
# Start with the base prefix
PREFIX_DOCKER_TAG="OnnxTR-${{ matrix.variant }}-py${{ matrix.python }}"
# Replace any commas with hyphens (if needed)
PREFIX_DOCKER_TAG=$(echo "$PREFIX_DOCKER_TAG" | sed 's/,/-/g')
# Determine suffix based on image
IMAGE="${{ matrix.image }}"
case "$IMAGE" in
"nvidia/cuda:"*)
SUFFIX=$(echo "$IMAGE" | sed -E 's|.*/cuda:([0-9]+\.[0-9]+\.[0-9]+)-base-(ubuntu[0-9]+\.[0-9]+)|-\2-cuda\1|')
;;
"ubuntu:"*)
SUFFIX=$(echo "$IMAGE" | sed -E 's|ubuntu:([0-9]+\.[0-9]+)|-ubuntu\1|')
;;
*)
SUFFIX=""
;;
esac
# Combine the prefix, suffix, and ensure ending hyphen
PREFIX_DOCKER_TAG="${PREFIX_DOCKER_TAG}${SUFFIX}-"
# Export to environment
echo "PREFIX_DOCKER_TAG=${PREFIX_DOCKER_TAG}" >> $GITHUB_ENV
# Debugging output
echo "Final Docker Tag: $PREFIX_DOCKER_TAG"
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v6
with:
images: ${{ env.REGISTRY }}/${{ github.repository }}
tags: |
# used only on schedule event
type=schedule,pattern={{date 'YYYY-MM'}},prefix=${{ env.PREFIX_DOCKER_TAG }}
# used only if a tag following semver is published
type=semver,pattern={{raw}},prefix=${{ env.PREFIX_DOCKER_TAG }}
- name: Build Docker image
id: build
uses: docker/build-push-action@v7
with:
context: .
build-args: |
BASE_IMAGE=${{ matrix.image }}
SYSTEM=${{ matrix.variant }}
PYTHON_VERSION=${{ matrix.python }}
ONNXTR_REPO=${{ github.repository }}
ONNXTR_VERSION=${{ github.sha }}
push: false # push only if `import onnxtr` works
tags: ${{ steps.meta.outputs.tags }}
- name: Check if `import onnxtr` works
run: docker run ${{ steps.build.outputs.imageid }} python3 -c 'import onnxtr; print(onnxtr.__version__)'
- name: Push Docker image
if: ${{ (github.ref == 'refs/heads/main' && github.event_name != 'pull_request') || (startsWith(github.ref, 'refs/tags') && github.event_name == 'push') }}
uses: docker/build-push-action@v7
with:
context: .
build-args: |
BASE_IMAGE=${{ matrix.image }}
SYSTEM=${{ matrix.variant }}
PYTHON_VERSION=${{ matrix.python }}
ONNXTR_REPO=${{ github.repository }}
ONNXTR_VERSION=${{ github.sha }}
push: true
tags: ${{ steps.meta.outputs.tags }}
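The sed pipeline in the "Sanitize docker tag" step above is compact but easy to misread. As an illustrative sketch (the function names here are hypothetical, not part of the repo), the same prefix construction can be expressed in Python:

```python
import re


def tag_suffix(image: str) -> str:
    """Mirror the sed rules mapping a base image to a tag suffix."""
    if image.startswith("nvidia/cuda:"):
        # nvidia/cuda:12.6.2-base-ubuntu24.04 -> -ubuntu24.04-cuda12.6.2
        return re.sub(
            r".*/cuda:([0-9]+\.[0-9]+\.[0-9]+)-base-(ubuntu[0-9]+\.[0-9]+)",
            r"-\2-cuda\1",
            image,
        )
    if image.startswith("ubuntu:"):
        # ubuntu:24.04 -> -ubuntu24.04
        return re.sub(r"ubuntu:([0-9]+\.[0-9]+)", r"-ubuntu\1", image)
    return ""


def prefix_docker_tag(variant: str, python: str, image: str) -> str:
    """Build the PREFIX_DOCKER_TAG value, ending with a hyphen."""
    prefix = f"OnnxTR-{variant}-py{python}".replace(",", "-")
    return f"{prefix}{tag_suffix(image)}-"


print(prefix_docker_tag("gpu-headless", "3.10.13", "nvidia/cuda:12.6.2-base-ubuntu24.04"))
# -> OnnxTR-gpu-headless-py3.10.13-ubuntu24.04-cuda12.6.2-
```

The metadata-action then prepends this value to the semver or date tag, so each published image encodes variant, Python version, and base image in its name.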
================================================
FILE: .github/workflows/main.yml
================================================
name: tests
on:
push:
branches: main
pull_request:
branches: main
schedule:
# Runs every Monday at 03:00 UTC
- cron: '0 3 * * 1'
jobs:
pytest-common:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python: ["3.10", "3.11", "3.12"]
backend: ["cpu-headless", "openvino-headless"]
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python }}
architecture: x64
- name: Cache python modules
uses: actions/cache@v5
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pkg-deps-${{ matrix.python }}-${{ hashFiles('pyproject.toml') }}-tests
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -e .[${{ matrix.backend }},viz,html,testing] --upgrade
- name: Run unittests
run: |
coverage run -m pytest tests/common/ -rs --memray
coverage xml -o coverage-common-${{ matrix.backend }}-${{ matrix.python }}.xml
- uses: actions/upload-artifact@v7
with:
name: coverage-common-${{ matrix.backend }}-${{ matrix.python }}
path: ./coverage-common-${{ matrix.backend }}-${{ matrix.python }}.xml
if-no-files-found: error
codecov-upload:
runs-on: ubuntu-latest
needs: [ pytest-common ]
steps:
- uses: actions/checkout@v6
- uses: actions/download-artifact@v8
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v6
with:
flags: unittests
fail_ci_if_error: true
token: ${{ secrets.CODECOV_TOKEN }}
================================================
FILE: .github/workflows/publish.yml
================================================
name: publish
on:
release:
types: [published]
jobs:
pypi:
if: "!github.event.release.prerelease"
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]
python: ["3.10"]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python }}
architecture: x64
- name: Cache python modules
uses: actions/cache@v5
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pkg-deps-${{ matrix.python }}-${{ hashFiles('pyproject.toml') }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install setuptools wheel twine --upgrade
- name: Get release tag
id: release_tag
run: echo "VERSION=${GITHUB_REF/refs\/tags\//}" >> $GITHUB_ENV
- name: Build and publish
env:
TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
VERSION: ${{ env.VERSION }}
run: |
BUILD_VERSION=$VERSION python setup.py sdist bdist_wheel
twine check dist/*
twine upload dist/*
pypi-check:
needs: pypi
if: "!github.event.release.prerelease"
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]
python: ["3.10"]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python }}
architecture: x64
- name: Install package
run: |
python -m pip install --upgrade pip
pip install onnxtr[cpu] --upgrade
python -c "from importlib.metadata import version; print(version('onnxtr'))"
conda:
if: "!github.event.release.prerelease"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: conda-incubator/setup-miniconda@v4
with:
auto-update-conda: true
python-version: "3.10"
channels: pypdfium2-team,bblanchon,defaults,conda-forge
channel-priority: strict
- name: Install dependencies
shell: bash -el {0}
run: conda install -y conda-build conda-verify anaconda-client
- name: Install libEGL
run: sudo apt-get update && sudo apt-get install -y libegl1
- name: Get release tag
id: release_tag
run: echo "VERSION=${GITHUB_REF/refs\/tags\//}" >> $GITHUB_ENV
- name: Build and publish
shell: bash -el {0}
env:
ANACONDA_API_TOKEN: ${{ secrets.ANACONDA_TOKEN }}
VERSION: ${{ env.VERSION }}
run: |
echo "BUILD_VERSION=${VERSION}" >> $GITHUB_ENV
python setup.py sdist
mkdir conda-dist
conda build .conda/ --output-folder conda-dist
conda-verify conda-dist/linux-64/*conda --ignore=C1115
anaconda upload conda-dist/linux-64/*conda
conda-check:
if: "!github.event.release.prerelease"
runs-on: ubuntu-latest
needs: conda
steps:
- uses: conda-incubator/setup-miniconda@v4
with:
auto-update-conda: true
python-version: "3.10"
- name: Install package
shell: bash -el {0}
run: |
conda config --set channel_priority strict
conda install -c conda-forge onnxruntime
conda install -c felix92 -c pypdfium2-team -c bblanchon -c defaults -c conda-forge onnxtr
python -c "from importlib.metadata import version; print(version('onnxtr'))"
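The "Get release tag" steps in the publish workflow rely on bash pattern substitution to turn `GITHUB_REF` into a bare version tag. A minimal sketch of that substitution, runnable outside CI (the tag value below is hypothetical):

```shell
# Simulate the "Get release tag" step locally.
# In CI, GITHUB_REF is set by GitHub Actions for the pushed tag.
GITHUB_REF="refs/tags/v0.8.2"
# Bash pattern substitution: strip the leading "refs/tags/" prefix.
VERSION="${GITHUB_REF/refs\/tags\//}"
echo "$VERSION"
# -> v0.8.2
```

Note that `${var/pattern/replacement}` is a bash feature, not POSIX sh; the workflow steps run under bash, where it behaves as shown.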
================================================
FILE: .github/workflows/style.yml
================================================
name: style
on:
push:
branches: main
pull_request:
branches: main
jobs:
ruff:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python: ["3.10"]
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python }}
architecture: x64
- name: Run ruff
run: |
pip install ruff --upgrade
ruff --version
ruff check --diff .
mypy:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python: ["3.10"]
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python }}
architecture: x64
- name: Cache python modules
uses: actions/cache@v5
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pkg-deps-${{ matrix.python }}-${{ hashFiles('pyproject.toml') }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -e .[dev] --upgrade
pip install mypy --upgrade
- name: Run mypy
run: |
mypy --version
mypy
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# Temp files
onnxtr/version.py
logs/
wandb/
.idea/
# Model files
*.onnx
.qodo
# Profile files
yappi_profile.stats
memray_profile.bin
memray_flamegraph.html
================================================
FILE: .pre-commit-config.yaml
================================================
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
- id: check-ast
- id: check-yaml
exclude: .conda
- id: check-toml
- id: check-json
- id: check-added-large-files
exclude: docs/images/
- id: end-of-file-fixer
- id: trailing-whitespace
- id: debug-statements
- id: check-merge-conflict
- id: no-commit-to-branch
args: ['--branch', 'main']
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.0
hooks:
- id: ruff
args: [ --fix ]
- id: ruff-format
================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
contact@mindee.com.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.
================================================
FILE: Dockerfile
================================================
ARG BASE_IMAGE
FROM ${BASE_IMAGE}
ENV DEBIAN_FRONTEND=noninteractive
ENV LANG=C.UTF-8
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ARG SYSTEM
ARG PYTHON_VERSION
RUN apt-get update && apt-get install -y --no-install-recommends \
# - Other packages
build-essential \
pkg-config \
curl \
wget \
software-properties-common \
unzip \
git \
# - Packages to build Python
tar make gcc zlib1g-dev libffi-dev libssl-dev liblzma-dev libbz2-dev libsqlite3-dev \
# - Packages for OnnxTR
libgl1-mesa-dev libsm6 libxext6 libxrender-dev libpangocairo-1.0-0 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install Python
RUN wget http://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz && \
tar -zxf Python-$PYTHON_VERSION.tgz && \
cd Python-$PYTHON_VERSION && \
mkdir /opt/python/ && \
./configure --prefix=/opt/python && \
make && \
make install && \
cd .. && \
rm Python-$PYTHON_VERSION.tgz && \
rm -r Python-$PYTHON_VERSION
ENV PATH=/opt/python/bin:$PATH
# Install OnnxTR
ARG ONNXTR_REPO='felixdittrich92/onnxtr'
ARG ONNXTR_VERSION=main
RUN pip3 install -U pip setuptools wheel && \
pip3 install "onnxtr[$SYSTEM,html]@git+https://github.com/$ONNXTR_REPO.git@$ONNXTR_VERSION"
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: Makefile
================================================
.PHONY: quality style test docs-single-version docs
# this target runs checks on all files
quality:
ruff check .
mypy onnxtr/
# this target runs checks on all files and potentially modifies some of them
style:
ruff format .
ruff check --fix .
# Run tests for the library
test:
coverage run -m pytest tests/common/ -rs --memray
coverage report --fail-under=80 --show-missing
# Check that docs can build
docs-single-version:
sphinx-build docs/source docs/_build -a
# Check that docs can build
docs:
cd docs && bash build.sh
================================================
FILE: README.md
================================================
<p align="center">
<img src="https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/logo.jpg" width="40%">
</p>
[](LICENSE)

[](https://codecov.io/gh/felixdittrich92/OnnxTR)
[](https://app.codacy.com/gh/felixdittrich92/OnnxTR/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
[](https://www.codefactor.io/repository/github/felixdittrich92/onnxtr)
[](https://socket.dev/pypi/package/onnxtr/overview/0.8.1/tar-gz)
[](https://pypi.org/project/OnnxTR/)
[](https://github.com/felixdittrich92/OnnxTR/pkgs/container/onnxtr)
[](https://huggingface.co/spaces/Felix92/OnnxTR-OCR)

> :warning: Please note that this is a wrapper around the [doctr](https://github.com/mindee/doctr) library to provide an Onnx pipeline for docTR. For feature requests that are not directly related to the Onnx pipeline, please refer to the base project.
**Optical Character Recognition made seamless & accessible to anyone, powered by Onnx**
What you can expect from this repository:
- efficient ways to parse textual information (localize and identify each word) from your documents
- an Onnx pipeline for docTR, a wrapper around the [doctr](https://github.com/mindee/doctr) library - no PyTorch or TensorFlow dependencies
- more lightweight package with faster inference latency and less required resources
- 8-Bit quantized models for faster inference on CPU

## Installation
### Prerequisites
Python 3.10 (or higher) and [pip](https://pip.pypa.io/en/stable/) are required to install OnnxTR.
### Latest release
You can then install the latest release of the package using [pypi](https://pypi.org/project/OnnxTR/) as follows:
**NOTE:**
Currently supported execution providers by default are: CPU, CUDA (NVIDIA GPU), OpenVINO (Intel CPU | GPU), CoreML (Apple Silicon).
For GPU support please take a look at: [ONNX Runtime](https://onnxruntime.ai/getting-started).
- **Prerequisites:** CUDA & cuDNN need to be installed beforehand; see the [version table](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html) for compatible versions.
```shell
# standard cpu support
pip install "onnxtr[cpu]"
pip install "onnxtr[cpu-headless]" # same as cpu but with opencv-headless
# with gpu support
pip install "onnxtr[gpu]"
pip install "onnxtr[gpu-headless]" # same as gpu but with opencv-headless
# OpenVINO cpu | gpu support for Intel CPUs | GPUs
pip install "onnxtr[openvino]"
pip install "onnxtr[openvino-headless]" # same as openvino but with opencv-headless
# with HTML support
pip install "onnxtr[html]"
# with support for visualization
pip install "onnxtr[viz]"
# with support for all dependencies
pip install "onnxtr[html, gpu, viz]"
```
**Recommendation:**
If you have:
- a NVIDIA GPU, use one of the `gpu` variants
- an Intel CPU or GPU, use one of the `openvino` variants
- an Apple Silicon Mac, use one of the `cpu` variants (CoreML is auto-detected)
- otherwise, use one of the `cpu` variants
**OpenVINO:**
By default, OnnxTR with the OpenVINO execution provider backend uses the `CPU` device with `FP32` precision. To change the device or for further configuration, please refer to the [ONNX Runtime OpenVINO documentation](https://onnxruntime.ai/docs/execution-providers/OpenVINO-ExecutionProvider.html#summary-of-options).
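For illustration, switching the OpenVINO device comes down to passing provider options. This is only a sketch: the `device_type` and `precision` keys follow the ONNX Runtime OpenVINO documentation linked above and may vary between onnxruntime versions, and the chosen values are assumptions about your hardware. The resulting list would be passed as `EngineConfig(providers=providers)`.

```python
# Sketch: provider list selecting the OpenVINO execution provider on an
# Intel GPU with FP16 precision, falling back to the plain CPU provider
# for any node OpenVINO cannot run. The option values are examples only.
providers = [
    (
        "OpenVINOExecutionProvider",
        {"device_type": "GPU", "precision": "FP16"},
    ),
    ("CPUExecutionProvider", {}),  # fallback provider
]
print(providers[0][0])  # OpenVINOExecutionProvider
```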
### Reading files
Documents can be interpreted from PDF / Images / Webpages / Multiple page images using the following code snippet:
```python
from onnxtr.io import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage (requires `weasyprint` to be installed)
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
```
### Putting it together
Let's use the default `ocr_predictor` model for an example:
```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor, EngineConfig
model = ocr_predictor(
det_arch="fast_base", # detection architecture
reco_arch="vitstr_base", # recognition architecture
det_bs=2, # detection batch size
reco_bs=512, # recognition batch size
# Document related parameters
assume_straight_pages=True, # set to `False` if the pages are not straight (rotation, perspective, etc.) (default: True)
straighten_pages=False, # set to `True` if the pages should be straightened before final processing (default: False)
export_as_straight_boxes=False, # set to `True` if the boxes should be exported as if the pages were straight (default: False)
# Preprocessing related parameters
preserve_aspect_ratio=True, # set to `False` if the aspect ratio should not be preserved (default: True)
symmetric_pad=True, # set to `False` to disable symmetric padding (default: True)
# Additional parameters - meta information
detect_orientation=False, # set to `True` if the orientation of the pages should be detected (default: False)
detect_language=False, # set to `True` if the language of the pages should be detected (default: False)
# Orientation specific parameters in combination with `assume_straight_pages=False` and/or `straighten_pages=True`
disable_crop_orientation=False, # set to `True` if the crop orientation classification should be disabled (default: False)
disable_page_orientation=False, # set to `True` if the general page orientation classification should be disabled (default: False)
# DocumentBuilder specific parameters
resolve_lines=True, # whether words should be automatically grouped into lines (default: True)
resolve_blocks=False, # whether lines should be automatically grouped into blocks (default: False)
paragraph_break=0.035, # relative length of the minimum space separating paragraphs (default: 0.035)
# OnnxTR specific parameters
# NOTE: 8-Bit quantized models are not available for FAST detection models and can in general lead to poorer accuracy
load_in_8_bit=False, # set to `True` to load 8-bit quantized models instead of the full precision ones (default: False)
# Advanced engine configuration options
det_engine_cfg=EngineConfig(), # detection model engine configuration (default: internal predefined configuration)
reco_engine_cfg=EngineConfig(), # recognition model engine configuration (default: internal predefined configuration)
clf_engine_cfg=EngineConfig(), # classification (orientation) model engine configuration (default: internal predefined configuration)
)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
# Display the result (requires matplotlib & mplcursors to be installed)
result.show()
```

Or even rebuild the original document from its predictions:
```python
import matplotlib.pyplot as plt
synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0])
plt.axis("off")
plt.show()
```

The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
To get a better understanding of the document model, check out the [documentation](https://mindee.github.io/doctr/modules/io.html#document-structure).
You can also export the result as a nested dict (more appropriate for JSON), render it as human-readable text, or export it as XML (hOCR format):
```python
json_output = result.export() # nested dict
text_output = result.render() # human-readable text
xml_output = result.export_as_xml() # hocr format
for output in xml_output:
xml_bytes_string = output[0]
xml_element = output[1]
```
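The nested export can be traversed with plain dict access. The sketch below collects all recognized word values; the `sample` dict is a hand-made stand-in (not real OnnxTR output) whose keys mirror the Page/Block/Line/Word nesting described above.

```python
def extract_words(export_dict):
    """Collect all word values from an exported document dict."""
    words = []
    for page in export_dict.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    words.append(word["value"])
    return words

# Hypothetical minimal export following the nested document model
sample = {
    "pages": [
        {"blocks": [{"lines": [{"words": [{"value": "Hello"}, {"value": "world"}]}]}]}
    ]
}
print(extract_words(sample))  # ['Hello', 'world']
```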
<details>
<summary>Advanced engine configuration options</summary>
You can also define advanced engine configurations for the models / predictors:
```python
from onnxruntime import SessionOptions
from onnxtr.models import ocr_predictor, EngineConfig
general_options = (
SessionOptions()
) # For configuration options see: https://onnxruntime.ai/docs/api/python/api_summary.html#sessionoptions
general_options.enable_cpu_mem_arena = False
# NOTE: The following forces execution on the GPU; if no GPU is available, an error is raised
# List of strings e.g. ["CUDAExecutionProvider", "CPUExecutionProvider"] or a list of tuples with the provider and its options e.g.
# [("CUDAExecutionProvider", {"device_id": 0}), ("CPUExecutionProvider", {"arena_extend_strategy": "kSameAsRequested"})]
providers = [
("CUDAExecutionProvider", {"device_id": 0, "cudnn_conv_algo_search": "DEFAULT"})
] # For available providers see: https://onnxruntime.ai/docs/execution-providers/
engine_config = EngineConfig(session_options=general_options, providers=providers)
# We use the default predictor with the custom engine configuration
# NOTE: You can define different engine configurations for detection, recognition and classification depending on your needs
predictor = ocr_predictor(det_engine_cfg=engine_config, reco_engine_cfg=engine_config, clf_engine_cfg=engine_config)
```
You can also dynamically configure whether the memory arena should shrink:
```python
from random import random
from onnxruntime import RunOptions, SessionOptions
from onnxtr.models import ocr_predictor, EngineConfig
def arena_shrinkage_handler(run_options: RunOptions) -> RunOptions:
"""
Shrink the memory arena on 10% of inference runs.
"""
if random() < 0.1:
run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", "cpu:0")
return run_options
engine_config = EngineConfig(run_options_provider=arena_shrinkage_handler)
engine_config.session_options.enable_mem_pattern = False
predictor = ocr_predictor(det_engine_cfg=engine_config, reco_engine_cfg=engine_config, clf_engine_cfg=engine_config)
```
</details>
## Loading custom exported models
You can also load custom models exported from docTR:
For exporting please take a look at the [doctr documentation](https://mindee.github.io/doctr/using_doctr/using_model_export.html#export-to-onnx).
```python
from onnxtr.models import ocr_predictor, linknet_resnet18, parseq
reco_model = parseq("path_to_custom_model.onnx", vocab="ABC")
det_model = linknet_resnet18("path_to_custom_model.onnx")
model = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
```
## Loading models from HuggingFace Hub
You can also load models from the HuggingFace Hub:
```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor, from_hub
img = DocumentFile.from_images(["<image_path>"])
# Load your model from the hub
model = from_hub("onnxtr/my-model")
# Pass it to the predictor
# If your model is a recognition model:
predictor = ocr_predictor(det_arch="db_mobilenet_v3_large", reco_arch=model)
# If your model is a detection model:
predictor = ocr_predictor(det_arch=model, reco_arch="crnn_mobilenet_v3_small")
# Get your predictions
res = predictor(img)
```
HF Hub search: [here](https://huggingface.co/models?search=onnxtr).
Collection: [here](https://huggingface.co/collections/Felix92/onnxtr-66bf213a9f88f7346c90e842)
Or push your own models to the hub:
```python
from onnxtr.models import linknet_resnet18, parseq, push_to_hf_hub, login_to_hub
from onnxtr.utils.vocabs import VOCABS
# Login to the hub
login_to_hub()
# Recognition model
model = parseq("~/onnxtr-parseq-multilingual-v1.onnx", vocab=VOCABS["multilingual"])
push_to_hf_hub(
model,
model_name="onnxtr-parseq-multilingual-v1",
task="recognition", # The task for which the model is intended [detection, recognition, classification]
arch="parseq", # The name of the model architecture
override=False, # Set to `True` if you want to override an existing model / repository
)
# Detection model
model = linknet_resnet18("~/onnxtr-linknet-resnet18.onnx")
push_to_hf_hub(model, model_name="onnxtr-linknet-resnet18", task="detection", arch="linknet_resnet18", override=True)
```
## Models architectures
Credit where it's due: this repository provides ONNX models for the following architectures, converted from the docTR models:
### Text Detection
- DBNet: [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf).
- LinkNet: [LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation](https://arxiv.org/pdf/1707.03718.pdf)
- FAST: [FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation](https://arxiv.org/pdf/2111.02394.pdf)
### Text Recognition
- CRNN: [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf).
- SAR: [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/pdf/1811.00751.pdf).
- MASTER: [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/pdf/1910.02562.pdf).
- ViTSTR: [Vision Transformer for Fast and Efficient Scene Text Recognition](https://arxiv.org/pdf/2105.08582.pdf).
- PARSeq: [Scene Text Recognition with Permuted Autoregressive Sequence Models](https://arxiv.org/pdf/2207.06966).
- VIPTR: [A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition](https://arxiv.org/abs/2401.10110).
```python
predictor = ocr_predictor()
predictor.list_archs()
{
"detection archs": [
"db_resnet34",
"db_resnet50",
"db_mobilenet_v3_large",
"linknet_resnet18",
"linknet_resnet34",
"linknet_resnet50",
"fast_tiny", # No 8-bit support
"fast_small", # No 8-bit support
"fast_base", # No 8-bit support
],
"recognition archs": [
"crnn_vgg16_bn",
"crnn_mobilenet_v3_small",
"crnn_mobilenet_v3_large",
"sar_resnet31",
"master",
"vitstr_small",
"vitstr_base",
"parseq",
"viptr_tiny", # No 8-bit support
],
}
```
### Documentation
This repository is kept in sync with the [doctr](https://github.com/mindee/doctr) library, which provides a high-level API to perform OCR on documents, and stays up to date with the latest features and improvements from the base project.
Refer to the [doctr documentation](https://mindee.github.io/doctr/) for more detailed information.
NOTE:
- `pretrained` is always the default in OnnxTR and is not exposed as a parameter.
- docTR-specific environment variables need the `ONNXTR_` prefix instead (e.g. `DOCTR_CACHE_DIR` -> `ONNXTR_CACHE_DIR`).
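For example, the model cache location can be redirected via the prefixed variable (the path below is a hypothetical example, not a required location):

```shell
# Redirect the OnnxTR model cache (equivalent of docTR's DOCTR_CACHE_DIR)
export ONNXTR_CACHE_DIR="$HOME/.cache/onnxtr"
echo "$ONNXTR_CACHE_DIR"
```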
### Benchmarks
The CPU benchmarks were measured on an `i7-14700K Intel CPU`.
The GPU benchmarks were measured on an `RTX 4080 Nvidia GPU`.
Benchmarking was performed on the FUNSD and CORD datasets.
docTR / OnnxTR models used for the benchmarks are `fast_base` (full precision) | `db_resnet50` (8-bit variant) for detection and `crnn_vgg16_bn` for recognition.
For comparison, the smallest combination in OnnxTR (docTR) of `db_mobilenet_v3_large` and `crnn_mobilenet_v3_small` takes `~0.17s / Page` on the FUNSD dataset and `~0.12s / Page` on the CORD dataset in **full precision** on CPU.
- CPU benchmarks:
|Library |FUNSD (199 pages) |CORD (900 pages) |
|------------------------------------|-------------------------------|-------------------------------|
|docTR (CPU) - v0.8.1 | ~1.29s / Page | ~0.60s / Page |
|**OnnxTR (CPU)** - v0.6.0 | ~0.57s / Page | **~0.25s / Page** |
|**OnnxTR (CPU) 8-bit** - v0.6.0 | **~0.38s / Page** | **~0.14s / Page** |
|**OnnxTR (CPU-OpenVINO)** - v0.6.0 | **~0.15s / Page** | **~0.14s / Page** |
|EasyOCR (CPU) - v1.7.1 | ~1.96s / Page | ~1.75s / Page |
|**PyTesseract (CPU)** - v0.3.10 | **~0.50s / Page** | ~0.52s / Page |
|Surya (line) (CPU) - v0.4.4 | ~48.76s / Page | ~35.49s / Page |
|PaddleOCR (CPU) - no cls - v2.7.3 | ~1.27s / Page | ~0.38s / Page |
- GPU benchmarks:
|Library |FUNSD (199 pages) |CORD (900 pages) |
|-------------------------------------|-------------------------------|-------------------------------|
|docTR (GPU) - v0.8.1 | ~0.07s / Page | ~0.05s / Page |
|**docTR (GPU) float16** - v0.8.1 | **~0.06s / Page** | **~0.03s / Page** |
|OnnxTR (GPU) - v0.6.0 | **~0.06s / Page** | ~0.04s / Page |
|**OnnxTR (GPU) float16 - v0.6.0** | **~0.05s / Page** | **~0.03s / Page** |
|EasyOCR (GPU) - v1.7.1 | ~0.31s / Page | ~0.19s / Page |
|Surya (GPU) float16 - v0.4.4 | ~3.70s / Page | ~2.81s / Page |
|**PaddleOCR (GPU) - no cls - v2.7.3**| ~0.08s / Page | **~0.03s / Page** |
## Citation
If you wish to cite, please refer to the base project citation, or feel free to use the following [BibTeX](http://www.bibtex.org/) references:
```bibtex
@misc{doctr2021,
title={docTR: Document Text Recognition},
author={Mindee},
year={2021},
publisher = {GitHub},
howpublished = {\url{https://github.com/mindee/doctr}}
}
```
```bibtex
@misc{onnxtr2024,
title={OnnxTR: Optical Character Recognition made seamless & accessible to anyone, powered by Onnx},
author={Felix Dittrich},
year={2024},
publisher = {GitHub},
howpublished = {\url{https://github.com/felixdittrich92/OnnxTR}}
}
```
## License
Distributed under the Apache 2.0 License. See [`LICENSE`](https://github.com/felixdittrich92/OnnxTR?tab=Apache-2.0-1-ov-file#readme) for more information.
================================================
FILE: demo/README.md
================================================
---
title: OnnxTR OCR
emoji: 🔥
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
license: apache-2.0
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
## Run the demo locally
```bash
cd demo
pip install -r requirements.txt
python3 app.py
```
================================================
FILE: demo/app.py
================================================
import io
import os
from typing import Any
# NOTE: This is a fix to run the demo on the HuggingFace Zero GPU or CPU spaces
if os.environ.get("SPACES_ZERO_GPU") is not None:
import spaces
else:
class spaces: # noqa: N801
@staticmethod
def GPU(func): # noqa: N802
def wrapper(*args, **kwargs):
return func(*args, **kwargs)
return wrapper
import cv2
import gradio as gr
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.figure import Figure
from PIL import Image
from onnxtr.io import DocumentFile
from onnxtr.models import EngineConfig, from_hub, ocr_predictor
from onnxtr.models.predictor import OCRPredictor
from onnxtr.utils.visualization import visualize_page
DET_ARCHS: list[str] = [
"fast_base",
"fast_small",
"fast_tiny",
"db_resnet50",
"db_resnet34",
"db_mobilenet_v3_large",
"linknet_resnet18",
"linknet_resnet34",
"linknet_resnet50",
]
RECO_ARCHS: list[str] = [
"crnn_vgg16_bn",
"crnn_mobilenet_v3_small",
"crnn_mobilenet_v3_large",
"master",
"sar_resnet31",
"vitstr_small",
"vitstr_base",
"parseq",
"viptr_tiny",
]
CUSTOM_RECO_ARCHS: list[str] = [
"Felix92/onnxtr-parseq-multilingual-v1",
]
def load_predictor(
det_arch: str,
reco_arch: str,
use_gpu: bool,
assume_straight_pages: bool,
straighten_pages: bool,
export_as_straight_boxes: bool,
detect_language: bool,
load_in_8_bit: bool,
bin_thresh: float,
box_thresh: float,
disable_crop_orientation: bool = False,
disable_page_orientation: bool = False,
) -> OCRPredictor:
"""Load a predictor from onnxtr.models
Args:
----
det_arch: detection architecture
reco_arch: recognition architecture
use_gpu: whether to use the GPU or not
assume_straight_pages: whether to assume straight pages or not
disable_crop_orientation: whether to disable crop orientation or not
disable_page_orientation: whether to disable page orientation or not
straighten_pages: whether to straighten rotated pages or not
export_as_straight_boxes: whether to export straight boxes
detect_language: whether to detect the language of the text
load_in_8_bit: whether to load the 8-bit quantized variant of the model
bin_thresh: binarization threshold for the segmentation map
box_thresh: minimal objectness score to consider a box
Returns:
-------
instance of OCRPredictor
"""
engine_cfg = (
EngineConfig()
if use_gpu
else EngineConfig(providers=[("CPUExecutionProvider", {"arena_extend_strategy": "kSameAsRequested"})])
)
predictor = ocr_predictor(
det_arch=det_arch,
reco_arch=reco_arch if reco_arch not in CUSTOM_RECO_ARCHS else from_hub(reco_arch),
assume_straight_pages=assume_straight_pages,
straighten_pages=straighten_pages,
detect_language=detect_language,
load_in_8_bit=load_in_8_bit,
export_as_straight_boxes=export_as_straight_boxes,
detect_orientation=not assume_straight_pages,
disable_crop_orientation=disable_crop_orientation,
disable_page_orientation=disable_page_orientation,
det_engine_cfg=engine_cfg,
reco_engine_cfg=engine_cfg,
clf_engine_cfg=engine_cfg,
)
predictor.det_predictor.model.postprocessor.bin_thresh = bin_thresh
predictor.det_predictor.model.postprocessor.box_thresh = box_thresh
return predictor
def forward_image(predictor: OCRPredictor, image: np.ndarray) -> np.ndarray:
"""Forward an image through the predictor
Args:
----
predictor: instance of OCRPredictor
image: image to process
Returns:
-------
segmentation map
"""
processed_batches = predictor.det_predictor.pre_processor([image])
out = predictor.det_predictor.model(processed_batches[0], return_model_output=True)
seg_map = out["out_map"]
return seg_map
def matplotlib_to_pil(fig: Figure | np.ndarray) -> Image.Image:
"""Convert a matplotlib figure to a PIL image
Args:
----
fig: matplotlib figure or numpy array
Returns:
-------
PIL image
"""
buf = io.BytesIO()
if isinstance(fig, Figure):
fig.savefig(buf)
else:
plt.imsave(buf, fig)
buf.seek(0)
return Image.open(buf)
@spaces.GPU
def analyze_page(
uploaded_file: Any,
page_idx: int,
det_arch: str,
reco_arch: str,
use_gpu: bool,
assume_straight_pages: bool,
disable_crop_orientation: bool,
disable_page_orientation: bool,
straighten_pages: bool,
export_as_straight_boxes: bool,
detect_language: bool,
load_in_8_bit: bool,
bin_thresh: float,
box_thresh: float,
):
"""Analyze a page
Args:
----
uploaded_file: file to analyze
page_idx: index of the page to analyze
det_arch: detection architecture
reco_arch: recognition architecture
use_gpu: whether to use the GPU or not
assume_straight_pages: whether to assume straight pages or not
disable_crop_orientation: whether to disable crop orientation or not
disable_page_orientation: whether to disable page orientation or not
straighten_pages: whether to straighten rotated pages or not
export_as_straight_boxes: whether to export straight boxes
detect_language: whether to detect the language of the text
load_in_8_bit: whether to load the 8-bit quantized version of the models
bin_thresh: binarization threshold for the segmentation map
box_thresh: minimal objectness score to consider a box
Returns:
-------
input image, segmentation heatmap, output image, OCR output, synthesized page
"""
if uploaded_file is None:
return None, "Please upload a document", None, None, None
if uploaded_file.name.endswith(".pdf"):
doc = DocumentFile.from_pdf(uploaded_file)
else:
doc = DocumentFile.from_images(uploaded_file)
try:
page = doc[page_idx - 1]
except IndexError:
page = doc[-1]
img = page
predictor = load_predictor(
det_arch=det_arch,
reco_arch=reco_arch,
use_gpu=use_gpu,
assume_straight_pages=assume_straight_pages,
straighten_pages=straighten_pages,
export_as_straight_boxes=export_as_straight_boxes,
detect_language=detect_language,
load_in_8_bit=load_in_8_bit,
bin_thresh=bin_thresh,
box_thresh=box_thresh,
disable_crop_orientation=disable_crop_orientation,
disable_page_orientation=disable_page_orientation,
)
seg_map = forward_image(predictor, page)
seg_map = np.squeeze(seg_map)
seg_map = cv2.resize(seg_map, (img.shape[1], img.shape[0]), interpolation=cv2.INTER_LINEAR)
seg_heatmap = matplotlib_to_pil(seg_map)
out = predictor([page])
page_export = out.pages[0].export()
fig = visualize_page(out.pages[0].export(), out.pages[0].page, interactive=False, add_labels=False)
out_img = matplotlib_to_pil(fig)
if assume_straight_pages or straighten_pages:
synthesized_page = out.pages[0].synthesize()
else:
synthesized_page = None
return img, seg_heatmap, out_img, page_export, synthesized_page
with gr.Blocks(fill_height=True) as demo:
gr.HTML(
"""
<div style="text-align: center;">
<p style="display: flex; justify-content: center;">
<img src="https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/logo.jpg" width="15%">
</p>
<h1>OnnxTR OCR Demo</h1>
<p style="display: flex; justify-content: center; gap: 10px;">
<a href="https://github.com/felixdittrich92/OnnxTR" target="_blank">
<img src="https://img.shields.io/badge/GitHub-blue?logo=github" alt="GitHub OnnxTR">
</a>
<a href="https://pypi.org/project/onnxtr/" target="_blank">
<img src="https://img.shields.io/pypi/v/onnxtr?color=blue" alt="PyPI">
</a>
</p>
</div>
<h2>To use this interactive demo for OnnxTR:</h2>
<h3> 1. Upload a document (PDF, JPG, or PNG)</h3>
<h3> 2. Select the model architectures for text detection and recognition you want to use</h3>
<h3> 3. Press the "Analyze page" button to process the uploaded document</h3>
"""
)
with gr.Row():
with gr.Column(scale=1):
upload = gr.File(label="Upload File [JPG | PNG | PDF]", file_types=[".pdf", ".jpg", ".png"])
page_selection = gr.Slider(minimum=1, maximum=10, step=1, value=1, label="Page selection")
det_model = gr.Dropdown(choices=DET_ARCHS, value=DET_ARCHS[0], label="Text detection model")
reco_model = gr.Dropdown(
choices=RECO_ARCHS + CUSTOM_RECO_ARCHS, value=RECO_ARCHS[0], label="Text recognition model"
)
use_gpu = gr.Checkbox(value=True, label="Use GPU")
assume_straight = gr.Checkbox(value=True, label="Assume straight pages")
disable_crop_orientation = gr.Checkbox(value=False, label="Disable crop orientation")
disable_page_orientation = gr.Checkbox(value=False, label="Disable page orientation")
straighten = gr.Checkbox(value=False, label="Straighten pages")
export_as_straight_boxes = gr.Checkbox(value=False, label="Export as straight boxes")
det_language = gr.Checkbox(value=False, label="Detect language")
load_in_8_bit = gr.Checkbox(value=False, label="Load 8-bit quantized models")
binarization_threshold = gr.Slider(
minimum=0.1, maximum=0.9, value=0.3, step=0.1, label="Binarization threshold"
)
box_threshold = gr.Slider(minimum=0.1, maximum=0.9, value=0.1, step=0.1, label="Box threshold")
analyze_button = gr.Button("Analyze page")
with gr.Column(scale=3):
with gr.Row():
input_image = gr.Image(label="Input page", width=700, height=500)
segmentation_heatmap = gr.Image(label="Segmentation heatmap", width=700, height=500)
output_image = gr.Image(label="Output page", width=700, height=500)
with gr.Row():
with gr.Column(scale=3):
ocr_output = gr.JSON(label="OCR output", render=True, scale=1, height=500)
with gr.Column(scale=3):
synthesized_page = gr.Image(label="Synthesized page", width=700, height=500)
analyze_button.click(
analyze_page,
inputs=[
upload,
page_selection,
det_model,
reco_model,
use_gpu,
assume_straight,
disable_crop_orientation,
disable_page_orientation,
straighten,
export_as_straight_boxes,
det_language,
load_in_8_bit,
binarization_threshold,
box_threshold,
],
outputs=[input_image, segmentation_heatmap, output_image, ocr_output, synthesized_page],
)
demo.launch(inbrowser=True, allowed_paths=["./data/logo.jpg"])
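The `bin_thresh` and `box_thresh` sliders above are patched directly onto the detection postprocessor. As a standalone numpy sketch (not the actual DB postprocessor, just an illustration of what a binarization threshold does to a text-probability map):

```python
import numpy as np

# A toy 4x4 text-probability map, as a segmentation model might produce.
prob_map = np.array([
    [0.05, 0.10, 0.80, 0.90],
    [0.05, 0.20, 0.85, 0.95],
    [0.10, 0.15, 0.25, 0.30],
    [0.05, 0.05, 0.10, 0.10],
])

def binarize(seg_map: np.ndarray, bin_thresh: float) -> np.ndarray:
    """Turn a probability map into a binary text mask."""
    return (seg_map >= bin_thresh).astype(np.uint8)

# A lower threshold keeps more pixels as "text" candidates.
mask_low = binarize(prob_map, 0.3)   # 5 pixels survive
mask_high = binarize(prob_map, 0.8)  # 4 pixels survive
```

Raising `bin_thresh` shrinks the detected text regions; `box_thresh` then filters whole candidate boxes by their mean objectness over such a mask.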
================================================
FILE: demo/packages.txt
================================================
python3-opencv
fonts-freefont-ttf
================================================
FILE: demo/requirements.txt
================================================
-e "onnxtr[gpu-headless,viz] @ git+https://github.com/felixdittrich92/OnnxTR.git"
gradio>=5.30.0,<7.0.0
spaces>=0.37.0
# Quick fix to avoid HuggingFace Spaces cudnn9.x Cuda12.x issue
# NOTE: outdated
# onnxruntime-gpu==1.19.0
================================================
FILE: onnxtr/__init__.py
================================================
from . import io, models, contrib, transforms, utils
from .version import __version__ # noqa: F401
================================================
FILE: onnxtr/contrib/__init__.py
================================================
from .artefacts import ArtefactDetector
================================================
FILE: onnxtr/contrib/artefacts.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
import cv2
import numpy as np
from onnxtr.file_utils import requires_package
from .base import _BasePredictor
__all__ = ["ArtefactDetector"]
default_cfgs: dict[str, dict[str, Any]] = {
"yolov8_artefact": {
"input_shape": (3, 1024, 1024),
"labels": ["bar_code", "qr_code", "logo", "photo"],
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/yolo_artefact-f9d66f14.onnx",
},
}
class ArtefactDetector(_BasePredictor):
"""
A class to detect artefacts in images
>>> from onnxtr.io import DocumentFile
>>> from onnxtr.contrib.artefacts import ArtefactDetector
>>> doc = DocumentFile.from_images(["path/to/image.jpg"])
>>> detector = ArtefactDetector()
>>> results = detector(doc)
Args:
arch: the architecture to use
batch_size: the batch size to use
model_path: the path to the model to use
labels: the labels to use
input_shape: the input shape to use
mask_labels: the mask labels to use
conf_threshold: the confidence threshold to use
iou_threshold: the intersection over union threshold to use
**kwargs: additional arguments to be passed to `download_from_url`
"""
def __init__(
self,
arch: str = "yolov8_artefact",
batch_size: int = 2,
model_path: str | None = None,
labels: list[str] | None = None,
input_shape: tuple[int, int, int] | None = None,
conf_threshold: float = 0.5,
iou_threshold: float = 0.5,
**kwargs: Any,
) -> None:
super().__init__(batch_size=batch_size, url=default_cfgs[arch]["url"], model_path=model_path, **kwargs)
self.labels = labels or default_cfgs[arch]["labels"]
self.input_shape = input_shape or default_cfgs[arch]["input_shape"]
self.conf_threshold = conf_threshold
self.iou_threshold = iou_threshold
def preprocess(self, img: np.ndarray) -> np.ndarray:
return np.transpose(cv2.resize(img, (self.input_shape[2], self.input_shape[1])), (2, 0, 1)) / np.array(255.0)
def postprocess(self, output: list[np.ndarray], input_images: list[list[np.ndarray]]) -> list[list[dict[str, Any]]]:
results = []
for batch in zip(output, input_images):
for out, img in zip(batch[0], batch[1]):
org_height, org_width = img.shape[:2]
width_scale, height_scale = org_width / self.input_shape[2], org_height / self.input_shape[1]
for res in out:
sample_results = []
for row in np.transpose(np.squeeze(res)):
classes_scores = row[4:]
max_score = np.amax(classes_scores)
if max_score >= self.conf_threshold:
class_id = np.argmax(classes_scores)
x, y, w, h = row[0], row[1], row[2], row[3]
# to rescaled xmin, ymin, xmax, ymax
xmin = int((x - w / 2) * width_scale)
ymin = int((y - h / 2) * height_scale)
xmax = int((x + w / 2) * width_scale)
ymax = int((y + h / 2) * height_scale)
sample_results.append({
"label": self.labels[class_id],
"confidence": float(max_score),
"box": [xmin, ymin, xmax, ymax],
})
# Filter out overlapping boxes
boxes = [res["box"] for res in sample_results]
scores = [res["confidence"] for res in sample_results]
keep_indices = cv2.dnn.NMSBoxes(boxes, scores, self.conf_threshold, self.iou_threshold) # type: ignore[arg-type]
sample_results = [sample_results[i] for i in keep_indices]
results.append(sample_results)
self._results = results
return results
def show(self, **kwargs: Any) -> None:
"""
Display the results
Args:
**kwargs: additional keyword arguments to be passed to `plt.show`
"""
requires_package("matplotlib", "`.show()` requires matplotlib installed")
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
# visualize the results with matplotlib
if self._results and self._inputs:
for img, res in zip(self._inputs, self._results):
plt.figure(figsize=(10, 10))
plt.imshow(img)
for obj in res:
xmin, ymin, xmax, ymax = obj["box"]
label = obj["label"]
plt.text(xmin, ymin, f"{label} {obj['confidence']:.2f}", color="red")
plt.gca().add_patch(
Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, fill=False, edgecolor="red", linewidth=2)
)
plt.show(**kwargs)
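`postprocess` above does two things per detection row: decode a YOLOv8 center-format box into rescaled corner coordinates, and suppress overlapping boxes via `cv2.dnn.NMSBoxes`. A standalone sketch of both steps in plain Python/numpy — the greedy IoU-based NMS here is a simplified stand-in for OpenCV's implementation, not the exact same routine:

```python
import numpy as np

def decode_box(row, width_scale, height_scale):
    # (x_center, y_center, w, h) at model resolution -> (xmin, ymin, xmax, ymax) at image resolution
    x, y, w, h = row
    return [
        int((x - w / 2) * width_scale),
        int((y - h / 2) * height_scale),
        int((x + w / 2) * width_scale),
        int((y + h / 2) * height_scale),
    ]

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter = max(0, min(ax2, bx2) - max(ax1, bx1)) * max(0, min(ay2, by2) - max(ay1, by1))
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep highest-scoring boxes, drop ones overlapping a kept box."""
    keep = []
    for i in np.argsort(scores)[::-1]:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

# Model input 1024x1024, original image twice as wide: width_scale=2.0, height_scale=1.0
box = decode_box((512, 512, 100, 100), 2.0, 1.0)  # -> [924, 462, 1124, 562]
# Two heavily overlapping detections plus one distinct box
keep = nms([[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 60, 60]], [0.9, 0.8, 0.7])
```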
================================================
FILE: onnxtr/contrib/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
import numpy as np
import onnxruntime as ort
from onnxtr.utils.data import download_from_url
class _BasePredictor:
"""
Base class for all predictors
Args:
batch_size: the batch size to use
url: the url to use to download a model if needed
model_path: the path to the model to use
**kwargs: additional arguments to be passed to `download_from_url`
"""
def __init__(self, batch_size: int, url: str | None = None, model_path: str | None = None, **kwargs) -> None:
self.batch_size = batch_size
self.session = self._init_model(url, model_path, **kwargs)
self._inputs: list[np.ndarray] = []
self._results: list[Any] = []
def _init_model(self, url: str | None = None, model_path: str | None = None, **kwargs: Any) -> Any:
"""
Download the model from the given url if needed
Args:
url: the url to use
model_path: the path to the model to use
**kwargs: additional arguments to be passed to `download_from_url`
Returns:
Any: the loaded ONNX model session
"""
if not url and not model_path:
raise ValueError("You must provide either a url or a model_path")
onnx_model_path = model_path if model_path else str(download_from_url(url, cache_subdir="models", **kwargs)) # type: ignore[arg-type]
return ort.InferenceSession(onnx_model_path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
def preprocess(self, img: np.ndarray) -> np.ndarray:
"""
Preprocess the input image
Args:
img: the input image to preprocess
Returns:
np.ndarray: the preprocessed image
"""
raise NotImplementedError
def postprocess(self, output: list[np.ndarray], input_images: list[list[np.ndarray]]) -> Any:
"""
Postprocess the model output
Args:
output: the model output to postprocess
input_images: the input images used to generate the output
Returns:
Any: the postprocessed output
"""
raise NotImplementedError
def __call__(self, inputs: list[np.ndarray]) -> Any:
"""
Call the model on the given inputs
Args:
inputs: the inputs to use
Returns:
Any: the postprocessed output
"""
self._inputs = inputs
model_inputs = self.session.get_inputs()
batched_inputs = [inputs[i : i + self.batch_size] for i in range(0, len(inputs), self.batch_size)]
processed_batches = [
np.array([self.preprocess(img) for img in batch], dtype=np.float32) for batch in batched_inputs
]
outputs = [self.session.run(None, {model_inputs[0].name: batch}) for batch in processed_batches]
return self.postprocess(outputs, batched_inputs)
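`__call__` splits the input list into fixed-size batches before running the session. The chunking idiom in isolation:

```python
def batch(items, batch_size):
    """Split a list into consecutive chunks of at most `batch_size` elements."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]

chunks = batch(list(range(5)), 2)  # -> [[0, 1], [2, 3], [4]]
```

The final chunk may be smaller than `batch_size`, which is why the model must accept a dynamic batch dimension.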
================================================
FILE: onnxtr/file_utils.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import importlib.metadata
import logging
__all__ = ["requires_package"]
ENV_VARS_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}
ENV_VARS_TRUE_AND_AUTO_VALUES = ENV_VARS_TRUE_VALUES.union({"AUTO"})
def requires_package(name: str, extra_message: str | None = None) -> None: # pragma: no cover
"""
package requirement helper
Args:
name: name of the package
extra_message: additional message to display if the package is not found
"""
try:
_pkg_version = importlib.metadata.version(name)
logging.info(f"{name} version {_pkg_version} available.")
except importlib.metadata.PackageNotFoundError:
raise ImportError(
f"\n\n{extra_message if extra_message is not None else ''} "
f"\nPlease install it with the following command: pip install {name}\n"
)
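The helper raises an `ImportError` whenever `importlib.metadata` cannot find the named distribution. A minimal reproduction of the pattern (the package name below is a deliberately nonexistent placeholder):

```python
import importlib.metadata

def requires_package(name: str) -> None:
    """Raise ImportError with an install hint if `name` is not installed."""
    try:
        importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        raise ImportError(f"Please install it with the following command: pip install {name}")

try:
    requires_package("surely-not-an-installed-package")
except ImportError as exc:
    message = str(exc)
```

Note this checks the installed *distribution* metadata, not importability, so it works for packages whose import name differs from their PyPI name.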
================================================
FILE: onnxtr/io/__init__.py
================================================
from .elements import *
from .html import *
from .image import *
from .pdf import *
from .reader import *
================================================
FILE: onnxtr/io/elements.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
from defusedxml import defuse_stdlib
defuse_stdlib()
from xml.etree import ElementTree as ET
from xml.etree.ElementTree import Element as ETElement
from xml.etree.ElementTree import SubElement
import numpy as np
import onnxtr
from onnxtr.file_utils import requires_package
from onnxtr.utils.common_types import BoundingBox
from onnxtr.utils.geometry import resolve_enclosing_bbox, resolve_enclosing_rbbox
from onnxtr.utils.reconstitution import synthesize_page
from onnxtr.utils.repr import NestedObject
try: # optional dependency for visualization
from onnxtr.utils.visualization import visualize_page
except ModuleNotFoundError: # pragma: no cover
pass
__all__ = ["Element", "Word", "Artefact", "Line", "Block", "Page", "Document"]
class Element(NestedObject):
"""Implements an abstract document element with exporting and text rendering capabilities"""
_children_names: list[str] = []
_exported_keys: list[str] = []
def __init__(self, **kwargs: Any) -> None:
for k, v in kwargs.items():
if k in self._children_names:
setattr(self, k, v)
else:
raise KeyError(f"{self.__class__.__name__} object does not have any attribute named '{k}'")
def export(self) -> dict[str, Any]:
"""Exports the object into a nested dict format"""
export_dict = {k: getattr(self, k) for k in self._exported_keys}
for children_name in self._children_names:
export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
return export_dict
@classmethod
def from_dict(cls, save_dict: dict[str, Any], **kwargs):
raise NotImplementedError
def render(self) -> str:
raise NotImplementedError
class Word(Element):
"""Implements a word element
Args:
value: the text string of the word
confidence: the confidence associated with the text prediction
geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
the page's size
objectness_score: the objectness score of the detection
crop_orientation: the general orientation of the crop in degrees and its confidence
"""
_exported_keys: list[str] = ["value", "confidence", "geometry", "objectness_score", "crop_orientation"]
_children_names: list[str] = []
def __init__(
self,
value: str,
confidence: float,
geometry: BoundingBox | np.ndarray,
objectness_score: float,
crop_orientation: dict[str, Any],
) -> None:
super().__init__()
self.value = value
self.confidence = confidence
self.geometry = geometry
self.objectness_score = objectness_score
self.crop_orientation = crop_orientation
def render(self) -> str:
"""Renders the full text of the element"""
return self.value
def extra_repr(self) -> str:
return f"value='{self.value}', confidence={self.confidence:.2}"
@classmethod
def from_dict(cls, save_dict: dict[str, Any], **kwargs):
kwargs = {k: save_dict[k] for k in cls._exported_keys}
return cls(**kwargs)
class Artefact(Element):
"""Implements a non-textual element
Args:
artefact_type: the type of artefact
confidence: the confidence of the type prediction
geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
the page's size.
"""
_exported_keys: list[str] = ["geometry", "type", "confidence"]
_children_names: list[str] = []
def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
super().__init__()
self.geometry = geometry
self.type = artefact_type
self.confidence = confidence
def render(self) -> str:
"""Renders the full text of the element"""
return f"[{self.type.upper()}]"
def extra_repr(self) -> str:
return f"type='{self.type}', confidence={self.confidence:.2}"
@classmethod
def from_dict(cls, save_dict: dict[str, Any], **kwargs):
kwargs = {k: save_dict[k] for k in cls._exported_keys}
return cls(**kwargs)
class Line(Element):
"""Implements a line element as a collection of words
Args:
words: list of word elements
geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
all words in it.
"""
_exported_keys: list[str] = ["geometry", "objectness_score"]
_children_names: list[str] = ["words"]
words: list[Word] = []
def __init__(
self,
words: list[Word],
geometry: BoundingBox | np.ndarray | None = None,
objectness_score: float | None = None,
) -> None:
# Compute the objectness score of the line
if objectness_score is None:
objectness_score = float(np.mean([w.objectness_score for w in words]))
# Resolve the geometry using the smallest enclosing bounding box
if geometry is None:
# Check whether this is a rotated or straight box
box_resolution_fn = resolve_enclosing_rbbox if len(words[0].geometry) == 4 else resolve_enclosing_bbox
geometry = box_resolution_fn([w.geometry for w in words]) # type: ignore[misc]
super().__init__(words=words)
self.geometry = geometry
self.objectness_score = objectness_score
def render(self) -> str:
"""Renders the full text of the element"""
return " ".join(w.render() for w in self.words)
@classmethod
def from_dict(cls, save_dict: dict[str, Any], **kwargs):
kwargs = {k: save_dict[k] for k in cls._exported_keys}
kwargs.update({
"words": [Word.from_dict(_dict) for _dict in save_dict["words"]],
})
return cls(**kwargs)
class Block(Element):
"""Implements a block element as a collection of lines and artefacts
Args:
lines: list of line elements
artefacts: list of artefacts
geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
all lines and artefacts in it.
"""
_exported_keys: list[str] = ["geometry", "objectness_score"]
_children_names: list[str] = ["lines", "artefacts"]
lines: list[Line] = []
artefacts: list[Artefact] = []
def __init__(
self,
lines: list[Line] = [],
artefacts: list[Artefact] = [],
geometry: BoundingBox | np.ndarray | None = None,
objectness_score: float | None = None,
) -> None:
# Compute the objectness score of the line
if objectness_score is None:
objectness_score = float(np.mean([w.objectness_score for line in lines for w in line.words]))
# Resolve the geometry using the smallest enclosing bounding box
if geometry is None:
line_boxes = [word.geometry for line in lines for word in line.words]
artefact_boxes = [artefact.geometry for artefact in artefacts]
box_resolution_fn = (
resolve_enclosing_rbbox if isinstance(lines[0].geometry, np.ndarray) else resolve_enclosing_bbox
)
geometry = box_resolution_fn(line_boxes + artefact_boxes) # type: ignore
super().__init__(lines=lines, artefacts=artefacts)
self.geometry = geometry
self.objectness_score = objectness_score
def render(self, line_break: str = "\n") -> str:
"""Renders the full text of the element"""
return line_break.join(line.render() for line in self.lines)
@classmethod
def from_dict(cls, save_dict: dict[str, Any], **kwargs):
kwargs = {k: save_dict[k] for k in cls._exported_keys}
kwargs.update({
"lines": [Line.from_dict(_dict) for _dict in save_dict["lines"]],
"artefacts": [Artefact.from_dict(_dict) for _dict in save_dict["artefacts"]],
})
return cls(**kwargs)
class Page(Element):
"""Implements a page element as a collection of blocks
Args:
page: image encoded as a numpy array in uint8
blocks: list of block elements
page_idx: the index of the page in the input raw document
dimensions: the page size in pixels in format (height, width)
orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
language: a dictionary with the language value and confidence of the prediction
"""
_exported_keys: list[str] = ["page_idx", "dimensions", "orientation", "language"]
_children_names: list[str] = ["blocks"]
blocks: list[Block] = []
def __init__(
self,
page: np.ndarray,
blocks: list[Block],
page_idx: int,
dimensions: tuple[int, int],
orientation: dict[str, Any] | None = None,
language: dict[str, Any] | None = None,
) -> None:
super().__init__(blocks=blocks)
self.page = page
self.page_idx = page_idx
self.dimensions = dimensions
self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
def render(self, block_break: str = "\n\n") -> str:
"""Renders the full text of the element"""
return block_break.join(b.render() for b in self.blocks)
def extra_repr(self) -> str:
return f"dimensions={self.dimensions}"
def show(self, interactive: bool = True, preserve_aspect_ratio: bool = False, **kwargs) -> None:
"""Overlay the result on a given image
Args:
interactive: whether the display should be interactive
preserve_aspect_ratio: pass True if you passed True to the predictor
**kwargs: additional keyword arguments passed to the matplotlib.pyplot.show method
"""
requires_package("matplotlib", "`.show()` requires matplotlib & mplcursors installed")
requires_package("mplcursors", "`.show()` requires matplotlib & mplcursors installed")
import matplotlib.pyplot as plt
visualize_page(self.export(), self.page, interactive=interactive, preserve_aspect_ratio=preserve_aspect_ratio)
plt.show(**kwargs)
def synthesize(self, **kwargs) -> np.ndarray:
"""Synthesize the page from the predictions
Args:
**kwargs: keyword arguments passed to the `synthesize_page` method
Returns:
synthesized page
"""
return synthesize_page(self.export(), **kwargs)
def export_as_xml(self, file_title: str = "OnnxTR - XML export (hOCR)") -> tuple[bytes, ET.ElementTree]:
"""Export the page as XML (hOCR-format)
convention: https://github.com/kba/hocr-spec/blob/master/1.2/spec.md
Args:
file_title: the title of the XML file
Returns:
a tuple of the XML byte string, and its ElementTree
"""
p_idx = self.page_idx
block_count: int = 1
line_count: int = 1
word_count: int = 1
height, width = self.dimensions
language = self.language if "language" in self.language.keys() else "en"
# Create the XML root element
page_hocr = ETElement("html", attrib={"xmlns": "http://www.w3.org/1999/xhtml", "xml:lang": str(language)})
# Create the header / SubElements of the root element
head = SubElement(page_hocr, "head")
SubElement(head, "title").text = file_title
SubElement(head, "meta", attrib={"http-equiv": "Content-Type", "content": "text/html; charset=utf-8"})
SubElement(
head,
"meta",
attrib={"name": "ocr-system", "content": f"onnxtr {onnxtr.__version__}"}, # type: ignore[attr-defined]
)
SubElement(
head,
"meta",
attrib={"name": "ocr-capabilities", "content": "ocr_page ocr_carea ocr_par ocr_line ocrx_word"},
)
# Create the body
body = SubElement(page_hocr, "body")
page_div = SubElement(
body,
"div",
attrib={
"class": "ocr_page",
"id": f"page_{p_idx + 1}",
"title": f"image; bbox 0 0 {width} {height}; ppageno 0",
},
)
# iterate over the blocks / lines / words and create the XML elements in body line by line with the attributes
for block in self.blocks:
if len(block.geometry) != 2:
raise TypeError("XML export is only available for straight bounding boxes for now.")
(xmin, ymin), (xmax, ymax) = block.geometry
block_div = SubElement(
page_div,
"div",
attrib={
"class": "ocr_carea",
"id": f"block_{block_count}",
"title": f"bbox {int(round(xmin * width))} {int(round(ymin * height))} \
{int(round(xmax * width))} {int(round(ymax * height))}",
},
)
paragraph = SubElement(
block_div,
"p",
attrib={
"class": "ocr_par",
"id": f"par_{block_count}",
"title": f"bbox {int(round(xmin * width))} {int(round(ymin * height))} \
{int(round(xmax * width))} {int(round(ymax * height))}",
},
)
block_count += 1
for line in block.lines:
(xmin, ymin), (xmax, ymax) = line.geometry
# NOTE: baseline, x_size, x_descenders, x_ascenders are currently initialized to 0
line_span = SubElement(
paragraph,
"span",
attrib={
"class": "ocr_line",
"id": f"line_{line_count}",
"title": f"bbox {int(round(xmin * width))} {int(round(ymin * height))} \
{int(round(xmax * width))} {int(round(ymax * height))}; \
baseline 0 0; x_size 0; x_descenders 0; x_ascenders 0",
},
)
line_count += 1
for word in line.words:
(xmin, ymin), (xmax, ymax) = word.geometry
conf = word.confidence
word_div = SubElement(
line_span,
"span",
attrib={
"class": "ocrx_word",
"id": f"word_{word_count}",
"title": f"bbox {int(round(xmin * width))} {int(round(ymin * height))} \
{int(round(xmax * width))} {int(round(ymax * height))}; \
x_wconf {int(round(conf * 100))}",
},
)
# set the text
word_div.text = word.value
word_count += 1
return (ET.tostring(page_hocr, encoding="utf-8", method="xml"), ET.ElementTree(page_hocr))
@classmethod
def from_dict(cls, save_dict: dict[str, Any], **kwargs):
kwargs = {k: save_dict[k] for k in cls._exported_keys}
kwargs.update({"blocks": [Block.from_dict(block_dict) for block_dict in save_dict["blocks"]]})
return cls(**kwargs)
class Document(Element):
"""Implements a document element as a collection of pages
Args:
pages: list of page elements
"""
_children_names: list[str] = ["pages"]
pages: list[Page] = []
def __init__(
self,
pages: list[Page],
) -> None:
super().__init__(pages=pages)
def render(self, page_break: str = "\n\n\n\n") -> str:
"""Renders the full text of the element"""
return page_break.join(p.render() for p in self.pages)
def show(self, **kwargs) -> None:
"""Overlay the result on a given image"""
for result in self.pages:
result.show(**kwargs)
def synthesize(self, **kwargs) -> list[np.ndarray]:
"""Synthesize all pages from their predictions
Args:
**kwargs: keyword arguments passed to the `Page.synthesize` method
Returns:
list of synthesized pages
"""
return [page.synthesize(**kwargs) for page in self.pages]
def export_as_xml(self, **kwargs) -> list[tuple[bytes, ET.ElementTree]]:
"""Export the document as XML (hOCR-format)
Args:
**kwargs: additional keyword arguments passed to the Page.export_as_xml method
Returns:
list of tuple of (bytes, ElementTree)
"""
return [page.export_as_xml(**kwargs) for page in self.pages]
@classmethod
def from_dict(cls, save_dict: dict[str, Any], **kwargs):
kwargs = {k: save_dict[k] for k in cls._exported_keys}
kwargs.update({"pages": [Page.from_dict(page_dict) for page_dict in save_dict["pages"]]})
return cls(**kwargs)
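`export_as_xml` repeatedly converts relative geometries into the absolute-pixel `bbox` title strings the hOCR spec expects. The conversion step in isolation:

```python
def hocr_bbox(geometry, width, height):
    """Relative ((xmin, ymin), (xmax, ymax)) -> hOCR 'bbox x1 y1 x2 y2' in pixels."""
    (xmin, ymin), (xmax, ymax) = geometry
    return (
        f"bbox {int(round(xmin * width))} {int(round(ymin * height))} "
        f"{int(round(xmax * width))} {int(round(ymax * height))}"
    )

# A word covering 10%-50% of the width and 20%-40% of the height of a 1000x500 page
title = hocr_bbox(((0.1, 0.2), (0.5, 0.4)), 1000, 500)  # -> 'bbox 100 100 500 200'
```

Word confidence gets the same treatment, scaled to an integer percentage for the `x_wconf` field.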
================================================
FILE: onnxtr/io/html.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
__all__ = ["read_html"]
def read_html(url: str, **kwargs: Any) -> bytes:
"""Read a PDF file and convert it into an image in numpy format
>>> from onnxtr.io import read_html
>>> doc = read_html("https://www.yoursite.com")
Args:
url: URL of the target web page
**kwargs: keyword arguments from `weasyprint.HTML`
Returns:
decoded PDF file as a bytes stream
"""
from weasyprint import HTML
return HTML(url, **kwargs).write_pdf()
================================================
FILE: onnxtr/io/image.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from pathlib import Path
import cv2
import numpy as np
from onnxtr.utils.common_types import AbstractFile
__all__ = ["read_img_as_numpy"]
def read_img_as_numpy(
file: AbstractFile,
output_size: tuple[int, int] | None = None,
rgb_output: bool = True,
) -> np.ndarray:
"""Read an image file into numpy format
>>> from onnxtr.io import read_img_as_numpy
>>> page = read_img_as_numpy("path/to/your/doc.jpg")
Args:
file: the path to the image file
output_size: the expected output size of each page in format H x W
rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
Returns:
the page decoded as numpy ndarray of shape H x W x 3
"""
if isinstance(file, (str, Path)):
if not Path(file).is_file():
raise FileNotFoundError(f"unable to access {file}")
img = cv2.imread(str(file), cv2.IMREAD_COLOR)
elif isinstance(file, bytes):
_file: np.ndarray = np.frombuffer(file, np.uint8)
img = cv2.imdecode(_file, cv2.IMREAD_COLOR)
else:
raise TypeError("unsupported object type for argument 'file'")
# Validity check
if img is None:
raise ValueError("unable to read file.")
# Resizing
if isinstance(output_size, tuple):
img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
# Switch the channel order
if rgb_output:
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
return img
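The channel switch at the end of `read_img_as_numpy` can be mimicked with a plain numpy slice, which is handy for testing the reordering without OpenCV (a sketch, not the library code):

```python
import numpy as np

# A tiny 1x2 "BGR" image: one pure-blue pixel, one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis swaps B and R -- the same reordering
# cv2.cvtColor(img, cv2.COLOR_BGR2RGB) performs on 3-channel images
rgb = bgr[..., ::-1]
```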
================================================
FILE: onnxtr/io/pdf.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
import numpy as np
import pypdfium2 as pdfium
from onnxtr.utils.common_types import AbstractFile
__all__ = ["read_pdf"]
def read_pdf(
file: AbstractFile,
scale: int = 2,
rgb_mode: bool = True,
password: str | None = None,
**kwargs: Any,
) -> list[np.ndarray]:
"""Read a PDF file and convert it into an image in numpy format
>>> from onnxtr.io import read_pdf
>>> doc = read_pdf("path/to/your/doc.pdf")
Args:
file: the path to the PDF file
scale: rendering scale (1 corresponds to 72dpi)
rgb_mode: if True, the output will be RGB, otherwise BGR
password: a password to unlock the document, if encrypted
**kwargs: additional parameters to :meth:`pypdfium2.PdfPage.render`
Returns:
the list of pages decoded as numpy ndarray of shape H x W x C
"""
# Rasterise pages to numpy ndarrays with pypdfium2
pdf = pdfium.PdfDocument(file, password=password)
try:
return [page.render(scale=scale, rev_byteorder=rgb_mode, **kwargs).to_numpy() for page in pdf]
finally:
pdf.close()
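`scale` in `read_pdf` is a multiplier on pypdfium2's 72 dpi base resolution, so the pixel dimensions of a rendered page follow directly from its size in points (a back-of-the-envelope sketch, not library code):

```python
# PDF user space uses 72 points per inch; pypdfium2 renders at scale * 72 dpi,
# so a page's pixel size is simply its size in points times the scale factor.
def rendered_size(width_pt: float, height_pt: float, scale: int = 2) -> tuple[int, int]:
    return round(width_pt * scale), round(height_pt * scale)

# A US-Letter page (612 x 792 pt) at the default scale=2, i.e. 144 dpi
w_px, h_px = rendered_size(612, 792, scale=2)
```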
================================================
FILE: onnxtr/io/reader.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from collections.abc import Sequence
from pathlib import Path
import numpy as np
from onnxtr.file_utils import requires_package
from onnxtr.utils.common_types import AbstractFile
from .html import read_html
from .image import read_img_as_numpy
from .pdf import read_pdf
__all__ = ["DocumentFile"]
class DocumentFile:
"""Read a document from multiple extensions"""
@classmethod
def from_pdf(cls, file: AbstractFile, **kwargs) -> list[np.ndarray]:
"""Read a PDF file
>>> from onnxtr.io import DocumentFile
>>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
Args:
file: the path to the PDF file or a binary stream
**kwargs: additional parameters to :meth:`pypdfium2.PdfPage.render`
Returns:
the list of pages decoded as numpy ndarray of shape H x W x 3
"""
return read_pdf(file, **kwargs)
@classmethod
def from_url(cls, url: str, **kwargs) -> list[np.ndarray]:
"""Interpret a web page as a PDF document
>>> from onnxtr.io import DocumentFile
>>> doc = DocumentFile.from_url("https://www.yoursite.com")
Args:
url: the URL of the target web page
**kwargs: additional parameters to :meth:`pypdfium2.PdfPage.render`
Returns:
the list of pages decoded as numpy ndarray of shape H x W x 3
"""
requires_package(
"weasyprint",
"`.from_url` requires weasyprint installed.\n"
+ "Installation instructions: https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation",
)
pdf_stream = read_html(url)
return cls.from_pdf(pdf_stream, **kwargs)
@classmethod
def from_images(cls, files: Sequence[AbstractFile] | AbstractFile, **kwargs) -> list[np.ndarray]:
"""Read an image file (or a collection of image files) and convert it into an image in numpy format
>>> from onnxtr.io import DocumentFile
>>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
Args:
files: the path to the image file or a binary stream, or a collection of those
**kwargs: additional parameters to :meth:`onnxtr.io.image.read_img_as_numpy`
Returns:
the list of pages decoded as numpy ndarray of shape H x W x 3
"""
if isinstance(files, (str, Path, bytes)):
files = [files]
return [read_img_as_numpy(file, **kwargs) for file in files]
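`from_images` accepts either one file or a sequence; the normalization it applies (wrap a scalar-like input in a list) is easy to reuse on its own. A sketch with a hypothetical `normalize` helper, not part of the API:

```python
from pathlib import Path

def normalize(files):
    # str, Path and bytes each describe a single image, so wrap them;
    # any other iterable is assumed to already be a collection of images
    if isinstance(files, (str, Path, bytes)):
        return [files]
    return list(files)

single = normalize("page1.png")
many = normalize(["page1.png", Path("page2.png")])
```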
================================================
FILE: onnxtr/models/__init__.py
================================================
from .engine import EngineConfig
from .classification import *
from .detection import *
from .recognition import *
from .zoo import *
from .factory import *
================================================
FILE: onnxtr/models/_utils.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from math import floor
from statistics import median_low
import cv2
import numpy as np
from langdetect import LangDetectException, detect_langs
from onnxtr.utils.geometry import rotate_image
__all__ = ["estimate_orientation", "get_language"]
def get_max_width_length_ratio(contour: np.ndarray) -> float:
"""Get the maximum shape ratio of a contour.
Args:
contour: the contour from cv2.findContour
Returns:
the maximum shape ratio
"""
_, (w, h), _ = cv2.minAreaRect(contour)
if w == 0 or h == 0:
return 0.0
return max(w / h, h / w)
def estimate_orientation(
img: np.ndarray,
general_page_orientation: tuple[int, float] | None = None,
n_ct: int = 70,
ratio_threshold_for_lines: float = 3,
min_confidence: float = 0.2,
lower_area: int = 100,
) -> int:
"""Estimate the angle of the general document orientation based on the
lines of the document and the assumption that they should be horizontal.
Args:
img: the img or bitmap to analyze (H, W, C)
general_page_orientation: the general orientation of the page (angle [0, 90, 180, 270 (-90)], confidence)
estimated by a model
n_ct: the number of contours used for the orientation estimation
        ratio_threshold_for_lines: the w/h ratio used to discriminate lines
min_confidence: the minimum confidence to consider the general_page_orientation
lower_area: the minimum area of a contour to be considered
Returns:
the estimated angle of the page (clockwise, negative for left side rotation, positive for right side rotation)
"""
assert len(img.shape) == 3 and img.shape[-1] in [1, 3], f"Image shape {img.shape} not supported"
# Convert image to grayscale if necessary
if img.shape[-1] == 3:
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_img = cv2.medianBlur(gray_img, 5)
thresh = cv2.threshold(gray_img, thresh=0, maxval=255, type=cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
else:
thresh = img.astype(np.uint8)
page_orientation, orientation_confidence = general_page_orientation or (0, 0.0)
is_confident = page_orientation is not None and orientation_confidence >= min_confidence
base_angle = page_orientation if is_confident else 0
if is_confident:
# We rotate the image to the general orientation which improves the detection
        # No expand needed, the bitmap is already padded
thresh = rotate_image(thresh, -base_angle)
else: # That's only required if we do not work on the detection models bin map
# try to merge words in lines
(h, w) = img.shape[:2]
k_x = max(1, (floor(w / 100)))
k_y = max(1, (floor(h / 100)))
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (k_x, k_y))
thresh = cv2.dilate(thresh, kernel, iterations=1)
# extract contours
contours, _ = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
# Filter & Sort contours
contours = sorted(
[contour for contour in contours if cv2.contourArea(contour) > lower_area],
key=get_max_width_length_ratio,
reverse=True,
)
angles = []
for contour in contours[:n_ct]:
_, (w, h), angle = cv2.minAreaRect(contour)
# OpenCV version-proof normalization: force 'w' to be the long side
# so the angle is consistently relative to the major axis.
# https://github.com/opencv/opencv/pull/28051/changes
if w < h:
w, h = h, w
angle -= 90
# Normalize angle to be within [-90, 90]
while angle <= -90:
angle += 180
while angle > 90:
angle -= 180
if h > 0:
if w / h > ratio_threshold_for_lines: # select only contours with ratio like lines
angles.append(angle)
            elif w / h < 1 / ratio_threshold_for_lines:  # if lines are vertical, subtract 90 degrees
angles.append(angle - 90)
if len(angles) == 0:
        skew_angle = 0  # in case no angle is found
else:
# median_low picks a value from the data to avoid outliers
median = -median_low(angles)
skew_angle = -round(median) if abs(median) != 0 else 0
# Resolve the 90-degree flip ambiguity.
# If the estimation is exactly 90/-90, it's usually a vertical detection of horizontal lines.
if abs(skew_angle) == 90:
skew_angle = 0
# combine with the general orientation and the estimated angle
# Apply the detected skew to our base orientation
final_angle = base_angle + skew_angle
# Standardize result to [-179, 180] range to handle wrap-around cases (e.g., 180 + -31)
while final_angle > 180:
final_angle -= 360
while final_angle <= -180:
final_angle += 360
if is_confident:
# If the estimated angle is perpendicular, treat it as 0 to avoid wrong flips
if abs(skew_angle) % 90 == 0:
return page_orientation
# special case where the estimated angle is mostly wrong:
# case 1: - and + swapped
# case 2: estimated angle is completely wrong
# so in this case we prefer the general page orientation
if abs(skew_angle) == abs(page_orientation) and page_orientation != 0:
return page_orientation
return int(
final_angle
) # return the clockwise angle (negative - left side rotation, positive - right side rotation)
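The wrap-around loop near the end of `estimate_orientation` maps any combined angle into (-180, 180]; isolated, the rule looks like this (a sketch mirroring the loop above, not the full function):

```python
def wrap_angle(angle: int) -> int:
    # Standardize to the (-180, 180] range, as estimate_orientation does
    while angle > 180:
        angle -= 360
    while angle <= -180:
        angle += 360
    return angle

# e.g. a 180-degree base orientation combined with a +31 degree skew
combined = wrap_angle(180 + 31)
```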
def rectify_crops(
crops: list[np.ndarray],
orientations: list[int],
) -> list[np.ndarray]:
"""Rotate each crop of the list according to the predicted orientation:
0: already straight, no rotation
1: 90 ccw, rotate 3 times ccw
2: 180, rotate 2 times ccw
3: 270 ccw, rotate 1 time ccw
"""
# Inverse predictions (if angle of +90 is detected, rotate by -90)
orientations = [4 - pred if pred != 0 else 0 for pred in orientations]
return (
[crop if orientation == 0 else np.rot90(crop, orientation) for orientation, crop in zip(orientations, crops)]
if len(orientations) > 0
else []
)
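The inversion in `rectify_crops` (`4 - pred`) relies on `np.rot90` rotating counter-clockwise: a crop predicted as class 1 (rotated 90° ccw) is straightened by three further ccw quarter turns, one full revolution in total (a sketch of that identity):

```python
import numpy as np

crop = np.arange(6).reshape(2, 3)

# Simulate a crop that was rotated 90 degrees ccw (orientation class 1) ...
rotated = np.rot90(crop, 1)

# ... and undo it with 4 - 1 = 3 additional ccw quarter turns,
# which is exactly what rectify_crops applies for a non-zero prediction
restored = np.rot90(rotated, 4 - 1)
```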
def rectify_loc_preds(
page_loc_preds: np.ndarray,
orientations: list[int],
) -> np.ndarray | None:
"""Orient the quadrangle (Polygon4P) according to the predicted orientation,
so that the points are in this order: top L, top R, bot R, bot L if the crop is readable
"""
return (
np.stack(
[
np.roll(page_loc_pred, orientation, axis=0)
for orientation, page_loc_pred in zip(orientations, page_loc_preds)
],
axis=0,
)
if len(orientations) > 0
else None
)
def get_language(text: str) -> tuple[str, float]:
"""Get languages of a text using langdetect model.
Get the language with the highest probability or no language if only a few words or a low probability
Args:
text (str): text
Returns:
The detected language in ISO 639 code and confidence score
"""
try:
lang = detect_langs(text.lower())[0]
except LangDetectException:
return "unknown", 0.0
if len(text) <= 1 or (len(text) <= 5 and lang.prob <= 0.2):
return "unknown", 0.0
return lang.lang, lang.prob
================================================
FILE: onnxtr/models/builder.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
import numpy as np
from scipy.cluster.hierarchy import fclusterdata
from onnxtr.io.elements import Block, Document, Line, Page, Word
from onnxtr.utils.geometry import estimate_page_angle, resolve_enclosing_bbox, resolve_enclosing_rbbox, rotate_boxes
from onnxtr.utils.repr import NestedObject
__all__ = ["DocumentBuilder"]
class DocumentBuilder(NestedObject):
"""Implements a document builder
Args:
resolve_lines: whether words should be automatically grouped into lines
resolve_blocks: whether lines should be automatically grouped into blocks
paragraph_break: relative length of the minimum space separating paragraphs
export_as_straight_boxes: if True, force straight boxes in the export (fit a rectangle
box to all rotated boxes). Else, keep the boxes format unchanged, no matter what it is.
"""
def __init__(
self,
resolve_lines: bool = True,
resolve_blocks: bool = False,
paragraph_break: float = 0.035,
export_as_straight_boxes: bool = False,
) -> None:
self.resolve_lines = resolve_lines
self.resolve_blocks = resolve_blocks
self.paragraph_break = paragraph_break
self.export_as_straight_boxes = export_as_straight_boxes
@staticmethod
def _sort_boxes(boxes: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
"""Sort bounding boxes from top to bottom, left to right
Args:
boxes: bounding boxes of shape (N, 4) or (N, 4, 2) (in case of rotated bbox)
Returns:
tuple: indices of ordered boxes of shape (N,), boxes
                If straight boxes are passed to the function, they are returned unchanged;
                otherwise the returned boxes are straight boxes fitted to the straightened rotated boxes,
                so that lines can be fitted afterwards on the straightened page
"""
if boxes.ndim == 3:
boxes = rotate_boxes(
loc_preds=boxes,
angle=-estimate_page_angle(boxes),
orig_shape=(1024, 1024),
min_angle=5.0,
)
boxes = np.concatenate((boxes.min(1), boxes.max(1)), -1)
return (boxes[:, 0] + 2 * boxes[:, 3] / np.median(boxes[:, 3] - boxes[:, 1])).argsort(), boxes
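The sort key in `_sort_boxes`, `x_min + 2 * y_max / median_height`, weights the vertical position so that a box on a lower line always sorts after every box of the line above, while boxes on the same line stay ordered left to right (a small sketch with relative [0, 1] coordinates):

```python
import numpy as np

# (x_min, y_min, x_max, y_max) in relative coordinates:
# two boxes on a top line, one box on a lower line
boxes = np.array([
    [0.5, 0.0, 0.6, 0.1],  # top line, right word
    [0.0, 0.5, 0.1, 0.6],  # lower line
    [0.0, 0.0, 0.1, 0.1],  # top line, left word
])

# Same key as _sort_boxes: x_min plus y_max scaled by the median box height
med_h = np.median(boxes[:, 3] - boxes[:, 1])
order = (boxes[:, 0] + 2 * boxes[:, 3] / med_h).argsort()
```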
def _resolve_sub_lines(self, boxes: np.ndarray, word_idcs: list[int]) -> list[list[int]]:
"""Split a line in sub_lines
Args:
boxes: bounding boxes of shape (N, 4)
word_idcs: list of indexes for the words of the line
Returns:
A list of (sub-)lines computed from the original line (words)
"""
lines = []
# Sort words horizontally
word_idcs = [word_idcs[idx] for idx in boxes[word_idcs, 0].argsort().tolist()]
        # Possibly split the line horizontally
if len(word_idcs) < 2:
lines.append(word_idcs)
else:
sub_line = [word_idcs[0]]
for i in word_idcs[1:]:
horiz_break = True
prev_box = boxes[sub_line[-1]]
# Compute distance between boxes
dist = boxes[i, 0] - prev_box[2]
# If distance between boxes is lower than paragraph break, same sub-line
if dist < self.paragraph_break:
horiz_break = False
if horiz_break:
lines.append(sub_line)
sub_line = []
sub_line.append(i)
lines.append(sub_line)
return lines
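`_resolve_sub_lines` starts a new sub-line whenever the horizontal gap between consecutive (x-sorted) words exceeds `paragraph_break`; the grouping rule on its own can be sketched over plain intervals (illustrative, not the method itself):

```python
def split_on_gaps(spans, paragraph_break=0.035):
    # spans: x-sorted (x_min, x_max) word intervals of one line
    groups, current = [], [spans[0]]
    for span in spans[1:]:
        # gap between this word's left edge and the previous word's right edge
        if span[0] - current[-1][1] < paragraph_break:
            current.append(span)
        else:
            groups.append(current)
            current = [span]
    groups.append(current)
    return groups

groups = split_on_gaps([(0.0, 0.1), (0.11, 0.2), (0.5, 0.6)])
```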
def _resolve_lines(self, boxes: np.ndarray) -> list[list[int]]:
"""Order boxes to group them in lines
Args:
boxes: bounding boxes of shape (N, 4) or (N, 4, 2) in case of rotated bbox
Returns:
nested list of box indices
"""
# Sort boxes, and straighten the boxes if they are rotated
idxs, boxes = self._sort_boxes(boxes)
# Compute median for boxes heights
y_med = np.median(boxes[:, 3] - boxes[:, 1])
lines = []
words = [idxs[0]] # Assign the top-left word to the first line
# Define a mean y-center for the line
y_center_sum = boxes[idxs[0]][[1, 3]].mean()
for idx in idxs[1:]:
vert_break = True
# Compute y_dist
y_dist = abs(boxes[idx][[1, 3]].mean() - y_center_sum / len(words))
# If y-center of the box is close enough to mean y-center of the line, same line
if y_dist < y_med / 2:
vert_break = False
if vert_break:
# Compute sub-lines (horizontal split)
lines.extend(self._resolve_sub_lines(boxes, words))
words = []
y_center_sum = 0
words.append(idx)
y_center_sum += boxes[idx][[1, 3]].mean()
        # Use the remaining words to form the last line(s)
if len(words) > 0:
# Compute sub-lines (horizontal split)
lines.extend(self._resolve_sub_lines(boxes, words))
return lines
@staticmethod
def _resolve_blocks(boxes: np.ndarray, lines: list[list[int]]) -> list[list[list[int]]]:
"""Order lines to group them in blocks
Args:
boxes: bounding boxes of shape (N, 4) or (N, 4, 2)
lines: list of lines, each line is a list of idx
Returns:
nested list of box indices
"""
# Resolve enclosing boxes of lines
if boxes.ndim == 3:
box_lines: np.ndarray = np.asarray([
resolve_enclosing_rbbox([tuple(boxes[idx, :, :]) for idx in line]) # type: ignore[misc]
for line in lines
])
else:
_box_lines = [
resolve_enclosing_bbox([(tuple(boxes[idx, :2]), tuple(boxes[idx, 2:])) for idx in line])
for line in lines
]
box_lines = np.asarray([(x1, y1, x2, y2) for ((x1, y1), (x2, y2)) in _box_lines])
        # Compute geometrical features of lines to cluster on
        # Clustering only on box centers yields poor results for complex documents
if boxes.ndim == 3:
box_features: np.ndarray = np.stack(
(
(box_lines[:, 0, 0] + box_lines[:, 0, 1]) / 2,
(box_lines[:, 0, 0] + box_lines[:, 2, 0]) / 2,
(box_lines[:, 0, 0] + box_lines[:, 2, 1]) / 2,
(box_lines[:, 0, 1] + box_lines[:, 2, 1]) / 2,
(box_lines[:, 0, 1] + box_lines[:, 2, 0]) / 2,
(box_lines[:, 2, 0] + box_lines[:, 2, 1]) / 2,
),
axis=-1,
)
else:
box_features = np.stack(
(
(box_lines[:, 0] + box_lines[:, 3]) / 2,
(box_lines[:, 1] + box_lines[:, 2]) / 2,
(box_lines[:, 0] + box_lines[:, 2]) / 2,
(box_lines[:, 1] + box_lines[:, 3]) / 2,
box_lines[:, 0],
box_lines[:, 1],
),
axis=-1,
)
# Compute clusters
clusters = fclusterdata(box_features, t=0.1, depth=4, criterion="distance", metric="euclidean")
_blocks: dict[int, list[int]] = {}
# Form clusters
for line_idx, cluster_idx in enumerate(clusters):
if cluster_idx in _blocks.keys():
_blocks[cluster_idx].append(line_idx)
else:
_blocks[cluster_idx] = [line_idx]
# Retrieve word-box level to return a fully nested structure
blocks = [[lines[idx] for idx in block] for block in _blocks.values()]
return blocks
def _build_blocks(
self,
boxes: np.ndarray,
objectness_scores: np.ndarray,
word_preds: list[tuple[str, float]],
crop_orientations: list[dict[str, Any]],
) -> list[Block]:
"""Gather independent words in structured blocks
Args:
boxes: bounding boxes of all detected words of the page, of shape (N, 4) or (N, 4, 2)
objectness_scores: objectness scores of all detected words of the page, of shape N
word_preds: list of all detected words of the page, of shape N
            crop_orientations: list of dictionaries containing
the general orientation (orientations + confidences) of the crops
Returns:
list of block elements
"""
if boxes.shape[0] != len(word_preds):
raise ValueError(f"Incompatible argument lengths: {boxes.shape[0]}, {len(word_preds)}")
if boxes.shape[0] == 0:
return []
# Decide whether we try to form lines
_boxes = boxes
if self.resolve_lines:
lines = self._resolve_lines(_boxes if _boxes.ndim == 3 else _boxes[:, :4])
# Decide whether we try to form blocks
if self.resolve_blocks and len(lines) > 1:
_blocks = self._resolve_blocks(_boxes if _boxes.ndim == 3 else _boxes[:, :4], lines)
else:
_blocks = [lines]
else:
# Sort bounding boxes, one line for all boxes, one block for the line
lines = [self._sort_boxes(_boxes if _boxes.ndim == 3 else _boxes[:, :4])[0]] # type: ignore[list-item]
_blocks = [lines]
blocks = [
Block([
Line([
Word(
*word_preds[idx],
tuple(tuple(pt) for pt in boxes[idx].tolist()), # type: ignore[arg-type]
float(objectness_scores[idx]),
crop_orientations[idx],
)
if boxes.ndim == 3
else Word(
*word_preds[idx],
((boxes[idx, 0], boxes[idx, 1]), (boxes[idx, 2], boxes[idx, 3])),
float(objectness_scores[idx]),
crop_orientations[idx],
)
for idx in line
])
for line in lines
])
for lines in _blocks
]
return blocks
def extra_repr(self) -> str:
return (
f"resolve_lines={self.resolve_lines}, resolve_blocks={self.resolve_blocks}, "
f"paragraph_break={self.paragraph_break}, "
f"export_as_straight_boxes={self.export_as_straight_boxes}"
)
def __call__(
self,
pages: list[np.ndarray],
boxes: list[np.ndarray],
objectness_scores: list[np.ndarray],
text_preds: list[list[tuple[str, float]]],
page_shapes: list[tuple[int, int]],
crop_orientations: list[dict[str, Any]],
orientations: list[dict[str, Any]] | None = None,
languages: list[dict[str, Any]] | None = None,
) -> Document:
"""Re-arrange detected words into structured blocks
Args:
pages: list of N elements, where each element represents the page image
boxes: list of N elements, where each element represents the localization predictions, of shape (*, 4)
or (*, 4, 2) for all words for a given page
objectness_scores: list of N elements, where each element represents the objectness scores
text_preds: list of N elements, where each element is the list of all word prediction (text + confidence)
page_shapes: shape of each page, of size N
crop_orientations: list of N elements, where each element is
a dictionary containing the general orientation (orientations + confidences) of the crops
orientations: optional, list of N elements,
where each element is a dictionary containing the orientation (orientation + confidence)
languages: optional, list of N elements,
where each element is a dictionary containing the language (language + confidence)
Returns:
document object
"""
        if not (
            len(boxes) == len(text_preds) == len(page_shapes) == len(crop_orientations) == len(objectness_scores)
        ):
            raise ValueError("All arguments are expected to be lists of the same size")
_orientations = orientations if isinstance(orientations, list) else [None] * len(boxes)
_languages = languages if isinstance(languages, list) else [None] * len(boxes)
if self.export_as_straight_boxes and len(boxes) > 0:
# If boxes are already straight OK, else fit a bounding rect
if boxes[0].ndim == 3:
# Iterate over pages and boxes
boxes = [np.concatenate((p_boxes.min(1), p_boxes.max(1)), 1) for p_boxes in boxes]
_pages = [
Page(
page,
self._build_blocks(
page_boxes,
loc_scores,
word_preds,
word_crop_orientations,
),
_idx,
shape,
orientation,
language,
)
for page, _idx, shape, page_boxes, loc_scores, word_preds, word_crop_orientations, orientation, language in zip( # noqa: E501
pages,
range(len(boxes)),
page_shapes,
boxes,
objectness_scores,
text_preds,
crop_orientations,
_orientations,
_languages,
)
]
return Document(_pages)
================================================
FILE: onnxtr/models/classification/__init__.py
================================================
from .models import *
from .zoo import *
================================================
FILE: onnxtr/models/classification/models/__init__.py
================================================
from .mobilenet import *
================================================
FILE: onnxtr/models/classification/models/mobilenet.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
# Greatly inspired by https://github.com/pytorch/vision/blob/master/torchvision/models/mobilenetv3.py
from copy import deepcopy
from typing import Any
import numpy as np
from ...engine import Engine, EngineConfig
__all__ = [
"MobileNetV3",
"mobilenet_v3_small_crop_orientation",
"mobilenet_v3_small_page_orientation",
]
default_cfgs: dict[str, dict[str, Any]] = {
"mobilenet_v3_small_crop_orientation": {
"mean": (0.694, 0.695, 0.693),
"std": (0.299, 0.296, 0.301),
"input_shape": (3, 256, 256),
"classes": [0, -90, 180, 90],
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.6.0/mobilenet_v3_small_crop_orientation-4fde60a1.onnx",
"url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.6.0/mobilenet_v3_small_crop_orientation_static_8_bit-c32c7721.onnx",
},
"mobilenet_v3_small_page_orientation": {
"mean": (0.694, 0.695, 0.693),
"std": (0.299, 0.296, 0.301),
"input_shape": (3, 512, 512),
"classes": [0, -90, 180, 90],
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.6.0/mobilenet_v3_small_page_orientation-60606ce4.onnx",
"url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.6.0/mobilenet_v3_small_page_orientation_static_8_bit-13b5b014.onnx",
},
}
class MobileNetV3(Engine):
"""MobileNetV3 Onnx loader
Args:
model_path: path or url to onnx model file
engine_cfg: configuration for the inference engine
cfg: configuration dictionary
**kwargs: additional arguments to be passed to `Engine`
"""
def __init__(
self,
model_path: str,
engine_cfg: EngineConfig | None = None,
cfg: dict[str, Any] | None = None,
**kwargs: Any,
) -> None:
super().__init__(url=model_path, engine_cfg=engine_cfg, **kwargs)
self.cfg = cfg
def __call__(
self,
x: np.ndarray,
) -> np.ndarray:
return self.run(x)
def _mobilenet_v3(
arch: str,
model_path: str,
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> MobileNetV3:
# Patch the url
model_path = default_cfgs[arch]["url_8_bit"] if load_in_8_bit and "http" in model_path else model_path
_cfg = deepcopy(default_cfgs[arch])
return MobileNetV3(model_path, cfg=_cfg, engine_cfg=engine_cfg, **kwargs)
def mobilenet_v3_small_crop_orientation(
model_path: str = default_cfgs["mobilenet_v3_small_crop_orientation"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> MobileNetV3:
"""MobileNetV3-Small architecture as described in
`"Searching for MobileNetV3",
<https://arxiv.org/pdf/1905.02244.pdf>`_.
>>> import numpy as np
>>> from onnxtr.models import mobilenet_v3_small_crop_orientation
>>> model = mobilenet_v3_small_crop_orientation()
    >>> input_tensor = np.random.rand(1, 3, 256, 256)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the MobileNetV3 architecture
Returns:
MobileNetV3
"""
return _mobilenet_v3("mobilenet_v3_small_crop_orientation", model_path, load_in_8_bit, engine_cfg, **kwargs)
def mobilenet_v3_small_page_orientation(
model_path: str = default_cfgs["mobilenet_v3_small_page_orientation"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> MobileNetV3:
"""MobileNetV3-Small architecture as described in
`"Searching for MobileNetV3",
<https://arxiv.org/pdf/1905.02244.pdf>`_.
>>> import numpy as np
>>> from onnxtr.models import mobilenet_v3_small_page_orientation
>>> model = mobilenet_v3_small_page_orientation()
    >>> input_tensor = np.random.rand(1, 3, 512, 512)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
        load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the MobileNetV3 architecture
Returns:
MobileNetV3
"""
return _mobilenet_v3("mobilenet_v3_small_page_orientation", model_path, load_in_8_bit, engine_cfg, **kwargs)
================================================
FILE: onnxtr/models/classification/predictor/__init__.py
================================================
from .base import *
================================================
FILE: onnxtr/models/classification/predictor/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
import numpy as np
from scipy.special import softmax
from onnxtr.models.preprocessor import PreProcessor
from onnxtr.utils.repr import NestedObject
__all__ = ["OrientationPredictor"]
class OrientationPredictor(NestedObject):
"""Implements an object able to detect the reading direction of a text box or a page.
4 possible orientations: 0, 90, 180, 270 (-90) degrees counter clockwise.
Args:
pre_processor: transform inputs for easier batched model inference
model: core classification architecture (backbone + classification head)
        load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
"""
_children_names: list[str] = ["pre_processor", "model"]
def __init__(
self,
pre_processor: PreProcessor | None,
model: Any | None,
) -> None:
self.pre_processor = pre_processor if isinstance(pre_processor, PreProcessor) else None
self.model = model
def __call__(
self,
inputs: list[np.ndarray],
) -> list[list[int] | list[float]]:
# Dimension check
if any(input.ndim != 3 for input in inputs):
raise ValueError("incorrect input shape: all inputs are expected to be multi-channel 2D images.")
if self.model is None or self.pre_processor is None:
# predictor is disabled
return [[0] * len(inputs), [0] * len(inputs), [1.0] * len(inputs)]
processed_batches = self.pre_processor(inputs)
predicted_batches = [self.model(batch) for batch in processed_batches]
# confidence
probs = [np.max(softmax(batch, axis=1), axis=1) for batch in predicted_batches]
# Postprocess predictions
predicted_batches = [np.argmax(out_batch, axis=1) for out_batch in predicted_batches]
class_idxs = [int(pred) for batch in predicted_batches for pred in batch]
classes = [int(self.model.cfg["classes"][idx]) for idx in class_idxs]
confs = [round(float(p), 2) for prob in probs for p in prob]
return [class_idxs, classes, confs]
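`OrientationPredictor` derives the confidence from a softmax over the raw logits and the class via argmax; with numpy alone the post-processing can be sketched like this (assuming a single batch of logits and the crop-orientation class list):

```python
import numpy as np

def postprocess(logits, classes=(0, -90, 180, 90)):
    # Numerically stable softmax over the class axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # argmax picks the class index, max(probs) its confidence
    idxs = probs.argmax(axis=1)
    return [classes[i] for i in idxs], [round(float(p), 2) for p in probs.max(axis=1)]

angles, confs = postprocess(np.array([[0.1, 4.0, 0.2, 0.3]]))
```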
================================================
FILE: onnxtr/models/classification/zoo.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
from onnxtr.models.engine import EngineConfig
from .. import classification
from ..preprocessor import PreProcessor
from .predictor import OrientationPredictor
__all__ = ["crop_orientation_predictor", "page_orientation_predictor"]
ORIENTATION_ARCHS: list[str] = ["mobilenet_v3_small_crop_orientation", "mobilenet_v3_small_page_orientation"]
def _orientation_predictor(
arch: Any,
model_type: str,
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
disabled: bool = False,
**kwargs: Any,
) -> OrientationPredictor:
if disabled:
# Case where the orientation predictor is disabled
return OrientationPredictor(None, None)
if isinstance(arch, str):
if arch not in ORIENTATION_ARCHS:
raise ValueError(f"unknown architecture '{arch}'")
# Load directly classifier from backbone
_model = classification.__dict__[arch](load_in_8_bit=load_in_8_bit, engine_cfg=engine_cfg)
else:
if not isinstance(arch, classification.MobileNetV3):
raise ValueError(f"unknown architecture: {type(arch)}")
_model = arch
kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
kwargs["std"] = kwargs.get("std", _model.cfg["std"])
kwargs["batch_size"] = kwargs.get("batch_size", 512 if model_type == "crop" else 2)
input_shape = _model.cfg["input_shape"][1:]
predictor = OrientationPredictor(
PreProcessor(input_shape, preserve_aspect_ratio=True, symmetric_pad=True, **kwargs),
_model,
)
return predictor
def crop_orientation_predictor(
arch: Any = "mobilenet_v3_small_crop_orientation",
batch_size: int = 512,
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> OrientationPredictor:
"""Crop orientation classification architecture.
>>> import numpy as np
>>> from onnxtr.models import crop_orientation_predictor
>>> model = crop_orientation_predictor(arch='mobilenet_v3_small_crop_orientation')
>>> input_crop = (255 * np.random.rand(256, 256, 3)).astype(np.uint8)
>>> out = model([input_crop])
Args:
arch: name of the architecture to use (e.g. 'mobilenet_v3_small_crop_orientation')
batch_size: number of samples the model processes in parallel
load_in_8_bit: load the 8-bit quantized version of the model
engine_cfg: configuration of inference engine
**kwargs: keyword arguments to be passed to the OrientationPredictor
Returns:
OrientationPredictor
"""
model_type = "crop"
return _orientation_predictor(
arch=arch,
batch_size=batch_size,
model_type=model_type,
load_in_8_bit=load_in_8_bit,
engine_cfg=engine_cfg,
**kwargs,
)
def page_orientation_predictor(
arch: Any = "mobilenet_v3_small_page_orientation",
batch_size: int = 2,
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> OrientationPredictor:
"""Page orientation classification architecture.
>>> import numpy as np
>>> from onnxtr.models import page_orientation_predictor
>>> model = page_orientation_predictor(arch='mobilenet_v3_small_page_orientation')
>>> input_page = (255 * np.random.rand(512, 512, 3)).astype(np.uint8)
>>> out = model([input_page])
Args:
arch: name of the architecture to use (e.g. 'mobilenet_v3_small_page_orientation')
batch_size: number of samples the model processes in parallel
        load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments to be passed to the OrientationPredictor
Returns:
OrientationPredictor
"""
model_type = "page"
return _orientation_predictor(
arch=arch,
batch_size=batch_size,
model_type=model_type,
load_in_8_bit=load_in_8_bit,
engine_cfg=engine_cfg,
**kwargs,
)
================================================
FILE: onnxtr/models/detection/__init__.py
================================================
from .models import *
from .zoo import *
================================================
FILE: onnxtr/models/detection/_utils/__init__.py
================================================
from .base import *
================================================
FILE: onnxtr/models/detection/_utils/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import numpy as np
__all__ = ["_remove_padding"]
def _remove_padding(
pages: list[np.ndarray],
loc_preds: list[np.ndarray],
preserve_aspect_ratio: bool,
symmetric_pad: bool,
assume_straight_pages: bool,
) -> list[np.ndarray]:
"""Remove padding from the localization predictions
Args:
pages: list of pages
loc_preds: list of localization predictions
preserve_aspect_ratio: whether the aspect ratio was preserved during padding
symmetric_pad: whether the padding was symmetric
assume_straight_pages: whether the pages are assumed to be straight
Returns:
list of unpadded localization predictions
"""
if preserve_aspect_ratio:
# Rectify loc_preds to remove padding
rectified_preds = []
for page, loc_pred in zip(pages, loc_preds):
h, w = page.shape[0], page.shape[1]
if h > w:
# y unchanged, dilate x coord
if symmetric_pad:
if assume_straight_pages:
loc_pred[:, [0, 2]] = (loc_pred[:, [0, 2]] - 0.5) * h / w + 0.5
else:
loc_pred[:, :, 0] = (loc_pred[:, :, 0] - 0.5) * h / w + 0.5
else:
if assume_straight_pages:
loc_pred[:, [0, 2]] *= h / w
else:
loc_pred[:, :, 0] *= h / w
elif w > h:
# x unchanged, dilate y coord
if symmetric_pad:
if assume_straight_pages:
loc_pred[:, [1, 3]] = (loc_pred[:, [1, 3]] - 0.5) * w / h + 0.5
else:
loc_pred[:, :, 1] = (loc_pred[:, :, 1] - 0.5) * w / h + 0.5
else:
if assume_straight_pages:
loc_pred[:, [1, 3]] *= w / h
else:
loc_pred[:, :, 1] *= w / h
rectified_preds.append(np.clip(loc_pred, 0, 1))
return rectified_preds
return loc_preds
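A minimal numpy sketch (with hypothetical values) of the symmetric-pad branch above: for a tall page (h > w) whose width was padded before inference, x coordinates are re-centered around 0.5 and dilated by h / w.

```python
import numpy as np

# Hypothetical tall page: 200 x 100 (h > w), so the width was padded to a
# square before inference and x coordinates must be dilated by h / w.
h, w = 200, 100
# One straight box in relative (xmin, ymin, xmax, ymax) coords on the padded page
loc_pred = np.array([[0.4, 0.1, 0.6, 0.3]])

# Symmetric padding: x coords are re-centered around 0.5 before dilation
loc_pred[:, [0, 2]] = (loc_pred[:, [0, 2]] - 0.5) * h / w + 0.5
loc_pred = np.clip(loc_pred, 0, 1)
print(loc_pred)  # x-range widens from [0.4, 0.6] to [0.3, 0.7]
```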
================================================
FILE: onnxtr/models/detection/core.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import cv2
import numpy as np
from onnxtr.utils.repr import NestedObject
__all__ = ["DetectionPostProcessor"]
class DetectionPostProcessor(NestedObject):
"""Abstract class to postprocess the raw output of the model
Args:
box_thresh (float): minimal objectness score to consider a box
bin_thresh (float): threshold to apply to segmentation raw heatmap
assume_straight_pages (bool): if True, fit straight boxes only
"""
def __init__(self, box_thresh: float = 0.5, bin_thresh: float = 0.5, assume_straight_pages: bool = True) -> None:
self.box_thresh = box_thresh
self.bin_thresh = bin_thresh
self.assume_straight_pages = assume_straight_pages
self._opening_kernel: np.ndarray = np.ones((3, 3), dtype=np.uint8)
def extra_repr(self) -> str:
return f"bin_thresh={self.bin_thresh}, box_thresh={self.box_thresh}"
@staticmethod
def box_score(pred: np.ndarray, points: np.ndarray, assume_straight_pages: bool = True) -> float:
"""Compute the confidence score for a polygon : mean of the p values on the polygon
Args:
pred (np.ndarray): p map returned by the model
points: coordinates of the polygon
assume_straight_pages: if True, fit straight boxes only
Returns:
polygon objectness
"""
h, w = pred.shape[:2]
if assume_straight_pages:
xmin = np.clip(np.floor(points[:, 0].min()).astype(np.int32), 0, w - 1)
xmax = np.clip(np.ceil(points[:, 0].max()).astype(np.int32), 0, w - 1)
ymin = np.clip(np.floor(points[:, 1].min()).astype(np.int32), 0, h - 1)
ymax = np.clip(np.ceil(points[:, 1].max()).astype(np.int32), 0, h - 1)
return pred[ymin : ymax + 1, xmin : xmax + 1].mean()
else:
mask: np.ndarray = np.zeros((h, w), np.int32)
cv2.fillPoly(mask, [points.astype(np.int32)], 1.0)
product = pred * mask
return np.sum(product) / np.count_nonzero(product)
def bitmap_to_boxes(
self,
pred: np.ndarray,
bitmap: np.ndarray,
) -> np.ndarray:
raise NotImplementedError
def __call__(
self,
proba_map,
) -> list[list[np.ndarray]]:
"""Performs postprocessing for a list of model outputs
Args:
proba_map: probability map of shape (N, H, W, C)
Returns:
list of N class predictions (for each input sample), where each class predictions is a list of C tensors
of shape (*, 5) or (*, 6)
"""
if proba_map.ndim != 4:
raise AssertionError(f"arg `proba_map` is expected to be 4-dimensional, got {proba_map.ndim}.")
# Erosion + dilation on the binary map
bin_map = [
[
cv2.morphologyEx(bmap[..., idx], cv2.MORPH_OPEN, self._opening_kernel)
for idx in range(proba_map.shape[-1])
]
for bmap in (proba_map >= self.bin_thresh).astype(np.uint8)
]
return [
[self.bitmap_to_boxes(pmaps[..., idx], bmaps[idx]) for idx in range(proba_map.shape[-1])]
for pmaps, bmaps in zip(proba_map, bin_map)
]
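The straight-pages branch of `box_score` above reduces to a mean over an axis-aligned crop of the probability map; a small standalone numpy sketch with made-up values:

```python
import numpy as np

# Hypothetical 4x4 probability map with a bright 2x2 patch
pred = np.zeros((4, 4), dtype=np.float32)
pred[1:3, 1:3] = 0.8

# Axis-aligned polygon covering that patch, as (x, y) points
points = np.array([[1, 1], [2, 1], [2, 2], [1, 2]])

# Same clipping/crop logic as the assume_straight_pages branch of box_score
h, w = pred.shape[:2]
xmin = np.clip(np.floor(points[:, 0].min()).astype(np.int32), 0, w - 1)
xmax = np.clip(np.ceil(points[:, 0].max()).astype(np.int32), 0, w - 1)
ymin = np.clip(np.floor(points[:, 1].min()).astype(np.int32), 0, h - 1)
ymax = np.clip(np.ceil(points[:, 1].max()).astype(np.int32), 0, h - 1)
score = pred[ymin : ymax + 1, xmin : xmax + 1].mean()
print(score)  # mean of the 2x2 bright patch
```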
================================================
FILE: onnxtr/models/detection/models/__init__.py
================================================
from .fast import *
from .differentiable_binarization import *
from .linknet import *
================================================
FILE: onnxtr/models/detection/models/differentiable_binarization.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
import numpy as np
from scipy.special import expit
from ...engine import Engine, EngineConfig
from ..postprocessor.base import GeneralDetectionPostProcessor
__all__ = ["DBNet", "db_resnet50", "db_resnet34", "db_mobilenet_v3_large"]
default_cfgs: dict[str, dict[str, Any]] = {
"db_resnet50": {
"input_shape": (3, 1024, 1024),
"mean": (0.798, 0.785, 0.772),
"std": (0.264, 0.2749, 0.287),
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/db_resnet50-69ba0015.onnx",
"url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.1.2/db_resnet50_static_8_bit-09a6104f.onnx",
},
"db_resnet34": {
"input_shape": (3, 1024, 1024),
"mean": (0.798, 0.785, 0.772),
"std": (0.264, 0.2749, 0.287),
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/db_resnet34-b4873198.onnx",
"url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.1.2/db_resnet34_static_8_bit-027e2c7f.onnx",
},
"db_mobilenet_v3_large": {
"input_shape": (3, 1024, 1024),
"mean": (0.798, 0.785, 0.772),
"std": (0.264, 0.2749, 0.287),
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.2.0/db_mobilenet_v3_large-4987e7bd.onnx",
"url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.2.0/db_mobilenet_v3_large_static_8_bit-535a6f25.onnx",
},
}
class DBNet(Engine):
"""DBNet Onnx loader
Args:
model_path: path or url to onnx model file
engine_cfg: configuration for the inference engine
bin_thresh: threshold for binarization of the output feature map
box_thresh: minimal objectness score to consider a box
assume_straight_pages: if True, fit straight bounding boxes only
cfg: the configuration dict of the model
**kwargs: additional arguments to be passed to `Engine`
"""
def __init__(
self,
model_path: str,
engine_cfg: EngineConfig | None = None,
bin_thresh: float = 0.3,
box_thresh: float = 0.1,
assume_straight_pages: bool = True,
cfg: dict[str, Any] | None = None,
**kwargs: Any,
) -> None:
super().__init__(url=model_path, engine_cfg=engine_cfg, **kwargs)
self.cfg = cfg
self.assume_straight_pages = assume_straight_pages
self.postprocessor = GeneralDetectionPostProcessor(
assume_straight_pages=self.assume_straight_pages, bin_thresh=bin_thresh, box_thresh=box_thresh
)
def __call__(
self,
x: np.ndarray,
return_model_output: bool = False,
**kwargs: Any,
) -> dict[str, Any]:
logits = self.run(x)
out: dict[str, Any] = {}
prob_map = expit(logits)
if return_model_output:
out["out_map"] = prob_map
out["preds"] = self.postprocessor(prob_map)
return out
def _dbnet(
arch: str,
model_path: str,
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> DBNet:
# Patch the url
model_path = default_cfgs[arch]["url_8_bit"] if load_in_8_bit and "http" in model_path else model_path
# Build the model
return DBNet(model_path, cfg=default_cfgs[arch], engine_cfg=engine_cfg, **kwargs)
def db_resnet34(
model_path: str = default_cfgs["db_resnet34"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> DBNet:
"""DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
<https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-34 backbone.
>>> import numpy as np
>>> from onnxtr.models import db_resnet34
>>> model = db_resnet34()
>>> input_tensor = np.random.rand(1, 3, 1024, 1024)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the DBNet architecture
Returns:
text detection architecture
"""
return _dbnet("db_resnet34", model_path, load_in_8_bit, engine_cfg, **kwargs)
def db_resnet50(
model_path: str = default_cfgs["db_resnet50"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> DBNet:
"""DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
<https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
>>> import numpy as np
>>> from onnxtr.models import db_resnet50
>>> model = db_resnet50()
>>> input_tensor = np.random.rand(1, 3, 1024, 1024)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the DBNet architecture
Returns:
text detection architecture
"""
return _dbnet("db_resnet50", model_path, load_in_8_bit, engine_cfg, **kwargs)
def db_mobilenet_v3_large(
model_path: str = default_cfgs["db_mobilenet_v3_large"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> DBNet:
"""DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
<https://arxiv.org/pdf/1911.08947.pdf>`_, using a MobileNet V3 Large backbone.
>>> import numpy as np
>>> from onnxtr.models import db_mobilenet_v3_large
>>> model = db_mobilenet_v3_large()
>>> input_tensor = np.random.rand(1, 3, 1024, 1024)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the DBNet architecture
Returns:
text detection architecture
"""
return _dbnet("db_mobilenet_v3_large", model_path, load_in_8_bit, engine_cfg, **kwargs)
================================================
FILE: onnxtr/models/detection/models/fast.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import logging
from typing import Any
import numpy as np
from scipy.special import expit
from ...engine import Engine, EngineConfig
from ..postprocessor.base import GeneralDetectionPostProcessor
__all__ = ["FAST", "fast_tiny", "fast_small", "fast_base"]
default_cfgs: dict[str, dict[str, Any]] = {
"fast_tiny": {
"input_shape": (3, 1024, 1024),
"mean": (0.798, 0.785, 0.772),
"std": (0.264, 0.2749, 0.287),
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/rep_fast_tiny-28867779.onnx",
},
"fast_small": {
"input_shape": (3, 1024, 1024),
"mean": (0.798, 0.785, 0.772),
"std": (0.264, 0.2749, 0.287),
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/rep_fast_small-10428b70.onnx",
},
"fast_base": {
"input_shape": (3, 1024, 1024),
"mean": (0.798, 0.785, 0.772),
"std": (0.264, 0.2749, 0.287),
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/rep_fast_base-1b89ebf9.onnx",
},
}
class FAST(Engine):
"""FAST Onnx loader
Args:
model_path: path or url to onnx model file
engine_cfg: configuration for the inference engine
bin_thresh: threshold for binarization of the output feature map
box_thresh: minimal objectness score to consider a box
assume_straight_pages: if True, fit straight bounding boxes only
cfg: the configuration dict of the model
**kwargs: additional arguments to be passed to `Engine`
"""
def __init__(
self,
model_path: str,
engine_cfg: EngineConfig | None = None,
bin_thresh: float = 0.1,
box_thresh: float = 0.1,
assume_straight_pages: bool = True,
cfg: dict[str, Any] | None = None,
**kwargs: Any,
) -> None:
super().__init__(url=model_path, engine_cfg=engine_cfg, **kwargs)
self.cfg = cfg
self.assume_straight_pages = assume_straight_pages
self.postprocessor = GeneralDetectionPostProcessor(
assume_straight_pages=self.assume_straight_pages, bin_thresh=bin_thresh, box_thresh=box_thresh
)
def __call__(
self,
x: np.ndarray,
return_model_output: bool = False,
**kwargs: Any,
) -> dict[str, Any]:
logits = self.run(x)
out: dict[str, Any] = {}
prob_map = expit(logits)
if return_model_output:
out["out_map"] = prob_map
out["preds"] = self.postprocessor(prob_map)
return out
def _fast(
arch: str,
model_path: str,
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> FAST:
if load_in_8_bit:
logging.warning("FAST models do not support 8-bit quantization yet. Loading full precision model...")
# Build the model
return FAST(model_path, cfg=default_cfgs[arch], engine_cfg=engine_cfg, **kwargs)
def fast_tiny(
model_path: str = default_cfgs["fast_tiny"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> FAST:
"""FAST as described in `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation"
<https://arxiv.org/pdf/2111.02394.pdf>`_, using a tiny TextNet backbone.
>>> import numpy as np
>>> from onnxtr.models import fast_tiny
>>> model = fast_tiny()
>>> input_tensor = np.random.rand(1, 3, 1024, 1024)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the FAST architecture
Returns:
text detection architecture
"""
return _fast("fast_tiny", model_path, load_in_8_bit, engine_cfg, **kwargs)
def fast_small(
model_path: str = default_cfgs["fast_small"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> FAST:
"""FAST as described in `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation"
<https://arxiv.org/pdf/2111.02394.pdf>`_, using a small TextNet backbone.
>>> import numpy as np
>>> from onnxtr.models import fast_small
>>> model = fast_small()
>>> input_tensor = np.random.rand(1, 3, 1024, 1024)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the FAST architecture
Returns:
text detection architecture
"""
return _fast("fast_small", model_path, load_in_8_bit, engine_cfg, **kwargs)
def fast_base(
model_path: str = default_cfgs["fast_base"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> FAST:
"""FAST as described in `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation"
<https://arxiv.org/pdf/2111.02394.pdf>`_, using a base TextNet backbone.
>>> import numpy as np
>>> from onnxtr.models import fast_base
>>> model = fast_base()
>>> input_tensor = np.random.rand(1, 3, 1024, 1024)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the FAST architecture
Returns:
text detection architecture
"""
return _fast("fast_base", model_path, load_in_8_bit, engine_cfg, **kwargs)
================================================
FILE: onnxtr/models/detection/models/linknet.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
import numpy as np
from scipy.special import expit
from ...engine import Engine, EngineConfig
from ..postprocessor.base import GeneralDetectionPostProcessor
__all__ = ["LinkNet", "linknet_resnet18", "linknet_resnet34", "linknet_resnet50"]
default_cfgs: dict[str, dict[str, Any]] = {
"linknet_resnet18": {
"input_shape": (3, 1024, 1024),
"mean": (0.798, 0.785, 0.772),
"std": (0.264, 0.2749, 0.287),
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/linknet_resnet18-e0e0b9dc.onnx",
"url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.1.2/linknet_resnet18_static_8_bit-3b3a37dd.onnx",
},
"linknet_resnet34": {
"input_shape": (3, 1024, 1024),
"mean": (0.798, 0.785, 0.772),
"std": (0.264, 0.2749, 0.287),
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/linknet_resnet34-93e39a39.onnx",
"url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.1.2/linknet_resnet34_static_8_bit-2824329d.onnx",
},
"linknet_resnet50": {
"input_shape": (3, 1024, 1024),
"mean": (0.798, 0.785, 0.772),
"std": (0.264, 0.2749, 0.287),
"url": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.0.1/linknet_resnet50-15d8c4ec.onnx",
"url_8_bit": "https://github.com/felixdittrich92/OnnxTR/releases/download/v0.1.2/linknet_resnet50_static_8_bit-65d6b0b8.onnx",
},
}
class LinkNet(Engine):
"""LinkNet Onnx loader
Args:
model_path: path or url to onnx model file
engine_cfg: configuration for the inference engine
bin_thresh: threshold for binarization of the output feature map
box_thresh: minimal objectness score to consider a box
assume_straight_pages: if True, fit straight bounding boxes only
cfg: the configuration dict of the model
**kwargs: additional arguments to be passed to `Engine`
"""
def __init__(
self,
model_path: str,
engine_cfg: EngineConfig | None = None,
bin_thresh: float = 0.1,
box_thresh: float = 0.1,
assume_straight_pages: bool = True,
cfg: dict[str, Any] | None = None,
**kwargs: Any,
) -> None:
super().__init__(url=model_path, engine_cfg=engine_cfg, **kwargs)
self.cfg = cfg
self.assume_straight_pages = assume_straight_pages
self.postprocessor = GeneralDetectionPostProcessor(
assume_straight_pages=self.assume_straight_pages, bin_thresh=bin_thresh, box_thresh=box_thresh
)
def __call__(
self,
x: np.ndarray,
return_model_output: bool = False,
**kwargs: Any,
) -> dict[str, Any]:
logits = self.run(x)
out: dict[str, Any] = {}
prob_map = expit(logits)
if return_model_output:
out["out_map"] = prob_map
out["preds"] = self.postprocessor(prob_map)
return out
def _linknet(
arch: str,
model_path: str,
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> LinkNet:
# Patch the url
model_path = default_cfgs[arch]["url_8_bit"] if load_in_8_bit and "http" in model_path else model_path
# Build the model
return LinkNet(model_path, cfg=default_cfgs[arch], engine_cfg=engine_cfg, **kwargs)
def linknet_resnet18(
model_path: str = default_cfgs["linknet_resnet18"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> LinkNet:
"""LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
<https://arxiv.org/pdf/1707.03718.pdf>`_.
>>> import numpy as np
>>> from onnxtr.models import linknet_resnet18
>>> model = linknet_resnet18()
>>> input_tensor = np.random.rand(1, 3, 1024, 1024)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the LinkNet architecture
Returns:
text detection architecture
"""
return _linknet("linknet_resnet18", model_path, load_in_8_bit, engine_cfg, **kwargs)
def linknet_resnet34(
model_path: str = default_cfgs["linknet_resnet34"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> LinkNet:
"""LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
<https://arxiv.org/pdf/1707.03718.pdf>`_.
>>> import numpy as np
>>> from onnxtr.models import linknet_resnet34
>>> model = linknet_resnet34()
>>> input_tensor = np.random.rand(1, 3, 1024, 1024)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the LinkNet architecture
Returns:
text detection architecture
"""
return _linknet("linknet_resnet34", model_path, load_in_8_bit, engine_cfg, **kwargs)
def linknet_resnet50(
model_path: str = default_cfgs["linknet_resnet50"]["url"],
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> LinkNet:
"""LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
<https://arxiv.org/pdf/1707.03718.pdf>`_.
>>> import numpy as np
>>> from onnxtr.models import linknet_resnet50
>>> model = linknet_resnet50()
>>> input_tensor = np.random.rand(1, 3, 1024, 1024)
>>> out = model(input_tensor)
Args:
model_path: path to onnx model file, defaults to url in default_cfgs
load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: keyword arguments of the LinkNet architecture
Returns:
text detection architecture
"""
return _linknet("linknet_resnet50", model_path, load_in_8_bit, engine_cfg, **kwargs)
================================================
FILE: onnxtr/models/detection/postprocessor/__init__.py
================================================
================================================
FILE: onnxtr/models/detection/postprocessor/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
import cv2
import numpy as np
import pyclipper
from onnxtr.utils import order_points
from ..core import DetectionPostProcessor
__all__ = ["GeneralDetectionPostProcessor"]
class GeneralDetectionPostProcessor(DetectionPostProcessor):
"""Implements a post processor for FAST model.
Args:
bin_thresh: threshold used to binzarized p_map at inference time
box_thresh: minimal objectness score to consider a box
assume_straight_pages: whether the inputs were expected to have horizontal text elements
"""
def __init__(
self,
bin_thresh: float = 0.1,
box_thresh: float = 0.1,
assume_straight_pages: bool = True,
) -> None:
super().__init__(box_thresh, bin_thresh, assume_straight_pages)
self.unclip_ratio = 1.5
def polygon_to_box(
self,
points: np.ndarray,
) -> np.ndarray:
"""Expand a polygon (points) by a factor unclip_ratio, and returns a polygon
Args:
points: The first parameter.
Returns:
a box in absolute coordinates (xmin, ymin, xmax, ymax) or (4, 2) array (quadrangle)
"""
if not self.assume_straight_pages:
# Compute the rectangle polygon enclosing the raw polygon
rect = cv2.minAreaRect(points)
points = cv2.boxPoints(rect)
# Add 1 pixel to correct cv2 approx
area = (rect[1][0] + 1) * (1 + rect[1][1])
length = 2 * (rect[1][0] + rect[1][1]) + 2
else:
area = cv2.contourArea(points)
length = cv2.arcLength(points, closed=True)
distance = area * self.unclip_ratio / length # compute distance to expand polygon
offset = pyclipper.PyclipperOffset()
offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
_points = offset.Execute(distance)
# Take biggest stack of points
idx = 0
if len(_points) > 1:
max_size = 0
for _idx, p in enumerate(_points):
if len(p) > max_size:
idx = _idx
max_size = len(p)
# We ensure that _points can be correctly casted to a ndarray
_points = [_points[idx]]
expanded_points: np.ndarray = np.asarray(_points) # expand polygon
if len(expanded_points) < 1:
return None # type: ignore[return-value]
return (
cv2.boundingRect(expanded_points) # type: ignore[return-value]
if self.assume_straight_pages
else order_points(cv2.boxPoints(cv2.minAreaRect(expanded_points)))
)
def bitmap_to_boxes(
self,
pred: np.ndarray,
bitmap: np.ndarray,
) -> np.ndarray:
"""Compute boxes from a bitmap/pred_map: find connected components then filter boxes
Args:
pred: probability map output by the detection model
bitmap: binarized map computed from pred
Returns:
np tensor boxes for the bitmap, each box is a 6-element list
containing x, y, w, h, alpha, score for the box
"""
height, width = bitmap.shape[:2]
boxes: list[np.ndarray | list[float]] = []
# get contours from connected components on the bitmap
contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
# Check whether smallest enclosing bounding box is not too small
if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < 2):
continue
# Compute objectness
if self.assume_straight_pages:
x, y, w, h = cv2.boundingRect(contour)
points: np.ndarray = np.array([[x, y], [x, y + h], [x + w, y + h], [x + w, y]])
score = self.box_score(pred, points, assume_straight_pages=True)
else:
score = self.box_score(pred, contour, assume_straight_pages=False)
if score < self.box_thresh: # remove polygons with a weak objectness
continue
if self.assume_straight_pages:
_box = self.polygon_to_box(points)
else:
_box = self.polygon_to_box(np.squeeze(contour))
if self.assume_straight_pages:
# compute relative polygon to get rid of img shape
x, y, w, h = _box
xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
boxes.append([xmin, ymin, xmax, ymax, score])
else:
# compute relative box to get rid of img shape
_box[:, 0] /= width
_box[:, 1] /= height
# Add score to box as (0, score)
boxes.append(np.vstack([_box, np.array([0.0, score])]))
if not self.assume_straight_pages:
return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5, 2), dtype=pred.dtype)
else:
return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=pred.dtype)
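The expansion distance fed to pyclipper in `polygon_to_box` follows the DB unclip formula D = A * r / L (area times unclip ratio over perimeter). A standalone numpy check on a hypothetical 100 x 20 rectangle, computing area and perimeter directly instead of via cv2:

```python
import numpy as np

unclip_ratio = 1.5  # same default as GeneralDetectionPostProcessor

# Hypothetical axis-aligned text box, 100 wide and 20 tall
points = np.array([[0, 0], [100, 0], [100, 20], [0, 20]], dtype=np.float32)

# Shoelace area and perimeter (what cv2.contourArea / cv2.arcLength compute)
x, y = points[:, 0], points[:, 1]
area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
length = np.sum(np.linalg.norm(points - np.roll(points, -1, axis=0), axis=1))

distance = area * unclip_ratio / length  # offset distance applied by pyclipper
print(distance)  # 2000 * 1.5 / 240 = 12.5
```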
================================================
FILE: onnxtr/models/detection/predictor/__init__.py
================================================
from .base import *
================================================
FILE: onnxtr/models/detection/predictor/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
import numpy as np
from onnxtr.models.detection._utils import _remove_padding
from onnxtr.models.preprocessor import PreProcessor
from onnxtr.utils.repr import NestedObject
__all__ = ["DetectionPredictor"]
class DetectionPredictor(NestedObject):
"""Implements an object able to localize text elements in a document
Args:
pre_processor: transform inputs for easier batched model inference
model: core detection architecture
"""
_children_names: list[str] = ["pre_processor", "model"]
def __init__(
self,
pre_processor: PreProcessor,
model: Any,
) -> None:
self.pre_processor = pre_processor
self.model = model
def __call__(
self,
pages: list[np.ndarray],
return_maps: bool = False,
**kwargs: Any,
) -> list[np.ndarray] | tuple[list[np.ndarray], list[np.ndarray]]:
# Extract parameters from the preprocessor
preserve_aspect_ratio = self.pre_processor.resize.preserve_aspect_ratio
symmetric_pad = self.pre_processor.resize.symmetric_pad
assume_straight_pages = self.model.assume_straight_pages
# Dimension check
if any(page.ndim != 3 for page in pages):
raise ValueError("incorrect input shape: all pages are expected to be multi-channel 2D images.")
processed_batches = self.pre_processor(pages)
predicted_batches = [
self.model(batch, return_preds=True, return_model_output=True, **kwargs) for batch in processed_batches
]
# Remove padding from loc predictions
preds = _remove_padding(
pages,
[pred[0] for batch in predicted_batches for pred in batch["preds"]],
preserve_aspect_ratio=preserve_aspect_ratio,
symmetric_pad=symmetric_pad,
assume_straight_pages=assume_straight_pages,
)
if return_maps:
seg_maps = [pred for batch in predicted_batches for pred in batch["out_map"]]
return preds, seg_maps
return preds
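The comprehension in `__call__` above flattens per-batch `preds` (each page prediction is a list with one array per class) back into one array per input page, keeping only the first class via `pred[0]`. A minimal sketch with dummy data:

```python
import numpy as np

# Two hypothetical batches of one page each; per page, a list with one
# (N, 5) box array per class. The predictor keeps class 0, hence pred[0].
predicted_batches = [
    {"preds": [[np.array([[0.1, 0.1, 0.5, 0.5, 0.9]])]]},  # batch 1
    {"preds": [[np.array([[0.2, 0.2, 0.6, 0.6, 0.8]])]]},  # batch 2
]

preds = [pred[0] for batch in predicted_batches for pred in batch["preds"]]
print(len(preds))  # one (N, 5) array per input page
```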
================================================
FILE: onnxtr/models/detection/zoo.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
from .. import detection
from ..engine import EngineConfig
from ..preprocessor import PreProcessor
from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
ARCHS = [
"db_resnet34",
"db_resnet50",
"db_mobilenet_v3_large",
"linknet_resnet18",
"linknet_resnet34",
"linknet_resnet50",
"fast_tiny",
"fast_small",
"fast_base",
]
def _predictor(
arch: Any,
assume_straight_pages: bool = True,
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> DetectionPredictor:
if isinstance(arch, str):
if arch not in ARCHS:
raise ValueError(f"unknown architecture '{arch}'")
_model = detection.__dict__[arch](
assume_straight_pages=assume_straight_pages, load_in_8_bit=load_in_8_bit, engine_cfg=engine_cfg
)
else:
if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
raise ValueError(f"unknown architecture: {type(arch)}")
_model = arch
_model.assume_straight_pages = assume_straight_pages
_model.postprocessor.assume_straight_pages = assume_straight_pages
kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
kwargs["std"] = kwargs.get("std", _model.cfg["std"])
kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
PreProcessor(_model.cfg["input_shape"][1:], **kwargs),
_model,
)
return predictor
def detection_predictor(
arch: Any = "fast_base",
assume_straight_pages: bool = True,
preserve_aspect_ratio: bool = True,
symmetric_pad: bool = True,
batch_size: int = 2,
load_in_8_bit: bool = False,
engine_cfg: EngineConfig | None = None,
**kwargs: Any,
) -> DetectionPredictor:
"""Text detection architecture.
>>> import numpy as np
>>> from onnxtr.models import detection_predictor
>>> model = detection_predictor(arch='db_resnet50')
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])
Args:
arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
assume_straight_pages: If True, fit straight boxes to the page
preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
running the detection model on it
symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
batch_size: number of samples the model processes in parallel
load_in_8_bit: whether to load the 8-bit quantized model, defaults to False
engine_cfg: configuration for the inference engine
**kwargs: optional keyword arguments passed to the architecture
Returns:
Detection predictor
"""
return _predictor(
arch=arch,
assume_straight_pages=assume_straight_pages,
preserve_aspect_ratio=preserve_aspect_ratio,
symmetric_pad=symmetric_pad,
batch_size=batch_size,
load_in_8_bit=load_in_8_bit,
engine_cfg=engine_cfg,
**kwargs,
)
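The dispatch in `_predictor` above follows a common pattern: a string name is resolved through a registry, while an already-instantiated model is validated with an `isinstance` check and passed through. A minimal standalone sketch of that pattern (the registry entries and `KnownModel` class here are illustrative, not OnnxTR's actual API):

```python
# Minimal sketch of the arch-dispatch pattern used by _predictor:
# strings go through a registry lookup, instances through a type check.

ARCHS = {
    "fast_base": lambda: "fast_base model",      # illustrative factories
    "db_resnet50": lambda: "db_resnet50 model",
}

class KnownModel:
    """Stand-in for an accepted model type (DBNet / LinkNet / FAST in OnnxTR)."""

def resolve(arch):
    if isinstance(arch, str):
        if arch not in ARCHS:
            raise ValueError(f"unknown architecture '{arch}'")
        return ARCHS[arch]()  # build from the registry
    if not isinstance(arch, KnownModel):
        raise ValueError(f"unknown architecture: {type(arch)}")
    return arch  # an already-instantiated model is used as-is
```

This keeps the public entry point (`detection_predictor`) flexible: callers can pass either a short architecture name or a custom, pre-configured model object.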
================================================
FILE: onnxtr/models/engine.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import logging
import os
from collections.abc import Callable
from typing import Any, TypeAlias
import numpy as np
from onnxruntime import (
ExecutionMode,
GraphOptimizationLevel,
InferenceSession,
RunOptions,
SessionOptions,
get_available_providers,
get_device,
)
from onnxruntime.capi._pybind_state import set_default_logger_severity
set_default_logger_severity(int(os.getenv("ORT_LOG_SEVERITY_LEVEL", 4)))
from onnxtr.utils.data import download_from_url
from onnxtr.utils.geometry import shape_translate
__all__ = ["EngineConfig", "RunOptionsProvider"]
RunOptionsProvider: TypeAlias = Callable[[RunOptions], RunOptions]
class EngineConfig:
"""Implements a configuration class for the engine of a model
Args:
providers: list of providers to use for inference ref.: https://onnxruntime.ai/docs/execution-providers/
session_options: configuration for the inference session ref.: https://onnxruntime.ai/docs/api/python/api_summary.html#sessionoptions
"""
def __init__(
self,
providers: list[tuple[str, dict[str, Any]]] | list[str] | None = None,
session_options: SessionOptions | None = None,
run_options_provider: RunOptionsProvider | None = None,
):
self._providers = providers or self._init_providers()
self._session_options = session_options or self._init_sess_opts()
self.run_options_provider = run_options_provider
def _init_providers(self) -> list[tuple[str, dict[str, Any]]]:
providers: Any = [("CPUExecutionProvider", {"arena_extend_strategy": "kSameAsRequested"})]
available_providers = get_available_providers()
logging.info(f"Available providers: {available_providers}")
if "CUDAExecutionProvider" in available_providers and get_device() == "GPU": # pragma: no cover
providers.insert(
0,
(
"CUDAExecutionProvider",
{
"device_id": 0,
"arena_extend_strategy": "kNextPowerOfTwo",
"cudnn_conv_algo_search": "DEFAULT",
"do_copy_in_default_stream": True,
},
),
)
elif "CoreMLExecutionProvider" in available_providers: # pragma: no cover
providers.insert(0, ("CoreMLExecutionProvider", {}))
return providers
def _init_sess_opts(self) -> SessionOptions:
session_options = SessionOptions()
session_options.enable_cpu_mem_arena = True
session_options.execution_mode = ExecutionMode.ORT_SEQUENTIAL
session_options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL
session_options.intra_op_num_threads = -1
session_options.inter_op_num_threads = -1
return session_options
@property
def providers(self) -> list[tuple[str, dict[str, Any]]] | list[str]:
return self._providers
@property
def session_options(self) -> SessionOptions:
return self._session_options
def __repr__(self) -> str:
return f"EngineConfig(providers={self.providers})"
class Engine:
"""Implements an abstract class for the engine of a model
Args:
url: the url to use to download a model if needed
engine_cfg: the configuration of the engine
**kwargs: additional arguments to be passed to `download_from_url`
"""
def __init__(self, url: str, engine_cfg: EngineConfig | None = None, **kwargs: Any) -> None:
engine_cfg = engine_cfg if isinstance(engine_cfg, EngineConfig) else EngineConfig()
archive_path = download_from_url(url, cache_subdir="models", **kwargs) if "http" in url else url
# NOTE: older onnxruntime versions require a string path for windows
archive_path = rf"{archive_path}"
# Store model path for each model
self.model_path = archive_path
self.session_options = engine_cfg.session_options
self.providers = engine_cfg.providers
self.run_options_provider = engine_cfg.run_options_provider
self.runtime = InferenceSession(archive_path, providers=self.providers, sess_options=self.session_options)
self.runtime_inputs = self.runtime.get_inputs()[0]
self.tf_exported = int(self.runtime_inputs.shape[-1]) == 3
self.fixed_batch_size: int | str = self.runtime_inputs.shape[0]  # mostly possible with tensorflow exported models
self.output_name = [output.name for output in self.runtime.get_outputs()]
def run(self, inputs: np.ndarray) -> np.ndarray:
run_options = RunOptions()
if self.run_options_provider is not None:
run_options = self.run_options_provider(run_options)
if self.tf_exported:
inputs = shape_translate(inputs, format="BHWC") # sanity check
else:
inputs = shape_translate(inputs, format="BCHW")
if isinstance(self.fixed_batch_size, int) and self.fixed_batch_size != 0: # dynamic batch size is a string
inputs = np.broadcast_to(inputs, (self.fixed_batch_size, *inputs.shape))
# combine the results
logits = np.concatenate(
[
self.runtime.run(self.output_name, {self.runtime_inputs.name: batch}, run_options=run_options)[0]
for batch in inputs
],
axis=0,
)
else:
logits = self.runtime.run(self.output_name, {self.runtime_inputs.name: inputs}, run_options=run_options)[0]
return shape_translate(logits, format="BHWC")
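The fixed-batch branch in `Engine.run` above broadcasts the input to the model's fixed batch dimension, runs the session batch by batch, and concatenates the per-batch outputs. A numpy-only sketch of that shape handling, with a stand-in for the ONNX session (the `FIXED_BATCH` value and `fake_session_run` are assumptions for illustration):

```python
import numpy as np

# Sketch of Engine.run's handling of a model exported with a fixed batch axis.
FIXED_BATCH = 2  # a dynamic batch axis would instead be a string like "N"

def fake_session_run(batch: np.ndarray) -> np.ndarray:
    # stand-in for InferenceSession.run: identity on one batch
    return batch * 1.0

def run(inputs: np.ndarray) -> np.ndarray:
    if isinstance(FIXED_BATCH, int) and FIXED_BATCH != 0:
        # replicate the input along a new leading axis of size FIXED_BATCH ...
        tiled = np.broadcast_to(inputs, (FIXED_BATCH, *inputs.shape))
        # ... run each replica separately and combine the results
        return np.concatenate([fake_session_run(b) for b in tiled], axis=0)
    return fake_session_run(inputs)
```

For a dynamic-batch model the input is forwarded in a single session call; the per-batch loop only exists to satisfy graphs whose batch dimension was frozen at export time.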
================================================
FILE: onnxtr/models/factory/__init__.py
================================================
from .hub import *
================================================
FILE: onnxtr/models/factory/hub.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
# Inspired by: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/hub.py
import json
import logging
import shutil
import subprocess
import tempfile
import textwrap
from pathlib import Path
from typing import Any
from huggingface_hub import (
HfApi,
get_token,
hf_hub_download,
login,
)
from onnxtr import models
from onnxtr.models.engine import EngineConfig
__all__ = ["login_to_hub", "push_to_hf_hub", "from_hub", "_save_model_and_config_for_hf_hub"]
AVAILABLE_ARCHS = {
"classification": models.classification.zoo.ORIENTATION_ARCHS,
"detection": models.detection.zoo.ARCHS,
"recognition": models.recognition.zoo.ARCHS,
}
def login_to_hub() -> None: # pragma: no cover
"""Login to huggingface hub"""
access_token = get_token()
if access_token is not None:
logging.info("Huggingface Hub token found and valid")
login(token=access_token)
else:
login()
# check if git lfs is installed
try:
subprocess.call(["git", "lfs", "version"])
except FileNotFoundError:
raise OSError(
"Looks like you do not have git-lfs installed, please install. \
You can install from https://git-lfs.github.com/. \
Then run `git lfs install` (you only have to do this once)."
)
def _save_model_and_config_for_hf_hub(model: Any, save_dir: str, arch: str, task: str) -> None:
"""Save model and config to disk for pushing to huggingface hub
Args:
model: Onnx model to be saved
save_dir: directory to save model and config
arch: architecture name
task: task name
"""
save_directory = Path(save_dir)
shutil.copy2(model.model_path, save_directory / "model.onnx")
config_path = save_directory / "config.json"
# add model configuration
model_config = model.cfg
model_config["arch"] = arch
model_config["task"] = task
with config_path.open("w") as f:
json.dump(model_config, f, indent=2, ensure_ascii=False)
def push_to_hf_hub(
model: Any, model_name: str, task: str, override: bool = False, **kwargs
) -> None: # pragma: no cover
"""Save model and its configuration on HF hub
>>> from onnxtr.models import login_to_hub, push_to_hf_hub
>>> from onnxtr.models.recognition import crnn_mobilenet_v3_small
>>> login_to_hub()
>>> model = crnn_mobilenet_v3_small()
>>> push_to_hf_hub(model, 'my-model', 'recognition', arch='crnn_mobilenet_v3_small')
Args:
model: Onnx model to be saved
model_name: name of the model which is also the repository name
task: task name
override: whether to override the existing model / repo on HF hub
**kwargs: keyword arguments for push_to_hf_hub
"""
run_config = kwargs.get("run_config", None)
arch = kwargs.get("arch", None)
if run_config is None and arch is None:
raise ValueError("run_config or arch must be specified")
if task not in ["classification", "detection", "recognition"]:
raise ValueError("task must be one of classification, detection, recognition")
# default readme
readme = textwrap.dedent(
f"""
---
language:
- en
- fr
license: apache-2.0
---
<p align="center">
<img src="https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/logo.jpg" width="40%">
</p>
**Optical Character Recognition made seamless & accessible to anyone, powered by Onnxruntime**
## Task: {task}
https://github.com/felixdittrich92/OnnxTR
### Example usage:
```python
>>> from onnxtr.io import DocumentFile
>>> from onnxtr.models import ocr_predictor, from_hub
>>> img = DocumentFile.from_images(['<image_path>'])
>>> # Load your model from the hub
>>> model = from_hub('onnxtr/my-model')
>>> # Pass it to the predictor
>>> # If your model is a recognition model:
>>> predictor = ocr_predictor(det_arch='db_mobilenet_v3_large',
>>> reco_arch=model)
>>> # If your model is a detection model:
>>> predictor = ocr_predictor(det_arch=model,
>>> reco_arch='crnn_mobilenet_v3_small')
>>> # Get your predictions
>>> res = predictor(img)
```
"""
)
# add run configuration to readme if available
if run_config is not None:
arch = run_config.arch
readme += textwrap.dedent(
f"""### Run Configuration
\n{json.dumps(vars(run_config), indent=2, ensure_ascii=False)}"""
)
if arch not in AVAILABLE_ARCHS[task]:
raise ValueError(
f"Architecture: {arch} for task: {task} not found.\
\nAvailable architectures: {AVAILABLE_ARCHS}"
)
commit_message = f"Add {model_name} model"
# Create repository
api = HfApi()
api.create_repo(model_name, token=get_token(), exist_ok=False)
# Save model files to a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
_save_model_and_config_for_hf_hub(model, tmp_dir, arch=arch, task=task)
readme_path = Path(tmp_dir) / "README.md"
readme_path.write_text(readme)
# Upload all files to the hub
api.upload_folder(
folder_path=tmp_dir,
repo_id=model_name,
commit_message=commit_message,
token=get_token(),
)
def from_hub(repo_id: str, engine_cfg: EngineConfig | None = None, **kwargs: Any):
"""Instantiate & load a pretrained model from HF hub.
>>> from onnxtr.models import from_hub
>>> model = from_hub("onnxtr/my-model")
Args:
repo_id: HuggingFace model hub repo
engine_cfg: configuration for the inference engine (optional)
**kwargs: kwargs of `hf_hub_download`
Returns:
Model loaded with the checkpoint
"""
# Get the config
with open(hf_hub_download(repo_id, filename="config.json", **kwargs), "rb") as f:
cfg = json.load(f)
model_path = hf_hub_download(repo_id, filename="model.onnx", **kwargs)
arch = cfg["arch"]
task = cfg["task"]
cfg.pop("arch")
cfg.pop("task")
if task == "classification":
model = models.classification.__dict__[arch](model_path, classes=cfg["classes"], engine_cfg=engine_cfg)
elif task == "detection":
model = models.detection.__dict__[arch](model_path, engine_cfg=engine_cfg)
elif task == "recognition":
model = models.recognition.__dict__[arch](
model_path, input_shape=cfg["input_shape"], vocab=cfg["vocab"], engine_cfg=engine_cfg
)
# convert all values which are lists to tuples
for key, value in cfg.items():
if isinstance(value, list):
cfg[key] = tuple(value)
# update model cfg
model.cfg = cfg
return model
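`_save_model_and_config_for_hf_hub` and `from_hub` above share a config round-trip: `arch` and `task` are injected into the config on save, popped back out on load, and list values are converted to tuples because JSON has no tuple type. A stdlib-only sketch of that round-trip (the sample config values are illustrative):

```python
import json

def save_config(cfg: dict, arch: str, task: str) -> str:
    # inject arch/task alongside the model configuration, as done on push
    cfg = {**cfg, "arch": arch, "task": task}
    return json.dumps(cfg, indent=2, ensure_ascii=False)

def load_config(raw: str) -> tuple[str, str, dict]:
    cfg = json.loads(raw)
    arch = cfg.pop("arch")
    task = cfg.pop("task")
    # JSON round-trips tuples as lists, so restore them
    for key, value in cfg.items():
        if isinstance(value, list):
            cfg[key] = tuple(value)
    return arch, task, cfg
```

The tuple restoration matters because downstream code (e.g. `input_shape` handling) expects tuples, not the lists that `json.load` produces.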
================================================
FILE: onnxtr/models/predictor/__init__.py
================================================
from .predictor import *
================================================
FILE: onnxtr/models/predictor/base.py
================================================
# Copyright (C) 2021-2026, Mindee | Felix Dittrich.
# This program is licensed under the Apache License 2.0.
# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from collections.abc import Callable
from typing import Any
import numpy as np
from onnxtr.models.builder import DocumentBuilder
from onnxtr.models.engine import EngineConfig
from onnxtr.utils.geometry import extract_crops, extract_rcrops, remove_image_padding, rotate_image
from .._utils import estimate_orientation, rectify_crops, rectify_loc_preds
from ..classification import crop_orientation_predictor, page_orientation_predictor
from ..classification.predictor import OrientationPredictor
from ..detection.zoo import ARCHS as DETECTION_ARCHS
from ..recognition.zoo import ARCHS as RECOGNITION_ARCHS
__all__ = ["_OCRPredictor"]
class _OCRPredictor:
"""
SYMBOL INDEX (387 symbols across 76 files)
FILE: demo/app.py
class spaces (line 10) | class spaces: # noqa: N801
method GPU (line 12) | def GPU(func): # noqa: N802
function load_predictor (line 59) | def load_predictor(
function forward_image (line 119) | def forward_image(predictor: OCRPredictor, image: np.ndarray) -> np.ndar...
function matplotlib_to_pil (line 138) | def matplotlib_to_pil(fig: Figure | np.ndarray) -> Image.Image:
function analyze_page (line 159) | def analyze_page(
FILE: onnxtr/contrib/artefacts.py
class ArtefactDetector (line 26) | class ArtefactDetector(_BasePredictor):
method __init__ (line 48) | def __init__(
method preprocess (line 65) | def preprocess(self, img: np.ndarray) -> np.ndarray:
method postprocess (line 68) | def postprocess(self, output: list[np.ndarray], input_images: list[lis...
method show (line 106) | def show(self, **kwargs: Any) -> None:
FILE: onnxtr/contrib/base.py
class _BasePredictor (line 14) | class _BasePredictor:
method __init__ (line 25) | def __init__(self, batch_size: int, url: str | None = None, model_path...
method _init_model (line 32) | def _init_model(self, url: str | None = None, model_path: str | None =...
method preprocess (line 49) | def preprocess(self, img: np.ndarray) -> np.ndarray:
method postprocess (line 61) | def postprocess(self, output: list[np.ndarray], input_images: list[lis...
method __call__ (line 74) | def __call__(self, inputs: list[np.ndarray]) -> Any:
FILE: onnxtr/file_utils.py
function requires_package (line 15) | def requires_package(name: str, extra_message: str | None = None) -> Non...
FILE: onnxtr/io/elements.py
class Element (line 32) | class Element(NestedObject):
method __init__ (line 38) | def __init__(self, **kwargs: Any) -> None:
method export (line 45) | def export(self) -> dict[str, Any]:
method from_dict (line 54) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
method render (line 57) | def render(self) -> str:
class Word (line 61) | class Word(Element):
method __init__ (line 76) | def __init__(
method render (line 91) | def render(self) -> str:
method extra_repr (line 95) | def extra_repr(self) -> str:
method from_dict (line 99) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
class Artefact (line 104) | class Artefact(Element):
method __init__ (line 117) | def __init__(self, artefact_type: str, confidence: float, geometry: Bo...
method render (line 123) | def render(self) -> str:
method extra_repr (line 127) | def extra_repr(self) -> str:
method from_dict (line 131) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
class Line (line 136) | class Line(Element):
method __init__ (line 150) | def __init__(
method render (line 169) | def render(self) -> str:
method from_dict (line 174) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
class Block (line 182) | class Block(Element):
method __init__ (line 198) | def __init__(
method render (line 221) | def render(self, line_break: str = "\n") -> str:
method from_dict (line 226) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
class Page (line 235) | class Page(Element):
method __init__ (line 251) | def __init__(
method render (line 267) | def render(self, block_break: str = "\n\n") -> str:
method extra_repr (line 271) | def extra_repr(self) -> str:
method show (line 274) | def show(self, interactive: bool = True, preserve_aspect_ratio: bool =...
method synthesize (line 289) | def synthesize(self, **kwargs) -> np.ndarray:
method export_as_xml (line 300) | def export_as_xml(self, file_title: str = "OnnxTR - XML export (hOCR)"...
method from_dict (line 405) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
class Document (line 411) | class Document(Element):
method __init__ (line 421) | def __init__(
method render (line 427) | def render(self, page_break: str = "\n\n\n\n") -> str:
method show (line 431) | def show(self, **kwargs) -> None:
method synthesize (line 436) | def synthesize(self, **kwargs) -> list[np.ndarray]:
method export_as_xml (line 447) | def export_as_xml(self, **kwargs) -> list[tuple[bytes, ET.ElementTree]]:
method from_dict (line 459) | def from_dict(cls, save_dict: dict[str, Any], **kwargs):
FILE: onnxtr/io/html.py
function read_html (line 11) | def read_html(url: str, **kwargs: Any) -> bytes:
FILE: onnxtr/io/image.py
function read_img_as_numpy (line 16) | def read_img_as_numpy(
FILE: onnxtr/io/pdf.py
function read_pdf (line 16) | def read_pdf(
FILE: onnxtr/io/reader.py
class DocumentFile (line 21) | class DocumentFile:
method from_pdf (line 25) | def from_pdf(cls, file: AbstractFile, **kwargs) -> list[np.ndarray]:
method from_url (line 41) | def from_url(cls, url: str, **kwargs) -> list[np.ndarray]:
method from_images (line 63) | def from_images(cls, files: Sequence[AbstractFile] | AbstractFile, **k...
FILE: onnxtr/models/_utils.py
function get_max_width_length_ratio (line 18) | def get_max_width_length_ratio(contour: np.ndarray) -> float:
function estimate_orientation (line 33) | def estimate_orientation(
function rectify_crops (line 154) | def rectify_crops(
function rectify_loc_preds (line 173) | def rectify_loc_preds(
function get_language (line 193) | def get_language(text: str) -> tuple[str, float]:
FILE: onnxtr/models/builder.py
class DocumentBuilder (line 19) | class DocumentBuilder(NestedObject):
method __init__ (line 30) | def __init__(
method _sort_boxes (line 43) | def _sort_boxes(boxes: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
method _resolve_sub_lines (line 65) | def _resolve_sub_lines(self, boxes: np.ndarray, word_idcs: list[int]) ...
method _resolve_lines (line 103) | def _resolve_lines(self, boxes: np.ndarray) -> list[list[int]]:
method _resolve_blocks (line 149) | def _resolve_blocks(boxes: np.ndarray, lines: list[list[int]]) -> list...
method _build_blocks (line 214) | def _build_blocks(
method extra_repr (line 278) | def extra_repr(self) -> str:
method __call__ (line 285) | def __call__(
FILE: onnxtr/models/classification/models/mobilenet.py
class MobileNetV3 (line 41) | class MobileNetV3(Engine):
method __init__ (line 51) | def __init__(
method __call__ (line 62) | def __call__(
function _mobilenet_v3 (line 69) | def _mobilenet_v3(
function mobilenet_v3_small_crop_orientation (line 82) | def mobilenet_v3_small_crop_orientation(
function mobilenet_v3_small_page_orientation (line 110) | def mobilenet_v3_small_page_orientation(
FILE: onnxtr/models/classification/predictor/base.py
class OrientationPredictor (line 17) | class OrientationPredictor(NestedObject):
method __init__ (line 29) | def __init__(
method __call__ (line 37) | def __call__(
FILE: onnxtr/models/classification/zoo.py
function _orientation_predictor (line 19) | def _orientation_predictor(
function crop_orientation_predictor (line 52) | def crop_orientation_predictor(
function page_orientation_predictor (line 88) | def page_orientation_predictor(
FILE: onnxtr/models/detection/_utils/base.py
function _remove_padding (line 12) | def _remove_padding(
FILE: onnxtr/models/detection/core.py
class DetectionPostProcessor (line 15) | class DetectionPostProcessor(NestedObject):
method __init__ (line 24) | def __init__(self, box_thresh: float = 0.5, bin_thresh: float = 0.5, a...
method extra_repr (line 30) | def extra_repr(self) -> str:
method box_score (line 34) | def box_score(pred: np.ndarray, points: np.ndarray, assume_straight_pa...
method bitmap_to_boxes (line 60) | def bitmap_to_boxes(
method __call__ (line 67) | def __call__(
FILE: onnxtr/models/detection/models/differentiable_binarization.py
class DBNet (line 42) | class DBNet(Engine):
method __init__ (line 55) | def __init__(
method __call__ (line 74) | def __call__(
function _dbnet (line 93) | def _dbnet(
function db_resnet34 (line 106) | def db_resnet34(
function db_resnet50 (line 133) | def db_resnet50(
function db_mobilenet_v3_large (line 160) | def db_mobilenet_v3_large(
FILE: onnxtr/models/detection/models/fast.py
class FAST (line 40) | class FAST(Engine):
method __init__ (line 53) | def __init__(
method __call__ (line 72) | def __call__(
function _fast (line 91) | def _fast(
function fast_tiny (line 104) | def fast_tiny(
function fast_small (line 131) | def fast_small(
function fast_base (line 158) | def fast_base(
FILE: onnxtr/models/detection/models/linknet.py
class LinkNet (line 42) | class LinkNet(Engine):
method __init__ (line 55) | def __init__(
method __call__ (line 74) | def __call__(
function _linknet (line 93) | def _linknet(
function linknet_resnet18 (line 106) | def linknet_resnet18(
function linknet_resnet34 (line 133) | def linknet_resnet34(
function linknet_resnet50 (line 160) | def linknet_resnet50(
FILE: onnxtr/models/detection/postprocessor/base.py
class GeneralDetectionPostProcessor (line 20) | class GeneralDetectionPostProcessor(DetectionPostProcessor):
method __init__ (line 29) | def __init__(
method polygon_to_box (line 38) | def polygon_to_box(
method bitmap_to_boxes (line 83) | def bitmap_to_boxes(
FILE: onnxtr/models/detection/predictor/base.py
class DetectionPredictor (line 17) | class DetectionPredictor(NestedObject):
method __init__ (line 27) | def __init__(
method __call__ (line 35) | def __call__(
FILE: onnxtr/models/detection/zoo.py
function _predictor (line 28) | def _predictor(
function detection_predictor (line 60) | def detection_predictor(
FILE: onnxtr/models/engine.py
class EngineConfig (line 33) | class EngineConfig:
method __init__ (line 41) | def __init__(
method _init_providers (line 51) | def _init_providers(self) -> list[tuple[str, dict[str, Any]]]:
method _init_sess_opts (line 72) | def _init_sess_opts(self) -> SessionOptions:
method providers (line 82) | def providers(self) -> list[tuple[str, dict[str, Any]]] | list[str]:
method session_options (line 86) | def session_options(self) -> SessionOptions:
method __repr__ (line 89) | def __repr__(self) -> str:
class Engine (line 93) | class Engine:
method __init__ (line 102) | def __init__(self, url: str, engine_cfg: EngineConfig | None = None, *...
method run (line 120) | def run(self, inputs: np.ndarray) -> np.ndarray:
FILE: onnxtr/models/factory/hub.py
function login_to_hub (line 37) | def login_to_hub() -> None: # pragma: no cover
function _save_model_and_config_for_hf_hub (line 56) | def _save_model_and_config_for_hf_hub(model: Any, save_dir: str, arch: s...
function push_to_hf_hub (line 79) | def push_to_hf_hub(
function from_hub (line 185) | def from_hub(repo_id: str, engine_cfg: EngineConfig | None = None, **kwa...
FILE: onnxtr/models/predictor/base.py
class _OCRPredictor (line 24) | class _OCRPredictor:
method __init__ (line 45) | def __init__(
method _general_page_orientations (line 79) | def _general_page_orientations(
method _get_orientations (line 92) | def _get_orientations(
method _straighten_pages (line 102) | def _straighten_pages(
method _generate_crops (line 127) | def _generate_crops(
method _prepare_crops (line 147) | def _prepare_crops(
method _rectify_crops (line 166) | def _rectify_crops(
method _process_predictions (line 187) | def _process_predictions(
method add_hook (line 204) | def add_hook(self, hook: Callable) -> None:
method list_archs (line 212) | def list_archs(self) -> dict[str, list[str]]:
FILE: onnxtr/models/predictor/predictor.py
class OCRPredictor (line 23) | class OCRPredictor(NestedObject, _OCRPredictor):
method __init__ (line 44) | def __init__(
method __call__ (line 72) | def __call__(
FILE: onnxtr/models/preprocessor/base.py
class PreProcessor (line 19) | class PreProcessor(NestedObject):
method __init__ (line 32) | def __init__(
method batch_inputs (line 44) | def batch_inputs(self, samples: list[np.ndarray]) -> list[np.ndarray]:
method sample_transforms (line 61) | def sample_transforms(self, x: np.ndarray) -> np.ndarray:
method __call__ (line 77) | def __call__(self, x: np.ndarray | list[np.ndarray]) -> list[np.ndarray]:
FILE: onnxtr/models/recognition/core.py
class RecognitionPostProcessor (line 12) | class RecognitionPostProcessor(NestedObject):
method __init__ (line 19) | def __init__(
method extra_repr (line 26) | def extra_repr(self) -> str:
FILE: onnxtr/models/recognition/models/crnn.py
class CRNNPostProcessor (line 48) | class CRNNPostProcessor(RecognitionPostProcessor):
method __init__ (line 55) | def __init__(self, vocab):
method decode_sequence (line 58) | def decode_sequence(self, sequence, vocab):
method ctc_best_path (line 61) | def ctc_best_path(
method __call__ (line 89) | def __call__(self, logits):
class CRNN (line 104) | class CRNN(Engine):
method __init__ (line 117) | def __init__(
method __call__ (line 132) | def __call__(
function _crnn (line 149) | def _crnn(
function crnn_vgg16_bn (line 168) | def crnn_vgg16_bn(
function crnn_mobilenet_v3_small (line 195) | def crnn_mobilenet_v3_small(
function crnn_mobilenet_v3_large (line 222) | def crnn_mobilenet_v3_large(
FILE: onnxtr/models/recognition/models/master.py
class MASTER (line 32) | class MASTER(Engine):
method __init__ (line 43) | def __init__(
method __call__ (line 58) | def __call__(
class MASTERPostProcessor (line 83) | class MASTERPostProcessor(RecognitionPostProcessor):
method __init__ (line 90) | def __init__(
method __call__ (line 97) | def __call__(self, logits: np.ndarray) -> list[tuple[str, float]]:
function _master (line 112) | def _master(
function master (line 131) | def master(
FILE: onnxtr/models/recognition/models/parseq.py
class PARSeq (line 31) | class PARSeq(Engine):
method __init__ (line 42) | def __init__(
method __call__ (line 57) | def __call__(
class PARSeqPostProcessor (line 72) | class PARSeqPostProcessor(RecognitionPostProcessor):
method __init__ (line 79) | def __init__(
method __call__ (line 86) | def __call__(self, logits):
function _parseq (line 103) | def _parseq(
function parseq (line 123) | def parseq(
FILE: onnxtr/models/recognition/models/sar.py
class SAR (line 31) | class SAR(Engine):
method __init__ (line 42) | def __init__(
method __call__ (line 57) | def __call__(
class SARPostProcessor (line 73) | class SARPostProcessor(RecognitionPostProcessor):
method __init__ (line 80) | def __init__(
method __call__ (line 87) | def __call__(self, logits):
function _sar (line 102) | def _sar(
function sar_resnet31 (line 122) | def sar_resnet31(
FILE: onnxtr/models/recognition/models/viptr.py
class VIPTRPostProcessor (line 33) | class VIPTRPostProcessor(RecognitionPostProcessor):
method __init__ (line 40) | def __init__(self, vocab):
method decode_sequence (line 43) | def decode_sequence(self, sequence, vocab):
method ctc_best_path (line 46) | def ctc_best_path(
method __call__ (line 74) | def __call__(self, logits):
class VIPTR (line 89) | class VIPTR(Engine):
method __init__ (line 102) | def __init__(
method __call__ (line 117) | def __call__(
function _viptr (line 134) | def _viptr(
function viptr_tiny (line 155) | def viptr_tiny(
FILE: onnxtr/models/recognition/models/vitstr.py
class ViTSTR (line 39) | class ViTSTR(Engine):
method __init__ (line 50) | def __init__(
method __call__ (line 65) | def __call__(
class ViTSTRPostProcessor (line 81) | class ViTSTRPostProcessor(RecognitionPostProcessor):
method __init__ (line 88) | def __init__(
method __call__ (line 95) | def __call__(self, logits):
function _vitstr (line 112) | def _vitstr(
function vitstr_small (line 132) | def vitstr_small(
function vitstr_base (line 159) | def vitstr_base(
FILE: onnxtr/models/recognition/predictor/_utils.py
function split_crops (line 16) | def split_crops(
function _split_horizontally (line 73) | def _split_horizontally(
function remap_preds (line 119) | def remap_preds(
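`split_crops` above handles text crops whose aspect ratio exceeds what the recognizer accepts by slicing them into overlapping horizontal chunks. A simplified sketch of the boundary arithmetic (the `target_ratio` and `overlap_ratio` defaults here are illustrative, not OnnxTR's actual values):

```python
def split_boundaries(width, height, target_ratio=8.0, overlap_ratio=0.5):
    """Compute (start, end) pixel columns for overlapping horizontal chunks.

    Each chunk is at most target_ratio * height wide; consecutive chunks
    overlap by overlap_ratio of a chunk width so characters cut at a
    boundary appear whole in at least one chunk.
    """
    chunk_w = int(target_ratio * height)
    if width <= chunk_w:
        return [(0, width)]  # narrow enough, no split needed
    step = max(1, int(chunk_w * (1 - overlap_ratio)))
    bounds, start = [], 0
    while start + chunk_w < width:
        bounds.append((start, start + chunk_w))
        start += step
    bounds.append((max(width - chunk_w, 0), width))  # last chunk flush right
    return bounds
```

`remap_preds` then needs the inverse of this bookkeeping to map per-chunk predictions back to the original crop.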
FILE: onnxtr/models/recognition/predictor/base.py
class RecognitionPredictor (line 19) | class RecognitionPredictor(NestedObject):
method __init__ (line 28) | def __init__(
method __call__ (line 42) | def __call__(
FILE: onnxtr/models/recognition/utils.py
function merge_strings (line 12) | def merge_strings(a: str, b: str, overlap_ratio: float) -> str:
function merge_multi_strings (line 69) | def merge_multi_strings(seq_list: list[str], overlap_ratio: float, last_...
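Once the overlapping chunks are recognized independently, `merge_strings` / `merge_multi_strings` stitch the per-chunk texts back together. A rough sketch of suffix/prefix merging under the assumption of exact matches in the overlap region (the real implementation scores candidate overlaps via `overlap_ratio` rather than taking the longest exact match):

```python
def merge_strings(a, b, max_overlap):
    """Merge b onto a, dropping the longest suffix of a (up to max_overlap
    characters) that equals a prefix of b."""
    for k in range(min(max_overlap, len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    return a + b  # no overlap found, plain concatenation

def merge_multi_strings(parts, max_overlap):
    """Left-fold merge_strings over a list of chunk predictions."""
    out = parts[0]
    for nxt in parts[1:]:
        out = merge_strings(out, nxt, max_overlap)
    return out
```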
FILE: onnxtr/models/recognition/zoo.py
function _predictor (line 29) | def _predictor(
function recognition_predictor (line 61) | def recognition_predictor(
FILE: onnxtr/models/zoo.py
function _predictor (line 16) | def _predictor(
function ocr_predictor (line 66) | def ocr_predictor(
FILE: onnxtr/transforms/base.py
class Resize (line 15) | class Resize:
method __init__ (line 25) | def __init__(
method __call__ (line 41) | def __call__(self, img: np.ndarray) -> np.ndarray:
method __repr__ (line 88) | def __repr__(self) -> str:
class Normalize (line 96) | class Normalize:
method __init__ (line 104) | def __init__(
method __call__ (line 117) | def __call__(
method __repr__ (line 124) | def __repr__(self) -> str:
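The `Resize` transform listed above supports aspect-ratio-preserving resizing with optional symmetric padding. A sketch of the underlying shape arithmetic (padding layout is an assumption; the real transform also performs the interpolation itself):

```python
def resize_with_padding_geometry(in_h, in_w, out_h, out_w, symmetric=True):
    """Return (scaled_h, scaled_w, pad_top, pad_bottom, pad_left, pad_right)
    for fitting an (in_h, in_w) image into an (out_h, out_w) canvas while
    preserving aspect ratio."""
    scale = min(out_h / in_h, out_w / in_w)
    scaled_h, scaled_w = int(round(in_h * scale)), int(round(in_w * scale))
    pad_v, pad_h = out_h - scaled_h, out_w - scaled_w
    if symmetric:
        top, left = pad_v // 2, pad_h // 2  # centre the image in the canvas
    else:
        top, left = 0, 0  # pad only bottom/right
    return scaled_h, scaled_w, top, pad_v - top, left, pad_h - left
```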
FILE: onnxtr/utils/data.py
function _urlretrieve (line 26) | def _urlretrieve(url: str, filename: Path | str, chunk_size: int = 1024)...
function _check_integrity (line 37) | def _check_integrity(file_path: str | Path, hash_prefix: str) -> bool:
function download_from_url (line 44) | def download_from_url(
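`_check_integrity` above verifies a downloaded model file against a hash prefix before it is trusted from the cache. A stdlib sketch of that pattern, assuming a SHA-256 hex-prefix comparison (the exact hash scheme used by OnnxTR is an assumption here):

```python
import hashlib

def check_integrity(file_path, hash_prefix):
    """True if the file's SHA-256 hex digest starts with hash_prefix."""
    sha = hashlib.sha256()
    with open(file_path, "rb") as f:
        # stream in chunks so large ONNX files are never fully in memory
        for chunk in iter(lambda: f.read(8192), b""):
            sha.update(chunk)
    return sha.hexdigest().startswith(hash_prefix)
```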
FILE: onnxtr/utils/fonts.py
function get_font (line 14) | def get_font(font_family: str | None = None, font_size: int = 13) -> Ima...
FILE: onnxtr/utils/geometry.py
function bbox_to_polygon (line 33) | def bbox_to_polygon(bbox: BoundingBox) -> Polygon4P:
function polygon_to_bbox (line 45) | def polygon_to_bbox(polygon: Polygon4P) -> BoundingBox:
function order_points (line 58) | def order_points(pts: np.ndarray) -> np.ndarray:
function detach_scores (line 108) | def detach_scores(boxes: list[np.ndarray]) -> tuple[list[np.ndarray], li...
function shape_translate (line 128) | def shape_translate(data: np.ndarray, format: str) -> np.ndarray:
function resolve_enclosing_bbox (line 167) | def resolve_enclosing_bbox(bboxes: list[BoundingBox] | np.ndarray) -> Bo...
function resolve_enclosing_rbbox (line 189) | def resolve_enclosing_rbbox(rbboxes: list[np.ndarray], intermed_size: in...
function rotate_abs_points (line 210) | def rotate_abs_points(points: np.ndarray, angle: float = 0.0) -> np.ndar...
function compute_expanded_shape (line 227) | def compute_expanded_shape(img_shape: tuple[int, int], angle: float) -> ...
function rotate_abs_geoms (line 248) | def rotate_abs_geoms(
function remap_boxes (line 289) | def remap_boxes(loc_preds: np.ndarray, orig_shape: tuple[int, int], dest...
function rotate_boxes (line 315) | def rotate_boxes(
function rotate_image (line 372) | def rotate_image(
function remove_image_padding (line 421) | def remove_image_padding(image: np.ndarray) -> np.ndarray:
function estimate_page_angle (line 439) | def estimate_page_angle(polys: np.ndarray) -> float:
function convert_to_relative_coords (line 457) | def convert_to_relative_coords(geoms: np.ndarray, img_shape: tuple[int, ...
function extract_crops (line 482) | def extract_crops(img: np.ndarray, boxes: np.ndarray, channels_last: boo...
function extract_rcrops (line 514) | def extract_rcrops(
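The geometry helpers above include converters between the two box representations. A minimal sketch of `bbox_to_polygon` / `polygon_to_bbox`, assuming a bbox is `((xmin, ymin), (xmax, ymax))` and a polygon is four corner points (the exact tuple layout in `onnxtr.utils.common_types` may differ):

```python
def bbox_to_polygon(bbox):
    """Expand an axis-aligned bbox into four corners, clockwise from top-left."""
    (xmin, ymin), (xmax, ymax) = bbox
    return ((xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax))

def polygon_to_bbox(polygon):
    """Collapse an arbitrary quadrilateral to its axis-aligned bounding box."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return ((min(xs), min(ys)), (max(xs), max(ys)))
```

The two are inverses only for axis-aligned input; for rotated polygons, `polygon_to_bbox` is lossy, which is why rotated pipelines keep the full 4-point geometry.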
FILE: onnxtr/utils/multithreading.py
function multithread_exec (line 18) | def multithread_exec(func: Callable[[Any], Any], seq: Iterable[Any], thr...
FILE: onnxtr/utils/reconstitution.py
function _warn_rotation (line 21) | def _warn_rotation(entry: dict[str, Any]) -> None: # pragma: no cover
function _synthesize (line 28) | def _synthesize(
function synthesize_page (line 113) | def synthesize_page(
FILE: onnxtr/utils/repr.py
function _addindent (line 12) | def _addindent(s_, num_spaces):
class NestedObject (line 24) | class NestedObject:
method extra_repr (line 29) | def extra_repr(self) -> str:
method __repr__ (line 32) | def __repr__(self):
FILE: onnxtr/utils/visualization.py
function rect_patch (line 20) | def rect_patch(
function polygon_patch (line 69) | def polygon_patch(
function create_obj_patch (line 112) | def create_obj_patch(
function visualize_page (line 137) | def visualize_page(
function draw_boxes (line 261) | def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: tuple[int, i...
FILE: scripts/convert_to_float16.py
function _load_model (line 34) | def _load_model(arch: str, model_path: str | None = None) -> Any:
function _latency_check (line 46) | def _latency_check(args: Any, size: tuple[int], model: Any, img_tensor: ...
function _validate (line 64) | def _validate(fp32_in: list[np.ndarray], fp16_in: list[np.ndarray]) -> b...
function main (line 75) | def main(args):
FILE: scripts/evaluate.py
function _pct (line 27) | def _pct(val):
function main (line 31) | def main(args):
function parse_args (line 233) | def parse_args():
FILE: scripts/latency.py
function main (line 17) | def main(args):
FILE: scripts/quantize.py
class TaskShapes (line 15) | class TaskShapes(Enum):
class CalibrationDataLoader (line 24) | class CalibrationDataLoader(CalibrationDataReader):
method __init__ (line 25) | def __init__(self, calibration_image_folder: str, model_path: str, tas...
method get_next (line 39) | def get_next(self):
method rewind (line 46) | def rewind(self):
function benchmark (line 50) | def benchmark(calibration_image_folder: str, model_path: str, task_shape...
function benchmark_mean_diff (line 71) | def benchmark_mean_diff(
function main (line 90) | def main(args):
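`scripts/quantize.py` feeds calibration images to onnxruntime's static quantizer through a `CalibrationDataReader` subclass, whose contract is simply a rewindable iterator: `get_next()` returns one input-name-to-array feed dict per call and `None` when exhausted, and `rewind()` restarts iteration. A dependency-free sketch of that contract (batch contents are placeholders; the real loader preprocesses images to the task's input shape):

```python
class CalibrationBatches:
    """Minimal rewindable reader following the CalibrationDataReader contract."""

    def __init__(self, input_name, batches):
        self.input_name = input_name
        self.batches = list(batches)
        self._pos = 0

    def get_next(self):
        if self._pos >= len(self.batches):
            return None  # tells the quantizer calibration data is exhausted
        feed = {self.input_name: self.batches[self._pos]}
        self._pos += 1
        return feed

    def rewind(self):
        self._pos = 0  # allow a second calibration pass over the same data
```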
FILE: tests/common/test_contrib.py
function test_base_predictor (line 9) | def test_base_predictor():
function test_artefact_detector (line 22) | def test_artefact_detector(mock_artefact_image_stream):
FILE: tests/common/test_core.py
function test_version (line 7) | def test_version():
function test_requires_package (line 11) | def test_requires_package():
FILE: tests/common/test_engine_cfg.py
function _get_rss_mb (line 14) | def _get_rss_mb():
function _test_predictor (line 20) | def _test_predictor(predictor):
function test_engine_cfg (line 41) | def test_engine_cfg(det_arch, reco_arch):
function test_cpu_memory_arena_shrinkage_enabled (line 86) | def test_cpu_memory_arena_shrinkage_enabled():
FILE: tests/common/test_headers.py
function test_copyright_header (line 7) | def test_copyright_header():
FILE: tests/common/test_io.py
function _check_doc_content (line 11) | def _check_doc_content(doc_tensors, num_pages):
function test_read_pdf (line 18) | def test_read_pdf(mock_pdf):
function test_read_img_as_numpy (line 39) | def test_read_img_as_numpy(tmpdir_factory, mock_pdf):
function test_read_html (line 80) | def test_read_html():
function test_document_file (line 86) | def test_document_file(mock_pdf, mock_artefact_image_stream):
function test_pdf (line 94) | def test_pdf(mock_pdf):
FILE: tests/common/test_io_elements.py
function _mock_words (line 9) | def _mock_words(size=(1.0, 1.0), offset=(0, 0), confidence=0.9, objectne...
function _mock_artefacts (line 57) | def _mock_artefacts(size=(1, 1), offset=(0, 0), confidence=0.8):
function _mock_lines (line 71) | def _mock_lines(size=(1, 1), offset=(0, 0), polygons=False):
function _mock_blocks (line 81) | def _mock_blocks(size=(1, 1), offset=(0, 0), polygons=False):
function _mock_pages (line 97) | def _mock_pages(block_size=(1, 1), block_offset=(0, 0), polygons=False):
function test_element (line 118) | def test_element():
function test_word (line 123) | def test_word():
function test_line (line 165) | def test_line():
function test_artefact (line 212) | def test_artefact():
function test_block (line 233) | def test_block():
function test_page (line 260) | def test_page():
function test_document (line 309) | def test_document():
FILE: tests/common/test_models.py
function mock_image (line 14) | def mock_image(tmpdir_factory):
function mock_bitmap (line 25) | def mock_bitmap(mock_image):
function test_estimate_orientation (line 31) | def test_estimate_orientation(mock_image, mock_bitmap, mock_tilted_paysl...
function test_get_lang (line 80) | def test_get_lang():
FILE: tests/common/test_models_builder.py
function test_documentbuilder (line 10) | def test_documentbuilder():
function test_sort_boxes (line 108) | def test_sort_boxes(input_boxes, sorted_idxs):
function test_resolve_lines (line 131) | def test_resolve_lines(input_boxes, lines):
FILE: tests/common/test_models_classification.py
function test_classification_models (line 17) | def test_classification_models(arch_name, input_shape):
function test_classification_zoo (line 34) | def test_classification_zoo(arch_name):
function test_crop_orientation_model (line 65) | def test_crop_orientation_model(mock_text_box, quantized):
function test_page_orientation_model (line 98) | def test_page_orientation_model(mock_payslip, quantized):
FILE: tests/common/test_models_detection.py
function test_postprocessor (line 10) | def test_postprocessor():
function test_detection_models (line 81) | def test_detection_models(arch_name, input_shape, output_size, out_prob,...
function test_detection_zoo (line 117) | def test_detection_zoo(arch_name, quantized):
FILE: tests/common/test_models_detection_utils.py
function test_remove_padding (line 11) | def test_remove_padding(pages, preserve_aspect_ratio, symmetric_pad, ass...
FILE: tests/common/test_models_factory.py
function test_push_to_hf_hub (line 17) | def test_push_to_hf_hub():
function test_models_huggingface_hub (line 30) | def test_models_huggingface_hub(tmpdir):
FILE: tests/common/test_models_preprocessor.py
function test_preprocessor (line 16) | def test_preprocessor(batch_size, output_size, input_tensor, expected_ba...
FILE: tests/common/test_models_recognition.py
function test_recognition_postprocessor (line 12) | def test_recognition_postprocessor():
function test_split_crops (line 31) | def test_split_crops(crops, max_ratio, target_ratio, target_overlap_rati...
function test_remap_preds (line 47) | def test_remap_preds(preds, crop_map, split_overlap_ratio, pred):
function test_split_crops_cases (line 83) | def test_split_crops_cases(
function test_invalid_split_overlap_ratio (line 124) | def test_invalid_split_overlap_ratio(split_overlap_ratio):
function test_recognition_models (line 149) | def test_recognition_models(arch_name, input_shape, quantized):
function test_recognition_zoo (line 213) | def test_recognition_zoo(arch_name, input_shape, quantized):
FILE: tests/common/test_models_recognition_utils.py
function test_merge_strings (line 46) | def test_merge_strings(a, b, overlap_ratio, merged):
function test_merge_multi_strings (line 62) | def test_merge_multi_strings(seq_list, overlap_ratio, last_overlap_ratio...
FILE: tests/common/test_models_zoo.py
class _DummyCallback (line 22) | class _DummyCallback:
method __call__ (line 23) | def __call__(self, loc_preds):
function test_ocrpredictor (line 37) | def test_ocrpredictor(
function test_trained_ocr_predictor (line 121) | def test_trained_ocr_predictor(mock_payslip):
function _test_predictor (line 185) | def _test_predictor(predictor):
function test_zoo_models (line 207) | def test_zoo_models(det_arch, reco_arch, quantized):
FILE: tests/common/test_transforms.py
function test_resize (line 7) | def test_resize():
function test_normalize (line 58) | def test_normalize(input_shape):
FILE: tests/common/test_utils_data.py
function test__urlretrieve (line 11) | def test__urlretrieve():
function test_download_from_url (line 24) | def test_download_from_url(mkdir_mock, urlretrieve_mock):
function test_download_from_url_customizing_cache_dir (line 32) | def test_download_from_url_customizing_cache_dir(mkdir_mock, urlretrieve...
function test_download_from_url_error_creating_directory (line 40) | def test_download_from_url_error_creating_directory(logging_mock, mkdir_...
function test_download_from_url_error_creating_directory_with_env_var (line 52) | def test_download_from_url_error_creating_directory_with_env_var(logging...
FILE: tests/common/test_utils_fonts.py
function test_get_font (line 6) | def test_get_font():
FILE: tests/common/test_utils_geometry.py
function test_bbox_to_polygon (line 11) | def test_bbox_to_polygon():
function test_polygon_to_bbox (line 15) | def test_polygon_to_bbox():
function test_order_points (line 19) | def test_order_points():
function test_detach_scores (line 60) | def test_detach_scores():
function test_resolve_enclosing_bbox (line 81) | def test_resolve_enclosing_bbox():
function test_resolve_enclosing_rbbox (line 87) | def test_resolve_enclosing_rbbox():
function test_remap_boxes (line 97) | def test_remap_boxes():
function test_rotate_boxes (line 141) | def test_rotate_boxes():
function sample_geoms (line 164) | def sample_geoms():
function test_rotate_abs_geoms (line 179) | def test_rotate_abs_geoms(sample_geoms):
function test_rotate_image (line 188) | def test_rotate_image():
function test_remove_image_padding (line 211) | def test_remove_image_padding():
function test_convert_to_relative_coords (line 243) | def test_convert_to_relative_coords(abs_geoms, img_size, rel_geoms):
function test_estimate_page_angle (line 251) | def test_estimate_page_angle():
function test_extract_crops (line 266) | def test_extract_crops(mock_pdf):
function test_extract_rcrops (line 315) | def test_extract_rcrops(mock_pdf, assume_horizontal):
function test_shape_translate (line 363) | def test_shape_translate(format, input_shape, expected_shape):
FILE: tests/common/test_utils_multithreading.py
function test_multithread_exec (line 22) | def test_multithread_exec(input_seq, func, output_seq):
function test_multithread_exec_multiprocessing_disable (line 28) | def test_multithread_exec_multiprocessing_disable():
FILE: tests/common/test_utils_reconstitution.py
function test_synthesize_page (line 7) | def test_synthesize_page():
FILE: tests/common/test_utils_visualization.py
function test_visualize_page (line 8) | def test_visualize_page():
function test_draw_boxes (line 35) | def test_draw_boxes():
FILE: tests/common/test_utils_vocabs.py
function test_vocabs_duplicates (line 6) | def test_vocabs_duplicates():
FILE: tests/conftest.py
function synthesize_text_img (line 13) | def synthesize_text_img(
function mock_vocab (line 41) | def mock_vocab():
function mock_pdf (line 46) | def mock_pdf(tmpdir_factory):
function mock_payslip (line 65) | def mock_payslip(tmpdir_factory):
function mock_tilted_payslip (line 76) | def mock_tilted_payslip(mock_payslip, tmpdir_factory):
function mock_text_box_stream (line 85) | def mock_text_box_stream():
function mock_text_box (line 91) | def mock_text_box(mock_text_box_stream, tmpdir_factory):
function mock_artefact_image_stream (line 100) | def mock_artefact_image_stream():
Condensed preview — 126 files, each entry listing its path, character count, and a content snippet.
[
{
"path": ".conda/meta.yaml",
"chars": 1223,
"preview": "{% set pyproject = load_file_data('../pyproject.toml', from_recipe_dir=True) %}\n{% set project = pyproject.get('project'"
},
{
"path": ".github/CODEOWNERS",
"chars": 24,
"preview": "* @felixdittrich92"
},
{
"path": ".github/FUNDING.yml",
"chars": 859,
"preview": "# These are supported funding model platforms\n\ngithub: felixdittrich92\npatreon: # Replace with a single Patreon username"
},
{
"path": ".github/ISSUE_TEMPLATE/bug_report.yml",
"chars": 1730,
"preview": "name: 🐛 Bug report\ndescription: Create a report to help us improve the library\nlabels: 'type: bug'\n\nbody:\n- type: markdo"
},
{
"path": ".github/ISSUE_TEMPLATE/config.yml",
"chars": 203,
"preview": "blank_issues_enabled: true\ncontact_links:\n - name: Usage questions\n url: https://github.com/felixdittrich92/OnnxTR/d"
},
{
"path": ".github/ISSUE_TEMPLATE/feature_request.yml",
"chars": 696,
"preview": "name: 🚀 Feature request\ndescription: >\n Submit a proposal/request for a new feature for OnnxTR. Please search for exist"
},
{
"path": ".github/dependabot.yml",
"chars": 481,
"preview": "version: 2\nupdates:\n - package-ecosystem: \"pip\"\n directory: \"/\"\n open-pull-requests-limit: 10\n target-branch: "
},
{
"path": ".github/release.yml",
"chars": 483,
"preview": "changelog:\n exclude:\n labels:\n - ignore-for-release\n categories:\n - title: Breaking Changes 🛠\n labels:"
},
{
"path": ".github/workflows/builds.yml",
"chars": 2016,
"preview": "name: builds\n\non:\n push:\n branches: main\n pull_request:\n branches: main\n schedule:\n # Runs every 2 weeks on "
},
{
"path": ".github/workflows/clear_caches.yml",
"chars": 301,
"preview": "name: Clear GitHub runner caches\n\non:\n workflow_dispatch:\n schedule:\n - cron: '0 0 * * *' # Runs once a day\n\njobs:"
},
{
"path": ".github/workflows/demo.yml",
"chars": 2472,
"preview": "name: Sync Hugging Face demo\n\non:\n # Run 'test-demo' on every pull request to the main branch\n pull_request:\n branc"
},
{
"path": ".github/workflows/docker.yml",
"chars": 4301,
"preview": "# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages\n#"
},
{
"path": ".github/workflows/main.yml",
"chars": 1756,
"preview": "name: tests\n\non:\n push:\n branches: main\n pull_request:\n branches: main\n schedule:\n # Runs every 2 weeks on M"
},
{
"path": ".github/workflows/publish.yml",
"chars": 3688,
"preview": "name: publish\n\non:\n release:\n types: [published]\n\njobs:\n pypi:\n if: \"!github.event.release.prerelease\"\n strat"
},
{
"path": ".github/workflows/style.yml",
"chars": 1314,
"preview": "name: style\n\non:\n push:\n branches: main\n pull_request:\n branches: main\n\njobs:\n ruff:\n runs-on: ${{ matrix.os"
},
{
"path": ".gitignore",
"chars": 1958,
"preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
},
{
"path": ".pre-commit-config.yaml",
"chars": 611,
"preview": "repos:\n - repo: https://github.com/pre-commit/pre-commit-hooks\n rev: v6.0.0\n hooks:\n - id: check-ast\n -"
},
{
"path": "CODE_OF_CONDUCT.md",
"chars": 5220,
"preview": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nWe as members, contributors, and leaders pledge to make participa"
},
{
"path": "Dockerfile",
"chars": 1304,
"preview": "ARG BASE_IMAGE\n\nFROM ${BASE_IMAGE}\n\nENV DEBIAN_FRONTEND=noninteractive\nENV LANG=C.UTF-8\nENV PYTHONUNBUFFERED=1\nENV PYTHO"
},
{
"path": "LICENSE",
"chars": 11357,
"preview": " Apache License\n Version 2.0, January 2004\n "
},
{
"path": "Makefile",
"chars": 534,
"preview": ".PHONY: quality style test docs-single-version docs\n# this target runs checks on all files\nquality:\n\truff check .\n\tmypy"
},
{
"path": "README.md",
"chars": 18954,
"preview": "<p align=\"center\">\n <img src=\"https://github.com/felixdittrich92/OnnxTR/raw/main/docs/images/logo.jpg\" width=\"40%\">\n</p"
},
{
"path": "demo/README.md",
"chars": 340,
"preview": "---\ntitle: OnnxTR OCR\nemoji: 🔥\ncolorFrom: red\ncolorTo: purple\nsdk: gradio\nsdk_version: 5.34.2\napp_file: app.py\npinned: f"
},
{
"path": "demo/app.py",
"chars": 11378,
"preview": "import io\nimport os\nfrom typing import Any\n\n# NOTE: This is a fix to run the demo on the HuggingFace Zero GPU or CPU spa"
},
{
"path": "demo/packages.txt",
"chars": 34,
"preview": "python3-opencv\nfonts-freefont-ttf\n"
},
{
"path": "demo/requirements.txt",
"chars": 227,
"preview": "-e \"onnxtr[gpu-headless,viz] @ git+https://github.com/felixdittrich92/OnnxTR.git\"\ngradio>=5.30.0,<7.0.0\nspaces>=0.37.0\n\n"
},
{
"path": "onnxtr/__init__.py",
"chars": 100,
"preview": "from . import io, models, contrib, transforms, utils\nfrom .version import __version__ # noqa: F401\n"
},
{
"path": "onnxtr/contrib/__init__.py",
"chars": 39,
"preview": "from .artefacts import ArtefactDetector"
},
{
"path": "onnxtr/contrib/artefacts.py",
"chars": 5323,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/contrib/base.py",
"chars": 3135,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/file_utils.py",
"chars": 1045,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/io/__init__.py",
"chars": 106,
"preview": "from .elements import *\nfrom .html import *\nfrom .image import *\nfrom .pdf import *\nfrom .reader import *\n"
},
{
"path": "onnxtr/io/elements.py",
"chars": 17751,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/io/html.py",
"chars": 716,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/io/image.py",
"chars": 1700,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/io/pdf.py",
"chars": 1327,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/io/reader.py",
"chars": 2755,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/__init__.py",
"chars": 157,
"preview": "from .engine import EngineConfig\nfrom .classification import *\nfrom .detection import *\nfrom .recognition import *\nfrom "
},
{
"path": "onnxtr/models/_utils.py",
"chars": 7550,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/builder.py",
"chars": 13998,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/classification/__init__.py",
"chars": 41,
"preview": "from .models import *\nfrom .zoo import *\n"
},
{
"path": "onnxtr/models/classification/models/__init__.py",
"chars": 24,
"preview": "from .mobilenet import *"
},
{
"path": "onnxtr/models/classification/models/mobilenet.py",
"chars": 4818,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/classification/predictor/__init__.py",
"chars": 20,
"preview": "from .base import *\n"
},
{
"path": "onnxtr/models/classification/predictor/base.py",
"chars": 2313,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/classification/zoo.py",
"chars": 4271,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/detection/__init__.py",
"chars": 41,
"preview": "from .models import *\nfrom .zoo import *\n"
},
{
"path": "onnxtr/models/detection/_utils/__init__.py",
"chars": 20,
"preview": "from . base import *"
},
{
"path": "onnxtr/models/detection/_utils/base.py",
"chars": 2300,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/detection/core.py",
"chars": 3466,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/detection/models/__init__.py",
"chars": 85,
"preview": "from .fast import *\nfrom .differentiable_binarization import *\nfrom .linknet import *"
},
{
"path": "onnxtr/models/detection/models/differentiable_binarization.py",
"chars": 6630,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/detection/models/fast.py",
"chars": 6188,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/detection/models/linknet.py",
"chars": 6666,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/detection/postprocessor/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "onnxtr/models/detection/postprocessor/base.py",
"chars": 5689,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/detection/predictor/__init__.py",
"chars": 20,
"preview": "from .base import *\n"
},
{
"path": "onnxtr/models/detection/predictor/base.py",
"chars": 2293,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/detection/zoo.py",
"chars": 3385,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/engine.py",
"chars": 5880,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/factory/__init__.py",
"chars": 19,
"preview": "from .hub import *\n"
},
{
"path": "onnxtr/models/factory/hub.py",
"chars": 7119,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/predictor/__init__.py",
"chars": 25,
"preview": "from .predictor import *\n"
},
{
"path": "onnxtr/models/predictor/base.py",
"chars": 9432,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/predictor/predictor.py",
"chars": 6351,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/preprocessor/__init__.py",
"chars": 20,
"preview": "from .base import *\n"
},
{
"path": "onnxtr/models/preprocessor/base.py",
"chars": 3963,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/__init__.py",
"chars": 41,
"preview": "from .models import *\nfrom .zoo import *\n"
},
{
"path": "onnxtr/models/recognition/core.py",
"chars": 730,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/models/__init__.py",
"chars": 126,
"preview": "from .crnn import *\nfrom .sar import *\nfrom .master import *\nfrom .vitstr import *\nfrom .parseq import *\nfrom .viptr imp"
},
{
"path": "onnxtr/models/recognition/models/crnn.py",
"chars": 8779,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/models/master.py",
"chars": 4669,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/models/parseq.py",
"chars": 4512,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/models/sar.py",
"chars": 4523,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/models/viptr.py",
"chars": 5656,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/models/vitstr.py",
"chars": 5964,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/predictor/__init__.py",
"chars": 20,
"preview": "from .base import *\n"
},
{
"path": "onnxtr/models/recognition/predictor/_utils.py",
"chars": 5182,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/predictor/base.py",
"chars": 2503,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/utils.py",
"chars": 3756,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/recognition/zoo.py",
"chars": 3018,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/models/zoo.py",
"chars": 5303,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/py.typed",
"chars": 0,
"preview": ""
},
{
"path": "onnxtr/transforms/__init__.py",
"chars": 20,
"preview": "from .base import *\n"
},
{
"path": "onnxtr/transforms/base.py",
"chars": 4120,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/utils/__init__.py",
"chars": 94,
"preview": "from .common_types import *\nfrom .data import *\nfrom .geometry import *\nfrom .vocabs import *\n"
},
{
"path": "onnxtr/utils/common_types.py",
"chars": 551,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/utils/data.py",
"chars": 4248,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/utils/fonts.py",
"chars": 1282,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/utils/geometry.py",
"chars": 22002,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/utils/multithreading.py",
"chars": 1994,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/utils/reconstitution.py",
"chars": 6119,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/utils/repr.py",
"chars": 2105,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/utils/visualization.py",
"chars": 9831,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "onnxtr/utils/vocabs.py",
"chars": 52988,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "pyproject.toml",
"chars": 4731,
"preview": "[build-system]\nrequires = [\"setuptools\", \"wheel\"]\nbuild-backend = \"setuptools.build_meta\"\n\n[project]\nname = \"onnxtr\"\ndes"
},
{
"path": "scripts/convert_to_float16.py",
"chars": 4423,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "scripts/evaluate.py",
"chars": 11418,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "scripts/latency.py",
"chars": 2103,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "scripts/quantize.py",
"chars": 7313,
"preview": "import argparse\nimport os\nimport time\nfrom enum import Enum\n\nimport numpy as np\nimport onnxruntime\nfrom onnxruntime.quan"
},
{
"path": "setup.py",
"chars": 682,
"preview": "# Copyright (C) 2021-2026, Mindee | Felix Dittrich.\n\n# This program is licensed under the Apache License 2.0.\n# See LICE"
},
{
"path": "tests/common/test_contrib.py",
"chars": 1707,
"preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.contrib import artefacts\nfrom onnxtr.contrib.base import _BasePredictor\nfr"
},
{
"path": "tests/common/test_core.py",
"chars": 327,
"preview": "import pytest\n\nimport onnxtr\nfrom onnxtr.file_utils import requires_package\n\n\ndef test_version():\n assert len(onnxtr."
},
{
"path": "tests/common/test_engine_cfg.py",
"chars": 4812,
"preview": "import gc\n\nimport numpy as np\nimport psutil\nimport pytest\nfrom onnxruntime import RunOptions, SessionOptions\n\nfrom onnxt"
},
{
"path": "tests/common/test_headers.py",
"chars": 1018,
"preview": "\"\"\"Test for python files copyright headers.\"\"\"\n\nfrom datetime import datetime\nfrom pathlib import Path\n\n\ndef test_copyri"
},
{
"path": "tests/common/test_io.py",
"chars": 2730,
"preview": "from io import BytesIO\nfrom pathlib import Path\n\nimport numpy as np\nimport pytest\nimport requests\n\nfrom onnxtr import io"
},
{
"path": "tests/common/test_io_elements.py",
"chars": 11232,
"preview": "from xml.etree.ElementTree import ElementTree\n\nimport numpy as np\nimport pytest\n\nfrom onnxtr.io import elements\n\n\ndef _m"
},
{
"path": "tests/common/test_models.py",
"chars": 3136,
"preview": "from io import BytesIO\n\nimport cv2\nimport numpy as np\nimport pytest\nimport requests\n\nfrom onnxtr.io import reader\nfrom o"
},
{
"path": "tests/common/test_models_builder.py",
"chars": 5267,
"preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.io import Document\nfrom onnxtr.models import builder\n\nwords_per_page = 10\n"
},
{
"path": "tests/common/test_models_classification.py",
"chars": 5209,
"preview": "import cv2\nimport numpy as np\nimport pytest\n\nfrom onnxtr.models import classification, detection\nfrom onnxtr.models.clas"
},
{
"path": "tests/common/test_models_detection.py",
"chars": 5308,
"preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.models import detection\nfrom onnxtr.models.detection.postprocessor.base im"
},
{
"path": "tests/common/test_models_detection_utils.py",
"chars": 2068,
"preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.models.detection._utils import _remove_padding\n\n\n@pytest.mark.parametrize("
},
{
"path": "tests/common/test_models_factory.py",
"chars": 2074,
"preview": "import json\nimport os\nimport tempfile\n\nimport pytest\n\nfrom onnxtr import models\nfrom onnxtr.models.factory import _save_"
},
{
"path": "tests/common/test_models_preprocessor.py",
"chars": 1667,
"preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.models.preprocessor import PreProcessor\n\n\n@pytest.mark.parametrize(\n \"b"
},
{
"path": "tests/common/test_models_recognition.py",
"chars": 8293,
"preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.models import recognition\nfrom onnxtr.models.engine import Engine\nfrom onn"
},
{
"path": "tests/common/test_models_recognition_utils.py",
"chars": 2805,
"preview": "import pytest\n\nfrom onnxtr.models.recognition.utils import merge_multi_strings, merge_strings\n\n\n@pytest.mark.parametrize"
},
{
"path": "tests/common/test_models_zoo.py",
"chars": 8091,
"preview": "import numpy as np\nimport pytest\n\nfrom onnxtr import models\nfrom onnxtr.io import Document, DocumentFile\nfrom onnxtr.mod"
},
{
"path": "tests/common/test_transforms.py",
"chars": 2048,
"preview": "import numpy as np\nimport pytest\n\nfrom onnxtr.transforms import Normalize, Resize\n\n\ndef test_resize():\n output_size ="
},
{
"path": "tests/common/test_utils_data.py",
"chars": 2179,
"preview": "import os\nimport tempfile\nfrom pathlib import PosixPath\nfrom unittest.mock import patch\n\nimport pytest\n\nfrom onnxtr.util"
},
{
"path": "tests/common/test_utils_fonts.py",
"chars": 235,
"preview": "from PIL.ImageFont import FreeTypeFont, ImageFont\n\nfrom onnxtr.utils.fonts import get_font\n\n\ndef test_get_font():\n # "
},
{
"path": "tests/common/test_utils_geometry.py",
"chars": 13055,
"preview": "from copy import deepcopy\nfrom math import hypot\n\nimport numpy as np\nimport pytest\n\nfrom onnxtr.io import DocumentFile\nf"
},
{
"path": "tests/common/test_utils_multithreading.py",
"chars": 970,
"preview": "import os\nfrom multiprocessing.pool import ThreadPool\nfrom unittest.mock import patch\n\nimport pytest\n\nfrom onnxtr.utils."
},
{
"path": "tests/common/test_utils_reconstitution.py",
"chars": 1333,
"preview": "import numpy as np\nfrom test_io_elements import _mock_pages\n\nfrom onnxtr.utils import reconstitution\n\n\ndef test_synthesi"
},
{
"path": "tests/common/test_utils_visualization.py",
"chars": 1566,
"preview": "import numpy as np\nimport pytest\nfrom test_io_elements import _mock_pages\n\nfrom onnxtr.utils import visualization\n\n\ndef "
},
{
"path": "tests/common/test_utils_vocabs.py",
"chars": 341,
"preview": "from collections import Counter\n\nfrom onnxtr.utils import VOCABS\n\n\ndef test_vocabs_duplicates():\n for key, vocab in V"
},
{
"path": "tests/conftest.py",
"chars": 3463,
"preview": "from io import BytesIO\n\nimport cv2\nimport pytest\nimport requests\nfrom PIL import Image, ImageDraw\n\nfrom onnxtr.io import"
}
]
About this extraction
This page contains the full source code of the felixdittrich92/OnnxTR GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 126 files (480.5 KB), approximately 180.5k tokens, and a symbol index with 387 extracted functions, classes, methods, constants, and types.
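The file manifest above is plain JSON (an array of objects with `path`, `chars`, and `preview` fields), so it can be consumed programmatically, e.g. to find the largest files before feeding the extraction to a token-limited model. A minimal sketch; it embeds a two-entry sample rather than the full listing:

```python
import json

# Sample manifest in the same shape as the listing above
# (two real entries, "preview" omitted for brevity).
manifest_json = """
[
  {"path": "onnxtr/utils/vocabs.py", "chars": 52988},
  {"path": "setup.py", "chars": 682}
]
"""

files = json.loads(manifest_json)

# Aggregate size and locate the largest file by character count.
total_chars = sum(entry["chars"] for entry in files)
largest = max(files, key=lambda e: e["chars"])

print(f"{len(files)} files, {total_chars} chars; largest: {largest['path']}")
```

The same pattern works on the full manifest: load the array, then sort or filter by `chars` to decide which files to include in a prompt.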
Extracted by GitExtract. Built by Nikandr Surkov.