Repository: datalab-to/surya
Branch: master
Commit: e735028979a2
Files: 136
Total size: 740.2 KB
Directory structure:
gitextract_x32e43uo/
├── .github/
│ ├── ISSUE_TEMPLATE/
│ │ ├── breaking-bug-report.md
│ │ ├── feature_request.md
│ │ └── output-bug-report.md
│ └── workflows/
│ ├── benchmarks.yml
│ ├── ci.yml
│ ├── cla.yml
│ ├── publish.yml
│ └── scripts.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CITATION.cff
├── CLA.md
├── LICENSE
├── MODEL_LICENSE
├── README.md
├── benchmark/
│ ├── detection.py
│ ├── layout.py
│ ├── ordering.py
│ ├── recognition.py
│ ├── table_recognition.py
│ ├── texify.py
│ └── utils/
│ ├── __init__.py
│ ├── bbox.py
│ ├── metrics.py
│ ├── scoring.py
│ ├── tatr.py
│ ├── tesseract.py
│ ├── textract.py
│ └── verify_benchmark_scores.py
├── detect_layout.py
├── detect_text.py
├── ocr_app.py
├── ocr_latex.py
├── ocr_text.py
├── pyproject.toml
├── pytest.ini
├── signatures/
│ └── version1/
│ └── cla.json
├── static/
│ └── fonts/
│ └── .gitignore
├── surya/
│ ├── __init__.py
│ ├── common/
│ │ ├── __init__.py
│ │ ├── adetr/
│ │ │ └── decoder.py
│ │ ├── donut/
│ │ │ ├── encoder.py
│ │ │ └── processor.py
│ │ ├── load.py
│ │ ├── polygon.py
│ │ ├── predictor.py
│ │ ├── pretrained.py
│ │ ├── s3.py
│ │ ├── surya/
│ │ │ ├── __init__.py
│ │ │ ├── config.py
│ │ │ ├── decoder/
│ │ │ │ ├── __init__.py
│ │ │ │ └── config.py
│ │ │ ├── embedder/
│ │ │ │ └── __init__.py
│ │ │ ├── encoder/
│ │ │ │ ├── __init__.py
│ │ │ │ └── config.py
│ │ │ ├── flash_attn_utils.py
│ │ │ ├── processor/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── schema.py
│ │ │ │ └── tokenizer.py
│ │ │ └── schema.py
│ │ ├── util.py
│ │ └── xla.py
│ ├── debug/
│ │ ├── draw.py
│ │ ├── fonts.py
│ │ ├── katex.js
│ │ ├── render_html.py
│ │ └── text.py
│ ├── detection/
│ │ ├── __init__.py
│ │ ├── heatmap.py
│ │ ├── loader.py
│ │ ├── model/
│ │ │ ├── __init__.py
│ │ │ ├── config.py
│ │ │ └── encoderdecoder.py
│ │ ├── parallel.py
│ │ ├── processor.py
│ │ ├── schema.py
│ │ └── util.py
│ ├── foundation/
│ │ ├── __init__.py
│ │ ├── cache/
│ │ │ ├── __init__.py
│ │ │ ├── dynamic_ops.py
│ │ │ └── static_ops.py
│ │ ├── loader.py
│ │ └── util.py
│ ├── input/
│ │ ├── load.py
│ │ └── processing.py
│ ├── layout/
│ │ ├── __init__.py
│ │ ├── label.py
│ │ └── schema.py
│ ├── logging.py
│ ├── models.py
│ ├── ocr_error/
│ │ ├── __init__.py
│ │ ├── loader.py
│ │ ├── model/
│ │ │ ├── __init__.py
│ │ │ ├── config.py
│ │ │ └── encoder.py
│ │ ├── schema.py
│ │ └── tokenizer.py
│ ├── recognition/
│ │ ├── __init__.py
│ │ ├── languages.py
│ │ ├── postprocessing.py
│ │ ├── schema.py
│ │ └── util.py
│ ├── scripts/
│ │ ├── __init__.py
│ │ ├── config.py
│ │ ├── detect_layout.py
│ │ ├── detect_text.py
│ │ ├── finetune_ocr.py
│ │ ├── hf_to_s3.py
│ │ ├── ocr_latex.py
│ │ ├── ocr_text.py
│ │ ├── run_streamlit_app.py
│ │ ├── run_texify_app.py
│ │ ├── streamlit_app.py
│ │ ├── table_recognition.py
│ │ └── texify_app.py
│ ├── settings.py
│ └── table_rec/
│ ├── __init__.py
│ ├── loader.py
│ ├── model/
│ │ ├── __init__.py
│ │ ├── config.py
│ │ ├── decoder.py
│ │ ├── encoder.py
│ │ └── encoderdecoder.py
│ ├── processor.py
│ ├── schema.py
│ └── shaper.py
├── table_recognition.py
├── tests/
│ ├── conftest.py
│ ├── test_detection.py
│ ├── test_foundation.py
│ ├── test_latex_ocr.py
│ ├── test_layout.py
│ ├── test_ocr_errors.py
│ ├── test_recognition.py
│ └── test_table_rec.py
└── texify_app.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/ISSUE_TEMPLATE/breaking-bug-report.md
================================================
---
name: Breaking bug report
about: Create a report about a breaking bug
title: "[BUG: Breaking]"
labels: 'bug: breaking'
assignees: ''
---
## 🧨 Describe the Bug
A clear and concise description of the breaking issue (e.g., crash, OOM, exception, etc.).
## 📄 Input Document
Attach the PDF or input file that triggered the error.
## 📤 Output Trace / Stack Trace
Paste the **complete** stack trace or error output, if available.
<details>
<summary>Click to expand</summary>
```
Paste stack trace here
```
</details>
## ⚙️ Environment
Please fill in all relevant details:
- **Marker version**:
- **Surya version**:
- **Python version**:
- **PyTorch version**:
- **Transformers version**:
- **Operating System** (incl. container info if relevant):
## ✅ Expected Behavior
What did you expect Marker to do?
## 📟 Command or Code Used
Paste the **exact bash command** or **Python code** you used to run Marker:
<details>
<summary>Click to expand</summary>
```bash
# or Python code block
your_command_here --with-flags
```
</details>
## 📎 Additional Context
Any other context that might help us debug this (e.g., CLI options, working directory, runtime settings).
================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for this project
title: "[FEAT]"
labels: enhancement
assignees: ''
---
## ✨ Is your feature request related to a problem?
A clear and concise description of what the problem is.
## 💡 Describe the Solution You'd Like
A concise description of what you want to happen or how you envision it working.
## 📋 Alternatives Considered
Any alternative solutions or workarounds you've tried.
## 🧩 Additional Context
Any additional context, references, or related issues.
================================================
FILE: .github/ISSUE_TEMPLATE/output-bug-report.md
================================================
---
name: Output bug report
about: Create a report about poor output quality
title: "[BUG: Output]"
labels: 'bug: output'
assignees: ''
---
## 📝 Describe the Output Issue
A clear and concise description of the incorrect or unexpected output.
## 📄 Input Document
Attach the PDF or input file used.
## 📤 Current Output
Paste the Markdown or HTML that Marker generated:
````markdown
Paste output here
````
## ✅ Expected Output
Describe or paste what you expected Marker to generate.
## ⚙️ Environment
Please fill in all relevant details:
* **Marker version**:
* **Surya version**:
* **Python version**:
* **PyTorch version**:
* **Transformers version**:
* **Operating System**:
## 📟 Command or Code Used
Paste the **exact bash command** or **Python code** you used to run Marker:
<details>
<summary>Click to expand</summary>
```bash
# or Python code block
your_command_here --with-flags
```
</details>
## 📎 Additional Context
Any other relevant info, configs, or assumptions.
================================================
FILE: .github/workflows/benchmarks.yml
================================================
name: Integration test
on: [push]
env:
  PYTHONIOENCODING: "utf-8"
jobs:
  build:
    runs-on: t4_gpu
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.11
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
      - name: Install python dependencies
        run: |
          pip install poetry
          poetry install
      - name: Run detection benchmark test
        run: |
          poetry run python benchmark/detection.py --max_rows 2
          poetry run python benchmark/utils/verify_benchmark_scores.py results/benchmark/det_bench/results.json --bench_type detection
      - name: Run recognition benchmark test
        run: |
          poetry run python benchmark/recognition.py --max_rows 2
          poetry run python benchmark/utils/verify_benchmark_scores.py results/benchmark/rec_bench/results.json --bench_type recognition
      - name: Run layout benchmark test
        run: |
          poetry run python benchmark/layout.py --max_rows 5
          poetry run python benchmark/utils/verify_benchmark_scores.py results/benchmark/layout_bench/results.json --bench_type layout
      - name: Run ordering benchmark
        run: |
          poetry run python benchmark/ordering.py --max_rows 5
          poetry run python benchmark/utils/verify_benchmark_scores.py results/benchmark/order_bench/results.json --bench_type ordering
      - name: Run table recognition benchmark
        run: |
          poetry run python benchmark/table_recognition.py --max_rows 5
          poetry run python benchmark/utils/verify_benchmark_scores.py results/benchmark/table_rec_bench/results.json --bench_type table_recognition
      - name: Run texify benchmark
        run: |
          poetry run python benchmark/texify.py --max_rows 5
          poetry run python benchmark/utils/verify_benchmark_scores.py results/benchmark/texify_bench/results.json --bench_type texify
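Every benchmark step in this workflow follows the same two-command pattern. First it runs `benchmark/<name>.py` with a small `--max_rows`, then it passes the resulting `results/benchmark/<dir>/results.json` to `verify_benchmark_scores.py` with a `--bench_type`. The sketch below rebuilds those commands from a table so the CI run can be reproduced locally. `Bench`, `BENCHES` and `bench_commands` are hypothetical names and are not part of the repository. The values are copied from the steps above.

```python
# Hypothetical helper (not in the surya repo): rebuilds the commands that
# .github/workflows/benchmarks.yml runs, for local reproduction.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bench:
    script: str       # file under benchmark/
    results_dir: str  # directory under results/benchmark/
    bench_type: str   # value passed to --bench_type
    max_rows: int     # value passed to --max_rows


# Values copied from the workflow steps.
BENCHES = [
    Bench("detection.py", "det_bench", "detection", 2),
    Bench("recognition.py", "rec_bench", "recognition", 2),
    Bench("layout.py", "layout_bench", "layout", 5),
    Bench("ordering.py", "order_bench", "ordering", 5),
    Bench("table_recognition.py", "table_rec_bench", "table_recognition", 5),
    Bench("texify.py", "texify_bench", "texify", 5),
]


def bench_commands(bench: Bench) -> List[List[str]]:
    """Return [run command, score-verification command] for one benchmark."""
    run = ["poetry", "run", "python", f"benchmark/{bench.script}",
           "--max_rows", str(bench.max_rows)]
    verify = ["poetry", "run", "python", "benchmark/utils/verify_benchmark_scores.py",
              f"results/benchmark/{bench.results_dir}/results.json",
              "--bench_type", bench.bench_type]
    return [run, verify]


if __name__ == "__main__":
    for bench in BENCHES:
        for cmd in bench_commands(bench):
            print(" ".join(cmd))
```

To execute the commands rather than print them, pass each list to `subprocess.run(cmd, check=True)` from a poetry environment. The results directory names are not derived from the script names (for example `det_bench` and `order_bench`), which is why the sketch lists them explicitly.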
================================================
FILE: .github/workflows/ci.yml
================================================
name: Unit tests
on: [push]
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [t4_gpu, ubuntu-latest, windows-latest]
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.11
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
      - name: Install python dependencies
        run: |
          pip install poetry
          poetry install
      - name: Run tests
        run: poetry run pytest
================================================
FILE: .github/workflows/cla.yml
================================================
name: "Surya CLA Assistant"
on:
  issue_comment:
    types: [created]
  pull_request_target:
    types: [opened,closed,synchronize]
# explicitly configure permissions, in case your GITHUB_TOKEN workflow permissions are set to read-only in repository settings
permissions:
  actions: write
  contents: write
  pull-requests: write
  statuses: write
jobs:
  CLAAssistant:
    runs-on: ubuntu-latest
    steps:
      - name: "Surya CLA Assistant"
        if: (github.event.comment.body == 'recheck' || github.event.comment.body == 'I have read the CLA Document and I hereby sign the CLA') || github.event_name == 'pull_request_target'
        uses: contributor-assistant/github-action@v2.3.0
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # the below token should have repo scope and must be manually added by you in the repository's secret
          # This token is required only if you have configured to store the signatures in a remote repository/organization
          PERSONAL_ACCESS_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        with:
          path-to-signatures: 'signatures/version1/cla.json'
          path-to-document: 'https://github.com/VikParuchuri/surya/blob/master/CLA.md'
          # branch should not be protected
          branch: 'master'
          allowlist: VikParuchuri
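The `if:` condition on the CLA step is easy to misread. The step runs for every `pull_request_target` event. On a comment event, it runs only when the comment body equals `recheck` or the signing phrase exactly. The sketch below models that condition in Python. `cla_step_runs` is a hypothetical name. The sketch assumes GitHub Actions' documented behaviour of comparing strings case-insensitively. Under that assumption, the lowercase "document" spelling given in CLA.md also passes, but surrounding text or extra whitespace does not.

```python
# Hypothetical model of the `if:` expression in cla.yml (not part of the repo).
# GitHub Actions compares strings case-insensitively; casefold() models that.
from typing import Optional

SIGN_PHRASE = "I have read the CLA Document and I hereby sign the CLA"


def _eq(a: Optional[str], b: str) -> bool:
    """Case-insensitive equality; a missing value (no comment) never matches."""
    return a is not None and a.casefold() == b.casefold()


def cla_step_runs(event_name: str, comment_body: Optional[str] = None) -> bool:
    return (
        _eq(comment_body, "recheck")
        or _eq(comment_body, SIGN_PHRASE)
        or _eq(event_name, "pull_request_target")
    )
```

For example, `cla_step_runs("issue_comment", "recheck please")` is false because the comparison is on the whole body, not a substring.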
================================================
FILE: .github/workflows/publish.yml
================================================
name: Python package
on:
  push:
    tags:
      - "v*.*.*"
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.11
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
      - name: Install python dependencies
        run: |
          pip install poetry
          poetry install
      - name: Build package
        run: |
          poetry build
      - name: Publish package
        env:
          PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
        run: |
          poetry config pypi-token.pypi "$PYPI_TOKEN"
          poetry publish
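The publish job runs only when the pushed tag matches the filter `v*.*.*`. In GitHub's filter patterns, `*` matches zero or more characters other than `/`, and `.` is literal. The simplified matcher below handles only `*`, which is enough to show which tag names trigger a release. `glob_to_regex` and `triggers_publish` are hypothetical helpers. They do not reproduce GitHub's full syntax (`**`, `?`, `+`, `[]` and `!` are not handled).

```python
import re

# Simplified re-implementation of the `v*.*.*` tag filter (hypothetical helper).


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate `*` to "any run of non-slash characters"; escape everything else."""
    parts = ["[^/]*" if ch == "*" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


TAG_PATTERN = glob_to_regex("v*.*.*")


def triggers_publish(tag: str) -> bool:
    # The whole tag name must match, not just a prefix.
    return TAG_PATTERN.fullmatch(tag) is not None
```

Under this model, `v0.14.0` triggers the job, while `0.14.0` (no `v`) and `v1.2` (only one dot) do not. A pre-release tag such as `v1.2.3rc1` also matches, because the last `*` absorbs `3rc1`.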
================================================
FILE: .github/workflows/scripts.yml
================================================
name: Test CLI scripts
on: [push]
jobs:
  build:
    runs-on: t4_gpu
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.11
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
      - name: Install python dependencies
        run: |
          pip install poetry
          poetry install
      - name: Download benchmark data
        run: |
          wget -O benchmark_data.zip "https://drive.google.com/uc?export=download&id=1NHrdYatR1rtqs2gPVfdvO0BAvocH8CJi"
          unzip -o benchmark_data.zip
      - name: Test detection
        run: poetry run surya_detect benchmark_data/pdfs/switch_trans.pdf --page_range 0
      - name: Test OCR
        env:
          RECOGNITION_MAX_TOKENS: 25
        run: poetry run surya_ocr benchmark_data/pdfs/switch_trans.pdf --page_range 0
      - name: Test layout
        run: poetry run surya_layout benchmark_data/pdfs/switch_trans.pdf --page_range 0
      - name: Test table
        run: poetry run surya_table benchmark_data/pdfs/switch_trans.pdf --page_range 0
      - name: Test texify
        env:
          TEXIFY_MAX_TOKENS: 25
        run: poetry run surya_latex_ocr benchmark_data/pdfs/switch_trans.pdf --page_range 0
      - name: Test detection folder
        run: poetry run surya_detect benchmark_data/pdfs --page_range 0
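The OCR and texify steps shorten their runs by setting `RECOGNITION_MAX_TOKENS` and `TEXIFY_MAX_TOKENS` in a step-level `env:` block. The sketch below mirrors the "Test OCR" step locally. It assumes, as the workflow implies, that surya reads this variable from the environment. The override goes into a copy of `os.environ`, so only the child process sees it. `capped_ocr_step` is a hypothetical helper, and the CLI is invoked only when `execute=True`.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def capped_ocr_step(pdf_path: str, max_tokens: int = 25,
                    execute: bool = False) -> Tuple[List[str], Dict[str, str]]:
    """Mirror the "Test OCR" step: run surya_ocr with RECOGNITION_MAX_TOKENS set."""
    # Copy the environment so the override is scoped to the child process,
    # like a step-level `env:` block.
    env = dict(os.environ)
    # Environment values are strings; the workflow's bare 25 arrives as "25".
    env["RECOGNITION_MAX_TOKENS"] = str(max_tokens)
    cmd = ["poetry", "run", "surya_ocr", pdf_path, "--page_range", "0"]
    if execute:
        # Requires a poetry environment with surya installed.
        subprocess.run(cmd, env=env, check=True)
    return cmd, env
```

Calling `capped_ocr_step("benchmark_data/pdfs/switch_trans.pdf", execute=True)` after downloading the benchmark data reproduces the CI step. The same pattern applies to `TEXIFY_MAX_TOKENS` with `surya_latex_ocr`.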
================================================
FILE: .gitignore
================================================
private.py
.DS_Store
local.env
experiments
test_data
training
wandb
notebooks
results
data
slices
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
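Several of the ignore rules above rely on glob features: character sets in `*.py[cod]`, a literal `$` in `*$py.class`, and a trailing slash in `*.egg-info/` that restricts the rule to directories. The sketch below approximates how git applies such slash-free patterns to a file or directory basename, using Python's case-sensitive `fnmatchcase`. `is_ignored` is a hypothetical helper. It ignores negation (`!`), anchored patterns containing a `/`, `**`, and the `core.ignorecase` setting, so treat it as an illustration rather than git's exact algorithm.

```python
from fnmatch import fnmatchcase

# A few patterns copied verbatim from the .gitignore above.
PATTERNS = ["*.py[cod]", "*$py.class", "*.so", "*.egg-info/", ".DS_Store", "local.env"]


def is_ignored(basename: str, is_dir: bool = False) -> bool:
    """Approximate git's matching of slash-free patterns against a basename."""
    for pattern in PATTERNS:
        if pattern.endswith("/"):
            # A trailing slash means the pattern only matches directories.
            if is_dir and fnmatchcase(basename, pattern[:-1]):
                return True
        elif fnmatchcase(basename, pattern):
            return True
    return False
```

With these patterns, `loader.cpython-311.pyc` is ignored and `loader.py` is not. A directory named `surya.egg-info` is ignored, while a regular file with the same name is not.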
================================================
FILE: .pre-commit-config.yaml
================================================
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
  # Ruff version.
  rev: v0.9.10
  hooks:
    # Run the linter.
    - id: ruff
      types_or: [ python, pyi ]
      args: [ --fix ]
    # Run the formatter.
    - id: ruff-format
      types_or: [ python, pyi ]
================================================
FILE: CITATION.cff
================================================
cff-version: 1.2.0
message: "If you use this software, please cite it using the following metadata."
title: "Surya: A lightweight framework for analyzing documents and PDFs at scale"
authors:
  - family-names: Paruchuri
    given-names: Vikas
  - name: Datalab Team
date-released: 2025-05-13
url: https://github.com/VikParuchuri/surya
version: 0.14.0
repository-code: https://github.com/VikParuchuri/surya
================================================
FILE: CLA.md
================================================
Surya Contributor Agreement
This Surya Contributor Agreement ("SCA") applies to any contribution that you make to any product or project managed by us (the "project"), and sets out the intellectual property rights you grant to us in the contributed materials. The term "us" shall mean Endless Labs, Inc. The term "you" shall mean the person or entity identified below.
If you agree to be bound by these terms, sign by writing "I have read the CLA document and I hereby sign the CLA" in response to the CLA bot GitHub comment. Read this agreement carefully before signing. These terms and conditions constitute a binding legal agreement.
1. The term 'contribution' or 'contributed materials' means any source code, object code, patch, tool, sample, graphic, specification, manual, documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and registrations, in your contribution:
- you hereby assign to us joint ownership, and to the extent that such assignment is or becomes invalid, ineffective or unenforceable, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty free, unrestricted license to exercise all rights under those copyrights. This includes, at our option, the right to sublicense these same rights to third parties through multiple levels of sublicensees or other licensing arrangements, including dual-license structures for commercial customers;
- you agree that each of us can do all things in relation to your contribution as if each of us were the sole owners, and if one of us makes a derivative work of your contribution, the one who makes the derivative work (or has it made) will be the sole owner of that derivative work;
- you agree that you will not assert any moral rights in your contribution against us, our licensees or transferees;
- you agree that we may register a copyright in your contribution and exercise all ownership rights associated with it; and
- you agree that neither of us has any duty to consult with, obtain the consent of, pay or render an accounting to the other for any use or distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment to any third party, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free license to:
- make, have made, use, sell, offer to sell, import, and otherwise transfer your contribution in whole or in part, alone or in combination with or included in any product, work or materials arising out of the project to which your contribution was submitted, and
- at our option, to sublicense these same rights to third parties through multiple levels of sublicensees or other licensing arrangements.
If you or your affiliates institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the contribution or any project it was submitted to constitutes direct or contributory patent infringement, then any patent licenses granted to you under this agreement for that contribution shall terminate as of the date such litigation is filed.
4. Except as set out above, you keep all right, title, and interest in your contribution. The rights that you grant to us under these terms are effective on the date you first submitted a contribution to us, even if your submission took place before the date you sign these terms. Any contribution we make available under any license will also be made available under a suitable FSF (Free Software Foundation) or OSI (Open Source Initiative) approved license.
5. You covenant, represent, warrant and agree that:
- each contribution that you submit is and shall be an original work of authorship and you can legally grant the rights set out in this SCA;
- to the best of your knowledge, each contribution will not violate any third party's copyrights, trademarks, patents, or other intellectual property rights; and
- each contribution shall be in compliance with U.S. export control laws and other applicable export and import laws.
You agree to notify us if you become aware of any circumstance which would make any of the foregoing representations inaccurate in any respect. Endless Labs, Inc. may publicly disclose your participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable U.S. Federal law. Any choice of law rules will not apply.
================================================
FILE: LICENSE
================================================
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
Surya OCR
Copyright (C) 2024 Endless Labs, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:
Surya OCR Copyright (C) 2024 Endless Labs, Inc.
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<https://www.gnu.org/licenses/>.
The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.
================================================
FILE: MODEL_LICENSE
================================================
AI PUBS OPEN RAIL-M LICENSE (MODIFIED)
Version 0.1, March 2, 2023 (Modified)
http://licenses.ai/
PLEASE READ THESE TERMS CAREFULLY BEFORE USING THE MODEL OR A DERIVATIVE WORK OF THE MODEL MADE AVAILABLE IN CONNECTION WITH THESE TERMS. BY DOWNLOADING, REPRODUCING, DISTRIBUTING OR USING THE MODEL OR A DERIVATIVE WORK OF THE MODEL IN ANY MANNER, YOU (“YOU”) AGREE TO BE BOUND BY THESE TERMS (THE “AGREEMENT”) TO THE EXCLUSION OF ALL OTHER TERMS. YOU REPRESENT AND WARRANT THAT YOU HAVE THE AUTHORITY TO ENTER INTO THIS AGREEMENT; IF YOU ARE ENTERING INTO THIS AGREEMENT ON BEHALF OF AN ORGANIZATION OR ENTITY, REFERENCES TO “YOU” IN THIS AGREEMENT REFER TO THAT ORGANIZATION OR ENTITY. IF YOU DO NOT AGREE TO ALL OF THE FOLLOWING, YOU MAY NOT DOWNLOAD, REPRODUCE, DISTRIBUTE OR USE THE MODEL OR A DERIVATIVE WORK OF THE MODEL IN ANY MANNER.
Section I: PREAMBLE
This OpenRAIL-M License, as modified, is generally applicable to any machine-learning Model.
The “Open” nomenclature indicates that the licensed Model is freely accessible to downstream and other users. The “RAIL” nomenclature indicates that there are use restrictions prohibiting the use of the Model. These restrictions are intended to avoid potential misuse. This License specifies that the use restrictions in the original License must apply to such derivatives.
NOW THEREFORE, You and Licensor agree as follows:
1. Definitions
(a) “Complementary Material” means the applicable source code and scripts used to define, run, load, benchmark or evaluate the Model, and used to prepare data for training or evaluation, if any. This includes any accompanying documentation, tutorials, examples, and any related information, if any. Complementary Material is not licensed under this License.
(b) "Contribution" means any work, including the original version of the Model and any modifications or additions to that Model or Derivatives of the Model thereof, that is intentionally submitted to Licensor for inclusion in the Model by the rights owner or by an individual or legal entity authorized to submit on behalf of the rights owner. For the purposes of this definition, “submitted” means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Model, but excluding communication that is conspicuously marked or otherwise designated in writing by the rights owner as "Not a Contribution."
(c) "Contributor" means Licensor and any individual or legal entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Model.
(d) “Data” means a collection of information and/or content extracted from the dataset used with the Model, including to train, pretrain, or otherwise evaluate the Model. The Data is not licensed under this License.
(e) “Derivatives of the Model” means all modifications to the Model, works based on the Model, or any other model which is created or initialized by transfer of patterns of the weights, parameters, activations or output of the Model, to the other model, in order to cause the other model to perform similarly to the Model, including - but not limited to - distillation methods entailing the use of intermediate data representations or methods based on the generation of synthetic data by the Model for training the other model.
(f) “Distribution” means any transmission, reproduction, publication, distribution, or other sharing of the Model or Derivatives of the Model to a third party, including providing the Model as a hosted service made available by electronic or other remote means, including but not limited to API-based or web access.
(g) “Harm” includes but is not limited to physical, mental, psychological, financial and reputational damage, pain, or loss.
(h) "License" means the terms and conditions for use, reproduction, and Distribution as defined in this document.
(i) “Licensor” means the rights owner or entity authorized by the rights owner that is granting the License, including the persons or entities that may have rights in the Model and/or distributing the Model.
(j) “Model” means any accompanying machine-learning based assemblies (including checkpoints), consisting of learnt weights, parameters (including optimizer states), corresponding to the model architecture as embodied in the Complementary Material, that have been trained or tuned, in whole or in part on the Data, using the Complementary Material.
(k) “Output” means the results of operating a Model as embodied in informational content resulting therefrom.
(l) “Third Parties” means individuals or legal entities that are not under common control with Licensor or You.
(m) "You" (or "Your") means an individual or legal entity exercising permissions granted by this License and/or making use of the Model for whichever purpose and in any field of use, including usage of the Model in an end-use application, including but not limited to a chatbot, translator, or image generator.
Section II: INTELLECTUAL PROPERTY RIGHTS
Both copyright and patent grants may apply to the Model and Derivatives of the Model. The Model and Derivatives of the Model are subject to additional terms as described in Section III, which shall govern the use of the Model and Derivatives of the Model even in the event Section II is held unenforceable.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Model and Derivatives of the Model.
3. Grant of Patent License. Subject to the terms and conditions of this License and where and as applicable, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this paragraph) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Model and/or Derivatives of the Model where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Model or Derivatives of the Model to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model or Derivative of the Model and/or a Contribution incorporated within the Model or Derivative of the Model constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for the Model and/or Derivative of the Model shall terminate as of the date such litigation is asserted or filed.
Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION
4. Distribution and Redistribution. You may host the Model or Derivatives of the Model for remote access by Third Parties, including but not limited to software-as-a-service, reproduce, or Distribute copies of the Model or Derivatives of the Model thereof in any medium, with or without modifications, provided that You meet the conditions in this Section III:
(a) Use-based restrictions in paragraph 5 MUST be included as an enforceable provision by You in any type of legal agreement (for example, a license) governing the use and/or distribution of the Model or Derivatives of the Model, and You shall give notice to subsequent users You Distribute to, that the Model and Derivatives of the Model are subject to paragraph 5;
(b) You must give any Third Party recipients of the Model or Derivatives of the Model a copy of this License;
(c) You must cause any modified files to carry prominent notices stating that You changed the files; and
(d) You must retain all copyright, patent, trademark, and attribution notices excluding those notices that do not pertain to any part of the Model or Derivatives of the Model.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions, consistent with paragraph 4.a., for use, reproduction, or Distribution of Your modifications, or for any such Derivatives of the Model as a whole, provided Your use, reproduction, and Distribution of the Model otherwise complies with the conditions stated in this License.
5. Use-based restrictions. The restrictions set forth in Attachment A are considered Use-based restrictions. Accordingly, You cannot use the Model or the Derivatives of the Model in violation of such restrictions. You may use the Model subject to this License, including only for lawful purposes and in accordance with the License. Use may include creating any content with, fine-tuning, updating, running, training, evaluating and/or re-parametrizing the Model. You shall require all of Your users who use the Model or a Derivative of the Model to comply with the terms of this paragraph 5.
6. The Output You Generate. Except as set forth herein, Licensor claims no rights in the Output You generate using the Model. You are solely responsible for the Output you generate and its subsequent uses. No use of the Output can contravene any provision as stated in the License.
7. Attribution. In connection with any Output, or use of Distribution of any Model or Derivatives of the Model, You agree to give appropriate credit and attribution to Licensor, provide a link to the original Model or Derivatives of the Model, provide a copy of this License, and identify any changes You have made to the Model or Derivatives of the Model (collectively, the “Attribution”). The Attribution must not suggest endorsement by any Licensor.
8. Share-a-Like. As a condition to the license and authorizations herein, You agree to apply this License (to the exclusion of all others) to any and all copies of the Model, Derivatives of the Model, any changes or improvements to the Model or Derivatives of the Model, and to the Output and any derivatives, changes or improvements to or of the Output.
Section IV: OTHER PROVISIONS
9. Updates and Runtime Restrictions. To the maximum extent permitted by law, Licensor reserves the right to restrict (remotely or otherwise) usage of the Model in violation of this License, update the Model through electronic means, or cause modification to the Output resulting from updates to the Model based.
10. Trademarks and related. Nothing in this License permits You to make use of Licensors’ trademarks, trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between the parties; and any rights not expressly granted herein are reserved by the Licensors.
11. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Model (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Model and Derivatives of the Model, and assume any risks associated with Your exercise of permissions under this License.
12. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Model (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
13. Accepting Warranty or Additional Liability. While Distributing the Model or Derivatives of the Model, You may choose to charge a fee in exchange for support, warranty, indemnity, or other obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor or Licensor, and only if You agree to indemnify, defend, and hold each Contributor and the Licensor harmless for any liability incurred by, or claims asserted against, such Contributor or Licensor by reason of your accepting any such warranty or additional liability.
14. If any provision of this License is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.
END OF TERMS AND CONDITIONS
Attachment A
USE RESTRICTIONS
As conditions to the Licenses set forth in this Agreement, You agree not to use, reproduce, modify, create or Distribute the Model, Derivatives of the Model, or Output (collectively, “Use”) in any of the following ways:
1. Legal:
(a) In any way that violates any applicable national, federal, state, local or international law or regulation; or
(b) to directly or indirectly infringe or misappropriate any third party intellectual property rights (including those of Licensor or any Contributor)
2. Commercial:
(a) for any purpose if You (your employer, or the entity you are affiliated with) generated more than two million US Dollars ($2,000,000) in gross revenue in the prior year, except where Your Use is limited to personal use or research purposes;
(b) for any purpose if You (your employer, or the entity you are affiliated with) has raised more than two million US dollars ($2,000,000) in total equity or debt funding from any source, except where Your Use is limited to personal use or research purposes; or
(c) for any purpose if You (your employer, or the entity you are affiliated with) provides or otherwise makes available any product or service that competes with any product or service offered by or made available by Licensor or any of its affiliates.
Commercial and broader use licenses may be available from Licensor at the following URL: https://www.datalab.to/
================================================
FILE: README.md
================================================
# Surya
Surya is a document OCR toolkit that does:
- OCR in 90+ languages that benchmarks favorably vs cloud services
- Line-level text detection in any language
- Layout analysis (detecting tables, images, headers, etc.)
- Reading order detection
- Table recognition (detecting rows/columns)
- LaTeX OCR
It works on a range of documents (see [usage](#usage) and [benchmarks](#benchmarks) for more details).
For our managed API or on-prem document intelligence solution, check out [our platform here](https://datalab.to?utm_source=gh-surya).
| Detection | OCR |
|:----------------------------------------------------------------:|:-----------------------------------------------------------------------:|
| <img src="static/images/excerpt.png" width="500px"/> | <img src="static/images/excerpt_text.png" width="500px"/> |

| Layout | Reading Order |
|:------------------------------------------------------------------:|:--------------------------------------------------------------------------:|
| <img src="static/images/excerpt_layout.png" width="500px"/> | <img src="static/images/excerpt_reading.jpg" width="500px"/> |

| Table Recognition | LaTeX OCR |
|:-------------------------------------------------------------:|:------------------------------------------------------:|
| <img src="static/images/scanned_tablerec.png" width="500px"/> | <img src="static/images/latex_ocr.png" width="500px"/> |

Surya is named for the [Hindu sun god](https://en.wikipedia.org/wiki/Surya), who has universal vision.
## Community
[Discord](https://discord.gg//KuZwXNGnfH) is where we discuss future development.
## Examples
| Name | Detection | OCR | Layout | Order | Table Rec |
|------------------|:-----------------------------------:|-----------------------------------------:|-------------------------------------------:|--------------------------------------------:|---------------------------------------------:|
| Japanese | [Image](static/images/japanese.jpg) | [Image](static/images/japanese_text.jpg) | [Image](static/images/japanese_layout.jpg) | [Image](static/images/japanese_reading.jpg) | [Image](static/images/japanese_tablerec.png) |
| Chinese | [Image](static/images/chinese.jpg) | [Image](static/images/chinese_text.jpg) | [Image](static/images/chinese_layout.jpg) | [Image](static/images/chinese_reading.jpg) | |
| Hindi | [Image](static/images/hindi.jpg) | [Image](static/images/hindi_text.jpg) | [Image](static/images/hindi_layout.jpg) | [Image](static/images/hindi_reading.jpg) | |
| Arabic | [Image](static/images/arabic.jpg) | [Image](static/images/arabic_text.jpg) | [Image](static/images/arabic_layout.jpg) | [Image](static/images/arabic_reading.jpg) | |
| Chinese + Hindi | [Image](static/images/chi_hind.jpg) | [Image](static/images/chi_hind_text.jpg) | [Image](static/images/chi_hind_layout.jpg) | [Image](static/images/chi_hind_reading.jpg) | |
| Presentation | [Image](static/images/pres.png) | [Image](static/images/pres_text.jpg) | [Image](static/images/pres_layout.jpg) | [Image](static/images/pres_reading.jpg) | [Image](static/images/pres_tablerec.png) |
| Scientific Paper | [Image](static/images/paper.jpg) | [Image](static/images/paper_text.jpg) | [Image](static/images/paper_layout.jpg) | [Image](static/images/paper_reading.jpg) | [Image](static/images/paper_tablerec.png) |
| Scanned Document | [Image](static/images/scanned.png) | [Image](static/images/scanned_text.jpg) | [Image](static/images/scanned_layout.jpg) | [Image](static/images/scanned_reading.jpg) | [Image](static/images/scanned_tablerec.png) |
| New York Times | [Image](static/images/nyt.jpg) | [Image](static/images/nyt_text.jpg) | [Image](static/images/nyt_layout.jpg) | [Image](static/images/nyt_order.jpg) | |
| Scanned Form | [Image](static/images/funsd.png) | [Image](static/images/funsd_text.jpg) | [Image](static/images/funsd_layout.jpg) | [Image](static/images/funsd_reading.jpg) | [Image](static/images/scanned_tablerec2.png) |
| Textbook | [Image](static/images/textbook.jpg) | [Image](static/images/textbook_text.jpg) | [Image](static/images/textbook_layout.jpg) | [Image](static/images/textbook_order.jpg) | |
# Hosted API
There is a hosted API for all surya models available [here](https://www.datalab.to?utm_source=gh-surya):
- Works with PDFs, images, Word documents, and PowerPoint presentations
- Consistent speed, with no latency spikes
- High reliability and uptime
# Commercial usage
Our model weights use a modified AI Pubs Open Rail-M license (free for research, personal use, and startups under $2M funding/revenue) and our code is GPL. For broader commercial licensing or to remove GPL requirements, visit our pricing page [here](https://www.datalab.to/pricing?utm_source=gh-surya).
# Installation
You'll need Python 3.10+ and PyTorch. If you're not using a Mac or a GPU machine, you may need to install the CPU version of torch first. See [here](https://pytorch.org/get-started/locally/) for more details.
Install with:
```shell
pip install surya-ocr
```
Model weights will automatically download the first time you run surya.
# Usage
- Inspect the settings in `surya/settings.py`. You can override any settings with environment variables.
- Your torch device will be automatically detected, but you can override this. For example, `TORCH_DEVICE=cuda`.
## Interactive App
I've included a streamlit app that lets you interactively try Surya on images or PDF files. Run it with:
```shell
pip install streamlit pdftext
surya_gui
```
## OCR (text recognition)
This command will write out a json file with the detected text and bboxes:
```shell
surya_ocr DATA_PATH
```
- `DATA_PATH` can be an image, pdf, or folder of images/pdfs
- `--task_name` will specify which task to use for predicting the lines. `ocr_with_boxes` is the default, which will format text and give you bboxes. If you get bad performance, try `ocr_without_boxes`, which will give you potentially better performance but no bboxes. For blocks like equations and paragraphs, try `block_without_boxes`.
- `--images` will save images of the pages and detected text lines (optional)
- `--output_dir` specifies the directory to save results to instead of the default
- `--page_range` specifies the page range to process in the PDF, specified as a single number, a comma separated list, a range, or comma separated ranges - example: `0,5-10,20`.
- `--disable_math` - by default, surya will recognize math in text. This can lead to false positives - you can disable this with this flag.
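
To make the `--page_range` syntax concrete, here is a small parser that expands a string like `0,5-10,20` into individual pages. This is a minimal sketch, not surya's own code: it assumes ranges include both endpoints and that duplicates are dropped and the result sorted. `parse_page_range` is a hypothetical helper name.

```python
def parse_page_range(range_str: str) -> list[int]:
    """Expand a page-range string such as "0,5-10,20" into page numbers.

    Assumption (not confirmed by the surya docs): ranges are inclusive,
    and duplicate pages are removed before sorting.
    """
    pages: set[int] = set()
    for part in range_str.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if end < start:
                raise ValueError(f"Invalid range: {part!r}")
            pages.update(range(start, end + 1))  # +1 makes the end inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_range("0,5-10,20"))  # [0, 5, 6, 7, 8, 9, 10, 20]
```

Using a set keeps overlapping inputs like `2-4,3` from producing repeated pages.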

The `results.json` file will contain a json dictionary where the keys are the input filenames without extensions. Each value will be a list of dictionaries, one per page of the input document. Each page dictionary contains:

- `text_lines` - the detected text and bounding boxes for each line
  - `text` - the text in the line
  - `confidence` - the confidence of the model in the detected text (0-1)
  - `polygon` - the polygon for the text line in (x1, y1), (x2, y2), (x3, y3), (x4, y4) format. The points are in clockwise order from the top left.
  - `bbox` - the axis-aligned rectangle for the text line in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner.
  - `chars` - the individual characters in the line
    - `text` - the text of the character
    - `bbox` - the character bbox (same format as line bbox)
    - `polygon` - the character polygon (same format as line polygon)
    - `confidence` - the confidence of the model in the detected character (0-1)
    - `bbox_valid` - if the character is a special token or math, the bbox may not be valid
  - `words` - the individual words in the line (computed from the characters)
    - `text` - the text of the word
    - `bbox` - the word bbox (same format as line bbox)
    - `polygon` - the word polygon (same format as line polygon)
    - `confidence` - mean character confidence
    - `bbox_valid` - if the word is a special token or math, the bbox may not be valid
- `page` - the page number in the file
- `image_bbox` - the bbox for the image in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner. All line bboxes will be contained within this bbox.
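
A short sketch of walking the output structure described above: filename keys, then one dictionary per page, then `text_lines`. The sample data and the `0.5` confidence cutoff are invented for illustration; in practice you would call `load_results` on the `results.json` the CLI wrote. Both helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def load_results(path: str) -> dict:
    """Load a results.json file written by a surya CLI command."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def iter_lines(results: dict, min_confidence: float = 0.0) -> Iterator[tuple]:
    """Yield (file_key, page, text, bbox) for each line at or above min_confidence."""
    for file_key, pages in results.items():
        for page in pages:
            for line in page["text_lines"]:
                if line["confidence"] >= min_confidence:
                    yield file_key, page["page"], line["text"], line["bbox"]


# Invented sample in the documented shape (trimmed to the fields used here).
sample = {
    "invoice": [
        {
            "page": 1,
            "image_bbox": [0, 0, 1700, 2200],
            "text_lines": [
                {"text": "Total: $42", "confidence": 0.98, "bbox": [100, 200, 400, 230]},
                {"text": "~~", "confidence": 0.31, "bbox": [90, 900, 120, 920]},
            ],
        }
    ]
}

for file_key, page_no, text, bbox in iter_lines(sample, min_confidence=0.5):
    print(file_key, page_no, text, bbox)
```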

**Performance tips**

Setting the `RECOGNITION_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `40MB` of VRAM, so very high batch sizes are possible. The default batch size is `512`, which will use about 20GB of VRAM. Tuning it can also help on CPU, depending on your core count; the default CPU batch size is `32`.
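
With a quoted per-item cost, picking a batch size is simple arithmetic: divide the VRAM you want to spend by the cost of one item. A rough sketch, assuming the per-item figures quoted in this README are the only cost (model weights and framework overhead are ignored) and treating 1 GB as 1024 MB. `batch_size_for_budget` is a hypothetical helper.

```python
import math

# Approximate VRAM per batch item, as quoted in this README (MB).
PER_ITEM_MB = {
    "RECOGNITION_BATCH_SIZE": 40,
    "DETECTOR_BATCH_SIZE": 440,
    "LAYOUT_BATCH_SIZE": 220,
    "TABLE_REC_BATCH_SIZE": 150,
}


def batch_size_for_budget(setting: str, vram_gb: float) -> int:
    """Largest batch size whose per-item cost fits in vram_gb (1 GB taken as 1024 MB).

    Ignores model weights and framework overhead, so treat the result as an upper bound.
    """
    return max(1, math.floor(vram_gb * 1024 / PER_ITEM_MB[setting]))


# Print shell-style overrides for a hypothetical 16GB card.
for name in PER_ITEM_MB:
    print(f"{name}={batch_size_for_budget(name, 16)}")
```

As a sanity check, a 20GB budget gives exactly the default recognition batch size of 512. You can pass a value for one run with, for example, `RECOGNITION_BATCH_SIZE=409 surya_ocr DATA_PATH`.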
### From python
```python
from PIL import Image
from surya.foundation import FoundationPredictor
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
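IMAGE_PATH = "path/to/image.png"  # replace with the path to your own image
image = Image.open(IMAGE_PATH)
foundation_predictor = FoundationPredictor()
recognition_predictor = RecognitionPredictor(foundation_predictor)
detection_predictor = DetectionPredictor()
# Detection finds the text lines; recognition then reads each line.
# predictions is a list with one result per input image
predictions = recognition_predictor([image], det_predictor=detection_predictor)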
```
## Text line detection
This command will write out a json file with the detected bboxes.
```shell
surya_detect DATA_PATH
```
- `DATA_PATH` can be an image, pdf, or folder of images/pdfs
- `--images` will save images of the pages and detected text lines (optional)
- `--output_dir` specifies the directory to save results to instead of the default
- `--page_range` specifies the page range to process in the PDF, specified as a single number, a comma separated list, a range, or comma separated ranges - example: `0,5-10,20`.

The `results.json` file will contain a json dictionary where the keys are the input filenames without extensions. Each value will be a list of dictionaries, one per page of the input document. Each page dictionary contains:

- `bboxes` - detected bounding boxes for text
  - `bbox` - the axis-aligned rectangle for the text line in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner.
  - `polygon` - the polygon for the text line in (x1, y1), (x2, y2), (x3, y3), (x4, y4) format. The points are in clockwise order from the top left.
  - `confidence` - the confidence of the model in the detected text (0-1)
- `vertical_lines` - vertical lines detected in the document
  - `bbox` - the axis-aligned line coordinates.
- `page` - the page number in the file
- `image_bbox` - the bbox for the image in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner. All line bboxes will be contained within this bbox.
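
`polygon` holds four corner points and `bbox` is the axis-aligned rectangle around the line. If you only have the polygon, you can recover an enclosing rectangle by taking coordinate minima and maxima. This is a minimal sketch that assumes the two fields relate this way. For axis-aligned lines the result is exact; for slightly rotated lines it is the tightest upright box around the polygon. `polygon_to_bbox` is a hypothetical helper.

```python
Point = tuple[float, float]


def polygon_to_bbox(polygon: list[Point]) -> tuple[float, float, float, float]:
    """Return (x1, y1, x2, y2): top-left and bottom-right of the enclosing rectangle."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# Four points, clockwise from the top left, as in the detection output.
line_polygon = [(10, 20), (210, 24), (208, 54), (8, 50)]
print(polygon_to_bbox(line_polygon))  # (8, 20, 210, 54)
```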

**Performance tips**

Setting the `DETECTOR_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `440MB` of VRAM, so very high batch sizes are possible. The default batch size is `36`, which will use about 16GB of VRAM. Tuning it can also help on CPU, depending on your core count; the default CPU batch size is `6`.
### From python
```python
from PIL import Image
from surya.detection import DetectionPredictor
IMAGE_PATH = "path/to/image.png"  # replace with the path to your own image
image = Image.open(IMAGE_PATH)
det_predictor = DetectionPredictor()
# predictions is a list of detection results, one per image
predictions = det_predictor([image])
```
## Layout and reading order
This command will write out a json file with the detected layout and reading order.
```shell
surya_layout DATA_PATH
```
- `DATA_PATH` can be an image, pdf, or folder of images/pdfs
- `--images` will save images of the pages and detected text lines (optional)
- `--output_dir` specifies the directory to save results to instead of the default
- `--page_range` specifies the page range to process in the PDF, specified as a single number, a comma separated list, a range, or comma separated ranges - example: `0,5-10,20`.

The `results.json` file will contain a json dictionary where the keys are the input filenames without extensions. Each value will be a list of dictionaries, one per page of the input document. Each page dictionary contains:

- `bboxes` - detected layout bounding boxes
  - `bbox` - the axis-aligned rectangle for the layout block in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner.
  - `polygon` - the polygon for the layout block in (x1, y1), (x2, y2), (x3, y3), (x4, y4) format. The points are in clockwise order from the top left.
  - `position` - the reading order of the box.
  - `label` - the label for the bbox. One of `Caption`, `Footnote`, `Formula`, `List-item`, `Page-footer`, `Page-header`, `Picture`, `Figure`, `Section-header`, `Table`, `Form`, `Table-of-contents`, `Handwriting`, `Text`, `Text-inline-math`.
  - `top_k` - the top-k other potential labels for the box. A dictionary with labels as keys and confidences as values.
- `page` - the page number in the file
- `image_bbox` - the bbox for the image in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner. All layout bboxes will be contained within this bbox.
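
Each layout box carries a `position` and a `label`, so putting a page into reading order and dropping running headers and footers is a sort and a filter. The sketch below works on the JSON shape described above. The sample boxes are invented and trimmed to the fields used. Treating `Page-header`/`Page-footer` as non-body content is an illustrative choice, not something surya prescribes.

```python
# Labels to drop when extracting body text (an illustrative choice; adjust to taste).
SKIP_LABELS = {"Page-header", "Page-footer"}


def reading_order(page: dict) -> list[dict]:
    """Return the page's layout boxes sorted by predicted reading position."""
    return sorted(page["bboxes"], key=lambda box: box["position"])


def body_boxes(page: dict, skip: set[str] = SKIP_LABELS) -> list[dict]:
    """Boxes in reading order, minus labels listed in `skip`."""
    return [box for box in reading_order(page) if box["label"] not in skip]


# Invented sample page in the documented shape.
page = {
    "page": 1,
    "bboxes": [
        {"label": "Text", "position": 2, "bbox": [50, 120, 550, 400]},
        {"label": "Page-footer", "position": 3, "bbox": [250, 760, 350, 780]},
        {"label": "Section-header", "position": 1, "bbox": [50, 60, 300, 90]},
        {"label": "Page-header", "position": 0, "bbox": [50, 10, 550, 30]},
    ],
}

print([box["label"] for box in reading_order(page)])
print([box["label"] for box in body_boxes(page)])
```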

**Performance tips**

Setting the `LAYOUT_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `220MB` of VRAM, so very high batch sizes are possible. The default batch size is `32`, which will use about 7GB of VRAM. Tuning it can also help on CPU, depending on your core count; the default CPU batch size is `4`.
### From python
```python
from PIL import Image
from surya.foundation import FoundationPredictor
from surya.layout import LayoutPredictor
from surya.settings import settings
IMAGE_PATH = "path/to/image.png"  # replace with the path to your own image
image = Image.open(IMAGE_PATH)
layout_predictor = LayoutPredictor(FoundationPredictor(checkpoint=settings.LAYOUT_MODEL_CHECKPOINT))
# layout_predictions is a list of dicts, one per image
layout_predictions = layout_predictor([image])
```
## Table Recognition
This command will write out a json file with the detected table cells and row/column ids, along with row/column bounding boxes. If you want cell positions and text, along with nice formatting, check out the [marker](https://www.github.com/VikParuchuri/marker) repo: its `TableConverter` can detect and extract tables in images and PDFs, with output in json (with bboxes), markdown, or html.
```shell
surya_table DATA_PATH
```
- `DATA_PATH` can be an image, pdf, or folder of images/pdfs
- `--images` will save images of the pages and detected table cells + rows and columns (optional)
- `--output_dir` specifies the directory to save results to instead of the default
- `--page_range` specifies the page range to process in the PDF, specified as a single number, a comma separated list, a range, or comma separated ranges - example: `0,5-10,20`.
- `--detect_boxes` specifies if cells should be detected. By default, they're pulled out of the PDF, but this is not always possible.
- `--skip_table_detection` tells table recognition not to detect tables first. Use this if your image is already cropped to a table.

The `results.json` file will contain a json dictionary where the keys are the input filenames without extensions. Each value will be a list of dictionaries, one per page of the input document. Each page dictionary contains:

- `rows` - detected table rows
  - `bbox` - the bounding box of the table row
  - `row_id` - the id of the row
  - `is_header` - whether it is a header row
- `cols` - detected table columns
  - `bbox` - the bounding box of the table column
  - `col_id` - the id of the column
  - `is_header` - whether it is a header column
- `cells` - detected table cells
  - `bbox` - the axis-aligned rectangle for the cell in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner.
  - `text` - the text of this cell, if it could be pulled out of the PDF.
  - `row_id` - the id of the row the cell belongs to.
  - `col_id` - the id of the column the cell belongs to.
  - `colspan` - the number of columns spanned by the cell.
  - `rowspan` - the number of rows spanned by the cell.
  - `is_header` - whether it is a header cell.
- `page` - the page number in the file
- `table_idx` - the index of the table on the page (sorted in vertical order)
- `image_bbox` - the bbox for the image in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner. All bboxes will be contained within this bbox.
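
The `row_id`, `col_id`, `rowspan`, and `colspan` fields are enough to rebuild a table as a 2D grid. This is a minimal sketch that assumes the ids are zero-based. A spanning cell's text is copied into every slot it covers, and cells without text become empty strings. `cells_to_grid` is a hypothetical helper, not part of surya.

```python
def cells_to_grid(cells: list[dict]) -> list[list[str]]:
    """Rebuild a table as rows of strings from cell dicts.

    Assumes row_id/col_id are zero-based; a spanning cell's text is copied
    into every grid slot it covers. Cells without text become "".
    """
    if not cells:
        return []
    n_rows = max(c["row_id"] + c.get("rowspan", 1) for c in cells)
    n_cols = max(c["col_id"] + c.get("colspan", 1) for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        text = c.get("text") or ""  # text is only present if it could be pulled from the PDF
        for r in range(c["row_id"], c["row_id"] + c.get("rowspan", 1)):
            for col in range(c["col_id"], c["col_id"] + c.get("colspan", 1)):
                grid[r][col] = text
    return grid


# Invented sample: a header spanning two columns, and one cell with no text.
cells = [
    {"row_id": 0, "col_id": 0, "rowspan": 1, "colspan": 2, "text": "Revenue", "is_header": True},
    {"row_id": 1, "col_id": 0, "rowspan": 1, "colspan": 1, "text": "2023", "is_header": False},
    {"row_id": 1, "col_id": 1, "rowspan": 1, "colspan": 1, "text": "$1.2M", "is_header": False},
    {"row_id": 2, "col_id": 0, "rowspan": 1, "colspan": 1, "text": None, "is_header": False},
    {"row_id": 2, "col_id": 1, "rowspan": 1, "colspan": 1, "text": "$1.5M", "is_header": False},
]

for row in cells_to_grid(cells):
    print(" | ".join(row))
```

Duplicating spanned text keeps every row the same length, which makes the grid easy to write out as CSV. If you want HTML output, you would keep the span attributes instead.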

**Performance tips**

Setting the `TABLE_REC_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `150MB` of VRAM, so very high batch sizes are possible. The default batch size is `64`, which will use about 10GB of VRAM. Tuning it can also help on CPU, depending on your core count; the default CPU batch size is `8`.
### From python
```python
from PIL import Image
from surya.table_rec import TableRecPredictor
IMAGE_PATH = "path/to/image.png"  # replace with the path to your own image
image = Image.open(IMAGE_PATH)
table_rec_predictor = TableRecPredictor()
table_predictions = table_rec_predictor([image])
```
## LaTeX OCR
This command will write out a json file with the LaTeX of the equations. You must pass in images that are already cropped to the equations. If needed, you can produce these crops by running the layout model first and cropping the detected regions.
```shell
surya_latex_ocr DATA_PATH
```
- `DATA_PATH` can be an image, pdf, or folder of images/pdfs
- `--output_dir` specifies the directory to save results to instead of the default
- `--page_range` specifies the page range to process in the PDF, specified as a single number, a comma separated list, a range, or comma separated ranges - example: `0,5-10,20`.
The `results.json` file will contain a json dictionary where the keys are the input filenames without extensions. Each value will be a list of dictionaries, one per page of the input document. See the OCR section above for the format of the output.
### From python
```python
from PIL import Image
from surya.texify import TexifyPredictor
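IMAGE_PATH = "path/to/equation.png"  # an image already cropped to the equation
image = Image.open(IMAGE_PATH)
predictor = TexifyPredictor()
predictions = predictor([image])  # one result per input image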
```
### Interactive app
You can also run a special interactive app that lets you select equations and OCR them (kind of like MathPix snip) with:
```shell
pip install streamlit==1.40 streamlit-drawable-canvas-jsretry
texify_gui
```
## Compilation
The following models have support for compilation. You will need to set the following environment variables to enable compilation:
- Detection: `COMPILE_DETECTOR=true`
- Layout: `COMPILE_LAYOUT=true`
- Table recognition: `COMPILE_TABLE_REC=true`
Alternatively, set `COMPILE_ALL=true` to compile all models.
Here are the speedups on an A10 GPU:

| Model             | Time per page (s) | Compiled time per page (s) | Speedup (%) |
| ----------------- | ----------------- | -------------------------- | ----------- |
| Detection         | 0.108808          | 0.10521                    | 3.31        |
| Layout            | 0.27319           | 0.27063                    | 0.94        |
| Table recognition | 0.0219            | 0.01938                    | 11.51       |
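
The speedup column works out to the relative reduction in time per page, 100 × (uncompiled − compiled) / uncompiled. Recomputing it from the two time columns reproduces the listed values up to rounding:

```python
def speedup_percent(baseline_s: float, compiled_s: float) -> float:
    """Percent reduction in time per page after compilation."""
    return 100.0 * (baseline_s - compiled_s) / baseline_s


# (uncompiled, compiled) seconds per page, from the table above.
timings = {
    "Detection": (0.108808, 0.10521),
    "Layout": (0.27319, 0.27063),
    "Table recognition": (0.0219, 0.01938),
}

for model, (base, compiled) in timings.items():
    print(f"{model}: {speedup_percent(base, compiled):.2f}%")
```

This measures latency reduction, not a throughput ratio. For table recognition, an 11.51% reduction corresponds to roughly 1.13x more pages per second.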
# Limitations
- This is specialized for document OCR. It will likely not work on photos or other images.
- It is for printed text, not handwriting (though it may work on some handwriting).
- The text detection model has learned to ignore advertisements.
- You can find language support for OCR in `surya/recognition/languages.py`. Text detection, layout analysis, and reading order will work with any language.
## Troubleshooting
If OCR isn't working properly:
- Try increasing the resolution of the image so the text is bigger. If the resolution is already very high, try decreasing it to no more than a `2048px` width.
- Preprocessing the image (binarizing, deskewing, etc) can help with very old/blurry images.
- You can adjust `DETECTOR_BLANK_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD` if you don't get good results. `DETECTOR_BLANK_THRESHOLD` controls the space between lines - any prediction below this number will be considered blank space. `DETECTOR_TEXT_THRESHOLD` controls how text is joined - any number above this is considered text. `DETECTOR_TEXT_THRESHOLD` should always be higher than `DETECTOR_BLANK_THRESHOLD`, and both should be in the 0-1 range. Looking at the heatmap from the debug output of the detector can tell you how to adjust these (if you see faint things that look like boxes, lower the thresholds, and if you see bboxes being joined together, raise the thresholds).
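
These thresholds are settings read from environment variables, so one way to experiment is to launch the CLI with a modified environment. This sketch assumes the `surya_detect` command is on your `PATH`. `detector_env` and `run_detection` are hypothetical helpers; `detector_env` enforces the constraints stated above (both values in the 0-1 range, text threshold strictly above blank threshold) before building the environment.

```python
import os
import subprocess


def detector_env(blank: float, text: float) -> dict[str, str]:
    """Copy of the current environment with detector thresholds overridden."""
    if not (0.0 <= blank < text <= 1.0):
        raise ValueError("need 0 <= DETECTOR_BLANK_THRESHOLD < DETECTOR_TEXT_THRESHOLD <= 1")
    env = dict(os.environ)  # copy, so the current process is left untouched
    env["DETECTOR_BLANK_THRESHOLD"] = str(blank)
    env["DETECTOR_TEXT_THRESHOLD"] = str(text)
    return env


def run_detection(data_path: str, blank: float, text: float) -> None:
    # --images saves images of the pages and detected text lines for inspection.
    subprocess.run(
        ["surya_detect", data_path, "--images"],
        env=detector_env(blank, text),
        check=True,
    )
```

For example, `run_detection("scan.pdf", blank=0.3, text=0.6)` runs one trial. These values are placeholders, not recommended settings. Compare the saved images across a few runs to decide which direction to move each threshold.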
# Manual install
If you want to develop surya, you can install it manually:
- `git clone https://github.com/VikParuchuri/surya.git`
- `cd surya`
- `poetry install` - installs main and dev dependencies
- `poetry shell` - activates the virtual environment
# Benchmarks
## OCR

| Model | Time per page (s) | Avg similarity (⬆) |
|-----------|-------------------|--------------------|
| surya | .62 | 0.97 |
| tesseract | .45 | 0.88 |

[Full language results](static/images/rec_acc_table.png)
Tesseract is CPU-based, and surya is CPU or GPU. I tried to cost-match the resources used, so I used a 1xA6000 (48GB VRAM) for surya, and 28 CPU cores for Tesseract (same price on Lambda Labs/DigitalOcean).
### Google Cloud Vision
I benchmarked OCR against Google Cloud Vision since it has similar language coverage to Surya.

[Full language results](static/images/gcloud_full_langs.png)

**Methodology**

I measured normalized sentence similarity (0-1, higher is better) on a set of real-world and synthetic PDFs. I sampled PDFs from Common Crawl, then filtered out the ones with bad OCR. I couldn't find PDFs for some languages, so I also generated simple synthetic PDFs for those.
I used the reference line bboxes from the PDFs for both Tesseract and Surya, so that only OCR quality was evaluated.
For Google Cloud, I aligned its output with the ground truth. I had to skip RTL languages since they didn't align well.
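
To make "normalized sentence similarity" concrete, here is a stand-in metric built on Python's standard `difflib`. This is an assumption for illustration only, not the benchmark's scoring code, which may differ. A score of 1.0 means identical strings. The page score is a plain mean over line pairs, which are aligned because both engines read the same reference bboxes.

```python
from difflib import SequenceMatcher
from statistics import mean


def line_similarity(pred: str, ref: str) -> float:
    """Similarity in [0, 1]; 1.0 for identical strings."""
    if not pred and not ref:
        return 1.0
    return SequenceMatcher(None, pred, ref).ratio()


def page_similarity(preds: list[str], refs: list[str]) -> float:
    """Mean similarity over line pairs aligned by index (same index = same reference bbox)."""
    if len(preds) != len(refs):
        raise ValueError("predictions and references must be aligned line by line")
    if not refs:
        return 1.0  # nothing to compare
    return mean(line_similarity(p, r) for p, r in zip(preds, refs))


print(page_similarity(["Invoice 2023", "Tota1: $42"], ["Invoice 2023", "Total: $42"]))
```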
## Text line detection

| Model | Time (s) | Time per page (s) | precision | recall |
|-----------|------------|---------------------|-------------|----------|
| surya | 47.2285 | 0.094452 | 0.835857 | 0.960807 |
| tesseract | 74.4546 | 0.290838 | 0.631498 | 0.997694 |

Tesseract is CPU-based, and surya is CPU or GPU. I ran the benchmarks on a system with an A10 GPU and a 32-core CPU. This was the resource usage:
- tesseract - 32 CPU cores, or 8 workers using 4 cores each
- surya - 36 batch size, for 16GB VRAM usage
**Methodology**
Surya predicts line-level bboxes, while tesseract and others predict word-level or character-level. It's hard to find 100% correct datasets with line-level annotations. Merging bboxes can be noisy, so I chose not to use IoU as the metric for evaluation.
I instead used coverage, which calculates:
- Precision - how well the predicted bboxes cover ground truth bboxes
- Recall - how well ground truth bboxes cover predicted bboxes

First, coverage is calculated for each bbox, with a small penalty for double coverage, since we want the detection to have non-overlapping bboxes. Anything with a coverage of 0.5 or higher is considered a match.
Then precision and recall are calculated over the whole dataset.
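
The coverage idea can be sketched as follows. This is a simplified illustration, not the benchmark code. It omits the double-coverage penalty and assumes boxes within each set do not overlap one another, so summing pairwise intersections does not double count. The precision and recall directions follow the definitions given above.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def intersection_area(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def coverage(box: Box, others: list[Box]) -> float:
    """Fraction of `box` overlapped by `others` (assumes `others` don't overlap each other)."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area <= 0:
        return 0.0
    covered = sum(intersection_area(box, o) for o in others)
    return min(1.0, covered / area)


def match_rate(boxes: list[Box], others: list[Box], threshold: float = 0.5) -> float:
    """Share of `boxes` whose coverage by `others` is at least `threshold`."""
    if not boxes:
        return 1.0  # nothing to match
    return sum(coverage(b, others) >= threshold for b in boxes) / len(boxes)


preds = [(0, 0, 100, 10), (0, 20, 40, 30)]
truth = [(0, 0, 100, 10), (0, 20, 100, 30)]

precision = match_rate(truth, preds)  # how well predictions cover ground truth
recall = match_rate(preds, truth)     # how well ground truth covers predictions
print(precision, recall)  # 0.5 1.0
```

In the example, the second prediction covers only 40% of its ground-truth line, so that line fails the 0.5 threshold. The prediction itself sits entirely inside ground truth, so it still counts on the other side.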
## Layout analysis
| Layout Type | precision | recall |
|---------------|-------------|----------|
| Image | 0.91265 | 0.93976 |
| List | 0.80849 | 0.86792 |
| Table | 0.84957 | 0.96104 |
| Text | 0.93019 | 0.94571 |
| Title | 0.92102 | 0.95404 |

Time per image: 0.13 seconds on an A10 GPU.

**Methodology**

I benchmarked the layout analysis on [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet), which was not in the training data. I had to align PubLayNet labels with the surya layout labels. I was then able to find coverage for each layout type:
- Precision - how well the predicted bboxes cover ground truth bboxes
- Recall - how well ground truth bboxes cover predicted bboxes
## Reading Order
88% mean accuracy, and 0.4 seconds per image on an A10 GPU. See the methodology for notes: this benchmark is not a perfect measure of accuracy, and is more useful as a sanity check.
**Methodology**
I benchmarked the reading order on the layout dataset from [here](https://www.icst.pku.edu.cn/cpdp/sjzy/), which was not in the training data. Unfortunately, this dataset is fairly noisy, and not all the labels are correct. It was very hard to find a dataset annotated with reading order and also layout information. I wanted to avoid using a cloud service for the ground truth.
The accuracy is computed by finding if each pair of layout boxes is in the correct order, then taking the % that are correct.
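
The pairwise accuracy described above can be written directly. For every pair of boxes, check whether the prediction puts them in the same relative order as the ground truth. This sketch assumes predicted and reference positions are given for the same boxes in the same list order. `pairwise_order_accuracy` is a hypothetical name.

```python
from itertools import combinations


def pairwise_order_accuracy(pred_positions: list[int], true_positions: list[int]) -> float:
    """Fraction of box pairs whose relative order matches the ground truth."""
    if len(pred_positions) != len(true_positions):
        raise ValueError("need one predicted and one true position per box")
    pairs = list(combinations(range(len(true_positions)), 2))
    if not pairs:
        return 1.0  # zero or one box: nothing can be out of order
    correct = sum(
        (pred_positions[i] < pred_positions[j]) == (true_positions[i] < true_positions[j])
        for i, j in pairs
    )
    return correct / len(pairs)


print(pairwise_order_accuracy([0, 1, 2], [0, 1, 2]))  # 1.0
print(pairwise_order_accuracy([0, 2, 1], [0, 1, 2]))  # 2 of 3 pairs correct
```

A pairwise score degrades gracefully. Swapping two adjacent boxes costs one pair rather than marking the whole page wrong.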
## Table Recognition
| Model | Row Intersection | Col Intersection | Time Per Image |
|-------------------|--------------------|--------------------|------------------|
| Surya | 1 | 0.98625 | 0.30202 |
| Table transformer | 0.84 | 0.86857 | 0.08082 |

Higher is better for intersection, which is the percentage of the actual row/column overlapped by the predictions. This benchmark is mostly a sanity check; there is a more rigorous one in [marker](https://www.github.com/VikParuchuri/marker).
**Methodology**
The benchmark uses a subset of [Fintabnet](https://developer.ibm.com/exchanges/data/all/fintabnet/) from IBM. It has labeled rows and columns. After table recognition is run, the predicted rows and columns are compared to the ground truth. There is an additional penalty for predicting too many or too few rows/columns.
## LaTeX OCR
| Method | edit ⬇ | time taken (s) ⬇ |
|--------|----------|------------------|
| texify | 0.122617 | 35.6345 |

This runs texify on a ground truth set of LaTeX, then computes edit distance. This is a bit noisy, since two LaTeX strings that render the same can contain different symbols.
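
A normalized edit distance can be computed with the standard Levenshtein dynamic program. Dividing by the longer string's length is an assumption here; the benchmark may normalize differently. The example also shows the caveat above: two LaTeX strings that render identically can still be some distance apart.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length (0 = identical)."""
    longest = max(len(pred), len(ref))
    return levenshtein(pred, ref) / longest if longest else 0.0


# Both strings render as x squared, yet they differ by two braces.
print(normalized_edit_distance(r"x^2", r"x^{2}"))  # 0.4
```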
## Running your own benchmarks
You can benchmark the performance of surya on your machine.
- Follow the manual install instructions above.
- `poetry install --group dev` - installs dev dependencies
**Text line detection**
This will evaluate tesseract and surya for text line detection across a randomly sampled set of images from [doclaynet](https://huggingface.co/datasets/vikp/doclaynet_bench).
```shell
python benchmark/detection.py --max_rows 256
```
- `--max_rows` controls how many images to process for the benchmark
- `--debug` will render images and detected bboxes
- `--pdf_path` will let you specify a pdf to benchmark instead of the default data
- `--results_dir` will let you specify a directory to save results to instead of the default one
**Text recognition**
This will evaluate surya and optionally tesseract on multilingual pdfs from common crawl (with synthetic data for missing languages).
```shell
python benchmark/recognition.py --tesseract
```
- `--max_rows` controls how many images to process for the benchmark
- `--debug 2` will render images with detected text
- `--results_dir` will let you specify a directory to save results to instead of the default one
- `--tesseract` will run the benchmark with tesseract. You have to run `sudo apt-get install tesseract-ocr-all` to install all tesseract data, and set `TESSDATA_PREFIX` to the path to the tesseract data folder.
- Set `RECOGNITION_BATCH_SIZE=864` to use the same batch size as the benchmark.
- Set `RECOGNITION_BENCH_DATASET_NAME=vikp/rec_bench_hist` to use the historical document data for benchmarking. This data comes from the [tapuscorpus](https://github.com/HTR-United/tapuscorpus).
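If you drive the benchmarks from Python instead of a shell, you can pass the environment variables above explicitly. The sketch below only builds the command and environment. The function name is hypothetical, and it assumes you run it from the repository root.

```python
import os
import sys
from typing import Dict, List, Tuple


def build_recognition_bench_command(
    historical: bool = False, tesseract: bool = False
) -> Tuple[List[str], Dict[str, str]]:
    """Return the argv list and environment for running benchmark/recognition.py."""
    env = dict(os.environ)
    # Match the batch size used for the benchmark numbers
    env["RECOGNITION_BATCH_SIZE"] = "864"
    if historical:
        # Switch to the historical document data (from tapuscorpus)
        env["RECOGNITION_BENCH_DATASET_NAME"] = "vikp/rec_bench_hist"
    cmd = [sys.executable, "benchmark/recognition.py"]
    if tesseract:
        cmd.append("--tesseract")
    return cmd, env
```

To launch the benchmark, pass the result to `subprocess.run(cmd, env=env, check=True)`. With `tesseract=True`, `TESSDATA_PREFIX` must already be set in your environment, as described above.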
**Layout analysis**
This will evaluate surya on the PubLayNet dataset.
```shell
python benchmark/layout.py
```
- `--max_rows` controls how many images to process for the benchmark
- `--debug` will render images with the ground-truth layout boxes
- `--results_dir` will let you specify a directory to save results to instead of the default one
**Reading Order**
```shell
python benchmark/ordering.py
```
- `--max_rows` controls how many images to process for the benchmark
- `--results_dir` will let you specify a directory to save results to instead of the default one
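The ordering script reports mean accuracy as "the % of correct ranking pairs". A minimal sketch of a pairwise metric of that kind follows. It is our assumption about the idea behind `rank_accuracy` in `benchmark/utils/metrics.py`, not its actual code. `pred_positions` holds the predicted reading-order index matched to each ground-truth box, and a tie in the prediction counts as wrong whenever the reference orders the pair.

```python
from itertools import combinations
from typing import Sequence


def _order(a: int, b: int) -> int:
    # -1 if a comes first, 1 if b comes first, 0 for a tie
    return (a > b) - (a < b)


def pairwise_rank_accuracy(pred_positions: Sequence[int], true_positions: Sequence[int]) -> float:
    """Fraction of box pairs whose relative order matches the ground truth."""
    pairs = list(combinations(range(len(true_positions)), 2))
    if not pairs:
        return 1.0
    correct = sum(
        _order(pred_positions[i], pred_positions[j]) == _order(true_positions[i], true_positions[j])
        for i, j in pairs
    )
    return correct / len(pairs)
```

Swapping two of three boxes, as in `pairwise_rank_accuracy([0, 2, 1], [0, 1, 2])`, leaves two of the three pairs correct, for an accuracy of 2/3.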
**Table Recognition**
```shell
python benchmark/table_recognition.py --max_rows 1024 --tatr
```
- `--max_rows` controls how many images to process for the benchmark
- `--debug` will render images with the predicted and ground-truth row and column boxes
- `--results_dir` will let you specify a directory to save results to instead of the default one
- `--tatr` specifies whether to also run table transformer
**LaTeX OCR**
```shell
python benchmark/texify.py --max_rows 128
```
- `--max_rows` controls how many images to process for the benchmark
- `--results_dir` will let you specify a directory to save results to instead of the default one
# Training
Text detection was trained on 4x A6000s for 3 days on a diverse set of images. It was trained from scratch using a modified EfficientViT architecture for semantic segmentation.
Text recognition was trained on 4x A6000s for 2 weeks, using a modified Donut model (GQA, MoE layer, UTF-16 decoding, layer config changes).
# Finetuning Surya OCR
You can now take Surya OCR further by training it on your own data with our [finetuning script](/surya/scripts/finetune_ocr.py).
It is built on the Hugging Face Trainer, supports all of the [arguments](https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments) that the Trainer accepts, and works with integrations such as torchrun and DeepSpeed.
To set up your dataset, follow the example dataset format [here](https://huggingface.co/datasets/datalab-to/ocr_finetune_example) and pass the path to your own dataset when launching the training script.
```bash
# Tested on 1xH100 GPU
# Set --pretrained_checkpoint_path to load from a custom checkpoint, otherwise
# the default surya ocr weights will be loaded as the initialization
python surya/scripts/finetune_ocr.py \
--output_dir $OUTPUT_DIR \
--dataset_name datalab-to/ocr_finetune_example \
--per_device_train_batch_size 64 \
--gradient_checkpointing true \
--max_sequence_length 1024
```
This is a minimal training script to get you started finetuning Surya. Our internal training stack includes character bounding box finetuning, sliding window attention with specialized attention masks, custom kernels, augmentations, and other optimizations that can push OCR accuracy well beyond standard finetuning. If you want to get the most out of your data, reach out to us at hi@datalab.to!
# Thanks
This work would not have been possible without amazing open source AI work:
- [Segformer](https://arxiv.org/pdf/2105.15203.pdf) from NVIDIA
- [EfficientViT](https://github.com/mit-han-lab/efficientvit) from MIT
- [timm](https://github.com/huggingface/pytorch-image-models) from Ross Wightman
- [Donut](https://github.com/clovaai/donut) from Naver
- [transformers](https://github.com/huggingface/transformers) from huggingface
- [CRAFT](https://github.com/clovaai/CRAFT-pytorch), a great scene text detection model
Thank you to everyone who makes open source AI possible.
# Citation
If you use surya (or the associated models) in your work or research, please consider citing us using the following BibTeX entry:
```bibtex
@misc{paruchuri2025surya,
author = {Vikas Paruchuri and Datalab Team},
title = {Surya: A lightweight document OCR and analysis toolkit},
year = {2025},
howpublished = {\url{https://github.com/VikParuchuri/surya}},
note = {GitHub repository},
}
```
================================================
FILE: benchmark/detection.py
================================================
import argparse
import collections
import copy
import json
import click
from benchmark.utils.bbox import get_pdf_lines
from benchmark.utils.metrics import precision_recall
from benchmark.utils.tesseract import tesseract_parallel
from surya.input.processing import open_pdf, get_page_images, convert_if_not_rgb
from surya.debug.draw import draw_polys_on_image
from surya.common.util import rescale_bbox
from surya.settings import settings
from surya.detection import DetectionPredictor
import os
import time
from tabulate import tabulate
import datasets
@click.command(help="Benchmark detection model.")
@click.option("--pdf_path", type=str, help="Path to PDF to detect bboxes in.", default=None)
@click.option("--results_dir", type=str, help="Directory to save benchmark results to.", default=os.path.join(settings.RESULT_DIR, "benchmark"))
@click.option("--max_rows", type=int, help="Maximum number of pdf pages to OCR.", default=100)
@click.option("--debug", is_flag=True, help="Enable debug mode.", default=False)
@click.option("--tesseract", is_flag=True, help="Run tesseract as well.", default=False)
def main(pdf_path: str, results_dir: str, max_rows: int, debug: bool, tesseract: bool):
det_predictor = DetectionPredictor()
if pdf_path is not None:
pathname = pdf_path
doc = open_pdf(pdf_path)
page_count = len(doc)
page_indices = list(range(page_count))
page_indices = page_indices[:max_rows]
images = get_page_images(doc, page_indices)
doc.close()
image_sizes = [img.size for img in images]
correct_boxes = get_pdf_lines(pdf_path, image_sizes)
else:
pathname = "det_bench"
# These have already been shuffled randomly, so sampling from the start is fine
dataset = datasets.load_dataset(settings.DETECTOR_BENCH_DATASET_NAME, split=f"train[:{max_rows}]")
images = list(dataset["image"])
images = convert_if_not_rgb(images)
correct_boxes = []
for i, boxes in enumerate(dataset["bboxes"]):
img_size = images[i].size
# 1000,1000 is bbox size for doclaynet
correct_boxes.append([rescale_bbox(b, (1000, 1000), img_size) for b in boxes])
if settings.DETECTOR_STATIC_CACHE:
# Run through one batch to compile the model
det_predictor(images[:1])
start = time.time()
predictions = det_predictor(images)
surya_time = time.time() - start
if tesseract:
start = time.time()
tess_predictions = tesseract_parallel(images)
tess_time = time.time() - start
else:
tess_predictions = [None] * len(images)
tess_time = None
folder_name = os.path.basename(pathname).split(".")[0]
result_path = os.path.join(results_dir, folder_name)
os.makedirs(result_path, exist_ok=True)
page_metrics = collections.OrderedDict()
for idx, (tb, sb, cb) in enumerate(zip(tess_predictions, predictions, correct_boxes)):
surya_boxes = [s.bbox for s in sb.bboxes]
surya_polys = [s.polygon for s in sb.bboxes]
surya_metrics = precision_recall(surya_boxes, cb)
if tb is not None:
tess_metrics = precision_recall(tb, cb)
else:
tess_metrics = None
page_metrics[idx] = {
"surya": surya_metrics,
"tesseract": tess_metrics
}
if debug:
bbox_image = draw_polys_on_image(surya_polys, copy.deepcopy(images[idx]))
bbox_image.save(os.path.join(result_path, f"{idx}_bbox.png"))
mean_metrics = {}
metric_types = sorted(page_metrics[0]["surya"].keys())
models = ["surya"]
if tesseract:
models.append("tesseract")
for k in models:
for m in metric_types:
metric = []
for page in page_metrics:
metric.append(page_metrics[page][k][m])
if k not in mean_metrics:
mean_metrics[k] = {}
mean_metrics[k][m] = sum(metric) / len(metric)
out_data = {
"times": {
"surya": surya_time,
"tesseract": tess_time
},
"metrics": mean_metrics,
"page_metrics": page_metrics
}
with open(os.path.join(result_path, "results.json"), "w+", encoding="utf-8") as f:
json.dump(out_data, f, indent=4)
table_headers = ["Model", "Time (s)", "Time per page (s)"] + metric_types
table_data = [
["surya", surya_time, surya_time / len(images)] + [mean_metrics["surya"][m] for m in metric_types],
]
if tesseract:
table_data.append(
["tesseract", tess_time, tess_time / len(images)] + [mean_metrics["tesseract"][m] for m in metric_types]
)
print(tabulate(table_data, headers=table_headers, tablefmt="github"))
print("Precision and recall are over the mutual coverage of the detected boxes and the ground truth boxes at a .5 threshold. There is a precision penalty for multiple boxes overlapping reference lines.")
print(f"Wrote results to {result_path}")
if __name__ == "__main__":
main()
================================================
FILE: benchmark/layout.py
================================================
import collections
import copy
import json
import click
from benchmark.utils.metrics import precision_recall
from surya.foundation import FoundationPredictor
from surya.layout import LayoutPredictor
from surya.input.processing import convert_if_not_rgb
from surya.debug.draw import draw_bboxes_on_image
from surya.settings import settings
import os
import time
from tabulate import tabulate
import datasets
@click.command(help="Benchmark surya layout model.")
@click.option(
"--results_dir",
type=str,
help="Directory to save benchmark results to.",
default=os.path.join(settings.RESULT_DIR, "benchmark"),
)
@click.option(
"--max_rows",
type=int,
help="Maximum number of images to run benchmark on.",
default=100,
)
@click.option("--debug", is_flag=True, help="Run in debug mode.", default=False)
def main(results_dir: str, max_rows: int, debug: bool):
foundation_predictor = FoundationPredictor(checkpoint=settings.LAYOUT_MODEL_CHECKPOINT)
layout_predictor = LayoutPredictor(foundation_predictor)
pathname = "layout_bench"
# These have already been shuffled randomly, so sampling from the start is fine
dataset = datasets.load_dataset(
settings.LAYOUT_BENCH_DATASET_NAME, split=f"train[:{max_rows}]"
)
images = list(dataset["image"])
images = convert_if_not_rgb(images)
if settings.LAYOUT_STATIC_CACHE:
layout_predictor(images[:1])
start = time.time()
layout_predictions = layout_predictor(images)
surya_time = time.time() - start
folder_name = os.path.basename(pathname).split(".")[0]
result_path = os.path.join(results_dir, folder_name)
os.makedirs(result_path, exist_ok=True)
label_alignment = { # First is publaynet, second is surya
"Image": [["Figure"], ["Picture", "Figure"]],
"Table": [["Table"], ["Table", "Form", "TableOfContents"]],
"Text": [
["Text"],
[
"Text",
"Formula",
"Footnote",
"Caption",
"TextInlineMath",
"Code",
"Handwriting",
],
],
"List": [["List"], ["ListItem"]],
"Title": [["Title"], ["SectionHeader", "Title"]],
}
page_metrics = collections.OrderedDict()
for idx, pred in enumerate(layout_predictions):
row = dataset[idx]
all_correct_bboxes = []
page_results = {}
for label_name in label_alignment:
correct_cats, surya_cats = label_alignment[label_name]
correct_bboxes = [
b
for b, category in zip(row["bboxes"], row["labels"])
if category in correct_cats
]
all_correct_bboxes.extend(correct_bboxes)
pred_bboxes = [b.bbox for b in pred.bboxes if b.label in surya_cats]
metrics = precision_recall(
pred_bboxes, correct_bboxes, penalize_double=False
)
weight = len(correct_bboxes)
metrics["weight"] = weight
page_results[label_name] = metrics
page_metrics[idx] = page_results
if debug:
bbox_image = draw_bboxes_on_image(
all_correct_bboxes, copy.deepcopy(images[idx])
)
bbox_image.save(os.path.join(result_path, f"{idx}_layout.png"))
mean_metrics = collections.defaultdict(dict)
layout_types = sorted(page_metrics[0].keys())
metric_types = sorted(page_metrics[0][layout_types[0]].keys())
metric_types.remove("weight")
for label in layout_types:
for m in metric_types:
metric = []
total = 0
for page in page_metrics:
metric.append(
page_metrics[page][label][m] * page_metrics[page][label]["weight"]
)
total += page_metrics[page][label]["weight"]
value = sum(metric)
if value > 0:
value /= total
mean_metrics[label][m] = value
out_data = {
"time": surya_time,
"metrics": mean_metrics,
"page_metrics": page_metrics,
}
with open(os.path.join(result_path, "results.json"), "w+", encoding="utf-8") as f:
json.dump(out_data, f, indent=4)
table_headers = [
"Layout Type",
] + metric_types
table_data = []
for layout_type in layout_types:
table_data.append(
[
layout_type,
]
+ [f"{mean_metrics[layout_type][m]:.5f}" for m in metric_types]
)
print(tabulate(table_data, headers=table_headers, tablefmt="github"))
print(
f"Took {surya_time / len(images):.5f} seconds per image, and {surya_time:.5f} seconds total."
)
print(
"Precision and recall are over the mutual coverage of the detected boxes and the ground truth boxes at a .5 threshold."
)
print(f"Wrote results to {result_path}")
if __name__ == "__main__":
main()
================================================
FILE: benchmark/ordering.py
================================================
import collections
import json
import click
from surya.foundation import FoundationPredictor
from surya.input.processing import convert_if_not_rgb
from surya.layout import LayoutPredictor
from surya.common.polygon import PolygonBox
from surya.settings import settings
from benchmark.utils.metrics import rank_accuracy
import os
import time
import datasets
@click.command(help="Benchmark surya layout for reading order.")
@click.option(
"--results_dir",
type=str,
help="Directory to save benchmark results to.",
default=os.path.join(settings.RESULT_DIR, "benchmark"),
)
@click.option(
"--max_rows",
type=int,
help="Maximum number of images to run benchmark on.",
default=None,
)
def main(results_dir: str, max_rows: int):
foundation_predictor = FoundationPredictor(checkpoint=settings.LAYOUT_MODEL_CHECKPOINT)
layout_predictor = LayoutPredictor(foundation_predictor)
pathname = "order_bench"
# These have already been shuffled randomly, so sampling from the start is fine
split = "train"
if max_rows is not None:
split = f"train[:{max_rows}]"
dataset = datasets.load_dataset(settings.ORDER_BENCH_DATASET_NAME, split=split)
images = list(dataset["image"])
images = convert_if_not_rgb(images)
start = time.time()
layout_predictions = layout_predictor(images)
surya_time = time.time() - start
folder_name = os.path.basename(pathname).split(".")[0]
result_path = os.path.join(results_dir, folder_name)
os.makedirs(result_path, exist_ok=True)
page_metrics = collections.OrderedDict()
mean_accuracy = 0
for idx, order_pred in enumerate(layout_predictions):
row = dataset[idx]
labels = row["labels"]
bboxes = row["bboxes"]
pred_positions = []
for label, bbox in zip(labels, bboxes):
max_intersection = 0
matching_idx = 0
for pred_box in order_pred.bboxes:
intersection = pred_box.intersection_pct(PolygonBox(polygon=bbox))
if intersection > max_intersection:
max_intersection = intersection
matching_idx = pred_box.position
pred_positions.append(matching_idx)
accuracy = rank_accuracy(pred_positions, labels)
mean_accuracy += accuracy
page_results = {"accuracy": accuracy, "box_count": len(labels)}
page_metrics[idx] = page_results
mean_accuracy /= len(layout_predictions)
out_data = {
"time": surya_time,
"mean_accuracy": mean_accuracy,
"page_metrics": page_metrics,
}
with open(os.path.join(result_path, "results.json"), "w+", encoding="utf-8") as f:
json.dump(out_data, f, indent=4)
print(f"Mean accuracy is {mean_accuracy:.2f}.")
print(
f"Took {surya_time / len(images):.2f} seconds per image, and {surya_time:.1f} seconds total."
)
print("Mean accuracy is the % of correct ranking pairs.")
print(f"Wrote results to {result_path}")
if __name__ == "__main__":
main()
================================================
FILE: benchmark/recognition.py
================================================
import re
import unicodedata
from collections import defaultdict
import click
from benchmark.utils.scoring import overlap_score, overlap_score_exact
from surya.input.processing import convert_if_not_rgb
from surya.debug.text import draw_text_on_image
from surya.foundation import FoundationPredictor
from surya.recognition import RecognitionPredictor
from surya.settings import settings
from surya.recognition.languages import CODE_TO_LANGUAGE
from benchmark.utils.tesseract import (
tesseract_ocr_parallel,
surya_lang_to_tesseract,
TESS_CODE_TO_LANGUAGE,
)
from benchmark.utils.textract import textract_ocr_parallel
import os
import datasets
import json
import time
from tabulate import tabulate
KEY_LANGUAGES = [
"Chinese",
"Spanish",
"English",
"Arabic",
"Hindi",
"Bengali",
"Russian",
"Japanese",
]
def list_in(lst: str | list, lst2: list):
if isinstance(lst, str):
lst = [lst]
return any([item in lst for item in lst2])
def standardize_bullets(text):
patterns = [
r"•\s+",
r"·\s+",
r"○\s+",
r"◦\s+",
r"▪\s+",
r"▫\s+",
r"➢\s+",
r"➤\s+",
r"★\s+",
r"✓\s+",
r"✗\s+",
r"✦\s+",
r"\\bullet\s+",
]
combined_pattern = "|".join(patterns)
text = re.sub(combined_pattern, "*", text)
return text
def normalize_text(text: str) -> str:
# Remove HTML tags
text = re.sub(r"<[^>]+>", "", text)
# Remove LaTeX tags
text = re.sub(r"\\[a-zA-Z]+", "", text)
text = standardize_bullets(text)
text = unicodedata.normalize("NFKC", text)
return text.strip().lower().replace(",", ".")
@click.command(help="Benchmark recognition model.")
@click.option(
"--results_dir",
type=str,
help="Directory to save benchmark results to.",
default=os.path.join(settings.RESULT_DIR, "benchmark"),
)
@click.option(
"--max_rows", type=int, help="Maximum number of pdf pages to OCR.", default=None
)
@click.option("--debug", type=int, help="Debug level (0 = off, 2 also renders predicted and reference text images).", default=0)
@click.option(
"--tesseract", is_flag=True, help="Run benchmarks on tesseract.", default=False
)
@click.option(
"--textract", is_flag=True, help="Run benchmarks on textract.", default=False
)
@click.option(
"--tess_cpus", type=int, help="Number of CPUs to use for tesseract.", default=28
)
@click.option(
"--textract_cpus", type=int, help="Number of CPUs to use for textract.", default=28
)
@click.option(
"--languages",
type=str,
help="Comma-separated list of languages to benchmark.",
default=None,
)
@click.option(
"--print_results",
is_flag=True,
)
def main(
results_dir: str,
max_rows: int,
debug: bool,
tesseract: bool,
textract: bool,
tess_cpus: int,
textract_cpus: int,
languages: str | None,
print_results: bool,
):
foundation_predictor = FoundationPredictor()
rec_predictor = RecognitionPredictor(foundation_predictor)
split = "train"
dataset = datasets.load_dataset(
settings.RECOGNITION_BENCH_DATASET_NAME, split=split
)
if languages:
languages = languages.split(",")
dataset = dataset.filter(
lambda x: list_in(x["language"], languages), num_proc=4
)
if max_rows and max_rows < len(dataset):
dataset = dataset.shuffle(seed=1).select(range(max_rows))
images = list(dataset["image"])
images = convert_if_not_rgb(images)
bboxes = list(dataset["bboxes"])
line_text = list(dataset["text"])
languages = list(dataset["language"])
print(f"Loaded {len(images)} images. Running OCR...")
start = time.time()
predictions_by_image = rec_predictor(images, None, bboxes=bboxes)
surya_time = time.time() - start
lang_list = []
for lang in languages:
if not isinstance(lang, list):
lang_list.append([lang])
else:
lang_list.append(lang)
surya_scores = defaultdict(list)
img_surya_scores = []
outputs = []
for idx, (pred, ref_text, langs) in enumerate(
zip(predictions_by_image, line_text, lang_list)
):
pred_text = [line.text for line in pred.text_lines]
score_ref_text = [normalize_text(line) for line in ref_text]
score_pred_text = [normalize_text(text) for text in pred_text]
image_scores, image_weights = overlap_score_exact(
score_pred_text, score_ref_text
)
normalized_scores = [
score / max(1, weight) for score, weight in zip(image_scores, image_weights)
]
image_score = sum(image_scores) / max(1, sum(image_weights))
img_surya_scores.append(image_score)
for lang in langs:
surya_scores[CODE_TO_LANGUAGE[lang]].append(image_score)
assert len(pred_text) == len(ref_text) == len(bboxes[idx])
if debug:
for j, (pred_line, ref_line, score, bbox) in enumerate(
zip(pred_text, ref_text, normalized_scores, bboxes[idx])
):
image_slice = images[idx].crop(bbox)
outputs.append(
{
"image": image_slice,
"bbox": bbox,
"score": score,
"pred": pred_line,
"ref": ref_line,
"langs": ",".join(langs),
}
)
if debug:
out_ds = datasets.Dataset.from_list(outputs)
out_ds.push_to_hub("datalab-to/rec_bench_outputs", private=True)
flat_surya_scores = [score for lang in surya_scores for score in surya_scores[lang]]
benchmark_stats = {
"surya": {
"avg_score": sum(flat_surya_scores) / max(1, len(flat_surya_scores)),
"lang_scores": {
lang: sum(scores) / max(1, len(scores))
for lang, scores in surya_scores.items()
},
"time_per_img": surya_time / max(1, len(images)),
}
}
result_path = os.path.join(results_dir, "rec_bench")
os.makedirs(result_path, exist_ok=True)
with open(os.path.join(result_path, "surya_scores.json"), "w+") as f:
json.dump(surya_scores, f)
if tesseract:
tess_valid = []
tess_langs = []
for idx, lang in enumerate(lang_list):
# Tesseract does not support all languages
tess_lang = surya_lang_to_tesseract(lang[0])
if tess_lang is None:
continue
tess_valid.append(idx)
tess_langs.append(tess_lang)
tess_imgs = [images[i] for i in tess_valid]
tess_bboxes = [bboxes[i] for i in tess_valid]
tess_reference = [line_text[i] for i in tess_valid]
start = time.time()
tess_predictions = tesseract_ocr_parallel(
tess_imgs, tess_bboxes, tess_langs, cpus=tess_cpus
)
tesseract_time = time.time() - start
tess_scores = defaultdict(list)
for idx, (pred, ref_text, lang) in enumerate(
zip(tess_predictions, tess_reference, tess_langs)
):
image_scores, image_weights, _ = overlap_score(pred, ref_text)
image_score = sum(image_scores) / max(1, sum(image_weights))
tess_scores[TESS_CODE_TO_LANGUAGE[lang]].append(image_score)
flat_tess_scores = [
score for lang in tess_scores for score in tess_scores[lang]
]
benchmark_stats["tesseract"] = {
"avg_score": sum(flat_tess_scores) / len(flat_tess_scores),
"lang_scores": {
lang: sum(scores) / len(scores) for lang, scores in tess_scores.items()
},
"time_per_img": tesseract_time / len(tess_imgs),
}
with open(os.path.join(result_path, "tesseract_scores.json"), "w+") as f:
json.dump(tess_scores, f)
if textract:
start = time.time()
textract_predictions = textract_ocr_parallel(images, cpus=textract_cpus)
textract_time = time.time() - start
textract_scores = defaultdict(list)
for idx, (pred, ref_text, lang) in enumerate(
zip(textract_predictions, line_text, lang_list)
):
image_scores, image_weights, _ = overlap_score(pred, ref_text)
image_score = sum(image_scores) / max(1, sum(image_weights))
for lang_code in lang:
textract_scores[CODE_TO_LANGUAGE[lang_code]].append(image_score)
flat_textract_scores = [
score for lang in textract_scores for score in textract_scores[lang]
]
benchmark_stats["textract"] = {
"avg_score": sum(flat_textract_scores) / len(flat_textract_scores),
"lang_scores": {
lang: sum(scores) / len(scores)
for lang, scores in textract_scores.items()
},
"time_per_img": textract_time / len(images),
}
print(len(flat_textract_scores))
with open(os.path.join(result_path, "textract_scores.json"), "w+") as f:
json.dump(textract_scores, f)
with open(os.path.join(result_path, "results.json"), "w+", encoding="utf-8") as f:
json.dump(benchmark_stats, f)
key_languages = [k for k in KEY_LANGUAGES if k in surya_scores]
table_headers = ["Model", "Time per page (s)", "Avg Score"] + key_languages
table_data = [
[
"surya",
benchmark_stats["surya"]["time_per_img"],
benchmark_stats["surya"]["avg_score"],
]
+ [benchmark_stats["surya"]["lang_scores"][lang] for lang in key_languages],
]
if tesseract:
table_data.append(
[
"tesseract",
benchmark_stats["tesseract"]["time_per_img"],
benchmark_stats["tesseract"]["avg_score"],
]
+ [
benchmark_stats["tesseract"]["lang_scores"].get(lang, 0)
for lang in key_languages
]
)
if textract:
table_data.append(
[
"textract",
benchmark_stats["textract"]["time_per_img"],
benchmark_stats["textract"]["avg_score"],
]
+ [
benchmark_stats["textract"]["lang_scores"][lang]
for lang in key_languages
],
)
print(tabulate(table_data, headers=table_headers, tablefmt="github"))
print(
"Only a few major languages are displayed. See the result path for additional languages."
)
if debug >= 1:
bad_detections = []
for idx, (score, lang) in enumerate(zip(img_surya_scores, lang_list)):
if score < 0.8:
bad_detections.append((idx, lang, score))
print(f"Found {len(bad_detections)} bad detections. Writing to file...")
with open(os.path.join(result_path, "bad_detections.json"), "w+") as f:
json.dump(bad_detections, f)
if debug == 2:
for idx, (image, pred, ref_text, bbox, lang) in enumerate(
zip(images, predictions_by_image, line_text, bboxes, lang_list)
):
pred_image_name = f"{'_'.join(lang)}_{idx}_pred.png"
ref_image_name = f"{'_'.join(lang)}_{idx}_ref.png"
pred_text = [line.text for line in pred.text_lines]
pred_image = draw_text_on_image(bbox, pred_text, image.size)
pred_image.save(os.path.join(result_path, pred_image_name))
ref_image = draw_text_on_image(bbox, ref_text, image.size)
ref_image.save(os.path.join(result_path, ref_image_name))
image.save(os.path.join(result_path, f"{'_'.join(lang)}_{idx}_image.png"))
print(f"Wrote results to {result_path}")
if print_results:
for idx, (pred, ref_text) in enumerate(zip(predictions_by_image, line_text)):
print(f"Image {idx}")
print("----")
for line_idx, (pred_line, ref_line) in enumerate(
zip(pred.text_lines, ref_text)
):
print(f"Sample {line_idx}")
print(f"Pred: {pred_line.text}")
print(f"Ref: {ref_line}")
print()
if settings.TORCH_DEVICE == "xla":
import torch_xla.debug.metrics as met
print(met.short_metrics_report())
if __name__ == "__main__":
main()
================================================
FILE: benchmark/table_recognition.py
================================================
import click
import collections
import json
from surya.debug.draw import draw_bboxes_on_image
from tabulate import tabulate
from surya.input.processing import convert_if_not_rgb
from surya.table_rec import TableRecPredictor
from surya.settings import settings
from benchmark.utils.metrics import penalized_iou_score
from benchmark.utils.tatr import load_tatr, batch_inference_tatr
import os
import time
import datasets
@click.command(help="Benchmark table rec dataset")
@click.option(
"--results_dir",
type=str,
help="Directory to save benchmark results to.",
default=os.path.join(settings.RESULT_DIR, "benchmark"),
)
@click.option(
"--max_rows",
type=int,
help="Maximum number of images to run benchmark on.",
default=512,
)
@click.option("--tatr", is_flag=True, help="Run table transformer.", default=False)
@click.option("--debug", is_flag=True, help="Enable debug mode.", default=False)
def main(results_dir: str, max_rows: int, tatr: bool, debug: bool):
table_rec_predictor = TableRecPredictor()
pathname = "table_rec_bench"
# These have already been shuffled randomly, so sampling from the start is fine
split = "train"
if max_rows is not None:
split = f"train[:{max_rows}]"
dataset = datasets.load_dataset(settings.TABLE_REC_BENCH_DATASET_NAME, split=split)
images = list(dataset["image"])
images = convert_if_not_rgb(images)
if settings.TABLE_REC_STATIC_CACHE:
# Run through one batch to compile the model
table_rec_predictor(images[:1])
start = time.time()
table_rec_predictions = table_rec_predictor(images)
surya_time = time.time() - start
folder_name = os.path.basename(pathname).split(".")[0]
result_path = os.path.join(results_dir, folder_name)
os.makedirs(result_path, exist_ok=True)
page_metrics = collections.OrderedDict()
mean_col_iou = 0
mean_row_iou = 0
for idx, (pred, image) in enumerate(zip(table_rec_predictions, images)):
row = dataset[idx]
pred_row_boxes = [p.bbox for p in pred.rows]
pred_col_bboxes = [p.bbox for p in pred.cols]
actual_row_bboxes = [r["bbox"] for r in row["rows"]]
actual_col_bboxes = [c["bbox"] for c in row["columns"]]
row_score = penalized_iou_score(pred_row_boxes, actual_row_bboxes)
col_score = penalized_iou_score(pred_col_bboxes, actual_col_bboxes)
page_results = {
"row_score": row_score,
"col_score": col_score,
"row_count": len(actual_row_bboxes),
"col_count": len(actual_col_bboxes),
}
mean_col_iou += col_score
mean_row_iou += row_score
page_metrics[idx] = page_results
if debug:
# Save debug images
draw_img = image.copy()
draw_bboxes_on_image(
pred_row_boxes,
draw_img,
[f"Row {i}" for i in range(len(pred_row_boxes))],
)
draw_bboxes_on_image(
pred_col_bboxes,
draw_img,
[f"Col {i}" for i in range(len(pred_col_bboxes))],
color="blue",
)
draw_img.save(os.path.join(result_path, f"{idx}_bbox.png"))
actual_draw_image = image.copy()
draw_bboxes_on_image(
actual_row_bboxes,
actual_draw_image,
[f"Row {i}" for i in range(len(actual_row_bboxes))],
)
draw_bboxes_on_image(
actual_col_bboxes,
actual_draw_image,
[f"Col {i}" for i in range(len(actual_col_bboxes))],
color="blue",
)
actual_draw_image.save(os.path.join(result_path, f"{idx}_actual.png"))
mean_col_iou /= len(table_rec_predictions)
mean_row_iou /= len(table_rec_predictions)
out_data = {
"surya": {
"time": surya_time,
"mean_row_iou": mean_row_iou,
"mean_col_iou": mean_col_iou,
"page_metrics": page_metrics,
}
}
if tatr:
tatr_model = load_tatr()
start = time.time()
tatr_predictions = batch_inference_tatr(tatr_model, images, 1)
tatr_time = time.time() - start
page_metrics = collections.OrderedDict()
mean_col_iou = 0
mean_row_iou = 0
for idx, pred in enumerate(tatr_predictions):
row = dataset[idx]
pred_row_boxes = [p["bbox"] for p in pred["rows"]]
pred_col_bboxes = [p["bbox"] for p in pred["cols"]]
actual_row_bboxes = [r["bbox"] for r in row["rows"]]
actual_col_bboxes = [c["bbox"] for c in row["columns"]]
row_score = penalized_iou_score(pred_row_boxes, actual_row_bboxes)
col_score = penalized_iou_score(pred_col_bboxes, actual_col_bboxes)
page_results = {
"row_score": row_score,
"col_score": col_score,
"row_count": len(actual_row_bboxes),
"col_count": len(actual_col_bboxes),
}
mean_col_iou += col_score
mean_row_iou += row_score
page_metrics[idx] = page_results
mean_col_iou /= len(tatr_predictions)
mean_row_iou /= len(tatr_predictions)
out_data["tatr"] = {
"time": tatr_time,
"mean_row_iou": mean_row_iou,
"mean_col_iou": mean_col_iou,
"page_metrics": page_metrics,
}
with open(os.path.join(result_path, "results.json"), "w+", encoding="utf-8") as f:
json.dump(out_data, f, indent=4)
table = [
["Model", "Row Intersection", "Col Intersection", "Time Per Image"],
[
"Surya",
f"{out_data['surya']['mean_row_iou']:.5f}",
f"{out_data['surya']['mean_col_iou']:.5f}",
f"{surya_time / len(images):.5f}",
],
]
if tatr:
table.append(
[
"Table transformer",
f"{out_data['tatr']['mean_row_iou']:.5f}",
f"{out_data['tatr']['mean_col_iou']:.5f}",
f"{tatr_time / len(images):.5f}",
]
)
print(tabulate(table, headers="firstrow", tablefmt="github"))
print(
"Intersection is the average of the intersection % between each actual row/column, and the predictions. With penalties for too many/few predictions."
)
print(
"Note that table transformers is unbatched, since the example code in the repo is unbatched."
)
print(f"Wrote results to {result_path}")
if __name__ == "__main__":
main()
================================================
FILE: benchmark/texify.py
================================================
import os.path
import re
import time
from pathlib import Path
from typing import List
import click
import datasets
from tabulate import tabulate
from bs4 import BeautifulSoup
from surya.common.surya.schema import TaskNames
from surya.settings import settings
from surya.foundation import FoundationPredictor
from surya.recognition import RecognitionPredictor, OCRResult
import json
from rapidfuzz.distance import Levenshtein
def normalize_text(text):
soup = BeautifulSoup(text, "html.parser")
# Unwrap math tags
for tag in soup.find_all():
if tag.name == "math":
tag.unwrap()
text = soup.get_text()
text = re.sub(r"\n", " ", text)
text = re.sub(r"\s+", " ", text)
return text.strip()
def score_text(predictions, references):
lev_dist = []
for p, r in zip(predictions, references):
p = normalize_text(p)
r = normalize_text(r)
lev_dist.append(Levenshtein.normalized_distance(p, r))
return sum(lev_dist) / len(lev_dist)
def inference_texify(
source_data, predictor: RecognitionPredictor, line_mode: bool = False
):
images = [sd["image"] for sd in source_data]
mode = TaskNames.ocr_with_boxes if line_mode else TaskNames.block_without_boxes
tasks = [mode] * len(images)
bboxes = [[[0, 0, image.width, image.height]] for image in images]
texify_predictions: List[OCRResult] = predictor(images, tasks, bboxes=bboxes)
out_data = [
{
"text": texify_predictions[i].text_lines[0].text,
"equation": source_data[i]["equation"],
}
for i in range(len(texify_predictions))
]
return out_data
@click.command(help="Benchmark the performance of texify.")
@click.option(
"--ds_name",
type=str,
help="Name or path of the dataset with source images/equations.",
default=settings.TEXIFY_BENCHMARK_DATASET,
)
@click.option(
"--results_dir",
type=str,
help="Directory in which to write the benchmark results.",
default=os.path.join(settings.RESULT_DIR, "benchmark"),
)
@click.option(
"--max_rows", type=int, help="Maximum number of images to benchmark.", default=None
)
@click.option(
"--line_mode", is_flag=True, help="Use line mode for texify.", default=False
)
def main(ds_name: str, results_dir: str, max_rows: int, line_mode: bool):
foundation_predictor = FoundationPredictor()
predictor = RecognitionPredictor(foundation_predictor)
ds = datasets.load_dataset(ds_name, split="train")
if max_rows:
ds = ds.filter(lambda x, idx: idx < max_rows, with_indices=True)
start = time.time()
predictions = inference_texify(ds, predictor, line_mode)
time_taken = time.time() - start
text = [p["text"] for p in predictions]
references = [p["equation"] for p in predictions]
scores = score_text(text, references)
write_data = {
"scores": scores,
"text": [{"prediction": p, "reference": r} for p, r in zip(text, references)],
}
score_table = [["texify", write_data["scores"], time_taken]]
score_headers = ["edit", "time taken (s)"]
score_dirs = ["⬇", "⬇"]
score_headers = [f"{h} {d}" for h, d in zip(score_headers, score_dirs)]
table = tabulate(score_table, headers=["Method", *score_headers])
print()
print(table)
result_path = Path(results_dir) / "texify_bench"
result_path.mkdir(parents=True, exist_ok=True)
with open(result_path / "results.json", "w", encoding="utf-8") as f:
json.dump(write_data, f, indent=4)
if __name__ == "__main__":
main()
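The texify score is the mean normalized Levenshtein distance between the normalized prediction and the reference (lower is better; `verify_texify` fails a run above 0.2). The following is a minimal pure-Python sketch of that metric. It assumes that rapidfuzz's `Levenshtein.normalized_distance` divides the unit-cost edit distance by the length of the longer string. Its `normalize_text` is a simplified stand-in that only collapses whitespace and skips the BeautifulSoup unwrapping of `<math>` tags.

```python
import re
from typing import List


def normalize_text(text: str) -> str:
    # Simplified: collapse every whitespace run (newlines included) to one space.
    # Unlike the benchmark version, this does not parse or unwrap HTML/<math> tags.
    return re.sub(r"\s+", " ", text).strip()


def levenshtein(a: str, b: str) -> int:
    # Two-row dynamic program; insertion, deletion and substitution each cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # delete ca
                    curr[j - 1] + 1,  # insert cb
                    prev[j - 1] + (ca != cb),  # substitute (free if equal)
                )
            )
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    # Assumption: normalize by the longer string (0.0 = identical).
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def score_text(predictions: List[str], references: List[str]) -> float:
    # Mean normalized distance over aligned pairs; lower is better.
    dists = [
        normalized_distance(normalize_text(p), normalize_text(r))
        for p, r in zip(predictions, references)
    ]
    return sum(dists) / len(dists)
```

For example, "kitten" versus "sitting" needs 3 edits across a longest length of 7, so it scores 3/7.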
================================================
FILE: benchmark/utils/__init__.py
================================================
================================================
FILE: benchmark/utils/bbox.py
================================================
import fitz as pymupdf
from surya.common.util import rescale_bbox
def get_pdf_lines(pdf_path, img_sizes):
doc = pymupdf.open(pdf_path)
page_lines = []
for idx, img_size in enumerate(img_sizes):
page = doc[idx]
blocks = page.get_text("dict", sort=True, flags=pymupdf.TEXTFLAGS_DICT & ~pymupdf.TEXT_PRESERVE_LIGATURES & ~pymupdf.TEXT_PRESERVE_IMAGES)["blocks"]
line_boxes = []
for block_idx, block in enumerate(blocks):
for l in block["lines"]:
line_boxes.append(list(l["bbox"]))
page_box = page.bound()
pwidth, pheight = page_box[2] - page_box[0], page_box[3] - page_box[1]
line_boxes = [rescale_bbox(bbox, (pwidth, pheight), img_size) for bbox in line_boxes]
page_lines.append(line_boxes)
return page_lines
def merge_boxes(box1, box2):
return (min(box1[0], box2[0]), min(box1[1], box2[1]), max(box1[2], box2[2]), max(box1[3], box2[3]))
def join_lines(bboxes, max_gap=5):
to_merge = {}
for i, box1 in enumerate(bboxes):
for z, box2 in enumerate(bboxes[i + 1:]):
j = i + z + 1
if box1 == box2:
continue
if box1[0] <= box2[0] and box1[2] >= box2[2]:
if abs(box1[1] - box2[3]) <= max_gap:
if i not in to_merge:
to_merge[i] = []
to_merge[i].append(j)
merged_boxes = set()
merged = []
for i, box in enumerate(bboxes):
if i in merged_boxes:
continue
if i in to_merge:
for j in to_merge[i]:
box = merge_boxes(box, bboxes[j])
merged_boxes.add(j)
merged.append(box)
return merged
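`get_pdf_lines` uses `rescale_bbox` to convert PyMuPDF line boxes from PDF page units into image pixels, but that helper's implementation is not shown here. Below is a hypothetical stand-in that assumes independent x/y scaling from the `(width, height)` of the page to the `(width, height)` of the image. It may differ from the real `surya.common.util.rescale_bbox`.

```python
from typing import List, Sequence, Tuple


def rescale_bbox_sketch(
    bbox: Sequence[float],
    from_size: Tuple[float, float],
    to_size: Tuple[float, float],
) -> List[float]:
    # Hypothetical stand-in for rescale_bbox: scale x and y independently
    # from a (width, height) source space into a (width, height) target space.
    from_w, from_h = from_size
    to_w, to_h = to_size
    sx, sy = to_w / from_w, to_h / from_h
    x0, y0, x1, y1 = bbox
    return [x0 * sx, y0 * sy, x1 * sx, y1 * sy]
```

As an example, a US Letter page measures 612 x 792 pt. Rendered at 144 DPI it becomes 1224 x 1584 px, so every coordinate doubles.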
================================================
FILE: benchmark/utils/metrics.py
================================================
from functools import partial
from itertools import repeat
import numpy as np
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
def box_area(box):
return (box[2] - box[0]) * (box[3] - box[1])
def calculate_iou(box1, box2, box1_only=False):
intersection = intersection_area(box1, box2)
union = box_area(box1)
if not box1_only:
union += box_area(box2) - intersection
if union == 0:
return 0
return intersection / union
def match_boxes(preds, references):
num_actual = len(references)
num_predicted = len(preds)
iou_matrix = np.zeros((num_actual, num_predicted))
for i, actual in enumerate(references):
for j, pred in enumerate(preds):
iou_matrix[i, j] = calculate_iou(actual, pred, box1_only=True)
sorted_indices = np.argsort(iou_matrix, axis=None)[::-1]
sorted_ious = iou_matrix.flatten()[sorted_indices]
actual_indices, predicted_indices = np.unravel_index(sorted_indices, iou_matrix.shape)
assigned_actual = set()
assigned_pred = set()
matches = []
for idx, iou in zip(zip(actual_indices, predicted_indices), sorted_ious):
i, j = idx
if i not in assigned_actual and j not in assigned_pred:
iou_val = iou_matrix[i, j]
if iou_val > .95: # Account for rounding on box edges
iou_val = 1.0
matches.append((i, j, iou_val))
assigned_actual.add(i)
assigned_pred.add(j)
unassigned_actual = set(range(num_actual)) - assigned_actual
unassigned_pred = set(range(num_predicted)) - assigned_pred
matches.extend([(i, None, -1.0) for i in unassigned_actual])
matches.extend([(None, j, 0.0) for j in unassigned_pred])
return matches
def penalized_iou_score(preds, references):
matches = match_boxes(preds, references)
iou = sum([match[2] for match in matches]) / len(matches)
return iou
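`penalized_iou_score` is the "intersection" figure reported by the table benchmark. `match_boxes` pairs reference and predicted boxes greedily, highest overlap first, and measures overlap against the reference box only (`box1_only=True`). It then snaps overlaps above 0.95 to 1.0, scores each unmatched reference as -1 and each unmatched prediction as 0, and averages over all entries. The pure-Python sketch below mirrors that logic without numpy. One assumption to note: when overlaps tie, the sort may break the tie differently than `np.argsort`.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (x0, y0, x1, y1)


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def penalized_iou(preds: List[Box], refs: List[Box]) -> float:
    # Overlap of every (reference, prediction) pair, relative to the reference only.
    candidates: List[Tuple[float, int, int]] = []
    for i, ref in enumerate(refs):
        ref_area = area(ref)
        for j, pred in enumerate(preds):
            overlap = intersection(ref, pred) / ref_area if ref_area else 0.0
            candidates.append((overlap, i, j))
    candidates.sort(reverse=True)  # greedy: best overlaps are matched first

    used_refs, used_preds, scores = set(), set(), []
    for overlap, i, j in candidates:
        if i in used_refs or j in used_preds:
            continue
        scores.append(1.0 if overlap > 0.95 else overlap)  # forgive edge rounding
        used_refs.add(i)
        used_preds.add(j)

    scores += [-1.0] * (len(refs) - len(used_refs))  # each missed reference costs -1
    scores += [0.0] * (len(preds) - len(used_preds))  # each extra prediction adds a 0
    return sum(scores) / len(scores)
```

Predicting one extra row therefore halves a perfect single-row score (1.0 becomes 0.5), while a missed row drags the average toward -1.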
def intersection_pixels(box1, box2):
x_left = max(box1[0], box2[0])
y_top = max(box1[1], box2[1])
x_right = min(box1[2], box2[2])
y_bottom = min(box1[3], box2[3])
if x_right < x_left or y_bottom < y_top:
return set()
x_left, x_right = int(x_left), int(x_right)
y_top, y_bottom = int(y_top), int(y_bottom)
coords = np.meshgrid(np.arange(x_left, x_right), np.arange(y_top, y_bottom))
pixels = set(zip(coords[0].flat, coords[1].flat))
return pixels
def calculate_coverage(box, other_boxes, penalize_double=False):
box_area = (box[2] - box[0]) * (box[3] - box[1])
if box_area == 0:
return 0
# find total coverage of the box
covered_pixels = set()
double_coverage = list()
for other_box in other_boxes:
ia = intersection_pixels(box, other_box)
double_coverage.append(list(covered_pixels.intersection(ia)))
covered_pixels = covered_pixels.union(ia)
# Penalize double coverage - count the pixels covered by more than one of the other boxes
double_coverage_penalty = sum(len(pixels) for pixels in double_coverage)
if not penalize_double:
double_coverage_penalty = 0
covered_pixels_count = max(0, len(covered_pixels) - double_coverage_penalty)
return covered_pixels_count / box_area
def intersection_area(box1, box2):
x_left = max(box1[0], box2[0])
y_top = max(box1[1], box2[1])
x_right = min(box1[2], box2[2])
y_bottom = min(box1[3], box2[3])
if x_right < x_left or y_bottom < y_top:
return 0.0
return (x_right - x_left) * (y_bottom - y_top)
def calculate_coverage_fast(box, other_boxes, penalize_double=False):
box = np.array(box)
other_boxes = np.array(other_boxes)
# Calculate box area
box_area = (box[2] - box[0]) * (box[3] - box[1])
if box_area == 0:
return 0
x_left = np.maximum(box[0], other_boxes[:, 0])
y_top = np.maximum(box[1], other_boxes[:, 1])
x_right = np.minimum(box[2], other_boxes[:, 2])
y_bottom = np.minimum(box[3], other_boxes[:, 3])
widths = np.maximum(0, x_right - x_left)
heights = np.maximum(0, y_bottom - y_top)
intersect_areas = widths * heights
total_intersect = np.sum(intersect_areas)
return min(1.0, total_intersect / box_area)
def precision_recall(preds, references, threshold=.5, workers=8, penalize_double=True):
if len(references) == 0:
return {
"precision": 1,
"recall": 1,
}
if len(preds) == 0:
return {
"precision": 0,
"recall": 0,
}
# If we're not penalizing double coverage, we can use a faster calculation
coverage_func = calculate_coverage_fast
if penalize_double:
coverage_func = calculate_coverage
with ThreadPoolExecutor(max_workers=workers) as executor:
precision_func = partial(coverage_func, penalize_double=penalize_double)
precision_iou = executor.map(precision_func, preds, repeat(references))
reference_iou = executor.map(coverage_func, references, repeat(preds))
precision_classes = [1 if i > threshold else 0 for i in precision_iou]
precision = sum(precision_classes) / len(precision_classes)
recall_classes = [1 if i > threshold else 0 for i in reference_iou]
recall = sum(recall_classes) / len(recall_classes)
return {
"precision": precision,
"recall": recall,
}
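`precision_recall` reduces detection quality to a coverage test. A prediction counts as correct when the references cover more than `threshold` of its area, and a reference counts as found when the predictions cover more than `threshold` of it. Below is a sequential pure-Python sketch of the `penalize_double=False` path, which sums overlaps and caps them at 1.0 like `calculate_coverage_fast`. It drops the thread pool, which does not change the result.

```python
from typing import Dict, List, Sequence

Box = Sequence[float]  # (x0, y0, x1, y1)


def coverage(box: Box, others: List[Box]) -> float:
    # Fraction of `box` covered by `others`; overlaps are summed and capped at 1.0,
    # so double coverage is not penalized.
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    if box_area == 0:
        return 0.0
    total = 0.0
    for other in others:
        w = min(box[2], other[2]) - max(box[0], other[0])
        h = min(box[3], other[3]) - max(box[1], other[1])
        total += max(0.0, w) * max(0.0, h)
    return min(1.0, total / box_area)


def precision_recall(
    preds: List[Box], refs: List[Box], threshold: float = 0.5
) -> Dict[str, float]:
    if not refs:
        return {"precision": 1.0, "recall": 1.0}
    if not preds:
        return {"precision": 0.0, "recall": 0.0}
    # Strictly greater than the threshold, as in the benchmark code.
    precision = sum(coverage(p, refs) > threshold for p in preds) / len(preds)
    recall = sum(coverage(r, preds) > threshold for r in refs) / len(refs)
    return {"precision": precision, "recall": recall}
```

A box that is covered exactly halfway does not pass the default threshold of 0.5, because the comparison is strict.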
def mean_coverage(preds, references):
coverages = []
for box1 in references:
coverage = calculate_coverage(box1, preds)
coverages.append(coverage)
for box2 in preds:
coverage = calculate_coverage(box2, references)
coverages.append(coverage)
# Calculate the average coverage over all comparisons
if len(coverages) == 0:
return {"coverage": 0}
coverage = sum(coverages) / len(coverages)
return {"coverage": coverage}
def rank_accuracy(preds, references):
# Preds and references need to be aligned so each position refers to the same bbox
pairs = []
for i, pred in enumerate(preds):
for j, pred2 in enumerate(preds):
if i == j:
continue
pairs.append((i, j, pred > pred2))
# Find how many of the prediction rankings are correct
correct = 0
for i, ref in enumerate(references):
for j, ref2 in enumerate(references):
if (i, j, ref > ref2) in pairs:
correct += 1
return correct / len(pairs)
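`rank_accuracy` measures how many ordered pairs of boxes `(i, j)` receive the same relative ordering from the predictions as from the references. The list-membership check above is O(n^4) and raises `ZeroDivisionError` when there are fewer than two predictions. The sketch below counts the agreeing pairs directly and gives the same result whenever at least two aligned items exist. Returning 1.0 for fewer than two items is an assumption of this sketch.

```python
from itertools import permutations
from typing import Sequence


def rank_accuracy(preds: Sequence[float], refs: Sequence[float]) -> float:
    # preds[k] and refs[k] must describe the same box (e.g. its reading-order position).
    if len(preds) != len(refs):
        raise ValueError("preds and refs must be aligned")
    pairs = list(permutations(range(len(preds)), 2))  # every ordered (i, j) with i != j
    if not pairs:
        return 1.0  # assumption: fewer than two boxes is trivially correct
    agree = sum((preds[i] > preds[j]) == (refs[i] > refs[j]) for i, j in pairs)
    return agree / len(pairs)
```

If you swap two neighbouring boxes in a list of three, 4 of the 6 ordered pairs still agree.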
================================================
FILE: benchmark/utils/scoring.py
================================================
import math
from typing import List
from rapidfuzz import fuzz
def overlap_score(pred_lines: List[str], reference_lines: List[str]):
line_scores = []
line_weights = []
line_match = {}
for i, pred_line in enumerate(pred_lines):
max_score = 0
line_weight = 1
match = None
for j, ref_line in enumerate(reference_lines):
score = fuzz.ratio(pred_line, ref_line, score_cutoff=20) / 100
if score > max_score:
max_score = score
line_weight = math.sqrt(len(ref_line))
match = j
line_scores.append(max_score)
line_weights.append(line_weight)
line_match[i] = match
line_scores = [line_scores[i] * line_weights[i] for i in range(len(line_scores))]
return line_scores, line_weights, line_match
def overlap_score_exact(pred_lines: List[str], reference_lines: List[str]):
line_scores = []
line_weights = []
assert len(pred_lines) == len(reference_lines)
for i, (pred_line, ref_line) in enumerate(zip(pred_lines, reference_lines)):
score = fuzz.ratio(pred_line, ref_line, score_cutoff=20) / 100
weight = math.sqrt(len(ref_line))
line_scores.append(score * weight)
line_weights.append(weight)
return line_scores, line_weights
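The two overlap functions weight each line's fuzzy-match score by `sqrt(len(ref_line))`, so longer lines count more, but only sublinearly. They also zero out matches below 20%. The sketch below applies the same weighting to aligned lines. It swaps in the standard library's `difflib.SequenceMatcher.ratio()` for rapidfuzz's `fuzz.ratio`: the two are similar but not identical. Collapsing the results into a single weighted mean is a hypothetical aggregation, not necessarily what the recognition benchmark does.

```python
import math
from difflib import SequenceMatcher
from typing import List


def weighted_line_score(pred_lines: List[str], ref_lines: List[str]) -> float:
    assert len(pred_lines) == len(ref_lines)
    scores, weights = [], []
    for pred, ref in zip(pred_lines, ref_lines):
        # Swapped-in similarity: difflib's ratio, not rapidfuzz's fuzz.ratio.
        sim = SequenceMatcher(None, pred, ref).ratio()
        if sim < 0.2:  # mirror score_cutoff=20: weak matches score 0
            sim = 0.0
        weight = math.sqrt(len(ref))  # longer lines count more, sublinearly
        scores.append(sim * weight)
        weights.append(weight)
    total = sum(weights)
    # Hypothetical aggregation: weighted mean of the per-line scores.
    return sum(scores) / total if total else 0.0
```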
================================================
FILE: benchmark/utils/tatr.py
================================================
import torch
from transformers import AutoModelForObjectDetection
from surya.settings import settings
import numpy as np
class MaxResize(object):
def __init__(self, max_size=800):
self.max_size = max_size
def __call__(self, image):
width, height = image.size
current_max_size = max(width, height)
scale = self.max_size / current_max_size
resized_image = image.resize((int(round(scale * width)), int(round(scale * height))))
return resized_image
def to_tensor(image):
# Convert PIL Image to NumPy array
np_image = np.array(image).astype(np.float32)
# Rearrange dimensions to [C, H, W] format
np_image = np_image.transpose((2, 0, 1))
# Normalize to [0.0, 1.0]
np_image /= 255.0
return torch.from_numpy(np_image)
def normalize(tensor, mean, std):
for t, m, s in zip(tensor, mean, std):
t.sub_(m).div_(s)
return tensor
def structure_transform(image):
image = MaxResize(1000)(image)
tensor = to_tensor(image)
normalized_tensor = normalize(tensor, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
return normalized_tensor
def box_cxcywh_to_xyxy(x):
x_c, y_c, w, h = x.unbind(-1)
b = [(x_c - 0.5 * w), (y_c - 0.5 * h), (x_c + 0.5 * w), (y_c + 0.5 * h)]
return torch.stack(b, dim=1)
def rescale_bboxes(out_bbox, size):
width, height = size
boxes = box_cxcywh_to_xyxy(out_bbox)
boxes = boxes * torch.tensor([width, height, width, height], dtype=torch.float32)
return boxes
def outputs_to_objects(outputs, img_sizes, id2label):
m = outputs.logits.softmax(-1).max(-1)
batch_labels = list(m.indices.detach().cpu().numpy())
batch_scores = list(m.values.detach().cpu().numpy())
batch_bboxes = outputs['pred_boxes'].detach().cpu()
batch_objects = []
for i in range(len(img_sizes)):
pred_bboxes = [elem.tolist() for elem in rescale_bboxes(batch_bboxes[i], img_sizes[i])]
pred_scores = batch_scores[i]
pred_labels = batch_labels[i]
objects = []
for label, score, bbox in zip(pred_labels, pred_scores, pred_bboxes):
class_label = id2label[int(label)]
if not class_label == 'no object':
objects.append({
'label': class_label,
'score': float(score),
'bbox': [float(elem) for elem in bbox]}
)
rows = []
cols = []
for cell in objects:
if cell["label"] == "table column":
cols.append(cell)
if cell["label"] == "table row":
rows.append(cell)
batch_objects.append({
"rows": rows,
"cols": cols
})
return batch_objects
def load_tatr():
return AutoModelForObjectDetection.from_pretrained("microsoft/table-transformer-structure-recognition-v1.1-all").to(settings.TORCH_DEVICE_MODEL)
def batch_inference_tatr(model, images, batch_size):
device = model.device
# Copy the label map so the model config isn't mutated, then add the extra "no object" class once
id2label = dict(model.config.id2label)
id2label[len(id2label)] = "no object"
rows_cols = []
for i in range(0, len(images), batch_size):
batch_images = images[i:i + batch_size]
pixel_values = torch.stack([structure_transform(img) for img in batch_images], dim=0).to(device)
# forward pass
with torch.no_grad():
outputs = model(pixel_values)
rows_cols.extend(outputs_to_objects(outputs, [img.size for img in batch_images], id2label))
return rows_cols
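`outputs_to_objects` follows the DETR convention. Each query predicts a box as a normalized `(center_x, center_y, width, height)`, which is converted to corner coordinates and scaled by the image size. Each query's class is the argmax of its softmax, and queries assigned the trailing "no object" class are discarded. The pure-Python sketch below shows those two steps on plain lists. The names `rescale_normalized` and `pick_labels` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def cxcywh_to_xyxy(box: Sequence[float]) -> List[float]:
    # (center_x, center_y, width, height) -> (x0, y0, x1, y1)
    cx, cy, w, h = box
    return [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h]


def rescale_normalized(box: Sequence[float], size: Tuple[int, int]) -> List[float]:
    # Boxes are normalized to [0, 1]; multiply by the image (width, height).
    width, height = size
    x0, y0, x1, y1 = cxcywh_to_xyxy(box)
    return [x0 * width, y0 * height, x1 * width, y1 * height]


def pick_labels(
    class_probs: List[List[float]], id2label: Dict[int, str]
) -> List[Tuple[int, str, float]]:
    # For each query, take the most likely class and drop the "no object" class.
    kept = []
    for query, probs in enumerate(class_probs):
        best = max(range(len(probs)), key=probs.__getitem__)
        label = id2label[best]
        if label != "no object":
            kept.append((query, label, probs[best]))
    return kept
```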
================================================
FILE: benchmark/utils/tesseract.py
================================================
from typing import List, Optional
import numpy as np
from tqdm import tqdm
from surya.input.processing import slice_bboxes_from_image
from surya.settings import settings
import os
from concurrent.futures import ProcessPoolExecutor
from surya.recognition.languages import CODE_TO_LANGUAGE
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
def surya_lang_to_tesseract(code: str) -> Optional[str]:
lang_str = CODE_TO_LANGUAGE[code]
try:
tess_lang = TESS_LANGUAGE_TO_CODE[lang_str]
except KeyError:
return None
return tess_lang
def tesseract_ocr(img, bboxes, lang: str):
import pytesseract
line_imgs = slice_bboxes_from_image(img, bboxes)
config = f'--tessdata-dir "{settings.TESSDATA_PREFIX}"'
lines = []
for line_img in line_imgs:
line = pytesseract.image_to_string(line_img, lang=lang, config=config)
lines.append(line)
return lines
def tesseract_ocr_parallel(imgs, bboxes, langs: List[str], cpus=None):
tess_parallel_cores = min(len(imgs), RecognitionPredictor.get_batch_size())
if not cpus:
cpus = os.cpu_count()
tess_parallel_cores = min(tess_parallel_cores, cpus)
# Tesseract uses up to 4 threads per instance
# Divide by 2 because tesseract doesn't seem to saturate all 4 cores with these small images
tess_parallel = max(tess_parallel_cores // 2, 1)
with ProcessPoolExecutor(max_workers=tess_parallel) as executor:
tess_text = tqdm(executor.map(tesseract_ocr, imgs, bboxes, langs), total=len(imgs), desc="Running tesseract OCR")
tess_text = list(tess_text)
return tess_text
def tesseract_bboxes(img):
import pytesseract
from pytesseract import Output
arr_img = np.asarray(img, dtype=np.uint8)
ocr = pytesseract.image_to_data(arr_img, output_type=Output.DICT)
bboxes = []
n_boxes = len(ocr['level'])
for i in range(n_boxes):
# It is possible to merge by line here with line number, but it gives bad results.
_, x, y, w, h = ocr['text'][i], ocr['left'][i], ocr['top'][i], ocr['width'][i], ocr['height'][i]
bbox = (x, y, x + w, y + h)
bboxes.append(bbox)
return bboxes
def tesseract_parallel(imgs):
tess_parallel_cores = min(len(imgs), DetectionPredictor.get_batch_size())
cpus = os.cpu_count()
tess_parallel_cores = min(tess_parallel_cores, cpus)
# Tesseract uses 4 threads per instance
tess_parallel = max(tess_parallel_cores // 4, 1)
with ProcessPoolExecutor(max_workers=tess_parallel) as executor:
tess_bboxes = tqdm(executor.map(tesseract_bboxes, imgs), total=len(imgs), desc="Running tesseract bbox detection")
tess_bboxes = list(tess_bboxes)
return tess_bboxes
TESS_CODE_TO_LANGUAGE = {
"afr": "Afrikaans",
"amh": "Amharic",
"ara": "Arabic",
"asm": "Assamese",
"aze": "Azerbaijani",
"bel": "Belarusian",
"ben": "Bengali",
"bod": "Tibetan",
"bos": "Bosnian",
"bre": "Breton",
"bul": "Bulgarian",
"cat": "Catalan",
"ceb": "Cebuano",
"ces": "Czech",
"chi_sim": "Chinese",
"chr": "Cherokee",
"cym": "Welsh",
"dan": "Danish",
"deu": "German",
"dzo": "Dzongkha",
"ell": "Greek",
"eng": "English",
"epo": "Esperanto",
"est": "Estonian",
"eus": "Basque",
"fas": "Persian",
"fin": "Finnish",
"fra": "French",
"fry": "Western Frisian",
"guj": "Gujarati",
"gla": "Scottish Gaelic",
"gle": "Irish",
"glg": "Galician",
"heb": "Hebrew",
"hin": "Hindi",
"hrv": "Croatian",
"hun": "Hungarian",
"hye": "Armenian",
"iku": "Inuktitut",
"ind": "Indonesian",
"isl": "Icelandic",
"ita": "Italian",
"jav": "Javanese",
"jpn": "Japanese",
"kan": "Kannada",
"kat": "Georgian",
"kaz": "Kazakh",
"khm": "Khmer",
"kir": "Kyrgyz",
"kor": "Korean",
"lao": "Lao",
"lat": "Latin",
"lav": "Latvian",
"lit": "Lithuanian",
"mal": "Malayalam",
"mar": "Marathi",
"mkd": "Macedonian",
"mlt": "Maltese",
"mon": "Mongolian",
"msa": "Malay",
"mya": "Burmese",
"nep": "Nepali",
"nld": "Dutch",
"nor": "Norwegian",
"ori": "Oriya",
"pan": "Punjabi",
"pol": "Polish",
"por": "Portuguese",
"pus": "Pashto",
"ron": "Romanian",
"rus": "Russian",
"san": "Sanskrit",
"sin": "Sinhala",
"slk": "Slovak",
"slv": "Slovenian",
"snd": "Sindhi",
"spa": "Spanish",
"sqi": "Albanian",
"srp": "Serbian",
"swa": "Swahili",
"swe": "Swedish",
"syr": "Syriac",
"tam": "Tamil",
"tel": "Telugu",
"tgk": "Tajik",
"tha": "Thai",
"tir": "Tigrinya",
"tur": "Turkish",
"uig": "Uyghur",
"ukr": "Ukrainian",
"urd": "Urdu",
"uzb": "Uzbek",
"vie": "Vietnamese",
"yid": "Yiddish"
}
TESS_LANGUAGE_TO_CODE = {v:k for k,v in TESS_CODE_TO_LANGUAGE.items()}
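Both Tesseract helpers size their process pools in the same way. They take the smallest of the image count, the predictor batch size, and the CPU count, then divide by the number of cores each Tesseract instance is assumed to keep busy: 2 for line OCR and 4 for bbox detection. The helper below factors out that arithmetic. Its name and its `divisor` parameter are hypothetical.

```python
import os
from typing import Optional


def tesseract_workers(
    num_images: int, batch_size: int, divisor: int, cpus: Optional[int] = None
) -> int:
    # Cap parallelism by the workload, the batch size, and the available CPUs,
    # then divide by the assumed number of cores one Tesseract instance uses.
    cores = min(num_images, batch_size)
    cores = min(cores, cpus or os.cpu_count() or 1)
    return max(cores // divisor, 1)
```

For instance, 100 images with a batch size of 32 on a 16-CPU machine give `16 // 4 = 4` detection workers. The pool never shrinks below 1.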
================================================
FILE: benchmark/utils/textract.py
================================================
import os
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm
import traceback
from surya.input.processing import slice_bboxes_from_image
from surya.recognition import RecognitionPredictor
def textract_ocr(extractor, img):
try:
document = extractor.detect_document_text(file_source=img)
return [line.text for line in document.lines]
except Exception:
traceback.print_exc()
return [None]
def textract_ocr_parallel(imgs, cpus=None):
from textractor import Textractor # Optional dependency
extractor = Textractor(profile_name='default')
parallel_cores = min(len(imgs), RecognitionPredictor().get_batch_size())
if not cpus:
cpus = os.cpu_count()
parallel_cores = min(parallel_cores, cpus)
with ThreadPoolExecutor(max_workers=parallel_cores) as executor:
textract_text = tqdm(executor.map(textract_ocr, [extractor]*len(imgs), imgs), total=len(imgs), desc="Running textract OCR")
textract_text = list(textract_text)
return textract_text
================================================
FILE: benchmark/utils/verify_benchmark_scores.py
================================================
import json
import click
def verify_layout(data):
scores = data["metrics"]
for layout_type, metrics in scores.items():
if layout_type == "List": # Skip lists since none appear early on
continue
if metrics["precision"] <= 0.6 or metrics["recall"] <= 0.6:
raise ValueError("Scores do not meet the required threshold")
def verify_det(data):
scores = data["metrics"]["surya"]
if scores["precision"] <= 0.9 or scores["recall"] <= 0.9:
raise ValueError("Scores do not meet the required threshold")
def verify_rec(data):
scores = data["surya"]
if scores["avg_score"] <= 0.9:
raise ValueError("Scores do not meet the required threshold")
def verify_order(data):
score = data["mean_accuracy"]
if score < 0.75:
raise ValueError("Scores do not meet the required threshold")
def verify_table_rec(data):
row_score = data["surya"]["mean_row_iou"]
col_score = data["surya"]["mean_col_iou"]
if row_score < 0.75 or col_score < 0.75:
raise ValueError("Scores do not meet the required threshold")
def verify_texify(data):
edit_dist = data["scores"]
if edit_dist > 0.2:
raise ValueError("Scores do not meet the required threshold")
@click.command(help="Verify benchmark scores")
@click.argument("file_path", type=str)
@click.option(
"--bench_type", type=str, help="Type of benchmark to verify", default="detection"
)
def main(file_path, bench_type):
with open(file_path, "r") as file:
data = json.load(file)
if bench_type == "detection":
verify_det(data)
elif bench_type == "recognition":
verify_rec(data)
elif bench_type == "layout":
verify_layout(data)
elif bench_type == "ordering":
verify_order(data)
elif bench_type == "table_recognition":
verify_table_rec(data)
elif bench_type == "texify":
verify_texify(data)
else:
raise ValueError("Invalid benchmark type")
if __name__ == "__main__":
main()
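The `if/elif` chain in `main` maps each benchmark type to a threshold check. The sketch below expresses the same thresholds as a lookup table of predicates, which makes adding a benchmark a one-line change. It reimplements the checks above rather than reusing them, and the names `CHECKS` and `verify` are hypothetical.

```python
from typing import Callable, Dict

# Each predicate returns True when the results dict meets its threshold.
CHECKS: Dict[str, Callable[[dict], bool]] = {
    "detection": lambda d: (
        d["metrics"]["surya"]["precision"] > 0.9
        and d["metrics"]["surya"]["recall"] > 0.9
    ),
    "recognition": lambda d: d["surya"]["avg_score"] > 0.9,
    "layout": lambda d: all(
        m["precision"] > 0.6 and m["recall"] > 0.6
        for name, m in d["metrics"].items()
        if name != "List"  # lists are skipped, as in verify_layout
    ),
    "ordering": lambda d: d["mean_accuracy"] >= 0.75,
    "table_recognition": lambda d: (
        d["surya"]["mean_row_iou"] >= 0.75 and d["surya"]["mean_col_iou"] >= 0.75
    ),
    "texify": lambda d: d["scores"] <= 0.2,  # edit distance: lower is better
}


def verify(bench_type: str, data: dict) -> None:
    check = CHECKS.get(bench_type)
    if check is None:
        raise ValueError("Invalid benchmark type")
    if not check(data):
        raise ValueError("Scores do not meet the required threshold")
```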
================================================
FILE: detect_layout.py
================================================
from surya.scripts.detect_layout import detect_layout_cli
if __name__ == "__main__":
detect_layout_cli()
================================================
FILE: detect_text.py
================================================
from surya.scripts.detect_text import detect_text_cli
if __name__ == "__main__":
detect_text_cli()
================================================
FILE: ocr_app.py
================================================
from surya.scripts.run_streamlit_app import streamlit_app_cli
if __name__ == "__main__":
streamlit_app_cli()
================================================
FILE: ocr_latex.py
================================================
from surya.scripts.ocr_latex import ocr_latex_cli
if __name__ == "__main__":
ocr_latex_cli()
================================================
FILE: ocr_text.py
================================================
from surya.scripts.ocr_text import ocr_text_cli
if __name__ == "__main__":
ocr_text_cli()
================================================
FILE: pyproject.toml
================================================
[tool.poetry]
name = "surya-ocr"
version = "0.17.1"
description = "OCR, layout, reading order, and table recognition in 90+ languages"
authors = ["Vik Paruchuri <vik.paruchuri@gmail.com>"]
readme = "README.md"
license = "GPL-3.0-or-later"
repository = "https://github.com/VikParuchuri/surya"
keywords = ["ocr", "pdf", "text detection", "text recognition", "tables"]
packages = [
{include = "surya"}
]
[tool.poetry.dependencies]
python = "^3.10"
transformers = ">=4.56.1"
torch = "^2.7.0"
pydantic = "^2.5.3"
pydantic-settings = "^2.1.0"
python-dotenv = "^1.0.0"
pillow = "^10.2.0"
pypdfium2 = "=4.30.0"
filetype = "^1.2.0"
click = "^8.1.8"
platformdirs = "^4.3.6"
opencv-python-headless = "==4.11.0.86"
einops = "^0.8.1"
pre-commit = "^4.2.0"
[tool.poetry.group.dev.dependencies]
jupyter = "^1.0.0"
pytesseract = "^0.3.10"
pymupdf = "^1.23.8"
datasets = "^2.16.1"
rapidfuzz = "^3.6.1"
streamlit = "^1.31.0"
pytest = "^8.3.4"
pdftext = "^0.5.1"
tabulate = "^0.9.0"
[tool.poetry.scripts]
surya_detect = "surya.scripts.detect_text:detect_text_cli"
surya_ocr = "surya.scripts.ocr_text:ocr_text_cli"
surya_layout = "surya.scripts.detect_layout:detect_layout_cli"
surya_gui = "surya.scripts.run_streamlit_app:streamlit_app_cli"
surya_table = "surya.scripts.table_recognition:table_recognition_cli"
surya_latex_ocr = "surya.scripts.ocr_latex:ocr_latex_cli"
texify_gui = "surya.scripts.run_texify_app:texify_app_cli"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
[[tool.poetry.source]]
name = "libtpu-releases"
url = "https://storage.googleapis.com/libtpu-releases/index.html"
priority = "supplemental"
[[tool.poetry.source]]
name = "libtpu-wheels"
url = "https://storage.googleapis.com/libtpu-wheels/index.html"
priority = "supplemental"
[tool.poetry.group.xla]
optional = true
[tool.poetry.group.xla.dependencies]
torch-xla = {version = "^2.4.1", extras = ["tpu"]}
================================================
FILE: pytest.ini
================================================
[pytest]
testpaths=tests
pythonpath=.
filterwarnings =
ignore::UserWarning
ignore::PendingDeprecationWarning
ignore::DeprecationWarning
================================================
FILE: signatures/version1/cla.json
================================================
{
"signedContributors": [
{
"name": "rishiraj",
"id": 44090649,
"comment_id": 2170578748,
"created_at": "2024-06-15T19:31:20Z",
"repoId": 741297064,
"pullRequestNo": 135
},
{
"name": "mmacvicar",
"id": 59354,
"comment_id": 2236493182,
"created_at": "2024-07-18T13:17:43Z",
"repoId": 741297064,
"pullRequestNo": 152
},
{
"name": "jimexist",
"id": 622789,
"comment_id": 2255151376,
"created_at": "2024-07-29T07:23:55Z",
"repoId": 741297064,
"pullRequestNo": 160
},
{
"name": "michaeldriscoll-avant",
"id": 85255083,
"comment_id": 2259143427,
"created_at": "2024-07-30T20:21:33Z",
"repoId": 741297064,
"pullRequestNo": 161
},
{
"name": "EdoardoPona",
"id": 29152472,
"comment_id": 2271115922,
"created_at": "2024-08-06T11:58:00Z",
"repoId": 741297064,
"pullRequestNo": 167
},
{
"name": "hidenori-endo",
"id": 15546605,
"comment_id": 2307217499,
"created_at": "2024-08-23T14:31:17Z",
"repoId": 741297064,
"pullRequestNo": 182
},
{
"name": "dobosevych",
"id": 12053536,
"comment_id": 2430376828,
"created_at": "2024-10-22T21:48:34Z",
"repoId": 741297064,
"pullRequestNo": 220
},
{
"name": "iammosespaulr",
"id": 28682735,
"comment_id": 2447941238,
"created_at": "2024-10-30T17:55:23Z",
"repoId": 741297064,
"pullRequestNo": 235
},
{
"name": "ArthurMor4is",
"id": 42987302,
"comment_id": 2515315717,
"created_at": "2024-12-03T18:37:45Z",
"repoId": 741297064,
"pullRequestNo": 255
},
{
"name": "tarun-menta",
"id": 66506307,
"comment_id": 2543457960,
"created_at": "2024-12-15T05:43:33Z",
"repoId": 741297064,
"pullRequestNo": 261
},
{
"name": "jonaskahn",
"id": 4338500,
"comment_id": 2556622097,
"created_at": "2024-12-20T09:36:20Z",
"repoId": 741297064,
"pullRequestNo": 269
},
{
"name": "kumsumit",
"id": 95072784,
"comment_id": 2574534622,
"created_at": "2025-01-07T07:05:59Z",
"repoId": 741297064,
"pullRequestNo": 276
},
{
"name": "kevinhu",
"id": 6051736,
"comment_id": 2614135351,
"created_at": "2025-01-25T23:34:12Z",
"repoId": 741297064,
"pullRequestNo": 291
},
{
"name": "zanussbaum",
"id": 33707069,
"comment_id": 3008673416,
"created_at": "2025-06-26T14:20:46Z",
"repoId": 741297064,
"pullRequestNo": 403
},
{
"name": "mebriki",
"id": 35892987,
"comment_id": 3154706976,
"created_at": "2025-08-05T10:54:27Z",
"repoId": 741297064,
"pullRequestNo": 418
},
{
"name": "starikovplusplus",
"id": 56602036,
"comment_id": 3168958011,
"created_at": "2025-08-08T18:29:50Z",
"repoId": 741297064,
"pullRequestNo": 423
},
{
"name": "sandy0kwon",
"id": 78377296,
"comment_id": 3207932260,
"created_at": "2025-08-20T20:07:15Z",
"repoId": 741297064,
"pullRequestNo": 434
},
{
"name": "n0kovo",
"id": 16690056,
"comment_id": 3208251881,
"created_at": "2025-08-20T22:22:06Z",
"repoId": 741297064,
"pullRequestNo": 435
},
{
"name": "davidxifeng",
"id": 158052,
"comment_id": 3249594859,
"created_at": "2025-09-03T14:52:16Z",
"repoId": 741297064,
"pullRequestNo": 445
},
{
"name": "u-ashish",
"id": 14264791,
"comment_id": 3258734182,
"created_at": "2025-09-05T15:16:48Z",
"repoId": 741297064,
"pullRequestNo": 447
},
{
"name": "Mohking1",
"id": 63689545,
"comment_id": 3314908963,
"created_at": "2025-09-20T11:21:42Z",
"repoId": 741297064,
"pullRequestNo": 462
},
{
"name": "wkpark",
"id": 232347,
"comment_id": 3330009557,
"created_at": "2025-09-24T17:42:55Z",
"repoId": 741297064,
"pullRequestNo": 464
},
{
"name": "coval3nte",
"id": 65908512,
"comment_id": 3848768229,
"created_at": "2026-02-04T17:28:32Z",
"repoId": 741297064,
"pullRequestNo": 483
},
{
"name": "bailey-coding",
"id": 29517254,
"comment_id": 3955014177,
"created_at": "2026-02-24T22:09:52Z",
"repoId": 741297064,
"pullRequestNo": 487
},
{
"name": "Br1an67",
"id": 29810238,
"comment_id": 3979412700,
"created_at": "2026-03-01T07:32:18Z",
"repoId": 741297064,
"pullRequestNo": 489
}
]
}
================================================
FILE: static/fonts/.gitignore
================================================
*
!.gitignore
================================================
FILE: surya/__init__.py
================================================
================================================
FILE: surya/common/__init__.py
================================================
================================================
FILE: surya/common/adetr/decoder.py
================================================
from typing import Dict, Optional, Tuple, Union
import torch
import torch.utils.checkpoint
from torch import nn
from transformers import PretrainedConfig
from transformers.activations import ACT2FN
from transformers.modeling_attn_mask_utils import AttentionMaskConverter
from transformers.modeling_outputs import BaseModelOutputWithNoAttention
from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS
from surya.common.pretrained import SuryaPreTrainedModel
from surya.common.xla import mark_step
_MAX_SQRT_GRADIENT = 1000.0
class WrappedEmbedding(nn.Embedding):
def forward(self, input_ids, *args, **kwargs):
return super().forward(input_ids)
class SuryaADETRDecoderRMSNorm(nn.Module):
def __init__(self, dim: int, eps: float = 1e-6):
super().__init__()
self.eps = eps
self.weight = nn.Parameter(torch.zeros(dim))
def _norm(self, x):
variance = x.pow(2).mean(-1, keepdim=True)
# Add clipping to prevent division by zero
variance = torch.clamp(variance, min=self.eps)
return x * torch.rsqrt(variance)
def forward(self, x):
output = self._norm(x.float())
# Llama does x.to(float16) * w whilst SuryaADETRDecoder is (x * w).to(float16)
# See https://github.com/huggingface/transformers/pull/29402
output = output * (1.0 + self.weight.float())
# Clamp to float16 range
f16_info = torch.finfo(x.dtype)
output = output.clamp(min=f16_info.min, max=f16_info.max)
output = torch.where(
torch.isnan(output), torch.tensor(0.0, device=output.device), output
)
return output.type_as(x)
def extra_repr(self):
return f"{tuple(self.weight.shape)}, eps={self.eps}"
ALL_LAYERNORM_LAYERS.append(SuryaADETRDecoderRMSNorm)
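`SuryaADETRDecoderRMSNorm` divides each vector by its root-mean-square. It clamps the mean square at `eps` rather than adding `eps`, and stores its learned scale as an offset from 1, hence `(1 + weight)`. The single-vector sketch below shows that arithmetic in plain Python. It omits the module's float32 upcast, dtype-range clamp, and NaN-to-zero replacement.

```python
import math
from typing import List, Sequence


def rms_norm(
    x: Sequence[float], weight: Sequence[float], eps: float = 1e-6
) -> List[float]:
    mean_sq = sum(v * v for v in x) / len(x)
    # Clamp rather than add eps, mirroring torch.clamp(variance, min=eps).
    mean_sq = max(mean_sq, eps)
    inv_rms = 1.0 / math.sqrt(mean_sq)
    # Weights are stored as offsets from 1, hence the (1 + w) scale.
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]
```

With zero weights, the output has a mean square of 1. An all-zero input stays all-zero instead of dividing by zero.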
class SuryaADETRDecoderRotaryEmbedding(nn.Module):
def __init__(self, dim, base=10000, device=None):
super().__init__()
self.dim = dim
self.base = base
inv_freq = 1.0 / (
self.base
** (torch.arange(0, self.dim, 2, dtype=torch.int64).float() / self.dim)
)
self.register_buffer("inv_freq", tensor=inv_freq, persistent=False)
@torch.no_grad()
# Copied from transformers.models.gemma.modeling_gemma.GemmaRotaryEmbedding.forward with Gemma->SuryaADETRDecoder
def forward(self, x, position_ids, seq_len=None):
# x: [bs, num_attention_heads, seq_len, head_size]
inv_freq_expanded = (
self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1).to(x.device)
)
position_ids_expanded = position_ids[:, None, :].float()
freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(
1, 2
)
emb = torch.cat((freqs, freqs), dim=-1)
cos = emb.cos()
sin = emb.sin()
return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
# Copied from transformers.models.llama.modeling_llama.rotate_half
def rotate_half(x):
"""Rotates half the hidden dims of the input."""
x1 = x[..., : x.shape[-1] // 2]
x2 = x[..., x.shape[-1] // 2 :]
return torch.cat((-x2, x1), dim=-1)
# Copied from transformers.models.llama.modeling_llama.apply_rotary_pos_emb
def apply_rotary_pos_emb(q, k, cos, sin, unsqueeze_dim=1):
"""Applies Rotary Position Embedding to the query and key tensors.
Args:
q (`torch.Tensor`): The query tensor.
k (`torch.Tensor`): The key tensor.
cos (`torch.Tensor`): The cosine part of the rotary embedding.
sin (`torch.Tensor`): The sine part of the rotary embedding.
unsqueeze_dim (`int`, *optional*, defaults to 1):
The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
Returns:
        `tuple(torch.Tensor)` comprising the query and key tensors rotated using the Rotary Position Embedding.
"""
cos = cos.unsqueeze(unsqueeze_dim)
sin = sin.unsqueeze(unsqueeze_dim)
q_embed = (q * cos) + (rotate_half(q) * sin)
k_embed = (k * cos) + (rotate_half(k) * sin)
return q_embed, k_embed
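# Illustrative sketch, assuming the same construction as SuryaADETRDecoderRotaryEmbedding
# (inv_freq = base^(-2i/dim), emb = cat(freqs, freqs)) combined with rotate_half as in
# apply_rotary_pos_emb. Each (i, i + dim/2) pair is rotated by a position-dependent
# angle, so vector norms are preserved and the q.k score depends only on the offset
# between the two positions. `_rope_rotate` is a hypothetical helper name.
def _rope_rotate(x, position, base=10000.0):
    import torch
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    freqs = position * inv_freq  # (dim / 2,)
    emb = torch.cat((freqs, freqs), dim=-1)  # (dim,), same layout as the module
    half = dim // 2
    rotated_half = torch.cat((-x[..., half:], x[..., :half]), dim=-1)  # rotate_half
    return x * emb.cos() + rotated_half * emb.sin()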
# Copied from transformers.models.llama.modeling_llama.repeat_kv
def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
"""
This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
"""
batch, num_key_value_heads, slen, head_dim = hidden_states.shape
if n_rep == 1:
return hidden_states
hidden_states = hidden_states[:, :, None, :, :].expand(
batch, num_key_value_heads, n_rep, slen, head_dim
)
return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
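# Illustrative sketch: the repeat_kv docstring states that expand + reshape equals
# torch.repeat_interleave along the head dimension. This standalone copy of the trick
# (hypothetical name `_expand_kv_heads`) makes the claim checkable in isolation. Each
# key/value head is repeated n_rep times in a row, so under grouped-query attention
# query head h reads key/value head h // n_rep.
def _expand_kv_heads(hidden_states, n_rep):
    batch, num_kv_heads, seq_len, head_dim = hidden_states.shape
    # Insert a size-1 axis after the head axis and broadcast it to n_rep (no copy yet)
    expanded = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, seq_len, head_dim
    )
    # reshape materializes the copy and merges (kv_heads, n_rep) into one head axis
    return expanded.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)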
class SuryaADETRDecoderSdpaCrossAttention(nn.Module):
"""Multi-headed attention from 'Attention Is All You Need' paper
Modified for GQA
"""
def __init__(self, config: PretrainedConfig):
super().__init__()
self.config = config
self.attention_dropout = config.attention_dropout
self.hidden_size = config.hidden_size
self.num_attention_heads = config.num_attention_heads
self.head_dim = config.head_dim
self.num_key_value_heads = config.num_key_value_heads
self.num_key_value_groups = self.num_attention_heads // self.num_key_value_heads
self.q_proj = nn.Linear(
self.hidden_size,
self.num_attention_heads * self.head_dim,
bias=config.attention_bias,
)
self.k_proj = nn.Linear(
self.config.encoder_hidden_size,
self.num_key_value_heads * self.head_dim,
bias=config.attention_bias,
)
self.v_proj = nn.Linear(
self.config.encoder_hidden_size,
self.num_key_value_heads * self.head_dim,
bias=config.attention_bias,
)
self.o_proj = nn.Linear(
self.num_attention_heads * self.head_dim, self.hidden_size, bias=True
)
self.rotary_emb = SuryaADETRDecoderRotaryEmbedding(
self.head_dim,
base=config.rope_theta,
)
def forward(
self,
hidden_states: torch.Tensor,
encoder_hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
encoder_attention_mask: Optional[torch.Tensor] = None,
use_cache: bool = False,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
# Encoder attention mask currently ignored
bsz, q_len, _ = hidden_states.size()
_, v_len, _ = encoder_hidden_states.size()
query_states = self.q_proj(hidden_states)
query_states = query_states.view(
bsz, q_len, self.num_attention_heads, self.head_dim
).transpose(1, 2)
        # _clear_cache may have deleted the attribute, so fall back to recomputing
        if getattr(self, "key_states", None) is None:
key_states = self.k_proj(encoder_hidden_states)
value_states = self.v_proj(encoder_hidden_states)
key_states = key_states.view(
bsz, v_len, self.num_key_value_heads, self.head_dim
).transpose(1, 2)
value_states = value_states.view(
bsz, v_len, self.num_key_value_heads, self.head_dim
).transpose(1, 2)
if use_cache:
self._update_cache(key_states, value_states)
else:
key_states = self.key_states
value_states = self.value_states
key_states = repeat_kv(key_states, self.num_key_value_groups)
value_states = repeat_kv(value_states, self.num_key_value_groups)
attn_output = torch.nn.functional.scaled_dot_product_attention(
query_states,
key_states,
value_states,
attn_mask=None,
dropout_p=self.attention_dropout if self.training else 0.0,
scale=self.head_dim**-0.5,
)
attn_output = attn_output.transpose(1, 2).contiguous()
        attn_output = attn_output.view(bsz, q_len, self.num_attention_heads * self.head_dim)
attn_output = self.o_proj(attn_output)
return attn_output
def _clear_cache(self):
if self.value_states is not None:
del self.value_states
if self.key_states is not None:
del self.key_states
def _setup_cache(self, batch_size, device, dtype=None):
# Setup initial caches
self.value_states = None
self.key_states = None
@torch.no_grad()
def _update_cache(self, key_states, value_states, **cache_kwargs):
self.value_states = value_states
self.key_states = key_states
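# Illustrative sketch: both attention classes call
# torch.nn.functional.scaled_dot_product_attention with scale=head_dim**-0.5. Assuming
# no dropout, that matches the explicit softmax(Q K^T * scale + mask) V below, where an
# additive float mask uses 0 for "attend" and a large negative value for "blocked",
# the same convention as the causal mask built in this file.
# `_explicit_attention` is a hypothetical helper name.
def _explicit_attention(query, key, value, attn_mask=None):
    import torch
    scale = query.shape[-1] ** -0.5
    scores = (query @ key.transpose(-2, -1)) * scale
    if attn_mask is not None:
        scores = scores + attn_mask  # additive mask, broadcast over heads
    return torch.softmax(scores, dim=-1) @ value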
class SuryaADETRDecoderSdpaAttention(nn.Module):
"""Multi-headed attention from 'Attention Is All You Need' paper"""
def __init__(self, config: PretrainedConfig, static_cache=False, max_boxes=None):
super().__init__()
self.config = config
self.attention_dropout = config.attention_dropout
self.hidden_size = config.hidden_size
self.num_attention_heads = config.num_attention_heads
self.head_dim = config.head_dim
self.num_key_value_heads = config.num_key_value_heads
self.num_key_value_groups = self.num_attention_heads // self.num_key_value_heads
self.q_proj = nn.Linear(
self.hidden_size,
self.num_attention_heads * self.head_dim,
bias=config.attention_bias,
)
self.k_proj = nn.Linear(
self.hidden_size,
self.num_key_value_heads * self.head_dim,
bias=config.attention_bias,
)
self.v_proj = nn.Linear(
self.hidden_size,
self.num_key_value_heads * self.head_dim,
bias=config.attention_bias,
)
self.o_proj = nn.Linear(
self.num_attention_heads * self.head_dim, self.hidden_size, bias=True
)
self.rotary_emb = SuryaADETRDecoderRotaryEmbedding(
self.head_dim,
base=config.rope_theta,
)
self.static_cache = static_cache
self.max_boxes = max_boxes
def forward(
self,
hidden_states: torch.Tensor,
position_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
cache_position: Optional[torch.LongTensor] = None,
use_cache: bool = False,
window_attn: bool = False,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
bsz, q_len, _ = hidden_states.size()
query_states = self.q_proj(hidden_states)
key_states = self.k_proj(hidden_states)
value_states = self.v_proj(hidden_states)
# Final is bsz, num_attention_heads, seq_len, head_dim
query_states = query_states.view(
bsz, q_len, self.num_attention_heads, self.head_dim
).transpose(1, 2)
key_states = key_states.view(
bsz, q_len, self.num_key_value_heads, self.head_dim
).transpose(1, 2)
value_states = value_states.view(
bsz, q_len, self.num_key_value_heads, self.head_dim
).transpose(1, 2)
cos, sin = self.rotary_emb(value_states, position_ids, seq_len=None)
query_states, key_states = apply_rotary_pos_emb(
query_states, key_states, cos, sin
)
if use_cache and hasattr(self, "key_states"):
cache_kwargs = {
"cache_position": cache_position,
"window_attn": window_attn,
}
key_states, value_states = self._update_cache(
key_states, value_states, **cache_kwargs
)
key_states = repeat_kv(key_states, self.num_key_value_groups)
value_states = repeat_kv(value_states, self.num_key_value_groups)
causal_mask = attention_mask
if attention_mask is not None:
# Mask is batch, head, seq_len, kv_len
causal_mask = causal_mask[:, :, :, : key_states.shape[-2]]
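            if cache_position is not None and self.static_cache:
                current_pos = cache_position[-1]
                # Block cache slots past the current position. Use an out-of-place
                # masked_fill: the mask may be an expanded (stride-0) view shared by
                # every layer, which cannot always be written in place.
                kv_positions = torch.arange(
                    causal_mask.shape[-1], device=causal_mask.device
                )
                causal_mask = causal_mask.masked_fill(
                    kv_positions > current_pos, torch.finfo(causal_mask.dtype).min
                )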
attn_output = torch.nn.functional.scaled_dot_product_attention(
query_states,
key_states,
value_states,
attn_mask=causal_mask,
dropout_p=self.attention_dropout if self.training else 0.0,
scale=self.head_dim**-0.5,
)
attn_output = attn_output.transpose(1, 2).contiguous()
        attn_output = attn_output.view(bsz, q_len, self.num_attention_heads * self.head_dim)
attn_output = self.o_proj(attn_output)
return attn_output
def _setup_cache(self, batch_size, device, dtype=None):
if dtype is None and self.config.torch_dtype is not None:
dtype = self.config.torch_dtype
dtype = dtype if dtype is not None else torch.float32
# Setup initial caches
self.value_states = None
self.key_states = None
if self.static_cache:
cache_shape = (
batch_size,
self.num_key_value_heads,
self.max_boxes,
self.head_dim,
)
self.value_states = torch.zeros(cache_shape, dtype=dtype, device=device)
self.key_states = torch.zeros(cache_shape, dtype=dtype, device=device)
def _clear_cache(self):
if self.value_states is not None:
del self.value_states
if self.key_states is not None:
del self.key_states
def _update_static_cache(self, key_states, value_states, **cache_kwargs):
cache_position = cache_kwargs.get("cache_position")
k_out, v_out = (
self.key_states.to(key_states.device),
self.value_states.to(value_states.device),
)
k_out[:, :, cache_position] = key_states.to(k_out.dtype)
v_out[:, :, cache_position] = value_states.to(v_out.dtype)
self.key_states, self.value_states = k_out, v_out
return k_out, v_out
def _update_dynamic_cache(self, key_states, value_states, **cache_kwargs):
k_out = key_states
if self.key_states is not None:
k_out = torch.cat([self.key_states, key_states], dim=2)
v_out = value_states
if self.value_states is not None:
v_out = torch.cat([self.value_states, value_states], dim=2)
self.key_states, self.value_states = k_out, v_out
return k_out, v_out
@torch.no_grad()
def _update_cache(self, key_states, value_states, **cache_kwargs):
if self.static_cache:
return self._update_static_cache(key_states, value_states, **cache_kwargs)
return self._update_dynamic_cache(key_states, value_states, **cache_kwargs)
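# Illustrative sketch of the two cache strategies above on plain tensors. The static
# cache preallocates max_boxes slots and writes new keys at cache_position (fixed
# shapes), while the dynamic cache concatenates along the sequence dimension. After the
# same steps, the filled prefix of the static buffer equals the dynamic result. The
# function name and the batch/head/head_dim sizes are illustrative assumptions.
def _compare_kv_caches(steps, max_boxes):
    import torch
    batch, heads, head_dim = 1, 2, 4
    static = torch.zeros(batch, heads, max_boxes, head_dim)
    dynamic = None
    position = 0
    for step_len in steps:
        new_keys = torch.randn(batch, heads, step_len, head_dim)
        cache_position = torch.arange(position, position + step_len)
        static[:, :, cache_position] = new_keys  # write into preallocated slots
        dynamic = new_keys if dynamic is None else torch.cat([dynamic, new_keys], dim=2)
        position += step_len
    return static, dynamic, position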
class SuryaADETRDecoderMlp(nn.Module):
def __init__(self, config):
super().__init__()
self.config = config
self.hidden_size = config.hidden_size
self.intermediate_size = config.intermediate_size
self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
if config.hidden_activation is None:
config.hidden_activation = "gelu_pytorch_tanh"
hidden_activation = config.hidden_activation
self.act_fn = ACT2FN[hidden_activation]
def forward(self, x):
return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
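# Illustrative sketch: SuryaADETRDecoderMlp is a gated (GeGLU-style) feed-forward
# block, down(act(gate(x)) * up(x)). With the default "gelu_pytorch_tanh" activation
# this is assumed to match F.gelu(..., approximate="tanh"). The helper rebuilds the
# computation from raw weight matrices; `_gated_mlp_reference` is a hypothetical name.
def _gated_mlp_reference(x, w_gate, w_up, w_down):
    import torch.nn.functional as F
    # Weights follow the nn.Linear convention: (out_features, in_features)
    gate = F.gelu(x @ w_gate.T, approximate="tanh")
    return (gate * (x @ w_up.T)) @ w_down.T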
class SuryaADETRDecoderLayer(nn.Module):
def __init__(self, config, layer_idx, static_cache=False, max_boxes=None):
super().__init__()
self.cross_pre_norm = SuryaADETRDecoderRMSNorm(
config.hidden_size, eps=config.rms_norm_eps
)
self.temporal_pre_norm = SuryaADETRDecoderRMSNorm(
config.hidden_size, eps=config.rms_norm_eps
)
self.temporal_block = None
if layer_idx in config.self_attn_layers:
self.temporal_block = SuryaADETRDecoderSdpaAttention(
config, static_cache=static_cache, max_boxes=max_boxes
)
self.cross_attn_block = None
if layer_idx in config.cross_attn_layers:
self.cross_attn_block = SuryaADETRDecoderSdpaCrossAttention(config)
self.window_attn = layer_idx not in config.global_attn_layers
self.channel_pre_norm = SuryaADETRDecoderRMSNorm(
config.hidden_size, eps=config.rms_norm_eps
)
self.mlp_block = SuryaADETRDecoderMlp(config)
self.double_residual_flow = getattr(config, "double_residual_flow", False)
def forward(
self,
activations: torch.Tensor,
position_ids: torch.Tensor,
attention_mask: torch.Tensor,
        encoder_hidden_states: Optional[torch.Tensor] = None,
        encoder_attention_mask: Optional[torch.Tensor] = None,
        cache_position: Optional[torch.Tensor] = None,
        use_cache: Optional[bool] = None,
    ) -> torch.Tensor:
if self.double_residual_flow:
return self.double_res_forward(
activations,
position_ids,
attention_mask,
encoder_hidden_states,
encoder_attention_mask,
cache_position,
use_cache,
)
hidden_states = activations
if self.cross_attn_block is not None:
# Do cross-attention on encoder outputs
cross_attn_inputs = self.cross_pre_norm(hidden_states)
cross_attn_path = self.cross_attn_block(
cross_attn_inputs,
encoder_hidden_states,
attention_mask,
encoder_attention_mask,
use_cache=use_cache,
)
hidden_states = cross_attn_path + hidden_states
if self.temporal_block is not None:
            # RMSNorm introduces slight numerical differences
            temporal_inputs = self.temporal_pre_norm(hidden_states)
temporal_path = self.temporal_block(
temporal_inputs,
position_ids,
attention_mask,
cache_position=cache_position,
use_cache=use_cache,
window_attn=self.window_attn,
)
hidden_states = temporal_path + hidden_states
block_input = hidden_states
hidden_states = self.channel_pre_norm(block_input)
hidden_states = self.mlp_block(hidden_states)
hidden_states = hidden_states + block_input
return hidden_states
def double_res_forward(
self,
activations: torch.Tensor,
position_ids: torch.Tensor,
attention_mask: torch.Tensor,
        encoder_hidden_states: Optional[torch.Tensor] = None,
        encoder_attention_mask: Optional[torch.Tensor] = None,
        cache_position: Optional[torch.Tensor] = None,
        use_cache: Optional[bool] = None,
    ) -> torch.Tensor:
raw_activations = activations
if self.cross_attn_block is not None:
# Do cross-attention on encoder outputs
cross_attn_inputs = self.cross_pre_norm(activations)
cross_attn_path = self.cross_attn_block(
cross_attn_inputs,
encoder_hidden_states,
attention_mask,
encoder_attention_mask,
use_cache=use_cache,
)
cross_attn_output = cross_attn_path + raw_activations
else:
cross_attn_output = raw_activations
if self.temporal_block is not None:
            # RMSNorm introduces slight numerical differences
            inputs_normalized = self.temporal_pre_norm(cross_attn_output)
hidden_states = self.temporal_block(
inputs_normalized,
position_ids,
attention_mask,
cache_position=cache_position,
use_cache=use_cache,
window_attn=self.window_attn,
)
residual = hidden_states + raw_activations
else:
residual = cross_attn_output
hidden_states = self.channel_pre_norm(residual)
hidden_states = self.mlp_block(hidden_states)
hidden_states = hidden_states + residual
return hidden_states
class SuryaADETRDecoderPreTrainedModel(SuryaPreTrainedModel):
config_class = PretrainedConfig
base_model_prefix = "model"
supports_gradient_checkpointing = True
_no_split_modules = ["SuryaADETRDecoderLayer"]
_skip_keys_device_placement = ["cache"]
_supports_flash_attn_2 = False
_supports_sdpa = False # we can't compare with eager for now
_supports_cache_class = True
_supports_quantized_cache = True
def _init_weights(self, module):
if isinstance(module, SuryaADETRDecoderSdpaAttention):
torch.nn.init.normal_(
module.q_proj.weight, mean=0.0, std=self.config.init_std
)
torch.nn.init.normal_(
module.k_proj.weight, mean=0.0, std=self.config.init_std
)
torch.nn.init.normal_(
module.v_proj.weight, mean=0.0, std=self.config.init_std
)
torch.nn.init.normal_(
module.o_proj.weight, mean=0.0, std=self.config.init_std
)
elif isinstance(module, nn.Linear):
torch.nn.init.normal_(module.weight, mean=0.0, std=self.config.init_std)
if getattr(module, "bias", None) is not None:
torch.nn.init.zeros_(module.bias)
elif isinstance(module, nn.Embedding):
module.weight.data.normal_(mean=0.0, std=self.config.init_std)
if module.padding_idx is not None:
module.weight.data[module.padding_idx].zero_()
def _setup_cache(self, config, batch, device, dtype):
layers = getattr(self, "model", self).layers
for layer in layers:
            if layer.temporal_block is not None:
                layer.temporal_block._setup_cache(batch, device, dtype)
            if layer.cross_attn_block is not None:
                layer.cross_attn_block._setup_cache(batch, device, dtype)
def _clear_cache(self):
layers = getattr(self, "model", self).layers
for layer in layers:
            if layer.temporal_block is not None:
                layer.temporal_block._clear_cache()
            if layer.cross_attn_block is not None:
                layer.cross_attn_block._clear_cache()
def reset_cache(self, batch, device, dtype):
pass
def _tie_weights(self):
pass
def tie_weights(self):
pass
class SuryaADETRDecoderModel(SuryaADETRDecoderPreTrainedModel):
"""
    Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`SuryaADETRDecoderLayer`].
Args:
config: PretrainedConfig
"""
def __init__(
self,
config: PretrainedConfig,
embedder: nn.Module = None,
max_boxes: int = None,
static_cache: bool = False,
):
super().__init__(config)
self.padding_idx = config.pad_token_id
self.vocab_size = config.vocab_size
self.causal = config.causal
self.embed_tokens = embedder
self.max_boxes = max_boxes
self.static_cache = static_cache
self.layers = nn.ModuleList(
[
SuryaADETRDecoderLayer(
config, layer_idx, static_cache=static_cache, max_boxes=max_boxes
)
for layer_idx in range(config.num_hidden_layers)
]
)
self.final_norm = SuryaADETRDecoderRMSNorm(
config.hidden_size, eps=config.rms_norm_eps
)
self.gradient_checkpointing = False
self.register_buffer(
"normalizer",
torch.tensor(self.config.hidden_size**0.5, dtype=torch.float32),
persistent=False,
)
# Initialize weights and apply final processing
self.post_init()
# Copied from transformers.models.llama.modeling_llama.LlamaModel.get_input_embeddings
def get_input_embeddings(self):
return self.embed_tokens
# Copied from transformers.models.llama.modeling_llama.LlamaModel.set_input_embeddings
def set_input_embeddings(self, value):
self.embed_tokens = value
def forward(
self,
input_ids: torch.LongTensor = None,
input_boxes_counts: torch.LongTensor = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
position_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
encoder_hidden_states: Optional[torch.FloatTensor] = None,
encoder_attention_mask: Optional[torch.FloatTensor] = None,
cache_position: Optional[torch.LongTensor] = None,
use_cache: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
prefill: bool = False,
) -> Union[Tuple, BaseModelOutputWithNoAttention]:
use_cache = use_cache if use_cache is not None else self.config.use_cache
return_dict = (
return_dict if return_dict is not None else self.config.use_return_dict
)
if self.gradient_checkpointing and self.training and use_cache:
use_cache = False
        if inputs_embeds is None:
            inputs_embeds = self.embed_tokens(input_ids, input_boxes_counts)
hidden_states = inputs_embeds
if use_cache and prefill:
self._setup_cache(
self.config,
hidden_states.shape[0],
hidden_states.device,
hidden_states.dtype,
)
if cache_position is None:
cache_position = torch.arange(
hidden_states.shape[1], device=hidden_states.device
)
if position_ids is None:
position_ids = cache_position.unsqueeze(0)
causal_mask = self._update_causal_mask(
attention_mask, inputs_embeds, cache_position
)
all_hidden_states = () if output_hidden_states else None
for i, residual_block in enumerate(self.layers):
if output_hidden_states:
all_hidden_states += (hidden_states,)
if self.gradient_checkpointing and self.training:
hidden_states = self._gradient_checkpointing_func(
residual_block.__call__,
hidden_states,
position_ids,
causal_mask,
encoder_hidden_states,
encoder_attention_mask,
cache_position,
use_cache,
)
else:
hidden_states = residual_block(
hidden_states,
position_ids,
causal_mask,
encoder_hidden_states,
encoder_attention_mask,
cache_position,
use_cache,
)
hidden_states = self.final_norm(hidden_states)
# add hidden states from the last decoder layer
if output_hidden_states:
all_hidden_states += (hidden_states,)
if not return_dict:
return tuple(v for v in [hidden_states, all_hidden_states] if v is not None)
return BaseModelOutputWithNoAttention(
last_hidden_state=hidden_states,
hidden_states=all_hidden_states,
)
# TODO: As of torch==2.2.0, the `attention_mask` passed to the model in `generate` is 2D and of dynamic length even when the static
    # KV cache is used. This is an issue for torch.compile which then recaptures cudagraphs at each decode step due to the dynamic shapes.
# (`recording cudagraph tree for symint key 13`, etc.), which is VERY slow. A workaround is `@torch.compiler.disable`, but this prevents using
# `fullgraph=True`. See more context in https://github.com/huggingface/transformers/pull/29114
# Ignore copy
def _update_causal_mask(self, attention_mask, input_tensor, cache_position):
if not self.causal:
return None
dtype, device = input_tensor.dtype, input_tensor.device
min_dtype = torch.finfo(dtype).min
sequence_length = input_tensor.shape[1]
target_length = max(self.max_boxes, sequence_length)
diagonal = torch.full(
(sequence_length, target_length),
fill_value=min_dtype,
dtype=dtype,
device=device,
)
causal_mask = diagonal
if sequence_length != 1:
            # Mask strictly-future positions: triu(diagonal=1) keeps min_dtype above the
            # diagonal and zeroes the diagonal and everything below it (attended to)
causal_mask = torch.triu(diagonal, diagonal=1)
causal_mask *= torch.arange(
target_length, device=device
) > cache_position.reshape(-1, 1)
causal_mask = causal_mask[None, None, :, :].expand(
input_tensor.shape[0], 1, -1, -1
)
if attention_mask is not None:
causal_mask = (
causal_mask.clone()
) # copy to contiguous memory for in-place edit
if attention_mask.dim() == 2:
# Mask positions in the causal mask that are masked in the attention mask
mask_length = attention_mask.shape[-1]
padding_mask = causal_mask[..., :mask_length].eq(0.0) * attention_mask[
:, None, None, :
].eq(0.0)
causal_mask[..., :mask_length] = causal_mask[
..., :mask_length
].masked_fill(padding_mask, min_dtype)
if attention_mask is not None and attention_mask.device.type == "cuda":
# Attend to all tokens in fully masked rows in the causal_mask, for example the relevant first rows when
# using left padding. This is required by F.scaled_dot_product_attention memory-efficient attention path.
# Detail
SYMBOL INDEX (790 symbols across 99 files)
FILE: benchmark/detection.py
function main (line 29) | def main(pdf_path: str, results_dir: str, max_rows: int, debug: bool, te...
FILE: benchmark/layout.py
function main (line 33) | def main(results_dir: str, max_rows: int, debug: bool):
FILE: benchmark/ordering.py
function main (line 30) | def main(results_dir: str, max_rows: int):
FILE: benchmark/recognition.py
function list_in (line 38) | def list_in(lst: str | list, lst2: list):
function standardize_bullets (line 44) | def standardize_bullets(text):
function normalize_text (line 67) | def normalize_text(text: str) -> str:
function main (line 110) | def main(
FILE: benchmark/table_recognition.py
function main (line 33) | def main(results_dir: str, max_rows: int, tatr: bool, debug: bool):
FILE: benchmark/texify.py
function normalize_text (line 20) | def normalize_text(text):
function score_text (line 32) | def score_text(predictions, references):
function inference_texify (line 42) | def inference_texify(
function main (line 80) | def main(ds_name: str, results_dir: str, max_rows: int, line_mode: bool):
FILE: benchmark/utils/bbox.py
function get_pdf_lines (line 5) | def get_pdf_lines(pdf_path, img_sizes):
function merge_boxes (line 24) | def merge_boxes(box1, box2):
function join_lines (line 28) | def join_lines(bboxes, max_gap=5):
FILE: benchmark/utils/metrics.py
function box_area (line 8) | def box_area(box):
function calculate_iou (line 12) | def calculate_iou(box1, box2, box1_only=False):
function match_boxes (line 23) | def match_boxes(preds, references):
function penalized_iou_score (line 57) | def penalized_iou_score(preds, references):
function intersection_pixels (line 62) | def intersection_pixels(box1, box2):
function calculate_coverage (line 80) | def calculate_coverage(box, other_boxes, penalize_double=False):
function intersection_area (line 101) | def intersection_area(box1, box2):
function calculate_coverage_fast (line 113) | def calculate_coverage_fast(box, other_boxes, penalize_double=False):
function precision_recall (line 136) | def precision_recall(preds, references, threshold=.5, workers=8, penaliz...
function mean_coverage (line 171) | def mean_coverage(preds, references):
function rank_accuracy (line 189) | def rank_accuracy(preds, references):
FILE: benchmark/utils/scoring.py
function overlap_score (line 7) | def overlap_score(pred_lines: List[str], reference_lines: List[str]):
function overlap_score_exact (line 29) | def overlap_score_exact(pred_lines: List[str], reference_lines: List[str]):
FILE: benchmark/utils/tatr.py
class MaxResize (line 7) | class MaxResize(object):
method __init__ (line 8) | def __init__(self, max_size=800):
method __call__ (line 11) | def __call__(self, image):
function to_tensor (line 20) | def to_tensor(image):
function normalize (line 33) | def normalize(tensor, mean, std):
function structure_transform (line 39) | def structure_transform(image):
function box_cxcywh_to_xyxy (line 46) | def box_cxcywh_to_xyxy(x):
function rescale_bboxes (line 52) | def rescale_bboxes(out_bbox, size):
function outputs_to_objects (line 59) | def outputs_to_objects(outputs, img_sizes, id2label):
function load_tatr (line 97) | def load_tatr():
function batch_inference_tatr (line 101) | def batch_inference_tatr(model, images, batch_size):
FILE: benchmark/utils/tesseract.py
function surya_lang_to_tesseract (line 15) | def surya_lang_to_tesseract(code: str) -> Optional[str]:
function tesseract_ocr (line 24) | def tesseract_ocr(img, bboxes, lang: str):
function tesseract_ocr_parallel (line 35) | def tesseract_ocr_parallel(imgs, bboxes, langs: List[str], cpus=None):
function tesseract_bboxes (line 51) | def tesseract_bboxes(img):
function tesseract_parallel (line 68) | def tesseract_parallel(imgs):
FILE: benchmark/utils/textract.py
function textract_ocr (line 9) | def textract_ocr(extractor, img):
function textract_ocr_parallel (line 17) | def textract_ocr_parallel(imgs, cpus=None):
FILE: benchmark/utils/verify_benchmark_scores.py
function verify_layout (line 5) | def verify_layout(data):
function verify_det (line 15) | def verify_det(data):
function verify_rec (line 21) | def verify_rec(data):
function verify_order (line 27) | def verify_order(data):
function verify_table_rec (line 33) | def verify_table_rec(data):
function verify_texify (line 41) | def verify_texify(data):
function main (line 52) | def main(file_path, bench_type):
FILE: surya/common/adetr/decoder.py
class WrappedEmbedding (line 19) | class WrappedEmbedding(nn.Embedding):
method forward (line 20) | def forward(self, input_ids, *args, **kwargs):
class SuryaADETRDecoderRMSNorm (line 24) | class SuryaADETRDecoderRMSNorm(nn.Module):
method __init__ (line 25) | def __init__(self, dim: int, eps: float = 1e-6):
method _norm (line 30) | def _norm(self, x):
method forward (line 37) | def forward(self, x):
method extra_repr (line 50) | def extra_repr(self):
class SuryaADETRDecoderRotaryEmbedding (line 57) | class SuryaADETRDecoderRotaryEmbedding(nn.Module):
method __init__ (line 58) | def __init__(self, dim, base=10000, device=None):
method forward (line 70) | def forward(self, x, position_ids, seq_len=None):
function rotate_half (line 88) | def rotate_half(x):
function apply_rotary_pos_emb (line 96) | def apply_rotary_pos_emb(q, k, cos, sin, unsqueeze_dim=1):
function repeat_kv (line 122) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
class SuryaADETRDecoderSdpaCrossAttention (line 136) | class SuryaADETRDecoderSdpaCrossAttention(nn.Module):
method __init__ (line 141) | def __init__(self, config: PretrainedConfig):
method forward (line 174) | def forward(
method _clear_cache (line 224) | def _clear_cache(self):
method _setup_cache (line 230) | def _setup_cache(self, batch_size, device, dtype=None):
method _update_cache (line 236) | def _update_cache(self, key_states, value_states, **cache_kwargs):
class SuryaADETRDecoderSdpaAttention (line 241) | class SuryaADETRDecoderSdpaAttention(nn.Module):
method __init__ (line 244) | def __init__(self, config: PretrainedConfig, static_cache=False, max_b...
method forward (line 280) | def forward(
method _setup_cache (line 347) | def _setup_cache(self, batch_size, device, dtype=None):
method _clear_cache (line 366) | def _clear_cache(self):
method _update_static_cache (line 372) | def _update_static_cache(self, key_states, value_states, **cache_kwargs):
method _update_dynamic_cache (line 385) | def _update_dynamic_cache(self, key_states, value_states, **cache_kwar...
method _update_cache (line 398) | def _update_cache(self, key_states, value_states, **cache_kwargs):
class SuryaADETRDecoderMlp (line 405) | class SuryaADETRDecoderMlp(nn.Module):
method __init__ (line 406) | def __init__(self, config):
method forward (line 419) | def forward(self, x):
class SuryaADETRDecoderLayer (line 423) | class SuryaADETRDecoderLayer(nn.Module):
method __init__ (line 424) | def __init__(self, config, layer_idx, static_cache=False, max_boxes=No...
method forward (line 451) | def forward(
method double_res_forward (line 507) | def double_res_forward(
class SuryaADETRDecoderPreTrainedModel (line 557) | class SuryaADETRDecoderPreTrainedModel(SuryaPreTrainedModel):
method _init_weights (line 568) | def _init_weights(self, module):
method _setup_cache (line 592) | def _setup_cache(self, config, batch, device, dtype):
method _clear_cache (line 600) | def _clear_cache(self):
method reset_cache (line 608) | def reset_cache(self, batch, device, dtype):
method _tie_weights (line 611) | def _tie_weights(self):
method tie_weights (line 614) | def tie_weights(self):
class SuryaADETRDecoderModel (line 618) | class SuryaADETRDecoderModel(SuryaADETRDecoderPreTrainedModel):
method __init__ (line 626) | def __init__(
method get_input_embeddings (line 664) | def get_input_embeddings(self):
method set_input_embeddings (line 668) | def set_input_embeddings(self, value):
method forward (line 671) | def forward(
method _update_causal_mask (line 761) | def _update_causal_mask(self, attention_mask, input_tensor, cache_posi...
FILE: surya/common/donut/encoder.py
class DonutSwinEncoderOutput (line 27) | class DonutSwinEncoderOutput(ModelOutput):
class DonutSwinModelOutput (line 35) | class DonutSwinModelOutput(ModelOutput):
function window_partition (line 40) | def window_partition(input_feature, window_size):
function window_reverse (line 62) | def window_reverse(windows, window_size, height, width):
class DonutSwinEmbeddings (line 84) | class DonutSwinEmbeddings(nn.Module):
method __init__ (line 89) | def __init__(self, config, use_mask_token=False):
method interpolate_pos_encoding (line 119) | def interpolate_pos_encoding(
method forward (line 155) | def forward(
class DonutSwinPatchEmbeddings (line 195) | class DonutSwinPatchEmbeddings(nn.Module):
method __init__ (line 202) | def __init__(self, config):
method maybe_pad (line 232) | def maybe_pad(self, pixel_values, height, width):
method forward (line 241) | def forward(
class DonutSwinPatchMerging (line 256) | class DonutSwinPatchMerging(nn.Module):
method __init__ (line 269) | def __init__(
method maybe_pad (line 281) | def maybe_pad(self, input_feature, height, width):
method forward (line 289) | def forward(
class DonutSwinSelfAttention (line 322) | class DonutSwinSelfAttention(nn.Module):
method __init__ (line 323) | def __init__(self, config, dim, num_heads, num_kv_heads, window_size):
method transpose_for_scores (line 371) | def transpose_for_scores(self, x):
method transpose_kv_for_scores (line 379) | def transpose_kv_for_scores(self, x, repeats):
method forward (line 387) | def forward(
class DonutSwinSelfOutput (line 444) | class DonutSwinSelfOutput(nn.Module):
method __init__ (line 445) | def __init__(self, config, dim):
method forward (line 449) | def forward(
class DonutSwinAttention (line 456) | class DonutSwinAttention(nn.Module):
method __init__ (line 457) | def __init__(self, config, dim, num_heads, num_kv_heads, window_size):
method prune_heads (line 465) | def prune_heads(self, heads):
method forward (line 488) | def forward(
class DonutSwinIntermediate (line 506) | class DonutSwinIntermediate(nn.Module):
method __init__ (line 507) | def __init__(self, config, dim):
method forward (line 515) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
class DonutSwinOutput (line 522) | class DonutSwinOutput(nn.Module):
method __init__ (line 523) | def __init__(self, config, dim):
method forward (line 527) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
class DonutSwinLayer (line 532) | class DonutSwinLayer(nn.Module):
method __init__ (line 533) | def __init__(
method set_shift_and_window_size (line 549) | def set_shift_and_window_size(self, input_resolution):
method get_attn_mask (line 559) | def get_attn_mask(self, height, width, dtype, device):
method maybe_pad (line 589) | def maybe_pad(self, hidden_states, height, width):
method forward (line 596) | def forward(
class DonutSwinStage (line 687) | class DonutSwinStage(nn.Module):
method __init__ (line 688) | def __init__(
method build_2d_sincos_position_embedding (line 735) | def build_2d_sincos_position_embedding(
method forward (line 761) | def forward(
class DonutSwinEncoder (line 811) | class DonutSwinEncoder(nn.Module):
method __init__ (line 812) | def __init__(self, config, grid_size):
method forward (line 841) | def forward(
class DonutSwinPreTrainedModel (line 933) | class DonutSwinPreTrainedModel(SuryaPreTrainedModel):
method _init_weights (line 945) | def _init_weights(self, module):
FILE: surya/common/donut/processor.py
class SuryaEncoderImageProcessor (line 23) | class SuryaEncoderImageProcessor(S3DownloaderMixin, ImageProcessingMixin):
method __init__ (line 24) | def __init__(
method __call__ (line 46) | def __call__(self, images, **kwargs) -> PIL.Image.Image:
method numpy_resize (line 51) | def numpy_resize(cls, image: np.ndarray, size, interpolation=cv2.INTER...
method process_inner (line 61) | def process_inner(self, images: List[np.ndarray]):
method preprocess (line 118) | def preprocess(
method pad_image (line 134) | def pad_image(
method align_long_axis (line 166) | def align_long_axis(
method normalize (line 180) | def normalize(
FILE: surya/common/load.py
class ModelLoader (line 8) | class ModelLoader:
method __init__ (line 9) | def __init__(self, checkpoint: Optional[str] = None):
method model (line 12) | def model(
method processor (line 20) | def processor(
FILE: surya/common/polygon.py
class PolygonBox (line 9) | class PolygonBox(BaseModel):
method convert_bbox_to_polygon (line 15) | def convert_bbox_to_polygon(cls, value):
method height (line 41) | def height(self):
method width (line 45) | def width(self):
method area (line 49) | def area(self):
method bbox (line 54) | def bbox(self) -> List[float]:
method rescale (line 59) | def rescale(self, processor_size, image_size):
method round (line 71) | def round(self, divisor):
method fit_to_bounds (line 76) | def fit_to_bounds(self, bounds):
method merge (line 83) | def merge(self, other):
method merge_left (line 90) | def merge_left(self, other):
method merge_right (line 95) | def merge_right(self, other):
method expand (line 100) | def expand(self, x_margin: float, y_margin: float):
method intersection_polygon (line 115) | def intersection_polygon(self, other) -> List[List[float]]:
method intersection_area (line 142) | def intersection_area(self, other, x_margin=0, y_margin=0):
method x_overlap (line 147) | def x_overlap(self, other, x_margin=0):
method y_overlap (line 154) | def y_overlap(self, other, y_margin=0):
method intersection_pct (line 161) | def intersection_pct(self, other, x_margin=0, y_margin=0):
method shift (line 175) | def shift(self, x_shift: float | None = None, y_shift: float | None = ...
method clamp (line 183) | def clamp(self, bbox: List[float]):
method center (line 189) | def center(self):
method distance (line 192) | def distance(self, other):
method __hash__ (line 200) | def __hash__(self):
FILE: surya/common/predictor.py
class BasePredictor (line 9) | class BasePredictor:
method disable_tqdm (line 16) | def disable_tqdm(self) -> bool:
method disable_tqdm (line 20) | def disable_tqdm(self, value: bool) -> None:
method __init__ (line 23) | def __init__(
method to (line 42) | def to(self, device_dtype: torch.device | str | None = None):
method get_batch_size (line 54) | def get_batch_size(self):
method pad_to_batch_size (line 63) | def pad_to_batch_size(tensor: torch.Tensor, batch_size: int):
method __call__ (line 78) | def __call__(self, *args, **kwargs):
FILE: surya/common/pretrained.py
class SuryaPreTrainedModel (line 7) | class SuryaPreTrainedModel(PreTrainedModel):
method _check_and_adjust_attn_implementation (line 9) | def _check_and_adjust_attn_implementation(
FILE: surya/common/s3.py
function join_urls (line 21) | def join_urls(url1: str, url2: str):
function get_model_name (line 27) | def get_model_name(pretrained_model_name_or_path: str):
function download_file (line 31) | def download_file(remote_path: str, local_path: str, chunk_size: int = 1...
function check_manifest (line 68) | def check_manifest(local_dir: str):
function download_directory (line 86) | def download_directory(remote_path: str, local_dir: str):
class S3DownloaderMixin (line 130) | class S3DownloaderMixin:
method get_local_path (line 134) | def get_local_path(cls, pretrained_model_name_or_path) -> str:
method from_pretrained (line 147) | def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
FILE: surya/common/surya/__init__.py
class SuryaModelOutput (line 28) | class SuryaModelOutput(CausalLMOutputWithPast):
class FlashAttentionKwargs (line 33) | class FlashAttentionKwargs(TypedDict, total=False):
class KwargsForCausalLM (line 54) | class KwargsForCausalLM(FlashAttentionKwargs): ...
class DistanceProjection (line 57) | class DistanceProjection(nn.Module):
method __init__ (line 58) | def __init__(self, in_features: int, out_features: int):
method forward (line 64) | def forward(self, x: torch.Tensor) -> torch.Tensor:
method init_weights (line 70) | def init_weights(self):
class BboxHead (line 77) | class BboxHead(nn.Module):
method __init__ (line 78) | def __init__(self, in_features: int, out_features: int):
method forward (line 86) | def forward(self, x: torch.Tensor) -> torch.Tensor:
class SuryaModel (line 95) | class SuryaModel(S3DownloaderMixin, SuryaPreTrainedModel):
method __init__ (line 109) | def __init__(
method tie_weights (line 163) | def tie_weights(self):
method _tie_weights (line 166) | def _tie_weights(self):
method get_output_embeddings (line 170) | def get_output_embeddings(self) -> nn.Module:
method get_input_embeddings (line 173) | def get_input_embeddings(self) -> nn.Module:
method set_output_embeddings (line 176) | def set_output_embeddings(self, new_embeddings: nn.Module):
method set_input_embeddings (line 179) | def set_input_embeddings(self, new_embeddings: nn.Module):
method maybe_static_pad_image_inputs (line 182) | def maybe_static_pad_image_inputs(
method get_image_embeddings (line 210) | def get_image_embeddings(
method embed_ids_boxes_images (line 289) | def embed_ids_boxes_images(
method get_2d_learned_embeddings (line 328) | def get_2d_learned_embeddings(
method get_logits (line 369) | def get_logits(self, hidden_states):
method forward (line 396) | def forward(
method _update_causal_mask (line 510) | def _update_causal_mask(
method _prepare_4d_causal_attention_mask_with_cache_position (line 560) | def _prepare_4d_causal_attention_mask_with_cache_position(
class SuryaXLAModel (line 632) | class SuryaXLAModel(SuryaModel):
method get_image_embeddings (line 633) | def get_image_embeddings(
method embed_ids_boxes_images (line 714) | def embed_ids_boxes_images(
method get_2d_learned_embeddings (line 765) | def get_2d_learned_embeddings(
FILE: surya/common/surya/config.py
class SuryaModelConfig (line 9) | class SuryaModelConfig(S3DownloaderMixin, PretrainedConfig):
method __init__ (line 13) | def __init__(
FILE: surya/common/surya/decoder/__init__.py
class Qwen2MLP (line 28) | class Qwen2MLP(nn.Module):
method __init__ (line 29) | def __init__(self, config):
method forward (line 39) | def forward(self, x):
function rotate_half (line 44) | def rotate_half(x):
function apply_rotary_pos_emb (line 51) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_di...
function repeat_kv (line 78) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
function eager_attention_forward (line 92) | def eager_attention_forward(
class Qwen2Attention (line 122) | class Qwen2Attention(nn.Module):
method __init__ (line 125) | def __init__(self, config: SuryaDecoderConfig, layer_idx: int):
method forward (line 151) | def forward(
class Qwen2RMSNorm (line 246) | class Qwen2RMSNorm(nn.Module):
method __init__ (line 247) | def __init__(self, hidden_size, eps=1e-6):
method forward (line 255) | def forward(self, hidden_states):
method extra_repr (line 262) | def extra_repr(self):
class Qwen2DecoderLayer (line 266) | class Qwen2DecoderLayer(nn.Module):
method __init__ (line 267) | def __init__(self, config: SuryaDecoderConfig, layer_idx: int):
method forward (line 277) | def forward(
class Qwen2RotaryEmbedding (line 332) | class Qwen2RotaryEmbedding(nn.Module):
method __init__ (line 333) | def __init__(self, config: SuryaDecoderConfig, device=None):
method _dynamic_frequency_update (line 352) | def _dynamic_frequency_update(self, position_ids, device):
method forward (line 379) | def forward(self, x, position_ids):
class Qwen2PreTrainedModel (line 410) | class Qwen2PreTrainedModel(SuryaPreTrainedModel):
method _init_weights (line 424) | def _init_weights(self, module):
class SuryaDecoderModel (line 436) | class SuryaDecoderModel(Qwen2PreTrainedModel):
method __init__ (line 445) | def __init__(self, config: SuryaDecoderConfig):
method forward (line 463) | def forward(
FILE: surya/common/surya/decoder/config.py
class SuryaDecoderConfig (line 8) | class SuryaDecoderConfig(PretrainedConfig):
method __init__ (line 28) | def __init__(
FILE: surya/common/surya/embedder/__init__.py
class SimpleTokenEmbedder (line 6) | class SimpleTokenEmbedder(nn.Module):
method __init__ (line 7) | def __init__(self, config):
method embed (line 22) | def embed(
FILE: surya/common/surya/encoder/__init__.py
class Qwen2_5_VLMLP (line 22) | class Qwen2_5_VLMLP(nn.Module):
method __init__ (line 23) | def __init__(self, config, bias: bool = False):
method forward (line 32) | def forward(self, hidden_state):
class Qwen2_5_VisionPatchEmbed (line 38) | class Qwen2_5_VisionPatchEmbed(nn.Module):
method __init__ (line 39) | def __init__(
method forward (line 61) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
class Qwen2_5_VisionRotaryEmbedding (line 77) | class Qwen2_5_VisionRotaryEmbedding(nn.Module):
method __init__ (line 78) | def __init__(self, dim: int, theta: float = 10000.0) -> None:
method forward (line 84) | def forward(self, seqlen: int) -> torch.Tensor:
class Qwen2RMSNorm (line 90) | class Qwen2RMSNorm(nn.Module):
method __init__ (line 91) | def __init__(self, hidden_size, eps=1e-6):
method forward (line 99) | def forward(self, hidden_states):
method extra_repr (line 106) | def extra_repr(self):
class Qwen2_5_VLPatchMerger (line 110) | class Qwen2_5_VLPatchMerger(nn.Module):
method __init__ (line 111) | def __init__(self, dim: int, context_dim: int, spatial_merge_size: int...
method forward (line 121) | def forward(self, x: torch.Tensor) -> torch.Tensor:
function apply_rotary_pos_emb_flashatt (line 127) | def apply_rotary_pos_emb_flashatt(
class Qwen2_5_VLVisionXLASdpaAttention (line 139) | class Qwen2_5_VLVisionXLASdpaAttention(nn.Module):
method __init__ (line 140) | def __init__(self, dim: int, num_heads: int = 16) -> None:
method forward (line 147) | def forward(
class Qwen2_5_VLVisionXLAFlashAttention2 (line 206) | class Qwen2_5_VLVisionXLAFlashAttention2(nn.Module):
method __init__ (line 207) | def __init__(self, dim: int, num_heads: int = 16) -> None:
method forward (line 214) | def forward(
class Qwen2_5_VLVisionFlashAttention2 (line 266) | class Qwen2_5_VLVisionFlashAttention2(nn.Module):
method __init__ (line 267) | def __init__(self, dim: int, num_heads: int = 16) -> None:
method forward (line 273) | def forward(
function rotate_half (line 318) | def rotate_half(x):
function apply_rotary_pos_emb_vision (line 325) | def apply_rotary_pos_emb_vision(
class Qwen2_5_VLVisionAttention (line 339) | class Qwen2_5_VLVisionAttention(nn.Module):
method __init__ (line 340) | def __init__(self, dim: int, num_heads: int = 16) -> None:
method forward (line 347) | def forward(
class Qwen2_5_VLVisionSdpaAttention (line 408) | class Qwen2_5_VLVisionSdpaAttention(nn.Module):
method __init__ (line 409) | def __init__(self, dim: int, num_heads: int = 16) -> None:
method unpack_qkv_with_mask (line 415) | def unpack_qkv_with_mask(self, q, k, v, cu_seqlens):
method forward (line 510) | def forward(
class Qwen2_5_VLVisionBlock (line 578) | class Qwen2_5_VLVisionBlock(nn.Module):
method __init__ (line 579) | def __init__(self, config, attn_implementation: str = "sdpa") -> None:
method forward (line 588) | def forward(
class Qwen2_5_VLPreTrainedModel (line 622) | class Qwen2_5_VLPreTrainedModel(SuryaPreTrainedModel):
method _init_weights (line 633) | def _init_weights(self, module):
class Qwen2_5_VisionTransformerPretrainedModel (line 645) | class Qwen2_5_VisionTransformerPretrainedModel(Qwen2_5_VLPreTrainedModel):
method __init__ (line 649) | def __init__(self, config, *inputs, **kwargs) -> None:
method rot_pos_emb (line 680) | def rot_pos_emb(self, grid_thw):
method forward (line 720) | def forward(
class SuryaEncoderModel (line 775) | class SuryaEncoderModel(Qwen2_5_VisionTransformerPretrainedModel):
method image_size (line 777) | def image_size(self) -> int:
method hidden_size (line 789) | def hidden_size(self) -> int:
method embed_images (line 793) | def embed_images(
FILE: surya/common/surya/encoder/config.py
class SuryaEncoderConfig (line 7) | class SuryaEncoderConfig(PretrainedConfig):
method __init__ (line 16) | def __init__(
FILE: surya/common/surya/flash_attn_utils.py
function _get_unpad_data (line 9) | def _get_unpad_data(attention_mask: torch.Tensor) -> tuple[torch.Tensor,...
function _upad_input (line 35) | def _upad_input(
function flash_attn_prefill (line 106) | def flash_attn_prefill(
function flash_attn_decode (line 154) | def flash_attn_decode(
FILE: surya/common/surya/processor/__init__.py
class SuryaOCRProcessor (line 46) | class SuryaOCRProcessor(S3DownloaderMixin, ProcessorMixin):
method __init__ (line 54) | def __init__(
method vocab_size (line 145) | def vocab_size(self):
method image_processor (line 148) | def image_processor(self, image: Image.Image) -> np.ndarray:
method scale_to_fit (line 154) | def scale_to_fit(
method _image_processor (line 193) | def _image_processor(self, image: np.ndarray):
method _process_and_tile (line 198) | def _process_and_tile(
method _process_image_input (line 254) | def _process_image_input(self, image_input: ImageInput) -> ProcessorOu...
method _process_text_input (line 281) | def _process_text_input(self, text_input: TextInput, task: str) -> Pro...
method _process_input (line 298) | def _process_input(self, input_dict: dict, task: str):
method _process_ocr_with_boxes (line 310) | def _process_ocr_with_boxes(
method _process_layout (line 356) | def _process_layout(self, mixed_input: List[dict], bos_token_id: int):
method _process_table_structure (line 361) | def _process_table_structure(self, mixed_input: List[dict], bos_token_...
method _process_ocr_without_boxes (line 366) | def _process_ocr_without_boxes(
method _process_block_without_boxes (line 378) | def _process_block_without_boxes(
method align_long_axis (line 388) | def align_long_axis(self, image: np.ndarray) -> Tuple[np.ndarray, bool]:
method __call__ (line 396) | def __call__(
method decode (line 475) | def decode(self, tokens: List[int], task: str):
FILE: surya/common/surya/processor/schema.py
class TaskDict (line 7) | class TaskDict(TypedDict):
class TasksDict (line 12) | class TasksDict(TypedDict):
class ProcessorInput (line 18) | class ProcessorInput(TypedDict):
class ImageInput (line 22) | class ImageInput(ProcessorInput):
class TextInput (line 28) | class TextInput(ProcessorInput):
class ProcessorOutput (line 34) | class ProcessorOutput(TypedDict):
FILE: surya/common/surya/processor/tokenizer.py
function create_token_regex (line 20) | def create_token_regex(tokens):
class Qwen2Tokenizer (line 28) | class Qwen2Tokenizer(S3DownloaderMixin, Qwen2OriginalTokenizer):
class GreedyMathUTF16Tokenizer (line 31) | class GreedyMathUTF16Tokenizer(S3DownloaderMixin, PreTrainedTokenizer):
method _to_utf16_units (line 53) | def _to_utf16_units(s: str) -> List[int]:
method _from_utf16_units (line 58) | def _from_utf16_units(units: List[int]) -> str:
class _TrieNode (line 64) | class _TrieNode:
method __init__ (line 67) | def __init__(self):
method _build_trie (line 73) | def _build_trie(
method _build_escape_patterns (line 85) | def _build_escape_patterns(self, math_token_to_rawid):
method _encode_math_greedy (line 119) | def _encode_math_greedy(
method __init__ (line 152) | def __init__(
method vocab_size (line 302) | def vocab_size(self) -> int:
method get_vocab (line 305) | def get_vocab(self) -> Dict[str, int]:
method __len__ (line 311) | def __len__(self) -> int:
method _encode_core (line 315) | def _encode_core(self, text: str) -> List[int]:
method _fix_latex_escapes (line 346) | def _fix_latex_escapes(self, text: str) -> str:
method _decode_core (line 375) | def _decode_core(self, ids: Iterable[int]) -> str:
method _tokenize (line 398) | def _tokenize(self, text: str, **kwargs) -> List[str]:
method _convert_token_to_id (line 410) | def _convert_token_to_id(self, token: str) -> int:
method _convert_id_to_token (line 428) | def _convert_id_to_token(self, index: int) -> str:
method convert_tokens_to_string (line 435) | def convert_tokens_to_string(self, tokens: List[str]) -> str:
method decode (line 439) | def decode(self, token_ids, skip_special_tokens: bool = False, **kwarg...
method build_inputs_with_special_tokens (line 454) | def build_inputs_with_special_tokens(
method get_special_tokens_mask (line 466) | def get_special_tokens_mask(
method create_token_type_ids_from_sequences (line 481) | def create_token_type_ids_from_sequences(
method save_vocabulary (line 491) | def save_vocabulary(
class SuryaOCRTokenizer (line 515) | class SuryaOCRTokenizer(S3DownloaderMixin, PreTrainedTokenizer):
method __init__ (line 516) | def __init__(
method get_vocab (line 538) | def get_vocab(self) -> Dict[str, int]:
method _add_tokens (line 541) | def _add_tokens(
method vocab_size (line 551) | def vocab_size(self):
method _tokenize (line 554) | def _tokenize(self, text: str, **kwargs):
method __call__ (line 562) | def __call__(
method decode (line 585) | def decode(self, token_ids, **kwargs):
FILE: surya/common/surya/schema.py
class TaskNames (line 1) | class TaskNames:
FILE: surya/common/util.py
function clean_boxes (line 11) | def clean_boxes(boxes: List[PolygonBox]) -> List[PolygonBox]:
function rescale_bbox (line 41) | def rescale_bbox(bbox, processor_size, image_size):
function expand_bbox (line 56) | def expand_bbox(bbox, expansion_factor=0.01):
function script_ranges (line 99) | def script_ranges():
function get_top_scripts (line 207) | def get_top_scripts(text: str, max_scripts: int = 5):
function is_flash_attn_2_supported (line 223) | def is_flash_attn_2_supported(device: str | torch.device) -> bool:
function pad_to_batch_size_repeat (line 247) | def pad_to_batch_size_repeat(tensor: torch.Tensor, batch_size: int):
function pad_to_batch_size (line 263) | def pad_to_batch_size(tensor: torch.Tensor, batch_size: int):
FILE: surya/common/xla.py
function get_nearest_pad (line 10) | def get_nearest_pad(
function get_compile_args (line 16) | def get_compile_args(device: str) -> dict:
function mark_step (line 25) | def mark_step():
FILE: surya/debug/draw.py
function draw_bboxes_on_image (line 7) | def draw_bboxes_on_image(
function draw_polys_on_image (line 21) | def draw_polys_on_image(
FILE: surya/debug/fonts.py
function get_font_path (line 8) | def get_font_path(langs: Optional[List[str]] = None) -> str:
FILE: surya/debug/render_html.py
function render_text_as_html (line 7) | def render_text_as_html(
FILE: surya/debug/text.py
function strip_html_tags (line 17) | def strip_html_tags(html_text):
function get_text_size (line 24) | def get_text_size(text, font):
function render_text (line 31) | def render_text(draw, text, s_bbox, bbox_width, bbox_height, font_path, ...
function draw_text_with_playwright (line 47) | def draw_text_with_playwright(
function draw_text_on_image (line 71) | def draw_text_on_image(
FILE: surya/detection/__init__.py
class DetectionPredictor (line 22) | class DetectionPredictor(BasePredictor):
method __call__ (line 27) | def __call__(
method prepare_image (line 50) | def prepare_image(self, img):
method batch_detection (line 64) | def batch_detection(
FILE: surya/detection/heatmap.py
function get_dynamic_thresholds (line 13) | def get_dynamic_thresholds(linemap, text_threshold, low_text, typical_to...
function detect_boxes (line 26) | def detect_boxes(linemap, text_threshold, low_text):
function get_detected_boxes (line 113) | def get_detected_boxes(textmap, text_threshold=None, low_text=None) -> L...
function get_and_clean_boxes (line 130) | def get_and_clean_boxes(
function parallel_get_boxes (line 142) | def parallel_get_boxes(preds, orig_sizes, include_maps=False):
FILE: surya/detection/loader.py
class DetectionModelLoader (line 16) | class DetectionModelLoader(ModelLoader):
method __init__ (line 17) | def __init__(self, checkpoint: Optional[str] = None):
method model (line 23) | def model(
method processor (line 58) | def processor(
FILE: surya/detection/model/config.py
class EfficientViTConfig (line 6) | class EfficientViTConfig(S3DownloaderMixin, PretrainedConfig):
method __init__ (line 12) | def __init__(
FILE: surya/detection/model/encoderdecoder.py
function val2list (line 28) | def val2list(x: Union[List, Tuple, Any], repeat_time=1):
function val2tuple (line 34) | def val2tuple(x: Union[List, Tuple, Any], min_len: int = 1, idx_repeat: ...
function get_same_padding (line 43) | def get_same_padding(
function get_padding (line 53) | def get_padding(kernel_size: int, stride: int = 1, dilation: int = 1) ->...
class ConvNormAct (line 58) | class ConvNormAct(nn.Module):
method __init__ (line 59) | def __init__(
method forward (line 90) | def forward(self, x):
class DSConv (line 97) | class DSConv(nn.Module):
method __init__ (line 98) | def __init__(
method forward (line 132) | def forward(self, x):
class ConvBlock (line 138) | class ConvBlock(nn.Module):
method __init__ (line 139) | def __init__(
method forward (line 176) | def forward(self, x):
class MBConv (line 182) | class MBConv(nn.Module):
method __init__ (line 183) | def __init__(
method forward (line 229) | def forward(self, x):
class FusedMBConv (line 236) | class FusedMBConv(nn.Module):
method __init__ (line 237) | def __init__(
method forward (line 275) | def forward(self, x):
class LiteMLA (line 281) | class LiteMLA(nn.Module):
method __init__ (line 284) | def __init__(
method _attn (line 348) | def _attn(self, q, k, v):
method forward (line 356) | def forward(self, x):
class EfficientVitBlock (line 385) | class EfficientVitBlock(nn.Module):
method __init__ (line 386) | def __init__(
method forward (line 418) | def forward(self, x):
class ResidualBlock (line 424) | class ResidualBlock(nn.Module):
method __init__ (line 425) | def __init__(
method forward (line 436) | def forward(self, x):
function build_local_block (line 443) | def build_local_block(
class Stem (line 502) | class Stem(nn.Sequential):
method __init__ (line 503) | def __init__(
class EfficientVitLargeStage (line 548) | class EfficientVitLargeStage(nn.Module):
method __init__ (line 549) | def __init__(
method forward (line 614) | def forward(self, x):
class EfficientVitLarge (line 618) | class EfficientVitLarge(nn.Module):
method __init__ (line 619) | def __init__(
method set_grad_checkpointing (line 672) | def set_grad_checkpointing(self, enable=True):
method forward (line 675) | def forward(self, x):
class EfficientViTPreTrainedModel (line 685) | class EfficientViTPreTrainedModel(SuryaPreTrainedModel):
method _init_weights (line 695) | def _init_weights(self, module):
class DecodeMLP (line 712) | class DecodeMLP(nn.Module):
method __init__ (line 713) | def __init__(self, input_dim, output_dim):
method forward (line 717) | def forward(self, hidden_states: torch.Tensor):
class DecodeHead (line 725) | class DecodeHead(EfficientViTPreTrainedModel):
method __init__ (line 726) | def __init__(self, config: EfficientViTConfig):
method forward (line 755) | def forward(self, encoder_hidden_states: torch.FloatTensor) -> torch.T...
class EfficientViTForSemanticSegmentation (line 786) | class EfficientViTForSemanticSegmentation(
method __init__ (line 789) | def __init__(self, config, **kwargs):
method forward (line 797) | def forward(
class EfficientViTForSemanticLayoutSegmentation (line 815) | class EfficientViTForSemanticLayoutSegmentation(EfficientViTPreTrainedMo...
method __init__ (line 816) | def __init__(self, config, **kwargs):
method forward (line 824) | def forward(
FILE: surya/detection/parallel.py
class FakeFuture (line 1) | class FakeFuture:
method __init__ (line 2) | def __init__(self, func, *args, **kwargs):
method result (line 5) | def result(self):
class FakeExecutor (line 8) | class FakeExecutor:
method __init__ (line 9) | def __init__(self, **kwargs):
method __enter__ (line 12) | def __enter__(self):
method __exit__ (line 15) | def __exit__(self, *excinfo):
method submit (line 18) | def submit(self, fn, *args, **kwargs):
FILE: surya/detection/processor.py
class SegformerImageProcessor (line 42) | class SegformerImageProcessor(S3DownloaderMixin, BaseImageProcessor):
method __init__ (line 80) | def __init__(
method from_dict (line 131) | def from_dict(cls, image_processor_dict: Dict[str, Any], **kwargs):
method _preprocess (line 142) | def _preprocess(
method _preprocess_image (line 164) | def _preprocess_image(
method __call__ (line 199) | def __call__(self, images, segmentation_maps=None, **kwargs):
method preprocess (line 208) | def preprocess(
FILE: surya/detection/schema.py
class TextDetectionResult (line 8) | class TextDetectionResult(BaseModel):
FILE: surya/detection/util.py
function get_total_splits (line 7) | def get_total_splits(image_size, height):
function split_image (line 16) | def split_image(img, height):
FILE: surya/foundation/__init__.py
class ContinuousBatchInput (line 35) | class ContinuousBatchInput:
class ContinuousBatchOutput (line 47) | class ContinuousBatchOutput:
class FoundationPrompt (line 56) | class FoundationPrompt:
class FoundationPredictor (line 64) | class FoundationPredictor(BasePredictor):
method __init__ (line 106) | def __init__(
method to (line 138) | def to(self, device_dtype: torch.device | str | None = None):
method get_encoder_chunk_size (line 142) | def get_encoder_chunk_size(self) -> int:
method setup_cache (line 152) | def setup_cache(self, batch_size: int, max_cache_len: int, max_sliding...
method num_empty_slots (line 166) | def num_empty_slots(self):
method num_active_slots (line 170) | def num_active_slots(self):
method prepare_input (line 173) | def prepare_input(
method process_outputs (line 208) | def process_outputs(
method maybe_insert_beacon_tokens (line 243) | def maybe_insert_beacon_tokens(
method decode (line 315) | def decode(
method pad_and_shift_input_ids_position_ids (line 400) | def pad_and_shift_input_ids_position_ids(
method get_cache_position (line 442) | def get_cache_position(
method prefill (line 465) | def prefill(
method get_max_image_token_count (line 670) | def get_max_image_token_count(
method prediction_loop (line 714) | def prediction_loop(
FILE: surya/foundation/cache/dynamic_ops.py
class DynamicOpsCache (line 16) | class DynamicOpsCache:
method __init__ (line 17) | def __init__(
method update (line 74) | def update(
method update_text_counts (line 92) | def update_text_counts(
method prefill_attention_mask_update (line 107) | def prefill_attention_mask_update(
method _prefill_update (line 150) | def _prefill_update(
method decode_attention_mask_update (line 234) | def decode_attention_mask_update(
method _decode_update (line 297) | def _decode_update(
method get_seq_length (line 398) | def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
FILE: surya/foundation/cache/static_ops.py
class StaticOpsCache (line 18) | class StaticOpsCache(DynamicOpsCache):
method __init__ (line 19) | def __init__(
method update (line 76) | def update(
method _prefill_update (line 94) | def _prefill_update(
method decode_attention_mask_update (line 128) | def decode_attention_mask_update(
method prefill_attention_mask_update (line 143) | def prefill_attention_mask_update(
method _decode_update (line 157) | def _decode_update(
method get_seq_length (line 193) | def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
FILE: surya/foundation/loader.py
class FoundationModelLoader (line 19) | class FoundationModelLoader(ModelLoader):
method __init__ (line 20) | def __init__(self, checkpoint: Optional[str] = None):
method model (line 26) | def model(
method processor (line 94) | def processor(
FILE: surya/foundation/util.py
function detect_repeat_token (line 5) | def detect_repeat_token(predicted_tokens: List[int], max_repeats: int = ...
function prediction_to_polygon_batch (line 17) | def prediction_to_polygon_batch(
FILE: surya/input/load.py
function get_name_from_path (line 15) | def get_name_from_path(path):
function load_pdf (line 19) | def load_pdf(pdf_path, page_range: List[int] | None = None, dpi=settings...
function load_image (line 36) | def load_image(image_path):
function load_from_file (line 42) | def load_from_file(
function load_from_folder (line 52) | def load_from_folder(
function load_lang_file (line 81) | def load_lang_file(lang_path, names):
FILE: surya/input/processing.py
function convert_if_not_rgb (line 14) | def convert_if_not_rgb(images: List[Image.Image]) -> List[Image.Image]:
function open_pdf (line 23) | def open_pdf(pdf_filepath):
function get_page_images (line 27) | def get_page_images(doc, indices: List, dpi=settings.IMAGE_DPI):
function slice_bboxes_from_image (line 35) | def slice_bboxes_from_image(image: np.ndarray, bboxes):
function slice_polys_from_image (line 57) | def slice_polys_from_image(image: np.ndarray, polys):
function slice_and_pad_poly (line 64) | def slice_and_pad_poly(image_array: np.array, coordinates):
FILE: surya/layout/__init__.py
class LayoutPredictor (line 15) | class LayoutPredictor(BasePredictor):
method __init__ (line 20) | def __init__(self, foundation_predictor: FoundationPredictor):
method disable_tqdm (line 29) | def disable_tqdm(self) -> bool:
method disable_tqdm (line 33) | def disable_tqdm(self, value: bool) -> None:
method __call__ (line 37) | def __call__(
FILE: surya/layout/schema.py
class LayoutBox (line 8) | class LayoutBox(PolygonBox):
class LayoutResult (line 14) | class LayoutResult(BaseModel):
FILE: surya/logging.py
function configure_logging (line 6) | def configure_logging():
function get_logger (line 26) | def get_logger():
FILE: surya/models.py
function load_predictors (line 18) | def load_predictors(
FILE: surya/ocr_error/__init__.py
class OCRErrorPredictor (line 14) | class OCRErrorPredictor(BasePredictor):
method __call__ (line 19) | def __call__(self, texts: List[str], batch_size: Optional[int] = None):
method batch_ocr_error_detection (line 22) | def batch_ocr_error_detection(
FILE: surya/ocr_error/loader.py
class OCRErrorModelLoader (line 15) | class OCRErrorModelLoader(ModelLoader):
method __init__ (line 16) | def __init__(self, checkpoint: Optional[str] = None):
method model (line 22) | def model(
method processor (line 56) | def processor(
FILE: surya/ocr_error/model/config.py
class DistilBertConfig (line 14) | class DistilBertConfig(S3DownloaderMixin, PretrainedConfig):
method __init__ (line 22) | def __init__(
class DistilBertOnnxConfig (line 56) | class DistilBertOnnxConfig(OnnxConfig):
method inputs (line 58) | def inputs(self) -> Mapping[str, Mapping[int, str]]:
FILE: surya/ocr_error/model/encoder.py
function _get_unpad_data (line 28) | def _get_unpad_data(attention_mask):
function create_sinusoidal_embeddings (line 40) | def create_sinusoidal_embeddings(n_pos: int, dim: int, out: torch.Tensor):
class Embeddings (line 53) | class Embeddings(nn.Module):
method __init__ (line 54) | def __init__(self, config: DistilBertConfig):
method forward (line 71) | def forward(
class MultiHeadSelfAttention (line 113) | class MultiHeadSelfAttention(nn.Module):
method __init__ (line 114) | def __init__(self, config: DistilBertConfig):
method prune_heads (line 138) | def prune_heads(self, heads: List[int]):
method forward (line 154) | def forward(
class DistilBertFlashAttention2 (line 225) | class DistilBertFlashAttention2(MultiHeadSelfAttention):
method __init__ (line 233) | def __init__(self, *args, **kwargs):
method forward (line 241) | def forward(
method _flash_attention_forward (line 311) | def _flash_attention_forward(
method _upad_input (line 395) | def _upad_input(
class FFN (line 443) | class FFN(nn.Module):
method __init__ (line 444) | def __init__(self, config: DistilBertConfig):
method forward (line 453) | def forward(self, input: torch.Tensor) -> torch.Tensor:
method ff_chunk (line 458) | def ff_chunk(self, input: torch.Tensor) -> torch.Tensor:
class TransformerBlock (line 472) | class TransformerBlock(nn.Module):
method __init__ (line 473) | def __init__(self, config: DistilBertConfig):
method forward (line 490) | def forward(
class Transformer (line 536) | class Transformer(nn.Module):
method __init__ (line 537) | def __init__(self, config: DistilBertConfig):
method forward (line 545) | def forward(
class DistilBertPreTrainedModel (line 625) | class DistilBertPreTrainedModel(SuryaPreTrainedModel):
method _init_weights (line 637) | def _init_weights(self, module: nn.Module):
class DistilBertModel (line 660) | class DistilBertModel(DistilBertPreTrainedModel):
method __init__ (line 661) | def __init__(self, config: DistilBertConfig):
method get_position_embeddings (line 671) | def get_position_embeddings(self) -> nn.Embedding:
method resize_position_embeddings (line 677) | def resize_position_embeddings(self, new_num_position_embeddings: int):
method get_input_embeddings (line 726) | def get_input_embeddings(self) -> nn.Embedding:
method set_input_embeddings (line 729) | def set_input_embeddings(self, new_embeddings: nn.Embedding):
method _prune_heads (line 732) | def _prune_heads(self, heads_to_prune: Dict[int, List[List[int]]]):
method forward (line 740) | def forward(
class DistilBertForSequenceClassification (line 805) | class DistilBertForSequenceClassification(S3DownloaderMixin, DistilBertP...
method __init__ (line 806) | def __init__(self, config: DistilBertConfig, **kwargs):
method get_position_embeddings (line 819) | def get_position_embeddings(self) -> nn.Embedding:
method resize_position_embeddings (line 825) | def resize_position_embeddings(self, new_num_position_embeddings: int):
method forward (line 839) | def forward(
FILE: surya/ocr_error/schema.py
class OCRErrorDetectionResult (line 6) | class OCRErrorDetectionResult(BaseModel):
FILE: surya/ocr_error/tokenizer.py
function load_vocab (line 17) | def load_vocab(vocab_file):
function whitespace_tokenize (line 29) | def whitespace_tokenize(text):
class DistilBertTokenizer (line 38) | class DistilBertTokenizer(S3DownloaderMixin, PreTrainedTokenizer):
method __init__ (line 83) | def __init__(
method do_lower_case (line 131) | def do_lower_case(self):
method vocab_size (line 136) | def vocab_size(self):
method get_vocab (line 140) | def get_vocab(self):
method _tokenize (line 144) | def _tokenize(self, text, split_special_tokens=False):
method _convert_token_to_id (line 160) | def _convert_token_to_id(self, token):
method _convert_id_to_token (line 165) | def _convert_id_to_token(self, index):
method convert_tokens_to_string (line 170) | def convert_tokens_to_string(self, tokens):
method build_inputs_with_special_tokens (line 176) | def build_inputs_with_special_tokens(
method get_special_tokens_mask (line 202) | def get_special_tokens_mask(
method create_token_type_ids_from_sequences (line 231) | def create_token_type_ids_from_sequences(
method save_vocabulary (line 261) | def save_vocabulary(self, save_directory: str, filename_prefix: Option...
class BasicTokenizer (line 283) | class BasicTokenizer(object):
method __init__ (line 306) | def __init__(
method tokenize (line 322) | def tokenize(self, text, never_split=None):
method _run_strip_accents (line 360) | def _run_strip_accents(self, text):
method _run_split_on_punc (line 371) | def _run_split_on_punc(self, text, never_split=None):
method _tokenize_chinese_chars (line 393) | def _tokenize_chinese_chars(self, text):
method _is_chinese_char (line 406) | def _is_chinese_char(self, cp):
method _clean_text (line 430) | def _clean_text(self, text):
class WordpieceTokenizer (line 445) | class WordpieceTokenizer(object):
method __init__ (line 448) | def __init__(self, vocab, unk_token, max_input_chars_per_word=100):
method tokenize (line 453) | def tokenize(self, text):
FILE: surya/recognition/__init__.py
class RecognitionPredictor (line 40) | class RecognitionPredictor(BasePredictor):
method __init__ (line 45) | def __init__(self, foundation_predictor: FoundationPredictor):
method disable_tqdm (line 54) | def disable_tqdm(self) -> bool:
method disable_tqdm (line 58) | def disable_tqdm(self, value: bool) -> None:
method detect_and_slice_bboxes (line 62) | def detect_and_slice_bboxes(
method slice_bboxes (line 123) | def slice_bboxes(
method get_bboxes_text (line 183) | def get_bboxes_text(
method __call__ (line 338) | def __call__(
FILE: surya/recognition/postprocessing.py
function truncate_repetitions (line 7) | def truncate_repetitions(text: str, min_len=15):
function extract_tags (line 38) | def extract_tags(proposed_tags: List[str]) -> List[str]:
function cleanup_math (line 55) | def cleanup_math(line: str):
function fix_unbalanced_tags (line 74) | def fix_unbalanced_tags(
FILE: surya/recognition/schema.py
class BaseChar (line 10) | class BaseChar(PolygonBox):
method validate_confidence (line 16) | def validate_confidence(cls, v: float) -> float:
class TextChar (line 24) | class TextChar(BaseChar):
class TextWord (line 28) | class TextWord(BaseChar):
class TextLine (line 32) | class TextLine(BaseChar):
class OCRResult (line 38) | class OCRResult(BaseModel):
FILE: surya/recognition/util.py
function unwrap_math (line 13) | def unwrap_math(text: str) -> str:
function filter_blacklist_tags (line 34) | def filter_blacklist_tags(text_chars: List[TextChar], tags_to_filter: Li...
function clean_math_tags (line 79) | def clean_math_tags(html: str) -> str:
function sort_text_lines (line 104) | def sort_text_lines(lines: List[TextLine] | List[dict], tolerance=1.25):
function clean_close_polygons (line 132) | def clean_close_polygons(bboxes: List[List[List[int]]], thresh: float = ...
function words_from_chars (line 155) | def words_from_chars(chars: List[TextChar], line_box: PolygonBox):
FILE: surya/scripts/config.py
class CLILoader (line 9) | class CLILoader:
method __init__ (line 10) | def __init__(self, filepath: str, cli_options: dict, highres: bool = F...
method common_options (line 23) | def common_options(fn):
method load (line 31) | def load(self, highres: bool = False):
method parse_range_str (line 53) | def parse_range_str(range_str: str) -> List[int]:
FILE: surya/scripts/detect_layout.py
function detect_layout_cli (line 21) | def detect_layout_cli(input_path: str, **kwargs):
FILE: surya/scripts/detect_text.py
function detect_text_cli (line 19) | def detect_text_cli(input_path: str, **kwargs):
FILE: surya/scripts/finetune_ocr.py
class SuryaOCRDataset (line 25) | class SuryaOCRDataset(torch.utils.data.Dataset):
method __init__ (line 26) | def __init__(self, processor: SuryaOCRProcessor, data_args: SuryaOCRDa...
method __len__ (line 31) | def __len__(self):
method get_script_text (line 34) | def get_script_text(self, text: str) -> str:
method __getitem__ (line 39) | def __getitem__(self, index):
class SuryaOCRDataCollator (line 65) | class SuryaOCRDataCollator:
method __init__ (line 66) | def __init__(self, processor: SuryaOCRProcessor, data_args: SuryaOCRDa...
method __call__ (line 70) | def __call__(self, inputs):
function load_model_and_processor (line 91) | def load_model_and_processor(checkpoint_path: Optional[str] = None) -> T...
class SuryaOCRModelArguments (line 96) | class SuryaOCRModelArguments:
class SuryaOCRDataArguments (line 100) | class SuryaOCRDataArguments:
class SuryaOCRTrainingArguments (line 106) | class SuryaOCRTrainingArguments(TrainingArguments):
function main (line 109) | def main():
FILE: surya/scripts/hf_to_s3.py
function main (line 25) | def main(
FILE: surya/scripts/ocr_latex.py
function ocr_latex_cli (line 20) | def ocr_latex_cli(input_path: str, **kwargs):
FILE: surya/scripts/ocr_text.py
function ocr_text_cli (line 25) | def ocr_text_cli(input_path: str, task_name: str, disable_math: bool, **...
FILE: surya/scripts/run_streamlit_app.py
function streamlit_app_cli (line 5) | def streamlit_app_cli():
FILE: surya/scripts/run_texify_app.py
function texify_app_cli (line 5) | def texify_app_cli():
FILE: surya/scripts/streamlit_app.py
function load_predictors_cached (line 24) | def load_predictors_cached():
function ocr_errors (line 28) | def ocr_errors(pdf_file, page_count, sample_len=512, max_samples=10, max...
function text_detection (line 61) | def text_detection(img) -> (Image.Image, TextDetectionResult):
function layout_detection (line 68) | def layout_detection(img) -> (Image.Image, LayoutResult):
function table_recognition (line 80) | def table_recognition(
function ocr (line 135) | def ocr(
function open_pdf (line 180) | def open_pdf(pdf_file):
function get_page_image (line 186) | def get_page_image(pdf_file, page_num, dpi=settings.IMAGE_DPI):
function page_counter (line 200) | def page_counter(pdf_file):
FILE: surya/scripts/table_recognition.py
function table_recognition_cli (line 28) | def table_recognition_cli(input_path: str, skip_table_detection: bool, *...
FILE: surya/scripts/texify_app.py
function replace_fences (line 28) | def replace_fences(text):
function load_predictor (line 36) | def load_predictor():
function inference (line 42) | def inference(pil_image: Image.Image, bbox: List[float]):
function open_pdf (line 51) | def open_pdf(pdf_file):
function get_page_image (line 57) | def get_page_image(pdf_file, page_num, dpi=settings.IMAGE_DPI_HIGHRES):
function page_counter (line 71) | def page_counter(pdf_file):
function resize_image (line 78) | def resize_image(pil_image):
function get_canvas_hash (line 84) | def get_canvas_hash(pil_image):
FILE: surya/settings.py
class Settings (line 12) | class Settings(BaseSettings):
method TORCH_DEVICE_MODEL (line 34) | def TORCH_DEVICE_MODEL(self) -> str:
method DETECTOR_STATIC_CACHE (line 140) | def DETECTOR_STATIC_CACHE(self) -> bool:
method LAYOUT_STATIC_CACHE (line 148) | def LAYOUT_STATIC_CACHE(self) -> bool:
method FOUNDATION_XLA (line 154) | def FOUNDATION_XLA(self) -> bool:
method FOUNDATION_STATIC_CACHE (line 160) | def FOUNDATION_STATIC_CACHE(self) -> bool:
method TABLE_REC_STATIC_CACHE (line 168) | def TABLE_REC_STATIC_CACHE(self) -> bool:
method OCR_ERROR_STATIC_CACHE (line 176) | def OCR_ERROR_STATIC_CACHE(self) -> bool:
method MODEL_DTYPE (line 184) | def MODEL_DTYPE(self) -> torch.dtype:
method MODEL_DTYPE_BFLOAT (line 192) | def MODEL_DTYPE_BFLOAT(self) -> torch.dtype:
method INFERENCE_MODE (line 200) | def INFERENCE_MODE(self) -> Callable:
class Config (line 205) | class Config:
FILE: surya/table_rec/__init__.py
class TableRecPredictor (line 21) | class TableRecPredictor(BasePredictor):
method __call__ (line 31) | def __call__(self, images: List[Image.Image], batch_size: int | None =...
method inference_loop (line 34) | def inference_loop(
method batch_table_recognition (line 134) | def batch_table_recognition(
method decode_batch_predictions (line 239) | def decode_batch_predictions(self, rowcol_predictions, cell_prediction...
FILE: surya/table_rec/loader.py
class TableRecModelLoader (line 19) | class TableRecModelLoader(ModelLoader):
method __init__ (line 20) | def __init__(self, checkpoint: Optional[str] = None):
method model (line 26) | def model(
method processor (line 77) | def processor(
FILE: surya/table_rec/model/config.py
class TableRecModelOutput (line 48) | class TableRecModelOutput(ModelOutput):
class SuryaTableRecConfig (line 53) | class SuryaTableRecConfig(S3DownloaderMixin, PretrainedConfig):
method __init__ (line 57) | def __init__(self, **kwargs):
class DonutSwinTableRecConfig (line 81) | class DonutSwinTableRecConfig(PretrainedConfig):
method __init__ (line 89) | def __init__(
class SuryaTableRecDecoderConfig (line 139) | class SuryaTableRecDecoderConfig(PretrainedConfig):
method __init__ (line 142) | def __init__(
method layers_block_type (line 234) | def layers_block_type(self):
FILE: surya/table_rec/model/decoder.py
class LabelEmbedding (line 12) | class LabelEmbedding(nn.Module):
method __init__ (line 13) | def __init__(self, config):
method forward (line 46) | def forward(self, boxes: torch.LongTensor, *args):
class SuryaTableRecDecoder (line 76) | class SuryaTableRecDecoder(SuryaADETRDecoderPreTrainedModel):
method __init__ (line 79) | def __init__(self, config, **kwargs):
method get_input_embeddings (line 102) | def get_input_embeddings(self):
method set_input_embeddings (line 105) | def set_input_embeddings(self, value):
method get_output_embeddings (line 108) | def get_output_embeddings(self):
method set_output_embeddings (line 111) | def set_output_embeddings(self, new_embeddings):
method set_decoder (line 114) | def set_decoder(self, decoder):
method get_decoder (line 117) | def get_decoder(self):
method forward (line 121) | def forward(
FILE: surya/table_rec/model/encoder.py
class DonutSwinModel (line 9) | class DonutSwinModel(DonutSwinPreTrainedModel):
method __init__ (line 10) | def __init__(self, config, add_pooling_layer=True, use_mask_token=False):
method get_input_embeddings (line 26) | def get_input_embeddings(self):
method _prune_heads (line 29) | def _prune_heads(self, heads_to_prune):
method forward (line 37) | def forward(
FILE: surya/table_rec/model/encoderdecoder.py
class TableRecOutput (line 15) | class TableRecOutput(ModelOutput):
class TableRecEncoderDecoderModel (line 20) | class TableRecEncoderDecoderModel(S3DownloaderMixin, SuryaPreTrainedModel):
method __init__ (line 27) | def __init__(
method get_encoder (line 56) | def get_encoder(self):
method get_decoder (line 59) | def get_decoder(self):
method get_output_embeddings (line 62) | def get_output_embeddings(self):
method set_output_embeddings (line 65) | def set_output_embeddings(self, new_embeddings):
method forward (line 68) | def forward(
method resize_token_embeddings (line 101) | def resize_token_embeddings(self, *args, **kwargs):
method _reorder_cache (line 107) | def _reorder_cache(self, past_key_values, beam_idx):
FILE: surya/table_rec/processor.py
class SuryaTableRecProcessor (line 14) | class SuryaTableRecProcessor(S3DownloaderMixin, ProcessorMixin):
method __init__ (line 18) | def __init__(self, checkpoint, **kwargs):
method resize_polygon (line 29) | def resize_polygon(self, polygon, orig_size, new_size):
method __call__ (line 48) | def __call__(
FILE: surya/table_rec/schema.py
class TableCell (line 8) | class TableCell(PolygonBox):
method label (line 21) | def label(self):
class TableRow (line 25) | class TableRow(PolygonBox):
method label (line 30) | def label(self):
class TableCol (line 34) | class TableCol(PolygonBox):
method label (line 39) | def label(self):
class TableResult (line 43) | class TableResult(BaseModel):
FILE: surya/table_rec/shaper.py
class LabelShaper (line 8) | class LabelShaper:
method __init__ (line 9) | def __init__(self):
method dict_to_labels (line 12) | def dict_to_labels(self, label_components: List[dict]):
method component_idx (line 54) | def component_idx(self, key):
method get_box_property (line 68) | def get_box_property(self, key, add_special_tokens=True):
method component_idx_dict (line 77) | def component_idx_dict(self):
method convert_polygons_to_bboxes (line 83) | def convert_polygons_to_bboxes(self, label_components: List[Dict]):
method convert_bbox_to_polygon (line 114) | def convert_bbox_to_polygon(self, box, skew_scaler=BOX_DIM // 2, skew_...
FILE: tests/conftest.py
function ocr_error_predictor (line 17) | def ocr_error_predictor() -> OCRErrorPredictor:
function layout_predictor (line 24) | def layout_predictor() -> LayoutPredictor:
function detection_predictor (line 31) | def detection_predictor() -> DetectionPredictor:
function recognition_predictor (line 38) | def recognition_predictor() -> RecognitionPredictor:
function table_rec_predictor (line 45) | def table_rec_predictor() -> TableRecPredictor:
function test_image (line 52) | def test_image():
function test_image_tall (line 66) | def test_image_tall():
function test_image_latex (line 79) | def test_image_latex():
FILE: tests/test_detection.py
function test_detection (line 1) | def test_detection(detection_predictor, test_image):
function test_detection_chunking (line 11) | def test_detection_chunking(detection_predictor, test_image_tall):
FILE: tests/test_foundation.py
function test_foundation_flash2 (line 4) | def test_foundation_flash2():
FILE: tests/test_latex_ocr.py
function test_latex_ocr (line 9) | def test_latex_ocr(recognition_predictor, test_image_latex):
FILE: tests/test_layout.py
function test_layout_topk (line 1) | def test_layout_topk(layout_predictor, test_image):
FILE: tests/test_ocr_errors.py
function test_garbled_text (line 1) | def test_garbled_text(ocr_error_predictor):
function test_good_text (line 10) | def test_good_text(ocr_error_predictor):
FILE: tests/test_recognition.py
function test_recognition (line 6) | def test_recognition(recognition_predictor, detection_predictor, test_im...
function test_recognition_input_text (line 17) | def test_recognition_input_text(recognition_predictor, detection_predict...
function test_recognition_drop_repeats (line 41) | def test_recognition_drop_repeats(recognition_predictor, detection_predi...
function test_recognition_clean_math (line 55) | def test_recognition_clean_math():
function test_recognition_clean_math_preserve_text (line 63) | def test_recognition_clean_math_preserve_text():
FILE: tests/test_table_rec.py
function test_table_rec (line 3) | def test_table_rec(table_rec_predictor):
function draw_table (line 23) | def draw_table(data, cell_width=100, cell_height=40):
Condensed preview: 136 files, each showing path, character count, and a content snippet (full structured content: 791K chars).
[
{
"path": ".github/ISSUE_TEMPLATE/breaking-bug-report.md",
"chars": 1182,
"preview": "---\nname: Breaking bug report\nabout: Create a report about a breaking bug\ntitle: \"[BUG: Breaking]\"\nlabels: 'bug: breakin"
},
{
"path": ".github/ISSUE_TEMPLATE/feature_request.md",
"chars": 519,
"preview": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: \"[FEAT]\"\nlabels: enhancement\nassignees: ''\n\n---"
},
{
"path": ".github/ISSUE_TEMPLATE/output-bug-report.md",
"chars": 994,
"preview": "---\nname: Output bug report\nabout: Create a report about poor output quality\ntitle: \"[BUG: Output]\"\nlabels: 'bug: output"
},
{
"path": ".github/workflows/benchmarks.yml",
"chars": 1926,
"preview": "name: Integration test\n\non: [push]\n\nenv:\n PYTHONIOENCODING: \"utf-8\"\n\njobs:\n build:\n runs-on: t4_gpu\n steps:\n "
},
{
"path": ".github/workflows/ci.yml",
"chars": 481,
"preview": "name: Unit tests\n\non: [push]\n\njobs:\n build:\n runs-on: ${{ matrix.os }}\n strategy:\n matrix:\n os: [t4_g"
},
{
"path": ".github/workflows/cla.yml",
"chars": 1326,
"preview": "name: \"Surya CLA Assistant\"\non:\n issue_comment:\n types: [created]\n pull_request_target:\n types: [opened,closed,s"
},
{
"path": ".github/workflows/publish.yml",
"chars": 625,
"preview": "name: Python package\non:\n push:\n tags:\n - \"v*.*.*\"\njobs:\n build:\n runs-on: ubuntu-latest\n steps:\n -"
},
{
"path": ".github/workflows/scripts.yml",
"chars": 1330,
"preview": "name: Test CLI scripts\n\non: [push]\n\njobs:\n build:\n runs-on: t4_gpu\n steps:\n - uses: actions/checkout@v3\n "
},
{
"path": ".gitignore",
"chars": 3176,
"preview": "private.py\n.DS_Store\nlocal.env\nexperiments\ntest_data\ntraining\nwandb\nnotebooks\nresults\ndata\nslices\n\n# Byte-compiled / opt"
},
{
"path": ".pre-commit-config.yaml",
"chars": 271,
"preview": "repos:\n- repo: https://github.com/astral-sh/ruff-pre-commit\n # Ruff version.\n rev: v0.9.10\n hooks:\n # Run the lint"
},
{
"path": "CITATION.cff",
"chars": 405,
"preview": "cff-version: 1.2.0\nmessage: \"If you use this software, please cite it using the following metadata.\"\ntitle: \"Surya: A li"
},
{
"path": "CLA.md",
"chars": 4583,
"preview": "Surya Contributor Agreement\n\nThis Surya Contributor Agreement (\"SCA\") applies to any contribution that you make to any p"
},
{
"path": "LICENSE",
"chars": 35085,
"preview": " GNU GENERAL PUBLIC LICENSE\n Version 3, 29 June 2007\n\n Copyright (C) 2007 Free "
},
{
"path": "MODEL_LICENSE",
"chars": 14654,
"preview": " AI PUBS OPEN RAIL-M LICENSE (MODIFIED)\n\nVersion 0.1, March 2, 2023 (Modified)\nhttp://licenses.ai/\n\nPL"
},
{
"path": "README.md",
"chars": 32780,
"preview": "# Surya\n\nSurya is a document OCR toolkit that does:\n\n- OCR in 90+ languages that benchmarks favorably vs cloud services\n"
},
{
"path": "benchmark/detection.py",
"chars": 5087,
"preview": "import argparse\nimport collections\nimport copy\nimport json\n\nimport click\n\nfrom benchmark.utils.bbox import get_pdf_lines"
},
{
"path": "benchmark/layout.py",
"chars": 5020,
"preview": "import collections\nimport copy\nimport json\n\nimport click\n\nfrom benchmark.utils.metrics import precision_recall\nfrom sury"
},
{
"path": "benchmark/ordering.py",
"chars": 3062,
"preview": "import collections\nimport json\n\nimport click\n\nfrom surya.foundation import FoundationPredictor\nfrom surya.input.processi"
},
{
"path": "benchmark/recognition.py",
"chars": 12440,
"preview": "import re\nimport unicodedata\nfrom collections import defaultdict\n\nimport click\n\nfrom benchmark.utils.scoring import over"
},
{
"path": "benchmark/table_recognition.py",
"chars": 6677,
"preview": "import click\nimport collections\nimport json\n\nfrom surya.debug.draw import draw_bboxes_on_image\nfrom tabulate import tabu"
},
{
"path": "benchmark/texify.py",
"chars": 3552,
"preview": "import os.path\nimport re\nimport time\nfrom pathlib import Path\nfrom typing import List\n\nimport click\nimport datasets\nfrom"
},
{
"path": "benchmark/utils/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "benchmark/utils/bbox.py",
"chars": 1708,
"preview": "import fitz as pymupdf\nfrom surya.common.util import rescale_bbox\n\n\ndef get_pdf_lines(pdf_path, img_sizes):\n doc = py"
},
{
"path": "benchmark/utils/metrics.py",
"chars": 6400,
"preview": "from functools import partial\nfrom itertools import repeat\n\nimport numpy as np\nfrom concurrent.futures import ProcessPoo"
},
{
"path": "benchmark/utils/scoring.py",
"chars": 1321,
"preview": "import math\nfrom typing import List\n\nfrom rapidfuzz import fuzz\n\n\ndef overlap_score(pred_lines: List[str], reference_lin"
},
{
"path": "benchmark/utils/tatr.py",
"chars": 3547,
"preview": "import torch\nfrom transformers import AutoModelForObjectDetection\nfrom surya.settings import settings\nimport numpy as np"
},
{
"path": "benchmark/utils/tesseract.py",
"chars": 5012,
"preview": "from typing import List, Optional\n\nimport numpy as np\nfrom tqdm import tqdm\n\nfrom surya.input.processing import slice_bb"
},
{
"path": "benchmark/utils/textract.py",
"chars": 1037,
"preview": "import os\nfrom concurrent.futures import ThreadPoolExecutor\nfrom tqdm import tqdm\nimport traceback\n\nfrom surya.input.pro"
},
{
"path": "benchmark/utils/verify_benchmark_scores.py",
"chars": 2036,
"preview": "import json\nimport click\n\n\ndef verify_layout(data):\n scores = data[\"metrics\"]\n for layout_type, metrics in scores."
},
{
"path": "detect_layout.py",
"chars": 110,
"preview": "from surya.scripts.detect_layout import detect_layout_cli\n\nif __name__ == \"__main__\":\n detect_layout_cli()\n"
},
{
"path": "detect_text.py",
"chars": 111,
"preview": "from surya.scripts.detect_text import detect_text_cli\n\nif __name__ == \"__main__\":\n detect_text_cli()\n\n\n\n\n\n\n\n"
},
{
"path": "ocr_app.py",
"chars": 113,
"preview": "from surya.scripts.run_streamlit_app import streamlit_app_cli\n\nif __name__ == \"__main__\":\n streamlit_app_cli()"
},
{
"path": "ocr_latex.py",
"chars": 98,
"preview": "from surya.scripts.ocr_latex import ocr_latex_cli\n\nif __name__ == \"__main__\":\n ocr_latex_cli()\n"
},
{
"path": "ocr_text.py",
"chars": 95,
"preview": "from surya.scripts.ocr_text import ocr_text_cli\n\nif __name__ == \"__main__\":\n ocr_text_cli()\n"
},
{
"path": "pyproject.toml",
"chars": 1909,
"preview": "[tool.poetry]\nname = \"surya-ocr\"\nversion = \"0.17.1\"\ndescription = \"OCR, layout, reading order, and table recognition in "
},
{
"path": "pytest.ini",
"chars": 147,
"preview": "[pytest]\ntestpaths=tests\npythonpath=.\nfilterwarnings =\n ignore::UserWarning\n ignore::PendingDeprecationWarning\n "
},
{
"path": "signatures/version1/cla.json",
"chars": 4847,
"preview": "{\n \"signedContributors\": [\n {\n \"name\": \"rishiraj\",\n \"id\": 44090649,\n \"comment_id\": 2170578748,\n "
},
{
"path": "static/fonts/.gitignore",
"chars": 13,
"preview": "*\n!.gitignore"
},
{
"path": "surya/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "surya/common/__init__.py",
"chars": 3,
"preview": "\n\n\n"
},
{
"path": "surya/common/adetr/decoder.py",
"chars": 31222,
"preview": "from typing import Dict, Optional, Tuple, Union\n\nimport torch\nimport torch.utils.checkpoint\nfrom torch import nn\nfrom tr"
},
{
"path": "surya/common/donut/encoder.py",
"chars": 36502,
"preview": "import collections.abc\nimport math\nfrom dataclasses import dataclass\nfrom typing import Optional, Tuple, Union\n\nimport t"
},
{
"path": "surya/common/donut/processor.py",
"chars": 6496,
"preview": "from typing import Dict, Union, Optional, List, Iterable\n\nimport cv2\nfrom torch import TensorType\nfrom transformers impo"
},
{
"path": "surya/common/load.py",
"chars": 700,
"preview": "from typing import Optional, Any\n\nimport torch\n\nfrom surya.settings import settings\n\n\nclass ModelLoader:\n def __init_"
},
{
"path": "surya/common/polygon.py",
"chars": 7134,
"preview": "import copy\nfrom typing import List, Optional\n\nimport numpy as np\nfrom pydantic import BaseModel, field_validator, compu"
},
{
"path": "surya/common/predictor.py",
"chars": 2643,
"preview": "from typing import Optional\nimport torch\nimport torch.nn.functional as F\n\nfrom surya.common.load import ModelLoader\nfrom"
},
{
"path": "surya/common/pretrained.py",
"chars": 784,
"preview": "from typing import Optional\n\nfrom transformers import PreTrainedModel\nfrom transformers.utils import is_flash_attn_2_ava"
},
{
"path": "surya/common/s3.py",
"chars": 6018,
"preview": "import json\nimport os\nimport shutil\nimport tempfile\nimport time\nfrom concurrent.futures import ThreadPoolExecutor\nfrom p"
},
{
"path": "surya/common/surya/__init__.py",
"chars": 29832,
"preview": "import warnings\nfrom typing import Optional, Tuple, TypedDict\nfrom dataclasses import dataclass\n\nimport torch\nfrom torch"
},
{
"path": "surya/common/surya/config.py",
"chars": 3262,
"preview": "from typing import Optional\nfrom transformers import PretrainedConfig\n\nfrom surya.common.s3 import S3DownloaderMixin\nfro"
},
{
"path": "surya/common/surya/decoder/__init__.py",
"chars": 21243,
"preview": "from typing import Callable, List, Optional, Tuple, Union\n\nimport torch\nfrom torch import nn\n\nfrom transformers.activati"
},
{
"path": "surya/common/surya/decoder/config.py",
"chars": 3029,
"preview": "from transformers.configuration_utils import PretrainedConfig\nfrom transformers.modeling_rope_utils import rope_config_v"
},
{
"path": "surya/common/surya/embedder/__init__.py",
"chars": 2092,
"preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass SimpleTokenEmbedder(nn.Module):\n def __ini"
},
{
"path": "surya/common/surya/encoder/__init__.py",
"chars": 30609,
"preview": "import math\nfrom typing import Optional, Tuple\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom "
},
{
"path": "surya/common/surya/encoder/config.py",
"chars": 1610,
"preview": "from transformers.configuration_utils import PretrainedConfig\nfrom transformers.utils import logging\n\nlogger = logging.g"
},
{
"path": "surya/common/surya/flash_attn_utils.py",
"chars": 8590,
"preview": "from typing import Optional\nimport torch\nimport torch.nn.functional as F\nfrom flash_attn import flash_attn_varlen_func a"
},
{
"path": "surya/common/surya/processor/__init__.py",
"chars": 17795,
"preview": "import math\n\nimport cv2\nimport numpy as np\nimport torch\nfrom PIL import Image\nfrom torch.nn.utils.rnn import pad_sequenc"
},
{
"path": "surya/common/surya/processor/schema.py",
"chars": 714,
"preview": "from typing import TypedDict, Literal, List, Tuple\n\nimport torch\nfrom PIL import Image\n\n\nclass TaskDict(TypedDict):\n "
},
{
"path": "surya/common/surya/processor/tokenizer.py",
"chars": 21847,
"preview": "import html\nimport re\nfrom typing import List, Union, Dict, Optional, Tuple, Iterable\nimport numpy as np\nimport torch\nfr"
},
{
"path": "surya/common/surya/schema.py",
"chars": 379,
"preview": "class TaskNames:\n block_without_boxes = \"block_without_boxes\"\n ocr_with_boxes = \"ocr_with_boxes\"\n ocr_without_b"
},
{
"path": "surya/common/util.py",
"chars": 9636,
"preview": "import copy\nfrom typing import List\nimport torch\nfrom functools import lru_cache\n\nimport torch.nn.functional as F\n\nfrom "
},
{
"path": "surya/common/xla.py",
"chars": 523,
"preview": "import math\nfrom surya.settings import settings\n\nif settings.TORCH_DEVICE_MODEL == \"xla\":\n import torch_xla.core.xla_"
},
{
"path": "surya/debug/draw.py",
"chars": 2068,
"preview": "from PIL import ImageDraw, ImageFont\n\nfrom surya.debug.fonts import get_font_path\nfrom surya.debug.text import get_text_"
},
{
"path": "surya/debug/fonts.py",
"chars": 853,
"preview": "from typing import List, Optional\nimport os\nimport requests\n\nfrom surya.settings import settings\n\n\ndef get_font_path(lan"
},
{
"path": "surya/debug/katex.js",
"chars": 2017,
"preview": "<style>\n .katex-display-container {\n display: inline-block;\n max-width: 100%;\n overflow-x: auto;"
},
{
"path": "surya/debug/render_html.py",
"chars": 2613,
"preview": "import html as htmllib\nimport os.path\nimport re\n\nfilepath = os.path.abspath(__file__)\n\ndef render_text_as_html(\n "
},
{
"path": "surya/debug/text.py",
"chars": 3140,
"preview": "import re\nfrom io import BytesIO\nfrom typing import List, Tuple\nfrom PIL import Image, ImageDraw, ImageFont\n\nfrom surya."
},
{
"path": "surya/detection/__init__.py",
"chars": 6255,
"preview": "from concurrent.futures import ThreadPoolExecutor\nfrom typing import List, Generator, Tuple\n\nimport numpy as np\nimport t"
},
{
"path": "surya/detection/heatmap.py",
"chars": 5412,
"preview": "from typing import List\n\nimport cv2\nimport numpy as np\nfrom PIL import Image\n\nfrom surya.common.util import clean_boxes\n"
},
{
"path": "surya/detection/loader.py",
"chars": 2218,
"preview": "from typing import Optional\n\nimport torch\n\nfrom surya.common.load import ModelLoader\nfrom surya.detection.processor impo"
},
{
"path": "surya/detection/model/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "surya/detection/model/config.py",
"chars": 1645,
"preview": "from transformers import PretrainedConfig\n\nfrom surya.common.s3 import S3DownloaderMixin\n\n\nclass EfficientViTConfig(S3Do"
},
{
"path": "surya/detection/model/encoderdecoder.py",
"chars": 25568,
"preview": "\"\"\"\nThis is an implementation of efficientvit, with some modifications (decode head, etc).\n\nOriginal paper at https://ar"
},
{
"path": "surya/detection/parallel.py",
"chars": 414,
"preview": "class FakeFuture:\n def __init__(self, func, *args, **kwargs):\n self._result = func(*args, **kwargs)\n\n def r"
},
{
"path": "surya/detection/processor.py",
"chars": 14567,
"preview": "# coding=utf-8\n# Copyright 2022 The HuggingFace Inc. team. All rights reserved.\n#\n# Licensed under the Apache License, V"
},
{
"path": "surya/detection/schema.py",
"chars": 272,
"preview": "from typing import List, Optional, Any\n\nfrom pydantic import BaseModel\n\nfrom surya.common.polygon import PolygonBox\n\n\ncl"
},
{
"path": "surya/detection/util.py",
"chars": 1250,
"preview": "import math\nfrom PIL import ImageOps\n\nfrom surya.settings import settings\n\n\ndef get_total_splits(image_size, height):\n "
},
{
"path": "surya/foundation/__init__.py",
"chars": 36172,
"preview": "from __future__ import annotations\n\nfrom dataclasses import dataclass\nfrom typing import List, Optional, Tuple\nfrom coll"
},
{
"path": "surya/foundation/cache/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "surya/foundation/cache/dynamic_ops.py",
"chars": 16711,
"preview": "from typing import Any, Dict, List, Optional, Tuple\nimport torch\nfrom transformers import PretrainedConfig\n\n\"\"\"\nSpecial "
},
{
"path": "surya/foundation/cache/static_ops.py",
"chars": 7507,
"preview": "from typing import Any, Dict, List, Optional, Tuple\nimport torch\nfrom transformers import PretrainedConfig\n\nfrom surya.f"
},
{
"path": "surya/foundation/loader.py",
"chars": 5020,
"preview": "from typing import Optional\n\nimport torch\nfrom transformers.utils import is_flash_attn_2_available\n\nfrom surya.common.lo"
},
{
"path": "surya/foundation/util.py",
"chars": 1724,
"preview": "from typing import List, Tuple\nimport numpy as np\nimport torch\n\ndef detect_repeat_token(predicted_tokens: List[int], max"
},
{
"path": "surya/input/load.py",
"chars": 2382,
"preview": "from typing import List\nimport PIL\n\nfrom surya.input.processing import open_pdf, get_page_images\nfrom surya.logging impo"
},
{
"path": "surya/input/processing.py",
"chars": 2904,
"preview": "from typing import List\n\nimport cv2\nimport numpy as np\nimport pypdfium2\nfrom PIL import Image\n\nfrom surya.logging import"
},
{
"path": "surya/layout/__init__.py",
"chars": 4417,
"preview": "from typing import List\n\nfrom PIL import Image\n\nfrom surya.common.predictor import BasePredictor\nfrom surya.layout.schem"
},
{
"path": "surya/layout/label.py",
"chars": 489,
"preview": "LAYOUT_PRED_RELABEL = {\n \"<page-header>\": \"PageHeader\",\n \"<page-footer>\": \"PageFooter\",\n \"<footnote>\": \"Footnot"
},
{
"path": "surya/layout/schema.py",
"chars": 390,
"preview": "from typing import Optional, Dict, List\n\nfrom pydantic import BaseModel\n\nfrom surya.common.polygon import PolygonBox\n\n\nc"
},
{
"path": "surya/logging.py",
"chars": 733,
"preview": "import logging\nimport warnings\nfrom surya.settings import settings\n\n\ndef configure_logging():\n logger = get_logger()\n"
},
{
"path": "surya/models.py",
"chars": 1049,
"preview": "from typing import Dict\n\nimport torch\n\nfrom surya.common.predictor import BasePredictor\nfrom surya.detection import Dete"
},
{
"path": "surya/ocr_error/__init__.py",
"chars": 2374,
"preview": "import math\nfrom typing import List, Optional\n\nfrom tqdm import tqdm\n\nfrom surya.common.predictor import BasePredictor\nf"
},
{
"path": "surya/ocr_error/loader.py",
"chars": 2026,
"preview": "from typing import Optional\n\nimport torch\n\nfrom surya.common.load import ModelLoader\nfrom surya.logging import get_logge"
},
{
"path": "surya/ocr_error/model/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "surya/ocr_error/model/config.py",
"chars": 1983,
"preview": "from collections import OrderedDict\nfrom typing import Mapping\n\nfrom transformers.configuration_utils import PretrainedC"
},
{
"path": "surya/ocr_error/model/encoder.py",
"chars": 36761,
"preview": "from __future__ import annotations\n\nimport math\nfrom typing import Optional, Set, List, Tuple, Union, Dict\n\nimport numpy"
},
{
"path": "surya/ocr_error/schema.py",
"chars": 143,
"preview": "from typing import List\n\nfrom pydantic import BaseModel\n\n\nclass OCRErrorDetectionResult(BaseModel):\n texts: List[str]"
},
{
"path": "surya/ocr_error/tokenizer.py",
"chars": 21391,
"preview": "import collections\nimport os\nimport json\nimport unicodedata\nfrom typing import List, Optional, Tuple\n\nfrom tokenizers im"
},
{
"path": "surya/recognition/__init__.py",
"chars": 19689,
"preview": "from __future__ import annotations\n\nimport re\nfrom typing import List\n\nimport numpy as np\nimport torch\nfrom PIL import I"
},
{
"path": "surya/recognition/languages.py",
"chars": 2076,
"preview": "CODE_TO_LANGUAGE = {\n \"_math\": \"Math\",\n \"af\": \"Afrikaans\",\n \"am\": \"Amharic\",\n \"ar\": \"Arabic\",\n \"as\": \"Ass"
},
{
"path": "surya/recognition/postprocessing.py",
"chars": 3261,
"preview": "import re\nfrom typing import List, Dict\n\nfrom surya.recognition.schema import TextChar\n\n\ndef truncate_repetitions(text: "
},
{
"path": "surya/recognition/schema.py",
"chars": 903,
"preview": "import math\nimport numpy as np\nfrom typing import Optional, List\n\nfrom pydantic import BaseModel, field_validator\n\nfrom "
},
{
"path": "surya/recognition/util.py",
"chars": 5642,
"preview": "import re\nfrom typing import List, Tuple\n\nimport numpy\nimport torch\n\nfrom surya.common.polygon import PolygonBox\nfrom su"
},
{
"path": "surya/scripts/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "surya/scripts/config.py",
"chars": 2778,
"preview": "from typing import List\n\nimport click\nimport os\nfrom surya.input.load import load_from_folder, load_from_file\nfrom surya"
},
{
"path": "surya/scripts/detect_layout.py",
"chars": 2024,
"preview": "import time\nimport click\nimport copy\nimport json\nfrom collections import defaultdict\n\nfrom surya.foundation import Found"
},
{
"path": "surya/scripts/detect_text.py",
"chars": 1875,
"preview": "import click\nimport copy\nimport json\nimport time\nfrom collections import defaultdict\n\nfrom surya.detection import Detect"
},
{
"path": "surya/scripts/finetune_ocr.py",
"chars": 4898,
"preview": "from __future__ import annotations\nfrom dataclasses import dataclass, field\nfrom typing import Optional, Tuple\nfrom data"
},
{
"path": "surya/scripts/hf_to_s3.py",
"chars": 2289,
"preview": "import json\nimport shutil\nimport datetime\nfrom pathlib import Path\nimport boto3\n\nfrom huggingface_hub import snapshot_do"
},
{
"path": "surya/scripts/ocr_latex.py",
"chars": 1716,
"preview": "import os\n\nimport click\nimport json\nimport time\nfrom collections import defaultdict\n\nfrom surya.logging import configure"
},
{
"path": "surya/scripts/ocr_text.py",
"chars": 2468,
"preview": "import os\nimport click\nimport json\nimport time\nfrom collections import defaultdict\n\nfrom surya.common.surya.schema impor"
},
{
"path": "surya/scripts/run_streamlit_app.py",
"chars": 350,
"preview": "import subprocess\nimport os\n\n\ndef streamlit_app_cli():\n cur_dir = os.path.dirname(os.path.abspath(__file__))\n ocr_"
},
{
"path": "surya/scripts/run_texify_app.py",
"chars": 344,
"preview": "import subprocess\nimport os\n\n\ndef texify_app_cli():\n cur_dir = os.path.dirname(os.path.abspath(__file__))\n ocr_app"
},
{
"path": "surya/scripts/streamlit_app.py",
"chars": 10363,
"preview": "import io\nimport tempfile\nfrom typing import List\n\nimport pypdfium2\nimport streamlit as st\n\nfrom surya.common.surya.sche"
},
{
"path": "surya/scripts/table_recognition.py",
"chars": 4515,
"preview": "import os\nimport click\nimport copy\nimport json\nfrom collections import defaultdict\n\nfrom surya.logging import configure_"
},
{
"path": "surya/scripts/texify_app.py",
"chars": 4753,
"preview": "import os\nimport re\nfrom typing import List\n\nfrom surya.recognition import RecognitionPredictor\nfrom surya.foundation im"
},
{
"path": "surya/settings.py",
"chars": 7241,
"preview": "import os\nfrom typing import Callable, Dict, Optional\n\nimport torch\nfrom dotenv import find_dotenv\nfrom pydantic import "
},
{
"path": "surya/table_rec/__init__.py",
"chars": 17509,
"preview": "from copy import deepcopy\nfrom itertools import chain\nfrom typing import List\n\nimport numpy as np\nimport torch\nfrom PIL "
},
{
"path": "surya/table_rec/loader.py",
"chars": 2997,
"preview": "from typing import Optional\n\nimport torch\n\nfrom surya.common.load import ModelLoader\nfrom surya.logging import get_logge"
},
{
"path": "surya/table_rec/model/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "surya/table_rec/model/config.py",
"chars": 8035,
"preview": "from dataclasses import dataclass\nfrom typing import Dict\n\nimport torch\nfrom transformers import PretrainedConfig\nfrom t"
},
{
"path": "surya/table_rec/model/decoder.py",
"chars": 6726,
"preview": "from typing import Optional, Tuple, Union\n\nimport torch\nfrom torch import nn\n\nfrom surya.common.adetr.decoder import Sur"
},
{
"path": "surya/table_rec/model/encoder.py",
"chars": 3664,
"preview": "from typing import Optional, Union, Tuple\n\nimport torch\nimport torch.nn as nn\n\nfrom surya.common.donut.encoder import Do"
},
{
"path": "surya/table_rec/model/encoderdecoder.py",
"chars": 3951,
"preview": "from dataclasses import dataclass\nfrom typing import Optional, Union, Tuple, Dict\n\nimport torch\nfrom transformers import"
},
{
"path": "surya/table_rec/processor.py",
"chars": 3273,
"preview": "from typing import List\n\nimport PIL\nimport torch\nfrom transformers import ProcessorMixin\n\nfrom surya.common.s3 import S3"
},
{
"path": "surya/table_rec/schema.py",
"chars": 939,
"preview": "from typing import List\n\nfrom pydantic import BaseModel\n\nfrom surya.common.polygon import PolygonBox\n\n\nclass TableCell(P"
},
{
"path": "surya/table_rec/shaper.py",
"chars": 5021,
"preview": "import math\nfrom typing import List, Dict\nimport numpy as np\n\nfrom surya.table_rec.model.config import BOX_PROPERTIES, S"
},
{
"path": "table_recognition.py",
"chars": 121,
"preview": "from surya.scripts.table_recognition import table_recognition_cli\n\nif __name__ == \"__main__\":\n table_recognition_cli("
},
{
"path": "tests/conftest.py",
"chars": 2481,
"preview": "import os\n\nos.environ[\"PYTORCH_ENABLE_MPS_FALLBACK\"] = \"1\"\n\nimport pytest\nfrom PIL import Image, ImageDraw\n\nfrom surya.d"
},
{
"path": "tests/test_detection.py",
"chars": 670,
"preview": "def test_detection(detection_predictor, test_image):\n detection_results = detection_predictor([test_image])\n\n asse"
},
{
"path": "tests/test_foundation.py",
"chars": 476,
"preview": "from surya.foundation import FoundationPredictor\n\n\ndef test_foundation_flash2():\n try:\n f = FoundationPredicto"
},
{
"path": "tests/test_latex_ocr.py",
"chars": 549,
"preview": "from typing import List\n\nfrom PIL import Image, ImageDraw\n\nfrom surya.common.surya.schema import TaskNames\nfrom surya.re"
},
{
"path": "tests/test_layout.py",
"chars": 429,
"preview": "def test_layout_topk(layout_predictor, test_image):\n layout_results = layout_predictor([test_image])\n\n assert len("
},
{
"path": "tests/test_ocr_errors.py",
"chars": 510,
"preview": "def test_garbled_text(ocr_error_predictor):\n text = \"\"\"\"\n ; dh vksj ls mifLFkr vf/koDrk % Jh vfuy dqekj\n 2. vfH"
},
{
"path": "tests/test_recognition.py",
"chars": 2846,
"preview": "import time\nfrom PIL import ImageDraw, Image\nfrom surya.recognition.util import clean_math_tags\n\n\ndef test_recognition(r"
},
{
"path": "tests/test_table_rec.py",
"chars": 1657,
"preview": "from PIL import Image, ImageDraw\n\ndef test_table_rec(table_rec_predictor):\n data = [\n [\"Name\", \"Age\", \"City\"],"
},
{
"path": "texify_app.py",
"chars": 104,
"preview": "from surya.scripts.run_texify_app import texify_app_cli\n\nif __name__ == \"__main__\":\n texify_app_cli()"
}
]
About this extraction
This page contains the full source code of the datalab-to/surya GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 136 files (740.2 KB), approximately 177.8k tokens, and a symbol index with 790 extracted functions, classes, methods, constants, and types.